Brugia malayi: Difference between revisions

Revision as of 19:14, 19 February 2010

Articles

Genome Info

  • 6 chromosome pairs (autosomes 1-5 plus X/Y); diploid genome of ~110 Mbp
  • 30% GC
  • 32% coding, 15% repeats

Genome Project

Brugia malayi has a diploid genome of approximately 110 Mb, organized in 6 pairs of chromosomes (five pairs of autosomes and one pair of sex chromosomes). In addition to the nuclear genome, B. malayi has a mitochondrial genome of about 14 kb and harbors a bacterial endosymbiont, Wolbachia sp., with a genome of 1-2 Mb.

The B. malayi genome project has been completed by The Institute for Genomic Research. Whole Genome Shotgun sequencing was used to obtain more than eight-fold coverage of the genome. The complete genome was assembled into approximately 8200 scaffolds and deposited in GenBank. The accession for the WGS project is AAQA00000000 and consists of sequences AAQA01000001-AAQA01029808. File location:

 /fs/szasmg3/dpuiu/Brugia_malayi/Data/Bm.fasta 
 ctgs               min    q1     q2     q3     max        mean       n50        sum            
 26,879             200    836    1005   1495   611,244    3241.17    18986      87,119,350   
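The stats rows on this page (min, q1, q2, q3, max, mean, n50, sum) can be reproduced from a list of sequence lengths. The sketch below is only an illustration: the function names are hypothetical, and the quartile rule (simple index into the sorted list) is an assumption, since the page does not say which definition the original stats tool used.

```python
from statistics import mean

def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

def summarize(lengths):
    """Return the columns used in the stats tables on this page."""
    ordered = sorted(lengths)
    n = len(ordered)
    return {
        "seqs": n,
        "min": ordered[0],
        "q1": ordered[n // 4],        # assumed quartile rule
        "q2": ordered[n // 2],        # median
        "q3": ordered[(3 * n) // 4],
        "max": ordered[-1],
        "mean": mean(ordered),
        "n50": n50(ordered),
        "sum": sum(ordered),
    }

# Toy lengths, not real Brugia contigs
toy_stats = summarize([200, 300, 500, 1000, 2000])
print(toy_stats)
```

For real data the lengths would come from the FASTA file listed above; parsing is left out to keep the sketch short.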

Contamination

 /fs/szasmg3/dpuiu/Brugia_malayi/Data/contam.fasta
 contaminants       min    q1     q2     q3     max        mean       n50        sum            
 2,929              200    527    675    820    8994       740.04     762        2,167,588
  • pUC19c vector: 2686 bp, 50.63% GC
    • found in some 454 libraries
    • open question: are the Sanger reads affected as well?

Data

 1.26M Sanger reads (original TA) :       medLen=773bp; medGC=32.57%
 1.26M Sanger reads (contamination free): medLen=771bp; medGC=32.36%
 3.21M 454 reads (original sff) :         medLen=274bp
 3.29M 454 reads (linker free)  :         medLen=247bp; medGC=36.39%

Original Traces

 SEQ_LIB_ID       INSERT_SIZE  INSERT_STDEV  TRACE_TYPE_CODE  READS
 1047113828118    1000         300           WGS              13500   
 1047113856575    1000         300           PRIMERWALK       325     
 1047111632737    1258         377           PRIMERWALK       3       
 1047111632737    1258         377           WGS              305,906  
 1047111540304    1415         424           WGS              51772   
 1047112577106    1415         424           WGS              337,789  
 1047111718946    3123         936           WGS              47597   
 1047113358719    3123         936           PRIMERWALK       173     
 1047113358719    3123         936           WGS              246,185  
 1047174912885    3123         936           TRANSPOSON       1437    
 1047113570927    6000         1800          WGS              3193    
 1047111814561    7158         2147          WGS              219,306  
 1047111480027    17168        5150          WGS              4087    
 1047111488095    17168        5150          WGS              3434    
 1047111495007    17168        5150          WGS              3716    
 1047111501919    17168        5150          WGS              3697    
 1047111480605    22419        6725          WGS              4638    
 1047111516154    22419        6725          WGS              4004    
 1047111523212    22419        6725          WGS              3766    
 1047111530126    22419        6725          WGS              5686    
 1047113855421    23000        6900          WGS              1       
 total                                                        1,260,215

FRG file: (contaminant free)

  • FRG.src : TI's
  • FRG.acc: 2 ..
  • DST.acc: 1260217, ... , 1260234
  • Location
 /fs/szasmg3/dpuiu/Brugia_malayi/Data/nucmer_seq/Bm-all.frg 
 DST     15
 FRG     1178192
 LKG     530930
 
         seqs         min    q1     q2     q3     max        mean       n50        sum            
 len     1,178,192    65     645    771    850    1214       724        800        853,847,771  => 8X
 gc%     1,178,192    0.00   29     32.36  35     100        32.41      33         .
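The "=> 8X" note is a fold-coverage estimate: total bases divided by genome size. A minimal check, assuming the ~110 Mb genome size quoted under Genome Info (the constant and function names are made up for this example):

```python
GENOME_SIZE_BP = 110_000_000  # ~110 Mb, taken from the Genome Info section

def fold_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Average number of times each genome base is covered by reads."""
    return total_bases / genome_size

# Sum of the contaminant-free Sanger read lengths from the table above
sanger_cov = fold_coverage(853_847_771)
print(f"{sanger_cov:.2f}X")
```

With these numbers the estimate is a little under 8X, which the table rounds up.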

Problems:

  • All library insert sizes appear to be underestimated (not yet confirmed)
  • The contaminant reads align at ~91-93% identity to the contaminant ctgs, while the Mt/We reads align at 99% identity to the finished Mt/We sequences. Which %identity threshold should be used to call a read a contaminant?

BACS

8,000 BAC clones at Children's Hospital Oakland Research Institute (note: no NCBI Trace Archive submission).

PITT FTP data

  • 3.21M 454 reads
  • 3 kb insert FLX libraries (insert size estimated at 2 kb from alignments to the existing assembly)
  • 20 kb insert Titanium libraries (estimated at 28 kb ...)
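The re-estimated insert sizes above come from aligning mates to the existing assembly. A rough sketch of that kind of estimate, assuming each mate pair has already been reduced to contig names and coordinates; the tuple layout and the function name are hypothetical and do not correspond to the output of any tool used here:

```python
from statistics import median

def estimate_insert_size(mate_hits):
    """Median outer distance of mate pairs whose two reads hit the same contig.

    mate_hits: iterable of (ctg_a, start_a, end_a, ctg_b, start_b, end_b)
    (hypothetical layout, coordinates in bp on the contig).
    """
    spans = []
    for ctg_a, start_a, end_a, ctg_b, start_b, end_b in mate_hits:
        if ctg_a != ctg_b:
            continue  # mates on different contigs give no distance
        coords = (start_a, end_a, start_b, end_b)
        spans.append(max(coords) - min(coords))
    return median(spans) if spans else None

# Toy alignments, not real data
example_hits = [
    ("c1", 100, 350, "c1", 1900, 2150),   # span 2050
    ("c1", 500, 700, "c1", 2400, 2600),   # span 2100
    ("c1", 10, 200, "c2", 5, 300),        # different contigs, skipped
]
estimated = estimate_insert_size(example_hits)
print(estimated)
```

Only pairs with both mates on one contig contribute, so a fragmented reference may skew the estimate; the median is used here because it is less sensitive to chimeric pairs than the mean.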

CBCB Location:

 /fs/szattic-asmg4/brugia_malayi/Data/
 /fs/szattic-asmg4/brugia_malayi/Data/Sff/  # Sff files
 /fs/szattic-asmg4/brugia_malayi/Data/Frg/  # Frg files
 /fs/szattic-asmg4/brugia_malayi/Data/Seq/  # Seq files

FTP access:

 lftp -u bma 136.142.191.201
 pass: 6279
 user: bma
 # empty as of --Dpuiu 12:04, 8 January 2010 (EST)

Elodie's table:

 /scratch1/brugia_malayi/brugia-sequencing-summary.txt.csv
 #    elodie's date  protocol  platform  type            description                                run_name                                                                                                  Reads    Mates
 1    01/17/2008     WGS       Standard  Full run (2/2)  Mix of worms (calibration of the machine)  R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample                                               534822   0
 2    07/01/2008     3Kb       Standard  Full            single worm (pUC contamination)            R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1                                      492575   84341 
 3    09/11/2008     3kb       Standard  4/8 wells       single worm (pUC contamination)            R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1                          263421   49258 
 4    10/01/2008     3Kb       Standard  Full            Mix of worms (still pUC contamination)     R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest                             59711    5096  
 5    02/01/2009     WGS       Standard  1/4 wells       Mix of worms; regions 2 & 3 were myxoma    R_2009_02_27_16_11_34_FLX10070260_adminrig_022709_GHEDIN                                                  18025    0
 6    04/06/2009     WGS       Standard  1/4 wells       Mix of worms; with comp. bio run           R_2009_04_15_14_46_56_FLX10070260_adminrig_041509_GHEDIN_r1-WGS1_r2-LMW4_r3-pool2compbio_r4-pool3compbio  118490   0
 7    05/01/2009     20Kb      Titanium  7/8 wells       Mix of worms                               R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1                              631287   213524
 8    10/28/2009     20Kb      Titanium  Full            Mix of worms                               R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2                                      1095713  377547
 9    Pending        3 Kb      Titanium  Full            Mix of worms                               ?                                                                                                         ?        ?
 .    Total                                                                                                                                                                                                   2542707  729766
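The Reads total in the last row is smaller than the 3,214,044 reads reported for the Sff files further down. Summing the columns suggests the total covers only the paired-end runs (those with a non-zero Mates count); this is an inference from the arithmetic, not something the table states. A short check with the numbers copied from the table:

```python
# run number -> (Reads, Mates), copied from the table above
runs = {
    1: (534822, 0),
    2: (492575, 84341),
    3: (263421, 49258),
    4: (59711, 5096),
    5: (18025, 0),
    6: (118490, 0),
    7: (631287, 213524),
    8: (1095713, 377547),
}

all_reads = sum(reads for reads, _ in runs.values())
paired_run_reads = sum(reads for reads, mates in runs.values() if mates > 0)
total_mates = sum(mates for _, mates in runs.values())

print(all_reads, paired_run_reads, total_mates)
```

The sum over all runs equals the Sff read total, while the sum over runs 2-4, 7 and 8 equals the Total row.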
  • 22 Sff files:
      run                                                                                                                                                                     sffReads  linker
 1    R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample/D_2008_01_31_18_01_35_FLX10070260_adminrig_FullAnalysis/sff/E4RA0X101.sff                                     272923  .
 1    R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample/D_2008_01_31_18_01_35_FLX10070260_adminrig_FullAnalysis/sff/E4RA0X102.sff                                     261899  .
 
 2    R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1/D_2009_02_12_22_12_04_j_SignalProcessing/sff/FEZH5RS01.sff                                           228204  flx
 2    R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1/D_2009_02_12_22_12_04_j_SignalProcessing/sff/FEZH5RS02.sff                                           264371  flx

 3    R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T02.sff                                                                            86862   flx
 3    R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T03.sff                                                                            87488   flx
 3    R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T04.sff                                                                            89071   flx

 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM01.sff                                                                               13695   flx
 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM02.sff                                                                               14197   flx
 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM03.sff                                                                               15515   flx
 4    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM04.sff                                                                               16304   flx

 5    R_2009_02_27_16_11_34_FLX10070260_adminrig_022709_GHEDIN/FRLDXKV01.sff                                                                                                    18025   .

 6    R_2009_04_15_14_46_56_FLX10070260_adminrig_041509_GHEDIN_r1-WGS1_r2-LMW4_r3-pool2compbio_r4-pool3compbio/D_2009_04_16_14_19_21_morty_fullProcessing/FT9KOI001.sff         118490  .

 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY01.sff                           73807   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY02.sff                           91698   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY03.sff                           93878   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY04.sff                           90232   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY05.sff                           97065   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY06.sff                           94326   tit
 7    R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY07.sff                           90281   tit

 8    R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2/F4H5CMB01.sff                                                                                        551263  tit
 8    R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2/F4H5CMB02.sff                                                                                        544450  tit
 
      total                                                                                                                                                                     3,214,044 .
  • 22 Frg Libraries
. lib       meanIns(orig)  meanIns(est)    #reads       #mates  linker medLen  medGC
1 E4RA0X101    0           0               271066       0       .      250     37.04
1 E4RA0X102    0           0               260166       0       .      249     37.06

2 FEZH5RS01    3000        2000            181035       18676   flx    228     37.89
2 FEZH5RS02    3000        2000            211064       22270   flx    227     37.79

3 FHAVB5T02    3000        2000            68708        7850    flx    244     38.35
3 FHAVB5T03    3000        2000            69306        8227    flx    243     37.98
3 FHAVB5T04    3000        2000            70028        8353    flx    243     38.10

4 FIOXLOM01    3000        0               10921        0       flx    102     43.93   # no mates , shorter read length, highest GC !!!
4 FIOXLOM02    3000        0               11157        0       flx    103     43.75   # no mates , shorter read length, highest GC !!!
4 FIOXLOM03    3000        0               12197        0       flx    103     43.51   # no mates , shorter read length, highest GC !!!
4 FIOXLOM04    3000        0               12727        0       flx    103     43.56   # no mates , shorter read length, highest GC !!!

5 FRLDXKV01    0           0               17349        0       .      255     36.02

6 FT9KOI001    0           0               108826       0       .      256     36.10

7 FW1OXFY01    20000       28000           86127        15825   tit    276     35.40
7 FW1OXFY02    20000       28000           106911       19668   tit    275     35.37
7 FW1OXFY03    20000       28000           109874       20396   tit    271     35.23
7 FW1OXFY04    20000       28000           104797       18933   tit    269     35.23
7 FW1OXFY05    20000       28000           113716       20649   tit    265     35.09
7 FW1OXFY06    20000       28000           110693       20326   tit    270     35.05
7 FW1OXFY07    20000       28000           105931       19176   tit    271     35.18

8 F4H5CMB01    20000       28000           626046       109918  tit    241     36.08
8 F4H5CMB02    20000       28000           628432       118903  tit    256     35.87
 
. total        .           .               3,297,077    429,170 .      .       .
  • Clear range (clr) of the Sff seqs (good quality)
 .         seqs         min    q1     q2     q3     max        mean       n50        sum            
 all       3,214,044    0      240    274    383    2042       294        326        947,254,956 => 9.4X    
  • Clear range (clr) of the Frg seqs (good quality, linker removed)
           seqs         min    q1     q2     q3     max        mean       n50        sum            
 all       3,297,077    3      156    248    301    2043       244        275        806,091,347 => 8X
 mated       858,340    64     107    156    223    612        171        201        147,070,298      
 unmated   2,438,737    2      207    261    335    2042       268        286        655,723,972
  • Locations:
 /fs/szattic-asmg4/brugia_malayi/Data/Sff/ 
 /fs/szattic-asmg4/brugia_malayi/Data/Frg/

Contaminant search

nucmer -maxmatch -c 65 -l 20
                       Sanger    454
 jird(26,879 ctgs)     31,501    197,420
 mouse                 ?         ?          # we'd probably find more contaminated reads if we align all the reads to the whole mouse genome

 Mt                    1,507     2,634

 We                    49,014    23,249

 UniVec                ?         549,378    # most hits to "Cloning vector pBR322"
 pUC19                 ?         551,100
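Counts like the ones in this table depend on the identity cutoff, which is the open question listed under Problems. A minimal sketch of counting distinct reads per contaminant source with a per-source cutoff; the record layout, the function name and the cutoff values are all placeholders for illustration, not recommended settings and not nucmer output:

```python
from collections import defaultdict

def count_contaminated_reads(hits, min_identity):
    """Count distinct reads with at least one hit at or above the cutoff.

    hits: iterable of (read_id, target, pct_identity) records (hypothetical layout).
    min_identity: dict target -> minimum %identity; targets without an
    entry require 100%.
    """
    reads_by_target = defaultdict(set)
    for read_id, target, pct_identity in hits:
        if pct_identity >= min_identity.get(target, 100.0):
            reads_by_target[target].add(read_id)
    return {target: len(reads) for target, reads in reads_by_target.items()}

# Placeholder cutoffs: looser for jird (reads align at ~91-93%),
# stricter for Mt (reads align at ~99% to the finished sequence)
cutoffs = {"jird": 90.0, "Mt": 98.0}
toy_hits = [
    ("r1", "jird", 92.1),
    ("r1", "jird", 91.0),   # same read, counted once
    ("r2", "jird", 85.0),   # below cutoff
    ("r3", "Mt", 99.2),
    ("r4", "Mt", 93.0),     # below cutoff
]
contaminated = count_contaminated_reads(toy_hits, cutoffs)
print(contaminated)
```

Keeping the cutoff per source makes it easy to rerun the count while the threshold is still being decided.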

Assemblies

TIGR/NCBI

  • 9X coverage, 856K Sanger traces => 8,200 scaff & 29,808 ctg (avg. scaff=~10K & avg ctg=~3K)
  • "scaffolds totaling ~71 Mb of data with a further ~17.5 Mb of contigs not integrated into any scaffold (orphan contigs)" (Science 2007)
  • NCBI AAQA00000000 AAQA01000001-AAQA01029808
    • 26,879 good ctgs
    • 2,929 jird contaminants (example: AAQA01001321 has 99% identity hits to mouse)
  • Stats
 .                    elem       min    q1     q2     q3     max        mean       n50        sum            
 ctg.good             26879      200    836    1005   1495   611244*    3241.17    18986      87,119,350       
 ctg.good.2K+         4887       2000   2914   4380   10049  611244     13363.00   37130      65,304,988    
 
 ctg.contaminant      2929       200    527    675    820    8994       740.04     762        2,167,588       
  • Location
 /fs/szasmg3/dpuiu/Brugia_malayi/Assembly/TIGR/ <-> NCBI

PITT

  • Best so far
  • Date: 11/05/08
  • Stats:
                      elem       min    q1     q2     q3     max        mean       n50        sum
 scf.2K+              3170       2000   2917   4483   14471  6534162*   22916      112914     72,643,770
  • Location:
 /fs/szasmg3/dpuiu/Brugia_malayi/Assembly/PITT/

CBCB Sanger

  • Assembler: wgs 5.1
  • Input: filtered Sanger reads
  • Result: a better assembly than the published one (ctg n50 24,748 vs 18,986)
  • Stats:
 .                    elem       min    q1     q2     q3     max        mean       n50        sum            
 ctg                  12753      273    1245   1632   3873   376744     6113.39    24748      77,964,006       
 deg                  9661       65     858    949    1023   72494      1240.97    1008       11,988,997       
 scf                  10317      935    1215   1538   3462   3890532    8018.85    41716      82,730,474       

 .                    elem       min    q1     q2     q3     max        mean       n50        sum            
 ctg.2K+              5210       2000   3049   4813   13528  376744     13013.29   30835      67,799,238       
 deg.2K+              391        2000   3009   4693   10099  72494      8104.38    12352      3,168,812        
 scf.2K+              3656       2001   3181   5733   18904  3890532    20189.57   50293      73,813,083       

 reads                1178192(100%)
 singletons           134119 (11.43%)
  • Location:
 /fs/szasmg3/dpuiu/Brugia_malayi/Assembly/CBCB/2008_0826_CA/

CBCB 454 CA

  • Assembler: wgs 6.0-beta
  • Host: CBCB ginkgo server (32 proc, 128G mem)
  • Input: 3,297,077 454 sffToCA processed reads
 gatekeeper -dumpinfo -lastfragiid  asm.gkpStore
 Last frag in store is iid = 3297077
  • Location:
 ginkgo:/scratch1/brugia_malayi/Assembly/454/CA  
 /scratch1/ -> umiacsfs01:/xraid03
 ginkgo: 32 processor machine
  • Problems
    • olap-from-seeds is very memory- and CPU-intensive
 # overmerry.sh jobs ->  ./1-overlapper/seeds/
 my $ovmBatchSize = getGlobal("merOverlapperSeedBatchSize");    # default 100,000
 my $ovmJobs      = int(($numFrags - 1) / $ovmBatchSize) + 1;   # int(3297076/100000)+1=33 
 # olap-from-seeds.sh jobs -> ./1-overlapper/olaps/
 my $olpBatchSize = getGlobal("merOverlapperExtendBatchSize");  # default 75,000 ; reduce to 20,000 
 my $olpJobs      = int(($numFrags - 1) / $olpBatchSize) + 1;   # int(3297076/20000)+1=165  
  • Example: 6 concurrent jobs, each using 2 threads and ~20 GB of memory
 merOverlapperSeedConcurrency=6 => 6 jobs
 merOverlapperExtendBatchSize=20000
 $ ps -C olap-from-seeds 
 PID %MEM   RSZ(KB) %CPU STIME TIME     CMD
 13158  0.0 1132     0.0 10:21 00:00:00 /bin/sh 1-overlapper/olap-from-seeds.sh 90
 13159  0.0 1136     0.0 10:21 00:00:00 /bin/sh 1-overlapper/olap-from-seeds.sh 91
 13160  0.0 1136     0.0 10:21 00:00:00 /bin/sh 1-overlapper/olap-from-seeds.sh 92
 13161  0.0 1136     0.0 10:21 00:00:00 /bin/sh 1-overlapper/olap-from-seeds.sh 93
 13162  0.0 952      0.0 10:21 00:00:00 /bin/sh 1-overlapper/olap-from-seeds.sh 94
 13163  0.0 1136     0.0 10:21 00:00:00 /bin/sh 1-overlapper/olap-from-seeds.sh 95
 13199 15.6 20675720 133 10:21 02:46:39 olap-from-seeds -a -b -t 2 -S 1-overlapper/asm.merStore -c 3-overlapcorrection/0092.frgcorr.WORKING -o 1-overlapper/olaps/0092.ovb.WORKING.gz asm.gkpStore 1820001 1840000
 13200 13.9 18383164 126 10:21 02:37:22 olap-from-seeds -a -b -t 2 -S 1-overlapper/asm.merStore -c 3-overlapcorrection/0094.frgcorr.WORKING -o 1-overlapper/olaps/0094.ovb.WORKING.gz asm.gkpStore 1860001 1880000
 13201 16.5 21870940 135 10:21 02:49:08 olap-from-seeds -a -b -t 2 -S 1-overlapper/asm.merStore -c 3-overlapcorrection/0093.frgcorr.WORKING -o 1-overlapper/olaps/0093.ovb.WORKING.gz asm.gkpStore 1840001 1860000
 13203 17.8 23603964 130 10:21 02:41:51 olap-from-seeds -a -b -t 2 -S 1-overlapper/asm.merStore -c 3-overlapcorrection/0095.frgcorr.WORKING -o 1-overlapper/olaps/0095.ovb.WORKING.gz asm.gkpStore 1880001 1900000
 13204 12.6 16763480 130 10:21 02:41:47 olap-from-seeds -a -b -t 2 -S 1-overlapper/asm.merStore -c 3-overlapcorrection/0090.frgcorr.WORKING -o 1-overlapper/olaps/0090.ovb.WORKING.gz asm.gkpStore 1780001 1800000
 13205 15.2 20139808 138 10:21 02:52:35 olap-from-seeds -a -b -t 2 -S 1-overlapper/asm.merStore -c 3-overlapcorrection/0091.frgcorr.WORKING -o 1-overlapper/olaps/0091.ovb.WORKING.gz asm.gkpStore 1800001 1820000
 $ vmstat
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 9  0 9534576 121356  10124 2743568    5   23     7    23   36   69 16  0 83  1  0
 $ free 
              total       used       free     shared    buffers     cached
 Mem:     132168632  132043920     124712          0      10064    2934076
 -/+ buffers/cache:  129099780    3068852
 Swap:     67108856    8842720   58266136
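The job counts in the runCA excerpts and the memory pressure visible in the ps/free output can be checked with a few lines of arithmetic. The sketch below mirrors the `int(($numFrags - 1) / $batchSize) + 1` expression and sums the RSZ column of the six olap-from-seeds processes; the variable names are made up for this example.

```python
def num_batches(num_frags, batch_size):
    """Same arithmetic as the runCA excerpt: int((numFrags - 1) / batchSize) + 1."""
    return (num_frags - 1) // batch_size + 1

NUM_FRAGS = 3_297_077  # last frag iid reported by gatekeeper

seed_jobs = num_batches(NUM_FRAGS, 100_000)           # overmerry.sh jobs
extend_jobs_default = num_batches(NUM_FRAGS, 75_000)  # olap-from-seeds.sh, default batch size
extend_jobs_reduced = num_batches(NUM_FRAGS, 20_000)  # olap-from-seeds.sh, reduced batch size

# RSZ (KB) of the six olap-from-seeds processes in the ps listing
rss_kb = [20675720, 18383164, 21870940, 23603964, 16763480, 20139808]
TOTAL_MEM_KB = 132168632  # "Mem: total" from free

total_rss_kb = sum(rss_kb)
mem_fraction = total_rss_kb / TOTAL_MEM_KB

print(seed_jobs, extend_jobs_default, extend_jobs_reduced)
print(f"{total_rss_kb} KB resident, {mem_fraction:.1%} of RAM")
```

Even with the batch size reduced to 20,000, six concurrent jobs hold about 92% of physical memory, which is consistent with the swap usage shown by free.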

CBCB 454 CA redo

  • Assembler: wgs 6.0-beta
  • Input: 3,297,077 454 sffToCA processed reads
  • Host: IPST genome6 server (32 proc, 256G mem)
  • runCA spec file provided by Aleksey
  • Location
 genome6.umd.edu:/genome6/raid/dpuiu/Brugia_malayi/Assembly/CA.bog/
  • Problems:
    • Segmentation fault in overlapStore after overmerry.sh
 /genome7/raid/software/Linux64/wgs-6.0-beta/Linux-amd64/bin/overlapStore 
      -c /genome6/raid/dpuiu/Brugia_malayi/Assembly/CA.bog/./0-overlaptrim-overlap/asm.merStore.WORKING 
      -g /genome6/raid/dpuiu/Brugia_malayi/Assembly/CA.bog/./asm.gkpStore 
      -M 4096 
      -L /genome6/raid/dpuiu/Brugia_malayi/Assembly/CA.bog/./0-overlaptrim-overlap/asm.merStore.list 
       > /genome6/raid/dpuiu/Brugia_malayi/Assembly/CA.bog/./0-overlaptrim-overlap/asm.merStore.err 2>&1

CBCB 454 newbler deNovo

CBCB 454 newbler refMapper

  • Assembler: newbler 2.3
  • Host: CBCB walnut server
  • Input
 # NCBI ref assembly
                                      ctgs       min   q1   q2   q3   max    mean   n50   sum
 All                                  26,879     200   836  1005 1495 611244 3241   18986 87,119,350 
 #Sff reads
 .                                    seqs       min  q1   q2   q3   max   mean    n50  sum        
 All                                  3,214,044  0    240  274  383  2042  294.72  326  947,254,956
  • Output
 #Ctg stats
 .                                    ctgs       min   q1   q2   q3   max   mean    n50  sum        
 All                                  101,286    100   236  323  530  7013  433.36  535  43,893,507
 #Trimmed read stats
 .                                    seqs       min  q1   q2   q3   max   mean    n50  sum        
 All                                  3,898,373  1    45   163  264  1995  167.84  265  654319111
 Full|Partial                         1,085,167  20   119  216  285  706   214.13  271  232364804
 Chimeric|Repeat|Unmapped|TooShort    2,015,920  20   111  221  276  1995  208.85  263  421032444
 Deleted                              797,286    1    1    1    1    19    1.16    1    921863
 #Trimmed read counts
             count    %
 All         3898373  100
 Chimeric    25460    0.65   
 Deleted     797286   20.45  !!!
 Full        1001745  25.7   
 Partial     83422    2.14   
 Repeat      406119   10.42  !!!
 TooShort    14031    0.36   
 Unmapped    1570310  40.28  !!!
 #Mate pair counts
                 count   %
 BothUnmapped    301390  42.86
 OneUnmapped     110922  15.77
 MultiplyMapped  108641  15.45
 FalsePair       106249  15.11
 TruePair        75992   10.81
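The percentage columns in the two count tables follow directly from the raw counts. A short recomputation, with the counts copied from the tables above (the helper name is hypothetical):

```python
read_counts = {
    "Chimeric": 25460,
    "Deleted": 797286,
    "Full": 1001745,
    "Partial": 83422,
    "Repeat": 406119,
    "TooShort": 14031,
    "Unmapped": 1570310,
}
mate_counts = {
    "BothUnmapped": 301390,
    "OneUnmapped": 110922,
    "MultiplyMapped": 108641,
    "FalsePair": 106249,
    "TruePair": 75992,
}

def percentages(counts):
    """Each category as a percentage of the column total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}

read_pct = percentages(read_counts)
mate_pct = percentages(mate_counts)

# Reads that map fully or partially, as a share of all trimmed reads
mapped_pct = round(
    100 * (read_counts["Full"] + read_counts["Partial"]) / sum(read_counts.values()), 2
)
print(read_pct, mate_pct, mapped_pct)
```

Grouping Full and Partial gives the share of trimmed reads that the mapper could place on the NCBI reference, under 28% here.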

Files

 /fs/szattic/asmg1/adelcher/Genomes/Brugia             : Art's files
 /fs/sztmpscratch/cole/tarchive_download/brugia_malay  : Cole's files
 /fs/szasmg3/dpuiu/Brugia_malayi/                      : Daniela's files

 /scratch1/brugia_malayi/Data/                         : ftp PITT data
 /fs/szattic-asmg4/brugia_malayi                       : ftp PITT data (as well)