Pine tree: Difference between revisions

From Cbcb

Revision as of 16:54, 15 July 2011

Links

Data

  • UCDAVIS plone
 https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  
 dpuiu
 ddr5fft6 
 https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/
  • IPST ftp
 ftp genomepc1.umd.edu
 ftpuser
 pinegenome

 cd PineUpload052911/
 bin
 prompt             # no Y/N?
 mget *
  • Local data
 ginkgo:
 /scratch1/dpuiu/PINE/PineUpload052911
 /scratch1/dpuiu/PINE/PineUpload070711

PineUpload052911

Chloroplast

                len      gc%
 cChloroplast   120481   38.55

cBACs

 .       elem       min    q1     q2     q3     max        mean       n50        sum            
 len     102        8288   89909  116121 140549 172161     113400     126689     11566806       
 gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        
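
The n50 column in the summary above can be reproduced from a list of sequence lengths. The sketch below assumes the usual definition of N50 (the largest length L such that sequences of length >= L hold at least half of all bases). The function names and the toy lengths are illustrative only, since the 102 individual cBAC lengths are not listed on this page.

```python
import statistics


def n50(lengths):
    """Largest length L such that sequences of length >= L
    account for at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one length")


def summarize(lengths):
    """Subset of the columns used in the table above (quartiles q1/q3 omitted,
    because their exact convention is not stated on this page)."""
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "q2": statistics.median(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6]  # hypothetical values, not the cBAC data
    print(summarize(toy_lengths))
```

Applied to the real cBAC lengths, `summarize` should give the elem, min, max, mean, n50 and sum values in the len row, provided the tool that produced the table used the same N50 definition.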

Reads

 library        readLen   #mates
 FC638TR_001_8  146       22,729,231
 FC638TR_002_8  146       18,412,638
  • Notes
    • First 2bp of each read have higher A count
    • GC% variation:
                medianGC% 
 cChloroplast   38.55
 cBAC           37.61
 reads          39.04
  • cChloroplast alignments (bwasw)
 library               #hits   %hits 
 FC638TR_001_8_1	475254	2.09
 FC638TR_001_8_2	473331	2.08
 FC638TR_002_8_1	1009331	5.48
 FC638TR_002_8_2	1004341	5.45
  • cBAC alignments (bwasw)
 library               #hits   %hits 
 FC638TR_001_8_1	9722204	42.77
 FC638TR_001_8_2	9481188	41.71
 FC638TR_002_8_1	7684164	41.73
 FC638TR_002_8_2	7469151	40.56
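
The %hits column appears to be the hit count divided by the number of reads in one mate file (the #mates value of the library). The sketch below checks that assumption for the cBAC rows. The helper names are illustrative, and the counts are copied from the tables above.

```python
# Reads per mate file, taken from the Reads table (#mates column).
MATES = {
    "FC638TR_001_8": 22_729_231,
    "FC638TR_002_8": 18_412_638,
}

# bwasw hit counts against cBAC, taken from the table above.
CBAC_HITS = {
    "FC638TR_001_8_1": 9_722_204,
    "FC638TR_001_8_2": 9_481_188,
    "FC638TR_002_8_1": 7_684_164,
    "FC638TR_002_8_2": 7_469_151,
}


def library_of(read_file):
    """Drop the trailing _1 / _2 mate suffix to get the library name."""
    return read_file.rsplit("_", 1)[0]


def percent_hits(hits, total_reads):
    """Percentage of reads with at least one reported hit."""
    return 100.0 * hits / total_reads


if __name__ == "__main__":
    for name, hits in CBAC_HITS.items():
        pct = percent_hits(hits, MATES[library_of(name)])
        print(f"{name}\t{hits}\t{pct:.2f}")
```

The recomputed values agree with the table to within 0.01, which suggests the table entries were truncated rather than rounded in at least one row.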

Sampled reads

  • 100K sampled reads from each library (2*2*100K=400K)
 .       elem       min    q1     q2     q3     max        mean       n50        sum            
 gc%     400000     0.68   34.93  39.04  43.15  95.89      39.20      40.41      .
  • FC638TR_001_8_1 : 100K reads
 ref            qry               aligner      #hits      %hits   %identity(median)
 cBAC           FC638TR_001_8_1   bwasw        42971      43
                                  nucmer       12477      12.5    95
                                  bowtie       1186       1.2
 cChloroplast                     bwasw        2031       2
                                  nucmer       1943       1.9     100
                                  bowtie       1490       1.5
  • FC638TR_00[12]_8_[12] : 4*100K reads
 ref            qry               aligner      #hits      %hits 
 cBAC           FC638TR_001_8_1   bwasw        42971      43
                FC638TR_001_8_2                41915      42
                FC638TR_002_8_1                42128      42
                FC638TR_002_8_2                40606      41

 cChloroplast   FC638TR_001_8_1                2031       2
                FC638TR_001_8_2                2033       2
                FC638TR_002_8_1                5370       5.4
                FC638TR_002_8_2                5330       5.3

PineUpload070711

Ecoli

                len     gc%
 cE_coli        4639675 50.79  

Cloning vector

                len    gc% 
 pFosDT5_2      8345   47.93

Reads (Drosophila)

 lib                      readLen  #mates    mean,std
 FC70M6V_6_001            160,156  23546475  343,30

 TIL_242_FC70M6V_2_002    160,156  9917211   242  
 TIL_242_FC70M6V_3_002    160,156  6276300   242

 TIL_254_FC70M6V_2_004    160,156  9279789   254
 TIL_254_FC70M6V_3_004    160,156  5924239   254  

 TIL_270_FC70M6V_2_003    160,156  10188776  270
 TIL_270_FC70M6V_3_003    160,156  6556676   270

 TIL_288_FC70M6V_2_001    160,156  9524524   288
 TIL_288_FC70M6V_3_001    160,156  6158919   288
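
The TIL library names in this table follow the pattern TIL_XXX_FC70M6V_Y_00Z (target insert size, flow cell, lane, sub-lane). A minimal sketch of how the names could be parsed and grouped by insert size, for example before merging the two lanes of each library, is shown here. The regular expression and function names are assumptions made for illustration, not part of any existing pipeline.

```python
import re
from collections import defaultdict

# TIL_<insert size>_<flow cell>_<lane>_<sub-lane>; pattern inferred from the names above.
NAME_RE = re.compile(
    r"^TIL_(?P<insert>\d+)_(?P<flowcell>[A-Z0-9]+)_(?P<lane>\d+)_(?P<sublane>\d{3})$"
)


def parse_library(name):
    """Return the fields encoded in a TIL library name, or None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return {
        "insert_size": int(m.group("insert")),
        "flowcell": m.group("flowcell"),
        "lane": int(m.group("lane")),
        "sublane": m.group("sublane"),
    }


def group_by_insert(names):
    """Map each target insert size to the library names that share it."""
    groups = defaultdict(list)
    for name in names:
        info = parse_library(name)
        if info is not None:
            groups[info["insert_size"]].append(name)
    return dict(groups)


if __name__ == "__main__":
    libs = [
        "FC70M6V_6_001",  # fosmid pool: no insert size in the name, so it is skipped
        "TIL_242_FC70M6V_2_002",
        "TIL_242_FC70M6V_3_002",
        "TIL_254_FC70M6V_2_004",
        "TIL_254_FC70M6V_3_004",
    ]
    for size, members in sorted(group_by_insert(libs).items()):
        print(size, members)
```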
  • kastevens@ucdavis.edu:
    • The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.
    • Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.
    • Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.
    • The loss in quality was quantitatively small, so we don't expect the extra expense of lowering the concentration will be justified empirically.
    • The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.
    • However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are < 313bp and have > 3bp overlap. This seems to fit well with your result.
    • Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.
    • The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.
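
The "roughly 15%" figure for FC70M6V_6_001 can be checked with a short calculation. The sketch below assumes insert sizes are normally distributed around 343bp with a standard deviation of 30bp (the email does not state the distribution), and uses the read lengths 160 and 156 from the table, so the two mates overlap whenever the insert is shorter than 316bp.

```python
import math


def normal_cdf(x, mean, std):
    """P(X < x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


READ_LENS = (160, 156)             # readLen column for FC70M6V_6_001
INSERT_MEAN, INSERT_STD = 343, 30  # estimate quoted in the email

combined = sum(READ_LENS)                 # 316bp of sequence per mate pair
threshold = INSERT_MEAN - INSERT_STD      # 313bp, one standard deviation below the mean
overlap_at_threshold = combined - threshold  # mates overlap by 3bp at a 313bp insert

# Fraction of inserts shorter than 313bp, i.e. with more than 3bp of overlap.
frac_below = normal_cdf(threshold, INSERT_MEAN, INSERT_STD)

# Fraction of inserts shorter than 316bp, i.e. with any overlap at all.
frac_any_overlap = normal_cdf(combined, INSERT_MEAN, INSERT_STD)

if __name__ == "__main__":
    print(f"overlap at {threshold}bp insert: {overlap_at_threshold}bp")
    print(f"fraction of inserts < {threshold}bp: {frac_below:.3f}")
    print(f"fraction of inserts < {combined}bp: {frac_any_overlap:.3f}")
```

Under the normality assumption the fraction below 313bp is the one-sigma lower tail, about 16%, which is consistent with the "roughly 15%" quoted above.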