Pine tree

Revision as of 17:27, 15 July 2011

Links

Data

  • UCDAVIS plone
 https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  
 dpuiu
 ddr5fft6 
 https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/
  • IPST ftp
 ftp genomepc1.umd.edu
 ftpuser
 pinegenome

 cd PineUpload052911/
 bin
 prompt             # no Y/N?
 mget *
  • Local data
 ginkgo:
 /fs/szattic-asmg7/PINE/PineUpload052911
 /fs/szattic-asmg7/PINE/PineUpload070711
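
The IPST ftp session above (cd, bin, prompt, mget *) can be scripted. A minimal sketch using Python's standard ftplib; the function names fetch_all and main are hypothetical, host and login are passed in by the caller, and it assumes the remote directory contains only plain files:

```python
from ftplib import FTP
from pathlib import Path


def fetch_all(ftp, remote_dir, local_dir):
    """Download every entry listed in remote_dir into local_dir (binary mode)."""
    out = Path(local_dir)
    out.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)            # equivalent of `cd PineUpload052911/`
    names = ftp.nlst()             # roughly what `mget *` would expand to
    for name in names:
        with open(out / name, "wb") as fh:
            # retrbinary transfers in binary mode, like `bin` in the client
            ftp.retrbinary(f"RETR {name}", fh.write)
    return names


def main(host, user, password, remote_dir, local_dir):
    # Assumption: credentials are supplied by the caller, not hard-coded.
    with FTP(host) as ftp:
        ftp.login(user, password)
        return fetch_all(ftp, remote_dir, local_dir)
```

No prompting happens here, so there is no counterpart to the `prompt` toggle.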

PineUpload052911

Chloroplast

                len      gc%
 cChloroplast   120481   38.55

cBACs

 .       elem       min    q1     q2     q3     max        mean       n50        sum            
 len     102        8288   89909  116121 140549 172161     113400     126689     11566806       
 gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        
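
The columns of the cBAC table (min, quartiles, max, mean, n50, sum) can be recomputed from a list of lengths. A minimal sketch, not the tool that produced the table; n50 and summarize are hypothetical names, and quartile conventions differ between tools, so q1/q3 may not match the table exactly:

```python
from statistics import mean, quantiles


def n50(lengths):
    """Largest length L such that sequences of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 of an empty list is undefined")


def summarize(values):
    """Return the same columns as the table; needs at least two values."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "elem": len(values),
        "min": min(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
        "n50": n50(values),
        "sum": sum(values),
    }
```

For example, `n50([2, 3, 4, 5, 6])` is 5: the two longest sequences (6 + 5 = 11) already cover more than half of the total of 20.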

Reads

 library        readLen   #mates        mean,std     ~gc%
 FC638TR_001_8  146       22,729,231    ?            39.04
 FC638TR_002_8  146       18,412,638                 39.04
  • Notes
    • First 2bp of each read have higher A count
    • GC% variation: cBAC < cChloroplast < reads


  • cChloroplast alignments (bwasw)
 library               #hits   %hits 
 FC638TR_001_8_1       475254  2.09
 FC638TR_001_8_2       473331  2.08
 FC638TR_002_8_1       1009331 5.48
 FC638TR_002_8_2       1004341 5.45
  • cBAC alignments (bwasw)
 library               #hits   %hits 
 FC638TR_001_8_1       9722204 42.77
 FC638TR_001_8_2       9481188 41.71
 FC638TR_002_8_1       7684164 41.73
 FC638TR_002_8_2       7469151 40.56
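
The %hits column in the two bwasw tables above is consistent with #hits divided by the library's #mates (each _1/_2 file holds one read per mate pair). A minimal sketch of that arithmetic; percent_hits is a hypothetical helper and the choice of denominator is an assumption inferred from the numbers:

```python
def percent_hits(hits, total_reads, digits=2):
    """Percentage of reads with a hit, rounded to `digits` decimals."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return round(100.0 * hits / total_reads, digits)


# #mates per library, taken from the Reads table
MATES = {"FC638TR_001_8": 22_729_231, "FC638TR_002_8": 18_412_638}

chloroplast_001_1 = percent_hits(475_254, MATES["FC638TR_001_8"])
chloroplast_002_1 = percent_hits(1_009_331, MATES["FC638TR_002_8"])
cbac_001_1 = percent_hits(9_722_204, MATES["FC638TR_001_8"])
```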

Sampled reads

  • 100K sampled reads from each library (2*2*100K=400K)
 .       elem       min    q1     q2     q3     max        mean       n50        sum            
 gc%     400000     0.68   34.93  39.04  43.15  95.89      39.20      40.41      .
  • FC638TR_001_8_1 : 100K reads
 ref            qry               aligner      #hits      %hits   %identity(median)
 cBAC           FC638TR_001_8_1   bwasw        42971      43 
                                  nucmer       12477      12.5    95
                                  bowtie       1186       1.2
 cChloroplast                     bwasw        2031       2
                                  nucmer       1943       1.9     100
                                  bowtie       1490       1.5
  • FC638TR_00[12]_8_[12] : 4*100K reads
 ref            qry               aligner      #hits      %hits 
 cBAC           FC638TR_001_8_1   bwasw        42971      43
                FC638TR_001_8_2                41915      42
                FC638TR_002_8_1                42128      42
                FC638TR_002_8_2                40606      41

 cChloroplast   FC638TR_001_8_1                2031       2
                FC638TR_001_8_2                2033       2
                FC638TR_002_8_1                5370       5.4
                FC638TR_002_8_2                5330       5.3
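
The gc% row of the sampled-reads summary above is a per-read statistic. A minimal sketch of the per-read calculation; gc_percent is a hypothetical helper, and dividing by the full read length (N and other symbols counted in the denominator) is an assumption:

```python
def gc_percent(seq):
    """GC content of one read as a percentage of its full length."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# For a 146 bp read, one G/C base gives 100/146 and 140 G/C bases give 14000/146.
low = gc_percent("G" + "A" * 145)
high = gc_percent("C" * 140 + "T" * 6)
```

Both values are close to the min (0.68) and max (95.89) reported for the 400K sampled reads, which fits a denominator of 146 bp.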

PineUpload070711

Ecoli

                len     gc%
 cE_coli        4639675 50.79  

Cloning vector

                len    gc% 
 pFosDT5_2      8345   47.93

Drosophila refseq

 Chromosome      len            gc%
 2L              23,011,544     41
 2R              21,146,708     43
 3L              24,543,557     41
 3R              27,905,053     42
 4               1,351,857      35
 X               22,422,827     42 
 un              10,049,037     ?    
 mitochondrion   19,517         17
 total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100
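
The note on the total row can be checked directly. A minimal sketch that sums the lengths listed in the table, including un and mitochondrion, and compares the result with the listed total:

```python
# Lengths copied from the Drosophila refseq table above
CHROM_LEN = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}
LISTED_TOTAL = 137_586_636

summed = sum(CHROM_LEN.values())
unexplained = LISTED_TOTAL - summed  # bases in the listed total not accounted for above
```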

Reads (Drosophila)

 lib                      readLen  #mates    mean,std
 FC70M6V_6_001            160,156  23546475  343,30

 TIL_242_FC70M6V_2_002    160,156  9917211   242  
 TIL_242_FC70M6V_3_002    160,156  6276300   242

 TIL_254_FC70M6V_2_004    160,156  9279789   254
 TIL_254_FC70M6V_3_004    160,156  5924239   254  

 TIL_270_FC70M6V_2_003    160,156  10188776  270
 TIL_270_FC70M6V_3_003    160,156  6556676   270

 TIL_288_FC70M6V_2_001    160,156  9524524   288
 TIL_288_FC70M6V_3_001    160,156  6158919   288
  • kastevens@ucdavis.edu:
    • The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.
    • Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.
    • Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.
    • The loss in quality was quantitatively small, so we don't expect the extra expense of lowering the concentration to be justified empirically.
    • The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.
    • However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are < 313bp and have > 3bp overlap. This seems to fit well with your result.
    • Each lane is multiplexed into sub-lanes indicated by 00Z. So the number of reads in the file is variable and not necessarily reflective of the cluster density.
    • The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.
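
The "roughly 15%" figure in the notes above can be sanity-checked. A minimal sketch, assuming insert sizes are normally distributed with mean 343 and std 30 (the distribution shape is an assumption, not stated in the notes) and read lengths of 160 and 156 as in the Reads table:

```python
from math import erf, sqrt


def normal_cdf(x, mean, std):
    """P(X < x) for a normal distribution with the given mean and std."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def mate_overlap(insert, read1=160, read2=156):
    """Bases shared by the two mates when the insert is shorter than read1 + read2."""
    return max(0, read1 + read2 - insert)


# 313 is one standard deviation below the mean of 343
frac_below_313 = normal_cdf(313, mean=343, std=30)
overlap_at_313 = mate_overlap(313)
```

Under these assumptions about 16% of inserts fall below 313bp, and an insert of exactly 313bp gives a 3bp overlap, in line with the estimate quoted above.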