Links
Data
 https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  
 dpuiu
 ddr5fft6 
 https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/
 ftp genomepc1.umd.edu
 ftpuser
 pinegenome
 cd PineUpload052911/
 bin
 prompt             # no Y/N?
 mget *
 ginkgo:
 /scratch1/dpuiu/PINE/PineUpload052911
 /scratch1/dpuiu/PINE/PineUpload070711
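
The interactive FTP session above could be scripted roughly as follows. This is only a sketch using Python's standard `ftplib`. The helper names `fetch_all` and `main` are hypothetical, credentials are passed in rather than hard-coded, and the upload directory is assumed to be flat (no subdirectories).

```python
import os
from ftplib import FTP


def fetch_all(ftp, remote_dir, dest_dir):
    """Download every entry of remote_dir in binary mode (like `bin` + `mget *`).

    `ftp` is an already logged-in ftplib.FTP-like object.
    Assumes remote_dir contains only plain files.
    Returns the list of remote names that were fetched.
    """
    os.makedirs(dest_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    names = ftp.nlst()
    for name in names:
        local_path = os.path.join(dest_dir, os.path.basename(name))
        with open(local_path, "wb") as out:
            # retrbinary streams the file in binary blocks to the callback
            ftp.retrbinary("RETR " + name, out.write)
    return names


def main(host, user, password, remote_dir="PineUpload052911"):
    """Hypothetical entry point: log in, then mirror one upload directory."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return fetch_all(ftp, remote_dir, remote_dir)
```

Calling `main("genomepc1.umd.edu", user, password)` from the target directory on ginkgo would correspond to the session recorded above.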
PineUpload052911
Chloroplast
                len      gc%
 cChloroplast   120481   38.55
cBACs
 .       elem       min    q1     q2     q3     max        mean       n50        sum            
 len     102        8288   89909  116121 140549 172161     113400     126689     11566806       
 gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        
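
The columns of the summary table above (elem, min, q1, q2, q3, max, mean, n50, sum) can be recomputed with a short script. This is a sketch, not the tool that produced the table: the quartile interpolation method is an assumption, and N50 is taken in its usual sense (the length L such that sequences of length >= L account for at least half of the total).

```python
import statistics


def n50(lengths):
    """Return the N50: walk lengths from longest to shortest until
    the running sum reaches half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one length")


def summarize(values):
    """Compute the same columns as the table, as a dict."""
    # 'inclusive' quartiles are an assumption; other tools interpolate differently
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "elem": len(values),
        "min": min(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "n50": n50(values),
        "sum": sum(values),
    }
```

Feeding `summarize` the 102 cBAC lengths should give values comparable to the `len` row, up to differences in how quartiles are interpolated.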
Reads
 library        readLen   #mates
 FC638TR_001_8  146       22,729,231
 FC638TR_002_8  146       18,412,638
- Notes
- First 2bp of each read have higher A count
- GC% variation:
 
                medianGC% 
 cChloroplast   38.55
 cBAC           37.61
 reads          39.04
- cChloroplast alignments (bwasw)
 library               #hits     %hits
 FC638TR_001_8_1       475254    2.09
 FC638TR_001_8_2       473331    2.08
 FC638TR_002_8_1       1009331   5.48
 FC638TR_002_8_2       1004341   5.45
- cBAC alignments (bwasw; reference inferred from the sampled-read hit rates)
 library               #hits     %hits
 FC638TR_001_8_1       9722204   42.77
 FC638TR_001_8_2       9481188   41.71
 FC638TR_002_8_1       7684164   41.73
 FC638TR_002_8_2       7469151   40.56
Sampled reads
- 100K sampled reads from each library (2*2*100K=400K)
 .       elem       min    q1     q2     q3     max        mean       n50        sum            
 gc%     400000     0.68   34.93  39.04  43.15  95.89      39.20      40.41      .
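
One way to draw a fixed-size random sample from a large FASTQ file and compute per-read GC% is sketched below. How the 100K reads were actually sampled is not recorded here, so reservoir sampling is an assumption, and all function names are hypothetical. GC% is computed over the full read length, which matches the observed minimum of 0.68 (1 of 146 bases).

```python
import random
import statistics


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def reservoir_sample(items, k, seed=0):
    """Uniformly sample up to k items from a stream in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def gc_percent(seq):
    """GC content as a percentage of the full read length."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def median_gc(fastq_path, k=100_000):
    """Median GC% over k reads sampled from one FASTQ file."""
    with open(fastq_path) as handle:
        reads = reservoir_sample(read_fastq_sequences(handle), k)
    return statistics.median(gc_percent(r) for r in reads)
```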
- FC638TR_001_8_1 : 100K reads
 ref            qry               aligner      #hits      %hits   %identity(median)
 cBAC           FC638TR_001_8_1   bwasw        42971      43.0
                                  nucmer       12477      12.5    95
                                  bowtie       1186       1.2
 cChloroplast                     bwasw        2031       2.0
                                  nucmer       1943       1.9     100
                                  bowtie       1490       1.5
- FC638TR_00[12]_8_[12] : 4*100K reads
 ref            qry               aligner      #hits      %hits 
 cBAC           FC638TR_001_8_1   bwasw        42971      43
                FC638TR_001_8_2                41915      42
                FC638TR_002_8_1                42128      42
                FC638TR_002_8_2                40606      41
 cChloroplast   FC638TR_001_8_1                2031       2
                FC638TR_001_8_2                2033       2
                FC638TR_002_8_1                5370       5.4
                FC638TR_002_8_2                5330       5.3
PineUpload070711
Ecoli
                len     gc%
 cE_coli        4639675 50.79  
Cloning vector
                len    gc% 
 pFosDT5_2      8345   47.93
Reads (fosmid pool and Drosophila)
 lib                      readLen  #mates    insert(mean,std)
 FC70M6V_6_001            160,156  23546475  343,30
 TIL_242_FC70M6V_2_002    160,156  9917211   242  
 TIL_242_FC70M6V_3_002    160,156  6276300   242
 TIL_254_FC70M6V_2_004    160,156  9279789   254
 TIL_254_FC70M6V_3_004    160,156  5924239   254  
 TIL_270_FC70M6V_2_003    160,156  10188776  270
 TIL_270_FC70M6V_3_003    160,156  6556676   270
 TIL_288_FC70M6V_2_001    160,156  9524524   288
 TIL_288_FC70M6V_3_001    160,156  6158919   288
- kastevens@ucdavis.edu:
- The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.
- Regarding pairing, each insert size was run in two lanes Y at two different concentrations.
- Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp.
- The loss in quality was quantitatively small, so we don't expect that the extra expense of lowering the concentration will be justified empirically.
- The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.
- However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are < 313bp and have > 3bp overlap. This seems to fit well with your result.
- Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.
- The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.
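
The "roughly 15% of the inserts are < 313bp" estimate in the email can be checked with a quick calculation. The sketch below assumes insert sizes are normally distributed (an assumption, not something stated in the email) and uses the read lengths 160 and 156 from the table above. With a combined read length of 316bp, any insert shorter than 313bp gives mates overlapping by more than 3bp.

```python
from statistics import NormalDist

READ_LENGTHS = (160, 156)          # from the readLen column
INSERT_MEAN, INSERT_STD = 343, 30  # fosmid pool library FC70M6V_6_001


def overlap(insert_size, read_lengths=READ_LENGTHS):
    """Number of bases shared by the two mates for a given insert size."""
    return max(0, sum(read_lengths) - insert_size)


def fraction_below(threshold, mean=INSERT_MEAN, std=INSERT_STD):
    """Fraction of inserts shorter than threshold under a normal model."""
    return NormalDist(mean, std).cdf(threshold)


if __name__ == "__main__":
    # 313 is one standard deviation below the mean, so this is about 0.159
    print(round(fraction_below(313), 3))
    print(overlap(313))  # 3
```

Under this model about 16% of inserts fall below 313bp, in line with the "roughly 15%" quoted above.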