Pine tree
Links
- dendrome@ucdavis
- pinegenome.org
- NCBI Taxonomy record Pinus taeda or "loblolly pine"
Data
UCDAVIS plone
- Links
https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq
dpuiu ddr5fft6
https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/
- Documents
- PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods 21 July 2011
IPST ftp
ftp genomepc1.umd.edu
ftpuser
pinegenome
cd PineUpload052911/
bin
prompt # no Y/N?
mget *
Local data
ginkgo:
/fs/szattic-asmg7/PINE/PineUpload052911
/fs/szattic-asmg7/PINE/PineUpload070711
PineUpload052911
Chloroplast
              len     gc%
cChloroplast  120481  38.55
cBACs
.    elem  min    q1     q2      q3      max     mean    n50     sum
len  102   8288   89909  116121  140549  172161  113400  126689  11566806
gc%  102   34.44  36.56  37.61   38.80   52.88   37.94   37.66   3870.87
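The stats rows used throughout these notes (elem, min, quartiles, max, mean, n50, sum) can be reproduced with a few lines of Python. This is only a sketch: the tool that produced the tables is not named here, so the quartile method ("inclusive") and the N50 definition below are assumptions, and the input lengths are toy values.

```python
from statistics import mean, quantiles

def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

def summarize(values):
    """Return the columns of one stats row as a dict."""
    # Assumption: "inclusive" quartiles; the original tool may use another method.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "elem": len(values), "min": min(values),
        "q1": q1, "q2": q2, "q3": q3,
        "max": max(values), "mean": mean(values),
        "n50": n50(values), "sum": sum(values),
    }

row = summarize([100, 200, 300, 400])  # toy lengths, not data from these notes
print(row)
```

For gc% rows the n50 and sum columns carry little meaning, which is probably why some tables show "." there.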
Reads
lane           readLen  #mates      mea,std  ~gc%
FC638TR_001_8  146      22,729,231  400      39.04
FC638TR_002_8  146      18,412,638  400      39.04
- Quality decreases sharply after pos 120
FC638TR.qual.png
- First 10bp of each read have higher AG count
FC638TR.content.png
- Over 0.5% Ns at certain positions
fwd: 1.015% pos=100 ; 0.81% pos=119
rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21
FC638TR.Ns.png
- GC% variation: cBAC(37.5%) < cChloroplast(38.5%) < reads(39%) < mito (44%+)
- cChloroplast alignments (bwasw)
lane             #hits    %hits  #hits(uniq)
FC638TR_001_8_1  475254   2.09   468309
FC638TR_001_8_2  473331   2.08   466185
FC638TR_002_8_1  1009331  5.48   995291
FC638TR_002_8_2  1004341  5.45   990122
- cBAC alignments (bwasw)
lane             #hits    %hits  #hits(uniq)
FC638TR_001_8_1  9722204  42.77  9533849
FC638TR_001_8_2  9481188  41.71  9303475
FC638TR_002_8_1  7684164  41.73  7535809
FC638TR_002_8_2  7469151  40.56  7330078
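The %hits columns in the two alignment tables above appear to be #hits divided by the number of mates in the lane, taken from the Reads table. A minimal sketch under that assumption (the function and dictionary names are illustrative only):

```python
# Number of mates per lane, copied from the Reads table.
MATES = {"FC638TR_001_8": 22_729_231, "FC638TR_002_8": 18_412_638}

def percent_hits(hits, mates):
    """Percentage of reads in one file that produced an alignment."""
    return 100.0 * hits / mates

# (reference, read file) -> #hits, copied from the bwasw tables.
HITS = {
    ("cChloroplast", "FC638TR_001_8_1"): 475_254,
    ("cChloroplast", "FC638TR_002_8_1"): 1_009_331,
    ("cBAC", "FC638TR_001_8_1"): 9_722_204,
}

for (ref, read_file), hits in HITS.items():
    lane = read_file.rsplit("_", 1)[0]  # drop the _1/_2 mate suffix
    print(ref, read_file, round(percent_hits(hits, MATES[lane]), 2))
```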
Sampled reads
- 100K sampled reads from each library (2*2*100K=400K)
.    elem    min   q1     q2     q3     max    mean   n50    sum
gc%  400000  0.68  34.93  39.04  43.15  95.89  39.20  40.41  .
- FC638TR_001_8_1 alignments
ref           qry              aligner  #hits  %hits  %identity(median)
cBAC          FC638TR_001_8_1  bwasw    42971  43
                               nucmer   12477  12.5   95
                               bowtie   1186   1.2%
cChloroplast                   bwasw    2031   2%
                               nucmer   1943   1.9%   100
                               bowtie   1490   1.5%
- FC638TR_00[12]_8_[12] bwa alignments
ref           qry              aligner  #hits  %hits
cBAC          FC638TR_001_8_1  bwasw    42971  43
              FC638TR_001_8_2           41915  42
              FC638TR_002_8_1           42128  42
              FC638TR_002_8_2           40606  41
cChloroplast  FC638TR_001_8_1           2031   2
              FC638TR_001_8_2           2033   2
              FC638TR_002_8_1           5370   5.3
              FC638TR_002_8_2           5330   5.3
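The notes do not say how the 100K reads per library were drawn. One common way to take a fixed-size uniform sample from a read file of unknown length in a single pass is reservoir sampling, sketched below. This is an illustration, not necessarily the method used for the tables above; `reservoir_sample` and its parameters are hypothetical names.

```python
import random

def reservoir_sample(items, k, seed=0):
    """Uniformly sample k items from an iterable in one pass (Algorithm R)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                sample[j] = item         # replace with probability k/(i+1)
    return sample

# Toy usage: integers stand in for read records.
picked = reservoir_sample(range(1000), 10)
print(len(picked))
```

For paired libraries, sampling whole mate pairs (one record per pair) keeps the _1 and _2 files in sync.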
SOAPdenovo's
#scaffold stats
.                              elem      min  q1   q2    q3   max     mean    n50  sum
-K47 -max_rd_len100            211820    100  143  156*  187  23273   227.95  .    48284629
-K31 -max_rd_len100            13747338  100  100  100   100  9185    108.04  .    1485269562
-K31 -d2 -D3 -max_rd_len100    74820     100  105  125   390  31673   320.75  .    23998536
-K31 -d20 -M3 -max_rd_len100   7859*     100  113  139   284  43079*  331.49  .    2605184*
-K27 -d 2 -D 3 -max_rd_len100  70246     100  107  137   413  30683   369.81  .    25977758
-K27 -d 2 -D 2 -max_rd_len146  224963    100  110  128   343  23410   260.64  .    58635190
SOAPdenovo-31mer -K 27 -d 2 -D 3 -max_rd_len 100
#scaffold stats
.    elem     min  q1   q2   q3   max     mean    n50  sum
scf  70246    100  107  137  413  30683*  369.81  .    25977758
ctg  8641885  28   28   31   37   7238    36.1    .    312425669
Alignment1
nucmer default parameters
# Legend:
all                     : all SOAPdenovo scaffolds
cBAC                    : scaffolds aligned to cBACs
cChloroplast            : scaffolds aligned to cChloroplast
mito                    : scaffolds aligned to at least one of the 31 complete plant mitochondrion sequences
mito.Cycas_taitungensis : scaffolds aligned to the Cycas_taitungensis mitochondrion sequence (most hits)
other                   : unaligned scaffolds
# scaffold length stats
.                        elem   min  q1   q2    q3    max     mean     n50  sum
all                      70246  100  107  137   413   30683   369.81   .    25977758
cBAC                     1839   100  124  242   625   23267   637.13   .    1171678
cChloroplast             73     100  117  139   185   416     161.47   .    11787     # why so bad???
mito                     68     131  867  2274  7241  30683   4675.18  .    317912
mito.Cycas_taitungensis  64     111  844  1931  7114  30683*  4529.91  .    289914
other                    68266  100  106  136   412   26715   358.54   .    24476381
#scaffold gc stats
.                        elem   min    q1     q2     q3     max    mean   n50  sum
all                      70246  4.90   35.40  40.74  44.52  74.26  39.78  .    .
cBAC                     1839   10.64  35.63  41.22  44.87  74.26  39.95  .    .
cChloroplast             73     25.65  31.09  33.33  36.89  42.31  33.76  .    .
mito                     68     43.08  45.96  47.45  49.19  56.41  47.77  .    .
mito.Cycas_taitungensis  64     41.44  46.27  47.81  50.00  56.41  48.16  .    .
other                    68266  4.90   35.40  40.71  44.50  70.00  39.77  .    .
- The longest assembled scaffold was 30683bp and aligned to the mitochondrion database.
- The mitochondrial gc% appears to be notably higher than that of the rest of the genome (48% vs 40%).
- The Cycas taitungensis mitochondrion (414903bp, 46.92%gc) had the most scaffolds aligned to it (64 out of 68).
NC_009618 Cycas taitungensis chloroplast, complete genome DNA; circular; Length: 163,403 nt
NC_010303 Cycas taitungensis mitochondrion, complete genome DNA; circular; Length: 414,903 nt
Cycas_taitungensis_mito-chloroplast.png
- Mitochondrial scaffolds
.           elem  min    q1     q2     q3     max    mean     n50    sum
scf         68    131    867    2274   7241   30683  4675.18  9407   317912   # used for alignment
scf.gc%     68    43.08  45.96  47.45  49.19  56.41  47.77    47.45  3248.1
scf.noGaps  68    131    743    2049   6660   27931  4262.46  9052   289847
- Reads aligned to mitochondrial scaffolds (bwa bwasw)
lane             #hits  %hits
FC638TR_001_8_1  12307  0.054
FC638TR_001_8_2  11933
FC638TR_002_8_1  28707  0.12
FC638TR_002_8_2  27211
total            80158  # 20X cvg for 100bp read len & 400K mito genome ; 29X cvg for 146bp read len
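The coverage figures in the comment above follow from the usual estimate, coverage = reads x read length / genome length. A short sketch using the numbers from the table, with the 400K mitochondrial genome size taken from the comment as a working assumption rather than a measured value:

```python
def coverage(n_reads, read_len, genome_len):
    """Expected depth of coverage if all reads are full length and map once."""
    return n_reads * read_len / genome_len

MITO_HITS = [12307, 11933, 28707, 27211]  # per-file #hits from the table
MITO_LEN = 400_000                        # assumed mito genome size (from the note)

total = sum(MITO_HITS)
print(total)                                   # total aligned reads
print(round(coverage(total, 100, MITO_LEN), 1))  # reads trimmed to 100bp
print(round(coverage(total, 146, MITO_LEN), 1))  # full 146bp reads
```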
Alignment2
nucmer -l 20 -c 20; delta-filter -l 65 -q -o 75 ; filter for gc% >=44
# some of the mito hits align to cChloroplast & cBAC => might have an overestimate
# Mitochondrial scaffolds
.        elem  min    q1     q2     q3     max    mean     n50    sum
scf.len  102   101    608    1931   7271   30683  5044.88  11204  514578
scf.gc%  102   44.07  46.12  47.45  49.33  56.41  48.05    47.47  4901.06
lane             #hits   %hits
FC638TR_001_8_1  18614
FC638TR_001_8_2  18035
FC638TR_002_8_1  43961
FC638TR_002_8_2  42101
total            122707  # 30X cvg for 100bp read len & 400K mito genome
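The "filter for gc% >=44" step in this section can be sketched as below. How gc% was computed for the tables is not stated, so excluding N gap bases from the denominator is an assumption, and the scaffold names and sequences are made up for illustration.

```python
def gc_percent(seq):
    """GC percentage over A/C/G/T bases only (assumption: Ns are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def filter_by_gc(scaffolds, min_gc=44.0):
    """Keep the names of scaffolds whose gc% is at least min_gc."""
    return [name for name, seq in scaffolds.items() if gc_percent(seq) >= min_gc]

# Hypothetical scaffolds, not sequences from the assembly.
toy = {"scf1": "GGCCAT", "scf2": "ATATATGC", "scf3": "GCNNNNAT"}
print(filter_by_gc(toy))
```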
Alignment3
nucmer -l 20 -c 20; delta-filter -l 100 -q -o 75
.             elem   min  q1   q2    q3    max    mean     n50  sum
cChloroplast  136    100  117  142   187   628    168.34   0    22894
cBAC          6385   100  116  187   499   23267  597.00   0    3811871
mito          84     110  479  1791  7050  30683  4268.99  0    358595
other         63641  100  106  134   409   22471  342.30   0    21784398
SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100
#scaffold stats
.    elem    min  q1   q2   q3   max     mean    n50  sum
scf  7859*   100  113  139  284  43079*  331.49  .    2605184
ctg  200062  32   33   37   47   10392   48.52   .    9707307
# scaffold length stats
.             elem   min  q1   q2   q3    max     mean     n50  sum
all           7859*  100  113  139  284   43079*  331.49   .    2605184
cChloroplast  20     111  193  436  6140  43079   5951.05  0    119021
cBAC          5117   100  114  141  320   13733   334.94   0    1713870
mito          8      101  134  685  1396  2166    749.75   0    5998     !!! VERY BAD
other         2714   100  111  133  226   7353    282.35   0    766295
SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads
#scaffold stats
.    elem  min  q1   q2   q3    max    mean     n50  sum
scf  20    111  193  436  6140  42707  5928.20  0    118564
PineUpload070711
Ecoli
         len      gc%
cE_coli  4639675  50.79
Cloning vector
           len   gc%
pFosDT5_2  8345  47.93
Drosophila refseq
Chromosome     len          gc%
2L             23,011,544   41
2R             21,146,708   43
3L             24,543,557   41
3R             27,905,053   42
4              1,351,857    35
X              22,422,827   42
un             10,049,037   ?
mitochondrion  19,517       17
total          137,586,636  ?    # actually the chromosome lengths sum to 130,450,100
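The discrepancy flagged in the comment on the total row is easy to check by summing the listed lengths. The snippet only re-adds the numbers from the table; it does not explain where the 137,586,636 total comes from.

```python
# Lengths copied from the Drosophila refseq table above.
CHROM_LEN = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}
REPORTED_TOTAL = 137_586_636

listed_sum = sum(CHROM_LEN.values())
print(f"sum of listed lengths: {listed_sum:,}")
print(f"reported total minus sum: {REPORTED_TOTAL - listed_sum:,}")
```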
Reads (Drosophila)
lib readLen #mates mea,std ~gc% %merged(Tanja) %cChloroplast %cE_coli
FC70M6V_6_001 160,156 23546475 343,30 42.5 0.1 12
TIL_242_FC70M6V_2_002 160,156 9917211 242 . 91.4%
TIL_242_FC70M6V_3_002 160,156 6276300 242 92.7%
TIL_254_FC70M6V_2_004 160,156 9279789 254 . 91.5%
TIL_254_FC70M6V_3_004 160,156 5924239 254 92.9%
TIL_270_FC70M6V_2_003 160,156 10188776 270 . 88.1%
TIL_270_FC70M6V_3_003 160,156 6556676 270 90.3%
TIL_288_FC70M6V_2_001 160,156 9524524 288 . 80.0%
TIL_288_FC70M6V_3_001 160,156 6158919 288 83.0%
- kastevens@ucdavis.edu:
- The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.
- Regarding pairing, each insert size was run in two lanes Y at two different concentrations.
- Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp.
- The loss in quality was quantitatively small, so we don't expect the extra expense of lowering the concentration to be justified empirically.
- The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.
- However, we did estimate the insert size to be 343bp with a below median standard deviation of 30. So roughly 15% of the inserts are < 313bp and have > 3bp overlap. This seems to fit well with your result.
- Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.
- The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.
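The "roughly 15%" estimate for the fosmid pool library can be checked with a normal approximation. The sketch assumes that insert sizes are normally distributed with mean 343 and standard deviation 30, and that the readLen entry "160,156" means a 160bp forward read and a 156bp reverse read; neither assumption is stated outright in the notes.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

FWD_LEN, REV_LEN = 160, 156        # assumed meaning of readLen "160,156"
MEAN_INSERT, SD_INSERT = 343, 30   # estimates quoted in the notes

def overlap(insert_len):
    """Bases shared by the two mates of a pair with the given insert size."""
    return FWD_LEN + REV_LEN - insert_len

threshold = MEAN_INSERT - SD_INSERT  # 313bp, one standard deviation below the mean
frac = normal_cdf(threshold, MEAN_INSERT, SD_INSERT)
print(f"inserts < {threshold}bp overlap by > {overlap(threshold)}bp: {100 * frac:.1f}%")
```

Under these assumptions about 16% of pairs fall below 313bp, in line with the figure quoted in the email.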