<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.umiacs.umd.edu/cbcb/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Dpuiu</id>
	<title>Cbcb - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.umiacs.umd.edu/cbcb/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Dpuiu"/>
	<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php/Special:Contributions/Dpuiu"/>
	<updated>2026-04-12T15:46:06Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.7</generator>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8932</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8932"/>
		<updated>2011-12-27T15:34:40Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Other */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo Sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
&lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens wolbachia mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8931</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8931"/>
		<updated>2011-12-27T15:33:49Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Other */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo Sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
&lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens wolbachia mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8930</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8930"/>
		<updated>2011-12-27T15:33:03Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Test data sets */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo Sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
&lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens wolbachia mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_alignment|Alignment]] *&lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8929</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8929"/>
		<updated>2011-12-27T15:32:39Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* CBCB software */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo Sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
&lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens wolbachia mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_alignment|Alignment]] *&lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Test data sets =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (1)&#039;&#039;&#039;&lt;br /&gt;
* 23,536 reads; 11,652 mate pairs; 1 library (4.5 kb mean insert); 7X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G878A1.frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (2)&#039;&#039;&#039;&lt;br /&gt;
* 100,781 reads; 50,008 mate pairs; 24 libraries (4.5 kb and 10 kb mean inserts); 16X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G*frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa unpaired&#039;&#039;&#039;&lt;br /&gt;
* 2,659,250 36bp reads&lt;br /&gt;
* Location: /fs/szdata/Solexa/Streptococcus_suis/suisp17/strip3.fna&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (1)&#039;&#039;&#039;&lt;br /&gt;
* 142,858 35bp reads&lt;br /&gt;
* assemble into a single 99,995 bp contig/scaffold with velvet&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (2)&#039;&#039;&#039;&lt;br /&gt;
* 92,143 35bp reads&lt;br /&gt;
* assemble into 1 scaffold with velvetp.sh (velveth -shortPaired)&lt;br /&gt;
* assemble into 5 contigs with velvet.sh (velveth)&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.subset.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa &amp;amp; 454&#039;&#039;&#039;&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001087 E. coli Whole Genome Sequencing on 454 and Illumina]&lt;br /&gt;
* Location: /fs/szattic-asmg4/dpuiu/Ecoli&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8928</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8928"/>
		<updated>2011-12-27T15:32:15Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Other */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo Sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
&lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens wolbachia mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_alignment|Alignment]] *&lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= CBCB software =&lt;br /&gt;
* Link&lt;br /&gt;
  https://wiki.umiacs.umd.edu/cbcb/index.php/Communal_Software#Core_Software&lt;br /&gt;
* Locations&lt;br /&gt;
  /fs/szdevel/core-cbcb-software&lt;br /&gt;
  /fs/sz-user-supported&lt;br /&gt;
* Change group&lt;br /&gt;
    $ groups&lt;br /&gt;
      dpuiu cbcb-staff cbcb cbcbwww&lt;br /&gt;
 &lt;br /&gt;
    $ newgrp cbcb-staff&lt;br /&gt;
       # starts a new shell with cbcb-staff as the current group&lt;br /&gt;
    $ ^D&lt;br /&gt;
       # exits that shell and returns to the original group&lt;br /&gt;
* After installing, set the group and remove world-write permission:&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -exec chgrp cbcb-staff {} +&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -exec chmod o-w {} +&lt;br /&gt;
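A minimal POSIX sh sketch, assuming only the standard find and id utilities, that wraps the post-install steps above into one function: it checks that you belong to the group, changes the group of every file under the install tree, removes world-write permission, and lists any path that is still world-writable. The name fix_install_perms and its arguments are hypothetical, not an existing CBCB script; symbolic links are skipped so their targets outside the tree are not modified.&lt;br /&gt;

```shell
  # Hypothetical helper: the name fix_install_perms and its two arguments
  # are illustrative assumptions, not an existing CBCB script.
  # Usage: fix_install_perms DIR GROUP
  fix_install_perms() {
    dir=$1
    grp=$2
    # Stop unless the current user is a member of GROUP
    # (the same list that "groups" prints).
    case " $(id -nG) " in
      *" $grp "*) ;;
      *) echo "not a member of $grp"; return 1 ;;
    esac
    # Skip symbolic links so their targets outside DIR are left alone;
    # "-exec ... {} +" passes names containing spaces through intact.
    find "$dir" ! -type l -exec chgrp "$grp" {} + || return 1
    find "$dir" ! -type l -exec chmod o-w {} + || return 1
    # Print anything still world-writable; no output means success.
    find "$dir" ! -type l -perm -0002 -print
  }
```

For example: fix_install_perms /fs/szdevel/core-cbcb-software cbcb-staff&lt;br /&gt;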
&lt;br /&gt;
= Test data sets =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (1)&#039;&#039;&#039;&lt;br /&gt;
* 23,536 reads; 11,652 mate pairs; 1 library (4.5 kb mean insert); 7X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G878A1.frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (2)&#039;&#039;&#039;&lt;br /&gt;
* 100,781 reads; 50,008 mate pairs; 24 libraries with 4.5 kb and 10 kb mean inserts; 16X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G*frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa unpaired&#039;&#039;&#039;&lt;br /&gt;
* 2,659,250 36bp reads&lt;br /&gt;
* Location: /fs/szdata/Solexa/Streptococcus_suis/suisp17/strip3.fna&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (1)&#039;&#039;&#039;&lt;br /&gt;
* 142,858 35bp reads&lt;br /&gt;
* Velvet assembles them into a single 99,995 bp contig/scaffold&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.fa&lt;br /&gt;
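Coverage is read count × read length ÷ genome size. A minimal sketch of that arithmetic, where coverage is a hypothetical helper and the assembled length stands in for the genome size (an assumption): for the Solexa paired (1) set, 142,858 reads × 35 bp = 5,000,030 bp over a 99,995 bp assembly gives about 50X.

```shell
#!/bin/sh
# Hypothetical helper: estimated coverage = reads * read_length / genome_size.
# Arguments are plain integers without thousands separators.
coverage() {
    awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * len / g }'
}

coverage 142858 35 99995    # Solexa paired (1): prints 50.0
```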
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (2)&#039;&#039;&#039;&lt;br /&gt;
* 92,143 35bp reads&lt;br /&gt;
* velvetp.sh (velveth -shortPaired) assembles them into 1 scaffold&lt;br /&gt;
* velvet.sh (velveth without -shortPaired) assembles them into 5 contigs&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.subset.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa &amp;amp; 454&#039;&#039;&#039;&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001087 E. coli whole-genome sequencing on 454 and Illumina]&lt;br /&gt;
* Location: /fs/szattic-asmg4/dpuiu/Ecoli&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8927</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8927"/>
		<updated>2011-12-27T15:31:03Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Projects */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
&lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens Wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_alignment|Alignment]] *&lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
&lt;br /&gt;
= CBCB software =&lt;br /&gt;
* Link&lt;br /&gt;
  https://wiki.umiacs.umd.edu/cbcb/index.php/Communal_Software#Core_Software&lt;br /&gt;
* Locations&lt;br /&gt;
  /fs/szdevel/core-cbcb-software&lt;br /&gt;
  /fs/sz-user-supported&lt;br /&gt;
* Change group&lt;br /&gt;
    $ groups&lt;br /&gt;
      dpuiu cbcb-staff cbcb cbcbwww&lt;br /&gt;
 &lt;br /&gt;
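    $ newgrp cbcb-staff&lt;br /&gt;
       # starts a new shell with cbcb-staff as the current group&lt;br /&gt;
    $ ^D&lt;br /&gt;
       # Ctrl-D exits that shell and returns to the original one&lt;br /&gt;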
* After installing, hand the tree to cbcb-staff and remove write permission for others:&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -print0 | xargs -0 chgrp cbcb-staff&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -print0 | xargs -0 chmod o-w&lt;br /&gt;
&lt;br /&gt;
= Test data sets =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (1)&#039;&#039;&#039;&lt;br /&gt;
* 23,536 reads; 11,652 mate pairs; 1 library with a 4.5 kb mean insert; 7X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G878A1.frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (2)&#039;&#039;&#039;&lt;br /&gt;
* 100,781 reads; 50,008 mate pairs; 24 libraries with 4.5 kb and 10 kb mean inserts; 16X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G*frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa unpaired&#039;&#039;&#039;&lt;br /&gt;
* 2,659,250 36bp reads&lt;br /&gt;
* Location: /fs/szdata/Solexa/Streptococcus_suis/suisp17/strip3.fna&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (1)&#039;&#039;&#039;&lt;br /&gt;
* 142,858 35bp reads&lt;br /&gt;
* Velvet assembles them into a single 99,995 bp contig/scaffold&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (2)&#039;&#039;&#039;&lt;br /&gt;
* 92,143 35bp reads&lt;br /&gt;
* velvetp.sh (velveth -shortPaired) assembles them into 1 scaffold&lt;br /&gt;
* velvet.sh (velveth without -shortPaired) assembles them into 5 contigs&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.subset.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa &amp;amp; 454&#039;&#039;&#039;&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001087 E. coli whole-genome sequencing on 454 and Illumina]&lt;br /&gt;
* Location: /fs/szattic-asmg4/dpuiu/Ecoli&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8926</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8926"/>
		<updated>2011-12-27T15:30:18Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Links */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
&lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens Wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_alignment|Alignment]] *&lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
&lt;br /&gt;
= CBCB software =&lt;br /&gt;
* Link&lt;br /&gt;
  https://wiki.umiacs.umd.edu/cbcb/index.php/Communal_Software#Core_Software&lt;br /&gt;
* Locations&lt;br /&gt;
  /fs/szdevel/core-cbcb-software&lt;br /&gt;
  /fs/sz-user-supported&lt;br /&gt;
* Change group&lt;br /&gt;
    $ groups&lt;br /&gt;
      dpuiu cbcb-staff cbcb cbcbwww&lt;br /&gt;
 &lt;br /&gt;
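    $ newgrp cbcb-staff&lt;br /&gt;
       # starts a new shell with cbcb-staff as the current group&lt;br /&gt;
    $ ^D&lt;br /&gt;
       # Ctrl-D exits that shell and returns to the original one&lt;br /&gt;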
* After installing, hand the tree to cbcb-staff and remove write permission for others:&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -print0 | xargs -0 chgrp cbcb-staff&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -print0 | xargs -0 chmod o-w&lt;br /&gt;
&lt;br /&gt;
= Test data sets =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (1)&#039;&#039;&#039;&lt;br /&gt;
* 23,536 reads; 11,652 mate pairs; 1 library with a 4.5 kb mean insert; 7X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G878A1.frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (2)&#039;&#039;&#039;&lt;br /&gt;
* 100,781 reads; 50,008 mate pairs; 24 libraries with 4.5 kb and 10 kb mean inserts; 16X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G*frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa unpaired&#039;&#039;&#039;&lt;br /&gt;
* 2,659,250 36bp reads&lt;br /&gt;
* Location: /fs/szdata/Solexa/Streptococcus_suis/suisp17/strip3.fna&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (1)&#039;&#039;&#039;&lt;br /&gt;
* 142,858 35bp reads&lt;br /&gt;
* Velvet assembles them into a single 99,995 bp contig/scaffold&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (2)&#039;&#039;&#039;&lt;br /&gt;
* 92,143 35bp reads&lt;br /&gt;
* velvetp.sh (velveth -shortPaired) assembles them into 1 scaffold&lt;br /&gt;
* velvet.sh (velveth without -shortPaired) assembles them into 5 contigs&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.subset.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa &amp;amp; 454&#039;&#039;&#039;&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001087 E. coli whole-genome sequencing on 454 and Illumina]&lt;br /&gt;
* Location: /fs/szattic-asmg4/dpuiu/Ecoli&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8925</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8925"/>
		<updated>2011-12-27T15:29:44Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Bookmarks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
* [https://wiki.umiacs.umd.edu/cbcb-private/index.php/Main_Page CBCB]&lt;br /&gt;
** NCBI complete genomes: /fs/szdata/ncbi/genomes/Bacteria/ (*.fna)&lt;br /&gt;
** www -&amp;gt; /fs/www-cbcb/htdocs/&lt;br /&gt;
** Personal web site http://www.cbcb.umd.edu/~dpuiu -&amp;gt; /fs/www/users/dpuiu/&lt;br /&gt;
** Personal ftp site ftp://ftp.cbcb.umd.edu/pub/data/dpuiu -&amp;gt; /fs/ftp-cbcb/pub/data/dpuiu/&lt;br /&gt;
 &lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens Wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_alignment|Alignment]] *&lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
&lt;br /&gt;
= CBCB software =&lt;br /&gt;
* Link&lt;br /&gt;
  https://wiki.umiacs.umd.edu/cbcb/index.php/Communal_Software#Core_Software&lt;br /&gt;
* Locations&lt;br /&gt;
  /fs/szdevel/core-cbcb-software&lt;br /&gt;
  /fs/sz-user-supported&lt;br /&gt;
* Change group&lt;br /&gt;
    $ groups&lt;br /&gt;
      dpuiu cbcb-staff cbcb cbcbwww&lt;br /&gt;
 &lt;br /&gt;
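    $ newgrp cbcb-staff&lt;br /&gt;
       # starts a new shell with cbcb-staff as the current group&lt;br /&gt;
    $ ^D&lt;br /&gt;
       # Ctrl-D exits that shell and returns to the original one&lt;br /&gt;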
* After installing, hand the tree to cbcb-staff and remove write permission for others:&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -print0 | xargs -0 chgrp cbcb-staff&lt;br /&gt;
  find /fs/szdevel/core-cbcb-software -print0 | xargs -0 chmod o-w&lt;br /&gt;
&lt;br /&gt;
= Test data sets =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (1)&#039;&#039;&#039;&lt;br /&gt;
* 23,536 reads; 11,652 mate pairs; 1 library with a 4.5 kb mean insert; 7X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G878A1.frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (2)&#039;&#039;&#039;&lt;br /&gt;
* 100,781 reads; 50,008 mate pairs; 24 libraries with 4.5 kb and 10 kb mean inserts; 16X coverage&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G*frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa unpaired&#039;&#039;&#039;&lt;br /&gt;
* 2,659,250 36bp reads&lt;br /&gt;
* Location: /fs/szdata/Solexa/Streptococcus_suis/suisp17/strip3.fna&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (1)&#039;&#039;&#039;&lt;br /&gt;
* 142,858 35bp reads&lt;br /&gt;
* Velvet assembles them into a single 99,995 bp contig/scaffold&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (2)&#039;&#039;&#039;&lt;br /&gt;
* 92,143 35bp reads&lt;br /&gt;
* velvetp.sh (velveth -shortPaired) assembles them into 1 scaffold&lt;br /&gt;
* velvet.sh (velveth without -shortPaired) assembles them into 5 contigs&lt;br /&gt;
* Location: /nfshomes/dpuiu/szdevel/velvet/data/test_reads.subset.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa &amp;amp; 454&#039;&#039;&#039;&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001087 E. coli whole-genome sequencing on 454 and Illumina]&lt;br /&gt;
* Location: /fs/szattic-asmg4/dpuiu/Ecoli&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8924</id>
		<title>User:Dpuiu</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=User:Dpuiu&amp;diff=8924"/>
		<updated>2011-12-27T15:29:11Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Bookmarks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daniela Puiu&#039;s page&lt;br /&gt;
&lt;br /&gt;
= Bookmarks =&lt;br /&gt;
[[dpuiu_Perl|Perl]] ; &lt;br /&gt;
[[dpuiu_C|C]] ; &lt;br /&gt;
[[dpuiu_Linux|Linux]] ; &lt;br /&gt;
[[dpuiu_Windows|Windows]] ; &lt;br /&gt;
[[dpuiu_R|R]] ; &lt;br /&gt;
[http://www.bioperl.org/wiki/HOWTOs BioPerl HOWTOs]&lt;br /&gt;
&lt;br /&gt;
= NCBI =&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/ NCBI] ; [http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=browse&amp;amp;f=&amp;amp;m=main&amp;amp;s=browse AA] ; [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&amp;amp;PROGRAM=blastn&amp;amp;MEGABLAST=on&amp;amp;BLAST_PROGRAMS=megaBlast&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on BLAST] ; [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi Microbial_Genomes] ; [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? SRA] [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/ SRA_FTP] ; [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? TA] ; [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ TA_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/ Genomes_FTP] ; [ftp://ftp.ncbi.nih.gov/genomes/Bacteria Bacterial Genomes FTP] ; [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi Taxonomy CommonTree] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/genomes/genlist.cgi?taxid=2&amp;amp;type=0&amp;amp;name=Complete%20Bacteria Complete Bacteria]&lt;br /&gt;
&lt;br /&gt;
= Links =&lt;br /&gt;
&lt;br /&gt;
* [http://www.hgsc.bcm.tmc.edu/ Baylor]&lt;br /&gt;
* [http://www.embl.org/ EMBL]&lt;br /&gt;
* [http://emboss.sourceforge.net/apps/#Overview EMBOSS]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;amp;cmd=search&amp;amp;term=homo%20sapiens Homo sapiens Genome Projects]&lt;br /&gt;
* [http://genome.jgi-psf.org/mic_home.html JGI Microbial Genomes]&lt;br /&gt;
* [http://www.genomesonline.org/ JGI Genomes Online (GOLD)]&lt;br /&gt;
* [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi JGI Integrated Microbial Genomes (IMG)]&lt;br /&gt;
* [http://www.sanger.ac.uk/ Sanger]&lt;br /&gt;
* [http://www.sanger.ac.uk/Projects/Microbes/ Sanger Microbial Genomes]&lt;br /&gt;
* [https://wiki.umiacs.umd.edu/cbcb-private/index.php/Main_Page CBCB]&lt;br /&gt;
** NCBI complete genomes: /fs/szdata/ncbi/genomes/Bacteria/ (*.fna)&lt;br /&gt;
** www -&amp;gt; /fs/www-cbcb/htdocs/&lt;br /&gt;
** Personal web site http://www.cbcb.umd.edu/~dpuiu -&amp;gt; /fs/www/users/dpuiu/&lt;br /&gt;
** Personal ftp site ftp://ftp.cbcb.umd.edu/pub/data/dpuiu -&amp;gt; /fs/ftp-cbcb/pub/data/dpuiu/&lt;br /&gt;
 &lt;br /&gt;
* [http://www.thearkdb.org/arkdb/index.jsp ARKDB] genetic maps - cow, chicken ...&lt;br /&gt;
&lt;br /&gt;
= Projects =&lt;br /&gt;
 &lt;br /&gt;
&#039;&#039;&#039;DHS genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bioterrorism|Bioterrorism]]&lt;br /&gt;
* [[Bacillus_anthracis|Bacillus anthracis]]&lt;br /&gt;
* [[Burkholderia_mallei|Burkholderia mallei]]&lt;br /&gt;
* [[Burkholderia_pseudomallei|Burkholderia pseudomallei]]&lt;br /&gt;
* [[Clostridium_botulinum|Clostridium botulinum]] (Hall strain A str. ATCC 3502) ; uploaded to Insignia ; (other) ; some uploaded to Insignia&lt;br /&gt;
* [[Clostridium_perfringens|Clostridium perfringens]] ; some uploaded to Insignia&lt;br /&gt;
* [[Cryptosporidium_hominis|Cryptosporidium hominis]] (test)&lt;br /&gt;
* [[Francisella_tularensis_holarctica_OSU18|Francisella tularensis OSU18]] in Insignia&lt;br /&gt;
* [[Francisella_tularensis|Francisella tularensis]] some uploaded to Insignia&lt;br /&gt;
* [[Salmonella|Salmonella]] (Washington Univ in St Louis)&lt;br /&gt;
* [[Yersinia_enterocolitica|Yersinia enterocolitica]]&lt;br /&gt;
* [[Yersinia_pestis|Yersinia pestis]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Other genomes&#039;&#039;&#039;&lt;br /&gt;
* [[Bumblebee|Bombus impatiens]]&lt;br /&gt;
* [[Brugia_malayi|Brugia malayi]]&lt;br /&gt;
* [[Bos_taurus|Bos taurus]] , [[Bos_taurus_redo|Bos taurus redo]] , [[Bos_taurus_3.0|Bos taurus 3.0]]&lt;br /&gt;
* [[dpuiu_cat|Cat]]&lt;br /&gt;
* [[Coffee_bacs|Coffee BACs]]&lt;br /&gt;
* [[Culex_pipiens_symbiont|Culex pipiens Wolbachia symbiont]]&lt;br /&gt;
* [[Culex_pipiens_mitochondrion|Culex pipiens mitochondrion]]&lt;br /&gt;
* [[Helicobacter_pylori|Helicobacter pylori]]&lt;br /&gt;
* [[Homo_sapiens|Homo sapiens]]&lt;br /&gt;
* [[Kalanchoe|Kalanchoe]]&lt;br /&gt;
* [[Megachile_rotundata|Megachile rotundata]]&lt;br /&gt;
* [[Methanobrevibacter_smithii|Methanobrevibacter smithii]]&lt;br /&gt;
* [[Pine_tree|Pine tree]]&lt;br /&gt;
* [[Pseudomonas_aeruginosa|Pseudomonas aeruginosa]]&lt;br /&gt;
* [[Pseudodomonas_syringae|Pseudomonas syringae]]&lt;br /&gt;
* [[Sea_urchin|Sea urchin]]&lt;br /&gt;
* [[Turkey|Turkey]]&lt;br /&gt;
* [[Xanthomonas_oryzae|Xanthomonas oryzae]] XOO&lt;br /&gt;
* [[Xanthomonas_campestris_pv_raphani|Xanthomonas campestris pv. raphani]] XCR&lt;br /&gt;
* [[Strawberry|Strawberry]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [[Ecoli_germany|E. coli Germany]]&lt;br /&gt;
&lt;br /&gt;
= Other =&lt;br /&gt;
* [[Trace_formatting|Sequencing]] &lt;br /&gt;
* [[dpuiu_alignment|Alignment]] *&lt;br /&gt;
* [[dpuiu_CA|CA]] ; [[Assembly_merge|Assembly]]&lt;br /&gt;
* [[dpuiu_Assemblathon|Assemblathon]]&lt;br /&gt;
* [[dpuiu_HTS|HTS]]&lt;br /&gt;
* [[Comparative_assemblies|Comparative assemblies]]&lt;br /&gt;
* [[Metagenoms|Metagenomics]]&lt;br /&gt;
* [[NCBI_submission|NCBI submission]]&lt;br /&gt;
* [[Repeat_search|Repeat search]]&lt;br /&gt;
* [[Vector_trimming|Vector trimming]]&lt;br /&gt;
* [[Data_formats|Data formats]]&lt;br /&gt;
* [[dpuiu_definitions|Definitions]]&lt;br /&gt;
* [[dpuiu_meeting_notes|Meeting notes]]&lt;br /&gt;
* [[dpuiu_articles|My articles]]&lt;br /&gt;
* [[dpuiu_CS|CS]] ; [[dpuiu_Perl|Perl]] ; [[dpuiu_C|C]] ; [[dpuiu_Linux|Linux]] ; [[dpuiu_DOS|DOS]] ; [[douiu_wiki|wiki]] ; [[dpuiu_thunderbird|thunderbird]]&lt;br /&gt;
* [[Pop_group_meeting|Pop group meeting]] (Mondays, 3pm)&lt;br /&gt;
* [[dpuiu_snp|Snp calling]]&lt;br /&gt;
* [[dpuiu_todo|ToDo]]&lt;br /&gt;
&lt;br /&gt;
* [[dpuiu_JHU|JHU]]&lt;br /&gt;
&lt;br /&gt;
= CBCB software =&lt;br /&gt;
* Link&lt;br /&gt;
  https://wiki.umiacs.umd.edu/cbcb/index.php/Communal_Software#Core_Software&lt;br /&gt;
* Locations&lt;br /&gt;
  /fs/szdevel/core-cbcb-software&lt;br /&gt;
  /fs/sz-user-supported&lt;br /&gt;
* Change group&lt;br /&gt;
    $ groups&lt;br /&gt;
      dpuiu cbcb-staff cbcb cbcbwww&lt;br /&gt;
 &lt;br /&gt;
    $ newgrp cbcb-staff&lt;br /&gt;
       # starts a new shell whose primary group is cbcb-staff&lt;br /&gt;
    $ ^D&lt;br /&gt;
       # exits that shell, back to the original group&lt;br /&gt;
* After installing, set the group to cbcb-staff and remove write permission for others:&lt;br /&gt;
  chgrp -R cbcb-staff /fs/szdevel/core-cbcb-software&lt;br /&gt;
  chmod -R o-w /fs/szdevel/core-cbcb-software&lt;br /&gt;
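The two permission fixes above can also be done from an install script. The following is a minimal Python sketch of that approach. fix_permissions is a hypothetical helper. Symlinks are skipped, and the group change is optional because it only works for a group you belong to.&lt;br /&gt;

```python
import os
import shutil
import stat
import tempfile


def fix_permissions(root, group=None):
    """Recursively drop the other-write bit (like `chmod -R o-w root`) and,
    if a group name is given, change the group (like `chgrp -R group root`)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            if os.path.islink(path):
                continue  # skip symlinks; chmod/chown would act on the target
            if group is not None:
                shutil.chown(path, group=group)
            st_mode = os.lstat(path).st_mode
            if stat.filemode(st_mode)[8] == "w":  # 9th character = other-write
                os.chmod(path, stat.S_IMODE(st_mode) - stat.S_IWOTH)


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    os.chmod(demo, 0o777)
    fix_permissions(demo)
    print(stat.filemode(os.stat(demo).st_mode))  # drwxrwxr-x
```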
&lt;br /&gt;
= Test data sets =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (1)&#039;&#039;&#039;&lt;br /&gt;
* 23,536 reads ; 11,652 mates; 1 lib : 4.5K mean; 7X cvg&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G878A1.frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sanger (2)&#039;&#039;&#039;&lt;br /&gt;
* 100,781 reads ; 50,008 mates; 24 libs : 4.5K &amp;amp; 10K means; 16X cvg&lt;br /&gt;
* Location: /fs/szasmg/Bacteria/F_tularensis/TA_FTP/libs/G*frg &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa unpaired&#039;&#039;&#039;&lt;br /&gt;
* 2,659,250 36bp reads&lt;br /&gt;
* Location: /fs/szdata/Solexa/Streptococcus_suis/suisp17/strip3.fna&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (1)&#039;&#039;&#039;&lt;br /&gt;
* 142,858 35bp reads; &lt;br /&gt;
* assemble into one 99,995 ctg/scaff by velvet&lt;br /&gt;
* Location /nfshomes/dpuiu/szdevel/velvet/data/test_reads.fa &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa paired (2)&#039;&#039;&#039;&lt;br /&gt;
* 92,143 35bp reads; &lt;br /&gt;
* assemble into 1 scaff by velvetp.sh  (velveth -shortPaired)&lt;br /&gt;
* assemble into 5 contigs by velvet.sh            (velveth)&lt;br /&gt;
* Location /nfshomes/dpuiu/szdevel/velvet/data/test_reads.subset.fa&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solexa &amp;amp; 454&#039;&#039;&#039;&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP001087 E coli Whole Genome Sequencing on 454 and Illumina]&lt;br /&gt;
* Location /fs/szattic-asmg4/dpuiu/Ecoli&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=File:GAGE.summary200.txt&amp;diff=8922</id>
		<title>File:GAGE.summary200.txt</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=File:GAGE.summary200.txt&amp;diff=8922"/>
		<updated>2011-08-26T16:50:06Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Dpuiu_Linux&amp;diff=8915</id>
		<title>Dpuiu Linux</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Dpuiu_Linux&amp;diff=8915"/>
		<updated>2011-08-12T13:08:00Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Audio/Video manipulation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Version Control =&lt;br /&gt;
== Sourceforge ==&lt;br /&gt;
&lt;br /&gt;
  shell server:  	shell.sourceforge.net&lt;br /&gt;
  CVS server: 	        PROJECTNAME.cvs.sourceforge.net&lt;br /&gt;
  Subversion server: 	PROJECTNAME.svn.sourceforge.net&lt;br /&gt;
 &lt;br /&gt;
Authentication:&lt;br /&gt;
* [http://alexandria.wiki.sourceforge.net/SSH+Key+Generation KeyGen]&lt;br /&gt;
* Generate the new key:&lt;br /&gt;
   ssh-keygen -t dsa -C dxpuiu@shell.sf.net&lt;br /&gt;
   cat /nfshomes/dpuiu/.ssh/id_dsa.pub &amp;gt;&amp;gt; /nfshomes/dpuiu/.ssh/authorized_keys&lt;br /&gt;
* Upload key to the Sourceforge server (&amp;quot;Account Maintenance&amp;quot; site)&lt;br /&gt;
&lt;br /&gt;
== Patching ==&lt;br /&gt;
  diff -u file.old file.new &amp;gt; file.diff&lt;br /&gt;
  patch file.old file.diff&lt;br /&gt;
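Python's standard difflib module can produce the same unified format that `diff -u` writes and `patch` reads. The following is a minimal sketch; make_unified_diff is a hypothetical helper name.&lt;br /&gt;

```python
import difflib


def make_unified_diff(old_lines, new_lines, old_name="file.old", new_name="file.new"):
    """Return the diff of two lists of lines in the unified format used by `diff -u`."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name,
                                     lineterm=""))


if __name__ == "__main__":
    for line in make_unified_diff(["alpha", "beta", "gamma"],
                                  ["alpha", "BETA", "gamma"]):
        print(line)
```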
&lt;br /&gt;
== CVS ==&lt;br /&gt;
&lt;br /&gt;
* Local CVS&lt;br /&gt;
  echo $CVSROOT&lt;br /&gt;
  /fs/szdevel/src/cvsroot&lt;br /&gt;
&lt;br /&gt;
  /fs/szdevel/src/cvsroot/users/dpuiu/&lt;br /&gt;
&lt;br /&gt;
* Checkout&lt;br /&gt;
  cvs -z3 -d:ext:dxpuiu@amos.cvs.sourceforge.net:/cvsroot/amos co -P AMOS&lt;br /&gt;
&lt;br /&gt;
* Add a new file&lt;br /&gt;
  cd src/Utils &lt;br /&gt;
  cp ~/bin/file .&lt;br /&gt;
  &lt;br /&gt;
  cvs add file&lt;br /&gt;
  cvs ci&lt;br /&gt;
&lt;br /&gt;
* Add a new dir&lt;br /&gt;
  cd src/Utils &lt;br /&gt;
  cp -R ~/bin/dir .&lt;br /&gt;
  &lt;br /&gt;
  cvs add dir&lt;br /&gt;
  cvs ci&lt;br /&gt;
  &lt;br /&gt;
  cd dir&lt;br /&gt;
  cvs add *&lt;br /&gt;
  cvs ci&lt;br /&gt;
&lt;br /&gt;
* Update files&lt;br /&gt;
  cd src/Utils &lt;br /&gt;
  &lt;br /&gt;
  cvs update&lt;br /&gt;
&lt;br /&gt;
* Check status&lt;br /&gt;
  cvs status -v | grep Status:&lt;br /&gt;
&lt;br /&gt;
* Tagging a file (tag names must start with a letter and may contain only letters, digits, - and _)&lt;br /&gt;
  cvs tag file_tag file&lt;br /&gt;
&lt;br /&gt;
* Downloading a tagged file&lt;br /&gt;
  cvs co -r file_tag file&lt;br /&gt;
&lt;br /&gt;
* Options&lt;br /&gt;
  -l : Local; run only in the current working directory, rather than recursing through subdirectories.&lt;br /&gt;
  -R : Recursive (default)&lt;br /&gt;
  -m message : log message, e.g. cvs ci -m &amp;quot;message&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* Keep track of the version in the code:&lt;br /&gt;
  Add the following line: my $VERSION = &#039;$Revision: 1.0 $ &#039;;&lt;br /&gt;
  The revision number is updated automatically on &amp;quot;cvs ci&amp;quot;; no need to &amp;quot;cvs update&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* View file&lt;br /&gt;
  http://amos.cvs.sourceforge.net/viewvc/*checkout*/amos/AMOS/src/Utils/seq2amos.pl&lt;br /&gt;
* View all files&lt;br /&gt;
  http://amos.cvs.sourceforge.net/viewvc/amos/&lt;br /&gt;
&lt;br /&gt;
== Yum ==&lt;br /&gt;
  yum command [options]&lt;br /&gt;
&lt;br /&gt;
* Examples:&lt;br /&gt;
  yum -y list installed                                 # list installed packages&lt;br /&gt;
  yum -y list installed  | grep flash                   # check if flash is installed&lt;br /&gt;
  sudo yum remove flash-plugin.i386                     # remove flash&lt;br /&gt;
  sudo yum -y install flashplayer-square-plugin.x86_64  # install flashplayer-square&lt;br /&gt;
&lt;br /&gt;
== Git ==&lt;br /&gt;
&lt;br /&gt;
* yum install git-core&lt;br /&gt;
&lt;br /&gt;
=  Build =&lt;br /&gt;
* [http://www.gnu.org/software/autoconf/ autoconf] extensible package of M4 macros that produce shell scripts to automatically configure software source code packages&lt;br /&gt;
* [http://www.gnu.org/software/m4/m4.html m4] [http://www.gnu.org/software/m4/manual/m4.pdf man] implementation of the traditional Unix macro processor&lt;br /&gt;
* [http://www.gnu.org/software/libtool/libtool.html libtool] generic library support script. Libtool hides the complexity of using shared libraries behind a consistent, portable interface.&lt;br /&gt;
* [http://developer.gnome.org/doc/GGAD/build-app.html Application Build]&lt;br /&gt;
&lt;br /&gt;
* Example: copy executables to /fs/szdevel/dpuiu/; libs to ...&lt;br /&gt;
  ./configure --prefix=/fs/szdevel/dpuiu/ --libdir=/fs/szdevel/dpuiu/lib &amp;gt; configure.log &lt;br /&gt;
  make                                                                   &amp;gt; make.log&lt;br /&gt;
  make install                                                           &amp;gt; make.install.log&lt;br /&gt;
&lt;br /&gt;
* Example: explicit compiler location&lt;br /&gt;
  ./configure CC=/usr/bin/gcc CXX=/usr/bin/g++&lt;br /&gt;
&lt;br /&gt;
* Example: build for debugging (-g) and gprof profiling (-pg)&lt;br /&gt;
  ./configure CFLAGS=&amp;quot;-g -pg&amp;quot; CXXFLAGS=&amp;quot;-g -pg&amp;quot;&lt;br /&gt;
  ...&lt;br /&gt;
&lt;br /&gt;
= Redirect =&lt;br /&gt;
&lt;br /&gt;
* tcsh : STDOUT &amp;amp; STDERR separately&lt;br /&gt;
  (command &amp;gt;stdout_file ) &amp;gt;&amp;amp;stderr_file&lt;br /&gt;
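In sh/bash the same split is written `command &amp;gt;stdout_file 2&amp;gt;stderr_file`. From a Python driver script, you can pass two open files to subprocess.run. The following is a minimal sketch; run_split_output is a hypothetical helper.&lt;br /&gt;

```python
import subprocess
import sys


def run_split_output(cmd, stdout_file, stderr_file):
    """Run cmd (a list of arguments) with STDOUT and STDERR sent to separate
    files; return the exit status."""
    with open(stdout_file, "w") as out, open(stderr_file, "w") as err:
        return subprocess.run(cmd, stdout=out, stderr=err).returncode


if __name__ == "__main__":
    code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
    status = run_split_output([sys.executable, "-c", code], "stdout_file", "stderr_file")
    print("exit status:", status)
```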
&lt;br /&gt;
= Config =&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Gconf-editor gconf-editor]&lt;br /&gt;
&lt;br /&gt;
= Commands =&lt;br /&gt;
* ftp&lt;br /&gt;
  ftp -i   : turns interactive prompting off (no Y/N question for each file in mget/mput)&lt;br /&gt;
  &amp;gt; prompt : toggles interactive prompting (turns it off if it was on)&lt;br /&gt;
* diff&lt;br /&gt;
  diff -rq dir1 dir2 : recursively compares 2 directories&lt;br /&gt;
* sort by multiple columns&lt;br /&gt;
  # column 2: alphabetic, column 3: numeric&lt;br /&gt;
  cat prefix.posmap | msort -k2 -kn3&lt;br /&gt;
&lt;br /&gt;
  cat prefix.sam| msort -k3 -kn4&lt;br /&gt;
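The same two-key ordering can also be done in Python for files small enough to sort in memory: text on one column, with ties broken numerically on another. The following is a minimal sketch. sort_by_columns is a hypothetical helper, and msort is assumed to behave as the comment above describes. Text comparison here uses plain code-point order, like sort with LC_ALL=C.&lt;br /&gt;

```python
def sort_by_columns(lines, text_col=2, num_col=3):
    """Sort whitespace-separated records by column `text_col` as text, then by
    column `num_col` as a number (columns are 1-based, as in `sort -k`)."""
    def key(line):
        fields = line.split()
        return fields[text_col - 1], float(fields[num_col - 1])

    return sorted((line for line in lines if line.strip()), key=key)


if __name__ == "__main__":
    posmap = ["read1 chr2 100", "read2 chr1 20", "read3 chr1 3"]
    for line in sort_by_columns(posmap):  # column 2 as text, column 3 as number
        print(line)
    # read3 chr1 3
    # read2 chr1 20
    # read1 chr2 100
```
For SAM input, call sort_by_columns(lines, 3, 4). Drop the @ header lines first, because their fields cannot be parsed as numbers.&lt;br /&gt;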
&lt;br /&gt;
= Utils =&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Coreutils Coreutils]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Binutils Binutils]&lt;br /&gt;
  strip executable   # removes symbols/debugging info in place; strip -o executable_new executable writes a stripped copy instead&lt;br /&gt;
  &lt;br /&gt;
  g++ -pg prefix.cc -o prefix   # -pg: profiling flag&lt;br /&gt;
  ./prefix                      # writes gmon.out&lt;br /&gt;
  gprof prefix &amp;gt; prefix.out&lt;br /&gt;
  /nfshomes/dpuiu/szdevel/bin/gprof2dot.py prefix.out &amp;gt; prefix.dot &lt;br /&gt;
  dot -Tpng -o prefix.png prefix.dot&lt;br /&gt;
&lt;br /&gt;
== Convert profiling to dot format ==&lt;br /&gt;
* http://linux.softpedia.com/progDownload/gprof2dot-py-Download-27166.html&lt;br /&gt;
* http://jrfonseca.googlecode.com/svn/trunk/gprof2dot/gprof2dot.py&lt;br /&gt;
&lt;br /&gt;
== Synchronize directories ==&lt;br /&gt;
&lt;br /&gt;
* Example: synchronize FASTA files&lt;br /&gt;
 cd dir1&lt;br /&gt;
 rsync *fasta ../dir2&lt;br /&gt;
&lt;br /&gt;
= System info =&lt;br /&gt;
&lt;br /&gt;
* [http://pagesperso-orange.fr/sebastien.godard/ Sysstat package]&lt;br /&gt;
* Processors&lt;br /&gt;
  /usr/sbin/x86info&lt;br /&gt;
  cat /proc/cpuinfo&lt;br /&gt;
&lt;br /&gt;
* sycamore (8 processors):&lt;br /&gt;
  cat /proc/cpuinfo | more&lt;br /&gt;
  vendor_id       : AuthenticAMD&lt;br /&gt;
  cpu family      : 15&lt;br /&gt;
  model           : 33&lt;br /&gt;
  model name      : Dual Core AMD Opteron(tm) Processor 875&lt;br /&gt;
  stepping        : 0&lt;br /&gt;
  cpu MHz         : 1793.260&lt;br /&gt;
  cache size      : 1024 KB&lt;br /&gt;
  physical id     : 0&lt;br /&gt;
  siblings        : 2&lt;br /&gt;
  core id         : 0&lt;br /&gt;
  cpu cores       : 2&lt;br /&gt;
&lt;br /&gt;
* walnut (16 processors):&lt;br /&gt;
  vendor_id       : AuthenticAMD&lt;br /&gt;
  cpu family      : 15&lt;br /&gt;
  model           : 65&lt;br /&gt;
  model name      : Dual-Core AMD Opteron(tm) Processor 8220&lt;br /&gt;
  stepping        : 3&lt;br /&gt;
  cpu MHz         : 2792.923&lt;br /&gt;
  cache size      : 1024 KB&lt;br /&gt;
  physical id     : 2&lt;br /&gt;
  siblings        : 2&lt;br /&gt;
  core id         : 0&lt;br /&gt;
  cpu cores       : 2&lt;br /&gt;
&lt;br /&gt;
* ginkgo (32 processors):&lt;br /&gt;
  vendor_id       : AuthenticAMD&lt;br /&gt;
  cpu family      : 16&lt;br /&gt;
  model           : 2&lt;br /&gt;
  model name      : Quad-Core AMD Opteron(tm) Processor 8356&lt;br /&gt;
  stepping        : 3&lt;br /&gt;
  cpu MHz         : 2293.905&lt;br /&gt;
  cache size      : 512 KB&lt;br /&gt;
  physical id     : 1&lt;br /&gt;
  siblings        : 4&lt;br /&gt;
  core id         : 0&lt;br /&gt;
  cpu cores       : 4&lt;br /&gt;
&lt;br /&gt;
* Memory&lt;br /&gt;
  cat /proc/meminfo&lt;br /&gt;
  free -mt&lt;br /&gt;
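The cpuinfo fields shown above are enough to count logical processors, physical chips and cores. The following is a minimal Python sketch; summarize_cpuinfo is a hypothetical helper. It assumes the Linux /proc/cpuinfo layout, where each processor block lists physical id before core id.&lt;br /&gt;

```python
import os


def summarize_cpuinfo(text):
    """Count logical processors, physical packages (chips) and distinct cores."""
    logical = 0
    packages = set()
    cores = set()
    physical_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
            packages.add(value)
        elif key == "core id":
            cores.add((physical_id, value))  # core ids repeat on each chip
    return {"logical": logical, "packages": len(packages), "cores": len(cores)}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(summarize_cpuinfo(f.read()))
```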
&lt;br /&gt;
= Gnuplot =&lt;br /&gt;
&lt;br /&gt;
Scripts for automatic generation:&lt;br /&gt;
* ~/bin/len-draw.pl prefix.len : draws a histogram of chromosome lengths&lt;br /&gt;
* ~/bin/map-draw.pl prefix.map : draws a synteny map&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
  perl ~/bin/len-draw.pl turkey.len | gnuplot ; display turkey.png&lt;br /&gt;
&lt;br /&gt;
= Image manipulation =&lt;br /&gt;
&lt;br /&gt;
* [http://www.imagemagick.org/script/command-line-tools.php ImageMagick]&lt;br /&gt;
* Merge:&lt;br /&gt;
  montage *a.jpg montage.jpg&lt;br /&gt;
* Get attributes&lt;br /&gt;
  identify prefix.jpg&lt;br /&gt;
&lt;br /&gt;
= Audio/Video manipulation =&lt;br /&gt;
* [http://www.catswhocode.com/blog/19-ffmpeg-commands-for-all-needs 19 ffmpeg commands for all needs]&lt;br /&gt;
* [http://www.ffmpeg.org/index.html FFMpeg] &lt;br /&gt;
  ffmpeg -i movieSample.avi -ab 256k -vn audioSample.mp3   # extract the audio track (-vn: no video) at 256 kbit/s&lt;br /&gt;
&lt;br /&gt;
* Youtube download:&lt;br /&gt;
  youtube-dl &amp;quot;url&amp;quot;&lt;br /&gt;
&lt;br /&gt;
= Pretty printing =&lt;br /&gt;
&lt;br /&gt;
* enscript&lt;br /&gt;
  enscript -G -r -B -f &amp;quot;Courier8&amp;quot; -p file.ps file.txt&lt;br /&gt;
  ps2pdf file.ps file.pdf&lt;br /&gt;
  lpr file.pdf&lt;br /&gt;
&lt;br /&gt;
= Quick calculations =&lt;br /&gt;
* bc&lt;br /&gt;
 echo 99 \* 8 | bc     # 792&lt;br /&gt;
 echo 99 \&amp;lt; 8 | bc  # 0 (relational operators print 1 if true, 0 if false)&lt;br /&gt;
&lt;br /&gt;
= Commands =&lt;br /&gt;
== alias ==&lt;br /&gt;
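* With arguments (single quotes delay `date` until the alias is used; +%Y%m%d keeps spaces out of the file name):&lt;br /&gt;
  alias backup &#039;cp -i \!:1 \!:1.`date +%Y%m%d`&#039;&lt;br /&gt;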
&lt;br /&gt;
== mkfifo ==&lt;br /&gt;
  mkfifo fifo&lt;br /&gt;
  cat   fifo &amp;amp;&lt;br /&gt;
  ls &amp;gt;&amp;gt; fifo&lt;br /&gt;
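Python's os.mkfifo (POSIX only) supports the same producer/consumer pattern. In the sketch below, a reader thread plays the role of the background cat, and the main thread writes like ls. fifo_demo is a hypothetical helper.&lt;br /&gt;

```python
import os
import tempfile
import threading


def fifo_demo(lines):
    """Send `lines` through a named pipe and return what the reader received."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "fifo")
    os.mkfifo(path)                       # mkfifo fifo
    received = []

    def reader():                         # like `cat fifo` running in the background
        with open(path) as f:             # blocks until a writer opens the pipe
            for line in f:                # iteration ends at EOF, when the writer closes
                received.append(line.rstrip("\n"))

    t = threading.Thread(target=reader)
    t.start()
    with open(path, "w") as f:            # ls >> fifo (">" and ">>" act the same on a pipe)
        for line in lines:
            f.write(line + "\n")
    t.join()
    os.remove(path)
    os.rmdir(tmpdir)
    return received


if __name__ == "__main__":
    print(fifo_demo(sorted(os.listdir("."))))
```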
&lt;br /&gt;
==chgrp ==&lt;br /&gt;
&lt;br /&gt;
* Make all new files created under /fs/www-umiacs-users/dpuiu belong to the cbcb group (setgid bit on the directory)&lt;br /&gt;
  chgrp cbcb /fs/www-umiacs-users/dpuiu&lt;br /&gt;
  chmod 775 /fs/www-umiacs-users/dpuiu&lt;br /&gt;
  chmod g+s /fs/www-umiacs-users/dpuiu&lt;br /&gt;
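The same setup can be scripted in Python on a POSIX system. The following is a minimal sketch; make_shared_dir is a hypothetical helper. Passing a group (e.g. cbcb) only works if you belong to that group.&lt;br /&gt;

```python
import os
import shutil
import stat
import tempfile


def make_shared_dir(path, group=None, mode=0o775):
    """Like `chgrp GROUP dir; chmod 775 dir; chmod g+s dir`: with the setgid bit
    set, files created in `path` inherit the directory's group."""
    if group is not None:
        shutil.chown(path, group=group)
    os.chmod(path, mode | stat.S_ISGID)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    print(make_shared_dir(tempfile.mkdtemp()))
```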
&lt;br /&gt;
== time ==&lt;br /&gt;
&lt;br /&gt;
* Format&lt;br /&gt;
  Time                                Memory                    IO&lt;br /&gt;
  %Uuser %Ssystem %Eelapsed  %PCPU    (%Xtext+%Ddata %Mmax)k    %Iinputs+%Ooutputs (%Fmajor+%Rminor)pagefaults %Wswaps&lt;br /&gt;
 &lt;br /&gt;
  %U     Total number of CPU-seconds that the process spent in user mode.&lt;br /&gt;
  %S     Total number of CPU-seconds that the process spent in kernel mode.&lt;br /&gt;
  %E     Elapsed real time (in [hours:]minutes:seconds).&lt;br /&gt;
 &lt;br /&gt;
  %P     Percentage of the CPU that this job got, computed as (%U + %S) / %E.&lt;br /&gt;
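Here is a worked example of the %P formula as a small Python sketch; the helper names are hypothetical. %P can exceed 100% for multi-threaded jobs, because user and system CPU time are summed over all threads.&lt;br /&gt;

```python
def elapsed_seconds(e):
    """Convert a %E value ([hours:]minutes:seconds) to seconds."""
    seconds = 0.0
    for part in e.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_percent(user, system, elapsed):
    """%P = (%U + %S) / %E, expressed as a percentage."""
    return 100.0 * (user + system) / elapsed_seconds(elapsed)


if __name__ == "__main__":
    print(elapsed_seconds("1:02:03"))         # 3723.0
    print(cpu_percent(15.0, 5.0, "0:10.00"))  # 200.0
```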
&lt;br /&gt;
= Articles =&lt;br /&gt;
* [http://www.cyberciti.biz/tips/top-linux-monitoring-tools.html Most useful sysadmin tools]&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8914</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8914"/>
		<updated>2011-08-11T21:35:41Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N question for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
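The n50 column is the usual N50 statistic. The following is a minimal Python sketch. It assumes the standard definition, computed over the listed sequences themselves. The script that produced these tables is not shown and may differ in details.&lt;br /&gt;

```python
def n50(lengths):
    """Largest length L such that sequences of length >= L together make up
    at least half of the total length (the `sum` column)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # 5: 6 + 5 = 11 is at least half of 20
```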
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
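The coverage estimate in the total row is read bases divided by genome size. The Python check below uses the per-lane mito counts from the table, with the 100bp read length and ~400Kbp mito genome assumed in the comment.&lt;br /&gt;

```python
def fold_coverage(n_reads, read_len, genome_len):
    """Expected coverage = total read bases / genome length."""
    return n_reads * read_len / genome_len


mito_reads = 12715 + 12291 + 30839 + 29444  # #mito column, four read files
print(mito_reads)                                         # 85289
print(round(fold_coverage(mito_reads, 100, 400_000), 1))  # 21.3
```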
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem        min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13,747,338  100    100    100    100    9,185      108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28,934      100    111    136    426    23,376     378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74,820      100    105    125    390    31,673     320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         264,547     100    108    123    169    32,435     228.49     0          60,445,493&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7,859*      100    113    139    284    43,079     331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3,626       100    113    139    255    43,131*    339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211,820     100    143    156*   187    23,273     227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61,152      100    121    151    200    30,846     286.05     0          17,492,450&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem        min  q1   q2    q3    max    mean     n50  sum           readOnContig&lt;br /&gt;
  scf             74,820      100  105  125   390   31,673 320.75   0    23,998,536&lt;br /&gt;
  ctg             5,755,282   32   32   35    43    7,195  41.63    0    239,620,204   33,083,609(40%)&lt;br /&gt;
  edge            11,015,468  1    2    4     11    7,164  8.75     0    96,380,983&lt;br /&gt;
  reads           82,283,738                                             6,006,712,874&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             74,820    100  105  125   390   31,673  320.75   0    23,998,536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767     191.56   0    39,462       # VERY BAD&lt;br /&gt;
  cBAC            10,533    100  113  143   428   26,589  477.68   0    5,031,439&lt;br /&gt;
  mito            83        105  448  1730  6851  26,364  4315.20  0    358,162&lt;br /&gt;
  other           63,998    100  104  122   382   31,673  290.16   0    18,569,473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31,673  9662.07  0    434,793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum          readOnContig&lt;br /&gt;
  scf             7,859     100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  ctg             200,062   32   33   37    47    10,392  48.52    .    9,707,307    19,002,331(23%)&lt;br /&gt;
  reads           82,283,738&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             7,859*    100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43,079  5951.05  0    119,021      # MUCH BETTER&lt;br /&gt;
  cBAC            5,117     100  114  141   320   13,733  334.94   0    1,713,870&lt;br /&gt;
  mito            8         101  134  685   1396  2,166   749.75   0    5,998        # VERY BAD&lt;br /&gt;
  other           2,714     100  111  133   226   7,353   282.35   0    766,295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
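The note on the total row can be checked by summing the listed lengths in Python (values copied from the table above):&lt;br /&gt;

```python
chromosome_lengths = {
    "2L": 23_011_544, "2R": 21_146_708, "3L": 24_543_557, "3R": 27_905_053,
    "4": 1_351_857, "X": 22_422_827, "un": 10_049_037, "mitochondrion": 19_517,
}
total = sum(chromosome_lengths.values())
print(f"{total:,}")                # 130,450,100
print(f"{137_586_636 - total:,}")  # 7,136,536 bp not accounted for by the rows
```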
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
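The overlap estimate in the email can be reproduced with a short Python sketch; the helper names are hypothetical. It assumes insert sizes are roughly normally distributed (mean 343bp, sd 30bp). It uses the 160bp + 156bp read lengths from the FC70M6V_6_001 row.&lt;br /&gt;

```python
import math


def pair_overlap(insert_len, read1_len, read2_len):
    """Bases shared by the two reads of a pair (positive means they overlap)."""
    return read1_len + read2_len - insert_len


def fraction_below(x, mean, sd):
    """P(insert size below x) under a normal model."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


if __name__ == "__main__":
    print(pair_overlap(313, 160, 156))            # 3
    print(round(fraction_below(313, 343, 30), 3))  # 0.159, i.e. ~15-16%
```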
&lt;br /&gt;
* About 8.5% of the reads contain a high-copy k-mer (listed below); they don&#039;t get assembled&lt;br /&gt;
 AAAGAGTGTAGATCTCGGTGGT&lt;br /&gt;
 AACTCCAGTCACTTAGGCATCT&lt;br /&gt;
 AAGACGGCATACGAGATGCCTA&lt;br /&gt;
 AAGAGTGTAGATCTCGGTGGTC&lt;br /&gt;
 AAGATCGGAAGAGCGTCGTGTA&lt;br /&gt;
 AAGCAGAAGACGGCATACGAGA&lt;br /&gt;
 AAGTGACTGGAGTTCAGACGTG&lt;br /&gt;
 ACACGTCTGAACTCCAGTCACT&lt;br /&gt;
 ACCGAGATCTACACTCTTTCCC&lt;br /&gt;
 ACGAGATGCCTAAGTGACTGGA&lt;br /&gt;
 ACGGCATACGAGATGCCTAAGT&lt;br /&gt;
 ACGGCGACCACCGAGATCTACA&lt;br /&gt;
 ACGTCTGAACTCCAGTCACTTA&lt;br /&gt;
 ACTCCAGTCACTTAGGCATCTC&lt;br /&gt;
 AGAAGACGGCATACGAGATGCC&lt;br /&gt;
 AGACGGCATACGAGATGCCTAA&lt;br /&gt;
 AGACGTGTGCTCTTCCGATCTA&lt;br /&gt;
 AGAGTGTAGATCTCGGTGGTCG&lt;br /&gt;
 AGATCGGAAGAGCACACGTCTG&lt;br /&gt;
 AGATCGGAAGAGCGTCGTGTAG&lt;br /&gt;
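One simple way to measure the fraction quoted above is to slide a window of the k-mer length over each read and look each window up in a set. The following is a minimal Python sketch; fraction_with_kmer is a hypothetical helper. It assumes all k-mers have the same length (22 here) and checks the forward strand only. Several of the listed k-mers contain AGATCGGAAGAGC, the start of the standard Illumina adapter, so these reads likely carry adapter read-through.&lt;br /&gt;

```python
def fraction_with_kmer(reads, kmers):
    """Fraction of reads containing at least one of the given equal-length
    k-mers (forward strand only)."""
    kmer_set = set(kmers)
    k = len(next(iter(kmer_set)))
    hits = 0
    for read in reads:
        if any(read[i:i + k] in kmer_set for i in range(len(read) - k + 1)):
            hits += 1
    return hits / len(reads) if reads else 0.0


if __name__ == "__main__":
    kmers = ["AGATCGGAAGAGCACACGTCTG", "AGATCGGAAGAGCGTCGTGTAG"]
    reads = ["TTTT" + kmers[0] + "GG", "ACGTACGTACGTACGTACGTACGTACGT"]
    print(fraction_with_kmer(reads, kmers))  # 0.5
```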
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum               readOnContig&lt;br /&gt;
  scf             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  ctg             802,463   32   33   39    63     73,415   91.13     0    73,131,767        37,254,577&lt;br /&gt;
  edge            1,013,801 1    2    7     32     30,919   48.85     0    49,525,815&lt;br /&gt;
  reads           47,092,950                                               7,440,686,100&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  cE_coli         149       100  325  6612  41908  291,000  30160.59  0    4,493,928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58        105  166  374   1950   24,932   1875.86   0    108,800&lt;br /&gt;
  cBAC            12,294    100  141  785   4204   45,781   3513.34   0    43,192,987&lt;br /&gt;
  other           7953      100  113  171   599    41,416   619.60    0    4,927,664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum               readOnContig&lt;br /&gt;
  scf             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  ctg             265,450   32   34   50    121    49,599   143.69    0    38,141,459        40,191,864(85%)&lt;br /&gt;
  edge            530,926   1    3    11    40     41,918   63.06     0    33,477,999&lt;br /&gt;
  reads           47,092,950                                               7,440,686,100&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  cE_coli         205       100  252  2244  30571  239,672  21916.78  0    4,492,939&lt;br /&gt;
  cpFosDT5_2      17        100  118  171   272    855      275.24    0    4,679&lt;br /&gt;
  cChloroplast    31        100  130  322   1363   5,717    986.52    0    30,582&lt;br /&gt;
  cBAC            15,668    100  133  336   1529   33,075   1559.92   0    24,440,863&lt;br /&gt;
  other           9,574     100  117  171   522    27,341   542.74    0    5,196,233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8913</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8913"/>
		<updated>2011-08-11T20:25:36Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo&amp;#039;s */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off interactive Y/N prompting so mget fetches all files&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
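A minimal Python sketch of how the elem/min/q1/q2/q3/max/mean/n50/sum columns used throughout this page could be computed from a list of sequence lengths. The quartile rule (statistics.quantiles, inclusive method) and the N50 convention (the length at which the longest sequences first cover half the total) are assumptions; the script that produced these tables is not shown here.&lt;br /&gt;

```python
import statistics

def length_stats(lengths):
    """Summarize sequence lengths in the column layout used on this page.

    Assumption: q1/q2/q3 come from statistics.quantiles with the
    "inclusive" method; the original tables may use another rule.
    """
    ordered = sorted(lengths)
    total = sum(ordered)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    # N50: walk from the longest sequence down until half the total is covered
    running = 0
    n50 = 0
    for length in reversed(ordered):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "elem": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": ordered[-1],
        "mean": total / len(ordered),
        "n50": n50,
        "sum": total,
    }

s = length_stats([100, 200, 300, 400, 1000])
print(s)
```

For the toy input above the longest sequence alone covers half of the 2,000 bp total, so its N50 is 1000.&lt;br /&gt;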
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher A/G count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
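A hedged sketch of how per-position N rates like the ones above could be measured. The input is a hypothetical list of read sequences (for example, already parsed from FASTQ), and the 0.005 cutoff corresponds to the 0.5% level noted above.&lt;br /&gt;

```python
from collections import Counter

def n_fraction_by_position(reads):
    """Fraction of reads carrying an N at each 1-based position.

    The denominator at each position is the number of reads long enough
    to reach it, so trimmed or variable-length reads are handled.
    """
    n_counts = Counter()
    depth = Counter()
    for seq in reads:
        for pos, base in enumerate(seq.upper(), start=1):
            depth[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return {pos: n_counts[pos] / depth[pos] for pos in sorted(depth)}

# Toy reads (hypothetical); flag positions above the 0.5% level
fractions = n_fraction_by_position(["ACGN", "ACNN", "ACGT", "NCGT"])
flagged = {pos: f for pos, f in fractions.items() if f > 0.005}
print(flagged)
```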
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
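GC% is assumed here to be (G+C)/(A+C+G+T), with N and other ambiguity codes left out of the denominator; the page does not say how Ns were handled, so this is a sketch rather than the exact calculation behind the values above.&lt;br /&gt;

```python
def gc_percent(seq):
    """GC content in percent, ignoring N and other ambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0

print(round(gc_percent("ACGTNNGC"), 2))
```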
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
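The mito total and the ~21X estimate can be reproduced as below, assuming (as the note does) a 100bp effective read length and a ~400 kb mitochondrial genome; the same percent helper gives the chloroplast share for one read file.&lt;br /&gt;

```python
def percent(part, whole):
    return 100.0 * part / whole

def coverage(n_reads, read_len, genome_len):
    """Expected fold coverage: total read bases divided by genome length."""
    return n_reads * read_len / genome_len

mito_reads = [12715, 12291, 30839, 29444]  # per read file, from the table above
total = sum(mito_reads)
print(total)                                     # 85289
print(round(coverage(total, 100, 400_000), 1))   # 21.3
print(round(percent(468309, 22729231), 2))       # 2.06 (cChloroplast, FC638TR_001_8_1)
```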
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem        min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13,747,338  100    100    100    100    9,185      108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28,934      100    111    136    426    23,376     378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74,820      100    105    125    390    31,673     320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         264,547     100    108    123    169    32,435     228.49     0          60,445,493&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7,859*      100    113    139    284    43,079     331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3,626       100    113    139    255    43,131*    339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211,820     100    143    156*   187    23,273     227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61,152      100    121    151    200    30,846     286.05     0          17,492,450&lt;br /&gt;
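A sketch of one way to set up these runs. It assumes the SOAPdenovo 1.x config-file layout (max_rd_len plus a [LIB] section with avg_ins, reverse_seq, asm_flags, rank, q1/q2), in which the maximum read length is set in the config file; the FASTQ and config file names are hypothetical, and avg_ins=400 is taken from the FC638TR reads table.&lt;br /&gt;

```python
def soap_config(max_rd_len, avg_ins, q1, q2):
    """Build a minimal SOAPdenovo config (layout assumed from SOAPdenovo 1.x examples)."""
    return "\n".join([
        f"max_rd_len={max_rd_len}",
        "[LIB]",
        f"avg_ins={avg_ins}",
        "reverse_seq=0",   # paired-end (not mate-pair) library
        "asm_flags=3",     # use reads for both contigging and scaffolding
        "rank=1",
        f"q1={q1}",
        f"q2={q2}",
    ]) + "\n"

def soap_command(config_path, k, d, out_prefix):
    # -d is the k-mer frequency cutoff varied in the table above (0, 2, 20, 48)
    return ["SOAPdenovo-31mer", "all", "-s", config_path,
            "-K", str(k), "-d", str(d), "-o", out_prefix]

# Hypothetical file names
cfg = soap_config(100, 400, "FC638TR_001_8_1.fq", "FC638TR_001_8_2.fq")
cmd = soap_command("pine.config", 31, 2, "K31d2")
print(cfg)
print(" ".join(cmd))
```

Varying only max_rd_len in the config and -K/-d on the command line would reproduce the grid of runs in the table.&lt;br /&gt;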
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem        min  q1   q2    q3    max    mean     n50  sum           readOnContig&lt;br /&gt;
  scf             74,820      100  105  125   390   31,673 320.75   0    23,998,536&lt;br /&gt;
  ctg             5,755,282   32   32   35    43    7,195  41.63    0    239,620,204   33,083,609(40%)&lt;br /&gt;
  edge            11,015,468  1    2    4     11    7,164  8.75     0    96,380,983&lt;br /&gt;
  reads           82,283,738                                             6,006,712,874&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             74,820    100  105  125   390   31,673  320.75   0    23,998,536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767     191.56   0    39,462       # VERY BAD&lt;br /&gt;
  cBAC            10,533    100  113  143   428   26,589  477.68   0    5,031,439&lt;br /&gt;
  mito            83        105  448  1730  6851  26,364  4315.20  0    358,162&lt;br /&gt;
  other           63,998    100  104  122   382   31,673  290.16   0    18,569,473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31,673  9662.07  0    434,793&lt;br /&gt;
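As a consistency check, the four categories above (other.long.hiGC is a subset of other and is left out) partition the all row exactly, and readOnContig corresponds to about 40% of the reads. Numbers are copied from the tables in this section.&lt;br /&gt;

```python
# Per-category scaffold counts and summed lengths for -K 31 -d 2
categories = {
    "cChloroplast": (206, 39_462),
    "cBAC": (10_533, 5_031_439),
    "mito": (83, 358_162),
    "other": (63_998, 18_569_473),
}
all_elem, all_sum = 74_820, 23_998_536

elem_total = sum(n for n, _ in categories.values())
len_total = sum(s for _, s in categories.values())
print(elem_total == all_elem, len_total == all_sum)  # True True

# readOnContig as a share of all reads
read_share = 100 * 33_083_609 / 82_283_738
print(round(read_share, 1))  # 40.2
```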
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum          readOnContig&lt;br /&gt;
  scf             7,859     100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  ctg             200,062   32   33   37    47    10,392  48.52    .    9,707,307    19,002,331(23%)&lt;br /&gt;
  reads           82,283,738&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             7,859*    100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43,079  5951.05  0    119,021      # MUCH BETTER&lt;br /&gt;
  cBAC            5,117     100  114  141   320   13,733  334.94   0    1,713,870&lt;br /&gt;
  mito            8         101  134  685   1396  2,166   749.75   0    5,998        # VERY BAD&lt;br /&gt;
  other           2,714     100  111  133   226   7,353   282.35   0    766,295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # the listed lengths (including un and mitochondrion) sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
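The corrected total can be checked by summing the Drosophila table (lengths in bp, copied from above):&lt;br /&gt;

```python
lengths = {
    "2L": 23_011_544, "2R": 21_146_708, "3L": 24_543_557, "3R": 27_905_053,
    "4": 1_351_857, "X": 22_422_827, "un": 10_049_037, "mitochondrion": 19_517,
}
total = sum(lengths.values())
print(f"{total:,}")  # 130,450,100
```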
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
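One plausible reading of the %merged trend: with 160bp and 156bp mates, the expected overlap is readLen1 + readLen2 - insert, so it shrinks from 74bp at 242bp inserts to 28bp at 288bp and turns negative (no overlap) at the ~343bp fosmid-pool insert. This simple sketch ignores insert-size spread and read trimming.&lt;br /&gt;

```python
def expected_overlap(read1_len, read2_len, insert_size):
    """Bases shared by the two mates when their combined length exceeds the insert.

    A negative value means the mates do not reach each other and cannot be merged.
    """
    return read1_len + read2_len - insert_size

for insert in (242, 254, 270, 288, 343):
    print(insert, expected_overlap(160, 156, insert))
```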
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
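The ~15% figure in the email follows if insert size is treated as normally distributed (an assumption) with mean 343 and sd 30: with 160 + 156 = 316bp of read sequence, inserts shorter than 313bp overlap by more than 3bp, and 313bp is exactly one standard deviation below the mean, which leaves about 15.9% of inserts in that tail.&lt;br /&gt;

```python
import math

def normal_cdf(x, mean, sd):
    """P(X at most x) for X ~ Normal(mean, sd), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

frac = normal_cdf(313, 343, 30)
print(round(100 * frac, 1))  # 15.9
```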
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum               readOnContig&lt;br /&gt;
  scf             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  ctg             802,463   32   33   39    63     73,415   91.13     0    73,131,767        37,254,577&lt;br /&gt;
  edge            1,013,801 1    2    7     32     30,919   48.85     0    49,525,815&lt;br /&gt;
  reads           47,092,950                                               7,440,686,100&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  cE_coli         149       100  325  6612  41908  291,000  30160.59  0    4,493,928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58        105  166  374   1950   24,932   1875.86   0    108,800&lt;br /&gt;
  cBAC            12,294    100  141  785   4204   45,781   3513.34   0    43,192,987&lt;br /&gt;
  other           7953      100  113  171   599    41,416   619.60    0    4,927,664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum               readOnContig&lt;br /&gt;
  scf             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  ctg             265,450   32   34   50    121    49,599   143.69    0    38,141,459        40,191,864(85%)&lt;br /&gt;
  edge            530,926   1    3    11    40     41,918   63.06     0    33,477,999&lt;br /&gt;
  reads           47,092,950                                               7,440,686,100&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  cE_coli         205       100  252  2244  30571  239,672  21916.78  0    4,492,939&lt;br /&gt;
  cpFosDT5_2      17        100  118  171   272    855      275.24    0    4,679&lt;br /&gt;
  cChloroplast    31        100  130  322   1363   5,717    986.52    0    30,582&lt;br /&gt;
  cBAC            15,668    100  133  336   1529   33,075   1559.92   0    24,440,863&lt;br /&gt;
  other           9,574     100  117  171   522    27,341   542.74    0    5,196,233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8912</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8912"/>
		<updated>2011-08-11T18:10:47Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* PineUpload070711 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off interactive Y/N prompting so mget fetches all files&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher A/G count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem        min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13,747,338  100    100    100    100    9,185      108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28,934      100    111    136    426    23,376     378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74,820      100    105    125    390    31,673     320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224,963     100    110    128    343    23,410     260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7,859*      100    113    139    284    43,079     331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3,626       100    113    139    255    43,131*    339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211,820     100    143    156*   187    23,273     227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61,152      100    121    151    200    30,846     286.05     0          17,492,450&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem        min  q1   q2    q3    max    mean     n50  sum           readOnContig&lt;br /&gt;
  scf             74,820      100  105  125   390   31,673 320.75   0    23,998,536&lt;br /&gt;
  ctg             5,755,282   32   32   35    43    7,195  41.63    0    239,620,204   33,083,609(40%)&lt;br /&gt;
  edge            11,015,468  1    2    4     11    7,164  8.75     0    96,380,983&lt;br /&gt;
  reads           82,283,738                                             6,006,712,874&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             74,820    100  105  125   390   31,673  320.75   0    23,998,536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767     191.56   0    39,462       # VERY BAD&lt;br /&gt;
  cBAC            10,533    100  113  143   428   26,589  477.68   0    5,031,439&lt;br /&gt;
  mito            83        105  448  1730  6851  26,364  4315.20  0    358,162&lt;br /&gt;
  other           63,998    100  104  122   382   31,673  290.16   0    18,569,473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31,673  9662.07  0    434,793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum          readOnContig&lt;br /&gt;
  scf             7,859     100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  ctg             200,062   32   33   37    47    10,392  48.52    .    9,707,307    19,002,331(23%)&lt;br /&gt;
  reads           82,283,738&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             7,859*    100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43,079  5951.05  0    119,021      # MUCH BETTER&lt;br /&gt;
  cBAC            5,117     100  114  141   320   13,733  334.94   0    1,713,870&lt;br /&gt;
  mito            8         101  134  685   1396  2,166   749.75   0    5,998        # VERY BAD&lt;br /&gt;
  other           2,714     100  111  133   226   7,353   282.35   0    766,295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # the listed lengths (including un and mitochondrion) sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled  TIL_XXX_FC70M6V_Y_00Z, are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitativly small, so we don&#039;t expect the extra expense of lowering the concentration will be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and  have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
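The 15% figure above rests on two assumptions that are not stated explicitly: a normal insert-size distribution, and the 160bp and 156bp read lengths from the table. An insert one standard deviation below the mean (343 - 30 = 313bp) is 3bp shorter than the combined read length of 316bp, so its mates overlap by 3bp, and under a normal model about 15.9% of inserts are shorter still. A minimal sketch of that arithmetic (the helper names are illustrative, not from any existing script):&lt;br /&gt;

```python
import math


def normal_cdf(x, mean, std):
    """Probability that a normal(mean, std) variable is at most x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def mate_overlap(insert_size, read_len_1, read_len_2):
    """Bases shared by the two mates of one fragment (0 if they do not meet)."""
    return max(0, read_len_1 + read_len_2 - insert_size)


MEAN, STD = 343, 30        # insert-size estimate quoted above (assumed normal)
READ1, READ2 = 160, 156    # read lengths of FC70M6V_6_001

threshold = MEAN - STD     # one standard deviation below the mean
frac_below = normal_cdf(threshold, MEAN, STD)
print(threshold, mate_overlap(threshold, READ1, READ2), round(100 * frac_below, 1))
# prints: 313 3 15.9
```

Real insert-size distributions are often skewed, so the normal model only gives a rough check of the quoted 15%.&lt;br /&gt;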
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum               readOnContig&lt;br /&gt;
  scf             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  ctg             802,463   32   33   39    63     73,415   91.13     0    73,131,767        37,254,577&lt;br /&gt;
  edge            1,013,801 1    2    7     32     30,919   48.85     0    49,525,815&lt;br /&gt;
  reads           47,092,950                                               7,440,686,100&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  cE_coli         149       100  325  6612  41908  291,000  30160.59  0    4,493,928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58        105  166  374   1950   24,932   1875.86   0    108,800&lt;br /&gt;
  cBAC            12,294    100  141  785   4204   45,781   3513.34   0    43,192,987&lt;br /&gt;
  other           7953      100  113  171   599    41,416   619.60    0    4,927,664&lt;br /&gt;
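The readOnContig column is an absolute read count, and for this -d 2 run no percentage is given. A minimal sketch of converting it to a share of all input reads, together with the mean read length implied by the reads row (sum divided by count); the helper name percent_of is hypothetical:&lt;br /&gt;

```python
def percent_of(part, total):
    """Share of `total` represented by `part`, in percent."""
    return 100.0 * part / total


READS = 47_092_950         # reads given to SOAPdenovo (reads row)
BASES = 7_440_686_100      # total read bases (reads row, sum column)

print(round(percent_of(37_254_577, READS), 1))  # -d 2 run: 79.1
print(round(percent_of(40_191_864, READS), 1))  # -d 20 run: 85.3
print(BASES / READS)                            # mean read length: 158.0
```

With these counts the -d 2 run places about 79% of the reads on contigs, versus the 85% reported for -d 20, and the mean read length of 158bp is the average of the 160bp and 156bp mates.&lt;br /&gt;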
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum               readOnContig&lt;br /&gt;
  scf             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  ctg             265,450   32   34   50    121    49,599   143.69    0    38,141,459        40,191,864(85%)&lt;br /&gt;
  edge            530,926   1    3    11    40     41,918   63.06     0    33,477,999&lt;br /&gt;
  reads           47,092,950                                               7,440,686,100&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  cE_coli         205       100  252  2244  30571  239,672  21916.78  0    4,492,939&lt;br /&gt;
  cpFosDT5_2      17        100  118  171   272    855      275.24    0    4,679&lt;br /&gt;
  cChloroplast    31        100  130  322   1363   5,717    986.52    0    30,582&lt;br /&gt;
  cBAC            15,668    100  133  336   1529   33,075   1559.92   0    24,440,863&lt;br /&gt;
  other           9,574     100  117  171   522    27,341   542.74    0    5,196,233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8911</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8911"/>
		<updated>2011-08-11T17:47:23Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* PineUpload070711 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
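The n50 column in these tables is presumably the standard N50: sort the sequences by length, longest first, and report the length at which the running total first reaches half of all bases. A minimal sketch of that definition, with a hypothetical helper name (the script that produced these tables may treat ties or empty input differently):&lt;br /&gt;

```python
def n50(lengths):
    """Length L such that sequences of length L or longer hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


print(n50([100, 200, 300, 400]))  # total 1000; 400 + 300 reaches 500, so prints 300
```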
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
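The GC% values compared here are the fraction of G and C among the sequenced bases. A minimal sketch, assuming ambiguous bases (N) are left out of the denominator; the original scripts may count them differently, and the helper name is hypothetical:&lt;br /&gt;

```python
def gc_percent(seq):
    """Percentage of G/C among A/C/G/T bases; other symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


print(gc_percent("ATGCNNGC"))  # 4 GC out of 6 called bases, about 66.7
```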
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
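The ~21X estimate in the comment is the number of mito reads times the read length, divided by the genome length. A minimal sketch of that arithmetic, using the per-lane mito counts from the table and the 100bp read length and ~400Kb mitochondrial genome size assumed in the comment:&lt;br /&gt;

```python
def coverage(num_reads, read_len, genome_len):
    """Expected depth: total sequenced bases divided by genome length."""
    return num_reads * read_len / genome_len


mito_reads = 12715 + 12291 + 30839 + 29444   # per-lane #mito counts
print(mito_reads)                                     # 85289
print(round(coverage(mito_reads, 100, 400_000), 1))   # 21.3
```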
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem        min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13,747,338  100    100    100    100    9,185      108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28,934      100    111    136    426    23,376     378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74,820      100    105    125    390    31,673     320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224,963     100    110    128    343    23,410     260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7,859*      100    113    139    284    43,079     331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3,626       100    113    139    255    43,131*    339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211,820     100    143    156*   187    23,273     227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61,152      100    121    151    200    30,846     286.05     0          17,492,450&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem        min  q1   q2    q3    max    mean     n50  sum           readOnContig&lt;br /&gt;
  scf             74,820      100  105  125   390   31,673 320.75   0    23,998,536&lt;br /&gt;
  ctg             5,755,282   32   32   35    43    7,195  41.63    0    239,620,204   33,083,609(40%)&lt;br /&gt;
  edge            11,015,468  1    2    4     11    7,164  8.75     0    96,380,983&lt;br /&gt;
  reads           82,283,738                                             6,006,712,874&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             74,820    100  105  125   390   31,673  320.75   0    23,998,536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767     191.56   0    39,462       # VERY BAD&lt;br /&gt;
  cBAC            10,533    100  113  143   428   26,589  477.68   0    5,031,439&lt;br /&gt;
  mito            83        105  448  1730  6851  26,364  4315.20  0    358,162&lt;br /&gt;
  other           63,998    100  104  122   382   31,673  290.16   0    18,569,473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31,673  9662.07  0    434,793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum          readOnContig&lt;br /&gt;
  scf             7,859     100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  ctg             200,062   32   33   37    47    10,392  48.52    .    9,707,307    19,002,331(23%)&lt;br /&gt;
  reads           82,283,738&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             7,859*    100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43,079  5951.05  0    119,021      # MUCH BETTER&lt;br /&gt;
  cBAC            5,117     100  114  141   320   13,733  334.94   0    1,713,870&lt;br /&gt;
  mito            8         101  134  685   1396  2,166   749.75   0    5,998        # VERY BAD&lt;br /&gt;
  other           2,714     100  111  133   226   7,353   282.35   0    766,295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # the listed lengths (including un and mitochondrion) sum to 130,450,100&lt;br /&gt;
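The note on the total can be checked by summing the per-sequence lengths in the table, including the un and mitochondrion rows. A minimal sketch; the roughly 7.1 Mb difference from the listed total is simply not accounted for by the rows shown:&lt;br /&gt;

```python
# Sequence lengths (bp) copied from the table above.
DMEL_LENGTHS = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}
LISTED_TOTAL = 137_586_636  # "total" row as listed

listed_sum = sum(DMEL_LENGTHS.values())
print(f"{listed_sum:,}")                 # 130,450,100
print(f"{LISTED_TOTAL - listed_sum:,}")  # 7,136,536 not covered by the listed rows
```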
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y (lanes 2 and 3) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp (one standard deviation below the mean) and therefore have &amp;gt; 3bp overlap between the 160bp and 156bp mates. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  scf             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  ctg             802,463   32   33   39    63     73,415   91.13     0    73,131,767&lt;br /&gt;
  edge            1,013,801 1    2    7     32     30,919   48.85     0    49,525,815&lt;br /&gt;
  reads           47,092,950&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             20,441    100  124  374   1980   291,000  2575.50   0    52,645,707&lt;br /&gt;
  cE_coli         149       100  325  6612  41908  291,000  30160.59  0    4,493,928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58        105  166  374   1950   24,932   1875.86   0    108,800&lt;br /&gt;
  cBAC            12,294    100  141  785   4204   45,781   3513.34   0    43,192,987&lt;br /&gt;
  other           7953      100  113  171   599    41,416   619.60    0    4,927,664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  scf             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  ctg             265,450   32   34   50    121    49,599   143.69    0    38,141,459&lt;br /&gt;
  edge            530,926   1    3    11    40     41,918   63.06     0    33,477,999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3     max      mean      n50  sum&lt;br /&gt;
  all             25,482    100  127  262   993    239,672  1339.89   0    34,143,040&lt;br /&gt;
  cE_coli         205       100  252  2244  30571  239,672  21916.78  0    4,492,939&lt;br /&gt;
  cpFosDT5_2      17        100  118  171   272    855      275.24    0    4,679&lt;br /&gt;
  cChloroplast    31        100  130  322   1363   5,717    986.52    0    30,582&lt;br /&gt;
  cBAC            15,668    100  133  336   1529   33,075   1559.92   0    24,440,863&lt;br /&gt;
  other           9,574     100  117  171   522    27,341   542.74    0    5,196,233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8910</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8910"/>
		<updated>2011-08-11T17:39:56Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
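Per-position N rates like these come from counting, at each read position, how many reads carry an N there. A minimal sketch over an in-memory list of read strings (FASTQ parsing is left out); positions are reported 1-based, which is an assumption about the convention used above, and both helper names are hypothetical:&lt;br /&gt;

```python
from collections import Counter


def n_rate_by_position(reads):
    """Percentage of reads with an N at each 1-based position (reads may differ in length)."""
    n_counts = Counter()
    covered = Counter()
    for read in reads:
        for pos, base in enumerate(read.upper(), start=1):
            covered[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return {pos: 100.0 * n_counts[pos] / covered[pos] for pos in sorted(covered)}


def positions_above(rates, threshold=0.5):
    """Positions whose N rate exceeds `threshold` percent (0.5% as in the note above)."""
    return [pos for pos, rate in rates.items() if rate > threshold]


rates = n_rate_by_position(["ACGN", "ACGT", "NCGT", "ACG"])
print(positions_above(rates, 30.0))  # position 4 has 1 N in 3 reads, so prints [4]
```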
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem        min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13,747,338  100    100    100    100    9,185      108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28,934      100    111    136    426    23,376     378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74,820      100    105    125    390    31,673     320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224,963     100    110    128    343    23,410     260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7,859*      100    113    139    284    43,079     331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3,626       100    113    139    255    43,131*    339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211,820     100    143    156*   187    23,273     227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61,152      100    121    151    200    30,846     286.05     0          17,492,450&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem        min  q1   q2    q3    max    mean     n50  sum           readOnContig&lt;br /&gt;
  scf             74,820      100  105  125   390   31,673 320.75   0    23,998,536&lt;br /&gt;
  ctg             5,755,282   32   32   35    43    7,195  41.63    0    239,620,204   33,083,609(40%)&lt;br /&gt;
  edge            11,015,468  1    2    4     11    7,164  8.75     0    96,380,983&lt;br /&gt;
  reads           82,283,738                                             6,006,712,874&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             74,820    100  105  125   390   31,673  320.75   0    23,998,536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767     191.56   0    39,462       # VERY BAD&lt;br /&gt;
  cBAC            10,533    100  113  143   428   26,589  477.68   0    5,031,439&lt;br /&gt;
  mito            83        105  448  1730  6851  26,364  4315.20  0    358,162&lt;br /&gt;
  other           63,998    100  104  122   382   31,673  290.16   0    18,569,473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31,673  9662.07  0    434,793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum          readOnContig&lt;br /&gt;
  scf             7,859     100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  ctg             200,062   32   33   37    47    10,392  48.52    .    9,707,307    19,002,331(23%)&lt;br /&gt;
  reads           82,283,738&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             7,859*    100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43,079  5951.05  0    119,021      # MUCH BETTER&lt;br /&gt;
  cBAC            5,117     100  114  141   320   13,733  334.94   0    1,713,870&lt;br /&gt;
  mito            8         101  134  685   1396  2,166   749.75   0    5,998        # VERY BAD&lt;br /&gt;
  other           2,714     100  111  133   226   7,353   282.35   0    766,295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
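The note on the total row can be checked directly. This is a minimal sketch: the lengths are copied from the table and the variable names are illustrative. Summing all eight listed sequences, including un and the mitochondrion, gives 130,450,100 bp, about 7.1 Mb short of the 137,586,636 bp in the total row.&lt;br /&gt;

```python
# Lengths (bp) copied from the Drosophila refseq table above
drosophila_lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}

total = sum(drosophila_lengths.values())
reported_total = 137_586_636  # value in the "total" row

print(f"sum of listed sequences: {total:,}")                  # 130,450,100
print(f"gap to reported total:   {reported_total - total:,}")  # 7,136,536
```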
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
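The percentages in this table are hit counts divided by the reads per file (23,546,475). The values appear to be truncated rather than rounded to two decimals. For example, 7,739,576 / 23,546,475 = 32.869%, which is listed as 32.86%. The helper below therefore truncates. This is a minimal sketch, and the dictionary layout and function name are illustrative only.&lt;br /&gt;

```python
# Hit counts copied from the FC70M6V_6_001 contamination table above
READS_PER_FILE = 23_546_475
hits = {
    "FC70M6V_6_001_1": {"cE_coli": 2_931_496, "pFosDT5_2": 5_473_141,
                        "cChloroplast": 24_148, "cBAC": 7_739_576},
    "FC70M6V_6_001_2": {"cE_coli": 2_885_406, "pFosDT5_2": 5_854_468,
                        "cChloroplast": 21_794, "cBAC": 7_520_343},
}


def truncated_percent(count, total):
    """count/total as a percentage truncated (not rounded) to two decimals, as a string."""
    hundredths = 10_000 * count // total  # integer math avoids float rounding surprises
    return f"{hundredths // 100}.{hundredths % 100:02d}"


for lib, counts in hits.items():
    cells = [f"{ref}={truncated_percent(n, READS_PER_FILE)}%" for ref, n in counts.items()]
    print(lib, " ".join(cells))
```

With truncation, all eight cells match the table. Rounding would instead give 12.45, 32.87 and 31.94 for three of them.&lt;br /&gt;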
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp, with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp of overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density. &lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
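The insert-size remark can be reproduced with a little arithmetic. Mates of 160 and 156 bp span 316 bp, so a 313 bp insert leaves 3 bp of overlap and shorter inserts overlap more. The email does not say that insert sizes are normally distributed; this sketch assumes they are roughly normal, with mean 343 and sd 30. Under that assumption, 313 bp is one sd below the mean and about 16% of inserts fall below it, which is in line with the quoted 15%. A minimal sketch:&lt;br /&gt;

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


read1_len, read2_len = 160, 156   # readLen of FC70M6V_6_001 (mate 1, mate 2)
insert_mean, insert_sd = 343, 30  # insert-size estimate quoted in the email

combined = read1_len + read2_len  # 316 bp: mates overlap when the insert is shorter
print("overlap for a 313 bp insert:", combined - 313, "bp")
print(f"P(insert below 313 bp) = {normal_cdf(313, insert_mean, insert_sd):.3f}")
print(f"P(insert below {combined} bp, any overlap) = {normal_cdf(combined, insert_mean, insert_sd):.3f}")
```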
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8909</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8909"/>
		<updated>2011-08-11T17:38:32Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation mget asks for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
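The gc% columns report the share of G and C bases in each sequence. Below is a minimal sketch of that calculation, plus a small FASTA reader. The page does not say how the original numbers treated N and other ambiguity codes; this sketch assumes they are left out of the denominator.&lt;br /&gt;

```python
def gc_percent(seq):
    """GC content in percent; N and other ambiguity codes are left out of the denominator."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For usage, with a hypothetical file name: loop over read_fasta("cChloroplast.fa") and print len(seq) and round(gc_percent(seq), 2) for each record.&lt;br /&gt;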
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
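The per-position N rates above can be computed by counting, at each read position, how many reads carry an N. This is a minimal sketch with hypothetical function names. It assumes reads are given as plain sequence strings, for example the sequence line of each FASTQ record, and that positions are 1-based. The page does not say which convention its pos= values use.&lt;br /&gt;

```python
from collections import Counter


def n_percent_by_position(reads):
    """Percent of reads with an N at each 1-based position (denominator: reads reaching it)."""
    depth, n_count = Counter(), Counter()
    for read in reads:
        for pos, base in enumerate(read.upper(), start=1):
            depth[pos] += 1
            if base == "N":
                n_count[pos] += 1
    return {pos: 100.0 * n_count[pos] / depth[pos] for pos in sorted(depth)}


def flagged_positions(percent_by_pos, threshold=0.5):
    """Positions whose N percentage exceeds the threshold (0.5% as on this page)."""
    return [pos for pos, pct in percent_by_pos.items() if pct > threshold]
```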
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
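The coverage comment follows from expected depth = reads × read length / genome size. The inputs are the summed mito hits, plus the 100 bp read length and roughly 400 kb mito genome size assumed in that comment. This gives 85,289 × 100 / 400,000, or about 21.3X. A minimal sketch:&lt;br /&gt;

```python
# Mito hit counts from the contamination table above (one per lane/mate file)
mito_reads = [12_715, 12_291, 30_839, 29_444]
READ_LEN = 100          # bp, as assumed in the comment above
MITO_GENOME = 400_000   # bp, approximate mito genome size assumed above

total_reads = sum(mito_reads)
coverage = total_reads * READ_LEN / MITO_GENOME  # expected depth = bases / genome size
print(f"{total_reads:,} mito reads -> {coverage:.1f}X coverage")  # 85,289 mito reads -> 21.3X coverage
```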
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem        min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13,747,338  100    100    100    100    9,185      108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28,934      100    111    136    426    23,376     378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74,820      100    105    125    390    31,673     320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224,963     100    110    128    343    23,410     260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7,859*      100    113    139    284    43,079     331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3,626       100    113    139    255    43,131*    339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211,820     100    143    156*   187    23,273     227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61,152      100    121    151    200    30,846     286.05     0          17,492,450&lt;br /&gt;
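The stats tables on this page report elem, min, q1 to q3, max, mean, n50 and sum over sequence lengths. The page does not say how the original script computed quartiles and n50; the n50 column shows 0 or . for these runs. The sketch below uses one common convention: Python statistics.quantiles (default exclusive method) for the quartiles, and the usual N50 definition, walking from the longest sequence down.&lt;br /&gt;

```python
import statistics


def length_stats(lengths):
    """elem, min, q1, q2, q3, max, mean, n50 and sum for a list of sequence lengths."""
    lengths = sorted(lengths)
    q1, q2, q3 = statistics.quantiles(lengths, n=4)  # default "exclusive" method
    total = sum(lengths)
    # N50: walking from the longest sequence down, the length at which the
    # running total first reaches half of all bases.
    running, n50 = 0, None
    for length in reversed(lengths):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "elem": len(lengths), "min": lengths[0], "q1": q1, "q2": q2, "q3": q3,
        "max": lengths[-1], "mean": total / len(lengths), "n50": n50, "sum": total,
    }
```

For usage, length_stats([100, 200, 300, 400, 500]) gives q1=150, q2=300, q3=450 and n50=400 with these conventions. Other quartile methods shift q1 and q3 slightly.&lt;br /&gt;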
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem        min  q1   q2    q3    max    mean     n50  sum           readOnContig&lt;br /&gt;
  scf             74,820      100  105  125   390   31,673 320.75   0    23,998,536&lt;br /&gt;
  ctg             5,755,282   32   32   35    43    7,195  41.63    0    239,620,204   33,083,609(40%)&lt;br /&gt;
  edge            11,015,468  1    2    4     11    7,164  8.75     0    96,380,983&lt;br /&gt;
  reads           82,283,738                                             6,006,712,874&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum&lt;br /&gt;
  all             74,820    100  105  125   390   31,673  320.75   0    23,998,536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767     191.56   0    39,462       # VERY BAD&lt;br /&gt;
  cBAC            10,533    100  113  143   428   26,589  477.68   0    5,031,439&lt;br /&gt;
  mito            83        105  448  1730  6851  26,364  4315.20  0    358,162&lt;br /&gt;
  other           63,998    100  104  122   382   31,673  290.16   0    18,569,473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31,673  9662.07  0    434,793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max     mean     n50  sum          readOnContig&lt;br /&gt;
  scf             7,859     100  113  139   284   43,079* 331.49   .    2,605,184&lt;br /&gt;
  ctg             200,062   32   33   37    47    10,392  48.52    .    9,707,307    19,002,331(23%)&lt;br /&gt;
  reads           82,283,738&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp, with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp of overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density. &lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8908</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8908"/>
		<updated>2011-08-11T17:33:14Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation mget asks for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28934      100    111    136    426    23376      378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131*     339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61152      100    121    151    200    30846      286.05     0          17,492,450&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem        min  q1   q2    q3    max    mean     n50  sum           readOnContig&lt;br /&gt;
  scf             74,820      100  105  125   390   31673  320.75   0    23,998,536&lt;br /&gt;
  ctg             5,755,282   32   32   35    43    7195   41.63    0    239,620,204   33,083,609(40%)&lt;br /&gt;
  edge            11,015,468  1    2    4     11    7164   8.75     0    96,380,983&lt;br /&gt;
  reads           82,283,738                                             6,006,712,874&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # listed total; the lengths above (including un and mitochondrion) actually sum to 130,450,100&lt;br /&gt;
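The comment on the total can be checked by summing the listed lengths, including un and mitochondrion:

```python
# Sum the per-sequence lengths from the Drosophila refseq table.
lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}

total = sum(lengths.values())
print(f"{total:,}")  # 130,450,100
```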
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
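According to the kastevens@ucdavis.edu notes on this page, the TIL library names encode the median target insert size (XXX), the lane (Y) and the multiplexed sub-lane (00Z). The following small sketch parses such names and groups the two lanes of each insert size so they can be merged. The function names are hypothetical, and the generic alphanumeric flowcell pattern is an assumption.

```python
import re

# Naming convention from the notes: TIL_XXX_FC70M6V_Y_00Z
#   XXX = median target insert size, Y = lane, 00Z = multiplexed sub-lane.
# Assumption: the flowcell id is any run of uppercase letters and digits.
TIL_NAME = re.compile(r"^TIL_(\d+)_([A-Z0-9]+)_(\d)_(\d{3})$")


def parse_til_name(name):
    """Split a TIL library name into insert size, flowcell, lane and sub-lane."""
    m = TIL_NAME.match(name)
    if m is None:
        raise ValueError(f"not a TIL library name: {name}")
    insert, flowcell, lane, sublane = m.groups()
    return {"insert": int(insert), "flowcell": flowcell, "lane": int(lane), "sublane": sublane}


def group_lane_pairs(names):
    """Group library names by insert size (each size was run in two lanes)."""
    groups = {}
    for name in names:
        info = parse_til_name(name)
        groups.setdefault(info["insert"], []).append(name)
    return groups
```

The fosmid pool name FC70M6V_6_001 does not follow this convention, so `parse_til_name` rejects it with a ValueError.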
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
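The "roughly 15%" figure is consistent with a normal approximation. Under that approximation, inserts shorter than 313 bp lie one standard deviation or more below the 343 bp estimate. Normality is an assumption, because the notes give only the estimate and the standard deviation of 30. The 3 bp overlap appears to follow from the 160 + 156 bp read lengths of FC70M6V_6_001 in the table above:

```python
import math


def fraction_below(cutoff, mean, sd):
    """P(insert size below cutoff), assuming normally distributed insert sizes."""
    z = (cutoff - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


read_len_sum = 160 + 156   # fwd + rev read lengths of FC70M6V_6_001
cutoff = 313               # inserts shorter than this overlap by more than 3 bp
overlap_at_cutoff = read_len_sum - cutoff

p = fraction_below(cutoff, 343, 30)
print(f"{p:.1%} of inserts are shorter than {cutoff} bp")  # 15.9%
print(overlap_at_cutoff)                                   # 3
```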
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8907</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8907"/>
		<updated>2011-08-11T17:29:41Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off interactive Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
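The ftp session above (cd, bin, prompt, mget *) can also be scripted. The following minimal sketch uses Python's standard ftplib and assumes the remote directory holds only regular files, so subdirectories are not handled. The function names are hypothetical.

```python
import ftplib
import os


def mirror_dir(ftp, remote_dir, local_dir):
    """Download every file in remote_dir, like 'cd remote_dir; bin; prompt; mget *'."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    names = ftp.nlst()  # list of names in the current remote directory
    for name in names:
        with open(os.path.join(local_dir, name), "wb") as fh:
            # retrbinary transfers in binary (image) mode, the equivalent of 'bin'
            ftp.retrbinary(f"RETR {name}", fh.write)
    return names


def download_all(host, user, password, remote_dir, local_dir):
    """Connect, log in and mirror one upload directory (not called here)."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        return mirror_dir(ftp, remote_dir, local_dir)
```

For example, download_all("genomepc1.umd.edu", user, password, "PineUpload052911/", "/fs/szattic-asmg7/PINE/PineUpload052911") would reproduce the session, with the credentials listed above.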
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
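Per-position N rates like those listed above can be computed by counting how many reads carry an N at each read position. The following sketch works on plain read sequences, with FASTQ parsing omitted. The 0.5% default threshold mirrors the bullet above, and the function names are hypothetical.

```python
def n_fraction_by_position(reads):
    """Fraction of reads with an 'N' at each 1-based position."""
    n_counts = {}  # position -> reads with N at that position
    covered = {}   # position -> reads long enough to reach that position
    for read in reads:
        for pos, base in enumerate(read.upper(), start=1):
            covered[pos] = covered.get(pos, 0) + 1
            if base == "N":
                n_counts[pos] = n_counts.get(pos, 0) + 1
    return {pos: n_counts.get(pos, 0) / covered[pos] for pos in sorted(covered)}


def positions_above(fractions, threshold=0.005):
    """Positions whose N fraction exceeds the threshold (0.5% by default)."""
    return [pos for pos, frac in fractions.items() if frac > threshold]


if __name__ == "__main__":
    f = n_fraction_by_position(["ACGN", "ANGT", "ACGT", "ACGN"])
    print(positions_above(f))  # [2, 4]
```

Run separately on the fwd and rev read files, this gives the two lines of positions shown above.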
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
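The GC% values compared here are base-composition percentages. The following minimal helper assumes that ambiguous bases (N) are excluded from the denominator:

```python
def gc_percent(seq):
    """GC content in percent over A/C/G/T bases only (Ns are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


print(gc_percent("ATGCGCAT"))  # 50.0
```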
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
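The ~21X estimate in the comment is the number of mito-aligned bases divided by the genome size. It uses the 100 bp read length and the ~400 kb mitochondrion assumed in that comment:

```python
def coverage(num_reads, read_len, genome_len):
    """Estimated depth = total sequenced bases / genome length."""
    return num_reads * read_len / genome_len


mito_reads = 12715 + 12291 + 30839 + 29444    # #mito per read file, from the table
print(mito_reads)                              # 85289
print(round(coverage(mito_reads, 100, 400_000), 1))  # 21.3
```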
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28934      100    111    136    426    23376      378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131*     339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61152      100    121    151    200    30846      286.05     0          17,492,450&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum        readOnContig&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204  33,083,609&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # listed total; the lengths above (including un and mitochondrion) actually sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8906</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8906"/>
		<updated>2011-08-11T17:28:45Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo&amp;#039;s */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off interactive Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72          28934      100    111    136    426    23376      378.53*    0          10,952,507&lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131*     339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100         61152      100    121    151    200    30846      286.05     0          17,492,450&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
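The scaffold, contig, and edge tables report elem/min/q1/q2/q3/max/mean/n50/sum for each set of lengths. Below is a minimal Python sketch of those summaries. The notes do not state which convention the q1/q2/q3 columns use, so nearest-rank quartiles are an assumption here. N50 uses the usual definition: the length at which the longest sequences first cover half of the total. The function name length_stats is hypothetical.&lt;br /&gt;

```python
import math


def length_stats(lengths):
    """Summarize a non-empty list of sequence lengths in the column order
    elem, min, q1, q2, q3, max, mean, n50, sum (hypothetical helper)."""
    xs = sorted(lengths)
    n = len(xs)
    total = sum(xs)

    def nearest_rank(p):
        # Nearest-rank percentile on the sorted list; an assumed convention.
        return xs[max(0, math.ceil(p * n) - 1)]

    # N50: walk from the longest sequence down until half the total is covered.
    covered, n50 = 0, None
    for x in reversed(xs):
        covered += x
        if 2 * covered >= total:
            n50 = x
            break

    return {
        "elem": n, "min": xs[0], "q1": nearest_rank(0.25),
        "q2": nearest_rank(0.50), "q3": nearest_rank(0.75),
        "max": xs[-1], "mean": round(total / n, 2), "n50": n50, "sum": total,
    }


print(length_stats([100, 200, 300, 400, 1000]))
```

Applied to the real scaffold or contig lengths, a helper like this could also supply values for the n50 column, which is 0 or . in these tables.&lt;br /&gt;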
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
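This Python snippet checks the comment on the total row. Summing the listed lengths, including un and mitochondrion, gives 130,450,100, which is 7,136,536bp short of the 137,586,636 total. Only the table values are used.&lt;br /&gt;

```python
# Drosophila refseq lengths (bp) from the table above.
chrom_len = {
    "2L": 23_011_544, "2R": 21_146_708,
    "3L": 24_543_557, "3R": 27_905_053,
    "4": 1_351_857, "X": 22_422_827,
    "un": 10_049_037, "mitochondrion": 19_517,
}
reported_total = 137_586_636

listed_sum = sum(chrom_len.values())
print(f"{listed_sum:,}")                   # 130,450,100
print(f"{reported_total - listed_sum:,}")  # 7,136,536
```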
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and is not necessarily reflective of the cluster density. &lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
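The 15% figure in the email follows from the numbers it quotes. With 160bp and 156bp mates, an insert of 313bp (one standard deviation below the 343bp mean) gives 160 + 156 - 313 = 3bp of overlap. Under a normal model, about 15.9% of inserts fall below that length. Below is a minimal Python sketch, assuming the insert sizes are normally distributed. The names overlap and insert_dist are hypothetical.&lt;br /&gt;

```python
from statistics import NormalDist

READ1_LEN, READ2_LEN = 160, 156   # mate lengths of FC70M6V_6_001
INSERT_MEAN, INSERT_SD = 343, 30  # insert-size estimate quoted in the email


def overlap(insert_len, r1=READ1_LEN, r2=READ2_LEN):
    """Bases shared by the two mates of a pair with the given insert length
    (negative values mean a gap between the mates)."""
    return r1 + r2 - insert_len


# Assumption: insert sizes follow a normal distribution.
insert_dist = NormalDist(INSERT_MEAN, INSERT_SD)
cutoff = INSERT_MEAN - INSERT_SD  # 313bp, one SD below the mean
print(cutoff, overlap(cutoff), round(insert_dist.cdf(cutoff), 3))  # 313 3 0.159
```

Any insert shorter than the cutoff overlaps by more than 3bp. That is the population the email estimates at roughly 15%.&lt;br /&gt;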
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8905</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8905"/>
		<updated>2011-08-11T17:22:31Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads (Drosophila) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
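The interactive session above (binary mode, prompting off, mget *) can also be scripted. Below is a minimal sketch using Python's standard ftplib. It assumes the remote directory holds only regular files; subdirectories are not handled. The names mget_all and download_pine_upload are hypothetical.&lt;br /&gt;

```python
import ftplib
import os


def mget_all(ftp, remote_dir, local_dir="."):
    """Fetch every entry listed in remote_dir, like 'cd', 'bin', 'prompt', 'mget *'.

    Assumes remote_dir contains only regular files; directories are not handled.
    """
    ftp.cwd(remote_dir)
    os.makedirs(local_dir, exist_ok=True)
    fetched = []
    for name in ftp.nlst():
        local_path = os.path.join(local_dir, os.path.basename(name))
        with open(local_path, "wb") as fh:
            # retrbinary switches to TYPE I (binary), the same as the 'bin' command
            ftp.retrbinary("RETR " + name, fh.write)
        fetched.append(name)
    return fetched


def download_pine_upload(local_dir="PineUpload052911"):
    """Hypothetical wrapper using the host, login and directory from the notes."""
    with ftplib.FTP("genomepc1.umd.edu") as ftp:
        ftp.login("ftpuser", "pinegenome")
        return mget_all(ftp, "PineUpload052911/", local_dir)
```

Calling download_pine_upload() mirrors the session for the 052911 upload. For PineUpload070711, call mget_all with that directory instead. No prompt toggle is needed because ftplib never asks for per-file confirmation.&lt;br /&gt;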
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher A/G content   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
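The per-position N rates above (for example 1.015% at fwd position 100) are the fraction of reads with an N at each cycle. Below is a minimal Python sketch of that tally. It assumes plain 4-line FASTQ input, with forward and reverse files counted separately. The function names and the toy reads are hypothetical.&lt;br /&gt;

```python
from collections import Counter


def n_percent_by_position(seqs):
    """Percent of reads carrying an N at each 1-based read position."""
    n_counts = Counter()
    depth = Counter()
    for seq in seqs:
        for pos, base in enumerate(seq, start=1):
            depth[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return {pos: 100.0 * n_counts[pos] / depth[pos] for pos in sorted(depth)}


def fastq_sequences(path):
    """Yield sequence lines from an uncompressed 4-line-per-record FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


# Toy reads (hypothetical); real input would be fastq_sequences("reads.fq").
reads = ["ACGNT", "ACGTN", "NCGTA", "ACGTA"]
pct = n_percent_by_position(reads)
flagged = {pos: v for pos, v in pct.items() if v > 0.5}  # positions above 0.5% N
print(flagged)  # {1: 25.0, 4: 25.0, 5: 25.0}
```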
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
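The alignment tables bin each scaffold into cChloroplast, cBAC, mito, or other. The notes do not say how the bwa bwasw output was post-processed, so the Python sketch below shows one possible approach. It assumes SAM input and one category per reference name. Unmapped scaffolds and hits to unlisted references count as other. The name classify_scaffolds is hypothetical.&lt;br /&gt;

```python
from collections import Counter


def classify_scaffolds(sam_lines, ref_to_category):
    """Count scaffolds per category from SAM records (hypothetical helper).

    ref_to_category maps reference names to labels such as "cBAC";
    unmapped scaffolds and hits to unlisted references count as "other".
    If a scaffold still has several primary records, the last one wins.
    """
    category = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if (flag >> 8) % 2 or (flag >> 11) % 2:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if (flag >> 2) % 2:
            category[qname] = "other"  # 0x4: segment unmapped
        else:
            category[qname] = ref_to_category.get(rname, "other")
    return Counter(category.values())
```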
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja) %cE_coli %cpFosDT5_2 %cChloro  %cBAC  %pBAC-DE %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                 12.5%    24%         0.09%     32.5   19.3          # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and is not necessarily reflective of the cluster density. &lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8904</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8904"/>
		<updated>2011-08-11T15:41:37Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher A/G content   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
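The GC% values compared here are percentages of G and C among the sequenced bases. Below is a minimal Python sketch. It assumes Ns and other ambiguity codes are excluded from the denominator, since the notes do not say how they were treated. The name gc_percent is hypothetical.&lt;br /&gt;

```python
def gc_percent(seq):
    """GC content as a percentage of called bases (hypothetical helper).

    Assumes N and other ambiguity codes are left out of the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / called if called else 0.0


print(round(gc_percent("ACGTNNGC"), 2))  # 66.67
```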
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
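&lt;br /&gt;
The ~21X estimate in the total row uses the usual depth formula: aligned reads × read length / genome length. It relies on the stated assumptions of 100 bp reads and a ~400 kb mito genome. A quick check:&lt;br /&gt;

```python
# Mito-aligned read counts per lane, copied from the table above.
mito_reads = {
    "FC638TR_001_8_1": 12715,
    "FC638TR_001_8_2": 12291,
    "FC638TR_002_8_1": 30839,
    "FC638TR_002_8_2": 29444,
}


def coverage(n_reads, read_len, genome_len):
    """Expected depth = total aligned bases / genome length."""
    return n_reads * read_len / genome_len


total = sum(mito_reads.values())
depth = coverage(total, 100, 400_000)
print(total, round(depth, 1))  # 85289 21.3
```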
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
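&lt;br /&gt;
Counts like those in the contamination table can be tallied from the bwa bwasw SAM output. Below is a minimal sketch. It assumes the SAM lines are already read in and that each reference name identifies its class (chloroplast, cBAC or mito scaffold). The flag bits follow the SAM specification.&lt;br /&gt;

```python
def count_mapped(sam_lines):
    """Count primary, mapped reads per reference name in SAM text lines.

    Skips header lines (starting with '@'), unmapped reads (flag bit 0x4)
    and secondary/supplementary alignments (0x100, 0x800), so each read
    is counted at most once.
    """
    counts = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        unmapped = (flag >> 2) % 2 == 1
        secondary = (flag >> 8) % 2 == 1
        supplementary = (flag >> 11) % 2 == 1
        if unmapped or secondary or supplementary or rname == "*":
            continue
        counts[rname] = counts.get(rname, 0) + 1
    return counts
```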
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
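&lt;br /&gt;
In SOAPdenovo 1.x, max_rd_len is set in the configuration file rather than on the command line, and reads longer than it are cut to that length. By contrast, -K (k-mer size) and -d (k-mer frequency cutoff) are options of the all step. Below is a minimal sketch that builds one config and one command line per run. It assumes a single paired-end FASTQ library with the 400 bp mean insert from the Reads table. The file names and the output prefix are hypothetical.&lt;br /&gt;

```python
def soap_config(max_rd_len, avg_ins, q1, q2, reverse_seq=0, asm_flags=3, rank=1):
    """Build a one-library SOAPdenovo configuration file as a string."""
    lines = [
        "max_rd_len={}".format(max_rd_len),  # global setting, not a CLI flag
        "[LIB]",
        "avg_ins={}".format(avg_ins),
        "reverse_seq={}".format(reverse_seq),
        "asm_flags={}".format(asm_flags),
        "rank={}".format(rank),
        "q1={}".format(q1),  # hypothetical FASTQ path, mate 1
        "q2={}".format(q2),  # hypothetical FASTQ path, mate 2
    ]
    return "\n".join(lines) + "\n"


def soap_command(binary, config_path, k, d, prefix):
    """argv list for one 'all' run, e.g. SOAPdenovo-31mer all -s cfg -K 31 -d 2 -o prefix."""
    return [binary, "all", "-s", config_path, "-K", str(k), "-d", str(d), "-o", prefix]
```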
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
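&lt;br /&gt;
Check of the total row: summing the listed rows (including un and mitochondrion) gives 130,450,100. That leaves 7,136,536 bp of the quoted 137,586,636 not accounted for by this table.&lt;br /&gt;

```python
# Drosophila RefSeq lengths copied from the table above.
lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}
total = sum(lengths.values())
print(f"{total:,}")                # 130,450,100
print(f"{137_586_636 - total:,}")  # 7,136,536
```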
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343 bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313 bp and have &amp;gt; 3 bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
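&lt;br /&gt;
The 15% figure above follows from treating insert size as normally distributed with mean 343 and sd 30, which is an assumption. Inserts below mean − 1 sd = 313 bp make up about 15.9% of the total. With 160 + 156 bp reads, such inserts have mates that overlap by more than 3 bp. A quick check:&lt;br /&gt;

```python
from statistics import NormalDist

READ1, READ2 = 160, 156     # FC70M6V_6_001 read lengths
MEAN_INS, SD_INS = 343, 30  # insert-size estimate from the note


def overlap(insert_len, r1=READ1, r2=READ2):
    """Bases shared by the two mates of a fragment (0 if they do not meet)."""
    return max(0, r1 + r2 - insert_len)


cutoff = MEAN_INS - SD_INS
frac = NormalDist(MEAN_INS, SD_INS).cdf(cutoff)  # normality is an assumption
print(cutoff, overlap(cutoff), round(100 * frac, 1))  # 313 3 15.9
```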
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8903</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8903"/>
		<updated>2011-08-11T15:41:21Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # turn off per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
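&lt;br /&gt;
The same download can be scripted without the interactive prompt toggle. Below is a minimal sketch using Python&#039;s ftplib. mget_all is a hypothetical helper that takes an already logged-in ftplib.FTP connection. retrbinary() always transfers in binary mode, so no separate bin step is needed.&lt;br /&gt;

```python
import os


def mget_all(ftp, remote_dir, local_dir):
    """Non-interactive equivalent of 'cd remote_dir; bin; prompt; mget *'.

    ftp is an ftplib.FTP connection that is already logged in.
    Returns the list of downloaded file names.
    """
    ftp.cwd(remote_dir)
    os.makedirs(local_dir, exist_ok=True)
    names = ftp.nlst()
    for name in names:
        with open(os.path.join(local_dir, name), "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)
    return names
```

Usage (not run here): open ftplib.FTP("genomepc1.umd.edu"), call login() with the account above, then call mget_all(ftp, "PineUpload052911", "PineUpload052911").&lt;br /&gt;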
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after position 120         [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10 bp of each read have a higher A+G count [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions                    [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343 bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313 bp and have &amp;gt; 3 bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
 &lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8902</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8902"/>
		<updated>2011-08-11T15:40:59Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # turn off per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
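A single pass over the reads can produce per-position N rates like the ones listed above. This is a minimal sketch that assumes the reads are already loaded as plain sequence strings and leaves out FASTQ parsing. The names n_rate_by_position and flag_positions are hypothetical. This is not the script that produced these numbers.

```python
from collections import Counter


def n_rate_by_position(reads):
    """Return {1-based position: fraction of reads with an N there}.

    reads is an iterable of sequence strings; shorter reads only
    contribute to the positions they actually cover.
    """
    n_counts = Counter()
    depth = Counter()
    for seq in reads:
        for i, base in enumerate(seq.upper(), start=1):
            depth[i] += 1
            if base == "N":
                n_counts[i] += 1
    return {pos: n_counts[pos] / depth[pos] for pos in sorted(depth)}


def flag_positions(rates, threshold=0.005):
    """Positions whose N rate exceeds threshold (0.005 means 0.5%)."""
    return [pos for pos, rate in rates.items() if rate > threshold]


if __name__ == "__main__":
    demo = ["ACGTN", "ACGTA", "ANGTN", "ACGT"]
    print(flag_positions(n_rate_by_position(demo)))  # [2, 5]
```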
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
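GC% values like the ones compared above can be computed as shown in the sketch below. It makes one assumption: by default it leaves N bases out of the denominator, and the numbers on this page may have been computed differently. gc_percent is a hypothetical name.

```python
def gc_percent(seq, exclude_n=True):
    """GC content in percent.

    Assumption: with exclude_n=True the denominator counts only A/C/G/T,
    so ambiguous bases (N) do not dilute the GC fraction.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    if exclude_n:
        total = gc + s.count("A") + s.count("T")
    else:
        total = len(s)
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    print(round(gc_percent("ACGTNNGC"), 2))  # 66.67: 4 G/C out of 6 A/C/G/T bases
```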
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X coverage for 100bp reads and a ~400kb mito genome&lt;br /&gt;
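The percentages and the ~21X estimate above come from two simple ratios. The share of a lane is assigned reads divided by total reads. Expected depth is reads times read length divided by genome length. The sketch below uses values from the table, and its function names are hypothetical.

```python
def percent(part, total):
    """Share of reads, in percent, assigned to one reference."""
    return 100.0 * part / total


def fold_coverage(n_reads, read_len, genome_len):
    """Expected depth = total read bases / genome length."""
    return n_reads * read_len / genome_len


if __name__ == "__main__":
    # cChloroplast reads in lane FC638TR_001_8_1
    print(round(percent(468309, 22729231), 2))          # 2.06
    # all mito reads, 100bp each, against a ~400kb mito genome
    print(round(fold_coverage(85289, 100, 400000), 1))  # 21.3
```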
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
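The n50 column in these SOAPdenovo tables is not filled in (it shows "." or 0). The sketch below computes all nine columns from a list of scaffold lengths. It assumes each quartile is the element at index int(p * (n - 1)) of the sorted list, while the script behind these tables may interpolate differently. The names n50 and summarize are hypothetical.

```python
def n50(lengths):
    """Largest L such that sequences of length L or longer hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths):
    """Return elem/min/q1/q2/q3/max/mean/n50/sum for a non-empty list of lengths."""
    s = sorted(lengths)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one length")

    def quantile(p):
        # Assumption: no interpolation, just the element at index int(p * (n - 1))
        return s[int(p * (n - 1))]

    return {
        "elem": n,
        "min": s[0],
        "q1": quantile(0.25),
        "q2": quantile(0.5),
        "q3": quantile(0.75),
        "max": s[-1],
        "mean": round(sum(s) / n, 2),
        "n50": n50(s),
        "sum": sum(s),
    }


if __name__ == "__main__":
    print(summarize([100, 200, 300, 400, 1000]))
```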
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
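Summing the listed lengths checks the note on the total row. The sketch below also shows the gap between that sum and the NCBI figure. The lengths are copied from the table, and the sketch assumes nothing else.

```python
# Chromosome lengths copied from the table above (bp)
DMEL_LENGTHS = {
    "2L": 23_011_544, "2R": 21_146_708, "3L": 24_543_557, "3R": 27_905_053,
    "4": 1_351_857, "X": 22_422_827, "un": 10_049_037, "mitochondrion": 19_517,
}

listed_total = sum(DMEL_LENGTHS.values())
reported_total = 137_586_636  # total shown on the NCBI overview
print(f"{listed_total:,}")                   # 130,450,100
print(f"{reported_total - listed_total:,}")  # 7,136,536
```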
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
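A normal model of insert sizes can check the ~15% figure in the email. This sketch assumes the inserts are normally distributed with mean 343bp and standard deviation 30bp, because the email gives only these two numbers. It also uses the 160bp and 156bp read lengths from the table above. The function names are hypothetical. Under this model, inserts shorter than 313bp lie more than one standard deviation below the mean. The expected fraction is therefore about 15.9%, and each such pair overlaps by more than 3bp.

```python
import math


def normal_cdf(x, mean, sd):
    """Probability that a Normal(mean, sd) value is at most x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def overlap_bp(insert_len, read1_len=160, read2_len=156):
    """Bases shared by the two mates; 0 when the mates do not meet."""
    return max(0, read1_len + read2_len - insert_len)


if __name__ == "__main__":
    # 313bp is one standard deviation below the 343bp mean
    print(round(normal_cdf(313, mean=343, sd=30), 3))  # 0.159
    print(overlap_bp(313))                             # 3
```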
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
 &lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  ctg             265450  32   34   50    121    49599   143.69    0    38141459&lt;br /&gt;
  edge            530926  1    3    11    40     41918   63.06     0    33477999&lt;br /&gt;
 &lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem    min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             25482   100  127  262   993    239672  1339.89   0    34143040&lt;br /&gt;
  cE_coli         205     100  252  2244  30571  239672  21916.78  0    4492939&lt;br /&gt;
  cpFosDT5_2      17      100  118  171   272    855     275.24    0    4679&lt;br /&gt;
  cChloroplast    31      100  130  322   1363   5717    986.52    0    30582&lt;br /&gt;
  cBAC            15668   100  133  336   1529   33075   1559.92   0    24440863&lt;br /&gt;
  other           9574    100  117  171   522    27341   542.74    0    5196233&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8901</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8901"/>
		<updated>2011-08-11T15:28:39Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads (Drosophila) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser            # login name&lt;br /&gt;
  pinegenome         # password&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # turn off the Y/N confirmation for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X coverage for 100bp reads and a ~400kb mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8900</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8900"/>
		<updated>2011-08-11T15:28:28Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads (Drosophila) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser            # login name&lt;br /&gt;
  pinegenome         # password&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # turn off the Y/N confirmation for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std      ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
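Per-position profiles like the quality and N plots above can be computed in one pass over a FASTQ file. This is a minimal sketch that assumes unwrapped 4-line FASTQ records and Phred+33 quality encoding; pass offset=64 for older Illumina pipelines. Positions are 1-based, as in the pos= values above. The function names are hypothetical.&lt;br /&gt;
```python
from collections import Counter


def per_position_profile(fastq_lines, offset=33):
    """Return {pos: (percent_N, mean_quality)} for 1-based read positions."""
    n_counts, covered, qual_sums = Counter(), Counter(), Counter()
    seq = ""
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 1:                      # sequence line
            seq = line.upper()
        elif i % 4 == 3:                    # quality line, same length as seq
            for pos, (base, q) in enumerate(zip(seq, line), start=1):
                covered[pos] += 1
                qual_sums[pos] += ord(q) - offset
                if base == "N":
                    n_counts[pos] += 1
    return {
        pos: (100.0 * n_counts[pos] / covered[pos], qual_sums[pos] / covered[pos])
        for pos in sorted(covered)
    }


def high_n_positions(profile, threshold=0.5):
    """Positions whose N percentage exceeds threshold (0.5% in the notes above)."""
    return [pos for pos, (pct_n, _) in profile.items() if pct_n > threshold]
```
Pass an open file once per mate file. Plotting the two tuple fields against position gives curves comparable to FC638TR.qual.png and FC638TR.Ns.png.&lt;br /&gt;
&lt;br /&gt;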
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
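&lt;br /&gt;
The ~21X figure in the last row is the total number of mito-aligned bases divided by the approximate mitochondrial genome size. The worked check below uses the counts in the table and assumes 100 bp per read and a 400 kbp genome, as stated in the comment.&lt;br /&gt;
```python
def percent(count, total):
    """Share of reads, in percent."""
    return 100.0 * count / total


def coverage(n_reads, read_len, genome_size):
    """Expected fold coverage: bases sequenced divided by genome size."""
    return n_reads * read_len / genome_size


mito_reads = {                 # #mito column of the contamination table
    "FC638TR_001_8_1": 12715,
    "FC638TR_001_8_2": 12291,
    "FC638TR_002_8_1": 30839,
    "FC638TR_002_8_2": 29444,
}
total_mito = sum(mito_reads.values())
mito_cov = coverage(total_mito, read_len=100, genome_size=400_000)
print(total_mito, round(mito_cov, 1))                                  # 85289 21.3
print(round(percent(mito_reads["FC638TR_001_8_1"], 22_729_231), 3))    # 0.056
```
The same percent helper reproduces the #cChloroplast percentages. For example, 995,291 of 18,412,638 reads is 5.4%.&lt;br /&gt;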
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
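&lt;br /&gt;
Per-reference read counts such as those in the contamination table can be derived from the SAM output of bwa bwasw. This is a minimal sketch. It assumes each mate file was aligned separately to one reference (cChloroplast, cBAC or mito), and that a read counts once if it has at least one mapped record. The function name is hypothetical.&lt;br /&gt;
```python
def count_mapped_reads(sam_lines):
    """Count distinct read names with at least one mapped SAM record."""
    mapped = set()
    for line in sam_lines:
        if line.startswith("@"):            # @HD/@SQ/@PG header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        # FLAG bit 0x4 set means the segment is unmapped
        if (flag >> 2) % 2 == 0 and rname != "*":
            mapped.add(qname)
    return len(mapped)
```
bwa bwasw writes SAM to standard output, so each run can be redirected to a file and streamed into count_mapped_reads. Dividing the result by #reads gives percentages like those in the table.&lt;br /&gt;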
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
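&lt;br /&gt;
The parameter sweep above can be regenerated from a small script. This is a sketch under stated assumptions:&lt;br /&gt;
* It targets the SOAPdenovo 1.x command line (all -s config -K kmer -d KmerFreqCutoff -o prefix).&lt;br /&gt;
* max_rd_len is set in the library config file, not as a command-line flag.&lt;br /&gt;
* avg_ins=400 comes from the read table.&lt;br /&gt;
* The FASTQ, config and output names are hypothetical.&lt;br /&gt;
* K values above 31 need the 63mer build.&lt;br /&gt;
```python
import shlex


def soap_config(fq1, fq2, max_rd_len, avg_ins=400):
    """Minimal SOAPdenovo config text for one paired-end FASTQ library."""
    return "\n".join([
        f"max_rd_len={max_rd_len}",   # longer reads are cut to this length
        "[LIB]",
        f"avg_ins={avg_ins}",
        "reverse_seq=0",              # paired-end (forward-reverse) library
        "asm_flags=3",                # use reads for both contigs and scaffolds
        "rank=1",
        f"q1={fq1}",
        f"q2={fq2}",
    ]) + "\n"


def soap_command(config_path, k, d, out_prefix):
    """Command line for one run; -d drops k-mers seen d times or fewer."""
    binary = "SOAPdenovo-63mer" if k > 31 else "SOAPdenovo-31mer"
    return [binary, "all", "-s", config_path,
            "-K", str(k), "-d", str(d), "-o", out_prefix]


# (K, -d, max_rd_len) combinations from the table above
SWEEP = [(31, 0, 100), (31, 2, 72), (31, 2, 100), (31, 2, 146),
         (31, 20, 100), (31, 48, 100), (47, 0, 100), (47, 2, 100)]

for k, d, rd_len in SWEEP:
    cfg = f"pine_rd{rd_len}.cfg"                 # hypothetical file name
    print(shlex.join(soap_command(cfg, k, d, f"K{k}_d{d}_rd{rd_len}")))
```
Write the soap_config text to each pine_rdN.cfg file before running the printed commands.&lt;br /&gt;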
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
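&lt;br /&gt;
The other.long.hiGC row selects long, GC-rich scaffolds from the unassigned set; the mito sequences are at 44%+ GC versus about 39% for the reads. The exact cut-offs are not recorded here. The sketch below assumes at least 5,000 bp and at least 44% GC purely for illustration, and it computes GC% over A/C/G/T only.&lt;br /&gt;
```python
def gc_percent(seq):
    """GC% over unambiguous bases (N and other IUPAC codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def long_high_gc(scaffolds, min_len=5000, min_gc=44.0):
    """Names of scaffolds at least min_len long with GC% of at least min_gc."""
    return [name for name, seq in scaffolds.items()
            if len(seq) >= min_len and gc_percent(seq) >= min_gc]
```
Both thresholds are parameters, so they can be tuned to match the selection in the table.&lt;br /&gt;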
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # the rows above (incl. un and mitochondrion) sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
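A quick check of the total: summing the eight rows, including un and mitochondrion, gives 130,450,100 bp. That is about 7.1 Mbp short of the 137,586,636 bp total quoted in the table.&lt;br /&gt;
```python
lengths = {  # Drosophila chromosome lengths from the table (bp)
    "2L": 23_011_544, "2R": 21_146_708, "3L": 24_543_557, "3R": 27_905_053,
    "4": 1_351_857, "X": 22_422_827, "un": 10_049_037, "mitochondrion": 19_517,
}
total = sum(lengths.values())
print(f"{total:,}")                    # 130,450,100
print(f"{137_586_636 - total:,}")      # 7,136,536
```
&lt;br /&gt;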
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # based on a 100K-read sample&lt;br /&gt;
 &lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y = 2 and 3) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
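The 15% estimate in the email can be reproduced by assuming normally distributed insert sizes (mean 343 bp, standard deviation 30 bp). Under that assumption, 313 bp is one standard deviation below the mean. With 160 + 156 bp mates (316 bp combined), any insert shorter than 313 bp gives an overlap of more than 3 bp.&lt;br /&gt;
```python
from statistics import NormalDist

READ1_LEN, READ2_LEN = 160, 156        # FC70M6V_6_001 mate lengths


def overlap(insert_size):
    """Bases shared by the two mates of a pair with the given insert size."""
    return READ1_LEN + READ2_LEN - insert_size


inserts = NormalDist(mu=343, sigma=30)   # estimate quoted in the email
frac_short = inserts.cdf(313)            # fraction of inserts below 313 bp
print(overlap(313), round(frac_short, 3))   # 3 0.159
```
With the same read lengths, overlap(242) is 74 bp and overlap(288) is 28 bp. This is consistent with the higher %merged values for the shorter TIL inserts.&lt;br /&gt;
&lt;br /&gt;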
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8899</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8899"/>
		<updated>2011-08-11T15:27:47Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
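&lt;br /&gt;
For batch retrieval, the accession ranges above can be expanded into individual accessions. This is a minimal sketch that assumes each range shares a letter prefix and a fixed-width numeric part. The helper name is hypothetical.&lt;br /&gt;
```python
import re


def expand_accessions(spec):
    """Expand 'AC241263..AC241265, HQ141589' into a list of accessions."""
    accessions = []
    for item in (part.strip() for part in spec.split(",")):
        if ".." not in item:
            accessions.append(item)
            continue
        start, end = item.split("..")
        prefix, first = re.fullmatch(r"([A-Z]+)(\d+)", start).groups()
        end_prefix, last = re.fullmatch(r"([A-Z]+)(\d+)", end).groups()
        if end_prefix != prefix:
            raise ValueError(f"prefix mismatch in range {item}")
        width = len(first)               # keep zero padding of the numeric part
        for n in range(int(first), int(last) + 1):
            accessions.append(f"{prefix}{n:0{width}d}")
    return accessions


ids = expand_accessions("AC241263..AC241361, HQ141589, GU477256..GU477266")
print(len(ids), ids[0], ids[-1])    # 111 AC241263 GU477266
```
The resulting list can be sent in batches to an NCBI Entrez efetch request against db=nuccore.&lt;br /&gt;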
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std      ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:             83 scaffolds ~358162bp&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # the rows above (incl. un and mitochondrion) sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # based on a 100K-read sample&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y = 2 and 3) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8898</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8898"/>
		<updated>2011-08-11T15:27:06Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* NCBI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:  [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std      ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
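The percentages and the ~21X estimate in the table are simple ratios. Expected depth follows C = N * L / G (reads times read length over genome length). The sketch below uses the 100bp read length and ~400kb mito genome size assumed in the comment above.&lt;br /&gt;

```python
def percent(part, total):
    """part as a percentage of total, rounded to 3 decimals."""
    return round(100.0 * part / total, 3)


def coverage(n_reads, read_len, genome_len):
    """Expected depth C = N * L / G (Lander-Waterman)."""
    return n_reads * read_len / genome_len


mito_reads = [12715, 12291, 30839, 29444]      # per-lane counts from the table
total_mito = sum(mito_reads)                   # 85289
print(percent(mito_reads[0], 22729231))        # 0.056 (% of FC638TR_001_8_1)
print(round(coverage(total_mito, 100, 400000), 1))  # 21.3
```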
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
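A sketch of the alignment step and of counting mapped reads per reference from the resulting SAM. It assumes bwa (a version that still provides the bwasw subcommand) is on PATH. File names are hypothetical. A read counts as mapped when SAM flag bit 0x4 is not set, and each read name is counted once per reference.&lt;br /&gt;

```python
import collections
import subprocess


def run_bwasw(ref_fasta, reads_fastq, sam_out):
    """Index the reference, then align with bwa bwasw (bwa must be on PATH)."""
    subprocess.run(["bwa", "index", ref_fasta], check=True)
    with open(sam_out, "w") as out:
        subprocess.run(["bwa", "bwasw", ref_fasta, reads_fastq], stdout=out, check=True)


def mapped_reads_per_ref(sam_lines):
    """Count distinct mapped read names per reference (RNAME column)."""
    hits = collections.defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag // 4 % 2:  # bit 0x4 set: read is unmapped
            continue
        hits[fields[2]].add(fields[0])
    return {ref: len(names) for ref, names in hits.items()}
```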
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
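The runs above vary -K, -d and max_rd_len. Below is a sketch of a one-library config and command line as we understand SOAPdenovo: max_rd_len is a config-file setting that truncates longer reads, and -d is the k-mer frequency cutoff that drops k-mers seen d times or fewer. The read file names and output prefix are hypothetical. The SOAPdenovo-31mer build only supports K up to 31, so the K47 runs need a larger-K build such as SOAPdenovo-63mer.&lt;br /&gt;

```python
def soap_config(max_rd_len, avg_ins, read1, read2):
    """Minimal SOAPdenovo config with one paired-end FASTQ library."""
    lines = [
        "max_rd_len=%d" % max_rd_len,  # longer reads are cut to this length
        "[LIB]",
        "avg_ins=%d" % avg_ins,        # 400 for the FC638TR lanes
        "reverse_seq=0",               # forward-reverse paired-end orientation
        "asm_flags=3",                 # use for both contig and scaffold stages
        "rank=1",
        "q1=" + read1,
        "q2=" + read2,
    ]
    return "\n".join(lines) + "\n"


def soap_command(config_path, k, kmer_cutoff, prefix, binary="SOAPdenovo-31mer"):
    """Argument list for one run, e.g. for subprocess.run(cmd, check=True)."""
    return [binary, "all", "-s", config_path, "-K", str(k),
            "-d", str(kmer_cutoff), "-o", prefix]
```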
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
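The per-reference rows above split the scaffolds by the reference each one aligned to. Unaligned scaffolds go to "other", and "other.long.hiGC" is a long, GC-rich subset of those. In this sketch the best hits and GC% values are assumed to be precomputed. The 5000bp and 43% cut-offs are hypothetical, since the notes do not give the thresholds used.&lt;br /&gt;

```python
import collections


def at_least(value, bound):
    """True when value is at least bound."""
    return max(value, bound) == value


def bucket_scaffolds(lengths, best_hit, gc, min_len=5000, min_gc=43.0):
    """Return {category: (elem, sum)} like the scf alignment tables.

    lengths:  {scaffold_id: length}
    best_hit: {scaffold_id: reference}; unaligned scaffolds are absent -> "other"
    gc:       {scaffold_id: GC%}
    min_len / min_gc for "other.long.hiGC" are hypothetical cut-offs.
    """
    buckets = collections.defaultdict(list)
    for sid, length in lengths.items():
        buckets["all"].append(length)
        category = best_hit.get(sid, "other")
        buckets[category].append(length)
        if category == "other" and at_least(length, min_len) and at_least(gc[sid], min_gc):
            buckets["other.long.hiGC"].append(length)
    return {cat: (len(v), sum(v)) for cat, v in buckets.items()}
```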
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # listed total; the lengths above (including un and mitochondrion) sum to 130,450,100&lt;br /&gt;
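A quick arithmetic check of the table. The numbers are copied from the rows above. The six assembled chromosome arms alone sum to 120,381,546. Adding un and the mitochondrion gives 130,450,100, not the listed 137,586,636.&lt;br /&gt;

```python
drosophila = {
    "2L": 23011544, "2R": 21146708, "3L": 24543557, "3R": 27905053,
    "4": 1351857, "X": 22422827, "un": 10049037, "mitochondrion": 19517,
}
nuclear = sum(v for k, v in drosophila.items() if k not in ("un", "mitochondrion"))
print(nuclear)                    # 120381546
print(sum(drosophila.values()))   # 130450100
```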
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y = 2 and 3) at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density. &lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
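The "roughly 15%" figure can be checked if the insert sizes are assumed to be normally distributed with mean 343bp and standard deviation 30bp. Normality is an assumption made here, not something stated in the email. With 160bp + 156bp reads, mates overlap by more than 3bp when the insert is shorter than 316 - 3 = 313bp, exactly one standard deviation below the mean.&lt;br /&gt;

```python
from statistics import NormalDist

insert_size = NormalDist(mu=343, sigma=30)   # estimate quoted in the email
read_pair_span = 160 + 156                   # fwd + rev read lengths (316bp)

# Fraction of inserts short enough for the mates to overlap by more than 3bp.
frac = insert_size.cdf(read_pair_span - 3)
print(round(frac, 3))   # 0.159
```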
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8897</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8897"/>
		<updated>2011-08-11T15:26:56Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* NCBI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
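Before fetching these sequences, the accession list above can be expanded into individual IDs. This sketch assumes each range shares one letter prefix and a fixed-width number, as in the ranges listed. It yields 99 + 1 + 11 = 111 accessions.&lt;br /&gt;

```python
import re


def expand_accessions(spec):
    """Expand "AC241263..AC241361, HQ141589" style lists into single accessions."""
    result = []
    for item in spec.split(","):
        item = item.strip()
        if ".." not in item:
            result.append(item)
            continue
        start, end = item.split("..")
        prefix = re.match(r"[A-Z]+", start).group()
        width = len(start) - len(prefix)   # keep zero padding of the number part
        for n in range(int(start[len(prefix):]), int(end[len(prefix):]) + 1):
            result.append("%s%0*d" % (prefix, width, n))
    return result
```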
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion (top hit when unclassified scaffolds were aligned to the plant mito database)&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast:   [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off interactive mode so mget does not ask Y/N for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # listed total; the lengths above (including un and mitochondrion) sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y = 2 and 3) at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density. &lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8896</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8896"/>
		<updated>2011-08-11T15:26:10Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion (top hit when unclassified scaffolds were aligned to the plant mito database)&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off interactive mode so mget does not ask Y/N for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
&lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
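The columns of these tables (elem, min, q1, q2, q3, max, mean, n50, sum) can be reproduced from a list of sequence lengths. Note that n50 is left as . or 0 in the tables above. A minimal Python sketch follows. It assumes quartiles from statistics.quantiles with method=inclusive (the original script does not state its convention) and the usual N50 definition. The helper names summarize and n50 are hypothetical.&lt;br /&gt;
```python
import statistics


def n50(lengths):
    """Largest length L such that sequences of length at least L contain half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length          # add sequences from longest to shortest
        if running >= half:
            return length
    return 0


def summarize(lengths):
    """Return elem/min/q1/q2/q3/max/mean/n50/sum in the column order of the tables above."""
    q1, q2, q3 = statistics.quantiles(lengths, n=4, method='inclusive')
    return {
        'elem': len(lengths),
        'min': min(lengths),
        'q1': q1, 'q2': q2, 'q3': q3,
        'max': max(lengths),
        'mean': round(statistics.mean(lengths), 2),
        'n50': n50(lengths),
        'sum': sum(lengths),
    }


# Toy scaffold lengths (bp); real input would be the lengths parsed from the scaffold FASTA
stats = summarize([100, 105, 125, 390, 31673])
print(stats['q2'], stats['n50'], stats['sum'])   # 125.0 31673 32393
```
&lt;br /&gt;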
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
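The comment on the total row can be checked by summing the listed lengths. This short Python check uses only the numbers in the table.&lt;br /&gt;
```python
# Chromosome lengths (bp) from the Drosophila refseq table above
lengths = {
    '2L': 23_011_544, '2R': 21_146_708, '3L': 24_543_557, '3R': 27_905_053,
    '4': 1_351_857, 'X': 22_422_827, 'un': 10_049_037, 'mitochondrion': 19_517,
}
listed_total = 137_586_636   # value in the total row

computed_total = sum(lengths.values())
print(computed_total, listed_total - computed_total)   # 130450100 7136536
```
&lt;br /&gt;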
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
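The 313bp and 15% figures in the note above follow from the mate lengths and the insert estimate. The mates are 160 + 156 = 316bp, so an insert shorter than 313bp leaves them overlapping by more than 3bp. The estimate is 343 +/- 30bp, so 313bp is one standard deviation below the mean. Below is a Python sketch. It assumes a normal insert-size distribution, which the email implies but does not state.&lt;br /&gt;
```python
from statistics import NormalDist

read1_len, read2_len = 160, 156    # FC70M6V_6_001 mate lengths (readLen column above)
insert_mean, insert_sd = 343, 30   # insert-size estimate quoted in the email


def overlap(insert_size):
    """Bases shared by the two mates when the insert is shorter than their combined length."""
    return max(0, read1_len + read2_len - insert_size)


# Share of inserts shorter than 313bp, assuming normally distributed insert sizes
frac_short = NormalDist(insert_mean, insert_sd).cdf(313)
print(overlap(313), round(100 * frac_short, 1))   # 3 15.9
```
&lt;br /&gt;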
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8895</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8895"/>
		<updated>2011-08-11T15:25:56Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation for each file during mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
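The interactive ftp session above (binary mode, prompting off, mget *) can also be scripted. Below is a minimal sketch with Python ftplib. It assumes the remote directory holds only regular files. The helper mirror_directory is hypothetical, and the credentials are passed in rather than hard-coded.&lt;br /&gt;
```python
import os
from ftplib import FTP


def mirror_directory(host, user, password, remote_dir, local_dir='.'):
    """Download every file in remote_dir, like 'bin' + 'prompt' + 'mget *' in the ftp client."""
    downloaded = []
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():            # assumes the listing contains only regular files
            target = os.path.join(local_dir, name)
            with open(target, 'wb') as handle:
                # retrbinary switches to binary (TYPE I) mode, matching the 'bin' command
                ftp.retrbinary('RETR ' + name, handle.write)
            downloaded.append(target)
    return downloaded


# Example call (credentials as listed above; not executed here):
# mirror_directory('genomepc1.umd.edu', USER, PASSWORD, 'PineUpload052911')
```
&lt;br /&gt;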
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1        22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2        22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1        18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839(0.16%)&lt;br /&gt;
  FC638TR_002_8_2        18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                   85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8894</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8894"/>
		<updated>2011-08-11T15:25:45Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation for each file during mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                   #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1        22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2        22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1        18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839(0.16%)&lt;br /&gt;
  FC638TR_002_8_2        18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                  85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
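The sum quoted in the comment can be checked by adding up the rows of the table, including un and mitochondrion. A short Python check using the lengths copied from above:&lt;br /&gt;

```python
# Lengths (bp) copied from the NCBI refseq table above
lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}
total = sum(lengths.values())
print(f"{total:,}")   # 130,450,100
```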
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
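The %merged column presumably counts read pairs whose two ends overlap and can be joined into one fragment. TIL inserts of 242-288 bp are shorter than the 160+156 bp read pair. The tool behind Tanja&#039;s numbers is not stated. The following is only an illustrative Python sketch of exact-match overlap merging. The function names, the exact-match rule and the minimum overlap are hypothetical, and inserts shorter than a single read are not handled:&lt;br /&gt;

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a pair if the end of read1 exactly matches the start of
    revcomp(read2); return the merged fragment, or None if no overlap of
    at least min_overlap bases exists (min_overlap=10 is an assumption).
    """
    mate = revcomp(read2)
    # try the longest possible overlap first, down to min_overlap
    for olap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-olap:] == mate[:olap]:
            return read1 + mate[olap:]
    return None


# toy 20 bp fragment read from both ends with 14 bp reads (8 bp overlap)
fragment = "AGAGAGCTTCAGTACGTTCA"
merged = merge_pair(fragment[:14], revcomp(fragment[6:]), min_overlap=5)
print(merged)   # prints the original 20 bp fragment
```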
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
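The 15% estimate follows if insert sizes are roughly normal. 313 bp is one standard deviation (30 bp) below the 343 bp mean, and a 160+156 bp pair from an insert shorter than 313 bp overlaps by more than 316 - 313 = 3 bp. A Python check of that reasoning, where the normal model and the helper names are assumptions:&lt;br /&gt;

```python
import math


def normal_cdf(x, mean, std):
    """P(X at most x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def overlap_bp(insert, len1=160, len2=156):
    """Bases shared by the two reads of a pair (negative means a gap)."""
    return len1 + len2 - insert


mean_ins, std_ins = 343, 30
frac_short = normal_cdf(313, mean_ins, std_ins)   # about 0.159, i.e. ~15-16%
print(round(frac_short, 3), overlap_bp(313))
```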
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8893</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8893"/>
		<updated>2011-08-11T15:25:04Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
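The elem/min/q1/q2/q3/max/mean/sum columns used in these tables can be reproduced in a few lines of Python. This is a sketch only: the quartile method of the original script is not stated, so statistics.quantiles with the inclusive method is an assumption and q1/q3 may differ slightly:&lt;br /&gt;

```python
import statistics


def summary(values):
    """Return elem, min, q1, q2, q3, max, mean, sum for a list of numbers."""
    # quantiles() needs at least two values; the inclusive method is assumed
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "elem": len(values),
        "min": min(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(values),
        "mean": round(statistics.mean(values), 2),
        "sum": sum(values),
    }


print(summary([100, 200, 300, 400, 500]))
```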
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off the Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
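The same download can be scripted with Python&#039;s standard ftplib module. This is a sketch that assumes the host, user, password and directory shown above. The helper name mget_all is hypothetical. It takes an already connected ftplib.FTP object so it can be reused for other upload directories, and it assumes every name returned by nlst() is a plain file:&lt;br /&gt;

```python
import os
from ftplib import FTP


def mget_all(ftp: FTP, remote_dir, local_dir=".") -> list:
    """Download every file listed in remote_dir, like 'bin; prompt; mget *'."""
    ftp.cwd(remote_dir)
    os.makedirs(local_dir, exist_ok=True)
    fetched = []
    for name in ftp.nlst():
        path = os.path.join(local_dir, name)
        with open(path, "wb") as fh:
            # retrbinary transfers in binary (TYPE I) mode, like 'bin'
            ftp.retrbinary(f"RETR {name}", fh.write)
        fetched.append(name)
    return fetched


# Example (needs network access to the server above):
#   with FTP("genomepc1.umd.edu") as ftp:
#       ftp.login("ftpuser", "pinegenome")
#       mget_all(ftp, "PineUpload052911/", "PineUpload052911")
```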
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
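The n50 column (126689 for the cBAC lengths) is the largest length L such that sequences of length L or longer contain at least half of all bases. A minimal Python sketch of that definition, not necessarily the script used to build these tables:&lt;br /&gt;

```python
def n50(lengths):
    """Largest length L such that sequences of length L or longer hold
    at least half of all bases; 0 for empty input."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))   # 8
```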
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have a higher AG count [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
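Per-position N rates like those above can be computed by counting N calls at each read position across a FASTQ file. A minimal Python sketch, assuming uncompressed 4-line FASTQ records. The function name and the file name in the example are hypothetical:&lt;br /&gt;

```python
from collections import Counter


def n_rate_by_position(fastq_lines):
    """Return {1-based position: percent of reads with an N there}."""
    n_counts = Counter()
    depth = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:          # the sequence is the 2nd line of each record
            continue
        for pos, base in enumerate(line.strip(), start=1):
            depth[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return {pos: 100.0 * n_counts[pos] / depth[pos] for pos in sorted(depth)}


# Example (hypothetical file name):
#   with open("FC638TR_001_8_1.fastq") as fh:
#       rates = n_rate_by_position(fh)
#   worst = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:5]
```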
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
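GC% here is the share of G and C among the called bases. A minimal Python sketch. The original script is not shown, so excluding N and other symbols from the denominator is an assumption:&lt;br /&gt;

```python
def gc_percent(seq):
    """Percent G+C among A/C/G/T bases; N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


print(round(gc_percent("ATGCGCNNAT"), 2))   # 50.0
```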
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.056%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)    12291&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)    30839 (0.16%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)    29444&lt;br /&gt;
  total                                                                  85289             # ~21X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
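The contamination counts presumably come from these bwa bwasw alignments of each read file against each reference set. Here is a hedged Python sketch of turning the SAM output into a read count and a percentage. It counts distinct read names, so a read with several alignment lines is counted only once. That counting rule and the file name in the example are assumptions, not necessarily how the table was built:&lt;br /&gt;

```python
def count_mapped_reads(sam_lines):
    """Count distinct read names with at least one mapped alignment."""
    mapped = set()
    for line in sam_lines:
        if line.startswith("@"):        # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if (flag // 4) % 2 == 0:        # SAM flag bit 0x4 unset -> read is mapped
            mapped.add(qname)
    return len(mapped)


def percent(part, whole):
    """Percentage rounded to 3 decimals, as in the 0.056% entries."""
    return round(100.0 * part / whole, 3)


# Example (hypothetical file name):
#   with open("FC638TR_001_8_1.mito.sam") as fh:
#       hits = count_mapped_reads(fh)
#   print(hits, percent(hits, 22_729_231))
```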
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8892</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8892"/>
		<updated>2011-08-11T14:55:20Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off the Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have a higher AG count [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # but the rows above (including un and mitochondrion) sum to 130,450,100&lt;br /&gt;
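* A quick Python check of the total row, using the lengths copied from the table. The listed rows add up to 130,450,100 bp. The six assembled sequences 2L, 2R, 3L, 3R, 4 and X alone add up to 120,381,546 bp. Neither figure matches the 137,586,636 total.&lt;br /&gt;

```python
# Drosophila refseq lengths copied from the table above
lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}

total = sum(lengths.values())
# everything except the unplaced "un" bin and the mitochondrion
nuclear_placed = total - lengths["un"] - lengths["mitochondrion"]
print(f"{total:,}")           # 130,450,100
print(f"{nuclear_placed:,}")  # 120,381,546
```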
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
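* The percentages in this table can be recomputed from the raw counts. Below is a small Python sketch for the read 1 row; the percent helper is a hypothetical name.&lt;br /&gt;

```python
# read counts copied from the FC70M6V_6_001_1 row above
total_reads = 23_546_475
hits_read1 = {
    "cE_coli": 2_931_496,
    "pFosDT5_2": 5_473_141,
    "cChloroplast": 24_148,
    "cBAC": 7_739_576,
}


def percent(count, total):
    """Percentage of `total` reads represented by `count`."""
    return 100.0 * count / total


for ref, count in hits_read1.items():
    print(f"{ref}\t{percent(count, total_reads):.3f}%")
```

This gives 12.450% for cE_coli and 32.869% for cBAC. The two-decimal values in the table therefore look truncated rather than rounded.&lt;br /&gt;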
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
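* The ~15% figure in the email can be reproduced under one assumption of ours, not stated in the email: insert sizes are roughly normal with mean 343 bp and sd 30 bp. The two mates together span 160 + 156 = 316 bp, so they overlap whenever the insert is shorter than 316 bp. An insert of 313 bp (mean minus one sd) gives a 3 bp overlap. The normal CDF one sd below the mean is about 0.159.&lt;br /&gt;

```python
import math


def normal_cdf(x, mean, sd):
    """Probability that a normal(mean, sd) variable is at most x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


READ1_LEN, READ2_LEN = 160, 156    # read lengths of FC70M6V_6_001
MEAN_INSERT, SD_INSERT = 343, 30   # insert-size estimate from the email


def overlap(insert_size):
    """Number of bases the two mates share for a fragment of this size (0 if none)."""
    return max(0, READ1_LEN + READ2_LEN - insert_size)


cutoff = MEAN_INSERT - SD_INSERT   # 313 bp
print(overlap(cutoff))             # 3
print(round(normal_cdf(cutoff, MEAN_INSERT, SD_INSERT), 3))  # 0.159
```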
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8891</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8891"/>
		<updated>2011-08-11T14:54:58Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
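* The elem/min/q1/q2/q3/max/mean/sum rows used throughout this page can be produced with a few lines of Python. This sketch uses the standard statistics module. The quartile convention of the original script is unknown, so q1 and q3 from statistics.quantiles (default "exclusive" method) may differ slightly from the values shown. The helper name summarize is hypothetical.&lt;br /&gt;

```python
import statistics


def summarize(values):
    """elem/min/q1/q2/q3/max/mean/sum row for a list of numbers (needs 2+ values)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # three cut points
    return {
        "elem": len(values),
        "min": min(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
        "sum": sum(values),
    }


print(summarize([1, 2, 3, 4, 5, 6, 7, 8]))
```

For the toy list it returns q1 = 2.25, q2 = 4.5, q3 = 6.75, mean 4.5 and sum 36.&lt;br /&gt;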
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off per-file Y/N prompting for mget&lt;br /&gt;
  mget *&lt;br /&gt;
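* The interactive session above (cd, bin, prompt, mget *) can also be scripted with Python ftplib. This is a minimal sketch. mget_all is a hypothetical helper, and it assumes that NLST in the upload directory lists only plain files.&lt;br /&gt;

```python
import ftplib
import os


def mget_all(ftp: ftplib.FTP, remote_dir, local_dir):
    """Download every file listed in remote_dir, like `cd`, `bin`, `prompt`, `mget *`."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)          # cd PineUpload052911/
    names = ftp.nlst()           # the names that `mget *` would match
    for name in names:
        local_path = os.path.join(local_dir, name)
        with open(local_path, "wb") as fh:
            # retrbinary transfers in binary (TYPE I) mode, like `bin`,
            # and never asks for confirmation, like `prompt` switched off
            ftp.retrbinary(f"RETR {name}", fh.write)
    return names
```

Typical use: open ftplib.FTP("genomepc1.umd.edu"), call login() with the account listed above, then call mget_all(ftp, "PineUpload052911", "PineUpload052911").&lt;br /&gt;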
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher A/G count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions                     [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
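* The per-position N rates above and the A/G excess in the first 10bp are per-cycle base frequencies. Below is a minimal Python sketch of that computation. It assumes the read sequences are already loaded as strings; FASTQ parsing is left out, and the function name is hypothetical. Positions are 0-based here, whereas the pos= values above may be 1-based.&lt;br /&gt;

```python
def per_position_fraction(reads, bases):
    """For each 0-based position, the fraction of reads covering it whose base is in `bases`."""
    wanted = set(bases.upper())
    max_len = max((len(r) for r in reads), default=0)
    hits = [0] * max_len
    depth = [0] * max_len
    for read in reads:
        for pos, base in enumerate(read.upper()):
            depth[pos] += 1
            if base in wanted:
                hits[pos] += 1
    return [h / d if d else 0.0 for h, d in zip(hits, depth)]


# toy reads (made up); real input would be the sequence lines of the FASTQ files
reads = ["ANGT", "AGNT", "CNGA"]
print(per_position_fraction(reads, "N"))   # N frequency per cycle
print(per_position_fraction(reads, "AG"))  # A+G frequency per cycle
```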
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
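* Below is a minimal Python GC% helper for comparisons like the one above. It assumes GC% is taken over unambiguous A/C/G/T bases only, which may differ from how the numbers above were computed.&lt;br /&gt;

```python
def gc_percent(seq):
    """GC content in percent over unambiguous A/C/G/T bases; N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


print(gc_percent("GGCCAATT"))  # 50.0
print(gc_percent("ACGTN"))     # 50.0
```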
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X mito coverage for 100bp reads and a ~400kb mito genome&lt;br /&gt;
  &lt;br /&gt;
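* The ~20X note follows from the usual coverage estimate: coverage = reads x read length / genome length. Below is a minimal Python sketch; coverage and reads_needed are hypothetical helper names.&lt;br /&gt;

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Expected fold coverage: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


def reads_needed(target_cov, read_len, genome_len):
    """Reads of length read_len needed to reach target_cov over genome_len."""
    return math.ceil(target_cov * genome_len / read_len)


# mito reads counted in FC638TR_001_8_1, 100 bp reads, ~400 kb mito genome
print(round(coverage(12_715, 100, 400_000), 1))  # 3.2
print(reads_needed(20, 100, 400_000))            # 80000
```

By this estimate, the 12,715 mito reads in FC638TR_001_8_1 alone give about 3.2X of a 400 kb genome. Reaching ~20X would take about 80,000 reads of 100 bp.&lt;br /&gt;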
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
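* The runs above differ only in -K, -d and the maximum read length. Below is a hedged Python sketch of how one run could be set up. As far as we know, SOAPdenovo reads max_rd_len and the library description ([LIB], avg_ins, reverse_seq, asm_flags, rank, q1/q2) from a config file passed with -s, and takes -K, -d and -o on the command line. The file names and output prefix are hypothetical. avg_ins=400 is the insert size listed under Reads.&lt;br /&gt;

```python
def soap_config(max_rd_len, avg_ins, fastq1, fastq2):
    """Text of a one-library SOAPdenovo config file (key names assumed, see above)."""
    return "\n".join([
        f"max_rd_len={max_rd_len}",
        "[LIB]",
        f"avg_ins={avg_ins}",
        "reverse_seq=0",   # paired-end (forward-reverse) library
        "asm_flags=3",     # use the reads for both contigs and scaffolds
        "rank=1",
        f"q1={fastq1}",
        f"q2={fastq2}",
    ]) + "\n"


def soap_command(k, d, config_path, out_prefix):
    """Argument list for a full SOAPdenovo-31mer run with the given -K and -d."""
    return ["SOAPdenovo-31mer", "all", "-s", config_path,
            "-K", str(k), "-d", str(d), "-o", out_prefix]


# hypothetical file names for the FC638TR_001_8 pair
print(soap_config(100, 400, "FC638TR_001_8_1.fastq", "FC638TR_001_8_2.fastq"))
print(" ".join(soap_command(31, 2, "pine.config", "k31_d2")))
```

The printed command is SOAPdenovo-31mer all -s pine.config -K 31 -d 2 -o k31_d2. A larger -d should discard more low-frequency k-mers, which is consistent with the scaffold counts shrinking from -d0 to -d48 in the table above.&lt;br /&gt;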
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
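* In the -K 31 -d 2 alignment table, the categories cChloroplast, cBAC, mito and other partition the scaffolds. Their counts (206 + 10533 + 83 + 63998) and sums add up exactly to the "all" row, so other.long.hiGC is presumably a subset of other. Below is a short Python sketch that checks this and turns the sum column into percentages of assembled bases.&lt;br /&gt;

```python
# "elem" and "sum" columns of the -K 31 -d 2 scaffold alignment table above;
# other.long.hiGC is left out because it appears to be a subset of "other"
all_elem, all_bases = 74_820, 23_998_536
by_ref = {                 # name: (scaffolds, bases)
    "cChloroplast": (206, 39_462),
    "cBAC": (10_533, 5_031_439),
    "mito": (83, 358_162),
    "other": (63_998, 18_569_473),
}

# the four categories add up exactly to the "all" row
assert sum(n for n, _ in by_ref.values()) == all_elem
assert sum(b for _, b in by_ref.values()) == all_bases

for ref, (n, bases) in by_ref.items():
    print(f"{ref}\t{n}\t{100.0 * bases / all_bases:.2f}%")
```

This gives about 0.16% chloroplast, 20.97% cBAC, 1.49% mito and 77.38% other. The -d 20 table partitions the same way: 119,021 + 1,713,870 + 5,998 + 766,295 = 2,605,184.&lt;br /&gt;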
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # but the rows above (including un and mitochondrion) sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8890</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8890"/>
		<updated>2011-08-11T14:54:45Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off per-file Y/N prompting for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10bp of each read have a higher A/G count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions                     [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X mito coverage for 100bp reads and a ~400kb mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and therefore have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8889</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8889"/>
		<updated>2011-08-11T14:54:28Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off the Y/N confirmation mget asks for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
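The per-position N rates above (for example 1.015% at fwd pos=100) come from counting Ns column by column across the reads. Below is a minimal Python sketch of that count. Because the script actually used is not shown, it assumes positions are reported 1-based and that reads arrive as plain strings. The helper names are hypothetical.&lt;br /&gt;

```python
from collections import Counter


def n_fraction_by_position(reads: list[str]) -> list[float]:
    """Fraction of reads carrying an N at each 0-based position (needs at least one read)."""
    max_len = max(len(r) for r in reads)
    n_counts: Counter = Counter()
    depth: Counter = Counter()
    for read in reads:
        for pos, base in enumerate(read.upper()):
            depth[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    # Every position below max_len is covered by the longest read, so depth is never 0.
    return [n_counts[p] / depth[p] for p in range(max_len)]


def flag_positions(fractions: list[float], cutoff: float = 0.005) -> list[tuple[int, float]]:
    """1-based positions whose N fraction exceeds the cutoff (0.5% by default)."""
    return [(i + 1, f) for i, f in enumerate(fractions) if f > cutoff]


# Toy reads, for illustration only.
reads = ["ACGTN", "ACNTA", "ACGTA", "ACGTN"]
print(flag_positions(n_fraction_by_position(reads)))  # [(3, 0.25), (5, 0.5)]
```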
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
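The GC% values compared above are the share of G and C among called bases. Below is a minimal Python sketch. The original figures do not say whether N and other ambiguity codes were excluded, so excluding them here is an assumption. The function name is hypothetical.&lt;br /&gt;

```python
def gc_percent(seq: str) -> float:
    """GC content as a percentage of unambiguous bases; N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


print(round(gc_percent("ACGTNNGG"), 2))  # 66.67
```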
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
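The ~20X note in the contamination table rests on the formula: average depth equals the number of reads times read length, divided by target length. Below is a minimal Python sketch. The 400 kb mito genome size is the rough figure used in that note, and coverage and reads_needed are hypothetical helper names.&lt;br /&gt;

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average sequencing depth: total read bases divided by target length."""
    return n_reads * read_len / genome_len


def reads_needed(depth: float, read_len: int, genome_len: int) -> int:
    """Number of reads required to reach a target average depth (rounded up)."""
    return math.ceil(depth * genome_len / read_len)


print(reads_needed(20, 100, 400_000))  # 80000
print(coverage(12_715, 100, 400_000))  # 3.17875, mito hits in FC638TR_001_8_1 alone
```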
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
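The table shows how strongly -d shrinks the output: 13,747,338 scaffolds at -d0 against 74,820 at -d2. As we read the SOAPdenovo usage text, -d deletes k-mers whose frequency is no larger than the cutoff. K-mers seen only once or twice, many of which come from sequencing errors, are therefore dropped before graph construction. Below is a minimal Python sketch of that filtering idea only. It is not SOAPdenovo code: it uses forward-strand k-mers with no canonicalization, and the helper names are hypothetical.&lt;br /&gt;

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer on the forward strand of each read."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def apply_freq_cutoff(counts: Counter, cutoff: int) -> Counter:
    """Drop k-mers seen no more than `cutoff` times, the idea behind a -d style filter."""
    return Counter({kmer: c for kmer, c in counts.items() if c > cutoff})


# Toy reads: the first two are identical, the third contributes only singletons.
reads = ["ACGTAC", "ACGTAC", "TTGCAA"]
counts = kmer_counts(reads, k=3)
kept = apply_freq_cutoff(counts, cutoff=1)
print(len(counts), len(kept))  # 8 4
```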
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793&lt;br /&gt;
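The stats tables report elem, quartiles, max, mean, n50 and sum for each sequence set, but the n50 column is left as 0 or a dot. Below is a minimal Python sketch of how these columns could be computed from a list of lengths, using the standard N50 definition. The quartile convention of the original script is unknown, so using the default method of statistics.quantiles is an assumption; that method needs at least two lengths. The helper names are hypothetical.&lt;br /&gt;

```python
import statistics


def n50(lengths: list[int]) -> int:
    """Largest L such that sequences of length L or longer hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty list


def summarize(lengths: list[int]) -> dict:
    """Columns of the stats tables above for one set of sequence lengths."""
    q1, q2, q3 = statistics.quantiles(lengths, n=4)
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(lengths),
        "mean": round(statistics.mean(lengths), 2),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


# Toy scaffold lengths, for illustration only.
print(summarize([100, 200, 300, 400, 1000]))
```

Here n50 is 1000, because the longest scaffold alone holds half of the 2,000 bases.&lt;br /&gt;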
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
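The NCBI total (137,586,636) does not match the rows listed, and summing them is a quick check. Below is a minimal Python sketch whose dictionary simply copies the table.&lt;br /&gt;

```python
# Sequence lengths copied from the Drosophila refseq table above.
lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}

total = sum(lengths.values())
print(f"{total:,}")                # 130,450,100
print(f"{137_586_636 - total:,}")  # 7,136,536 bp not accounted for by the listed rows
```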
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and therefore have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
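The insert-size reasoning in the note above (343 bp mean, 30 bp standard deviation, 160 + 156 bp reads) can be reproduced in a few lines. Below is a minimal Python sketch. It assumes insert sizes are normally distributed, which the email does not state. overlap_bp and frac_below are hypothetical helper names.&lt;br /&gt;

```python
import math


def overlap_bp(insert_len: int, read1_len: int, read2_len: int) -> int:
    """Bases shared by the two mates of a pair; a positive value means they overlap."""
    return read1_len + read2_len - insert_len


def frac_below(x: float, mean: float, sd: float) -> float:
    """Fraction of inserts shorter than x, assuming a normal insert-size distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


read1, read2 = 160, 156  # FC70M6V_6_001 mate lengths (from the table above)
mean, sd = 343, 30       # estimated insert size and standard deviation

# Inserts shorter than this threshold overlap by more than 3 bp.
threshold = read1 + read2 - 3
print(threshold)                                  # 313
print(overlap_bp(300, read1, read2))              # 16
print(round(frac_below(threshold, mean, sd), 3))  # 0.159
```

Under the normal assumption, 313 bp is one standard deviation below the mean, so about 16% of inserts fall below it. This is consistent with the roughly 15% quoted.&lt;br /&gt;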
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8888</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8888"/>
		<updated>2011-08-11T14:54:00Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo&amp;#039;s */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off the Y/N confirmation mget asks for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                elem       min    q1     q2     q3     max        mean       n50        sum &lt;br /&gt;
  -K31 -d0  -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1,485,269,562&lt;br /&gt;
 &lt;br /&gt;
  -K31 -d2  -max_rd_len72 &lt;br /&gt;
  -K31 -d2  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23,998,536  &lt;br /&gt;
  -K31 -d2  -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58,635,190&lt;br /&gt;
&lt;br /&gt;
  -K31 -d20 -max_rd_len100         7859*      100    113    139    284    43079      331.49     .          2,605,184            &lt;br /&gt;
  -K31 -d48 -max_rd_len100         3626       100    113    139    255    43131      339.01     .          1,229,250&lt;br /&gt;
&lt;br /&gt;
  -K47 -d0  -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48,284,629&lt;br /&gt;
  -K47 -d2  -max_rd_len100&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8887</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8887"/>
		<updated>2011-08-11T14:37:27Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo&amp;#039;s */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation mget asks for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
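A minimal Python sketch of how per-position N percentages like the fwd/rev figures above could be computed. It assumes the reads are available as plain sequence strings and that positions are counted from 1. The original script and its position convention are not shown, and the function names are hypothetical.&lt;br /&gt;

```python
def n_fraction_by_position(reads):
    """Return {position: fraction of reads with an N there}; positions are 1-based (assumed)."""
    n_counts, totals = {}, {}
    for read in reads:
        for pos, base in enumerate(read.upper(), start=1):
            totals[pos] = totals.get(pos, 0) + 1
            if base == "N":
                n_counts[pos] = n_counts.get(pos, 0) + 1
    return {pos: n_counts.get(pos, 0) / totals[pos] for pos in sorted(totals)}


def positions_above(reads, threshold=0.005):
    """List (position, fraction) pairs whose N fraction exceeds threshold (0.005 = 0.5%)."""
    fractions = n_fraction_by_position(reads)
    return [(pos, frac) for pos, frac in fractions.items() if frac > threshold]


reads = ["ACGN", "ACGT", "NCGT", "ACG"]
print(positions_above(reads, threshold=0.3))  # [(4, 0.3333333333333333)]
```

Reads of different lengths are handled by counting, at each position, only the reads long enough to reach it.&lt;br /&gt;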
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
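GC% values like those compared above can be computed with a function such as the one below. This sketch assumes that N and other ambiguous bases are excluded from the denominator, which may differ from how the original numbers were obtained.&lt;br /&gt;

```python
def gc_percent(seq):
    """GC content in percent over unambiguous bases (A, C, G, T); N and IUPAC codes are skipped."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


print(gc_percent("GGCCAATT"))  # 50.0
print(gc_percent("GCNN"))      # 100.0
```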
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
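The coverage note in the last row matches the usual estimate: expected depth C equals the number of reads N times the read length L, divided by the genome size G. Taking the 100 bp reads and the 400 kb mitochondrial genome stated in the note, 20X coverage corresponds to about 80,000 mitochondrial reads.&lt;br /&gt;

```latex
C = \frac{N \, L}{G}
\quad\Longrightarrow\quad
N = \frac{C \, G}{L} = \frac{20 \times 400{,}000}{100} = 80{,}000
```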
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
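The columns of these summary tables (elem, min, q1, q2, q3, max, mean, n50, sum) can be computed from a list of scaffold or contig lengths roughly as follows. This is a sketch, not the script that produced the tables. The quartile method (lower nearest rank) is an assumption. N50 uses the conventional definition (half of the summed length), whereas these tables show 0 or . in that column, so the original N50 convention is unclear.&lt;br /&gt;

```python
def n50(lengths):
    """Largest length L such that sequences of length >= L cover at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def length_stats(lengths):
    """Summary row in table column order: elem min q1 q2 q3 max mean n50 sum (non-empty input)."""
    vals = sorted(lengths)
    n = len(vals)

    def quantile(p):
        # Lower nearest-rank quantile (an assumed convention).
        return vals[int(p * (n - 1))]

    return {
        "elem": n,
        "min": vals[0],
        "q1": quantile(0.25),
        "q2": quantile(0.50),
        "q3": quantile(0.75),
        "max": vals[-1],
        "mean": round(sum(vals) / n, 2),
        "n50": n50(vals),
        "sum": sum(vals),
    }


stats = length_stats([100, 120, 150, 400, 2000])
print(" ".join(str(v) for v in stats.values()))  # 5 100 120 150 400 2000 554.0 2000 2770
```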
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
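The comment on the total row can be checked by summing the lengths listed in the table (values copied from above):&lt;br /&gt;

```python
# Lengths (bp) copied from the Drosophila refseq table above.
dmel_lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}

total = sum(dmel_lengths.values())
print(f"{total:,}")  # 130,450,100
```

The listed total of 137,586,636 is therefore about 7.1 Mb more than the listed sequences account for.&lt;br /&gt;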
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
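The percentages in parentheses are each count divided by #reads. For both rows above, truncating (rather than rounding) to two decimals reproduces every value shown. For example, 2931496/23546475 = 12.4498%, listed as 12.44%; rounding would instead give 12.45% and 32.87% for two of the entries. A small helper using integer arithmetic to avoid floating-point surprises (the helper name is hypothetical):&lt;br /&gt;

```python
def pct_truncated(count, total, decimals=2):
    """Percentage count/total truncated (not rounded) to `decimals` places, via integer math."""
    scale = 10 ** decimals
    return (count * 100 * scale // total) / scale


total_reads = 23_546_475  # FC70M6V_6_001_1
for name, count in [("cE_coli", 2_931_496), ("pFosDT5_2", 5_473_141),
                    ("cChloroplast", 24_148), ("cBAC", 7_739_576)]:
    print(f"{name}: {pct_truncated(count, total_reads):.2f}%")
```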
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
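The insert-size estimate in the note above can be reproduced by assuming that insert sizes are normally distributed with mean 343 bp and standard deviation 30 bp. With 160 bp and 156 bp mates, the overlap is 160 + 156 - insert. An overlap of more than 3 bp therefore means an insert shorter than 313 bp, one standard deviation below the mean. The normal model puts about 15.9% of inserts there, consistent with the roughly 15% quoted. The function names are hypothetical.&lt;br /&gt;

```python
from statistics import NormalDist

READ1_LEN, READ2_LEN = 160, 156   # mate lengths of library FC70M6V_6_001
MEAN_INSERT, SD_INSERT = 343, 30  # estimated insert size (bp)


def overlap_bp(insert_size, len1=READ1_LEN, len2=READ2_LEN):
    """Bases shared by the two mates; a negative value means a gap between them."""
    return len1 + len2 - insert_size


def fraction_overlapping(min_overlap=3, mean=MEAN_INSERT, sd=SD_INSERT):
    """P(overlap > min_overlap), assuming normally distributed insert sizes."""
    cutoff = READ1_LEN + READ2_LEN - min_overlap  # 313 bp
    return NormalDist(mean, sd).cdf(cutoff)


print(overlap_bp(313))                   # 3
print(round(fraction_overlapping(), 3))  # 0.159
```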
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8886</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8886"/>
		<updated>2011-08-11T14:37:01Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* == */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the Y/N confirmation mget asks for each file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
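The %merged(Tanja) column appears to count mate pairs whose reads overlap enough to be joined into a single fragment; the merging tool used is not stated here. The following is a simplified Python sketch of that idea. It assumes exact base matches and a hypothetical 10 bp minimum overlap. Real read mergers tolerate mismatches, use base qualities, and also handle inserts shorter than a single read.&lt;br /&gt;

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=10):
    """Hypothetical helper: join two mates when the end of read1 exactly
    matches the start of revcomp(read2). Longer overlaps are tried first;
    returns None if no overlap of at least min_overlap bp exists."""
    mate = revcomp(read2)
    min_overlap = max(1, min_overlap)  # a zero-length overlap is meaningless
    longest = min(len(read1), len(mate))
    for olap in range(longest, min_overlap - 1, -1):
        if read1[-olap:] == mate[:olap]:
            return read1 + mate[olap:]
    return None

fragment = "ACGTTGCAAGGCTTACGATCGGATCCATGA"  # 30 bp toy insert
r1 = fragment[:20]                          # forward read
r2 = revcomp(fragment[10:])                 # reverse read from the other end
print(merge_pair(r1, r2) == fragment)       # True
```

In the example, two 20 bp reads taken from opposite ends of a 30 bp fragment share 10 bp, so they merge back into the full fragment.&lt;br /&gt;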
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration will be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and  have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
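The arithmetic in this note can be reproduced. With 160 bp and 156 bp mates, a 313 bp insert leaves 160 + 156 - 313 = 3 bp of overlap. Assume, as a simplification, that insert sizes are normally distributed with mean 343 and standard deviation 30. Then 313 bp lies one standard deviation below the mean, so about 16% of inserts are shorter. A Python sketch:&lt;br /&gt;

```python
import math

def mate_overlap(insert_size, read1_len=160, read2_len=156):
    """Bases shared by the two mates of one fragment (0 if they do not meet)."""
    return max(0, read1_len + read2_len - insert_size)

def fraction_below(x, mean=343.0, sd=30.0):
    """Normal CDF: expected fraction of inserts shorter than x bp,
    assuming (simplification) normally distributed insert sizes."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

print(mate_overlap(313))               # 3
print(round(fraction_below(313), 3))   # 0.159
```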
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
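For each sequence set, the #stats tables list the count (elem), min, quartiles (q1, q2, q3), max, mean, N50 and total length (sum); the n50 column shows 0 for these runs. This Python sketch shows how the columns could be computed from a list of lengths. The quartile method (statistics.quantiles with method="inclusive") is an assumption and may differ slightly from the script that produced these tables.&lt;br /&gt;

```python
import statistics

def length_stats(lengths):
    """Summary columns for a list of sequence lengths (needs at least 2)."""
    ordered = sorted(lengths)
    # quartile method is an assumption; other tools interpolate differently
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    total = sum(ordered)
    return {
        "elem": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": ordered[-1],
        "mean": round(total / len(ordered), 2),
        "n50": n50(ordered),
        "sum": total,
    }

def n50(lengths):
    """Largest length L such that sequences of length L or more hold half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0

print(n50([6, 2, 5, 3, 4]))  # 5
```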
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8885</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8885"/>
		<updated>2011-08-11T14:36:49Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* == */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
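The interactive session above (binary mode, prompting switched off, mget *) could also be scripted. This is a minimal Python sketch using the standard ftplib module. The function name mirror_directory is hypothetical, and the host, directory and credentials are passed in rather than hard-coded.&lt;br /&gt;

```python
import ftplib
import os

def mirror_directory(host, user, password, remote_dir, local_dir="."):
    """Hypothetical helper: fetch every file in remote_dir in binary mode,
    like the bin / prompt / mget * session (no confirmation prompts)."""
    os.makedirs(local_dir, exist_ok=True)
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            target = os.path.join(local_dir, os.path.basename(name))
            with open(target, "wb") as handle:
                ftp.retrbinary("RETR " + name, handle.write)  # binary transfer, like bin
    return local_dir
```

This assumes the remote directory holds only regular files. nlst() can also return subdirectory names, and retrbinary on those would fail.&lt;br /&gt;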
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after position 120                [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10 bp of each read have a higher A/G content      [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions                           [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
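Each per-position N rate above (for example 1.015% of forward reads at position 100) is the fraction of reads carrying an N at that position. This is a Python sketch. It assumes the reads are available as plain sequence strings and that positions are counted from 1; the position convention of the original script is not stated.&lt;br /&gt;

```python
from collections import Counter

def n_fraction_by_position(reads):
    """Percent of reads with an N at each position (1-based, an assumption)."""
    counts = Counter()
    for read in reads:
        for pos, base in enumerate(read, start=1):
            if base in "Nn":
                counts[pos] += 1
    total = len(reads)
    return {pos: 100.0 * n / total for pos, n in sorted(counts.items())}

print(n_fraction_by_position(["ACNT", "ACNT", "ACGT", "NCGT"]))  # {1: 25.0, 3: 50.0}
```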
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
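The GC% values compared here are the share of G and C among called bases. This is a minimal Python sketch. Leaving N and other ambiguity codes out of the denominator is an assumption, and the scripts behind these numbers may count differently.&lt;br /&gt;

```python
def gc_percent(seq):
    """GC content in percent over A/C/G/T bases only (Ns ignored, an assumption)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return round(100.0 * gc / acgt, 2) if acgt else 0.0

print(gc_percent("ACGTNN"))  # 50.0
```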
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
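The coverage note uses coverage = number of reads x read length / genome length. This Python sketch implements the formula and its inverse. With the 100 bp reads and the ~400 kb mitochondrial genome from the note, ~20X corresponds to roughly 80,000 mitochondrion-derived reads; that figure is derived from the formula, not a measured count.&lt;br /&gt;

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Average depth: sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len

def reads_needed(target_cov, read_len, genome_len):
    """Reads required to reach target_cov, rounded up."""
    return math.ceil(target_cov * genome_len / read_len)

print(coverage(80000, 100, 400000))   # 20.0
print(reads_needed(20, 100, 400000))  # 80000
```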
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo runs ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
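The runs above vary the k-mer size (-K) and the k-mer frequency cutoff (-d). In SOAPdenovo, -d deletes k-mers whose frequency is no larger than the cutoff. This removes most sequencing-error k-mers, but it can also remove sequence from low-coverage molecules. The toy Python example below illustrates only that filtering idea and is not SOAPdenovo code. It counts forward-strand k-mers, whereas assemblers normally count canonical k-mers.&lt;br /&gt;

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def apply_freq_cutoff(counts, cutoff):
    """Keep only k-mers seen more than cutoff times, mimicking the idea of -d."""
    return {kmer: n for kmer, n in counts.items() if n > cutoff}
```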
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
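The #scf alignments tables split scaffolds by the reference they align to, and scaffolds with no hit are counted as other. This Python sketch shows that bookkeeping and reports the count and total length per category. It assumes a hypothetical mapping from scaffold id to its best-hit reference, for example parsed from the alignment output.&lt;br /&gt;

```python
from collections import defaultdict

def split_by_reference(scaffold_lengths, best_hit):
    """Group scaffolds by best-hit reference; unmatched ones go to "other".
    Returns {category: (count, total_length)} plus an "all" row."""
    groups = defaultdict(list)
    for scf, length in scaffold_lengths.items():
        groups[best_hit.get(scf, "other")].append(length)
    table = {cat: (len(lens), sum(lens)) for cat, lens in groups.items()}
    table["all"] = (len(scaffold_lengths), sum(scaffold_lengths.values()))
    return table
```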
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
========================================================&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration will be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and  have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8884</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8884"/>
		<updated>2011-08-11T14:36:35Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after position 120                [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10 bp of each read have a higher A/G content      [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions                           [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo runs ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
&lt;br /&gt;
==========================================================&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462      # VERY BAD&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             7859      100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  ctg             200062    32   33   37    47    10392  48.52    .    9707307&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             7859*     100  113  139   284   43079* 331.49   .    2605184&lt;br /&gt;
  cChloroplast    20        111  193  436   6140  43079  5951.05  0    119021&lt;br /&gt;
  cBAC            5117      100  114  141   320   13733  334.94   0    1713870&lt;br /&gt;
  mito            8         101  134  685   1396  2166   749.75   0    5998        # VERY BAD&lt;br /&gt;
  other           2714      100  111  133   226   7353   282.35   0    766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum            &lt;br /&gt;
  scf             20        111  193  436   6140  42707  5928.20  0    118564&lt;br /&gt;
&lt;br /&gt;
========================================================&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration will be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below median standard deviation of 30. So roughly 15% of the inserts are &amp;lt; 313bp and  have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z. So the amount of reads in the file is variable and not nessesarily reflective of the cluster density. &lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughy double the sequence content of the &lt;br /&gt;
** Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8883</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8883"/>
		<updated>2011-08-11T14:26:33Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* Reads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the per-file Y/N confirmation before mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
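&lt;br /&gt;
A minimal Python sketch of how a gc% value like the one above can be computed: the share of G and C among unambiguous A/C/G/T bases. Skipping N and other IUPAC codes is an assumption; the page does not say how ambiguous bases were handled.&lt;br /&gt;

```python
def gc_percent(seq: str) -> float:
    """Percent G+C over unambiguous A/C/G/T bases (N and other codes are skipped)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return 0.0  # no unambiguous bases at all
    return 100.0 * gc / acgt


if __name__ == "__main__":
    print(round(gc_percent("ACGTNNGGCC"), 2))  # 6 G/C out of 8 A/C/G/T bases: 75.0
```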
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
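&lt;br /&gt;
The elem/min/q1/q2/q3/max/mean/n50/sum columns used throughout this page can be sketched as follows. n50 uses the standard definition: sort lengths in decreasing order and report the length at which the running sum first reaches half of the total. The original scripts are not shown, so the quartile convention is an assumption: this sketch uses Python&#039;s statistics.quantiles with its default (exclusive) method, and its q1/q3 may differ slightly from the tables.&lt;br /&gt;

```python
import statistics


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summary(lengths: list[int]) -> dict:
    """One stats row in the column order used on this page."""
    # Older Python versions require at least two values here.
    q1, q2, q3 = statistics.quantiles(lengths, n=4)
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(lengths),
        "mean": round(statistics.mean(lengths), 2),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }
```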
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
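Per-position N rates like the ones above can be tallied by counting, for each read position, how often the base call is N. A minimal sketch, assuming the reads are already available as plain strings (FASTQ parsing is omitted) and that positions are 1-based as in the pos=100 notation; both are assumptions about the original workflow.&lt;br /&gt;

```python
from collections import Counter


def n_rate_by_position(reads: list[str]) -> dict[int, float]:
    """Percent of reads with an N call at each 1-based position (reads may vary in length)."""
    n_counts = Counter()
    covered = Counter()
    for read in reads:
        for pos, base in enumerate(read.upper(), start=1):
            covered[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return {pos: 100.0 * n_counts[pos] / covered[pos] for pos in sorted(covered)}


def flag_positions(rates: dict[int, float], threshold: float = 0.5) -> list[int]:
    """Positions whose N rate is above the threshold (in percent; 0.5 as on this page)."""
    return [pos for pos, pct in rates.items() if pct > threshold]


rates = n_rate_by_position(["ACNT", "ANNT", "AC"])
print(flag_positions(rates))  # [2, 3]
```
&lt;br /&gt;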
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: &lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC               #mito&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)    12715(0.054%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)          (0.12%) &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
  total                                                                  ?             # ~20X cvg for 100bp read len &amp;amp; 400K mito genome&lt;br /&gt;
  &lt;br /&gt;
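Each percentage in the contamination table is the number of reads aligned to a reference divided by the total reads in the lane. A minimal sketch of that arithmetic using the FC638TR_001_8_1 counts; recomputed this way the cBAC share comes out near 41.9% rather than the listed 42.7%, so the original percentages may have used a different denominator, which the page does not state.&lt;br /&gt;

```python
def percent(part: int, total: int) -> float:
    """Share of reads (in %) that aligned to a given reference."""
    return 100.0 * part / total


# Read counts copied from the FC638TR_001_8_1 row of the table above.
total_reads = 22_729_231
hits = {"cChloroplast": 468_309, "cBAC": 9_533_849}

for ref, count in hits.items():
    print(f"{ref:14s} {count:>11,d} ({percent(count, total_reads):.2f}%)")
```
&lt;br /&gt;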
* alignments: &lt;br /&gt;
  program: bwa bwasw&lt;br /&gt;
  cChloroplast ref: 1 seq&lt;br /&gt;
  cBAC:             101 seqs&lt;br /&gt;
  mito:&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
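&lt;br /&gt;
For reference, a minimal Python sketch that writes a SOAPdenovo library configuration and builds the matching command line for one of the runs above. It assumes the standard SOAPdenovo config format, in which max_rd_len is a global setting in the config file while -K, -d and -D are command-line options of the all step; avg_ins=400 comes from the Reads table, and the FASTQ names, config name and output prefix are hypothetical.&lt;br /&gt;

```python
import shlex


def soap_config(max_rd_len: int, avg_ins: int, fq1: str, fq2: str) -> str:
    """Config text for one paired-end library; fq1/fq2 are placeholder FASTQ paths."""
    lines = [
        f"max_rd_len={max_rd_len}",  # global: longer reads are cut to this length
        "[LIB]",
        f"avg_ins={avg_ins}",        # mean insert size of the library
        "reverse_seq=0",             # forward-reverse (standard paired-end) orientation
        "asm_flags=3",               # use reads for both contig and scaffold steps
        "rank=1",                    # order in which libraries are used for scaffolding
        f"q1={fq1}",
        f"q2={fq2}",
    ]
    return "\n".join(lines) + "\n"


def soap_command(binary: str, cfg: str, k: int, kmer_cutoff: int,
                 edge_cutoff: int, prefix: str) -> str:
    """'all' run: -K k-mer size, -d k-mer frequency cutoff, -D edge coverage cutoff."""
    args = [binary, "all", "-s", cfg, "-K", str(k),
            "-d", str(kmer_cutoff), "-D", str(edge_cutoff), "-o", prefix]
    return shlex.join(args)


# Hypothetical file names; settings mirror the -K31 -d2 -D3 -max_rd_len100 run.
print(soap_config(100, 400, "FC638TR_001_8_1.fastq", "FC638TR_001_8_2.fastq"))
print(soap_command("SOAPdenovo-31mer", "pine.cfg", 31, 2, 3, "pine_K31"))
```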
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
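&lt;br /&gt;
The coverage estimates in the comment follow from coverage = reads * read length / genome length. A minimal check using the per-lane hits above and the ~400 kb mitochondrial genome size assumed in the comment:&lt;br /&gt;

```python
def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected fold coverage: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


total_hits = 12307 + 11933 + 28707 + 27211  # per-lane hits from the table above
for read_len in (100, 146):
    print(f"{read_len}bp reads: {coverage(total_hits, read_len, 400_000):.1f}X")
# 100bp reads: 20.0X
# 146bp reads: 29.3X
```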
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
  # scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        !!! VERY BAD&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
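&lt;br /&gt;
Summing the lengths listed above confirms the note: they add up to 130,450,100 bp, about 7.1 Mb less than the 137,586,636 in the total row. A quick Python check:&lt;br /&gt;

```python
# Lengths (bp) copied from the Drosophila refseq table above.
lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}

total = sum(lengths.values())
print(f"{total:,}")  # 130,450,100
print(f"{137_586_636 - total:,} bp unaccounted for")  # 7,136,536 bp unaccounted for
```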
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp, with a standard deviation of 30 on the below-median side. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
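The overlap reasoning in the message can be made explicit. With mate reads of 160 and 156 bp, a pair overlaps by 160 + 156 - insert bases, so an overlap of more than 3 bp means an insert shorter than 313 bp. If insert sizes are roughly normal with center 343 and standard deviation 30 (normality is an assumption, not stated in the message), the expected share below 313 bp, one standard deviation under the center, is about 15.9%, consistent with the roughly 15% quoted.&lt;br /&gt;

```python
import math


def overlap_bp(insert: int, read1: int = 160, read2: int = 156) -> int:
    """Bases shared by the two mates of a pair; 0 when the reads do not meet."""
    return max(0, read1 + read2 - insert)


def fraction_below(x: float, mean: float, sd: float) -> float:
    """Normal CDF at x: share of inserts shorter than x (normality is an assumption)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


print(overlap_bp(313))  # 3
print(f"{100 * fraction_below(313, 343, 30):.1f}% of inserts below 313bp")  # 15.9%
```
&lt;br /&gt;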
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8882</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8882"/>
		<updated>2011-08-11T14:19:42Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # turn off the per-file Y/N confirmation before mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination (bwa bwasw):&lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)   &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database ; Cycas_taitungensis was top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
  # scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        !!! VERY BAD&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect that the extra expense of lowering the concentration will be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343 bp with a below-median standard deviation of 30 bp. So roughly 15% of the inserts are &amp;lt; 313 bp and have &amp;gt; 3 bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
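The overlap estimate in the email can be checked with a short sketch. It assumes that insert sizes are normally distributed with mean 343 bp and standard deviation 30 bp; the email does not state this. Reads of 160 bp and 156 bp from an insert of length L overlap by 316 - L bases. An overlap above 3 bp therefore means an insert shorter than 313 bp, which is one standard deviation below the mean. The function names are illustrative only.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X at or below x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def overlap_fraction(mean_insert, sd_insert, read1_len, read2_len, min_overlap):
    """Fraction of pairs whose two reads overlap by more than min_overlap bases.

    Reads of lengths r1 and r2 from an insert of length L overlap by
    r1 + r2 - L bases, so the overlap exceeds min_overlap exactly when
    L is shorter than r1 + r2 - min_overlap.
    Assumes normally distributed insert sizes.
    """
    threshold = read1_len + read2_len - min_overlap  # insert-length cutoff
    return normal_cdf(threshold, mean_insert, sd_insert)


# FC70M6V_6_001: 160 + 156 bp reads, inserts 343 +/- 30 bp, overlap above 3 bp
frac = overlap_fraction(343, 30, 160, 156, 3)
print(f"{frac:.1%}")  # prints 15.9%
```

Under this model about 16% of pairs overlap, which is consistent with the "roughly 15%" quoted above.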
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
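The stats tables on this page report elem/min/q1/q2/q3/max/mean/n50/sum for a set of sequence lengths. The script that produced them is not shown. The sketch below is therefore only an illustration, and its quartile rule (nearest rank) is an assumption. N50 uses the usual definition: the length L such that sequences of length L or longer cover at least half of the total. The helper name length_stats is hypothetical.

```python
import math


def length_stats(lengths):
    """elem/min/q1/q2/q3/max/mean/n50/sum summary for a non-empty list of lengths.

    Quartiles use the nearest-rank method (an assumption about the wiki tables).
    """
    xs = sorted(lengths)
    n = len(xs)
    total = sum(xs)

    def quantile(p):
        # nearest rank: the ceil(p * n)-th smallest value, 1-based
        rank = max(1, math.ceil(p * n))
        return xs[rank - 1]

    # N50: walk from the longest sequence down until half the total is covered
    running = 0
    n50 = 0
    for x in reversed(xs):
        running += x
        if 2 * running >= total:
            n50 = x
            break

    return {"elem": n, "min": xs[0], "q1": quantile(0.25), "q2": quantile(0.5),
            "q3": quantile(0.75), "max": xs[-1], "mean": round(total / n, 2),
            "n50": n50, "sum": total}


stats = length_stats([100, 200, 300, 400, 1000])  # toy scaffold lengths
print(stats["n50"], stats["mean"])  # prints 1000 400.0
```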
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8881</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8881"/>
		<updated>2011-08-11T14:17:55Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* NCBI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
* Cycas taitungensis has the most similar mitochondrion&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
  mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off Y/N confirmation for each mget file&lt;br /&gt;
  mget *&lt;br /&gt;
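The interactive ftp session above can also be scripted. The following is a minimal sketch using Python's standard ftplib, and it assumes the remote directory contains only regular files. The helper name download_dir is hypothetical. Host, user and password are passed in rather than hard-coded. retrbinary always transfers in binary mode, which covers the "bin" step, and a script never asks for per-file confirmation, so the "prompt" step is not needed.

```python
import os


def download_dir(ftp, remote_dir, local_dir):
    """Fetch every file in remote_dir into local_dir (the "bin" + "mget *" steps).

    `ftp` is a connected, logged-in ftplib.FTP object, or any object with the
    same cwd/nlst/retrbinary methods. Returns the list of file names fetched.
    """
    ftp.cwd(remote_dir)
    os.makedirs(local_dir, exist_ok=True)
    fetched = []
    for name in ftp.nlst():
        # retrbinary switches the transfer to binary (TYPE I) mode
        with open(os.path.join(local_dir, name), "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)
        fetched.append(name)
    return fetched


# Example (needs network access):
# import ftplib
# with ftplib.FTP(host) as ftp:
#     ftp.login(user, password)
#     download_dir(ftp, "PineUpload052911/", "PineUpload052911")
```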
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10 bp of each read have a higher A/G content   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
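The per-position N percentages listed above (for example, 1.015% at position 100 of the forward reads) can be computed with a sketch like the following. It assumes the reads are already available as plain sequence strings, so FASTQ parsing is omitted. Each position is counted only over the reads long enough to cover it. The function name is illustrative.

```python
def n_fraction_by_position(reads):
    """Percentage of reads with an N at each 1-based position."""
    counts = {}
    totals = {}
    for read in reads:
        for pos, base in enumerate(read.upper(), start=1):
            totals[pos] = totals.get(pos, 0) + 1
            if base == "N":
                counts[pos] = counts.get(pos, 0) + 1
    return {pos: 100.0 * counts.get(pos, 0) / totals[pos] for pos in sorted(totals)}


reads = ["ACGTN", "ACNTA", "ACGTN", "ACGTA"]  # toy reads
pct = n_fraction_by_position(reads)
# report positions above the 0.5% level mentioned in the notes
print({pos: p for pos, p in pct.items() if p > 0.5})  # prints {3: 25.0, 5: 50.0}
```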
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
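The GC% values compared above are the fraction of G and C bases in a sequence. In the sketch below, Ns and other ambiguity codes are excluded from the denominator. Whether the numbers on this page exclude Ns is not stated, so that choice is an assumption. The function name is illustrative.

```python
def gc_percent(seq):
    """GC content in percent, counting only A, C, G and T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


print(round(gc_percent("GGCATNNAT"), 2))  # prints 42.86
```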
&lt;br /&gt;
* Contamination (reads aligned with bwa bwasw)&lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)   &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
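The chloroplast percentages in the table above are the number of aligned reads divided by the number of reads in the lane, times 100. A minimal sketch that reproduces that column from the listed counts (the helper name is illustrative):

```python
def percent(hits, total):
    """Share of reads that aligned, in percent."""
    return 100.0 * hits / total


# chloroplast hits and read counts per lane, copied from the table above
lanes = {
    "FC638TR_001_8_1": (468_309, 22_729_231),
    "FC638TR_001_8_2": (466_185, 22_729_231),
    "FC638TR_002_8_1": (995_291, 18_412_638),
    "FC638TR_002_8_2": (990_122, 18_412_638),
}
for lane, (hits, total) in lanes.items():
    print(f"{lane}\t{percent(hits, total):.1f}%")  # 2.1%, 2.1%, 5.4%, 5.4%
```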
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
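The runs above differ mainly in k-mer size (-K) and in the cutoffs -d and -D. We assume -d behaves as the SOAPdenovo help text describes: k-mers with frequency no larger than the cutoff are deleted before graph construction. On that reading, the drop from 74820 scaffolds (-d2) to 7859 (-d20) reflects many low-frequency k-mers being discarded. The toy sketch below illustrates only that filtering idea and is not SOAPdenovo code. For brevity it counts forward-strand k-mers only, whereas real assemblers merge each k-mer with its reverse complement.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the reads (forward strand only, for brevity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def apply_freq_cutoff(counts, cutoff):
    """Keep only k-mers seen more than `cutoff` times (the assumed -d rule)."""
    return {kmer: c for kmer, c in counts.items() if c > cutoff}


reads = ["ACGTAC", "ACGTAC", "ACGTTT"]  # toy reads
counts = kmer_counts(reads, 4)
print(apply_freq_cutoff(counts, 2))  # prints {'ACGT': 3}
```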
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
* Cycas_taitungensis mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
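The coverage note on the total line follows from expected depth = number of reads times read length divided by genome length. The ~400 kb mito genome size is the one assumed in that note. A minimal check (the helper name is illustrative):

```python
def coverage(n_reads, read_len, genome_len):
    """Expected depth: total aligned bases divided by genome length."""
    return n_reads * read_len / genome_len


total_hits = 12307 + 11933 + 28707 + 27211  # reads aligned to mito scaffolds
print(total_hits)  # prints 80158
for read_len in (100, 146):
    print(read_len, round(coverage(total_hits, read_len, 400_000)))  # 20, then 29
```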
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
  #scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        # very poor mito recovery&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # listed total; the lengths above actually sum to 130,450,100&lt;br /&gt;
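The corrected total in the note can be checked by summing the listed lengths, including the "un" and mitochondrion entries:

```python
# Drosophila refseq lengths, copied from the table above
chrom_len = {
    "2L": 23_011_544, "2R": 21_146_708, "3L": 24_543_557, "3R": 27_905_053,
    "4": 1_351_857, "X": 22_422_827, "un": 10_049_037, "mitochondrion": 19_517,
}
total = sum(chrom_len.values())
print(f"{total:,}")  # prints 130,450,100
```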
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect that the extra expense of lowering the concentration will be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343 bp with a below-median standard deviation of 30 bp. So roughly 15% of the inserts are &amp;lt; 313 bp and have &amp;gt; 3 bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8880</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8880"/>
		<updated>2011-08-11T14:16:48Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin&lt;br /&gt;
  prompt             # toggle off Y/N confirmation for each mget file&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* The first 10 bp of each read have a higher A/G content   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination (reads aligned with bwa bwasw)&lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)   &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  all             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # aligned to a mito database; Cycas_taitungensis was the top hit&lt;br /&gt;
  other.long.hiGC 45        5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
* Cycas_taitungensis mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
  #scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        # very poor mito recovery&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 choloplast_mated_reads==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # listed total; the lengths above actually sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect that the extra expense of lowering the concentration will be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343 bp with a below-median standard deviation of 30 bp. So roughly 15% of the inserts are &amp;lt; 313 bp and have &amp;gt; 3 bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file is variable and not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8879</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8879"/>
		<updated>2011-08-11T14:14:55Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
* Assembled BAC sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # toggle off per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
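For unattended downloads, the interactive session above can be approximated with the standard ftplib module. This is a minimal sketch, not the procedure used for this page. `mirror_dir` is a hypothetical helper, and it makes two assumptions: the remote directory holds only regular files, and NLST returns bare file names. The host and credentials are passed as arguments rather than hard-coded. `retrbinary` transfers in binary mode, which plays the role of `bin`. Nothing is interactive, so there is no Y/N prompt to disable.

```python
from ftplib import FTP
from pathlib import Path


def mirror_dir(host, user, password, remote_dir, local_dir="."):
    """Download every file in remote_dir into local_dir (hypothetical helper).

    Assumes remote_dir contains only regular files and that NLST returns
    bare names; subdirectories would make RETR fail.
    """
    out = Path(local_dir)
    out.mkdir(parents=True, exist_ok=True)
    downloaded = []
    with FTP(host) as ftp:           # connect; QUIT is sent on exit
        ftp.login(user, password)
        ftp.cwd(remote_dir)          # equivalent of "cd PineUpload052911/"
        for name in ftp.nlst():      # like "mget *": list, then fetch each file
            with open(out / name, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)  # binary mode, like "bin"
            downloaded.append(name)
    return downloaded
```

Usage, with the credentials listed above: `mirror_dir("genomepc1.umd.edu", user, password, "PineUpload052911")`.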
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
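The columns in these summary tables (elem, min, q1, q2, q3, max, mean, n50, sum) can be reproduced from a list of sequence lengths. In this minimal sketch, `summarize` is a hypothetical helper. The quartile convention (`statistics.quantiles` with `method="inclusive"`) is an assumption, so q1 and q3 may differ slightly from whatever script produced these tables. N50 is taken as follows: sort the lengths longest first, keep a running sum, and report the length at which that sum first reaches half of the total.

```python
import statistics


def summarize(lengths):
    """One summary row (elem, min, q1..q3, max, mean, n50, sum); needs at least 2 values."""
    xs = sorted(lengths)
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    total = sum(xs)
    # N50: walk from the longest sequence down until half the total is covered
    running = 0
    n50 = 0
    for x in reversed(xs):
        running += x
        if 2 * running >= total:
            n50 = x
            break
    return {"elem": len(xs), "min": xs[0], "q1": q1, "q2": q2, "q3": q3,
            "max": xs[-1], "mean": total / len(xs), "n50": n50, "sum": total}


print(summarize([100, 200, 300, 400]))
```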
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have a higher A/G count [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
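The per-position N percentages above come from tallying, for each read position, how many reads carry an N there. This is a minimal sketch that assumes the reads are already loaded as plain strings (FASTQ parsing is omitted). The 0.5% threshold is taken from the bullet above. Positions are reported 1-based, as in the `pos=` values. Both function names are hypothetical.

```python
def n_fraction_by_position(reads):
    """Map 1-based position -> fraction of reads reaching that position that have N there."""
    max_len = max(len(r) for r in reads)
    n_counts = [0] * max_len
    covered = [0] * max_len
    for read in reads:
        for i, base in enumerate(read.upper()):
            covered[i] += 1
            if base == "N":
                n_counts[i] += 1
    return {i + 1: n_counts[i] / covered[i] for i in range(max_len) if covered[i]}


def flag_positions(fractions, threshold=0.005):
    """Positions whose N fraction exceeds the threshold (0.005 = 0.5%)."""
    return sorted(pos for pos, f in fractions.items() if f > threshold)


reads = ["ACNT", "ACGT", "NCGT", "ACG"]
fractions = n_fraction_by_position(reads)
print(flag_positions(fractions))  # [1, 3]
```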
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
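GC% values like these are (G+C)/(A+C+G+T) over a sequence. This is a minimal sketch. The page does not state whether its figures count Ns and other ambiguity codes in the denominator, so excluding them here is an assumption.

```python
def gc_percent(seq):
    """GC% over unambiguous bases only (A, C, G, T); N and other codes are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        return float("nan")  # no unambiguous bases to measure
    return 100.0 * gc / (gc + at)


print(gc_percent("GGCCAATT"))  # 50.0
print(gc_percent("ggcN"))      # 100.0
```

For a set of reads, summing the G+C and A+T counts over all reads before dividing gives a pooled GC% rather than a mean of per-read values.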
&lt;br /&gt;
* Contamination: bwa bwasw&lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)   &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
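The percentages in this table are the number of reads aligned to each reference divided by the number of reads in the lane. Recomputing them from the listed counts is a quick consistency check. The sketch below copies the counts from the table.

- The chloroplast column recomputes to about 2.06% and 5.41%, matching the table.
- The cBAC ratios come out somewhat lower than listed, for example about 41.9% rather than 42.7% for FC638TR_001_8_1. The cBAC column may therefore have been computed with a different denominator.

```python
# Read counts copied from the table above: (total reads, cChloroplast hits, cBAC hits)
LANES = {
    "FC638TR_001_8_1": (22_729_231, 468_309, 9_533_849),
    "FC638TR_001_8_2": (22_729_231, 466_185, 9_303_475),
    "FC638TR_002_8_1": (18_412_638, 995_291, 7_535_809),
    "FC638TR_002_8_2": (18_412_638, 990_122, 7_330_078),
}


def percent(part, total):
    """Share of part in total, as a percentage."""
    return 100.0 * part / total


for lane, (n_reads, n_chloroplast, n_bac) in LANES.items():
    print(f"{lane}\t{percent(n_chloroplast, n_reads):.2f}%\t{percent(n_bac, n_reads):.1f}%")
```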
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
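SOAPdenovo's usage text, as we understand it, describes these options as follows:

- `-K` sets the k-mer size.
- `-d` drops k-mers whose frequency is no larger than the given cutoff.
- `-D` does the same for graph edges by coverage.

The sketch below illustrates only the k-mer counting and `-d`-style filtering idea. It is not SOAPdenovo's implementation. For simplicity it counts k-mers as written rather than collapsing each k-mer with its reverse complement, as a real assembler would. The function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring of each read, skipping k-mers containing N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def filter_kmers(counts, cutoff):
    """Keep k-mers seen more than cutoff times (a -d style frequency cutoff)."""
    return {kmer: c for kmer, c in counts.items() if c > cutoff}


reads = ["ACGTAC", "ACGTAA"]
counts = count_kmers(reads, 3)
print(sorted(filter_kmers(counts, 1)))  # ['ACG', 'CGT', 'GTA']
```

Raising the cutoff discards more low-frequency k-mers. In the table above, going from -d2 to -d20 at K31 shrinks the scaffold sum from 23,998,536 to 2,605,184.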
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473   # align to mito database &lt;br /&gt;
  other.5K+.highGC  45      5066 6717 8233  10488 31673  9662.07  0    434793     &lt;br /&gt;
&lt;br /&gt;
* mito: scaffolds aligned to at least one of the 31 complete plant mitochondrion sequences&lt;br /&gt;
* Most hits are to the Cycas_taitungensis mitochondrion sequence&lt;br /&gt;
* Cycas_taitungensis mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
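The coverage figures in the comment follow from coverage = hits × read length / genome length. The calculation uses the total number of hits and a mitochondrial genome size of about 400 kb. That size is the page's own approximation; the Cycas_taitungensis mitochondrion listed above is 414,903 bp. A minimal sketch:

```python
def coverage(n_reads, read_len, genome_len):
    """Expected fold coverage: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


hits = {
    "FC638TR_001_8_1": 12307,
    "FC638TR_001_8_2": 11933,
    "FC638TR_002_8_1": 28707,
    "FC638TR_002_8_2": 27211,
}
total = sum(hits.values())
MITO_LEN = 400_000  # approximate mitochondrial genome size assumed on this page

for read_len in (100, 146):
    print(read_len, round(coverage(total, read_len, MITO_LEN), 1))
```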
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
 # scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        !!! VERY BAD&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # the listed lengths, including un and mitochondrion, actually sum to 130,450,100&lt;br /&gt;
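The note on the total row can be confirmed by summing the listed lengths. All eight entries give 130,450,100. The named chromosomes and arms (2L, 2R, 3L, 3R, 4, X) alone give 120,381,546. A minimal sketch, with the lengths copied from the table:

```python
LENGTHS = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}

total = sum(LENGTHS.values())
assembled = sum(v for k, v in LENGTHS.items() if k not in ("un", "mitochondrion"))
print(f"{total:,}")      # 130,450,100
print(f"{assembled:,}")  # 120,381,546
```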
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
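The %merged column presumably reports the fraction of pairs whose two reads overlap enough to be joined into one longer read. With 160+156bp reads, that requires a fragment shorter than about 316bp. The tool behind these numbers is not named, so what follows is only an illustrative sketch of overlap-based pair merging under simple assumptions:

- Read 2 is reverse-complemented.
- The longest suffix/prefix overlap of at least `min_overlap` bases with at most 10% mismatches wins.
- Fragments shorter than a single read (adapter read-through) are not handled.

`merge_pair` and its parameters are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into one fragment if the 3' end of r1 overlaps revcomp(r2).

    Tries the longest overlap first and returns the merged sequence
    (r1 followed by the non-overlapping tail of revcomp(r2)), or None.
    """
    r2rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        a, b = r1[-olen:], r2rc[:olen]
        mismatches = sum(x != y for x, y in zip(a, b))
        if max_mismatch_frac * olen >= mismatches:
            return r1 + r2rc[olen:]
    return None


fragment = "AACCGGTTACGTCAGTCCAT"         # 20bp toy fragment
r1 = fragment[:14]                        # forward read
r2 = revcomp(fragment[6:])                # reverse read, sequenced from the other end
print(merge_pair(r1, r2, min_overlap=5))  # AACCGGTTACGTCAGTCCAT
```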
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp, with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
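The 15% figure follows if insert sizes are roughly normal with mean 343bp and standard deviation 30bp. In that case 313bp is one standard deviation below the mean, and about 15.9% of a normal distribution lies below that point. The reads are 160bp and 156bp (316bp together), so a pair from a 313bp insert overlaps by 3bp, and shorter inserts overlap more. Normality is an assumption made for this sketch, and `fraction_below` is a hypothetical helper.

```python
import math


def fraction_below(x, mean, sd):
    """P(X is below x) for X ~ Normal(mean, sd), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


READ1_LEN, READ2_LEN = 160, 156   # read lengths of FC70M6V_6_001
MEAN, SD = 343, 30                # estimated insert size (bp)
threshold = 313

overlap_at_threshold = READ1_LEN + READ2_LEN - threshold
print(overlap_at_threshold)                           # 3
print(round(fraction_below(threshold, MEAN, SD), 3))  # 0.159
```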
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8878</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8878"/>
		<updated>2011-08-11T14:10:56Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* NCBI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
* Assembled BAC sequences: AC241263..AC241361, HQ141589, GU477256..GU477266&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # toggle off per-file Y/N confirmation for mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have a higher A/G count [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination: bwa bwasw&lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)   &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473&lt;br /&gt;
&lt;br /&gt;
* mito: scaffolds aligned to at least one of the 31 complete plant mitochondrion sequences&lt;br /&gt;
* Most hits are to the Cycas_taitungensis mitochondrion sequence&lt;br /&gt;
* Cycas_taitungensis mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
 # scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        !!! VERY BAD&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # the listed lengths, including un and mitochondrion, actually sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other  &lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5    34      # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes (Y) at two different concentrations.&lt;br /&gt;
** Lane 3, with the lower concentration, should have higher-quality data than lane 2, but at a higher cost per bp.&lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename.&lt;br /&gt;
** However, we did estimate the insert size to be 343bp, with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in a file varies and does not necessarily reflect the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8877</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8877"/>
		<updated>2011-08-11T14:09:11Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] &lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== NCBI ==&lt;br /&gt;
&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces] BAC 454 reads&lt;br /&gt;
* BAC assembled sequences : AC241263..AC241361&lt;br /&gt;
* Plant mitochondrion finished sequences&lt;br /&gt;
  .      elem    min    q1      q2      q3      max      mean     sum&lt;br /&gt;
  len    31      45223  209482  414903  539368  982833   402851   12488404&lt;br /&gt;
  gc%    31      32.80  43.73   43.93   44.98   46.92    43.41    .&lt;br /&gt;
&lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # toggle off the Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
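A minimal Python sketch of the same session (binary mode, no per-file prompting), assuming the standard-library ftplib module; the function name mget_all is hypothetical, and credentials are passed in rather than hard-coded:&lt;br /&gt;

```python
from ftplib import FTP
import os


def mget_all(host, user, passwd, remote_dir, local_dir="."):
    """Download every entry listed in remote_dir, like 'bin', 'prompt', 'mget *'.

    Assumes remote_dir holds plain files only (subdirectories are not handled).
    """
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    with FTP(host) as ftp:  # the context manager sends QUIT on exit
        ftp.login(user=user, passwd=passwd)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            path = os.path.join(local_dir, os.path.basename(name))
            with open(path, "wb") as fh:
                # retrbinary switches to binary (TYPE I) transfer, the equivalent of 'bin'
                ftp.retrbinary("RETR " + name, fh.write)
            downloaded.append(path)
    return downloaded
```

Usage: call it with the host genomepc1.umd.edu, the login above and the remote directory PineUpload052911/.&lt;br /&gt;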
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
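The per-position N percentages above can be recomputed with a short Python sketch; it assumes the read sequences are already loaded as strings (FASTQ parsing is not shown), reports positions 1-based (the convention used above is not stated), and the function name n_fraction_by_position is hypothetical:&lt;br /&gt;

```python
def n_fraction_by_position(seqs):
    """Return a list whose i-th entry is the fraction of reads with an N at position i (0-based).

    Reads shorter than a position are not counted at that position.
    """
    counts = []  # number of N bases seen at each position
    totals = []  # number of reads long enough to cover each position
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i >= len(counts):
                counts.append(0)
                totals.append(0)
            totals[i] += 1
            if base == "N":
                counts[i] += 1
    return [c / t for c, t in zip(counts, totals)]


reads = ["ACGTN", "ACNTA", "ACGTN"]
fractions = n_fraction_by_position(reads)
# 1-based positions where more than 0.5% of the reads have an N
flagged = [i + 1 for i, f in enumerate(fractions) if f > 0.005]
print(flagged)  # -> [3, 5]
```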
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
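GC% is assumed here to be the usual (G+C)/(A+C+G+T) percentage with Ns left out of the denominator; the scripts behind the numbers above may treat Ns differently. A minimal Python sketch (gc_percent is a hypothetical name):&lt;br /&gt;

```python
def gc_percent(seq):
    """GC content in percent, ignoring N and any other non-ACGT characters."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


print(round(gc_percent("ATGCGCNNAT"), 2))  # -> 50.0
```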
&lt;br /&gt;
* Contamination (reads aligned with bwa bwasw):&lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)   &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
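The columns of these tables (elem, min, q1, q2, q3, max, mean, n50, sum) can be approximated with the Python sketch below. It assumes the usual N50 definition and uses statistics.quantiles for q1..q3; the quartile convention of the original stats script is not known, so q1 and q3 may differ slightly. The names summarize and n50 are hypothetical.&lt;br /&gt;

```python
import statistics


def n50(lengths):
    """Length of the sequence at which the running total, taken from the
    longest sequence down, first reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths):
    """Return the table columns for one row of sequence lengths (needs 2+ values)."""
    q1, q2, q3 = statistics.quantiles(lengths, n=4)
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(lengths),
        "mean": round(statistics.mean(lengths), 2),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


print(summarize([100, 200, 300, 400, 1000]))
```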
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473&lt;br /&gt;
&lt;br /&gt;
* mito  : scaffolds aligned to at least one of the 31 complete plant mitochondrion sequences&lt;br /&gt;
* Cycas_taitungensis mitochondrion sequence (most hits)&lt;br /&gt;
* Cycas_taitungensis mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
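The coverage estimates in the comment follow from (#hits x read length) / genome length, taking the ~400K mito genome size from the comment; a short Python check (the helper name coverage is hypothetical):&lt;br /&gt;

```python
def coverage(n_reads, read_len, genome_len):
    """Average depth of coverage: total aligned bases divided by genome length."""
    return n_reads * read_len / genome_len


hits = 12307 + 11933 + 28707 + 27211  # per-lane hits from the table above
mito_len = 400_000                    # ~400K mito genome size, as assumed in the comment
for read_len in (100, 146):
    print(read_len, round(coverage(hits, read_len, mito_len), 1))
# -> 100 20.0, then 146 29.3
```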
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
 # scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        !!! VERY BAD&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
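A quick Python check of the sum in the comment above, using the lengths listed in the table:&lt;br /&gt;

```python
lengths = {
    "2L": 23_011_544,
    "2R": 21_146_708,
    "3L": 24_543_557,
    "3R": 27_905_053,
    "4": 1_351_857,
    "X": 22_422_827,
    "un": 10_049_037,
    "mitochondrion": 19_517,
}
total = sum(lengths.values())
print(f"{total:,}")  # -> 130,450,100
```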
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and is not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
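The roughly 15% estimate can be checked under the assumption (not stated in the email) that insert sizes are normally distributed with mean 343bp and standard deviation 30bp. With 160bp + 156bp reads, mates overlap by more than 3bp when the insert is shorter than 313bp, i.e. more than one standard deviation below the mean. A Python sketch using statistics.NormalDist:&lt;br /&gt;

```python
from statistics import NormalDist

read1_len, read2_len = 160, 156        # read lengths of FC70M6V_6_001
insert = NormalDist(mu=343, sigma=30)  # assumed normal insert-size distribution
min_overlap = 3

# mates overlap by more than min_overlap bp when the insert is shorter than
# read1_len + read2_len - min_overlap
max_insert = read1_len + read2_len - min_overlap  # 313 bp
frac = insert.cdf(max_insert)
print(f"{frac:.1%}")  # -> 15.9%
```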
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
	<entry>
		<id>https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8876</id>
		<title>Pine tree</title>
		<link rel="alternate" type="text/html" href="https://wiki.umiacs.umd.edu/cbcb/index.php?title=Pine_tree&amp;diff=8876"/>
		<updated>2011-08-11T13:47:45Z</updated>

		<summary type="html">&lt;p&gt;Dpuiu: /* SOAPdenovo-31mer -K 27 -d 2 -D 3 -max_rd_len 100 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Links =&lt;br /&gt;
&lt;br /&gt;
* [https://dendrome.ucdavis.edu/TGPlone dendrome@ucdavis]&lt;br /&gt;
* [http://www.pinegenome.org/pinerefseq pinegenome.org]&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3352 NCBI Taxonomy record] Pinus taeda or &amp;quot;loblolly pine&amp;quot;&lt;br /&gt;
* [http://www.pine.msstate.edu/bac.htm LOBLOLLY PINE BAC LIBRARY@MSSTATE.EDU] AC241263..AC241361&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/pubmed/21283709 Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine.] PLoS One Jan 2011&lt;br /&gt;
Abstract:&lt;br /&gt;
&#039;&#039;Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).&#039;&#039;&lt;br /&gt;
* [http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies&amp;amp;f=study&amp;amp;term=%28Pinus+taeda%29+&amp;amp;go=Go SRA traces]&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
 &lt;br /&gt;
== UCDAVIS plone ==&lt;br /&gt;
* Links&lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq  &lt;br /&gt;
  dpuiu&lt;br /&gt;
  ddr5fft6 &lt;br /&gt;
  https://dendrome.ucdavis.edu/TGPlone/research-projects/pinerefseq/files/library-and-flow-cell-data/prs-tracking-database-archive/&lt;br /&gt;
* Documents&lt;br /&gt;
** [[Media:PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods|PRS_experiment_agenda_2011-07-28_05-43pm_PDT.ods]] 21 July 2011&lt;br /&gt;
&lt;br /&gt;
== IPST ftp ==&lt;br /&gt;
  ftp genomepc1.umd.edu&lt;br /&gt;
  ftpuser&lt;br /&gt;
  pinegenome&lt;br /&gt;
 &lt;br /&gt;
  cd PineUpload052911/&lt;br /&gt;
  bin                # binary transfer mode&lt;br /&gt;
  prompt             # toggle off the Y/N confirmation for each file in mget&lt;br /&gt;
  mget *&lt;br /&gt;
&lt;br /&gt;
== Local data ==&lt;br /&gt;
  ginkgo:&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload052911&lt;br /&gt;
  /fs/szattic-asmg7/PINE/PineUpload070711&lt;br /&gt;
&lt;br /&gt;
= PineUpload052911 =&lt;br /&gt;
&lt;br /&gt;
== Chloroplast ==&lt;br /&gt;
                 len      gc%&lt;br /&gt;
  cChloroplast   120481   38.55&lt;br /&gt;
&lt;br /&gt;
== cBACs ==&lt;br /&gt;
  .       elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  len     102        8288   89909  116121 140549 172161     113400     126689     11566806       &lt;br /&gt;
  gc%     102        34.44  36.56  37.61  38.80  52.88      37.94      37.66      3870.87        &lt;br /&gt;
&lt;br /&gt;
== Reads ==&lt;br /&gt;
  lane           readLen   #mates        mean,std     ~gc%&lt;br /&gt;
  FC638TR_001_8  146       22,729,231    400           39.04&lt;br /&gt;
  FC638TR_002_8  146       18,412,638    400           39.04&lt;br /&gt;
&lt;br /&gt;
* Quality decreases sharply after pos 120        [[Media:FC638TR.qual.png|FC638TR.qual.png]]&lt;br /&gt;
* First 10bp of each read have higher AG count   [[Media:FC638TR.content.png|FC638TR.content.png]]&lt;br /&gt;
* Over 0.5% Ns at certain positions              [[Media:FC638TR.Ns.png|FC638TR.Ns.png]]&lt;br /&gt;
&lt;br /&gt;
  fwd: 1.015% pos=100 ; 0.81% pos=119&lt;br /&gt;
  rev: 1.114% pos=101 ; 0.92% pos=107 ; 0.87% pos=30 ; 0.21% pos=21&lt;br /&gt;
&lt;br /&gt;
* GC% variation: cBAC(37.5%) &amp;lt; cChloroplast(38.5%) &amp;lt; reads(39%) &amp;lt; mito (44%+) &lt;br /&gt;
&lt;br /&gt;
* Contamination (reads aligned with bwa bwasw):&lt;br /&gt;
  lane                  #reads       #cChloroplast   #cBAC&lt;br /&gt;
  FC638TR_001_8_1	22,729,231   468,309(2%)     9,533,849(42.7%)&lt;br /&gt;
  FC638TR_001_8_2	22,729,231   466,185(2%)     9,303,475(41.7%)&lt;br /&gt;
  FC638TR_002_8_1	18,412,638   995,291(5.4%)   7,535,809(41.7%)   &lt;br /&gt;
  FC638TR_002_8_2	18,412,638   990,122(5.4%)   7,330,078(40.5%)&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo&#039;s ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                                     elem       min    q1     q2     q3     max        mean       n50        sum&lt;br /&gt;
  -K47           -max_rd_len100         211820     100    143    156*   187    23273      227.95     .          48284629&lt;br /&gt;
 &lt;br /&gt;
  -K31           -max_rd_len100         13747338   100    100    100    100    9185       108.04     .          1485269562&lt;br /&gt;
  -K31 -d2  -D3  -max_rd_len100         74820      100    105    125    390    31673      320.75     .          23998536  &lt;br /&gt;
  -K31 -d20 -M3  -max_rd_len100         7859*      100    113    139    284    43079*     331.49     .          2605184*            &lt;br /&gt;
 &lt;br /&gt;
  -K27 -d 2 -D 3 -max_rd_len100         70246      100    107    137    413    30683      369.81     .          25977758&lt;br /&gt;
  -K27 -d 2 -D 2 -max_rd_len146         224963     100    110    128    343    23410      260.64     .          58635190&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  scf             74820     100  105  125   390   31673  320.75   0    23998536&lt;br /&gt;
  ctg             5755282   32   32   35    43    7195   41.63    0    239620204&lt;br /&gt;
  edge            11015468  1    2    4     11    7164   8.75     0    96380983&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem      min  q1   q2    q3    max    mean     n50  sum&lt;br /&gt;
  cChloroplast    206       100  122  159   229   767    191.56   0    39462&lt;br /&gt;
  cBAC            10533     100  113  143   428   26589  477.68   0    5031439&lt;br /&gt;
  mito            83        105  448  1730  6851  26364  4315.20  0    358162&lt;br /&gt;
  other           63998     100  104  122   382   31673  290.16   0    18569473&lt;br /&gt;
&lt;br /&gt;
* mito  : scaffolds aligned to at least one of the 31 complete plant mitochondrion sequences&lt;br /&gt;
* Cycas_taitungensis mitochondrion sequence (most hits)&lt;br /&gt;
* Cycas_taitungensis mitochondrion vs chloroplast: [[Media:Cycas_taitungensis_mito-chloroplast.png|Cycas_taitungensis_mito-chloroplast.png]]&lt;br /&gt;
  NC_009618	chloroplast     163,403&lt;br /&gt;
  NC_010303	mitochondrion   414,903&lt;br /&gt;
&lt;br /&gt;
* Reads aligned to mitochondrial scaffolds (bwa bwasw)&lt;br /&gt;
  lane               #hits  %hits&lt;br /&gt;
  FC638TR_001_8_1    12307  0.054&lt;br /&gt;
  FC638TR_001_8_2    11933 &lt;br /&gt;
  FC638TR_002_8_1    28707  0.12&lt;br /&gt;
  FC638TR_002_8_2    27211&lt;br /&gt;
  total              80158          # 20X cvg for 100bp read len &amp;amp; 400K mito genome ; 29X  cvg for 146bp read len&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 20 -M 3 -max_rd_len 100 ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  scf                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  ctg                        200062 32     33     37     47     10392  48.52    .      9707307&lt;br /&gt;
&lt;br /&gt;
 # scaffold length stats&lt;br /&gt;
  .                          elem   min    q1     q2     q3     max    mean     n50    sum&lt;br /&gt;
  all                        7859*  100    113    139    284    43079* 331.49   .      2605184&lt;br /&gt;
  cChloroplast               20     111    193    436    6140   43079  5951.05  0      119021&lt;br /&gt;
  cBAC                       5117   100    114    141    320    13733  334.94   0      1713870&lt;br /&gt;
  mito                       8      101    134    685    1396   2166   749.75   0      5998        !!! VERY BAD&lt;br /&gt;
  other                      2714   100    111    133    226    7353   282.35   0      766295&lt;br /&gt;
&lt;br /&gt;
== SOAPdenovo-31mer -K 31 -d 48 -max_rd_len 100 -M 3 chloroplast_mated_reads ==&lt;br /&gt;
  #scaffold stats&lt;br /&gt;
  .                    elem       min    q1     q2     q3     max        mean       n50        sum            &lt;br /&gt;
  scf                  20         111    193    436    6140   42707      5928.20    0          118564&lt;br /&gt;
&lt;br /&gt;
= PineUpload070711 =&lt;br /&gt;
&lt;br /&gt;
== Ecoli ==&lt;br /&gt;
                 len     gc%&lt;br /&gt;
  cE_coli        4639675 50.79  &lt;br /&gt;
&lt;br /&gt;
== Cloning vector ==&lt;br /&gt;
                 len    gc% &lt;br /&gt;
  pFosDT5_2      8345   47.93&lt;br /&gt;
&lt;br /&gt;
== Drosophila refseq ==&lt;br /&gt;
&lt;br /&gt;
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=6185 NCBI Genome Overview]&lt;br /&gt;
  Chromosome      len            gc%&lt;br /&gt;
  2L              23,011,544     41&lt;br /&gt;
  2R              21,146,708     43&lt;br /&gt;
  3L              24,543,557     41&lt;br /&gt;
  3R              27,905,053     42&lt;br /&gt;
  4               1,351,857      35&lt;br /&gt;
  X               22,422,827     42 &lt;br /&gt;
  un              10,049,037     ?    &lt;br /&gt;
  mitochondrion   19,517         17&lt;br /&gt;
  total           137,586,636    ?     # actually the chromosome lengths sum to 130,450,100&lt;br /&gt;
&lt;br /&gt;
== Reads (Drosophila) ==    &lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #reads    #cE_coli         #pFosDT5_2       #cChloroplast  #cBAC  &lt;br /&gt;
  FC70M6V_6_001_1          160      23546475  2931496(12.44%)  5473141(23.24%)  24148(0.10%)   7739576(32.86%)&lt;br /&gt;
  FC70M6V_6_001_2          156      23546475  2885406(12.25%)  5854468(24.86%)  21794(0.09%)   7520343(31.93%)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  lib                      readLen  #mates    mean,std  ~gc%  %merged(Tanja)   %cE_coli  %cpFosDT5_2  %cChloroplast  %cBAC   %other&lt;br /&gt;
  FC70M6V_6_001            160,156  23546475  343,30    42.5                   12.5%     24%          0.09%          32.5%   34%     # sampled 100K&lt;br /&gt;
&lt;br /&gt;
  TIL_242_FC70M6V_2_002    160,156  9917211   242       .      91.4%  &lt;br /&gt;
  TIL_242_FC70M6V_3_002    160,156  6276300   242              92.7%  &lt;br /&gt;
 &lt;br /&gt;
  TIL_254_FC70M6V_2_004    160,156  9279789   254        .     91.5%&lt;br /&gt;
  TIL_254_FC70M6V_3_004    160,156  5924239   254              92.9%&lt;br /&gt;
 &lt;br /&gt;
  TIL_270_FC70M6V_2_003    160,156  10188776  270        .     88.1%&lt;br /&gt;
  TIL_270_FC70M6V_3_003    160,156  6556676   270              90.3%&lt;br /&gt;
 &lt;br /&gt;
  TIL_288_FC70M6V_2_001    160,156  9524524   288        .     80.0%&lt;br /&gt;
  TIL_288_FC70M6V_3_001    160,156  6158919   288              83.0%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* kastevens@ucdavis.edu:&lt;br /&gt;
** The files labeled TIL_XXX_FC70M6V_Y_00Z are Drosophila libraries with a median target insert size of XXX bp. They come in pairs and can be merged.&lt;br /&gt;
** Regarding pairing, each insert size was run in two lanes Y at two different concentrations. &lt;br /&gt;
** Lane 3, with the lower concentration, should have higher quality data than lane 2 but with a higher cost per bp. &lt;br /&gt;
** The loss in quality was quantitatively small, so we don&#039;t expect the extra expense of lowering the concentration to be justified empirically.&lt;br /&gt;
** The first library, FC70M6V_6_001, is a ~40x library created from a pool of ~1000 fosmids. In general, we do not put the insert size in the filename. &lt;br /&gt;
** However, we did estimate the insert size to be 343bp with a below-median standard deviation of 30bp. So roughly 15% of the inserts are &amp;lt; 313bp and have &amp;gt; 3bp overlap. This seems to fit well with your result.&lt;br /&gt;
** Each lane is multiplexed into sub-lanes indicated by 00Z, so the number of reads in each file varies and is not necessarily reflective of the cluster density.&lt;br /&gt;
** The Drosophila libraries were each run in 1/4 lane and the fosmid pool was run in 1/2 lane. The pool has roughly double the sequence content of the Drosophila libraries run in lane 2 at nominal density.&lt;br /&gt;
&lt;br /&gt;
==  SOAPdenovo-31mer -K 31 -d 2 -D 3 -max_rd_len 100 ==&lt;br /&gt;
  #stats&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  scf             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  ctg             802463   32   33   39    63     73415   91.13     0    73131767&lt;br /&gt;
  edge            1013801  1    2    7     32     30919   48.85     0    49525815&lt;br /&gt;
&lt;br /&gt;
  #scf alignments&lt;br /&gt;
  .               elem     min  q1   q2    q3     max     mean      n50  sum&lt;br /&gt;
  all             20441    100  124  374   1980   291000  2575.50   0    52645707&lt;br /&gt;
  cE_coli         149      100  325  6612  41908  291000  30160.59  0    4493928&lt;br /&gt;
  cpFosDT5_2      0&lt;br /&gt;
  cChloroplast    58       105  166  374   1950   24932   1875.86   0    108800&lt;br /&gt;
  cBAC            12294    100  141  785   4204   45781   3513.34   0    43192987&lt;br /&gt;
  other           7953     100  113  171   599    41416   619.60    0    4927664&lt;/div&gt;</summary>
		<author><name>Dpuiu</name></author>
	</entry>
</feed>