Brugia malayi
Revision as of 07:20, 12 February 2010
Articles
Genome Info
- 6 chromosomes: 1-5, XY
- ~90 Mb, 30% GC, 32% coding, 15% repeats
- diploid genome of approximately 110 Mb
Other sequences
- mitochondrion finished: 13,657 bp; 24% GC
- Wolbachia endosymbiont strain TRS from Brugia malayi strain wMel complete: 1,080,084 bp; 34% GC (New England Biolabs)
- Wolbachia endosymbiont strain wMel progress (TIGR)
- Rodent: some contamination in the traces (for comparison, Mus musculus is ~44% GC)
contaminants  min  q1   q2   q3   max   mean    n50  sum
2929          200  527  675  820  8994  740.04  762  2167588
Genome Project
Brugia malayi has a diploid genome of approximately 110 Mb, organized in 6 pairs of chromosomes (five pairs of autosomes and one pair of sex chromosomes). In addition to the nuclear genome, B. malayi has a mitochondrial genome of about 14 kb, and it harbors the bacterial endosymbiont Wolbachia sp., whose genome is 1-2 Mb.
The B. malayi genome project was completed by The Institute for Genomic Research (TIGR). Whole-genome shotgun (WGS) sequencing was used to obtain more than eight-fold coverage of the genome. The complete genome was assembled into approximately 8,200 scaffolds and deposited in GenBank. The accession for the WGS project is AAQA00000000, consisting of sequences AAQA01000001-AAQA01029808.
File location:
/fs/szasmg3/dpuiu/Brugia_malayi/Data/Bm.fasta
ctgs   min  q1   q2    q3    max     mean     n50    sum
26879  200  836  1005  1495  611244  3241.17  18986  87119350
- TIGR Genome project (TRS strain): http://www.tigr.org/tdb/e2k1/bma1/intro.shtml
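Most tables on this page report the same length statistics (min, q1, q2, q3, max, mean, n50, sum). A minimal sketch that recomputes them from a FASTA file such as Bm.fasta; it is not the script that produced the numbers above, and the quartile rule (nearest rank on the sorted lengths) is an assumption, so q1/q2/q3 may differ slightly from the page.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def length_stats(lengths: List[int]) -> Dict[str, float]:
    """count/min/q1/q2/q3/max/mean/n50/sum; quartiles use nearest rank (assumed rule)."""
    s = sorted(lengths)
    n = len(s)

    def quartile(p: float) -> int:
        return s[int(p * (n - 1))]

    return {
        "count": n,
        "min": s[0],
        "q1": quartile(0.25),
        "q2": quartile(0.50),
        "q3": quartile(0.75),
        "max": s[-1],
        "mean": round(sum(s) / n, 2),
        "n50": n50(s),
        "sum": sum(s),
    }
```

Usage: open the FASTA file and pass the handle, e.g. `length_stats(fasta_lengths(fh))`.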
Data
Original Traces
1.26M reads in 15 libraries:
- 454 Titanium paired-end (NEW)
  - 500 Mbp per run, in reads of ~350 bp each
  - read lengths: 80 bp to 500 bp
  - 30%-50% of the reads contain two mate tags
  - each mate tag is ~150 bp; the two tags are separated by the 42 bp "recombi" XLR linker sequence
  - library insert mean, std ≈ 28 Kbp, 6 Kbp
  - duplication problem: 2-4 identical copies of each pair?
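Splitting a paired-end read into its two mate tags amounts to locating the linker. A minimal sketch, assuming an exact match: the caller supplies the linker string (the 42 bp sequence is not reproduced here), `min_tag` is a hypothetical cutoff, and real reads with sequencing errors inside the linker would need approximate matching. `two_tag_fraction` gives the kind of number behind the 30%-50% estimate above.

```python
from typing import Iterable, Optional, Tuple


def split_mate_tags(read: str, linker: str, min_tag: int = 20) -> Optional[Tuple[str, str]]:
    """Split a paired-end read at the first exact occurrence of the linker.

    Returns (left_tag, right_tag) when both flanks are at least min_tag bp long,
    otherwise None (the read carries at most one usable tag).
    """
    pos = read.find(linker)
    if pos == -1:
        return None
    left, right = read[:pos], read[pos + len(linker):]
    if len(left) < min_tag or len(right) < min_tag:
        return None
    return left, right


def two_tag_fraction(reads: Iterable[str], linker: str, min_tag: int = 20) -> float:
    """Fraction of reads that yield two usable mate tags."""
    total = paired = 0
    for read in reads:
        total += 1
        if split_mate_tags(read, linker, min_tag) is not None:
            paired += 1
    return paired / total if total else 0.0
```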
Trace summary:
- all: 1,260,215
- TRACE_TYPE_CODE
  - WGS: 1,258,277
  - TRANSPOSON: 1,437
  - PRIMER_WALK: 501
- CENTER_NAME
  - TIGR: 856,624
  - JCVI: 403,591
- NO BACs!!! max INSERT_SIZE = 23K
INSERT_SIZE  INSERT_STDEV  TRACE_TYPE_CODE  count
1000         300           WGS              13500
1258         377           WGS              305906
1415         424           WGS              389561
3123         936           WGS              293782
7158         2147          WGS              219306
17168        5150          WGS              14934
22419        6725          WGS              18094
- TI's: 1172642810, ... ,1174845185
- SEQ_LIB_ID's : 1047111480027, ... , 1047174912885
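The trace summary and the insert-size table are group-by counts over Trace Archive ancillary fields. A sketch of that tally, assuming the ancillary data has already been parsed into one dict per trace keyed by the field names used above (the parsing step is not shown). As a check: the seven library rows sum to 1,255,083 traces, 3,194 fewer than the 1,258,277 WGS traces, so some WGS traces apparently fall outside the listed libraries; `traces_without_insert` is one way to look for them.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]  # one trace: ancillary field name -> value (assumed layout)


def tally(records: Iterable[Record], field: str) -> Counter:
    """Count traces by the value of one field (missing values count as 'NA')."""
    return Counter(rec.get(field) or "NA" for rec in records)


def library_table(records: Iterable[Record]) -> List[Tuple[int, int, str, int]]:
    """Rows of (INSERT_SIZE, INSERT_STDEV, TRACE_TYPE_CODE, count), sorted by size."""
    counts: Counter = Counter()
    for rec in records:
        if rec.get("INSERT_SIZE"):
            key = (int(rec["INSERT_SIZE"]), int(rec.get("INSERT_STDEV") or 0),
                   rec.get("TRACE_TYPE_CODE", "NA"))
            counts[key] += 1
    return sorted((size, sd, ttype, n) for (size, sd, ttype), n in counts.items())


def traces_without_insert(records: Iterable[Record]) -> int:
    """Number of traces that carry no INSERT_SIZE value."""
    return sum(1 for rec in records if not rec.get("INSERT_SIZE"))
```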
FRG file:
- FRG.src : same as TI's above
- FRG.acc: 2 ..
- DST.acc: 1260217, ... , 1260234
- Location
/fs/szasmg3/dpuiu/Brugia_malayi/Data/nucmer_seq/Bm-all.frg
message  count
DST      15
FRG      1178192
LKG      530930
seqs     min  q1   q2   q3   max   mean  n50  sum
1178192  65   645  771  850  1214  724   800  853847771  => 8X
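The DST/FRG/LKG figures are counts of Celera Assembler messages in the .frg file, where each message opens with a line such as `{FRG` and closes with `}`. A sketch that recounts them, assuming only that layout. The `coverage` helper shows the arithmetic behind "=> 8X": 853,847,771 bases over an assumed genome size of 110 Mb gives about 7.8X, while the ~90 Mb figure would give about 9.5X.

```python
import re
from collections import Counter
from typing import Iterable

# A message starts with a line holding "{" plus a three-letter type, e.g. "{FRG".
MESSAGE_START = re.compile(r"^\{([A-Z]{3})\s*$")


def count_messages(lines: Iterable[str]) -> Counter:
    """Count FRG-file messages (DST, FRG, LKG, ...) by type."""
    counts: Counter = Counter()
    for line in lines:
        match = MESSAGE_START.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def coverage(total_bases: int, genome_size: int) -> float:
    """Fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size
```

Usage: pass an open file handle, e.g. `count_messages(fh)` on Bm-all.frg.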
Problems:
- All library insert sizes are underestimated
- The contaminant reads align at ~91-93% identity to the contaminant ctgs, while the Mt/We reads align at 99% identity to the finished Mt/We sequences. What %id threshold should be used to call contaminants?
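One way to check the underestimated insert sizes is to measure mates that land on the same contig and compare a robust center with the nominal INSERT_SIZE. A sketch, assuming inward-facing (innie) pairs with 0-based, end-exclusive coordinates; the median/MAD estimator and the 10% tolerance are choices made here, not part of the original analysis.

```python
import statistics
from typing import List, Tuple

MAD_TO_SD = 1.4826  # converts a MAD into a standard deviation for normal data


def innie_insert(fwd_start: int, rev_end: int) -> int:
    """Insert length of a facing pair on one contig (0-based, end-exclusive)."""
    return rev_end - fwd_start


def robust_insert_stats(inserts: List[int]) -> Tuple[float, float]:
    """Median and MAD-based standard deviation; resistant to chimeric outliers."""
    med = statistics.median(inserts)
    mad = statistics.median([abs(x - med) for x in inserts])
    return med, MAD_TO_SD * mad


def is_underestimated(nominal: float, observed: float, tol: float = 0.10) -> bool:
    """True when the observed insert exceeds the nominal size by more than tol."""
    return observed > nominal * (1 + tol)
```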
BACS
8,000 BAC clones at Children's Hospital Oakland Research Institute (CHORI). No NCBI Trace Archive (TA) submission!!!
PITT FTP data
CBCB Location:
/scratch1/brugia_malayi/Data/
/fs/szattic-asmg4/brugia_malayi/Data/
FTP access:
lftp -u bma 136.142.191.201
user: bma
pass: 6279
Empty as of 12:04, 8 January 2010 (EST) --Dpuiu
Elodie's table: /scratch1/brugia_malayi/brugia-sequencing-summary.txt.csv
#  date        protocol  platform  type            description                                Reads    Mates   run_name
1  01/17/2008  WGS       Standard  Full run (2/2)  Mix of worms (calibration of the machine)  ?        0       R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample
2  07/01/2008  3Kb       Standard  Full            single worm (pUC contamination)            492575   84341   R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1
3  09/11/2008  3kb       Standard  4/8 wells       single worm (pUC contamination)            263421   49258   R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1
4  10/01/2008  3Kb       Standard  Full            Mix of worms (still pUC contamination)     59711    5096    R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest
5  02/01/2009  WGS       Standard  1/4 wells       Mix of worms; regions 2 & 3 were myxoma    ?        0       R_2009_02_27_16_11_34_FLX10070260_adminrig_022709_GHEDIN
6  04/06/2009  WGS       Standard  1/4 wells       Mix of worms; with comp. bio run           ?        0       R_2009_04_15_14_46_56_FLX10070260_adminrig_041509_GHEDIN_r1-WGS1_r2-LMW4_r3-pool2compbio_r4-pool3compbio
7  05/01/2009  20Kb      Titanium  7/8 wells       Mix of worms                               631287   213524  R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1
8  10/28/2009  20Kb      Titanium  Full            Mix of worms                               1095713  377547  R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2
9  Pending     3 Kb      Titanium  Full            Mix of worms                               ?        ?       ?
Total: 2542707 reads, 729766 mates
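The total is the sum of the numeric Reads and Mates entries, with "?" treated as unknown (492,575 + 263,421 + 59,711 + 631,287 + 1,095,713 = 2,542,707 reads). A sketch that recomputes it from the CSV, assuming comma-separated fields, a header row containing `Reads` and `Mates`, and run numbers in the first column; the real file's exact layout is not shown on this page.

```python
import csv
from typing import Dict, Iterable, Optional, Sequence


def to_int(value: str) -> Optional[int]:
    """Parse a count such as '492575' or '1,095,713'; '?' and blanks give None."""
    value = value.strip().replace(",", "")
    return int(value) if value.isdigit() else None


def column_totals(lines: Iterable[str],
                  columns: Sequence[str] = ("Reads", "Mates")) -> Dict[str, int]:
    """Sum the given columns over numbered runs, skipping unknown entries."""
    reader = csv.reader(lines)
    header = [h.strip() for h in next(reader)]
    index = {c: header.index(c) for c in columns}
    totals = {c: 0 for c in columns}
    for row in reader:
        if not row or not row[0].strip().isdigit():
            continue  # skip blank lines and the "Total" summary row
        for c, i in index.items():
            n = to_int(row[i]) if i < len(row) else None
            if n is not None:
                totals[c] += n
    return totals
```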
SFF files counts:
run    count    linker  sff file
1      272923   -       R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample/D_2008_01_31_18_01_35_FLX10070260_adminrig_FullAnalysis/sff/E4RA0X101.sff
1      261899   -       R_2008_01_31_18_01_35_FLX10070260_adminrig_ghedintestsample/D_2008_01_31_18_01_35_FLX10070260_adminrig_FullAnalysis/sff/E4RA0X102.sff
2      228204   flx     R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1/D_2009_02_12_22_12_04_j_SignalProcessing/sff/FEZH5RS01.sff
2      264371   flx     R_2008_08_06_13_52_29_FLX10070260_adminrig_080608_Ghedin-BrugiaLTPE1/D_2009_02_12_22_12_04_j_SignalProcessing/sff/FEZH5RS02.sff
3      86862    flx     R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T02.sff
3      87488    flx     R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T03.sff
3      89071    flx     R_2008_09_19_14_17_55_FLX10070260_adminrig_091908_HATFULL-MIDrepeat_GHEDIN-LTPE1/FHAVB5T04.sff
4      13695    flx     R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM01.sff
4      14197    flx     R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM02.sff
4      15515    flx     R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM03.sff
4      16304    flx     R_2008_10_14_15_06_50_FLX10070260_adminrig_101408_GHEDIN-Brugia-pool_LTPEtest/FIOXLOM04.sff
5      18025    -       R_2009_02_27_16_11_34_FLX10070260_adminrig_022709_GHEDIN/FRLDXKV01.sff
6      118490   -       R_2009_04_15_14_46_56_FLX10070260_adminrig_041509_GHEDIN_r1-WGS1_r2-LMW4_r3-pool2compbio_r4-pool3compbio/D_2009_04_16_14_19_21_morty_fullProcessing/FT9KOI001.sff
7      73807    tit     R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY01.sff
7      91698    tit     R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY02.sff
7      93878    tit     R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY03.sff
7      90232    tit     R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY04.sff
7      97065    tit     R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY05.sff
7      94326    tit     R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY06.sff
7      90281    tit     R_2009_06_05_16_18_41_FLX10070260_adminrig_060509_GHEDIN_Brugia-gDNA-TI20kb1/D_2009_06_08_15_32_36_compute-0-2_fullProcessing/sff/FW1OXFY07.sff
8      551263   tit     R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2/F4H5CMB01.sff
8      544450   tit     R_2009_10_22_15_30_12_FLX10070260_adminrig_102209_GHEDIN_Brugia20kb2/F4H5CMB02.sff
total  3214044
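The per-file counts add up to the listed total of 3,214,044. Each count can be read from the SFF common header without decoding any reads: the file starts with the magic bytes `.sff`, and `number_of_reads` is a big-endian 32-bit field that follows the version, index offset and index length. A sketch based on that header layout; the helper names are hypothetical.

```python
import struct
from typing import Dict, Iterable

# Fixed part of the SFF common header, big-endian, 31 bytes:
# magic, version, index_offset, index_length, number_of_reads,
# header_length, key_length, number_of_flows_per_read, flowgram_format_code
SFF_HEADER = struct.Struct(">4s4sQIIHHHB")
SFF_MAGIC = b".sff"


def parse_read_count(raw: bytes) -> int:
    """Return number_of_reads from the first 31 bytes of an SFF file."""
    if len(raw) < SFF_HEADER.size:
        raise ValueError("too short for an SFF common header")
    fields = SFF_HEADER.unpack(raw[:SFF_HEADER.size])
    if fields[0] != SFF_MAGIC:
        raise ValueError("not an SFF file")
    return fields[4]


def sff_read_counts(paths: Iterable[str]) -> Dict[str, int]:
    """Map each SFF path to its read count, reading only the header."""
    counts = {}
    for path in paths:
        with open(path, "rb") as fh:
            counts[path] = parse_read_count(fh.read(SFF_HEADER.size))
    return counts
```

Summing `sff_read_counts(...).values()` over the files listed above should reproduce the total.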
454 Frg files
Location:
/fs/szattic-asmg4/brugia_malayi/Data/Frg/
Stats on the clear ranges (clr) of the seqs in the frg files:
seqs       min  q1   q2   q3   max   mean  n50  sum
3,297,077  3    156  248  301  2043  244   275  806,091,347  => 8X
454 Contaminant search
target              count
rodent              197,429
We (Wolbachia)      23,249
Mt (mitochondrion)  2,634
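Counts like these come from assigning each 454 read to the source of its best alignment. A sketch of that assignment, assuming alignments have already been reduced to (read id, target class, % identity) tuples; the class labels and per-class thresholds in the test are hypothetical, chosen to sit below the ~91-93% identity seen for contaminant reads and the 99% seen for Mt/We reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (read_id, target_class, percent_identity)


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-identity hit for each read."""
    best: Dict[str, Tuple[str, float]] = {}
    for read_id, target, pid in hits:
        if read_id not in best or pid > best[read_id][1]:
            best[read_id] = (target, pid)
    return best


def classify(hits: Iterable[Hit], min_id: Dict[str, float]) -> Counter:
    """Count reads per class; best hits below their class threshold are 'unassigned'."""
    labels: Counter = Counter()
    for target, pid in best_hits(hits).values():
        # Classes without a threshold in min_id are never assigned.
        passed = pid >= min_id.get(target, float("inf"))
        labels[target if passed else "unassigned"] += 1
    return labels
```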
Assemblies
TIGR
- 9X coverage, 856K Sanger traces => 8,200 scaff & 29,808 ctg (avg. scaff=~10K & avg ctg=~3K)
- "scaffolds totaling ~71 Mb of data with a further ~17.5 Mb of contigs not integrated into any scaffold (orphan contigs)" (Science 2007)
- NCBI: WGS project AAQA00000000, sequences AAQA01000001-AAQA01029808
  - 26,879 good ctgs
  - 2,929 jird contaminants (example: AAQA01001321, with 99% id hits to mouse)
good ctg len
      #elem  min    max     mean   median  n50    sum
all   26879  200    611244  3241   1005    19005  87119350
10K+  1224   10036  611244  41018  23135   60727  50206329

good ctg GC%
      #elem  min    max    mean   median  n50
all   26878  0.00   72.30  28.86  28.56   29.46
10K+  1224   24.38  38.44  30.38  30.43   30.62

contaminant ctg len
      #elem  min  max   mean  median  n50  sum
all   2929   200  8994  740   675     763  2167588

contaminant ctg GC%
      #elem  min    max    mean  median  n50
all   2929   18.09  75.96  44.1  43.59   44.80
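GC content separates the two groups on average (mean ~28.9% for the good ctgs vs ~44.1% for the contaminants), but the min/max columns overlap widely (0.00-72.30 vs 18.09-75.96), so a GC cutoff can only flag candidates for an alignment check. A sketch, assuming GC% is computed over unambiguous bases only; the 37% default cutoff is a hypothetical value roughly halfway between the two means, not one used in the original analysis.

```python
from typing import Dict, List


def gc_percent(seq: str) -> float:
    """GC% over A/C/G/T only; N and other ambiguity codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def flag_by_gc(contigs: Dict[str, str], cutoff: float = 37.0) -> List[str]:
    """Names of contigs whose GC% is at or above the cutoff (contaminant candidates)."""
    return [name for name, seq in contigs.items() if gc_percent(seq) >= cutoff]
```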
CBCB
- tested wgs-assembler (Celera Assembler) 5.1 on the filtered Sanger reads
- better assembly than the published one (e.g. contig N50 24,765 vs 19,005 for the TIGR good ctgs)
- Location:
/fs/szasmg3/dpuiu/Brugia_malayi/Assembly/Bm/2008_0826_CA
               #elem  min    max      mean   median  n50    sum
ctg            12753  273    376744   6113   1632    24765  77964006
ctg(10K+)      1553   10015  376744   33868  23216   40618  52597039
deg            9661   65     72494    1241   949     1008   11988997
ctg+deg        22414  65     376744   4013   1201    19411  89953003
ctg+deg(10K+)  1651   10015  376744   33019  22612   40060  54514638
scf            10317  935    3890532  8019   1538    41772  82730474
scf2K+         3656   2001   3890532  20189  5733    50293  73813083
reads: 1178192 (100%); singletons: 134119 (11.43%)
PITT
- Best so far
- Date: 11/05/08
- Location:
/fs/szasmg3/dpuiu/Brugia_malayi/Assembly/PITT/brugia_assemblies_110508.fasta
- Stats:
     elem  min   q1    q2    q3     max      mean   n50     sum
scf  3170  2000  2917  4483  14471  6534162  22916  112914  72643770
Files
- /fs/szattic/asmg1/adelcher/Genomes/Brugia : Art's files
- /fs/sztmpscratch/cole/tarchive_download/brugia_malay : Cole's files
- /fs/szasmg3/dpuiu/Brugia_malayi/ : Daniela's files
- /scratch1/brugia_malayi/Data/ : ftp PITT data
- /fs/szattic-asmg4/brugia_malayi : ftp PITT data (as well)