Brugia malayi
Revision as of 17:48, 23 June 2009
Articles
Genome Info
- 6 chromosomes: 1-5, XY
- ~ 90M, 30% GC, 32% coding, 15% repeats
Other sequences
- mitochondrion finished: 13,657 bp; 24% GC
- Wolbachia endosymbiont strain TRS of Brugia malayi complete: 1,080,084 bp; 34% GC (New England Biolabs)
- Wolbachia endosymbiont strain wMel progress (TIGR)
- Rodent: some trace contamination; Example: Mus musculus is ~40%GC
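Since the GC figures above differ between B. malayi (~30%) and rodent sequence (~40%), per-sequence GC content could serve as a rough first-pass contamination flag. A minimal sketch follows; the function name and the choice to ignore non-ACGT characters are assumptions, not something this page specifies.

```python
def gc_percent(seq):
    """Return GC content as a percentage of the unambiguous (A/C/G/T) bases.

    Assumption: ambiguous bases such as N are ignored, and a sequence
    with no unambiguous bases reports 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return 0.0
    return 100.0 * gc / acgt


# Tiny made-up sequences, only to show the call.
print(gc_percent("ATGC"))
print(gc_percent("atgcNN"))
```

GC alone cannot separate the two sources reliably, because individual contigs range widely around their means, so it is better treated as a hint than as a filter.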
Genome Project
Brugia malayi has a diploid genome of approximately 110 Mb, organized in 6 pairs of chromosomes (five pairs of autosomes and one pair of sex chromosomes). In addition to the nuclear genome, B. malayi has a mitochondrial genome of about 14 kb and harbors the bacterial endosymbiont Wolbachia sp., whose genome is 1-2 Mb.
The B. malayi genome project has been completed by The Institute for Genomic Research. Whole Genome Shotgun sequencing was used to obtain more than eight-fold coverage of the genome. The complete genome was assembled into approximately 8200 scaffolds and deposited in GenBank. The accession for the WGS project is AAQA00000000 and consists of sequences AAQA01000001-AAQA01029808.
- TIGR Genome project (TRS strain)
Traces
Libraries:
* 2K : bulk of the sequence @TIGR
* 15-20 K @TIGR
* 8,000 BAC clones @Children's Hospital Oakland Research Institute. (!!! no NCBI TA submission)
* 454 Titanium paired-end (NEW)
** 500Mbp per run in reads of ~350bp each.
** reads: 80bp to 500bp
** 30% - 50% of the reads contain two mate tags per read
** Each mate tag is ~150bp and the tags are separated by the 42bp "recombi" XLR linker sequence
** lib mean,std =~ 28K,6K
** duplication problem: 2-4 identical copies of each pair???
* [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?&cmd=retrieve&val=SPECIES_CODE%20%3D%20%22BRUGIA%20MALAYI%22&retrieve=Submit NCBI TA]
* [ftp://ftp.ncbi.nih.gov/pub/TraceDB/brugia_malayi/ NCBI TA FTP]
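The 454 paired-end notes above describe reads in which two mate tags flank a 42bp linker, plus a possible duplication problem. A minimal sketch of how such reads might be split and de-duplicated follows. The function names, the `min_tag_len` cutoff, and the exact-match search are assumptions for illustration. The linker string in the example is a made-up placeholder, not the real XLR linker, and real reads would likely need error-tolerant matching.

```python
def split_mate_tags(read, linker, min_tag_len=20):
    """Split a read at the first exact linker match.

    Returns (left_tag, right_tag), or None when the linker is absent or
    either flanking tag is shorter than min_tag_len (hypothetical cutoff).
    """
    pos = read.find(linker)
    if pos == -1:
        return None
    left = read[:pos]
    right = read[pos + len(linker):]
    if len(left) < min_tag_len or len(right) < min_tag_len:
        return None
    return left, right


def drop_duplicate_pairs(pairs):
    """Keep the first copy of each identical (left, right) pair, in order."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


# Placeholder only: NOT the real linker, just a made-up 42 bp string.
PLACEHOLDER_LINKER = "ACGTACGTAC" * 4 + "GG"
example_read = "A" * 150 + PLACEHOLDER_LINKER + "C" * 150
left_tag, right_tag = split_mate_tags(example_read, PLACEHOLDER_LINKER)
print(len(left_tag), len(right_tag))
```

Exact-duplicate removal only catches byte-identical pairs; near-identical copies that differ by a sequencing error would survive this pass.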
Trace summary:
* all: 1,260,215
* TRACE_TYPE_CODE
** WGS: 1,258,277
** TRANSPOSON: 1,437
** PRIMER_WALK: 501
* CENTER NAME
** TIGR: 856,624
** JCVI: 403,591
* NO BACS !!!; max INSERT_SIZE=23K
- TI's: 1172642810, ... ,1174845185
- SEQ_LIB_ID's : 1047111480027, ... , 1047174912885
FRG file:
- FRG.src : same as TI's above
- FRG.acc: 2 ..
- DST.acc: 1260217, ... , 1260234
Problems:
- All library insert sizes are underestimated
- The contaminant reads align at ~91-93% identity to the contaminant contigs, while the Mt/We reads align at 99% identity to the finished Mt/We sequences. What %identity threshold should be used for contaminants?
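One way to explore the threshold question is to compute percent identity per alignment and see which reads each candidate cutoff would keep. The sketch below assumes gapped, equal-length alignment strings with "-" for gaps and counts identity over all alignment columns. The function names and the cutoffs in the example are illustrative only, not a recommendation.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences share the same base.

    Assumption: both strings come from one pairwise alignment, so they
    have equal length, and '-' marks a gap (a gap never counts as a match).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def passes_threshold(identity, threshold):
    """True when an alignment's identity reaches the chosen cutoff."""
    return identity >= threshold


# Identities in the ranges quoted above, tested against two made-up cutoffs.
for identity in (91.0, 93.0, 99.0):
    print(identity, passes_threshold(identity, 90.0), passes_threshold(identity, 95.0))
```

With these illustrative cutoffs, 95% would retain the Mt/We reads but miss the contaminant reads, while 90% would retain both groups.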
Contigs
- NCBI AAQA00000000 AAQA01000001-AAQA01029808
* 26,879 good ctgs
* 2,929 jird contaminants (Example: AAQA01001321 : mouse 99%id hits)
good ctg len         #elem  min    max     mean   median  n50    sum
all                  26879  200    611244  3241   1005    19005  87119350
10K+                 1224   10036  611244  41018  23135   60727  50206329

good ctg GC%         #elem  min    max     mean   median  n50
all                  26878  0.00   72.30   28.86  28.56   29.46
10K+                 1224   24.38  38.44   30.38  30.43   30.62

contaminant ctg len  #elem  min    max     mean   median  n50    sum
all                  2929   200    8994    740    675     763    2167588

contaminant ctg GC%  #elem  min    max     mean   median  n50
all                  2929   18.09  75.96   44.1   43.59   44.80
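The length columns in the tables above (#elem, min, max, mean, median, n50, sum) can be reproduced from a list of contig lengths. The sketch below assumes the common N50 definition: the length of the contig at which the running total of lengths, sorted longest first, reaches half of the overall sum. The tool that produced the tables may use a slightly different convention, and the function names are hypothetical.

```python
from statistics import median


def n50(lengths):
    """Length at which the cumulative sum (longest first) reaches half the total."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def length_stats(lengths):
    """Summary columns matching the table header; mean is rounded to an integer."""
    return {
        "#elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(sum(lengths) / len(lengths)),
        "median": median(lengths),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


# Made-up lengths, only to show the call.
print(length_stats([2, 3, 4, 5, 6]))
```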
Assemblies
TIGR
- 9X coverage, 856K Sanger traces => 8,200 scaff & 29,808 ctg (avg. scaff=~10K & avg ctg=~3K)
- "scaffolds totaling ~71 Mb of data with a further ~17.5 Mb of contigs not integrated into any scaffold (orphan contigs)" (Science 2007)
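As a rough cross-check, the average sizes quoted for the TIGR assembly can be recomputed from figures elsewhere on this page. This sketch assumes that the 29,808 contigs are the good and contaminant contigs combined, and it treats "~71 Mb" as exactly 71,000,000 bases, so the scaffold figure is approximate.

```python
# Figures copied from the contig tables and the TIGR summary on this page.
good_ctg_count, good_ctg_bases = 26_879, 87_119_350
contaminant_ctg_count, contaminant_ctg_bases = 2_929, 2_167_588
scaffold_count = 8_200
scaffold_bases = 71_000_000  # "~71 Mb" in the quote, so only approximate

total_ctg_count = good_ctg_count + contaminant_ctg_count
mean_ctg_len = (good_ctg_bases + contaminant_ctg_bases) / total_ctg_count
mean_scaffold_len = scaffold_bases / scaffold_count

print(total_ctg_count)           # contig count to compare with 29,808
print(round(mean_ctg_len))       # compare with "avg ctg=~3K"
print(round(mean_scaffold_len))  # compare with "avg. scaff=~10K"
```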
CBCB
wgs 5.1 on filtered Sanger reads
               #elem  min    max      mean   median  n50    sum
ctg            12753  273    376744   6113   1632    24765  77964006
ctg(10K+)      1553   10015  376744   33868  23216   40618  52597039
deg            9661   65     72494    1241   949     1008   11988997
ctg+deg        22414  65     376744   4013   1201    19411  89953003
ctg+deg(10K+)  1651   10015  376744   33019  22612   40060  54514638
scf            10317  935    3890532  8019   1538    41772  82730474

reads          1178192 (100%)
singletons     134119 (11.43%)
!!! better assembly than the published one
- Location: /fs/szasmg3/dpuiu/Brugia_malayi/Assembly/Bm/2008_0826_CA
Files
* /fs/szattic/asmg1/adelcher/Genomes/Brugia : Art's files
* /fs/sztmpscratch/cole/tarchive_download/brugia_malay : Cole's files
* /fs/szasmg3/dpuiu/Brugia_malayi/ : Daniela's files