Culex pipiens symbiont: Difference between revisions
Revision as of 17:03, 9 August 2007
Data Sources
NCBI:
Sanger: Wolbachia pipientis endosymbiont of Culex quinquefasciatus
- [ http://www.sanger.ac.uk/Projects/W_pipientis/ Genome Project]
- [ ftp://ftp.sanger.ac.uk/pub/pathogens/Wolbachia/ FTP]
JCVI:
Read Counts
Assembly
2006_1226_WGA
1. All cpqg reads were downloaded from the TA (July 2006). The reads were
   grouped by library and the clear range was computed. The download
   contained 6.6M reads, compared with 7.3M available now. Unfortunately,
   I noticed this difference only at the end of my experiment.

2. The Wolbachia endosymbiont of Culex quinquefasciatus assembly was
   downloaded from the Sanger ftp site
   ( ftp://ftp.sanger.ac.uk/pub/pathogens/Wolbachia/Wb_Cq.dbs ). The file
   contains 95 sequences, most of them very short. The name, length and
   GC% of the longest 10 are listed below:

   name                           length(bp)   gc%
   culex173d08.p1k                   1457497   34.17
   culexbac1d10Bg07.p1k                24726   35.11
   culex3d09.p1k                       15587   21.81
   culex166f03.q1k                     13962   36.17
   culex_1177_1189-1a02.w2k1177        13564   37.10
   culex26b07.p1k                       9245   35.53
   culex174d04.p1k                      8832   33.64
   J28015Ag08.q1ka                      7809   36.04
   culex180e07.p1k                      6960   36.59
   culex53a02.p1k                       5343   33.58

3. The cpqg random reads (clr only) were aligned to the symbiont
   sequences using nucmer (default parameters).

4. The nucmer output was analyzed. Many of the short symbiont sequences
   (2-3 kb in length) have a higher than expected number of alignments,
   which suggests repeats. To avoid the repeats, I selected only the reads
   that aligned to the longest 10 symbiont sequences (see above).

5. A threshold of 95% identity and a minimum alignment length of 400 bp
   was used to identify the symbiont reads. There were 29,110 unique reads
   (30,690 reads+mates) selected. Below is a per-library breakdown
   (reads+mates):

   library                                               reads+mates
   MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_01-G-CULEX-10KB           9581
   MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_06-G-CULEX-10KB           4549
   G818P4                                                       3784
   G818P2                                                       3478
   G818P1                                                       2238
   G818F1                                                       1283
   MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_02-G-CULEX-4KB            1156
   MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_03-F-CULEX-40KB            738
   G818P3                                                        723
   MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_07-G-CULEX-10KB            556
   MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_05-F-CULEX-40KB            327
   MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_04-F-CULEX-40KB            185
   1099522705601                                                  99
   G809K1                                                         89
   1099499586718                                                  77
   G772K1                                                         12
   G771K1                                                         10
   G766BES1                                                        4
   1099641499000                                                   2
6. The reads were assembled using the runCA-OBT.pl script (default
   parameters). The assembly is available under:
   /usr/local/projects/CPQG/dpuiu/walbachia/2006_1226_OBT/
   Most of the reads were assembled into 3 large scaffolds. There is
   mate-pair evidence (outie mates) that the largest scaffold is circular.

...

7. The scaffolds/contigs were aligned to the longest 10 Wolbachia
   endosymbiont sequences. Most of the long alignments were at over 99%
   identity. However, several large rearrangements were observed.
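
The read selection described in steps 4 and 5 can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the `Alignment` record, its field names, the function names and the example data are all hypothetical, and the sketch assumes the nucmer alignments have already been parsed into one record per alignment. It treats both thresholds as inclusive (>= 95% identity, >= 400 bp), which is an assumption, and it counts reads only, without adding mates.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set


class Alignment(NamedTuple):
    """One read-to-reference alignment (hypothetical record layout)."""
    read_id: str     # read identifier
    library: str     # library the read belongs to
    ref_id: str      # symbiont sequence the read aligned to
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp


def select_symbiont_reads(
    alignments: Iterable[Alignment],
    allowed_refs: Set[str],
    min_identity: float = 95.0,
    min_length: int = 400,
) -> Dict[str, str]:
    """Return {read_id: library} for reads passing the filters.

    A read is kept if at least one of its alignments hits a reference in
    allowed_refs with identity >= min_identity and length >= min_length.
    Each read is reported once, however many alignments it has.
    """
    selected: Dict[str, str] = {}
    for aln in alignments:
        if aln.ref_id not in allowed_refs:
            continue  # skip the short, repeat-prone reference sequences
        if aln.identity >= min_identity and aln.length >= min_length:
            selected.setdefault(aln.read_id, aln.library)
    return selected


def per_library_counts(selected: Dict[str, str]) -> Counter:
    """Count selected reads per library (reads only, mates not added)."""
    return Counter(selected.values())


# Made-up example data, for illustration only.
example = [
    Alignment("r1", "libA", "culex173d08.p1k", 99.1, 650),
    Alignment("r1", "libA", "culex173d08.p1k", 97.0, 420),  # same read again
    Alignment("r2", "libA", "culex173d08.p1k", 94.9, 800),  # identity too low
    Alignment("r3", "libB", "culex3d09.p1k", 96.0, 399),    # too short
    Alignment("r4", "libB", "short_contig", 99.0, 700),     # not an allowed ref
    Alignment("r5", "libB", "culex3d09.p1k", 95.0, 400),    # exactly at thresholds
]
allowed = {"culex173d08.p1k", "culex3d09.p1k"}

selected = select_symbiont_reads(example, allowed)
counts = per_library_counts(selected)
```

In this sketch `allowed` stands in for the names of the longest 10 symbiont sequences. Reproducing the reads+mates figures would additionally require a read-to-mate mapping, which is not shown here.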