Culex pipiens symbiont

From Cbcb
Revision as of 17:02, 9 August 2007 by Dpuiu (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Data Sources

NCBI:

Sanger: Wolbachia pipientis endosymbiont of Culex quinquefasciatus

JCVI:

Read Counts

Assembly

2006_1226_WGA

 1. All cpqg reads have been downloaded from the TA (July 2006). The
 reads have been grouped by libraries and the clear range has been
 computed. There were 6.6M reads in the download compared with 7.3M now.
 Unfortunately I've only noticed this difference at the end of my experiment.
 2. The Wolbachia endosymbiont of Culex quinquefasciatus assembly has
 been downloaded from the Sanger ftp site
 ( ftp://ftp.sanger.ac.uk/pub/pathogens/Wolbachia/Wb_Cq.dbs ) ; there are
 95 sequences in this file. Most of them are very short. Below are listed
 the name,length & gc% of the longest 10:
 
    name                        length(bp)   gc%
    culex173d08.p1k               1457497    34.17
    culexbac1d10Bg07.p1k            24726    35.11
    culex3d09.p1k                   15587    21.81
    culex166f03.q1k                 13962    36.17
    culex_1177_1189-1a02.w2k1177    13564    37.10
    culex26b07.p1k                   9245    35.53
    culex174d04.p1k                  8832    33.64
    J28015Ag08.q1ka                  7809    36.04
    culex180e07.p1k                  6960    36.59
    culex53a02.p1k                   5343    33.58
 3. The /cpqg random reads (clr only) have been aligned to symbiont
 sequences using nucmer (default parameters)
 4.  The nucmer output has been analyzed. It's been noticed that many of
 the short symbiont sequences (2-3KB in length) have a higher than
 expected number of alignments. To avoid the repeats I've selected only
 the reads that aligned to the longest 10 symbiont sequences (see above).

 5. A 95% identity and minimum of 400 bp alignment thold has been used to
 determine the symbiont reads. There were 29,110 unique reads (30,690
 reads+mates) selected. Below is a per library breakdown (reads+mates):
    MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_01-G-CULEX-10KB      9581
    MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_06-G-CULEX-10KB      4549
    G818P4                                                  3784
    G818P2                                                  3478
    G818P1                                                  2238
    G818F1                                                  1283
    MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_02-G-CULEX-4KB       1156
    MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_03-F-CULEX-40KB      738
    G818P3                                                  723
    MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_07-G-CULEX-10KB      556
    MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_05-F-CULEX-40KB      327
    MSC-CULEX-PIPIENS-QUINQUEFASCIATUS_04-F-CULEX-40KB      185
    1099522705601                                           99
    G809K1                                                  89
    1099499586718                                           77
    G772K1                                                  12
    G771K1                                                  10
    G766BES1                                                4
    1099641499000                                           2
 6. The reads have been assembled using the runCA-OBT.pl script (default
 parameters). The assembly is available under:
 /usr/local/projects/CPQG/dpuiu/walbachia/2006_1226_OBT/
 
 Most of the reads got assembled into 3 large scaffolds. There is mate
 pair evidence (outie mates) that the largest scaffold is circular.
 ...

7. The scaffolds/contigs have been aligned to longest 10 Wolbachia
 endosymbiont sequences. Most of the long alignments were at over 99%
 identity. However, several large rearrangements have been noticed.