Bumblebee

From Cbcb

Revision as of 03:29, 4 March 2010

Data

  • Location:
 /fs/szattic-asmg4/Bees/Bombus_impatiens
  • There are 7 pairs of data files (paired ends): lanes 1-3 and 5-8 (lane 4 was not used)
  • Tasks to figure out:
1. Erroneous reads/bases, which we need to correct or discard
2. GC bias, so we can compute a-stats properly
3. Redundancy in the long paired ends (lanes 1 and 2).
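One way to start on task 2 (GC bias) is to bin reads by GC content and look at how the read counts are distributed across bins. A minimal Python sketch, assuming reads are available as plain strings; `gc_fraction`, `gc_histogram` and the 10-bin default are illustrative choices, not part of runCA or of the a-stat computation itself.

```python
from collections import Counter
from typing import Iterable


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among the A/C/G/T bases of seq (N and other symbols ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no called bases, e.g. an all-N read
    return (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(reads: Iterable[str], n_bins: int = 10) -> Counter:
    """Count reads per equal-width GC bin; bin i covers [i/n_bins, (i+1)/n_bins).

    The last bin also includes reads that are 100% GC.
    """
    hist: Counter = Counter()
    for seq in reads:
        # min() folds 100% GC reads into the last bin instead of creating bin n_bins
        hist[min(int(gc_fraction(seq) * n_bins), n_bins - 1)] += 1
    return hist


if __name__ == "__main__":
    reads = ["ACGT", "GGCC", "AATT", "ACNN"]
    print(sorted(gc_histogram(reads).items()))  # [(0, 1), (5, 2), (9, 1)]
```

Comparing such a histogram against the GC distribution of the assembled contigs is one hedged way to see whether low- or high-GC regions are under-sampled.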
 
  • Data stats
 Lane   Insert size   #Reads
 1      3Kbp          34,944,099
 3      8Kbp          32,540,640
  • Formatting: keep only the first 100 bp of each read
  • Circularization adaptors
 TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG - revcomp -> CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
 AGCATATTGAAGCATATTACATACGATATGCTTCAATAATGC - complement of the line above (not reversed)
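The adaptor strands listed above can be checked, and the 100 bp trimming applied, with a short Python sketch. It assumes the reads are stored as unwrapped 4-line FASTQ (the file format is not stated on this page); `complement`, `revcomp` and `trim_fastq` are hypothetical helpers, not tools from the wgs-assembler distribution.

```python
from typing import Iterable, Iterator, Tuple

# Complement table; N and lower case are kept so masked or ambiguous bases survive.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq: str) -> str:
    """Base-for-base complement in the same orientation (like the second adaptor line)."""
    return seq.translate(_COMPLEMENT)


def revcomp(seq: str) -> str:
    """Reverse complement: the other strand, read 5' to 3'."""
    return complement(seq)[::-1]


def trim_fastq(lines: Iterable[str], keep: int = 100) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, seq, '+', qual) with seq and qual cut to the first `keep` bases.

    Assumes unwrapped 4-line FASTQ records; an incomplete trailing record is dropped.
    """
    it = iter(lines)
    # zip over the same iterator four times groups consecutive lines into records
    for header, seq, plus, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n")[:keep],
               plus.rstrip("\n"), qual.rstrip("\n")[:keep])


if __name__ == "__main__":
    adaptor = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
    print(revcomp(adaptor))     # CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
    print(complement(adaptor))  # AGCATATTGAAGCATATTACATACGATATGCTTCAATAATGC
```

Quality strings are cut together with the sequence so each trimmed record stays a valid FASTQ entry.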

Assembly

  • Meryl
 meryl -Dh -s 0-mercounts/asm-C-ms22-cm1 >! 22mers.hist
 Found 3136399464 mers.
 Found 379123530 distinct mers.
 Found 201257394 unique mers.
 Largest mercount is 12006651; 90 mers are too big for histogram.
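The meryl totals can be sanity-checked by hand: the reported unique count is about 53% of the distinct count (201,257,394 / 379,123,530), and distinct 22-mers are about 12% of all 22-mers (379,123,530 / 3,136,399,464). A Python sketch for recomputing such totals from the histogram file, assuming each data line starts with a multiplicity followed by the number of distinct mers seen that many times (check this against the actual columns of 22mers.hist). Mers beyond the histogram range, such as the 90 reported as too big, are not in the file, so the total computed this way can fall short of meryl's own count.

```python
from typing import Iterable, Tuple


def summarize_histogram(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (total, distinct, unique) mer counts from a k-mer histogram.

    Assumes each data line starts with: multiplicity, number of distinct mers
    with that multiplicity. Extra columns are ignored; non-numeric lines skipped.
    """
    total = distinct = unique = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # blank line, header, or comment
        multiplicity, n_mers = int(fields[0]), int(fields[1])
        total += multiplicity * n_mers  # every occurrence of every mer
        distinct += n_mers              # each mer counted once
        if multiplicity == 1:
            unique += n_mers            # mers seen exactly once
    return total, distinct, unique
```

Usage would be along the lines of `with open("22mers.hist") as fh: summarize_histogram(fh)`.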
  • countKmers
 most frequent 22mer :                 AGCATACATTATACGAAGTTAT     ~16% of the seqs
 most frequent 42mer : CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA ~10% of the seqs (pPAC7.9124-9165)
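Percentages like these can be approximated with a plain substring scan over the reads. This Python sketch is not the countKmers implementation (its source is not shown here); `fraction_containing` and its `both_strands` option are hypothetical, since it is not stated whether countKmers also counts the reverse-complement strand.

```python
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fraction_containing(reads: Iterable[str], kmer: str, both_strands: bool = True) -> float:
    """Fraction of reads containing kmer (and, optionally, its reverse complement)."""
    targets = {kmer, revcomp(kmer)} if both_strands else {kmer}
    n_reads = n_hits = 0
    for seq in reads:
        n_reads += 1
        if any(t in seq for t in targets):
            n_hits += 1  # a read is counted once even if it has several copies
    return n_hits / n_reads if n_reads else 0.0


if __name__ == "__main__":
    reads = [
        "GGAGCATACATTATACGAAGTTATGG",  # forward 22-mer
        "ATAACTTCGTATAATGTATGCT",      # reverse complement of the 22-mer
        "CCCCCCCC",
        "AAAAAAAA",
    ]
    print(fraction_containing(reads, "AGCATACATTATACGAAGTTAT"))         # 0.5
    print(fraction_containing(reads, "AGCATACATTATACGAAGTTAT", False))  # 0.25
```

Because the most frequent 22-mer lies inside the most frequent 42-mer, reads carrying the full adaptor are counted by both scans.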
 
  • Location of runCA:
 /fs/szdevel/dpuiu/SourceForge/wgs-assembler.030210/Linux-amd64/bin/runCA