Data
/fs/szattic-asmg4/Bees/Bombus_impatiens
- There are 7 pairs of data files (paired ends): lanes 1-3 and 5-8 (lane 4 wasn't used)
- Issues to address:
1. Erroneous reads/bases, which we need to correct or discard
2. GC bias, which we need to correct for so we can compute a-stats properly
3. Redundancy in the long paired ends, which are lane 1 and lane 2.
Lane  Insert  #Reads
1     3Kbp    34,944,099
3     8Kbp    32,540,640
- Formatting: keep only the first 100bp of each read
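The trimming step could be sketched as below. This is only an illustration: it assumes the reads are in unwrapped 4-line FASTQ (the notes do not state the file format), and `trim_fastq` is a hypothetical helper name, not part of any pipeline used here.

```python
from typing import Iterable, Iterator, Tuple


def trim_fastq(lines: Iterable[str], keep: int = 100) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, seq, plus, qual) records cut to the first `keep` bases.

    Assumes unwrapped 4-line FASTQ records; a trailing incomplete record
    is silently dropped by zip().
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Passing the same iterator four times to zip() groups lines in fours.
    for header, seq, plus, qual in zip(*[stripped] * 4):
        # Sequence and quality strings must be cut to the same length.
        yield header, seq[:keep], plus, qual[:keep]


# Toy 120bp read, purely for illustration (not real data).
demo = ["@read1\n", "ACGT" * 30 + "\n", "+\n", "I" * 120 + "\n"]
trimmed = list(trim_fastq(demo))
```

For a real file, the same generator can be fed an open file handle, writing each yielded record back out as four lines.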
TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG - revcomp -> CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
AGCATATTGAAGCATATTACATACGATATGCTTCAATAATGC
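The relation between the three sequences above can be checked with a few lines of Python. This is a minimal sketch; the names `complement`, `revcomp`, `TOP`, `BOTTOM` and `TOP_REVCOMP` are chosen here for illustration only.

```python
# Translation table for the four bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Base-by-base complement, same orientation (the strand drawn underneath)."""
    return seq.translate(_COMPLEMENT)


def revcomp(seq: str) -> str:
    """Reverse complement: the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]


# Sequences copied from the notes above.
TOP = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
TOP_REVCOMP = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"
BOTTOM = "AGCATATTGAAGCATATTACATACGATATGCTTCAATAATGC"

print(revcomp(TOP) == TOP_REVCOMP)   # the "- revcomp ->" relation
print(complement(TOP) == BOTTOM)     # second line is the complementary strand
```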
Assembly
meryl -Dh -s 0-mercounts/asm-C-ms22-cm1 >! 22mers.hist
Found 3136399464 mers.
Found 379123530 distinct mers.
Found 201257394 unique mers.
Largest mercount is 12006651; 90 mers are too big for histogram.
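A few summary ratios can be derived from the counts reported above. The sketch below parses the three "Found ..." lines exactly as printed; `parse_meryl_summary` and `summarize` are hypothetical helper names, and the interpretation in the comments (singleton mers often reflecting sequencing errors) is a common rule of thumb, not something the meryl output itself states.

```python
import re

# Output lines copied from the meryl run above.
MERYL_OUTPUT = """\
Found 3136399464 mers.
Found 379123530 distinct mers.
Found 201257394 unique mers.
"""


def parse_meryl_summary(text: str) -> dict:
    """Return {'total': ..., 'distinct': ..., 'unique': ...} from the summary lines."""
    counts = {}
    for number, kind in re.findall(r"Found (\d+) (distinct |unique )?mers\.", text):
        # The optional group is '' for the plain "Found N mers." line.
        counts[kind.strip() or "total"] = int(number)
    return counts


def summarize(counts: dict) -> dict:
    """Derive simple ratios from the parsed counts."""
    return {
        # Average number of occurrences per distinct 22-mer.
        "mean_occurrence": counts["total"] / counts["distinct"],
        # Share of distinct 22-mers seen exactly once (often error-derived).
        "unique_fraction_of_distinct": counts["unique"] / counts["distinct"],
        # Share of all 22-mer occurrences that are singletons.
        "unique_fraction_of_total": counts["unique"] / counts["total"],
    }


stats = summarize(parse_meryl_summary(MERYL_OUTPUT))
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```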
most frequent 22mer: AGCATACATTATACGAAGTTAT, in ~16% of the seqs
most frequent 42mer: CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA, in ~10% of the seqs (pPAC7.9124-9165)
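A percentage like the ones above could be estimated by scanning the reads for the mer on either strand. This is a rough sketch, not the method actually used for these numbers: `fraction_containing` is a hypothetical helper, matching both strands is an assumption, and the toy reads are made up for the demonstration.

```python
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fraction_containing(reads: Iterable[str], mer: str) -> float:
    """Fraction of reads containing `mer` or its reverse complement."""
    mer_rc = revcomp(mer)
    total = hits = 0
    for read in reads:
        total += 1
        if mer in read or mer_rc in read:
            hits += 1
    return hits / total if total else 0.0


# Mers copied from the notes; the 22-mer is a substring of the 42-mer.
MER22 = "AGCATACATTATACGAAGTTAT"
MER42 = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"

# Toy reads (not real data): two carry the 22-mer, two do not.
toy_reads = [
    "AAAA" + MER22 + "CCCC",   # forward-strand hit
    revcomp(MER42),            # reverse-strand hit
    "ACGT" * 10,               # no hit
    "T" * 40,                  # no hit
]
fraction = fraction_containing(toy_reads, MER22)
```

On the full data set a plain substring scan like this would be slow; it is meant only to make the definition of "% of the seqs" concrete.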
/fs/szdevel/dpuiu/SourceForge/wgs-assembler.030210/Linux-amd64/bin/runCA