Cbcb:Pop-Lab:Challenges
- comparative assembly of metagenomic data with thousands of references
The basic idea here is that we have pretty good software for doing comparative assembly once you've settled on a genome to use as a reference. What if you have a metagenomic dataset and thousands of reference genomes? Can you do better than simply running the dataset against each of the genomes and combining the results afterwards? There are two issues here: how you pick the correct genomes to use as references, and how you store the genomes and/or sequences so that the comparative assembly can be done efficiently.
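As a rough illustration of the reference-selection question, the sketch below ranks candidate references by the fraction of their k-mers that also occur in the reads, and keeps only those above a threshold. This is one possible pre-filter, not a method this page proposes: the function names, the values of `k` and `min_fraction`, and the toy sequences are all hypothetical, and reverse complements and sequencing errors are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shortlist_references(reads, references, k=4, min_fraction=0.5):
    """Rank references by the fraction of their k-mers found in the reads.

    reads: iterable of read sequences (strings)
    references: dict mapping a reference name to its sequence
    Returns a list of (name, fraction) pairs, best first, keeping only
    references whose fraction is at least min_fraction.
    """
    # Pool every k-mer observed anywhere in the read set.
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    scores = {}
    for name, seq in references.items():
        ref_kmers = kmers(seq, k)
        if not ref_kmers:
            continue  # reference shorter than k, nothing to score
        scores[name] = len(ref_kmers & read_kmers) / len(ref_kmers)

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= min_fraction]


# Toy data (made up): the reads were cut from genomeA only.
references = {"genomeA": "ACGTACGTTAGC", "genomeB": "GGGGCCCCGGGG"}
reads = ["ACGTACG", "CGTTAGC"]
shortlist = shortlist_references(reads, references)
```

Only the shortlisted genomes would then be passed to the comparative assembler. Holding thousands of full k-mer sets in memory is unlikely to scale, which is where the storage question above comes in.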
- visualization tools for large assembly graphs
How do you display large assembly graphs so that biologists can look for interesting patterns in the population structure of closely related organisms?
- "interesting" patterns in assembly graphs
There are quite a few examples of genomic structures that bacteria use to rapidly generate antigenic variation, e.g., by expressing different types of proteins on the surface. These structures usually involve repeats that allow the genome to rearrange. What do these regions look like in genome assembly graphs? Can you find putative hypervariable loci by looking at the assembly graphs?
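One candidate pattern to start from is the simple "bubble": a node whose branches all rejoin at the same downstream node, which is how alternative versions of a region between shared flanks commonly appear in an assembly graph. Whether antigenic-variation loci actually look like this is the open question above. The sketch assumes a hypothetical graph representation (a dict mapping each node to its list of successors) and made-up node names.

```python
def find_simple_bubbles(graph):
    """Find nodes whose branches all converge on one shared successor.

    graph: dict mapping each node to a list of its successor nodes.
    Returns a list of (source, branches, sink) tuples.
    """
    bubbles = []
    for source, successors in graph.items():
        branches = sorted(set(successors))
        if len(branches) < 2:
            continue  # no divergence at this node

        # Successor sets of each branch node.
        branch_targets = [set(graph.get(branch, [])) for branch in branches]

        # Every branch must lead to exactly one node, and it must be
        # the same node for all branches.
        if all(len(targets) == 1 for targets in branch_targets):
            sinks = set.union(*branch_targets)
            if len(sinks) == 1:
                bubbles.append((source, branches, sinks.pop()))
    return bubbles


# Toy graph (made up): two alternative paths between flank_left and flank_right.
graph = {
    "flank_left": ["variant_1", "variant_2"],
    "variant_1": ["flank_right"],
    "variant_2": ["flank_right"],
    "flank_right": ["downstream"],
    "downstream": [],
}
bubbles = find_simple_bubbles(graph)
```

Repeat-mediated rearrangements may well produce more tangled structures (cycles, nested or multi-node branches) that this check would miss, so it is at most a first filter.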