Cbcb:Pop-Lab:Challenges - Revision history

Tgibbons at 16:38, 20 January 2010

2010-01-20T16:38:47Z

← Older revision		Revision as of 16:38, 20 January 2010
Line 1:		Line 1:
	* comparative assembly of metagenomic data with thousands of references		* comparative assembly of metagenomic data with thousands of references

	The basic idea here is that we have pretty good software for doing comparative assembly once you're settled on a genome to use as a reference. What if you have a metagenomic ~~datasets~~ and thousands of reference genomes? Can you do better than simply running the data-set against each of the genomes and combining the results afterwards? There are some issues here of both how you pick the correct genomes to use as references, and how you store the genomes and/or sequences in order to efficiently do the comparative assembly.		The basic idea here is that we have pretty good software for doing comparative assembly once you're settled on a genome to use as a reference. What if you have a metagenomic dataset and thousands of reference genomes? Can you do better than simply running the data-set against each of the genomes and combining the results afterwards? There are some issues here of both how you pick the correct genomes to use as references, and how you store the genomes and/or sequences in order to efficiently do the comparative assembly.

	* visualization tools for large assembly graphs		* visualization tools for large assembly graphs

Mpop at 19:09, 11 November 2009

2009-11-11T19:09:34Z

← Older revision		Revision as of 19:09, 11 November 2009
Line 10:		Line 10:

	There are quite a few examples of genomic structures that bacteria use to rapidly generate antigenic variation, e.g. by expressing different types of proteins on the surface. These structures usually involve repeats that allow the genome to rearrange. What do these regions look like in genome assembly graphs? Can you find putative hypervariable loci by looking at the assembly graphs?		There are quite a few examples of genomic structures that bacteria use to rapidly generate antigenic variation, e.g. by expressing different types of proteins on the surface. These structures usually involve repeats that allow the genome to rearrange. What do these regions look like in genome assembly graphs? Can you find putative hypervariable loci by looking at the assembly graphs?

			* pooling of samples for assembly

			Most metagenomic projects will focus on multiple samples/individuals, yet, due to cost constraints, each sample will only be thinly covered by sequencing data, so that only the most abundant organisms can be assembled. A simple solution is to mix together multiple samples prior to assembly. How would you do this, however, if you have too much data (either too many samples, or too many reads in each sample)? Also, how would you deal with polymorphisms introduced by this pooling approach (e.g. when different samples contain slightly different variants of the same organism)?
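One possible starting point for the "too much data" case is to down-sample the pooled reads to a fixed budget while remembering which sample each read came from, so that variants seen after assembly can be traced back to their source. The sketch below uses reservoir sampling (Algorithm R). The function name `pooled_reservoir`, the `capacity` parameter, and the toy reads are assumptions made for illustration, not part of any existing pipeline.

```python
import random

def pooled_reservoir(samples, capacity, seed=0):
    """Keep a uniform random subset of `capacity` reads from all samples.

    `samples` maps a sample name to an iterable of reads. Each kept read is
    stored together with its sample name so variants can be traced later.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for sample_name, reads in samples.items():
        for read in reads:
            seen += 1
            if len(reservoir) < capacity:
                # Fill the reservoir until it reaches the budget.
                reservoir.append((sample_name, read))
            else:
                # Algorithm R: keep the new read with probability capacity / seen.
                slot = rng.randrange(seen)
                if slot < capacity:
                    reservoir[slot] = (sample_name, read)
    return reservoir

# Toy input, invented for illustration only.
samples = {
    "sample_1": [f"s1_read_{i}" for i in range(50)],
    "sample_2": [f"s2_read_{i}" for i in range(200)],
}
pool = pooled_reservoir(samples, capacity=20)
```

Uniform sampling keeps each sample's share of the pool roughly proportional to its size, which may or may not be what you want; sampling a fixed number of reads per sample is an equally reasonable alternative.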

Mpop at 18:39, 11 November 2009

2009-11-11T18:39:09Z

New page

* comparative assembly of metagenomic data with thousands of references

The basic idea here is that we have pretty good software for doing comparative assembly once you're settled on a genome to use as a reference. What if you have a metagenomic datasets and thousands of reference genomes? Can you do better than simply running the data-set against each of the genomes and combining the results afterwards? There are some issues here of both how you pick the correct genomes to use as references, and how you store the genomes and/or sequences in order to efficiently do the comparative assembly.
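As a rough illustration of the reference-selection question, one option is to index all reference genomes once by their k-mers and let each read vote for the references it shares a k-mer with, rather than running the data set against every genome separately. This is only a sketch under simplifying assumptions: exact k-mer matches, in-memory Python dictionaries, forward strand only, and toy sequences. The names `build_index` and `rank_references` are hypothetical.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index

def rank_references(reads, index, k):
    """Count, per reference, how many reads share at least one k-mer with it."""
    hits = Counter()
    for read in reads:
        matched = set()
        for kmer in kmers(read, k):
            matched |= index.get(kmer, set())
        hits.update(matched)
    return hits.most_common()

# Toy data, invented for illustration only.
references = {
    "genome_A": "ACGTACGTTAGCCGATAGGCTA",
    "genome_B": "TTTTGGGGCCCCAAAATTTTGG",
    "genome_C": "GATTACAGATTACAGATTACAG",
}
reads = ["ACGTTAGC", "CCGATAGG", "GGGGCCCC", "NNNNNNNN"]

index = build_index(references, k=5)
ranking = rank_references(reads, index, k=5)
print(ranking)
```

Only the references near the top of the ranking would then be passed to the comparative assembler. With thousands of real genomes the index would need a more compact representation than Python sets.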

* visualization tools for large assembly graphs

How do you display large assembly graphs so that biologists can look for interesting patterns, such as population structure in closely related organisms?

* "interesting" patterns in assembly graphs

There are quite a few examples of genomic structures that bacteria use to rapidly generate antigenic variation, e.g. by expressing different types of proteins on the surface. These structures usually involve repeats that allow the genome to rearrange. What do these regions look like in genome assembly graphs? Can you find putative hypervariable loci by looking at the assembly graphs?
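To make the last question concrete, the sketch below scans a directed assembly graph for the simplest candidate pattern: a "bubble" in which one node fans out into several alternative branches that immediately rejoin at a shared node. Whether real hypervariable loci look like this is exactly the open question; repeat-mediated rearrangements are likely to produce more tangled structures. The adjacency-dictionary representation and the function name `simple_bubbles` are assumptions made for illustration.

```python
def simple_bubbles(graph):
    """Return (source, sink, branches) for every simple bubble.

    `graph` maps a node to the list of its successors. A simple bubble is a
    source with two or more distinct successors, each of which has exactly
    one successor, and all of which lead to the same sink.
    """
    bubbles = []
    for source, successors in graph.items():
        branches = sorted(set(successors))
        if len(branches) < 2:
            continue
        sinks = set()
        for branch in branches:
            nxt = graph.get(branch, [])
            if len(nxt) != 1:
                # This branch forks or dead-ends, so it is not a simple bubble.
                sinks = set()
                break
            sinks.add(nxt[0])
        if len(sinks) == 1:
            bubbles.append((source, sinks.pop(), branches))
    return bubbles

# Toy graph, invented for illustration: three variants between B and D.
graph = {
    "A": ["B"],
    "B": ["C1", "C2", "C3"],
    "C1": ["D"],
    "C2": ["D"],
    "C3": ["D"],
    "D": ["E"],
    "E": [],
}
found = simple_bubbles(graph)
```

A locus with many parallel branches between the same pair of nodes could be flagged for closer inspection. The check looks only at outgoing edges, so a fuller version would also confirm that no other path enters the branch nodes.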