Cbcb talk:Pop-Lab:16S-pipeline 16S analysis pipeline (for Gates project)

From Cbcb

Latest revision as of 19:11, 27 September 2010

Step 9: Run OLD clustering tool

  • First generate clusters

Note: This part assumes we're running the whole set of sequences as one batch.

cd Analysis/Run[date]
/fs/szasmg2/ghodsi/Src/clusterk/clusterk7 -r 2 -i Run[date].fna > Run[date].fna.cluster
/fs/szasmg2/ghodsi/Src/clusterk/clusterk7 -r 2 -m -i Run[date].fna > Run[date].fna.align

Output is written to Run[date].fna.cluster, one cluster per line, with the cluster center listed as the first identifier.
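To get a quick feel for the clustering results, the .cluster file can be summarized with a couple of small shell functions. This is only a sketch and not part of the clusterk tools: the names cluster_summary and cluster_stats are made up for illustration, and it assumes the identifiers on each line are separated by whitespace.

```shell
# cluster_summary: print "<center><TAB><size>" for every non-empty line of a
# .cluster file (or stdin). Hypothetical helper; assumes whitespace-separated
# identifiers with the cluster center as the first field.
cluster_summary() {
    awk 'NF > 0 { print $1 "\t" NF }' "$@"
}

# cluster_stats: one-line overview of a .cluster file (or stdin):
# number of clusters, total sequences, and clusters of size 1.
cluster_stats() {
    awk 'NF > 0 { clusters++; seqs += NF; if (NF == 1) singletons++ }
         END { printf "clusters=%d sequences=%d singletons=%d\n", clusters, seqs, singletons }' "$@"
}
```

For example, `cluster_summary Run[date].fna.cluster | sort -k2,2nr | head` would list the ten largest clusters together with their centers.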

The .align file contains aligned FASTA records for all the sequences in each cluster. Clusters are separated by a #<number> marker, where <number> is the number of sequences in the cluster.
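Since every #<number> separator announces a cluster size, the announced sizes should add up to the number of FASTA records in the .align file. The hypothetical align_check function below is a sketch of that sanity check. It assumes each separator sits on a line of its own consisting of '#' followed only by digits; because it compares totals only, it does not depend on whether a separator comes before or after the cluster it describes.

```shell
# align_check: compare the cluster sizes announced by "#<number>" separator
# lines with the number of FASTA records (">" lines) in a .align file (or stdin).
# Hypothetical helper; assumes each separator is alone on its line.
align_check() {
    awk '/^#[0-9]+$/ { clusters++; announced += substr($0, 2) + 0; next }
         /^>/ { records++ }
         END {
             printf "clusters=%d announced=%d records=%d %s\n",
                    clusters, announced, records,
                    (announced == records ? "OK" : "MISMATCH")
         }' "$@"
}
```

Running `align_check Run[date].fna.align` should end in OK; MISMATCH suggests the file is incomplete or its format differs from what this sketch assumes.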


  • Then extract the cluster centers

From here on, the commands are the same in both full-run and batch modes.

/fs/szasmg2/ghodsi/Src/clusterk/fastaselect Run[date].fna < Run[date].fna.cluster > Run[date].centers.fna
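As a cross-check on the extracted centers, the same selection can be sketched with awk: keep every FASTA record whose identifier is the first entry (the cluster center) on some line of the .cluster file. select_centers is a hypothetical helper, not fastaselect's actual implementation, and it assumes the identifiers in the .cluster file match the first word of the FASTA header lines exactly (without the leading '>').

```shell
# select_centers CLUSTER_FILE FASTA_FILE
# Print the FASTA records whose identifier is the first field of some line
# in CLUSTER_FILE. Hypothetical helper; records are printed in FASTA file order.
select_centers() {
    awk 'FILENAME == ARGV[1] { if (NF > 0) centers[$1] = 1; next }
         /^>/ { keep = (substr($1, 2) in centers) }
         keep' "$1" "$2"
}
```

For example, `select_centers Run[date].fna.cluster Run[date].fna > centers.check.fna` should contain the same records as Run[date].centers.fna, though not necessarily in the same order, so `grep -c '^>'` on both files should give the same count.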