Cbcb talk:Pop-Lab:16S-pipeline 16S analysis pipeline (for Gates project)

Step 9: Run OLD clustering tool

  • First generate clusters

Note: This part assumes the whole set of sequences is being run as a single batch.

cd Analysis/Run[date]/Run[date].fna
/fs/szasmg2/ghodsi/Src/clusterk/clusterk7 -r 2 -i Run[date].fna > Run[date].fna.cluster
/fs/szasmg2/ghodsi/Src/clusterk/clusterk7 -r 2 -m -i Run[date].fna > Run[date].fna.align

Output is written to Run[date].fna.cluster, one cluster per line, with the cluster center listed as the first identifier on each line.
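A quick way to look at the cluster size distribution is to count the identifiers on each line. This is only a sketch: it assumes the identifiers on a line are separated by whitespace, which the notes above do not confirm, and the helper name cluster_sizes is made up for illustration.

```shell
# Hypothetical helper: read a .cluster file on stdin and print
# "<center id> <number of sequences>" for every non-empty line.
# ASSUMPTION: identifiers on a line are whitespace-separated.
cluster_sizes() {
    awk 'NF > 0 { print $1, NF }'
}
```

For example, cluster_sizes < Run[date].fna.cluster | sort -k2,2nr | head would list the ten largest clusters, if the assumption about the separator holds.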

The .align file contains aligned FASTA records for all the sequences in each cluster. Clusters are separated by lines of the form #<number>, where <number> is the number of sequences in the cluster.
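The counts in the separator lines give a cheap consistency check on the .align file: their sum should match the number of FASTA records. The sketch below assumes each separator sits on its own line as # followed only by digits, and that every record header starts with >; the function name align_summary is hypothetical.

```shell
# Hypothetical helper: summarize a .align file read from stdin.
# ASSUMPTIONS: separators are whole lines like "#12"; FASTA headers start with ">".
align_summary() {
    awk '
        /^#[0-9]+$/ { clusters++; declared += substr($0, 2) }  # "#<number>" separator
        /^>/        { records++ }                              # one FASTA record
        END { printf "clusters=%d declared=%d records=%d\n", clusters, declared, records }
    '
}
```

Running align_summary < Run[date].fna.align should report equal declared and records values if the file is complete and the format assumptions are right.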


  • Then extract the cluster centers

From here on, the commands are the same in both full-run and batch modes.

/fs/szasmg2/ghodsi/Src/clusterk/fastaselect Run[date].fna < Run[date].fna.cluster > Run[date].centers.fna
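After extraction it may be worth confirming that the centers file holds one sequence per cluster. The check below is a sketch under the assumption that fastaselect emits exactly one FASTA record for each line of the .cluster file; that behavior is inferred from the command above, not documented here, and check_centers is a hypothetical name.

```shell
# Hypothetical check: compare the number of clusters with the number of centers.
# Usage: check_centers <cluster file> <centers FASTA file>
# ASSUMPTION: one FASTA record is expected per non-empty line of the cluster file.
check_centers() (
    clusters=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$1")
    centers=$(awk '/^>/ { n++ } END { print n + 0 }' "$2")
    echo "clusters=$clusters centers=$centers"
    # Succeed only when the two counts agree.
    [ "$clusters" -eq "$centers" ]
)
```

Called as check_centers Run[date].fna.cluster Run[date].centers.fna, it prints both counts and returns a non-zero status when they differ.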