Cbcb talk:Pop-Lab:16S-pipeline 16S analysis pipeline (for Gates project)
Step 9: Run OLD clustering tool
- First generate clusters
Note: This part assumes we're running the whole set of sequences as one batch.
cd Analysis/Run[date]/Run[date].fna
/fs/szasmg2/ghodsi/Src/clusterk/clusterk7 -r 2 -i Run[date].fna > Run[date].fna.cluster
/fs/szasmg2/ghodsi/Src/clusterk/clusterk7 -r 2 -m -i Run[date].fna > Run[date].fna.align
The output is written to Run[date].fna.cluster, one cluster per line, with the cluster center listed as the first identifier on each line.
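To inspect the .cluster file, something like the following sketch may help. It relies only on the layout described above (one cluster per line, center first) and additionally assumes that identifiers on a line are separated by whitespace, which the page does not state. The function names cluster_sizes and cluster_centers are hypothetical helpers, not part of clusterk.

```shell
# Hypothetical helper: print "<center_id> <cluster_size>" for each cluster.
# Assumption: identifiers on a line are whitespace-separated.
cluster_sizes() {
    # $1 is the first identifier (the center); NF is the number of identifiers.
    awk 'NF > 0 { print $1, NF }' "$1"
}

# Hypothetical helper: print only the cluster centers, one per line.
cluster_centers() {
    awk 'NF > 0 { print $1 }' "$1"
}
```

For example, piping the output of cluster_sizes on the .cluster file through `sort -k2,2nr | head` would list the largest clusters first.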
The .align file contains aligned FASTA records for all the sequences in each cluster. Clusters are separated by #<number> where <number> is the number of sequences in the cluster.
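As a quick consistency check on the .align file, the sketch below counts the #<number> separator lines, sums the sizes they declare, and counts the FASTA header lines actually present. It assumes each separator sits on its own line starting with # followed by digits, and that every record header starts with >; the function name align_summary is hypothetical.

```shell
# Hypothetical helper: print "<clusters> <declared_sequences> <fasta_records>".
# Assumptions: separator lines look like "#12"; FASTA headers start with ">".
align_summary() {
    awk '
        /^#[0-9]+/ { clusters++; declared += substr($0, 2) + 0 }  # "#12" -> 12
        /^>/       { records++ }                                   # one per sequence
        END        { print clusters + 0, declared + 0, records + 0 }
    ' "$1"
}
```

If the last two numbers differ, either the file is truncated or the separator format differs from what is assumed here.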
- Then extract the cluster centers
From here on, the commands are the same in both full-run and batch modes.
/fs/szasmg2/ghodsi/Src/clusterk/fastaselect Run[date].fna < Run[date].fna.cluster > Run[date].centers.fna
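After extracting the centers, a sanity check along these lines may be useful. It is only a sketch and assumes that fastaselect writes exactly one FASTA record per cluster line, which this page does not confirm; check_centers is a hypothetical helper name.

```shell
# Hypothetical helper: compare the number of clusters with the number of
# FASTA records in the centers file.
# Usage: check_centers <cluster_file> <centers_fasta>
check_centers() {
    n_clusters=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$1")   # non-empty lines
    n_centers=$(awk '/^>/ { n++ } END { print n + 0 }' "$2")      # FASTA headers
    if [ "$n_clusters" -eq "$n_centers" ]; then
        echo "OK: $n_centers centers for $n_clusters clusters"
    else
        echo "MISMATCH: $n_centers centers for $n_clusters clusters" >&2
        return 1
    fi
}
```

It would be called with the .cluster file as the first argument and the .centers.fna file as the second.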