Overview
The StrepSuis-AMRVirKM module clusters Streptococcus suis strains by their antimicrobial resistance patterns, virulence factors, and MIC profiles. It uses K-Modes, an unsupervised clustering algorithm suited to binary categorical genomic data.
Key Features
- Automatic K Selection: Determines optimal number of clusters using silhouette analysis
- Multiple Correspondence Analysis (MCA): Dimensionality reduction for visualization
- Bootstrap Validation: Confidence intervals for cluster assignments (500+ iterations)
- Feature Importance: Chi-square tests with FDR correction to identify discriminative features
- Association Rules: Apriori algorithm to find feature co-occurrence patterns
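The automatic K selection listed above can be illustrated with a small pure-Python sketch. It is not the module's implementation: the helper names (`k_modes`, `silhouette`, `select_k`), the farthest-first initialisation, and the toy profiles are all assumptions made for illustration. The sketch clusters binary profiles with a minimal K-Modes loop under Hamming distance, then keeps the K with the highest mean silhouette.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, k, max_iter=20):
    """Minimal K-Modes; returns one cluster label per row."""
    # Deterministic farthest-first initialisation (an illustrative choice).
    modes = [tuple(rows[0])]
    while len(modes) < k:
        farthest = max(rows, key=lambda r: min(hamming(r, m) for m in modes))
        modes.append(tuple(farthest))
    labels = None
    for _ in range(max_iter):
        # Assign every row to its nearest mode (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda c: hamming(r, modes[c])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Update each mode to the column-wise most frequent value of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                modes[c] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels


def silhouette(rows, labels):
    """Mean silhouette coefficient using Hamming distance."""
    clusters = sorted(set(labels))
    scores = []
    for i, (row, lab) in enumerate(zip(rows, labels)):
        mean_dist = {}
        for c in clusters:
            dists = [hamming(row, rows[j]) for j, l in enumerate(labels) if l == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        # Singleton clusters (or a single cluster overall) score 0 by convention.
        if lab not in mean_dist or len(mean_dist) < 2:
            scores.append(0.0)
            continue
        a = mean_dist[lab]
        b = min(d for c, d in mean_dist.items() if c != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(rows, candidates):
    """Return the candidate K with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(rows, k_modes(rows, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy binary profiles (made up): two clearly separated groups of strains.
profiles = [
    (1, 1, 1, 0, 0, 0), (1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0),
    (0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 1), (0, 0, 1, 1, 1, 1),
]
best_k, scores = select_k(profiles, [2, 3])
print(best_k, scores)
```

On real data a maintained K-Modes library with several random restarts would be the more sensible choice; the deterministic initialisation here only keeps the example reproducible.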
When to Use
Use this tool when you want to:
- Identify natural groupings of bacterial strains
- Discover resistance patterns across your dataset
- Find strains with similar phenotypic profiles
- Explore relationships between AMR genes and virulence factors
Input Files Required
- MIC.csv: Minimum Inhibitory Concentration data
- AMR_genes.csv: Antimicrobial resistance genes
- Virulence.csv: Virulence factors
Format: each file is a CSV whose first column is "Strain_ID"; all other columns are binary (0/1).
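Checking the input format before a run can save a failed analysis. The sketch below is an assumption about how such a check might look, not part of the tool: `load_binary_table` is a hypothetical helper, and the strain IDs and column names in the sample are invented. It uses only the standard library and rejects files whose first column is not "Strain_ID" or whose feature cells are not 0 or 1.

```python
import csv
import io


def load_binary_table(handle):
    """Read a Strain_ID + binary-feature CSV; return (feature_names, {strain: values})."""
    reader = csv.reader(handle)
    header = next(reader)
    if not header or header[0] != "Strain_ID":
        raise ValueError('first column must be named "Strain_ID"')
    features = header[1:]
    table = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} cells, got {len(row)}")
        strain, values = row[0], row[1:]
        bad = [f for f, v in zip(features, values) if v not in ("0", "1")]
        if bad:
            raise ValueError(f"line {line_no}: non-binary value in column(s) {bad}")
        if strain in table:
            raise ValueError(f"line {line_no}: duplicate Strain_ID {strain!r}")
        table[strain] = [int(v) for v in values]
    return features, table


# Invented sample in the expected layout (column names are placeholders).
SAMPLE_CSV = "Strain_ID,gene_a,gene_b\nS001,1,0\nS002,0,1\n"

features, table = load_binary_table(io.StringIO(SAMPLE_CSV))
print(features, table)
```

For a file on disk, the same function can be called as `load_binary_table(open("AMR_genes.csv", newline=""))`.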
Output Files
- HTML Report: Interactive tables with sorting, filtering, and Plotly visualizations
- Excel Workbook: Multi-sheet with metadata, results, and chart index
- PNG Charts: High-resolution figures (150+ DPI) for publications
Statistical Methods
- K-Modes clustering with categorical distance metrics
- Silhouette coefficient for cluster quality assessment
- Chi-square test with Benjamini-Hochberg FDR correction
- Bootstrap resampling for confidence intervals
- Multiple Correspondence Analysis (MCA)
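As a rough illustration of the chi-square test with Benjamini-Hochberg correction, the sketch below tests each feature in a 2x2 table (feature present or absent, strain inside or outside one cluster) and then adjusts the p-values. The one-vs-rest 2x2 layout, the function names, and the counts are assumptions for illustration; the module itself may build its contingency tables differently. With one degree of freedom, the chi-square p-value equals erfc(sqrt(stat / 2)), so no external library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a zero margin carries no information
    stat = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up counts per feature:
# (present in cluster, absent in cluster, present outside, absent outside)
counts = {
    "feature_1": (10, 0, 0, 10),
    "feature_2": (6, 4, 5, 5),
    "feature_3": (5, 5, 5, 5),
}
raw = {name: chi_square_2x2(*table)[1] for name, table in counts.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name in counts:
    print(f"{name}: p={raw[name]:.4g}, FDR-adjusted p={adjusted[name]:.4g}")
```

Features whose adjusted p-value falls below the chosen FDR threshold (0.05 is a common choice) would be reported as discriminative for that cluster.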
Example Workflow
- Upload your CSV files (MIC, AMR_genes, Virulence)
- Configure parameters (or use defaults)
- Run analysis (5-10 minutes)
- Review HTML report for interactive exploration
- Download Excel file for detailed data analysis
- Use PNG charts in your manuscript
Quick Start
Option 1: Google Colab (Recommended)
- Click "Run in Colab" button
- Execute all cells (Runtime → Run all)
- Upload files when prompted
- Download results.zip
Option 2: Local Installation
# Clone repository
git clone https://github.com/MK-vet/MKrep.git
cd MKrep
# Install dependencies
pip install -r requirements.txt
# Run analysis
python Cluster_MIC_AMR_Viruelnce.py
Option 3: Command Line Interface
# Install package
pip install mkrep
# Run analysis
mkrep-cluster --data-dir ./data --output ./results