StrepSuis-AMRVirKM

K-Modes Clustering of Antimicrobial Resistance and Virulence Profiles in Streptococcus suis

Overview

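The StrepSuis-AMRVirKM module performs unsupervised K-Modes clustering of Streptococcus suis strains based on their antimicrobial resistance patterns, virulence factors, and MIC profiles. K-Modes is designed for categorical data. K-Means uses Euclidean distances and cluster means; K-Modes instead assigns strains by the number of mismatching features and summarizes each cluster by its per-feature mode. This makes it a natural fit for the binary (0/1) genomic data used here.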
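To make the K-Modes idea concrete, here is a minimal NumPy sketch that assumes a strain-by-feature 0/1 matrix. It is not the module's implementation. The function names are hypothetical, and the real tool may use a different initialization, tie-breaking rule, or number of restarts.

```python
import numpy as np


def matching_dissimilarity(X, centroids):
    """Number of mismatching features between each strain and each centroid."""
    # X has shape (n, p), centroids (k, p); the result has shape (n, k)
    return (X[:, None, :] != centroids[None, :, :]).sum(axis=2)


def column_modes(X):
    """Most frequent value in each column (ties go to the smaller value)."""
    modes = []
    for col in X.T:
        values, counts = np.unique(col, return_counts=True)
        modes.append(values[np.argmax(counts)])
    return np.array(modes)


def kmodes(X, k, n_iter=100, seed=42):
    """Minimal K-Modes on a 0/1 matrix; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    # Start from k randomly chosen strains as the initial modes
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assign each strain to the centroid with the fewest mismatches
        new_labels = matching_dissimilarity(X, centroids).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change: converged
        labels = new_labels
        # Update each centroid to the per-feature mode of its members
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an emptied cluster keeps its previous mode
                centroids[j] = column_modes(members)
    return labels, centroids
```

For example, calling `kmodes(X, k=2)` on a matrix made of two blocks of identical profiles places each block in its own cluster. Using the mode rather than the mean ensures every centroid remains a valid 0/1 profile.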

Key Features

  • Automatic K Selection: Determines the optimal number of clusters using silhouette analysis
  • Multiple Correspondence Analysis (MCA): Dimensionality reduction for visualization
  • Bootstrap Validation: Confidence intervals for cluster assignments (500 iterations by default)
  • Feature Importance: Chi-square tests with FDR correction to identify discriminative features
  • Association Rules: Apriori algorithm to find feature co-occurrence patterns
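Automatic K selection compares silhouette coefficients across candidate values of K and keeps the highest. The sketch below illustrates that loop under two assumptions. First, it uses a Hamming (fraction of mismatching features) distance. Second, `cluster_fn(X, k)` is a hypothetical hook that stands in for any K-Modes routine. The upper bound `k_max=8` follows the Max Clusters default. The module's actual distance and search range may differ.

```python
import numpy as np


def hamming_matrix(X):
    """Pairwise Hamming distances (fraction of differing features)."""
    X = np.asarray(X)
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)


def silhouette(X, labels):
    """Mean silhouette coefficient using Hamming distance."""
    D = hamming_matrix(X)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i, li in enumerate(labels):
        same = labels == li
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters get a silhouette of 0
        a = D[i, same].sum() / (n_same - 1)  # mean intra-cluster distance, self excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != li)  # nearest other cluster
        scores[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores.mean()


def select_k(X, cluster_fn, k_max=8):
    """Return (best_k, {k: silhouette}) for k = 2..k_max.

    cluster_fn(X, k) -> labels is a hypothetical hook for any K-Modes routine.
    """
    scores = {}
    for k in range(2, min(k_max, len(X) - 1) + 1):
        labels = cluster_fn(X, k)
        if len(np.unique(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        scores[k] = silhouette(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Passing the clustering routine in as a function keeps the selection logic separate from any particular K-Modes implementation.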

When to Use

Use this tool when you want to:

  • Identify natural groupings of bacterial strains
  • Discover resistance patterns across your dataset
  • Find strains with similar phenotypic profiles
  • Explore relationships between AMR genes and virulence factors
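AMR-virulence relationships can be explored with co-occurrence rules such as "strains carrying feature A usually also carry feature B." The minimal sketch below implements only the first two Apriori levels: it finds frequent single features, then tests pairs of frequent features. Each resulting rule reports support, confidence, and lift. The thresholds `min_support=0.3` and `min_confidence=0.7` are illustrative assumptions, not the module's defaults. The module's Apriori step may also search longer itemsets.

```python
from itertools import combinations


def pair_rules(rows, feature_names, min_support=0.3, min_confidence=0.7):
    """Apriori-style rules A -> B restricted to single-feature antecedents/consequents.

    rows: list of 0/1 lists (one per strain). Returns a list of rule dicts.
    """
    n = len(rows)
    # Each strain becomes the set of feature indices it carries
    baskets = [{j for j, v in enumerate(r) if v == 1} for r in rows]

    def support(items):
        # Fraction of strains carrying every feature in `items`
        return sum(items <= b for b in baskets) / n

    # Level 1: frequent single features
    frequent = [j for j in range(len(feature_names)) if support({j}) >= min_support]
    rules = []
    # Level 2: only pairs of frequent features are candidates (Apriori property)
    for a, b in combinations(frequent, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x})
            if confidence >= min_confidence:
                rules.append({
                    "antecedent": feature_names[x],
                    "consequent": feature_names[y],
                    "support": s_ab,
                    "confidence": confidence,
                    "lift": confidence / support({y}),  # > 1 means positive association
                })
    return rules
```

The Apriori pruning step matters because a pair cannot be frequent unless both of its members are frequent. As a result, only a fraction of all feature pairs ever needs to be counted.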

Input Files Required

  • MIC.csv - Minimum Inhibitory Concentration data
  • AMR_genes.csv - Antimicrobial resistance genes
  • Virulence.csv - Virulence factors

Format: each file is a CSV whose first column is "Strain_ID" and whose remaining columns are binary (0/1).
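The input format can be checked before uploading. Below is a minimal standard-library sketch that validates one file and joins several files on Strain_ID. The function names and the toy column names are hypothetical. The inner join, which drops strains missing from any file, is an assumption; the module may handle unmatched strains differently.

```python
import csv
import io


def load_binary_table(handle):
    """Read one input CSV: first column Strain_ID, other columns 0/1.

    Returns (feature_names, {strain_id: [0/1, ...]}); raises ValueError on bad input.
    """
    reader = csv.reader(handle)
    header = next(reader, [])
    if not header or header[0] != "Strain_ID":
        raise ValueError("first column must be 'Strain_ID'")
    table = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        values = [v.strip() for v in row[1:]]
        if len(values) != len(header) - 1:
            raise ValueError(f"line {line_no}: expected {len(header) - 1} values")
        if any(v not in ("0", "1") for v in values):
            raise ValueError(f"line {line_no}: non-binary value")
        table[row[0]] = [int(v) for v in values]
    return header[1:], table


def merge_tables(*tables):
    """Inner-join (features, table) pairs on Strain_ID; unmatched strains are dropped."""
    common = set.intersection(*(set(table) for _, table in tables))
    features = [f for names, _ in tables for f in names]
    merged = {s: [v for _, table in tables for v in table[s]] for s in sorted(common)}
    return features, merged


if __name__ == "__main__":
    # Hypothetical toy inputs standing in for MIC.csv and AMR_genes.csv
    mic = io.StringIO("Strain_ID,abx_1,abx_2\nS1,0,1\nS2,1,1\n")
    amr = io.StringIO("Strain_ID,gene_1\nS1,1\nS2,1\nS3,0\n")
    print(merge_tables(load_binary_table(mic), load_binary_table(amr)))
```

To validate real files, pass `open("MIC.csv", newline="")` instead of the `StringIO` objects.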

Output Files

  • HTML Report: Interactive tables with sorting, filtering, and Plotly visualizations
  • Excel Workbook: Multiple sheets containing metadata, results, and a chart index
  • PNG Charts: High-resolution figures (150+ DPI) for publications

Statistical Methods

  • K-Modes clustering with a categorical (simple-matching) dissimilarity
  • Silhouette coefficient for cluster quality assessment
  • Chi-square test with Benjamini-Hochberg FDR correction
  • Bootstrap resampling for confidence intervals
  • Multiple Correspondence Analysis (MCA)
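The feature-importance step pairs a per-feature chi-square test with Benjamini-Hochberg FDR correction. The sketch below shows one plausible form. It assumes each feature is tested with a 2 × K table (absent/present by cluster), and it uses SciPy's `chi2_contingency`, which applies Yates' continuity correction to 2 × 2 tables by default. The function names are hypothetical, and `alpha=0.05` follows the FDR Alpha default.

```python
import numpy as np
from scipy.stats import chi2_contingency


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest rank downwards, then cap at 1
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def feature_tests(X, labels, alpha=0.05):
    """Chi-square test of each binary feature against cluster labels.

    Returns a list of (feature_index, p_value, q_value, significant).
    """
    X = np.asarray(X)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    pvals = []
    for j in range(X.shape[1]):
        # 2 x K contingency table: rows = feature absent/present, columns = clusters
        table = np.array([[np.sum((X[:, j] == v) & (labels == c)) for c in clusters]
                          for v in (0, 1)])
        if (table.sum(axis=1) == 0).any():
            pvals.append(1.0)  # constant feature: nothing to test
            continue
        _, p, _, _ = chi2_contingency(table)
        pvals.append(p)
    q = benjamini_hochberg(pvals)
    return [(j, pvals[j], q[j], q[j] <= alpha) for j in range(len(pvals))]
```

The FDR correction matters because dozens of features are tested at once. Without it, a 0.05 threshold would flag roughly one in twenty non-discriminative features by chance.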

Example Workflow

  1. Upload your CSV files (MIC, AMR_genes, Virulence)
  2. Configure parameters (or use defaults)
  3. Run analysis (5-10 minutes)
  5. Review the HTML report for interactive exploration
  6. Download the Excel file for detailed data analysis
  6. Use PNG charts in your manuscript

Quick Start

Option 1: Google Colab (Recommended)

  1. Click the "Run in Colab" button
  2. Execute all cells (Runtime → Run all)
  3. Upload files when prompted
  4. Download results.zip

Option 2: Local Installation

# Clone repository
git clone https://github.com/MK-vet/MKrep.git
cd MKrep

# Install dependencies
pip install -r requirements.txt

# Run analysis
python Cluster_MIC_AMR_Viruelnce.py

Option 3: Command Line Interface

# Install package
pip install mkrep

# Run analysis
mkrep-cluster --data-dir ./data --output ./results

Parameters

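  • Max Clusters: 8 (largest number of clusters evaluated during automatic K selection)
  • Bootstrap: 500 (number of bootstrap resampling iterations)
  • FDR Alpha: 0.05 (significance threshold for Benjamini-Hochberg-adjusted p-values)
  • Random Seed: 42 (seed for reproducible results)

Default values are shown; all parameters are configurable.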
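The Bootstrap and Random Seed defaults control resampling. As one plausible illustration, the sketch below computes a percentile bootstrap confidence interval for the prevalence of a feature within one cluster, with `n_boot=500` and `seed=42` mirroring those defaults. Which quantities the module actually bootstraps is an assumption here, and the function name is hypothetical.

```python
import numpy as np


def bootstrap_prevalence_ci(values, n_boot=500, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the proportion of 1s in a 0/1 vector.

    values: one feature column restricted to the strains of a single cluster.
    Returns (point_estimate, lower, upper).
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Resample strains with replacement n_boot times and record each proportion
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot = values[idx].mean(axis=1)
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper
```

Fixing the seed makes repeated runs return identical intervals. Raising `n_boot` stabilizes the interval endpoints at the cost of runtime.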

Runtime

Typical runtime is 5-10 minutes for a standard dataset of 100 strains and 50 features.