What is MLExplain?

MLExplain is an educational and practical tool for exploring how different machine learning algorithms perform on classic datasets. It provides an intuitive web interface for training models, inspecting feature importance, studying confusion matrices, and comparing algorithms side by side.

Built with Flask and scikit-learn, MLExplain requires no external AI APIs and runs entirely locally. All experiments are stored in a SQLite database for full reproducibility and history tracking.

Supported Algorithms

Decision Tree

Interpretable tree-based classifier that splits data based on feature thresholds. Provides direct feature importance via Gini impurity reduction.

Hyperparameters: max_depth, min_samples_split

Random Forest

Ensemble of decision trees, each trained on a bootstrap sample of the data (bagging). Averaging across trees reduces overfitting and gives more stable feature importance than a single tree.

Hyperparameters: n_estimators, max_depth

Support Vector Machine

Kernel-based classifier that finds the maximum-margin hyperplane separating the classes. Effective in high-dimensional spaces.

Hyperparameters: C, kernel

K-Nearest Neighbours

Instance-based learning that classifies samples by majority vote of their k nearest neighbours in feature space.

Hyperparameters: n_neighbors, metric

Logistic Regression

Linear classifier that models the log-odds of each class as a linear function of the features. Fast to train and provides a predicted probability for each class.

Hyperparameters: C, solver, max_iter
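
The hyperparameters listed for each algorithm above correspond to scikit-learn constructor arguments. The sketch below shows one way the five classifiers could be built and compared side by side on Iris. It is not MLExplain's own training code: the hyperparameter values, the feature scaling step, and the 5-fold cross-validation are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative hyperparameter values, not MLExplain's defaults.
# Distance- and margin-based models are scaled first (an assumption).
models = {
    "Decision Tree": DecisionTreeClassifier(
        max_depth=3, min_samples_split=2, random_state=0
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, max_depth=5, random_state=0
    ),
    "Support Vector Machine": make_pipeline(
        StandardScaler(), SVC(C=1.0, kernel="rbf")
    ),
    "K-Nearest Neighbours": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    ),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(C=1.0, solver="lbfgs", max_iter=200)
    ),
}

# Mean accuracy over 5 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling inside each cross-validation fold, so no statistics from the held-out fold leak into training.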

Built-in Datasets

Iris

150 samples · 4 features · 3 classes (setosa, versicolor, virginica)

Wine

178 samples · 13 features · 3 cultivar classes

Breast Cancer

569 samples · 30 features · 2 classes (malignant, benign)

Digits

1,797 samples · 64 features · 10 classes (0-9)
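
All four datasets ship with scikit-learn, so the figures above can be checked directly. This is a minimal sketch using the standard `sklearn.datasets` loaders. How MLExplain itself loads and caches the data is not described here, so the `loaders` mapping and the `summary` dictionary are illustrative names only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine

# Display name -> scikit-learn loader (the mapping itself is an assumption).
loaders = {
    "Iris": load_iris,
    "Wine": load_wine,
    "Breast Cancer": load_breast_cancer,
    "Digits": load_digits,
}

summary = {}
for name, loader in loaders.items():
    data = loader()
    n_samples, n_features = data.data.shape
    n_classes = len(np.unique(data.target))
    summary[name] = (n_samples, n_features, n_classes)
    print(f"{name}: {n_samples} samples, {n_features} features, {n_classes} classes")
```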

Explanation Methods

Tree-based Feature Importance

Available for Decision Tree and Random Forest. Uses the total Gini impurity reduction contributed by each feature across all splits in the tree(s).
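
In scikit-learn, this impurity-based importance is exposed as the `feature_importances_` attribute of a fitted tree or forest, normalised so that the values sum to 1. The sketch below reads it from a Random Forest trained on Iris. The dataset choice and hyperparameters are assumptions for illustration, and MLExplain may post-process the values differently.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# One value per feature: mean impurity decrease across all trees, normalised.
importances = forest.feature_importances_

# Rank features from most to least important.
ranked = sorted(
    zip(data.feature_names, importances),
    key=lambda pair: pair[1],
    reverse=True,
)
for feature, value in ranked:
    print(f"{feature}: {value:.3f}")
```

Because these values are computed from the training data, they can overstate features that the trees happened to split on often; permutation importance on held-out data is a useful cross-check.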

Permutation Importance

Available for all models. Measures the decrease in model accuracy when a single feature's values are randomly shuffled, breaking the relationship between the feature and the target.
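
scikit-learn implements this procedure as `sklearn.inspection.permutation_importance`. The following sketch applies it to a K-Nearest Neighbours model, which has no built-in importance of its own. The Wine dataset, the 70/30 split, the scaling step, and `n_repeats=10` are assumptions made for the example, not settings taken from MLExplain.

```python
from sklearn.datasets import load_wine
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Shuffle each feature 10 times on the held-out set and record the
# resulting drop in accuracy relative to the unshuffled baseline.
result = permutation_importance(
    model, X_test, y_test, scoring="accuracy", n_repeats=10, random_state=0
)

ranked = sorted(
    zip(data.feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for feature, mean_drop in ranked:
    print(f"{feature}: {mean_drop:.3f}")
```

A value near zero means shuffling the feature barely changed accuracy; small negative values can occur by chance when a feature carries little signal.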

Technology Stack

Backend: Python 3.11, Flask 3.0
ML Engine: scikit-learn 1.3, NumPy 1.26
Database: SQLite via SQLAlchemy 2.0
Frontend: Jinja2 templates, Chart.js 4
Testing: pytest 7.4 with coverage
Deployment: Docker, Gunicorn

REST API

MLExplain provides a full REST API for programmatic access to all features:

Method  Endpoint                          Description
GET     /api/health                       Health check
GET     /api/datasets                     List datasets
GET     /api/datasets/<name>              Dataset info
POST    /api/train                        Train a model
GET     /api/experiments                  List experiments
GET     /api/experiments/<id>             Experiment details
DELETE  /api/experiments/<id>             Delete experiment
GET     /api/experiments/<id>/importance  Feature importance
GET     /api/experiments/<id>/confusion   Confusion matrix
GET     /api/experiments/<id>/metrics     All metrics
POST    /api/predict/<id>                 Prediction
POST    /api/compare                      Compare models
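
The endpoints above can be called from any HTTP client. The sketch below builds requests with Python's standard library only. The request and response schemas are not documented in this section, so the base URL (`http://localhost:5000`), the helper names `build_request` and `send`, and the JSON fields in the training payload (`dataset`, `algorithm`, `params`) are all hypothetical and would need to be checked against the running application.

```python
import json
import urllib.request

# Assumption: the app is served locally on Flask's default development port.
BASE_URL = "http://localhost:5000"


def build_request(method, path, payload=None):
    """Build (but do not send) a request for an MLExplain endpoint."""
    body = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        body = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )


def send(request):
    """Send a request and decode the response body as JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Endpoint paths come from the table above.
health_request = build_request("GET", "/api/health")

# Hypothetical payload: these field names are NOT confirmed by the source.
train_request = build_request(
    "POST",
    "/api/train",
    {"dataset": "iris", "algorithm": "decision_tree", "params": {"max_depth": 3}},
)

# With the server running, a request would be sent like this:
# print(send(health_request))
```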