ML Tools and Learning resources

Comprehensive Machine Learning Resources Guide

Random Forest, Decision Trees, SVM, F1 Score & Scikit-learn


📚 Official Scikit-learn Documentation

Core Algorithm Documentation

| Algorithm | Official Documentation URL |
|---|---|
| Decision Trees | https://scikit-learn.org/stable/modules/tree.html |
| DecisionTreeClassifier | https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html |
| RandomForestClassifier | https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html |
| Support Vector Machines | https://scikit-learn.org/stable/modules/svm.html |
| Ensemble Methods | https://scikit-learn.org/stable/modules/ensemble.html |
| Cross-Validation | https://scikit-learn.org/stable/modules/cross_validation.html |
| Model Evaluation Metrics | https://scikit-learn.org/stable/modules/model_evaluation.html |
| Scikit-learn Intro Tutorial | https://scikit-learn.org/stable/tutorial/basic/tutorial.html |
| Algorithm Cheat Sheet | https://scikit-learn.org/stable/_static/ml_map.png |

📖 Academic Papers & Research

Foundational Papers

| Paper | Authors | Link |
|---|---|---|
| Random Forests (2001) | Leo Breiman | https://stat.berkeley.edu/~breiman/randomforest2001.pdf |
| Support-Vector Networks (1995) | Cortes & Vapnik | https://link.springer.com/article/10.1007/BF00994018 |
| Analysis of Random Forests | Gérard Biau | https://perso.lpsm.paris/ |
| SVM Tutorial for Pattern Recognition | ENS Paris | https://di.ens.fr/~mallat/papiers/svmtutorial.pdf |

Academic Databases


📕 Books (Free & Paid)

Essential Books

| Book | Author(s) | Notes |
|---|---|---|
| Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow | Aurélien Géron | Covers DT, RF, SVM with practical code |
| Introduction to Statistical Learning (ISLR) | James, Witten, Hastie, Tibshirani | FREE PDF available |
| The Elements of Statistical Learning | Hastie, Tibshirani, Friedman | FREE PDF; advanced mathematical foundations |
| Python Machine Learning | Sebastian Raschka | Practical scikit-learn implementations |
| Machine Learning with Python Cookbook | Chris Albon | Quick scikit-learn recipes |
| Interpretable Machine Learning | Christoph Molnar | FREE: https://christophm.github.io/interpretable-ml-book/ |

🎓 Online Courses & MOOCs

Free Courses

Paid Platforms

  • Coursera: Build Decision Trees, SVMs, and Artificial Neural Networks
  • DataCamp: Random Forest, SVM, Decision Trees courses
  • Pluralsight: ML courses with scikit-learn
  • LinkedIn Learning: ML with Python courses
  • O'Reilly Learning: Comprehensive scikit-learn courses
  • IBM AI Engineering Professional Certificate

🎥 Video Tutorials

YouTube Channels

| Channel | Focus |
|---|---|
| StatQuest with Josh Starmer | Intuitive ML explanations with visuals |
| 3Blue1Brown | Mathematical visualizations |
| Sentdex | ML with Python playlist |
| Tech With Tim | ML tutorials with scikit-learn |
| Krish Naik | SVM, Random Forest, Decision Trees |
| codebasics | Scikit-learn tutorial series |

Conference Talks

  • PyData Conference: YouTube talks on scikit-learn best practices
  • SciPy Conference: ML tools presentations

💻 Tutorials & Practical Guides

Random Forest Resources

Decision Trees Resources

Support Vector Machines Resources

F1 Score & Evaluation Metrics


🔬 Mathematical Foundations

Key Formulas

Gini Impurity (Decision Trees): $\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2$

Entropy (Decision Trees): $\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2(p_i)$

SVM Margin Maximization: $\text{Maximize } \frac{2}{\|w\|}$

F1 Score: $F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

Precision & Recall: $\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}$
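
As a quick sanity check on these formulas, the sketch below recomputes them in plain Python using made-up counts (TP=40, FP=10, FN=5) and a hypothetical class distribution; the numbers are illustrative only.

# Hypothetical counts, used only to illustrate the formulas above
TP, FP, FN = 40, 10, 5

precision = TP / (TP + FP)                      # 0.800
recall = TP / (TP + FN)                         # ~0.889
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Gini impurity and entropy for a node with a hypothetical class distribution p
import math
p = [0.7, 0.3]
gini = 1 - sum(pi ** 2 for pi in p)             # 0.42
entropy = -sum(pi * math.log2(pi) for pi in p)  # ~0.881
print(f"gini={gini:.3f} entropy={entropy:.3f}")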


🔧 GitHub Repositories & Code Examples

Learning Repositories

| Repository | Description |
|---|---|
| tanmayjay/Comparative-Analysis-of-Different-Classification-Algorithms | Implements DT, RF, NB, SVM with confusion matrix and ROC |
| yemifalokun/comparingclassifiers | UC Berkeley ML course comparing classifiers |
| willkoehrsen | Random Forest implementation notebooks |
| calvinfeng | GitBook Machine Learning Notebook with Gini index explanation |

Model Interpretation Libraries

| Library | Purpose | Link |
|---|---|---|
| SHAP | Model explanation (Shapley values) | https://github.com/slundberg/shap |
| LIME | Local Interpretable Model-agnostic Explanations | https://github.com/marcotcr/lime |
| ELI5 | Explaining scikit-learn models | https://github.com/TeamHG-Memex/eli5 |

📊 Benchmark Datasets

Dataset Sources

| Source | URL | Notes |
|---|---|---|
| UCI ML Repository | https://archive.ics.uci.edu/ml | Classic benchmark datasets |
| OpenML | https://openml.org | ML experiments and benchmarks |
| Kaggle Datasets | https://kaggle.com/datasets | Real-world datasets |
| Penn ML Benchmarks | https://github.com/EpistasisLab/pmlb | ML evaluation benchmarks |

Built-in Scikit-learn Datasets

from sklearn.datasets import (
    load_iris,           # Classification
    load_breast_cancer,  # Binary classification
    load_wine,           # Multi-class classification
    load_digits          # Image classification
)
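
A minimal usage sketch for these loaders (load_iris is used here; the other loaders follow the same pattern, and the split parameters are just illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)               # feature matrix and label vector
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)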

πŸ› οΈ Key Scikit-learn Functions & Classes

Classification Models

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
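
A minimal fit-and-score sketch with these three classifiers, reusing the iris split from the dataset example above; the hyperparameter values and the models dict are illustrative, not recommendations:

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "svm_rbf": SVC(kernel="rbf", C=1.0, gamma="scale"),
}
for name, model in models.items():
    model.fit(X_train, y_train)                  # X_train/y_train from the split above
    print(name, model.score(X_test, y_test))     # mean accuracy on the held-out split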

Evaluation Metrics

from sklearn.metrics import (
    f1_score,
    precision_score,
    recall_score,
    accuracy_score,
    confusion_matrix,
    classification_report,
    roc_auc_score,
    RocCurveDisplay
)
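
A short sketch of how these metrics are typically called on held-out predictions, assuming the fitted models and test split from the earlier examples:

y_pred = models["random_forest"].predict(X_test)
print(f1_score(y_test, y_pred, average="macro"))     # unweighted mean of per-class F1
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))         # per-class precision, recall, F1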

Cross-Validation

from sklearn.experimental import enable_halving_search_cv  # noqa: F401, required before importing HalvingGridSearchCV
from sklearn.model_selection import (
    cross_val_score,
    cross_val_predict,
    StratifiedKFold,
    GridSearchCV,
    HalvingGridSearchCV  # Faster, still-experimental alternative to GridSearchCV
)
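
A sketch of a stratified, F1-scored cross-validation run, assuming the X, y arrays and the RandomForestClassifier import from the earlier examples:

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y,
                         cv=cv, scoring="f1_macro")
print(scores.mean(), scores.std())                   # mean and spread across folds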

Visualization

from sklearn.tree import plot_tree, export_graphviz, export_text
from sklearn.inspection import permutation_importance, PartialDependenceDisplay
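
A sketch of these visualization helpers applied to the fitted models from the earlier examples; matplotlib is assumed to be available:

import matplotlib.pyplot as plt

tree = models["decision_tree"]                       # fitted DecisionTreeClassifier from above
plot_tree(tree, filled=True)                         # draw the tree structure
plt.show()
print(export_text(tree))                             # plain-text view of the same splits

result = permutation_importance(models["random_forest"], X_test, y_test,
                                n_repeats=10, random_state=42)
print(result.importances_mean)                       # permutation-based feature importances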

🎯 Hyperparameter Tuning Guide

Random Forest Parameters

| Parameter | Description | Typical Range |
|---|---|---|
| n_estimators | Number of trees | 100-1000 |
| max_depth | Maximum tree depth | 5-50 or None |
| min_samples_split | Min samples to split | 2-20 |
| min_samples_leaf | Min samples at leaf | 1-10 |
| max_features | Features per split | 'sqrt', 'log2', int |
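
One way to search over these ranges is GridSearchCV; the grid below is illustrative only, and X_train/y_train come from the earlier split:

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 30],
    "min_samples_leaf": [1, 3, 5],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      scoring="f1_macro", cv=5, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)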

SVM Parameters

| Parameter | Description | Typical Range |
|---|---|---|
| C | Regularization strength (smaller C = stronger regularization) | 0.1-100 |
| gamma | RBF kernel coefficient | 'scale', 'auto', 0.001-1 |
| kernel | Kernel type | 'linear', 'rbf', 'poly', 'sigmoid' |
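
The same grid-search idea applies to SVM, but scaling should live inside a Pipeline so it is re-fit on each CV fold; parameter names are then prefixed with the pipeline step name. A sketch with illustrative values:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "linear"],
}
search = GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=5, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)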

βš–οΈ Algorithm Comparison Guide

When to Use Each Algorithm

| Criterion | Decision Tree | Random Forest | SVM |
|---|---|---|---|
| Dataset Size | Small-Medium | Medium-Large | Small-Medium |
| High Dimensions | Moderate | Good | Excellent |
| Interpretability | Excellent | Moderate | Low |
| Training Speed | Fast | Moderate | Slow for large data |
| Feature Importance | Yes | Yes | No (native) |
| Overfitting Risk | High | Low | Low |
| Memory Usage | Low | High | Moderate |

Performance Benchmarks (from research)

These figures come from comparative studies and are only indicative; results vary with the dataset and preprocessing:

  • Random Forest: ~86-90% accuracy, F1 ~90%
  • Decision Tree: ~84-88% accuracy, F1 ~89%
  • SVM with RBF kernel: ~85-92% accuracy (highly dependent on feature scaling)

✅ Best Practices

Data Preprocessing

  • Always scale features before SVM (StandardScaler or MinMaxScaler)
  • Use class_weight='balanced' for imbalanced datasets
  • Apply SMOTE (from the imbalanced-learn package) for severe class imbalance; a sketch of the first two points follows this list
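
A sketch combining scaling and class weighting; SMOTE lives in the separate imbalanced-learn package and is noted only in a comment here:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

clf = Pipeline([
    ("scaler", StandardScaler()),                         # scaler is fitted only on training data
    ("svc", SVC(kernel="rbf", class_weight="balanced")),  # reweight classes inversely to frequency
])
clf.fit(X_train, y_train)                                 # X_train/y_train from the earlier split
# For severe imbalance, SMOTE from imbalanced-learn can be added via
# imblearn.pipeline.Pipeline instead of sklearn's Pipeline.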

Model Development

  • Use Random Forest as an initial baseline model
  • Apply cross-validation with StratifiedKFold to preserve class distribution
  • Use f1_macro or f1_weighted for multi-class problems
  • Implement nested cross-validation for hyperparameter tuning (see the sketch after this list)
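
Nested cross-validation wraps the tuning search inside an outer evaluation loop so the reported score is not biased by the tuning itself. A minimal sketch, reusing the imports and X, y arrays from the sections above; fold counts and the C grid are illustrative:

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)   # folds used for tuning
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)   # folds used for evaluation

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
search = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=inner, scoring="f1_macro")
nested_scores = cross_val_score(search, X, y, cv=outer, scoring="f1_macro")
print(nested_scores.mean())                   # estimate of tuned-model performance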

Common Pitfalls to Avoid

  • Overfitting Decision Trees (use max_depth, min_samples_leaf)
  • Not scaling data before SVM
  • Using accuracy on imbalanced datasets (use F1 instead)
  • Data leakage during preprocessing

🌐 Community & Forums

Q&A Platforms

  • Stack Overflow: sklearn tag with 100k+ questions
  • Cross Validated: https://stats.stackexchange.com/ (statistical ML)
  • Scikit-learn GitHub Discussions: official community support

Reddit Communities

  • r/MachineLearning
  • r/learnmachinelearning

Blogs & News


🚀 Model Deployment Resources

Serialization

import joblib
# Save model
joblib.dump(model, 'model.pkl')
# Load model
model = joblib.load('model.pkl')

Deployment Options

  • Flask/FastAPI: REST API deployment (a FastAPI sketch follows this list)
  • AWS SageMaker: cloud ML deployment (supports scikit-learn)
  • Docker: containerized deployment
  • MLflow: model tracking and deployment
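
As one concrete illustration of the Flask/FastAPI option, a minimal FastAPI sketch; the /predict route, the Features schema, and the model.pkl path are assumptions for the example, not part of any library API:

import joblib
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.pkl")              # model saved with joblib.dump above

class Features(BaseModel):
    values: List[float]                       # one flat feature vector per request

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

Run with, e.g., uvicorn main:app (assuming this lives in main.py).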
