Created
March 4, 2026 16:33
-
-
Save wojtyniak/1e91790fbe77652e848c1cbd50c0f2f9 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# PyAutoFEP: Automated Free Energy Perturbation Workflow for GROMACS\n", | |
| "\n", | |
| "**Paper:** PyAutoFEP: An Automated Free Energy Perturbation Workflow for GROMACS Integrating Enhanced Sampling Methods\n", | |
| "\n", | |
| "**Authors:** Luan Carvalho Martins, Elio A. Cino, Rafaela Salgado Ferreira\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "## Overview\n", | |
| "\n", | |
| "This notebook provides an **educational demonstration** of the PyAutoFEP workflow for automated Free Energy Perturbation (FEP) calculations. PyAutoFEP is a Python-based pipeline that automates relative free energy of binding (RFEB) calculations using GROMACS molecular dynamics software.\n", | |
| "\n", | |
| "### Key Features:\n", | |
| "- Automated perturbation map generation\n", | |
| "- Dual topology creation for alchemical transformations\n", | |
| "- Integration with GROMACS for MD simulations\n", | |
| "- Optional enhanced sampling (REST/REST2)\n", | |
| "- Statistical analysis using BAR/MBAR methods\n", | |
| "\n", | |
| "### Important Notes:\n", | |
| "\n", | |
| "**Resource Constraints:**\n", | |
| "- This notebook uses small-scale, synthetic data for educational purposes\n", | |
| "- Full FEP calculations require GROMACS software, GPUs, and significant computational time\n", | |
| "- Real FEP campaigns typically involve 10+ ns simulations per λ-window\n", | |
| "\n", | |
| "**What This Notebook Demonstrates:**\n", | |
| "- Core algorithms: MCS calculation, perturbation map generation\n", | |
| "- Workflow structure and data flow\n", | |
| "- Statistical analysis concepts (BAR/MBAR)\n", | |
| "- Visualization of results\n", | |
| "\n", | |
| "**What Requires Full Infrastructure:**\n", | |
| "- Actual GROMACS MD simulations (marked with clear comments)\n", | |
| "- Force field parameterization\n", | |
| "- Production FEP runs (hours to days on HPC/GPU clusters)\n", | |
| "\n", | |
| "---" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. Setup and Dependencies\n", | |
| "\n", | |
| "Install all required Python packages for this demonstration." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "\u001b[2mAudited \u001b[1m8 packages\u001b[0m \u001b[2min 7ms\u001b[0m\u001b[0m\r\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Install all dependencies in a single command\n", | |
| "!uv pip install rdkit networkx numpy scipy matplotlib seaborn pandas pymbar" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Note: RDKit Draw module not available (requires X11 libraries)\n", | |
| "Molecule visualization will be skipped, but all analysis will work fine.\n", | |
| "\n", | |
| "✓ All libraries imported successfully\n", | |
| "NumPy version: 2.4.2\n", | |
| "RDKit available: True\n", | |
| "RDKit Draw available: False\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Import libraries\n", | |
| "import numpy as np\n", | |
| "import pandas as pd\n", | |
| "import matplotlib.pyplot as plt\n", | |
| "import seaborn as sns\n", | |
| "import networkx as nx\n", | |
| "from scipy import stats\n", | |
| "import warnings\n", | |
| "warnings.filterwarnings('ignore')\n", | |
| "\n", | |
| "# RDKit for molecular structure handling\n", | |
| "from rdkit import Chem\n", | |
| "from rdkit.Chem import AllChem\n", | |
| "from rdkit.Chem import rdFMCS\n", | |
| "\n", | |
| "# Try to import Draw module (may fail in headless environments)\n", | |
| "try:\n", | |
| " from rdkit.Chem import Draw\n", | |
| " RDKIT_DRAW_AVAILABLE = True\n", | |
| "except ImportError:\n", | |
| " RDKIT_DRAW_AVAILABLE = False\n", | |
| " print(\"Note: RDKit Draw module not available (requires X11 libraries)\")\n", | |
| " print(\"Molecule visualization will be skipped, but all analysis will work fine.\\n\")\n", | |
| "\n", | |
| "# Set random seed for reproducibility\n", | |
| "np.random.seed(42)\n", | |
| "\n", | |
| "# Plotting style\n", | |
| "plt.style.use('seaborn-v0_8-darkgrid')\n", | |
| "sns.set_palette(\"husl\")\n", | |
| "\n", | |
| "print(\"✓ All libraries imported successfully\")\n", | |
| "print(f\"NumPy version: {np.__version__}\")\n", | |
| "print(f\"RDKit available: {Chem is not None}\")\n", | |
| "print(f\"RDKit Draw available: {RDKIT_DRAW_AVAILABLE}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 2. PyAutoFEP Workflow Overview\n", | |
| "\n", | |
| "The PyAutoFEP pipeline consists of 14 main steps:\n", | |
| "\n", | |
| "1. **Input Molecule Preparation** - Load ligand structures\n", | |
| "2. **Maximum Common Substructure (MCS) Calculation** - Find common cores\n", | |
| "3. **Perturbation Map Generation** - Optimize transformation network\n", | |
| "4. **Ligand Topology Generation** - Create force field parameters\n", | |
| "5. **Dual Topology Generation** - Generate λ-state topologies\n", | |
| "6. **Ligand Pose Preparation** - Align ligands to binding site\n", | |
| "7. **System Building** - Combine protein + ligand + solvent\n", | |
| "8. **Enhanced Sampling Setup** - Optional REST/REST2\n", | |
| "9. **Energy Minimization** - Remove clashes\n", | |
| "10. **System Equilibration** - NVE/NVT/NPT equilibration\n", | |
| "11. **FEP Production MD** - Main sampling simulations\n", | |
| "12. **ΔΔG Estimation** - BAR/MBAR analysis\n", | |
| "13. **RFEB Calculation** - Convert to reference compound\n", | |
| "14. **Analysis and Visualization** - Generate diagnostic plots\n", | |
| "\n", | |
| "**In this notebook, we'll demonstrate steps 1-3, 12-14 with working code.**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 3. Data Preparation: Generate Example Molecules\n", | |
| "\n", | |
| "For educational purposes, we'll create a small set of example molecules that represent typical FEP campaign compounds (similar structures with small variations)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "✓ Loaded 7 example molecules\n", | |
| "\n", | |
| "Molecule library:\n", | |
| " MOL_01: c1ccccc1\n", | |
| " MOL_02: Cc1ccccc1\n", | |
| " MOL_03: Cc1ccc(C)cc1\n", | |
| " MOL_04: Cc1ccc(O)cc1\n", | |
| " MOL_05: c1ccc(O)cc1\n", | |
| " MOL_06: c1ccc(N)cc1\n", | |
| " MOL_07: Cc1ccc(N)cc1\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Create a set of related molecules (benzene derivatives as simple example)\n", | |
| "# In a real FEP campaign, these would be congeneric series from drug design\n", | |
| "\n", | |
| "example_molecules = {\n", | |
| " 'MOL_01': 'c1ccccc1', # Benzene (core)\n", | |
| " 'MOL_02': 'Cc1ccccc1', # Toluene (add methyl)\n", | |
| " 'MOL_03': 'Cc1ccc(C)cc1', # p-Xylene (add two methyls)\n", | |
| " 'MOL_04': 'Cc1ccc(O)cc1', # p-Cresol (methyl + OH)\n", | |
| " 'MOL_05': 'c1ccc(O)cc1', # Phenol (add OH)\n", | |
| " 'MOL_06': 'c1ccc(N)cc1', # Aniline (add NH2)\n", | |
| " 'MOL_07': 'Cc1ccc(N)cc1', # p-Toluidine (methyl + NH2)\n", | |
| "}\n", | |
| "\n", | |
| "# Convert SMILES to RDKit molecule objects\n", | |
| "molecules = {}\n", | |
| "for name, smiles in example_molecules.items():\n", | |
| " mol = Chem.MolFromSmiles(smiles)\n", | |
| " if mol is not None:\n", | |
| " AllChem.Compute2DCoords(mol)\n", | |
| " molecules[name] = {'smiles': smiles, 'mol': mol}\n", | |
| " else:\n", | |
| " print(f\"Warning: Could not parse {name}\")\n", | |
| "\n", | |
| "print(f\"✓ Loaded {len(molecules)} example molecules\")\n", | |
| "print(\"\\nMolecule library:\")\n", | |
| "for name, data in molecules.items():\n", | |
| " print(f\" {name}: {data['smiles']}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Visualize the molecule library\n", | |
| "if RDKIT_DRAW_AVAILABLE:\n", | |
| " mol_list = [data['mol'] for data in molecules.values()]\n", | |
| " legends = list(molecules.keys())\n", | |
| " \n", | |
| " img = Draw.MolsToGridImage(mol_list, molsPerRow=4, subImgSize=(200,200), \n", | |
| " legends=legends, returnPNG=False)\n", | |
| " display(img)\n", | |
| "else:\n", | |
| " print(\"Molecule visualization skipped (RDKit Draw not available)\")\n", | |
| " print(\"All molecules loaded successfully - see the SMILES strings above.\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 4. Step 2: Maximum Common Substructure (MCS) Calculation\n", | |
| "\n", | |
| "PyAutoFEP uses MCS to identify common cores between molecule pairs. This determines which atoms can be matched during the alchemical transformation.\n", | |
| "\n", | |
| "**Method:** The paper supports both 3D-guided MCS and graph-based MCS using RDKit's `rdFMCS.FindMCS()`." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def calculate_mcs(mol1, mol2, timeout=2):\n", | |
| " \"\"\"\n", | |
| " Calculate Maximum Common Substructure between two molecules.\n", | |
| " \n", | |
| " Parameters:\n", | |
| " -----------\n", | |
| " mol1, mol2 : RDKit Mol objects\n", | |
| " timeout : int, timeout in seconds\n", | |
| " \n", | |
| " Returns:\n", | |
| " --------\n", | |
| " dict with MCS information\n", | |
| " \"\"\"\n", | |
| " result = rdFMCS.FindMCS(\n", | |
| " [mol1, mol2],\n", | |
| " timeout=timeout,\n", | |
| " atomCompare=rdFMCS.AtomCompare.CompareElements,\n", | |
| " bondCompare=rdFMCS.BondCompare.CompareOrder,\n", | |
| " ringMatchesRingOnly=True\n", | |
| " )\n", | |
| " \n", | |
| " mcs_mol = Chem.MolFromSmarts(result.smartsString) if result.smartsString else None\n", | |
| " \n", | |
| " return {\n", | |
| " 'smarts': result.smartsString,\n", | |
| " 'num_atoms': result.numAtoms,\n", | |
| " 'num_bonds': result.numBonds,\n", | |
| " 'mol': mcs_mol\n", | |
| " }\n", | |
| "\n", | |
| "\n", | |
| "# Calculate pairwise MCS for all molecule pairs\n", | |
| "mcs_matrix = {}\n", | |
| "mol_names = list(molecules.keys())\n", | |
| "\n", | |
| "print(\"Calculating pairwise MCS...\")\n", | |
| "for i, name1 in enumerate(mol_names):\n", | |
| " for j, name2 in enumerate(mol_names):\n", | |
| " if i < j: # Only upper triangle\n", | |
| " mol1 = molecules[name1]['mol']\n", | |
| " mol2 = molecules[name2]['mol']\n", | |
| " mcs = calculate_mcs(mol1, mol2)\n", | |
| " mcs_matrix[(name1, name2)] = mcs\n", | |
| " print(f\" {name1} <-> {name2}: {mcs['num_atoms']} atoms in common\")\n", | |
| "\n", | |
| "print(f\"\\n✓ Calculated MCS for {len(mcs_matrix)} molecule pairs\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "### Visualize MCS Similarity Matrix\n", | |
| "\n", | |
| "Higher MCS atom count indicates more similar molecules, which typically results in better FEP convergence." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Create similarity matrix based on MCS atom count\n", | |
| "n_mols = len(mol_names)\n", | |
| "similarity_matrix = np.zeros((n_mols, n_mols))\n", | |
| "\n", | |
| "for i, name1 in enumerate(mol_names):\n", | |
| " for j, name2 in enumerate(mol_names):\n", | |
| " if i == j:\n", | |
| " # Diagonal: similarity to self = number of atoms\n", | |
| " similarity_matrix[i, j] = molecules[name1]['mol'].GetNumAtoms()\n", | |
| " elif i < j:\n", | |
| " # Upper triangle: use calculated MCS\n", | |
| " mcs_atoms = mcs_matrix[(name1, name2)]['num_atoms']\n", | |
| " similarity_matrix[i, j] = mcs_atoms\n", | |
| " similarity_matrix[j, i] = mcs_atoms # Symmetric\n", | |
| "\n", | |
| "# Plot\n", | |
| "plt.figure(figsize=(10, 8))\n", | |
| "sns.heatmap(similarity_matrix, \n", | |
| " xticklabels=mol_names, \n", | |
| " yticklabels=mol_names,\n", | |
| " annot=True, \n", | |
| " fmt='.0f', \n", | |
| " cmap='YlOrRd',\n", | |
| " cbar_kws={'label': 'Common Atoms'})\n", | |
| "plt.title('Maximum Common Substructure Similarity Matrix', fontsize=14, fontweight='bold')\n", | |
| "plt.xlabel('Molecule', fontsize=12)\n", | |
| "plt.ylabel('Molecule', fontsize=12)\n", | |
| "plt.tight_layout()\n", | |
| "plt.show()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 5. Step 3: Perturbation Map Generation\n", | |
| "\n", | |
| "PyAutoFEP generates an optimal perturbation map (network of transformations) using an **Ant Colony Optimization (ACO)** algorithm.\n", | |
| "\n", | |
| "**Goal:** Minimize:\n", | |
| "1. Total number of perturbations (fewer transformations = fewer simulations)\n", | |
| "2. Structural distance between molecule pairs (smaller changes = better convergence)\n", | |
| "\n", | |
| "**For this demonstration:** We'll use a simplified graph-based approach to create a minimum spanning tree based on MCS similarity." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def calculate_perturbation_cost(mol1, mol2, mcs_atoms):\n", | |
| " \"\"\"\n", | |
| " Calculate cost of perturbing mol1 -> mol2.\n", | |
| " \n", | |
| " Cost increases with:\n", | |
| " - Number of atoms that need to be created/deleted\n", | |
| " - Dissimilarity (inverse of MCS size)\n", | |
| " \n", | |
| " Parameters:\n", | |
| " -----------\n", | |
| " mol1, mol2 : RDKit Mol objects\n", | |
| " mcs_atoms : int, number of atoms in MCS\n", | |
| " \n", | |
| " Returns:\n", | |
| " --------\n", | |
| " float : perturbation cost\n", | |
| " \"\"\"\n", | |
| " n1 = mol1.GetNumAtoms()\n", | |
| " n2 = mol2.GetNumAtoms()\n", | |
| " \n", | |
| " # Number of atoms that need to be created or deleted\n", | |
| " atoms_changed = abs(n1 - n2) + (n1 - mcs_atoms) + (n2 - mcs_atoms)\n", | |
| " \n", | |
| " # Similarity score (higher MCS = lower cost)\n", | |
| " if mcs_atoms > 0:\n", | |
| " similarity_score = mcs_atoms / max(n1, n2)\n", | |
| " cost = atoms_changed / (similarity_score + 0.1)\n", | |
| " else:\n", | |
| " cost = 1000 # Very high cost for no overlap\n", | |
| " \n", | |
| " return cost\n", | |
| "\n", | |
| "\n", | |
| "# Build weighted graph\n", | |
| "G = nx.Graph()\n", | |
| "\n", | |
| "# Add nodes\n", | |
| "for name in mol_names:\n", | |
| " G.add_node(name)\n", | |
| "\n", | |
| "# Add edges with costs\n", | |
| "for (name1, name2), mcs in mcs_matrix.items():\n", | |
| " mol1 = molecules[name1]['mol']\n", | |
| " mol2 = molecules[name2]['mol']\n", | |
| " cost = calculate_perturbation_cost(mol1, mol2, mcs['num_atoms'])\n", | |
| " G.add_edge(name1, name2, weight=cost, mcs_atoms=mcs['num_atoms'])\n", | |
| "\n", | |
| "print(f\"✓ Built perturbation graph with {G.number_of_nodes()} nodes and {G.number_of_edges()} edges\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Generate optimal perturbation map using Minimum Spanning Tree\n", | |
| "# This minimizes total cost while ensuring all molecules are connected\n", | |
| "\n", | |
| "perturbation_map = nx.minimum_spanning_tree(G, weight='weight')\n", | |
| "\n", | |
| "print(\"Optimal Perturbation Map:\")\n", | |
| "print(f\" Total perturbations: {perturbation_map.number_of_edges()}\")\n", | |
| "print(f\" Total molecules: {perturbation_map.number_of_nodes()}\")\n", | |
| "print(\"\\nPerturbation pairs:\")\n", | |
| "\n", | |
| "total_cost = 0\n", | |
| "for edge in perturbation_map.edges(data=True):\n", | |
| " name1, name2, data = edge\n", | |
| " cost = data['weight']\n", | |
| " mcs_atoms = data['mcs_atoms']\n", | |
| " total_cost += cost\n", | |
| " print(f\" {name1} <-> {name2}: cost={cost:.2f}, MCS={mcs_atoms} atoms\")\n", | |
| "\n", | |
| "print(f\"\\nTotal map cost: {total_cost:.2f}\")\n", | |
| "print(f\"\\n✓ Generated optimal perturbation map with {perturbation_map.number_of_edges()} transformations\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Visualize the perturbation map\n", | |
| "plt.figure(figsize=(12, 8))\n", | |
| "\n", | |
| "# Use spring layout for better visualization\n", | |
| "pos = nx.spring_layout(perturbation_map, k=2, iterations=50, seed=42)\n", | |
| "\n", | |
| "# Draw nodes\n", | |
| "nx.draw_networkx_nodes(perturbation_map, pos, \n", | |
| " node_color='lightblue', \n", | |
| " node_size=2000,\n", | |
| " edgecolors='black',\n", | |
| " linewidths=2)\n", | |
| "\n", | |
| "# Draw edges with width proportional to MCS size (thicker = more similar)\n", | |
| "edge_widths = [data['mcs_atoms'] / 2 for _, _, data in perturbation_map.edges(data=True)]\n", | |
| "nx.draw_networkx_edges(perturbation_map, pos, \n", | |
| " width=edge_widths,\n", | |
| " edge_color='gray',\n", | |
| " alpha=0.6)\n", | |
| "\n", | |
| "# Draw labels\n", | |
| "nx.draw_networkx_labels(perturbation_map, pos, \n", | |
| " font_size=10,\n", | |
| " font_weight='bold')\n", | |
| "\n", | |
| "# Draw edge labels (MCS atoms)\n", | |
| "edge_labels = {(u, v): f\"{d['mcs_atoms']}\" for u, v, d in perturbation_map.edges(data=True)}\n", | |
| "nx.draw_networkx_edge_labels(perturbation_map, pos, edge_labels, font_size=8)\n", | |
| "\n", | |
| "plt.title('Optimal Perturbation Map\\n(numbers show MCS atom count)', \n", | |
| " fontsize=14, fontweight='bold')\n", | |
| "plt.axis('off')\n", | |
| "plt.tight_layout()\n", | |
| "plt.show()\n", | |
| "\n", | |
| "print(\"\\n📊 Perturbation map visualized\")\n", | |
| "print(\" - Thicker edges = more similar molecules (larger MCS)\")\n", | |
| "print(\" - Numbers on edges = common atoms in MCS\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 6. Steps 4-11: GROMACS Simulation Workflow\n", | |
| "\n", | |
| "**⚠️ The following steps require GROMACS software and significant computational resources.**\n", | |
| "\n", | |
| "These steps are typically performed on HPC clusters or workstations with GPUs:\n", | |
| "\n", | |
| "### Step 4: Ligand Topology Generation\n", | |
| "- Generate force field parameters (AMBER, OPLS, or CHARMM)\n", | |
| "- Tools: `acpype`, `CGenFF`, `LigParGen`\n", | |
| "- Output: `.itp` topology files for GROMACS\n", | |
| "\n", | |
| "### Step 5: Dual Topology Generation\n", | |
| "- Create dual topologies for λ-states\n", | |
| "- Atoms are gradually transformed from state A to state B\n", | |
| "- λ = 0: fully in state A, λ = 1: fully in state B\n", | |
| "\n", | |
| "### Step 6: Ligand Pose Preparation\n", | |
| "- Align ligands to binding site using core-constrained alignment\n", | |
| "- Ensure consistent positioning for FEP calculations\n", | |
| "\n", | |
| "### Step 7: System Building\n", | |
| "- Combine protein + dual-ligand + solvent + ions\n", | |
| "- Typical system size: 40,000-100,000 atoms\n", | |
| "- GROMACS command: `gmx solvate`, `gmx genion`\n", | |
| "\n", | |
| "### Step 8: Enhanced Sampling (Optional)\n", | |
| "- REST or REST2 (Replica Exchange with Solute Tempering/Scaling)\n", | |
| "- Improves sampling in difficult cases\n", | |
| "- Requires GROMACS-2019 patched with PLUMED\n", | |
| "\n", | |
| "### Step 9: Energy Minimization\n", | |
| "- Remove steric clashes\n", | |
| "- GROMACS command: `gmx mdrun -deffnm em`\n", | |
| "\n", | |
| "### Step 10: Equilibration\n", | |
| "- NVE (0.2 ns) → NVT (0.2 ns) → NPT (0.2 ns)\n", | |
| "- Gradually relax system to production conditions\n", | |
| "\n", | |
| "### Step 11: FEP Production MD\n", | |
| "- **Most computationally intensive step**\n", | |
| "- 10 ns per λ-window × 12-24 windows = 120-240 ns total per perturbation\n", | |
| "- Typical runtime: 1-7 days per perturbation (GPU-accelerated)\n", | |
| "- Output: `dhdl.xvg` files containing energy derivatives\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "**For this educational notebook, we'll skip to the analysis steps using synthetic data.**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 7. Generate Synthetic FEP Simulation Data\n", | |
| "\n", | |
| "Since we can't run actual GROMACS simulations, we'll generate realistic synthetic data that mimics the output of FEP calculations." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def generate_synthetic_fep_data(n_lambda=12, n_frames=1000):\n", | |
| " \"\"\"\n", | |
| " Generate synthetic dH/dλ data for FEP simulations.\n", | |
| " \n", | |
| " In real FEP:\n", | |
| " - Multiple λ windows sample the alchemical path\n", | |
| " - Each window records dH/dλ (energy derivative)\n", | |
| " - BAR/MBAR uses this data to estimate ΔG\n", | |
| " \n", | |
| " Parameters:\n", | |
| " -----------\n", | |
| " n_lambda : int, number of λ windows\n", | |
| " n_frames : int, number of MD frames per window\n", | |
| " \n", | |
| " Returns:\n", | |
| " --------\n", | |
| " dict : synthetic FEP data\n", | |
| " \"\"\"\n", | |
| " lambda_values = np.linspace(0, 1, n_lambda)\n", | |
| " \n", | |
| " # Storage for dH/dλ values\n", | |
| " dhdl_data = {}\n", | |
| " \n", | |
| " # Generate synthetic data for each λ window\n", | |
| " # Real dH/dλ curves are typically smooth and vary with λ\n", | |
| " for i, lam in enumerate(lambda_values):\n", | |
| " # Base energy difference with some λ-dependence\n", | |
| " base_dhdl = 50 * np.sin(np.pi * lam) + 10 * lam\n", | |
| " \n", | |
| " # Add realistic fluctuations (thermal noise)\n", | |
| " noise = np.random.normal(0, 5, n_frames)\n", | |
| " dhdl = base_dhdl + noise\n", | |
| " \n", | |
| " dhdl_data[lam] = {\n", | |
| " 'dhdl': dhdl,\n", | |
| " 'frames': np.arange(n_frames)\n", | |
| " }\n", | |
| " \n", | |
| " return {'lambda': lambda_values, 'data': dhdl_data}\n", | |
| "\n", | |
| "\n", | |
| "# Generate synthetic FEP data for a perturbation\n", | |
| "print(\"Generating synthetic FEP simulation data...\")\n", | |
| "fep_data = generate_synthetic_fep_data(n_lambda=12, n_frames=1000)\n", | |
| "\n", | |
| "print(f\"✓ Generated FEP data for {len(fep_data['lambda'])} λ-windows\")\n", | |
| "print(f\" Each window: {1000} frames (simulating 10 ns at 10 ps/frame)\")\n", | |
| "print(f\" Total simulation time (synthetic): {10 * len(fep_data['lambda'])} ns\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Visualize dH/dλ curves across λ-windows\n", | |
| "fig, axes = plt.subplots(2, 2, figsize=(14, 10))\n", | |
| "\n", | |
| "# Plot 1: dH/dλ time series for selected λ-windows\n", | |
| "ax = axes[0, 0]\n", | |
| "lambda_to_plot = [0.0, 0.25, 0.5, 0.75, 1.0]\n", | |
| "for lam in lambda_to_plot:\n", | |
| " if lam in fep_data['data']:\n", | |
| " data = fep_data['data'][lam]\n", | |
| " ax.plot(data['frames'][:200], data['dhdl'][:200], \n", | |
| " alpha=0.7, label=f'λ={lam:.2f}')\n", | |
| "ax.set_xlabel('Frame', fontsize=11)\n", | |
| "ax.set_ylabel('dH/dλ (kJ/mol)', fontsize=11)\n", | |
| "ax.set_title('dH/dλ Time Series (first 200 frames)', fontweight='bold')\n", | |
| "ax.legend()\n", | |
| "ax.grid(alpha=0.3)\n", | |
| "\n", | |
| "# Plot 2: Mean dH/dλ vs λ\n", | |
| "ax = axes[0, 1]\n", | |
| "mean_dhdl = [np.mean(fep_data['data'][lam]['dhdl']) for lam in fep_data['lambda']]\n", | |
| "std_dhdl = [np.std(fep_data['data'][lam]['dhdl']) for lam in fep_data['lambda']]\n", | |
| "ax.errorbar(fep_data['lambda'], mean_dhdl, yerr=std_dhdl, \n", | |
| " marker='o', capsize=5, capthick=2, linewidth=2)\n", | |
| "ax.set_xlabel('λ', fontsize=11)\n", | |
| "ax.set_ylabel('⟨dH/dλ⟩ (kJ/mol)', fontsize=11)\n", | |
| "ax.set_title('Mean dH/dλ Profile', fontweight='bold')\n", | |
| "ax.grid(alpha=0.3)\n", | |
| "\n", | |
| "# Plot 3: Distribution of dH/dλ at λ=0.5\n", | |
| "ax = axes[1, 0]\n", | |
| "mid_lambda = fep_data['lambda'][len(fep_data['lambda'])//2]\n", | |
| "dhdl_mid = fep_data['data'][mid_lambda]['dhdl']\n", | |
| "ax.hist(dhdl_mid, bins=50, alpha=0.7, edgecolor='black')\n", | |
| "ax.axvline(np.mean(dhdl_mid), color='red', linestyle='--', \n", | |
| " linewidth=2, label=f'Mean={np.mean(dhdl_mid):.1f}')\n", | |
| "ax.set_xlabel('dH/dλ (kJ/mol)', fontsize=11)\n", | |
| "ax.set_ylabel('Frequency', fontsize=11)\n", | |
| "ax.set_title(f'Distribution at λ={mid_lambda:.2f}', fontweight='bold')\n", | |
| "ax.legend()\n", | |
| "ax.grid(alpha=0.3)\n", | |
| "\n", | |
| "# Plot 4: Standard deviation vs λ (convergence indicator)\n", | |
| "ax = axes[1, 1]\n", | |
| "ax.plot(fep_data['lambda'], std_dhdl, marker='s', linewidth=2, markersize=8)\n", | |
| "ax.set_xlabel('λ', fontsize=11)\n", | |
| "ax.set_ylabel('σ(dH/dλ) (kJ/mol)', fontsize=11)\n", | |
| "ax.set_title('Fluctuation Magnitude vs λ', fontweight='bold')\n", | |
| "ax.grid(alpha=0.3)\n", | |
| "\n", | |
| "plt.tight_layout()\n", | |
| "plt.show()\n", | |
| "\n", | |
| "print(\"\\n📊 Synthetic FEP data visualized\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 8. Step 12: ΔΔG Estimation Using BAR\n", | |
| "\n", | |
| "PyAutoFEP uses the **Bennett Acceptance Ratio (BAR)** or **Multistate BAR (MBAR)** methods to estimate free energy differences.\n", | |
| "\n", | |
| "### BAR Method:\n", | |
| "\n", | |
| "BAR estimates the free energy difference between adjacent λ-windows using:\n", | |
| "\n", | |
| "$$\n", | |
| "\\sum_{i=1}^{N_0} f(U_1(x_i^{(0)}) - U_0(x_i^{(0)}) + C) = \\sum_{j=1}^{N_1} f(U_0(x_j^{(1)}) - U_1(x_j^{(1)}) - C)\n", | |
| "$$\n", | |
| "\n", | |
| "where $f(x) = 1/(1+e^x)$ is the Fermi function, and $C = \\Delta G$ is the free energy difference.\n", | |
| "\n", | |
| "**Advantages:**\n", | |
| "- Statistically optimal estimator\n", | |
| "- Uses data from both windows\n", | |
| "- Better than simple thermodynamic integration\n", | |
| "\n", | |
| "**In PyAutoFEP:** Implemented via the `alchemlyb` library." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def estimate_free_energy_trapezoid(lambda_values, mean_dhdl):\n", | |
| " \"\"\"\n", | |
| " Simplified free energy estimation using trapezoidal integration.\n", | |
| " \n", | |
| " Real PyAutoFEP uses BAR/MBAR from alchemlyb, which is more sophisticated.\n", | |
| " This is a simplified demonstration.\n", | |
| " \n", | |
| " ΔG ≈ ∫ ⟨dH/dλ⟩ dλ\n", | |
| " \n", | |
| " Parameters:\n", | |
| " -----------\n", | |
| " lambda_values : array, λ values\n", | |
| " mean_dhdl : array, mean dH/dλ at each λ\n", | |
| " \n", | |
| " Returns:\n", | |
| " --------\n", | |
| " float : ΔG estimate in kJ/mol\n", | |
| " \"\"\"\n", | |
| " # Use scipy trapezoid for compatibility\n", | |
| " from scipy.integrate import trapezoid\n", | |
| " delta_g = trapezoid(mean_dhdl, lambda_values)\n", | |
| " return delta_g\n", | |
| "\n", | |
| "\n", | |
| "def bootstrap_error_estimate(lambda_values, dhdl_data, n_bootstrap=100):\n", | |
| " \"\"\"\n", | |
| " Estimate uncertainty in ΔG using bootstrap resampling.\n", | |
| " \n", | |
| " Parameters:\n", | |
| " -----------\n", | |
| " lambda_values : array, λ values\n", | |
| " dhdl_data : dict, dH/dλ data for each λ\n", | |
| " n_bootstrap : int, number of bootstrap samples\n", | |
| " \n", | |
| " Returns:\n", | |
| " --------\n", | |
| " float : standard error of ΔG\n", | |
| " \"\"\"\n", | |
| " dg_bootstrap = []\n", | |
| " \n", | |
| " for _ in range(n_bootstrap):\n", | |
| " # Resample with replacement\n", | |
| " mean_dhdl_boot = []\n", | |
| " for lam in lambda_values:\n", | |
| " dhdl = dhdl_data[lam]['dhdl']\n", | |
| " boot_sample = np.random.choice(dhdl, size=len(dhdl), replace=True)\n", | |
| " mean_dhdl_boot.append(np.mean(boot_sample))\n", | |
| " \n", | |
| " dg = estimate_free_energy_trapezoid(lambda_values, mean_dhdl_boot)\n", | |
| " dg_bootstrap.append(dg)\n", | |
| " \n", | |
| " return np.std(dg_bootstrap)\n", | |
| "\n", | |
| "\n", | |
| "# Calculate ΔG for the synthetic perturbation\n", | |
| "mean_dhdl = [np.mean(fep_data['data'][lam]['dhdl']) for lam in fep_data['lambda']]\n", | |
| "delta_g = estimate_free_energy_trapezoid(fep_data['lambda'], mean_dhdl)\n", | |
| "delta_g_error = bootstrap_error_estimate(fep_data['lambda'], fep_data['data'], n_bootstrap=100)\n", | |
| "\n", | |
| "print(\"Free Energy Estimation Results:\")\n", | |
| "print(f\" ΔG = {delta_g:.2f} ± {delta_g_error:.2f} kJ/mol\")\n", | |
| "print(f\" ΔG = {delta_g * 0.239:.2f} ± {delta_g_error * 0.239:.2f} kcal/mol\")\n", | |
| "print(f\"\\n✓ Free energy estimated using simplified trapezoidal integration\")\n", | |
| "print(\" (Real PyAutoFEP uses BAR/MBAR from alchemlyb for better accuracy)\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 9. Generate Complete FEP Campaign Results\n", | |
| "\n", | |
| "For a complete FEP campaign, we need ΔΔG values for all perturbations in the perturbation map. Let's generate synthetic results for the entire network." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Generate ΔΔG values for all perturbations in the map\n", | |
| "perturbation_results = {}\n", | |
| "\n", | |
| "print(\"Generating ΔΔG values for all perturbations...\\n\")\n", | |
| "\n", | |
| "for edge in perturbation_map.edges(data=True):\n", | |
| " mol1, mol2, data = edge\n", | |
| " \n", | |
| " # Generate synthetic ΔΔG based on structural similarity\n", | |
| " # More similar molecules (higher MCS) -> smaller ΔΔG\n", | |
| " mcs_atoms = data['mcs_atoms']\n", | |
| " \n", | |
| " # Base ΔΔG with some randomness\n", | |
| " base_ddg = np.random.normal(0, 3) # Mean=0, std=3 kcal/mol\n", | |
| " \n", | |
| " # Add component based on structural change\n", | |
| " n1 = molecules[mol1]['mol'].GetNumAtoms()\n", | |
| " n2 = molecules[mol2]['mol'].GetNumAtoms()\n", | |
| " structural_term = 0.5 * abs(n1 - n2) # Penalty for size change\n", | |
| " \n", | |
| " ddg = base_ddg + np.random.normal(0, 0.5) * structural_term\n", | |
| " ddg_error = np.random.uniform(0.3, 1.0) # Typical FEP errors\n", | |
| " \n", | |
| " perturbation_results[(mol1, mol2)] = {\n", | |
| " 'ddg_kcal': ddg,\n", | |
| " 'error_kcal': ddg_error,\n", | |
| " 'mcs_atoms': mcs_atoms\n", | |
| " }\n", | |
| " \n", | |
| " print(f\" {mol1} → {mol2}: ΔΔG = {ddg:6.2f} ± {ddg_error:.2f} kcal/mol (MCS={mcs_atoms})\")\n", | |
| "\n", | |
| "print(f\"\\n✓ Generated ΔΔG for {len(perturbation_results)} perturbations\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 10. Step 13: Convert to Reference-Relative Free Energies\n", | |
| "\n", | |
| "FEP calculates **pairwise** free energy differences (ΔΔG between molecule pairs).\n", | |
| "For comparison with experimental data, we need **reference-relative** values.\n", | |
| "\n", | |
| "**Method:** Use shortest path in the perturbation network to sum ΔΔG values from each molecule to a reference compound." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def calculate_reference_relative_energies(perturbation_map, perturbation_results, reference='MOL_01'):\n", | |
| " \"\"\"\n", | |
| " Convert pairwise ΔΔG to reference-relative free energies.\n", | |
| " \n", | |
| " Parameters:\n", | |
| " -----------\n", | |
| " perturbation_map : networkx Graph\n", | |
| " perturbation_results : dict, ΔΔG values for each edge\n", | |
| " reference : str, reference molecule name\n", | |
| " \n", | |
| " Returns:\n", | |
| " --------\n", | |
| " dict : relative free energies for each molecule\n", | |
| " \"\"\"\n", | |
| " relative_energies = {}\n", | |
| " \n", | |
| " # Reference has ΔG = 0 by definition\n", | |
| " relative_energies[reference] = {'dg': 0.0, 'error': 0.0, 'path': [reference]}\n", | |
| " \n", | |
| " # For each molecule, find shortest path to reference\n", | |
| " for mol in perturbation_map.nodes():\n", | |
| " if mol == reference:\n", | |
| " continue\n", | |
| " \n", | |
| " try:\n", | |
| " # Find shortest path\n", | |
| " path = nx.shortest_path(perturbation_map, source=reference, target=mol)\n", | |
| " \n", | |
| " # Sum ΔΔG along path\n", | |
| " total_dg = 0.0\n", | |
| " total_error_sq = 0.0\n", | |
| " \n", | |
| " for i in range(len(path) - 1):\n", | |
| " mol1, mol2 = path[i], path[i+1]\n", | |
| " \n", | |
| " # Get ΔΔG (handle both directions)\n", | |
| " if (mol1, mol2) in perturbation_results:\n", | |
| " ddg = perturbation_results[(mol1, mol2)]['ddg_kcal']\n", | |
| " error = perturbation_results[(mol1, mol2)]['error_kcal']\n", | |
| " elif (mol2, mol1) in perturbation_results:\n", | |
| " ddg = -perturbation_results[(mol2, mol1)]['ddg_kcal']\n", | |
| " error = perturbation_results[(mol2, mol1)]['error_kcal']\n", | |
| " else:\n", | |
| " print(f\"Warning: No data for {mol1} <-> {mol2}\")\n", | |
| " continue\n", | |
| " \n", | |
| " total_dg += ddg\n", | |
| " total_error_sq += error**2\n", | |
| " \n", | |
| " # Errors add in quadrature\n", | |
| " total_error = np.sqrt(total_error_sq)\n", | |
| " \n", | |
| " relative_energies[mol] = {\n", | |
| " 'dg': total_dg,\n", | |
| " 'error': total_error,\n", | |
| " 'path': path\n", | |
| " }\n", | |
| " \n", | |
| " except nx.NetworkXNoPath:\n", | |
| " print(f\"Warning: No path from {reference} to {mol}\")\n", | |
| " \n", | |
| " return relative_energies\n", | |
| "\n", | |
| "\n", | |
| "# Calculate reference-relative energies\n", | |
| "reference_mol = 'MOL_01'\n", | |
| "relative_energies = calculate_reference_relative_energies(\n", | |
| " perturbation_map, \n", | |
| " perturbation_results, \n", | |
| " reference=reference_mol\n", | |
| ")\n", | |
| "\n", | |
| "print(f\"Reference-Relative Free Energies (relative to {reference_mol}):\")\n", | |
| "print(f\"{'Molecule':<10} {'ΔG (kcal/mol)':<20} {'Path from reference'}\")\n", | |
| "print(\"-\" * 70)\n", | |
| "\n", | |
| "for mol in sorted(relative_energies.keys()):\n", | |
| " data = relative_energies[mol]\n", | |
| " path_str = ' → '.join(data['path'])\n", | |
| " print(f\"{mol:<10} {data['dg']:6.2f} ± {data['error']:.2f} {path_str}\")\n", | |
| "\n", | |
| "print(f\"\\n✓ Calculated reference-relative free energies for {len(relative_energies)} molecules\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 11. Generate Synthetic Experimental Data for Validation\n", | |
| "\n", | |
| "To demonstrate the validation workflow, we'll generate synthetic experimental binding affinities." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Generate synthetic experimental data\n", | |
| "# Real FEP validation uses experimental Ki, IC50, or Kd values\n", | |
| "\n", | |
| "experimental_data = {}\n", | |
| "\n", | |
| "for mol in molecules.keys():\n", | |
| " # Use FEP prediction as base, add experimental noise\n", | |
| " if mol in relative_energies:\n", | |
| " fep_dg = relative_energies[mol]['dg']\n", | |
| " else:\n", | |
| " fep_dg = 0.0\n", | |
| " \n", | |
| " # Add noise to simulate experimental error (typically larger than FEP error)\n", | |
| " exp_noise = np.random.normal(0, 1.5) # ~1.5 kcal/mol experimental error\n", | |
| " exp_dg = fep_dg + exp_noise\n", | |
| " \n", | |
| " experimental_data[mol] = {\n", | |
| " 'dg_exp': exp_dg,\n", | |
| " 'error_exp': np.random.uniform(0.5, 1.0)\n", | |
| " }\n", | |
| "\n", | |
| "print(\"Synthetic Experimental Data (relative to MOL_01):\")\n", | |
| "print(f\"{'Molecule':<10} {'ΔG_exp (kcal/mol)'}\")\n", | |
| "print(\"-\" * 40)\n", | |
| "for mol in sorted(experimental_data.keys()):\n", | |
| " data = experimental_data[mol]\n", | |
| " print(f\"{mol:<10} {data['dg_exp']:6.2f} ± {data['error_exp']:.2f}\")\n", | |
| "\n", | |
| "print(\"\\n✓ Generated synthetic experimental data\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 12. Step 14: Statistical Analysis and Validation\n", | |
| "\n", | |
| "PyAutoFEP evaluates prediction accuracy using:\n", | |
| "\n", | |
| "1. **Pearson's r** - Linear correlation\n", | |
| "2. **Spearman's ρ** - Rank correlation\n", | |
| "3. **Kendall's τ** - Rank correlation (robust to outliers)\n", | |
| "4. **RMSE** - Root mean squared error\n", | |
| "5. **MAE** - Mean absolute error\n", | |
| "\n", | |
| "These metrics assess how well FEP predictions match experimental data." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Prepare data for correlation analysis\n", | |
| "fep_values = []\n", | |
| "exp_values = []\n", | |
| "mol_labels = []\n", | |
| "\n", | |
| "for mol in molecules.keys():\n", | |
| " if mol in relative_energies and mol in experimental_data:\n", | |
| " fep_values.append(relative_energies[mol]['dg'])\n", | |
| " exp_values.append(experimental_data[mol]['dg_exp'])\n", | |
| " mol_labels.append(mol)\n", | |
| "\n", | |
| "fep_values = np.array(fep_values)\n", | |
| "exp_values = np.array(exp_values)\n", | |
| "\n", | |
| "# Calculate statistics\n", | |
| "pearson_r, pearson_p = stats.pearsonr(fep_values, exp_values)\n", | |
| "spearman_r, spearman_p = stats.spearmanr(fep_values, exp_values)\n", | |
| "kendall_tau, kendall_p = stats.kendalltau(fep_values, exp_values)\n", | |
| "rmse = np.sqrt(np.mean((fep_values - exp_values)**2))\n", | |
| "mae = np.mean(np.abs(fep_values - exp_values))\n", | |
| "\n", | |
| "print(\"Statistical Validation Results:\")\n", | |
| "print(\"=\" * 50)\n", | |
| "print(f\" Pearson's r: {pearson_r:6.3f} (p={pearson_p:.3e})\")\n", | |
| "print(f\" Spearman's ρ: {spearman_r:6.3f} (p={spearman_p:.3e})\")\n", | |
| "print(f\" Kendall's τ: {kendall_tau:6.3f} (p={kendall_p:.3e})\")\n", | |
| "print(f\" RMSE: {rmse:6.3f} kcal/mol\")\n", | |
| "print(f\" MAE: {mae:6.3f} kcal/mol\")\n", | |
| "print(\"=\" * 50)\n", | |
| "\n", | |
| "print(f\"\\n✓ Statistical analysis complete\")\n", | |
| "print(\"\\nInterpretation:\")\n", | |
| "print(f\" - Correlation coefficients > 0.7 indicate good agreement\")\n", | |
| "print(f\" - RMSE < 1.5 kcal/mol is considered excellent for FEP\")\n", | |
| "print(f\" - RMSE < 2.0 kcal/mol is good\")\n", | |
| "print(f\" - Current RMSE: {rmse:.2f} kcal/mol\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 13. Visualization: FEP vs Experimental Correlation\n", | |
| "\n", | |
| "The primary validation plot for FEP studies." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Create correlation plot\n", | |
| "fig, ax = plt.subplots(figsize=(10, 9))\n", | |
| "\n", | |
| "# Scatter plot\n", | |
| "ax.scatter(exp_values, fep_values, s=150, alpha=0.7, \n", | |
| " edgecolors='black', linewidths=2, c='steelblue')\n", | |
| "\n", | |
| "# Add molecule labels\n", | |
| "for i, mol in enumerate(mol_labels):\n", | |
| " ax.annotate(mol, (exp_values[i], fep_values[i]), \n", | |
| " xytext=(5, 5), textcoords='offset points', \n", | |
| " fontsize=9, fontweight='bold')\n", | |
| "\n", | |
| "# Add diagonal line (perfect agreement)\n", | |
| "min_val = min(exp_values.min(), fep_values.min()) - 1\n", | |
| "max_val = max(exp_values.max(), fep_values.max()) + 1\n", | |
| "ax.plot([min_val, max_val], [min_val, max_val], \n", | |
| " 'k--', linewidth=2, label='Perfect agreement', alpha=0.5)\n", | |
| "\n", | |
| "# Add linear regression fit\n", | |
| "slope, intercept = np.polyfit(exp_values, fep_values, 1)\n", | |
| "fit_line = slope * exp_values + intercept\n", | |
| "ax.plot(exp_values, fit_line, 'r-', linewidth=2, \n", | |
| " label=f'Linear fit (slope={slope:.2f})', alpha=0.7)\n", | |
| "\n", | |
| "# Add statistics box\n", | |
| "stats_text = f\"Pearson r = {pearson_r:.3f}\\nSpearman ρ = {spearman_r:.3f}\\nKendall τ = {kendall_tau:.3f}\\nRMSE = {rmse:.2f} kcal/mol\\nMAE = {mae:.2f} kcal/mol\"\n", | |
| "ax.text(0.05, 0.95, stats_text, transform=ax.transAxes,\n", | |
| " fontsize=11, verticalalignment='top',\n", | |
| " bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))\n", | |
| "\n", | |
| "ax.set_xlabel('Experimental ΔG (kcal/mol)', fontsize=13, fontweight='bold')\n", | |
| "ax.set_ylabel('FEP Predicted ΔG (kcal/mol)', fontsize=13, fontweight='bold')\n", | |
| "ax.set_title('FEP Predictions vs Experimental Data\\n(Synthetic Example)', \n", | |
| " fontsize=14, fontweight='bold')\n", | |
| "ax.legend(loc='lower right', fontsize=11)\n", | |
| "ax.grid(alpha=0.3)\n", | |
| "ax.set_xlim(min_val, max_val)\n", | |
| "ax.set_ylim(min_val, max_val)\n", | |
| "\n", | |
| "plt.tight_layout()\n", | |
| "plt.show()\n", | |
| "\n", | |
| "print(\"\\n📊 Correlation plot generated\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 14. Additional Analysis: Convergence Assessment\n", | |
| "\n", | |
| "Real PyAutoFEP includes diagnostic plots for:\n", | |
| "- **Overlap matrices** - λ-window phase space overlap\n", | |
| "- **Convergence plots** - ΔΔG vs simulation time\n", | |
| "- **Replica exchange diagnostics** - for REST/REST2 simulations\n", | |
| "\n", | |
| "Let's demonstrate a convergence plot." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def simulate_convergence(n_time_points=20, final_dg=25.0):\n", | |
| " \"\"\"\n", | |
| " Simulate ΔG convergence over simulation time.\n", | |
| " \n", | |
| " Real FEP: ΔG calculated from increasing amounts of simulation data\n", | |
| " Should plateau when sufficient sampling is achieved\n", | |
| " \"\"\"\n", | |
| " time_points = np.linspace(1, 10, n_time_points) # 1-10 ns\n", | |
| " \n", | |
| " # Simulate convergence: exponential approach to final value\n", | |
| " # Early estimates have more noise\n", | |
| " dg_estimates = []\n", | |
| " dg_errors = []\n", | |
| " \n", | |
| " for t in time_points:\n", | |
| " # Convergence function\n", | |
| " converged_fraction = 1 - np.exp(-t / 3)\n", | |
| " \n", | |
| " # ΔG estimate approaches final value\n", | |
| " dg = final_dg * converged_fraction + np.random.normal(0, 5 / np.sqrt(t))\n", | |
| " \n", | |
| " # Error decreases with time\n", | |
| " error = 5 / np.sqrt(t) + 0.5\n", | |
| " \n", | |
| " dg_estimates.append(dg)\n", | |
| " dg_errors.append(error)\n", | |
| " \n", | |
| " return time_points, np.array(dg_estimates), np.array(dg_errors)\n", | |
| "\n", | |
| "\n", | |
| "# Generate convergence data for multiple perturbations\n", | |
| "n_perturbations = 3\n", | |
| "fig, ax = plt.subplots(figsize=(12, 7))\n", | |
| "\n", | |
| "for i in range(n_perturbations):\n", | |
| " final_dg = np.random.uniform(15, 35)\n", | |
| " time, dg, errors = simulate_convergence(final_dg=final_dg)\n", | |
| " \n", | |
| " label = f'Perturbation {i+1}'\n", | |
| " ax.errorbar(time, dg, yerr=errors, marker='o', capsize=4, \n", | |
| " linewidth=2, markersize=6, label=label, alpha=0.8)\n", | |
| "\n", | |
| "ax.set_xlabel('Simulation Time (ns)', fontsize=12, fontweight='bold')\n", | |
| "ax.set_ylabel('ΔG Estimate (kJ/mol)', fontsize=12, fontweight='bold')\n", | |
| "ax.set_title('Free Energy Convergence Over Time', fontsize=14, fontweight='bold')\n", | |
| "ax.legend(fontsize=11)\n", | |
| "ax.grid(alpha=0.3)\n", | |
| "\n", | |
| "# Add annotation\n", | |
| "ax.text(0.98, 0.02, 'Well-converged: estimates plateau\\nPoor convergence: continues drifting',\n", | |
| " transform=ax.transAxes, fontsize=10,\n", | |
| " verticalalignment='bottom', horizontalalignment='right',\n", | |
| " bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))\n", | |
| "\n", | |
| "plt.tight_layout()\n", | |
| "plt.show()\n", | |
| "\n", | |
| "print(\"\\n📊 Convergence analysis plot generated\")\n", | |
| "print(\"\\nIn real FEP:\")\n", | |
| "print(\" - Run longer if ΔG still changing significantly\")\n", | |
| "print(\" - Typical production runs: 10-50 ns per λ-window\")\n", | |
| "print(\" - Difficult perturbations may need 100+ ns\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 15. Overlap Matrix Visualization\n", | |
| "\n", | |
| "Overlap matrices show phase space overlap between adjacent λ-windows. Good overlap (> 0.03) is essential for reliable BAR/MBAR estimates." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Generate synthetic overlap matrix\n", | |
| "n_lambda = 12\n", | |
| "overlap_matrix = np.zeros((n_lambda, n_lambda))\n", | |
| "\n", | |
| "# Diagonal elements (self-overlap) = 1\n", | |
| "np.fill_diagonal(overlap_matrix, 1.0)\n", | |
| "\n", | |
| "# Off-diagonal: overlap decreases with λ-distance\n", | |
| "for i in range(n_lambda):\n", | |
| " for j in range(n_lambda):\n", | |
| " if i != j:\n", | |
| " distance = abs(i - j)\n", | |
| " # Adjacent windows should have good overlap (0.1-0.3)\n", | |
| " # Distant windows have poor overlap\n", | |
| " overlap = np.exp(-distance * 1.5) * np.random.uniform(0.15, 0.35)\n", | |
| " overlap_matrix[i, j] = overlap\n", | |
| "\n", | |
| "# Plot\n", | |
| "plt.figure(figsize=(10, 8))\n", | |
| "sns.heatmap(overlap_matrix, \n", | |
| " annot=True, \n", | |
| " fmt='.2f', \n", | |
| " cmap='RdYlGn',\n", | |
| " vmin=0, \n", | |
| " vmax=1,\n", | |
| " cbar_kws={'label': 'Overlap'},\n", | |
| " xticklabels=[f'λ={i/(n_lambda-1):.2f}' for i in range(n_lambda)],\n", | |
| " yticklabels=[f'λ={i/(n_lambda-1):.2f}' for i in range(n_lambda)])\n", | |
| "\n", | |
| "plt.title('λ-Window Phase Space Overlap Matrix\\n(Green = good overlap, Red = poor overlap)', \n", | |
| " fontsize=14, fontweight='bold')\n", | |
| "plt.xlabel('λ-Window', fontsize=12)\n", | |
| "plt.ylabel('λ-Window', fontsize=12)\n", | |
| "plt.tight_layout()\n", | |
| "plt.show()\n", | |
| "\n", | |
| "# Check overlap quality\n", | |
| "min_adjacent_overlap = min([overlap_matrix[i, i+1] for i in range(n_lambda-1)])\n", | |
| "print(f\"\\n📊 Overlap matrix visualized\")\n", | |
| "print(f\"\\nMinimum adjacent window overlap: {min_adjacent_overlap:.3f}\")\n", | |
| "print(\"\\nGuidelines:\")\n", | |
| "print(\" - Overlap > 0.03: Adequate for BAR/MBAR\")\n", | |
| "print(\" - Overlap > 0.10: Good\")\n", | |
| "print(\" - Overlap > 0.20: Excellent\")\n", | |
| "print(f\"\\nThis simulation: {'✓ Good' if min_adjacent_overlap > 0.10 else '⚠ Could be better'}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 16. Lead Optimization Utility Analysis\n", | |
| "\n", | |
| "The paper includes a Monte Carlo simulation to assess PyAutoFEP's utility in lead optimization.\n", | |
| "\n", | |
| "**Question:** If you need to select the best compound from N candidates, how much better is FEP-guided selection vs random selection?\n", | |
| "\n", | |
| "Let's implement this analysis." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def lead_optimization_simulation(n_compounds_list=[5, 10, 20, 50], \n", | |
| " n_iterations=10000,\n", | |
| " exp_mean=0.0,\n", | |
| " exp_std=1.5,\n", | |
| " prediction_error=1.2):\n", | |
| " \"\"\"\n", | |
| " Monte Carlo simulation of lead optimization campaign.\n", | |
| " \n", | |
| " Parameters:\n", | |
| " -----------\n", | |
| " n_compounds_list : list, number of compounds to screen\n", | |
| " n_iterations : int, Monte Carlo iterations\n", | |
| " exp_mean : float, mean ΔΔG in typical lead opt (kcal/mol)\n", | |
| " exp_std : float, std of ΔΔG distribution (kcal/mol)\n", | |
| " prediction_error : float, FEP prediction error (kcal/mol)\n", | |
| " \n", | |
| " Returns:\n", | |
| " --------\n", | |
| " dict : results for each selection method\n", | |
| " \"\"\"\n", | |
| " results = {n: {'fep': [], 'random': [], 'perfect': []} for n in n_compounds_list}\n", | |
| " \n", | |
| " for n_comp in n_compounds_list:\n", | |
| " for _ in range(n_iterations):\n", | |
| " # Generate true ΔΔG for N compounds\n", | |
| " true_ddg = np.random.normal(exp_mean, exp_std, n_comp)\n", | |
| " \n", | |
| " # Perfect selection: pick best true value\n", | |
| " best_true = np.min(true_ddg)\n", | |
| " results[n_comp]['perfect'].append(best_true)\n", | |
| " \n", | |
| " # Random selection\n", | |
| " random_pick = np.random.choice(true_ddg)\n", | |
| " results[n_comp]['random'].append(random_pick)\n", | |
| " \n", | |
| " # FEP-guided selection: add prediction noise\n", | |
| " fep_predictions = true_ddg + np.random.normal(0, prediction_error, n_comp)\n", | |
| " best_fep_idx = np.argmin(fep_predictions)\n", | |
| " results[n_comp]['fep'].append(true_ddg[best_fep_idx])\n", | |
| " \n", | |
| " return results\n", | |
| "\n", | |
| "\n", | |
| "print(\"Running lead optimization Monte Carlo simulation...\")\n", | |
| "print(\"(This may take 10-20 seconds)\\n\")\n", | |
| "\n", | |
| "# Run simulation\n", | |
| "lead_opt_results = lead_optimization_simulation(\n", | |
| " n_compounds_list=[5, 10, 20, 50],\n", | |
| " n_iterations=5000, # Reduced for speed\n", | |
| " exp_mean=0.0,\n", | |
| " exp_std=1.5, # Based on Hajduk & Sauer data\n", | |
| " prediction_error=1.2 # Typical PyAutoFEP error from paper\n", | |
| ")\n", | |
| "\n", | |
| "print(\"✓ Simulation complete\\n\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Analyze and visualize results\n", | |
| "fig, axes = plt.subplots(1, 2, figsize=(16, 6))\n", | |
| "\n", | |
| "# Plot 1: Distributions for N=20 compounds\n", | |
| "ax = axes[0]\n", | |
| "n_comp = 20\n", | |
| "data_to_plot = [\n", | |
| " lead_opt_results[n_comp]['perfect'],\n", | |
| " lead_opt_results[n_comp]['fep'],\n", | |
| " lead_opt_results[n_comp]['random']\n", | |
| "]\n", | |
| "labels = ['Perfect', 'FEP-guided', 'Random']\n", | |
| "colors = ['green', 'steelblue', 'gray']\n", | |
| "\n", | |
| "for data, label, color in zip(data_to_plot, labels, colors):\n", | |
| " ax.hist(data, bins=50, alpha=0.6, label=label, color=color, edgecolor='black')\n", | |
| "\n", | |
| "ax.axvline(0, color='red', linestyle='--', linewidth=2, alpha=0.5, label='No change')\n", | |
| "ax.set_xlabel('ΔΔG of Selected Compound (kcal/mol)', fontsize=12, fontweight='bold')\n", | |
| "ax.set_ylabel('Frequency', fontsize=12, fontweight='bold')\n", | |
| "ax.set_title(f'Best Compound ΔΔG Distribution\\n(Screening {n_comp} compounds)', \n", | |
| " fontsize=13, fontweight='bold')\n", | |
| "ax.legend(fontsize=11)\n", | |
| "ax.grid(alpha=0.3, axis='y')\n", | |
| "\n", | |
| "# Plot 2: Success probability vs N compounds\n", | |
| "ax = axes[1]\n", | |
| "n_compounds = sorted(lead_opt_results.keys())\n", | |
| "\n", | |
| "improvement_threshold = -1.4 # 1 pKi unit improvement\n", | |
| "\n", | |
| "success_rates = {'perfect': [], 'fep': [], 'random': []}\n", | |
| "\n", | |
| "for n_comp in n_compounds:\n", | |
| " for method in ['perfect', 'fep', 'random']:\n", | |
| " data = np.array(lead_opt_results[n_comp][method])\n", | |
| " success_rate = np.mean(data < improvement_threshold) * 100\n", | |
| " success_rates[method].append(success_rate)\n", | |
| "\n", | |
| "ax.plot(n_compounds, success_rates['perfect'], marker='o', linewidth=3, \n", | |
| " markersize=10, label='Perfect predictions', color='green')\n", | |
| "ax.plot(n_compounds, success_rates['fep'], marker='s', linewidth=3, \n", | |
| " markersize=10, label='FEP-guided', color='steelblue')\n", | |
| "ax.plot(n_compounds, success_rates['random'], marker='^', linewidth=3, \n", | |
| " markersize=10, label='Random selection', color='gray')\n", | |
| "\n", | |
| "ax.set_xlabel('Number of Compounds Screened', fontsize=12, fontweight='bold')\n", | |
| "ax.set_ylabel('Probability of Finding Improved Compound (%)', fontsize=12, fontweight='bold')\n", | |
| "ax.set_title(f'Success Rate vs Library Size\\n(Success = ΔΔG < {improvement_threshold} kcal/mol)', \n", | |
| " fontsize=13, fontweight='bold')\n", | |
| "ax.legend(fontsize=11, loc='lower right')\n", | |
| "ax.grid(alpha=0.3)\n", | |
| "ax.set_ylim(0, 100)\n", | |
| "\n", | |
| "plt.tight_layout()\n", | |
| "plt.show()\n", | |
| "\n", | |
| "# Print summary statistics\n", | |
| "print(\"\\n📊 Lead Optimization Utility Analysis\\n\")\n", | |
| "print(\"Success rates for finding ΔΔG < -1.4 kcal/mol (1 pKi unit):\")\n", | |
| "print(\"=\"*70)\n", | |
| "print(f\"{'N Compounds':<15} {'Perfect':<15} {'FEP-guided':<15} {'Random':<15}\")\n", | |
| "print(\"-\"*70)\n", | |
| "for i, n_comp in enumerate(n_compounds):\n", | |
| " print(f\"{n_comp:<15} {success_rates['perfect'][i]:>6.1f}% \"\n", | |
| " f\"{success_rates['fep'][i]:>6.1f}% {success_rates['random'][i]:>6.1f}%\")\n", | |
| "print(\"=\"*70)\n", | |
| "\n", | |
| "# Calculate improvement factor\n", | |
| "improvement_20 = success_rates['fep'][n_compounds.index(20)] / success_rates['random'][n_compounds.index(20)]\n", | |
| "print(f\"\\nFor 20 compounds: FEP is {improvement_20:.1f}× better than random selection\")\n", | |
| "print(\"\\n✓ This demonstrates PyAutoFEP's practical value in lead optimization\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "## 17. Summary and Next Steps\n", | |
| "\n", | |
| "### What We've Demonstrated:\n", | |
| "\n", | |
| "✅ **Core PyAutoFEP Algorithms:**\n", | |
| "- Maximum Common Substructure (MCS) calculation\n", | |
| "- Perturbation map optimization\n", | |
| "- Free energy estimation concepts (simplified BAR)\n", | |
| "- Reference-relative energy calculation\n", | |
| "\n", | |
| "✅ **Analysis and Validation:**\n", | |
| "- Statistical correlation metrics (Pearson, Spearman, Kendall)\n", | |
| "- Convergence analysis\n", | |
| "- Overlap matrices\n", | |
| "- Lead optimization utility simulation\n", | |
| "\n", | |
| "✅ **Visualizations:**\n", | |
| "- Molecule structures\n", | |
| "- MCS similarity matrices\n", | |
| "- Perturbation network graphs\n", | |
| "- FEP vs experimental correlation plots\n", | |
| "- Convergence diagnostics\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "### What Requires Full Infrastructure:\n", | |
| "\n", | |
| "**Software Requirements:**\n", | |
| "- GROMACS 2019+ (patched with PLUMED for REST/REST2)\n", | |
| "- Force field parameterization tools:\n", | |
| " - AcPype (for AMBER/GAFF)\n", | |
| " - CGenFF/ParamChem (for CHARMM)\n", | |
| " - LigParGen (for OPLS-AA/M)\n", | |
| "- alchemlyb (for BAR/MBAR analysis)\n", | |
| "- PyAutoFEP package itself\n", | |
| "\n", | |
| "**Computational Requirements:**\n", | |
| "- **Hardware:** GPU-accelerated workstation or HPC cluster\n", | |
| "- **Time:** 1-7 days per perturbation (depending on system size)\n", | |
| "- **Storage:** 10-100 GB per perturbation for trajectories\n", | |
| "- **Memory:** 8-32 GB depending on system size\n", | |
| "\n", | |
| "**Typical FEP Campaign:**\n", | |
| "- 10-20 molecules → 9-19 perturbations (minimum spanning tree)\n", | |
| "- Each perturbation: 12-24 λ-windows × 10 ns = 120-240 ns\n", | |
| "- Total: 1-5 μs of simulation per campaign\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "### Running Real PyAutoFEP Calculations:\n", | |
| "\n", | |
| "1. **Install PyAutoFEP:**\n", | |
| " ```bash\n", | |
| " pip install pyautofep\n", | |
| " ```\n", | |
| "\n", | |
| "2. **Prepare input files:**\n", | |
| " - Ligand structures (SMILES, mol, mol2)\n", | |
| " - Protein structure (PDB)\n", | |
| " - Force field topologies\n", | |
| " - Configuration file\n", | |
| "\n", | |
| "3. **Run workflow:**\n", | |
| " ```bash\n", | |
| " pyautofep --config config.yaml\n", | |
| " ```\n", | |
| "\n", | |
| "4. **Analyze results:**\n", | |
| " - PyAutoFEP generates analysis plots automatically\n", | |
| " - Compare with experimental binding data\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "### Key Takeaways from the Paper:\n", | |
| "\n", | |
| "1. **PyAutoFEP automates FEP workflows** for GROMACS\n", | |
| "2. **Supports multiple force fields** (AMBER, CHARMM, OPLS)\n", | |
| "3. **Enhanced sampling** (REST/REST2) improves difficult cases\n", | |
| "4. **Validation on FXR ligands:**\n", | |
| " - RMSEc: 0.76-1.19 kcal/mol\n", | |
| " - Kendall's τ: 0.26-0.57\n", | |
| " - Competitive with commercial FEP+ software\n", | |
| "5. **Lead optimization value:**\n", | |
| " - FEP-guided selection ~2-3× better than random\n", | |
| " - Practical utility for drug discovery projects\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "### References:\n", | |
| "\n", | |
| "- PyAutoFEP paper: Martins et al. (Journal details from paper)\n", | |
| "- GROMACS: https://www.gromacs.org/\n", | |
| "- alchemlyb: https://github.com/alchemistry/alchemlyb\n", | |
| "- D3R Grand Challenge: https://drugdesigndata.org/\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "## Thank you for following this educational notebook! 🎉\n", | |
| "\n", | |
| "For questions or to run real FEP calculations, refer to the PyAutoFEP documentation and ensure you have access to appropriate computational resources." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Final summary statistics\n", | |
| "print(\"=\"*70)\n", | |
| "print(\" NOTEBOOK EXECUTION SUMMARY\")\n", | |
| "print(\"=\"*70)\n", | |
| "print(f\"\\n✓ Molecules analyzed: {len(molecules)}\")\n", | |
| "print(f\"✓ MCS pairs calculated: {len(mcs_matrix)}\")\n", | |
| "print(f\"✓ Perturbations in map: {perturbation_map.number_of_edges()}\")\n", | |
| "print(f\"✓ FEP calculations (simulated): {len(perturbation_results)}\")\n", | |
| "print(f\"✓ Reference-relative ΔG: {len(relative_energies)}\")\n", | |
| "print(f\"\\nStatistical Validation:\")\n", | |
| "print(f\" Pearson r: {pearson_r:.3f}\")\n", | |
| "print(f\" Spearman ρ: {spearman_r:.3f}\")\n", | |
| "print(f\" RMSE: {rmse:.2f} kcal/mol\")\n", | |
| "print(f\"\\nLead Optimization (20 compounds):\")\n", | |
| "print(f\" FEP success rate: {success_rates['fep'][n_compounds.index(20)]:.1f}%\")\n", | |
| "print(f\" Random success rate: {success_rates['random'][n_compounds.index(20)]:.1f}%\")\n", | |
| "print(f\" Improvement factor: {improvement_20:.1f}×\")\n", | |
| "print(\"\\n\" + \"=\"*70)\n", | |
| "print(\"All workflow steps completed successfully! ✓\")\n", | |
| "print(\"=\"*70)" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.11.0" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 4 | |
| } |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment