Skip to content

Instantly share code, notes, and snippets.

@richardokonicha
Last active September 29, 2025 14:23
Show Gist options
  • Select an option

  • Save richardokonicha/2d17eaf3a8bba4d1b1ab9cacd9b5c5d6 to your computer and use it in GitHub Desktop.

Select an option

Save richardokonicha/2d17eaf3a8bba4d1b1ab9cacd9b5c5d6 to your computer and use it in GitHub Desktop.
Setup Guide: MindsDB with Elasticsearch Integration

Complete Setup Guide: MindsDB with Elasticsearch Integration

Overview

This comprehensive guide walks you through setting up MindsDB from source using Makefile commands, installing and configuring Elasticsearch, creating datasources, and running extensive test scenarios. Perfect for development, testing, and production environments.

Table of Contents

  1. Prerequisites & System Requirements
  2. MindsDB Installation from Source
  3. Elasticsearch Installation & Configuration
  4. MindsDB Development Setup
  5. Elasticsearch Handler Installation
  6. Creating Elasticsearch Test Data
  7. MindsDB Elasticsearch Datasource Setup
  8. Comprehensive Test Scenarios
  9. Performance Testing & Benchmarks
  10. Troubleshooting Guide
  11. Production Deployment Tips

Prerequisites & System Requirements

Minimum System Requirements

  • OS: macOS, Linux, or Windows with WSL2
  • Python: 3.8+ (3.11+ recommended)
  • Memory: 8GB RAM minimum, 16GB recommended
  • Storage: 10GB free space for development environment
  • Java: 11+ for Elasticsearch

Required Software

# macOS (using Homebrew)
brew install [email protected] java git curl wget

# Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-dev python3-pip openjdk-11-jdk git curl wget

# CentOS/RHEL
sudo yum install python3.11 python3.11-devel java-11-openjdk git curl wget

Development Tools

# Install essential development tools
pip install --upgrade pip setuptools wheel
pip install virtualenv pre-commit

MindsDB Installation from Source

Step 1: Clone MindsDB Repository

# Clone the official MindsDB repository
git clone https://github.com/mindsdb/mindsdb.git
cd mindsdb

# Optional: Switch to development branch
# git checkout staging

Step 2: Create Virtual Environment

# Create isolated Python environment
python3.11 -m venv venv

# Activate virtual environment
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

# Verify Python version
python --version  # Should show Python 3.11+

Step 3: Install MindsDB using Makefile

The MindsDB project includes a comprehensive Makefile with pre-configured commands:

# Install MindsDB in development mode with all dependencies
make install_mindsdb

This command performs the following:

  • pip install -e . - Installs MindsDB in editable/development mode
  • pip install -r requirements/requirements-dev.txt - Installs development dependencies
  • pre-commit install - Sets up code formatting and linting hooks

Step 4: Verify MindsDB Installation

# Check MindsDB installation
python -c "import mindsdb; print('MindsDB installed successfully!')"

# View available Makefile commands
make help  # or just: make

Available Makefile Commands

Command Description
make install_mindsdb Complete MindsDB development installation
make install_handler HANDLER_NAME=elasticsearch Install specific handler
make run_mindsdb Start MindsDB server
make unit_tests Run unit tests
make integration_tests Run integration tests
make format Format code with pre-commit hooks
make check Run code quality checks
make build_docker Build Docker image
make run_docker Run MindsDB in Docker

Elasticsearch Installation & Configuration

Option 1: Docker Installation (Recommended for Development)

# Create docker-compose.yml for Elasticsearch
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: elasticsearch
    environment:
      - node.name=elasticsearch
      - cluster.name=docker-cluster
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      # Disable security for development (NOT for production)
      - xpack.security.enabled=false
      - xpack.security.enrollment.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elastic

volumes:
  elasticsearch_data:
    driver: local

networks:
  elastic:
    driver: bridge
EOF

# Start Elasticsearch
docker-compose up -d elasticsearch

# Verify Elasticsearch is running
curl -X GET "localhost:9200/"

Option 2: Native Installation

macOS (using Homebrew)

# Install Elasticsearch
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full

# Start Elasticsearch
brew services start elastic/tap/elasticsearch-full

# Configure for development (disable security)
echo "xpack.security.enabled: false" >> /opt/homebrew/etc/elasticsearch/elasticsearch.yml
brew services restart elastic/tap/elasticsearch-full

Ubuntu/Debian

# Install Elasticsearch APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install Elasticsearch
sudo apt update
sudo apt install elasticsearch

# Configure for development
sudo bash -c 'cat >> /etc/elasticsearch/elasticsearch.yml << EOF
network.host: localhost
discovery.type: single-node
xpack.security.enabled: false
EOF'

# Start Elasticsearch
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

Step 3: Verify Elasticsearch Installation

# Check Elasticsearch health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Expected response:
# {
#   "cluster_name" : "docker-cluster",
#   "status" : "green" or "yellow",
#   "number_of_nodes" : 1
# }

# Check Elasticsearch version
curl -X GET "localhost:9200/"

MindsDB Development Setup

Step 1: Install Elasticsearch Handler

# Install Elasticsearch handler specifically
make install_handler HANDLER_NAME=elasticsearch

# Verify handler dependencies
pip list | grep elasticsearch
# Should show:
# elasticsearch>=7.13.4,<9.0.0
# elasticsearch-dbapi>=0.2.9

Step 2: Start MindsDB Server

# Start MindsDB with HTTP API
make run_mindsdb

# Alternative: Start with specific options
python -m mindsdb --api http --port 47334

# Alternative: Start without web interface
python -m mindsdb --api http --no_studio

Step 3: Verify MindsDB is Running

# Check MindsDB health
curl -X GET "localhost:47334/api/status"

# Expected response:
# {"status": "ok"}

# View MindsDB web interface (if studio enabled)
open http://localhost:47334

Step 4: Configure Development Environment

# Set up pre-commit hooks for code quality
make precommit

# Run code formatting
make format

# Run quality checks
make check

Creating Elasticsearch Test Data

Step 1: Load Comprehensive Test Dataset

The MindsDB Elasticsearch handler includes a comprehensive test data loading script:

# Make the script executable
chmod +x mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.sh

# Load test data (Elasticsearch must be running)
./mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.sh

This script creates three test indices:

1. Products Index (Array Field Testing)

  • Purpose: Tests Search API fallback mechanism
  • Features: Array fields (tags, features), nested objects (reviews)
  • Records: 2 sample products with complex structures

2. Users Index (SQL API Testing)

  • Purpose: Tests optimal SQL API performance
  • Features: Simple structure, no arrays
  • Records: 3 sample users with basic information

3. Orders Index (Performance Testing)

  • Purpose: Tests bulk data handling and pagination
  • Features: Bulk data generation, performance metrics
  • Records: 10+ sample orders with random data

Step 2: Verify Test Data

# List all indices
curl -X GET "localhost:9200/_cat/indices?v"

# Check products index (should have arrays)
curl -X GET "localhost:9200/products/_search?pretty&size=1"

# Check users index (simple structure)
curl -X GET "localhost:9200/users/_search?pretty&size=1"

# Check orders index (bulk data)
curl -X GET "localhost:9200/orders/_count"

Step 3: Manual Test Data Creation (Optional)

If you prefer to create custom test data:

# Create custom index
curl -X PUT "localhost:9200/my_test_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "title": {"type": "text"},
      "category": {"type": "keyword"},
      "price": {"type": "float"},
      "tags": {"type": "keyword"},
      "created_at": {"type": "date"}
    }
  }
}'

# Add sample document
curl -X POST "localhost:9200/my_test_index/_doc" -H 'Content-Type: application/json' -d'
{
  "title": "Sample Product",
  "category": "Electronics",
  "price": 29.99,
  "tags": ["sample", "test", "electronics"],
  "created_at": "2024-01-01T00:00:00Z"
}'

# Refresh index
curl -X POST "localhost:9200/my_test_index/_refresh"

MindsDB Elasticsearch Datasource Setup

Step 1: Create Elasticsearch Database Connection

Using MindsDB SQL interface:

-- Connect to MindsDB
-- Method 1: Using MindsDB web interface (http://localhost:47334)
-- Method 2: Using curl commands
-- Method 3: Using Python client

-- Create Elasticsearch datasource
CREATE DATABASE my_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
    "hosts": "localhost:9200"
};

Step 2: Test Connection Using curl

# Create datasource via API
curl -X POST "localhost:47334/api/databases" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_elasticsearch",
    "engine": "elasticsearch",
    "parameters": {
      "hosts": "localhost:9200"
    }
  }'

# Verify connection
curl -X GET "localhost:47334/api/databases"

Step 3: Advanced Connection Configuration

For production or secure environments:

-- Elasticsearch with authentication
CREATE DATABASE secure_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
    "hosts": "your-cluster.elasticsearch.com:9200",
    "user": "elastic",
    "password": "your-password",
    "verify_certs": true
};

-- Elasticsearch Cloud
CREATE DATABASE elastic_cloud
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
    "cloud_id": "your-cloud-deployment-id",
    "user": "elastic",
    "password": "your-cloud-password"
};

-- Elasticsearch with SSL
CREATE DATABASE ssl_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
    "hosts": "https://localhost:9200",
    "ca_certs": "/path/to/ca.pem",
    "verify_certs": true,
    "timeout": 30
};

Step 4: Verify Datasource Setup

-- List available databases
SHOW DATABASES;

-- Test connection
SELECT 1;

-- List tables from Elasticsearch
SHOW TABLES FROM my_elasticsearch;

Comprehensive Test Scenarios

Category 1: Basic SQL Operations

Test 1.1: Simple SELECT Queries

-- Test basic data retrieval
SELECT * FROM my_elasticsearch.users LIMIT 5;

-- Test specific field selection
SELECT username, email, age FROM my_elasticsearch.users;

-- Test counting records
SELECT COUNT(*) as total_users FROM my_elasticsearch.users;

Test 1.2: WHERE Clauses and Filtering

-- Test equality filtering
SELECT * FROM my_elasticsearch.users WHERE city = 'New York';

-- Test range queries
SELECT * FROM my_elasticsearch.users WHERE age BETWEEN 25 AND 35;

-- Test boolean conditions
SELECT * FROM my_elasticsearch.users WHERE active = true;

-- Test multiple conditions
SELECT username, city FROM my_elasticsearch.users
WHERE age > 30 AND active = true;

Test 1.3: ORDER BY and LIMIT

-- Test sorting
SELECT username, age FROM my_elasticsearch.users ORDER BY age DESC;

-- Test pagination
SELECT * FROM my_elasticsearch.orders ORDER BY total DESC LIMIT 5;

-- Test offset
SELECT * FROM my_elasticsearch.orders ORDER BY order_date LIMIT 3 OFFSET 2;

Category 2: Array Field Testing (Fallback Mechanism)

Test 2.1: Array Field Queries

-- Test array field retrieval (triggers Search API fallback)
SELECT name, tags, features FROM my_elasticsearch.products;

-- Test with array field filtering
SELECT name, price FROM my_elasticsearch.products
WHERE name = 'Wireless Headphones';

Test 2.2: Nested Object Handling

-- Test nested object retrieval
SELECT name, reviews FROM my_elasticsearch.products;

-- Test complex document structure
SELECT * FROM my_elasticsearch.products LIMIT 1;

Category 3: Aggregation Testing

Test 3.1: Basic Aggregations

-- Test COUNT aggregation
SELECT category, COUNT(*) as product_count
FROM my_elasticsearch.products GROUP BY category;

-- Test numerical aggregations
SELECT AVG(age) as avg_age, MAX(age) as max_age, MIN(age) as min_age
FROM my_elasticsearch.users;

-- Test SUM aggregation
SELECT SUM(total) as total_revenue FROM my_elasticsearch.orders;

Test 3.2: Advanced Aggregations

-- Test HAVING clause
SELECT status, COUNT(*) as order_count, AVG(total) as avg_total
FROM my_elasticsearch.orders
GROUP BY status
HAVING COUNT(*) > 3;

-- Test multiple aggregations
SELECT customer, COUNT(*) as orders, SUM(total) as total_spent
FROM my_elasticsearch.orders
GROUP BY customer
ORDER BY total_spent DESC;

Category 4: Performance Testing

Test 4.1: Large Dataset Queries

# First, generate more test data
for i in {1..100}; do
  curl -X POST "localhost:9200/orders/_doc" -H 'Content-Type: application/json' -d"
{
  \"order_id\": \"BULK-${i}\",
  \"customer\": \"bulk_customer_${i}\",
  \"total\": $((RANDOM % 1000 + 10)).99,
  \"status\": \"completed\",
  \"order_date\": \"2024-$((RANDOM % 12 + 1))-$((RANDOM % 28 + 1))\",
  \"items\": $((RANDOM % 10 + 1))
}"
done

curl -X POST "localhost:9200/_refresh"
-- Test large result set handling
SELECT * FROM my_elasticsearch.orders;

-- Test pagination performance
SELECT * FROM my_elasticsearch.orders ORDER BY total DESC LIMIT 50;

-- Test aggregation performance on large dataset
SELECT status, COUNT(*) as count, AVG(total) as avg_total
FROM my_elasticsearch.orders
GROUP BY status;

Test 4.2: Complex Query Performance

-- Test complex WHERE clauses
SELECT customer, order_id, total
FROM my_elasticsearch.orders
WHERE total > 100 AND status = 'completed'
ORDER BY total DESC LIMIT 20;

-- Test date range queries
SELECT COUNT(*) FROM my_elasticsearch.orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-12-31';

Category 5: Schema Discovery Testing

Test 5.1: Database Schema Exploration

-- List all tables
SHOW TABLES FROM my_elasticsearch;

-- Get column information
DESCRIBE my_elasticsearch.products;
DESCRIBE my_elasticsearch.users;
DESCRIBE my_elasticsearch.orders;

Test 5.2: Information Schema Queries

-- Alternative schema discovery
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'my_elasticsearch';

-- Column details
SELECT column_name, data_type FROM information_schema.columns
WHERE table_name = 'products' AND table_schema = 'my_elasticsearch';

Category 6: Error Handling Testing

Test 6.1: Invalid Query Testing

-- Test non-existent table
SELECT * FROM my_elasticsearch.nonexistent_table;

-- Test invalid column
SELECT invalid_column FROM my_elasticsearch.users;

-- Test unsupported operations
SELECT * FROM my_elasticsearch.users u
JOIN my_elasticsearch.orders o ON u.username = o.customer;

Test 6.2: Connection Error Testing

# Stop Elasticsearch temporarily
docker-compose stop elasticsearch

# Try to query (should fail gracefully)
curl -X POST "localhost:47334/api/sql/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM my_elasticsearch.users LIMIT 1"}'

# Restart Elasticsearch
docker-compose start elasticsearch

Category 7: Real-World Scenarios

Test 7.1: E-commerce Analytics

-- Product catalog analysis
SELECT category, COUNT(*) as product_count, AVG(price) as avg_price
FROM my_elasticsearch.products
GROUP BY category;

-- Customer analysis
SELECT city, COUNT(*) as users,
       SUM(CASE WHEN active = true THEN 1 ELSE 0 END) as active_users
FROM my_elasticsearch.users
GROUP BY city;

Test 7.2: Business Intelligence Queries

-- Sales performance
SELECT
    DATE_FORMAT(order_date, '%Y-%m') as month,
    COUNT(*) as orders,
    SUM(total) as revenue,
    AVG(total) as avg_order_value
FROM my_elasticsearch.orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;

-- Customer segmentation
SELECT
    CASE
        WHEN total > 500 THEN 'High Value'
        WHEN total > 200 THEN 'Medium Value'
        ELSE 'Low Value'
    END as segment,
    COUNT(*) as customers
FROM my_elasticsearch.orders
GROUP BY segment;

Performance Testing & Benchmarks

Automated Performance Testing

# Create performance test script
cat > performance_test.sh << 'EOF'
#!/bin/bash

echo "πŸš€ MindsDB Elasticsearch Performance Test"
echo "========================================"

# Test 1: Simple query performance
echo "Test 1: Simple SELECT query"
time curl -s -X POST "localhost:47334/api/sql/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT COUNT(*) FROM my_elasticsearch.users"}' \
  > /dev/null

# Test 2: Array field query (fallback) performance
echo "Test 2: Array field query (Search API fallback)"
time curl -s -X POST "localhost:47334/api/sql/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT name, tags FROM my_elasticsearch.products LIMIT 10"}' \
  > /dev/null

# Test 3: Aggregation performance
echo "Test 3: Aggregation query"
time curl -s -X POST "localhost:47334/api/sql/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT status, COUNT(*), AVG(total) FROM my_elasticsearch.orders GROUP BY status"}' \
  > /dev/null

# Test 4: Large result set
echo "Test 4: Large result set (100 records)"
time curl -s -X POST "localhost:47334/api/sql/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM my_elasticsearch.orders LIMIT 100"}' \
  > /dev/null

echo "βœ… Performance tests completed!"
EOF

chmod +x performance_test.sh
./performance_test.sh

Memory Usage Monitoring

# Monitor MindsDB memory usage during testing
cat > memory_monitor.sh << 'EOF'
#!/bin/bash

echo "πŸ“Š Memory Usage Monitor"
echo "======================"

# Find MindsDB process
PID=$(pgrep -f "python.*mindsdb")

if [ -z "$PID" ]; then
    echo "❌ MindsDB process not found"
    exit 1
fi

echo "πŸ” Monitoring MindsDB process: $PID"

# Monitor memory for 60 seconds
for i in {1..12}; do
    MEMORY=$(ps -p $PID -o rss= | awk '{print $1/1024 " MB"}')
    echo "$(date): Memory usage: $MEMORY"
    sleep 5
done
EOF

chmod +x memory_monitor.sh
# Run in separate terminal while executing queries
./memory_monitor.sh &

Troubleshooting Guide

Common Issues and Solutions

Issue 1: MindsDB Installation Fails

Symptoms:

  • pip install -e . fails with dependency errors
  • Missing development tools errors

Solutions:

# Update pip and setuptools
pip install --upgrade pip setuptools wheel

# Install system dependencies (Ubuntu/Debian)
sudo apt install python3.11-dev build-essential

# Install system dependencies (macOS)
xcode-select --install

# Clean and reinstall
pip cache purge
make install_mindsdb

Issue 2: Elasticsearch Connection Failed

Symptoms:

  • ConnectionError: HTTPConnectionPool(host='localhost', port=9200)
  • MindsDB cannot connect to Elasticsearch

Diagnostic Steps:

# Check if Elasticsearch is running
curl -X GET "localhost:9200/"

# Check Docker containers
docker ps | grep elasticsearch

# Check system services
sudo systemctl status elasticsearch  # Linux
brew services list | grep elasticsearch  # macOS

Solutions:

# Restart Elasticsearch Docker
docker-compose restart elasticsearch

# Start Elasticsearch service
sudo systemctl start elasticsearch  # Linux
brew services start elastic/tap/elasticsearch-full  # macOS

# Check Elasticsearch logs
docker-compose logs elasticsearch
sudo journalctl -u elasticsearch  # Linux

Issue 3: Array Fields Not Working

Symptoms:

  • "Arrays are not supported" errors
  • Search API fallback not triggering

Verification:

# Check if test data has arrays
curl -X GET "localhost:9200/products/_search?pretty&size=1"

# Verify handler is detecting arrays
# Check MindsDB logs for "using Search API fallback"

Solutions:

# Clear handler cache (restart MindsDB)
pkill -f "python.*mindsdb"
make run_mindsdb

# Recreate test data
./mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.sh

Issue 4: Performance Issues

Symptoms:

  • Queries taking >10 seconds
  • High memory usage
  • MindsDB becoming unresponsive

Diagnostic Tools:

# Monitor system resources
top -p $(pgrep -f "python.*mindsdb")

# Check Elasticsearch cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Monitor Elasticsearch performance
curl -X GET "localhost:9200/_nodes/stats?pretty"

Optimization Steps:

# Increase JVM heap for Elasticsearch
echo "ES_JAVA_OPTS=-Xms2g -Xmx2g" >> docker-compose.yml

# Restart with more memory
docker-compose down
docker-compose up -d

# Use pagination for large queries
# SELECT * FROM table LIMIT 1000;

Issue 5: Handler Import Errors

Symptoms:

  • ImportError: No module named 'elasticsearch'
  • Handler not found errors

Solutions:

# Install handler dependencies explicitly
pip install elasticsearch>=7.13.4 elasticsearch-dbapi>=0.2.9

# Reinstall handler
make install_handler HANDLER_NAME=elasticsearch

# Verify installation
python -c "from elasticsearch import Elasticsearch; print('βœ… OK')"

Debug Mode Setup

# Enable debug logging
export MINDSDB_LOG_LEVEL=DEBUG

# Start MindsDB with verbose logging
python -m mindsdb --api http --verbose

# Check logs
tail -f ~/.mindsdb/mindsdb.log

Getting Help

# Check MindsDB version
python -c "import mindsdb; print(mindsdb.__version__)"

# Run MindsDB diagnostics
python -m mindsdb --help

# Check handler status
curl -X GET "localhost:47334/api/handlers"

Production Deployment Tips

Security Configuration

-- Production Elasticsearch connection
CREATE DATABASE prod_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
    "hosts": "https://your-cluster.com:9200",
    "user": "readonly_user",
    "password": "strong_password",
    "verify_certs": true,
    "ca_certs": "/path/to/ca.crt",
    "timeout": 60
};

Performance Optimization

# Production MindsDB startup
python -m mindsdb \
  --api http \
  --port 47334 \
  --host 0.0.0.0 \
  --no_studio \
  --verbose

# Production Elasticsearch configuration
# - Increase heap size: -Xms4g -Xmx4g
# - Enable monitoring
# - Configure proper security
# - Set up clustering for high availability

Monitoring and Maintenance

# Health check script
cat > health_check.sh << 'EOF'
#!/bin/bash

# Check MindsDB
curl -f http://localhost:47334/api/status || exit 1

# Check Elasticsearch
curl -f http://localhost:9200/_cluster/health || exit 1

# Test database connection
curl -X POST "localhost:47334/api/sql/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT 1"}' || exit 1

echo "βœ… All services healthy"
EOF

chmod +x health_check.sh

Backup and Recovery

# Backup MindsDB configuration
cp -r ~/.mindsdb/ ./mindsdb_backup/

# Backup Elasticsearch data
curl -X POST "localhost:9200/_snapshot/backup/snapshot_$(date +%Y%m%d)"

# Document your datasource configurations
echo "CREATE DATABASE my_elasticsearch WITH ENGINE = 'elasticsearch', PARAMETERS = {...};" > datasources_backup.sql

Summary

This comprehensive guide covered:

βœ… Complete MindsDB setup from source using Makefile βœ… Elasticsearch installation and configuration βœ… Handler installation and verification βœ… Comprehensive test data creation βœ… Extensive test scenarios covering all functionality βœ… Performance testing and benchmarking βœ… Troubleshooting guide for common issues βœ… Production deployment best practices

Quick Start Commands Summary

# 1. Setup MindsDB
git clone https://github.com/mindsdb/mindsdb.git && cd mindsdb
python3.11 -m venv venv && source venv/bin/activate
make install_mindsdb

# 2. Setup Elasticsearch
docker-compose up -d elasticsearch

# 3. Load test data
./mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.sh

# 4. Start MindsDB
make run_mindsdb

# 5. Create datasource
curl -X POST "localhost:47334/api/databases" -H "Content-Type: application/json" -d '{"name": "my_elasticsearch", "engine": "elasticsearch", "parameters": {"hosts": "localhost:9200"}}'

# 6. Test query
curl -X POST "localhost:47334/api/sql/query" -H "Content-Type: application/json" -d '{"query": "SELECT * FROM my_elasticsearch.users LIMIT 5"}'

You now have a complete development environment for working with MindsDB and Elasticsearch integration!

Additional Resources

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment