This comprehensive guide walks you through setting up MindsDB from source using Makefile commands, installing and configuring Elasticsearch, creating datasources, and running extensive test scenarios. Perfect for development, testing, and production environments.
- Prerequisites & System Requirements
- MindsDB Installation from Source
- Elasticsearch Installation & Configuration
- MindsDB Development Setup
- Elasticsearch Handler Installation
- Creating Elasticsearch Test Data
- MindsDB Elasticsearch Datasource Setup
- Comprehensive Test Scenarios
- Performance Testing & Benchmarks
- Troubleshooting Guide
- Production Deployment Tips
- OS: macOS, Linux, or Windows with WSL2
- Python: 3.8+ (3.11+ recommended)
- Memory: 8GB RAM minimum, 16GB recommended
- Storage: 10GB free space for development environment
- Java: 11+ for Elasticsearch
# macOS (using Homebrew)
brew install [email protected] java git curl wget
# Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-dev python3-pip openjdk-11-jdk git curl wget
# CentOS/RHEL
sudo yum install python3.11 python3.11-devel java-11-openjdk git curl wget# Install essential development tools
pip install --upgrade pip setuptools wheel
pip install virtualenv pre-commit# Clone the official MindsDB repository
git clone https://github.com/mindsdb/mindsdb.git
cd mindsdb
# Optional: Switch to development branch
# git checkout staging# Create isolated Python environment
python3.11 -m venv venv
# Activate virtual environment
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows
# Verify Python version
python --version # Should show Python 3.11+The MindsDB project includes a comprehensive Makefile with pre-configured commands:
# Install MindsDB in development mode with all dependencies
make install_mindsdbThis command performs the following:
pip install -e .- Installs MindsDB in editable/development modepip install -r requirements/requirements-dev.txt- Installs development dependenciespre-commit install- Sets up code formatting and linting hooks
# Check MindsDB installation
python -c "import mindsdb; print('MindsDB installed successfully!')"
# View available Makefile commands
make help # or just: make| Command | Description |
|---|---|
make install_mindsdb |
Complete MindsDB development installation |
make install_handler HANDLER_NAME=elasticsearch |
Install specific handler |
make run_mindsdb |
Start MindsDB server |
make unit_tests |
Run unit tests |
make integration_tests |
Run integration tests |
make format |
Format code with pre-commit hooks |
make check |
Run code quality checks |
make build_docker |
Build Docker image |
make run_docker |
Run MindsDB in Docker |
# Create docker-compose.yml for Elasticsearch
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
container_name: elasticsearch
environment:
- node.name=elasticsearch
- cluster.name=docker-cluster
- discovery.type=single-node
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
# Disable security for development (NOT for production)
- xpack.security.enabled=false
- xpack.security.enrollment.enabled=false
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- elasticsearch_data:/usr/share/elasticsearch/data
ports:
- "9200:9200"
- "9300:9300"
networks:
- elastic
volumes:
elasticsearch_data:
driver: local
networks:
elastic:
driver: bridge
EOF
# Start Elasticsearch
docker-compose up -d elasticsearch
# Verify Elasticsearch is running
curl -X GET "localhost:9200/"# Install Elasticsearch
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full
# Start Elasticsearch
brew services start elastic/tap/elasticsearch-full
# Configure for development (disable security)
echo "xpack.security.enabled: false" >> /opt/homebrew/etc/elasticsearch/elasticsearch.yml
brew services restart elastic/tap/elasticsearch-full# Install Elasticsearch APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Install Elasticsearch
sudo apt update
sudo apt install elasticsearch
# Configure for development
sudo bash -c 'cat >> /etc/elasticsearch/elasticsearch.yml << EOF
network.host: localhost
discovery.type: single-node
xpack.security.enabled: false
EOF'
# Start Elasticsearch
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch# Check Elasticsearch health
curl -X GET "localhost:9200/_cluster/health?pretty"
# Expected response:
# {
# "cluster_name" : "docker-cluster",
# "status" : "green" or "yellow",
# "number_of_nodes" : 1
# }
# Check Elasticsearch version
curl -X GET "localhost:9200/"# Install Elasticsearch handler specifically
make install_handler HANDLER_NAME=elasticsearch
# Verify handler dependencies
pip list | grep elasticsearch
# Should show:
# elasticsearch>=7.13.4,<9.0.0
# elasticsearch-dbapi>=0.2.9# Start MindsDB with HTTP API
make run_mindsdb
# Alternative: Start with specific options
python -m mindsdb --api http --port 47334
# Alternative: Start without web interface
python -m mindsdb --api http --no_studio# Check MindsDB health
curl -X GET "localhost:47334/api/status"
# Expected response:
# {"status": "ok"}
# View MindsDB web interface (if studio enabled)
open http://localhost:47334# Set up pre-commit hooks for code quality
make precommit
# Run code formatting
make format
# Run quality checks
make checkThe MindsDB Elasticsearch handler includes a comprehensive test data loading script:
# Make the script executable
chmod +x mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.sh
# Load test data (Elasticsearch must be running)
./mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.shThis script creates three test indices:
- Purpose: Tests Search API fallback mechanism
- Features: Array fields (tags, features), nested objects (reviews)
- Records: 2 sample products with complex structures
- Purpose: Tests optimal SQL API performance
- Features: Simple structure, no arrays
- Records: 3 sample users with basic information
- Purpose: Tests bulk data handling and pagination
- Features: Bulk data generation, performance metrics
- Records: 10+ sample orders with random data
# List all indices
curl -X GET "localhost:9200/_cat/indices?v"
# Check products index (should have arrays)
curl -X GET "localhost:9200/products/_search?pretty&size=1"
# Check users index (simple structure)
curl -X GET "localhost:9200/users/_search?pretty&size=1"
# Check orders index (bulk data)
curl -X GET "localhost:9200/orders/_count"If you prefer to create custom test data:
# Create custom index
curl -X PUT "localhost:9200/my_test_index" -H 'Content-Type: application/json' -d'
{
"mappings": {
"properties": {
"title": {"type": "text"},
"category": {"type": "keyword"},
"price": {"type": "float"},
"tags": {"type": "keyword"},
"created_at": {"type": "date"}
}
}
}'
# Add sample document
curl -X POST "localhost:9200/my_test_index/_doc" -H 'Content-Type: application/json' -d'
{
"title": "Sample Product",
"category": "Electronics",
"price": 29.99,
"tags": ["sample", "test", "electronics"],
"created_at": "2024-01-01T00:00:00Z"
}'
# Refresh index
curl -X POST "localhost:9200/my_test_index/_refresh"Using MindsDB SQL interface:
-- Connect to MindsDB
-- Method 1: Using MindsDB web interface (http://localhost:47334)
-- Method 2: Using curl commands
-- Method 3: Using Python client
-- Create Elasticsearch datasource
CREATE DATABASE my_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
"hosts": "localhost:9200"
};# Create datasource via API
curl -X POST "localhost:47334/api/databases" \
-H "Content-Type: application/json" \
-d '{
"name": "my_elasticsearch",
"engine": "elasticsearch",
"parameters": {
"hosts": "localhost:9200"
}
}'
# Verify connection
curl -X GET "localhost:47334/api/databases"For production or secure environments:
-- Elasticsearch with authentication
CREATE DATABASE secure_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
"hosts": "your-cluster.elasticsearch.com:9200",
"user": "elastic",
"password": "your-password",
"verify_certs": true
};
-- Elasticsearch Cloud
CREATE DATABASE elastic_cloud
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
"cloud_id": "your-cloud-deployment-id",
"user": "elastic",
"password": "your-cloud-password"
};
-- Elasticsearch with SSL
CREATE DATABASE ssl_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
"hosts": "https://localhost:9200",
"ca_certs": "/path/to/ca.pem",
"verify_certs": true,
"timeout": 30
};-- List available databases
SHOW DATABASES;
-- Test connection
SELECT 1;
-- List tables from Elasticsearch
SHOW TABLES FROM my_elasticsearch;-- Test basic data retrieval
SELECT * FROM my_elasticsearch.users LIMIT 5;
-- Test specific field selection
SELECT username, email, age FROM my_elasticsearch.users;
-- Test counting records
SELECT COUNT(*) as total_users FROM my_elasticsearch.users;-- Test equality filtering
SELECT * FROM my_elasticsearch.users WHERE city = 'New York';
-- Test range queries
SELECT * FROM my_elasticsearch.users WHERE age BETWEEN 25 AND 35;
-- Test boolean conditions
SELECT * FROM my_elasticsearch.users WHERE active = true;
-- Test multiple conditions
SELECT username, city FROM my_elasticsearch.users
WHERE age > 30 AND active = true;-- Test sorting
SELECT username, age FROM my_elasticsearch.users ORDER BY age DESC;
-- Test pagination
SELECT * FROM my_elasticsearch.orders ORDER BY total DESC LIMIT 5;
-- Test offset
SELECT * FROM my_elasticsearch.orders ORDER BY order_date LIMIT 3 OFFSET 2;-- Test array field retrieval (triggers Search API fallback)
SELECT name, tags, features FROM my_elasticsearch.products;
-- Test with array field filtering
SELECT name, price FROM my_elasticsearch.products
WHERE name = 'Wireless Headphones';-- Test nested object retrieval
SELECT name, reviews FROM my_elasticsearch.products;
-- Test complex document structure
SELECT * FROM my_elasticsearch.products LIMIT 1;-- Test COUNT aggregation
SELECT category, COUNT(*) as product_count
FROM my_elasticsearch.products GROUP BY category;
-- Test numerical aggregations
SELECT AVG(age) as avg_age, MAX(age) as max_age, MIN(age) as min_age
FROM my_elasticsearch.users;
-- Test SUM aggregation
SELECT SUM(total) as total_revenue FROM my_elasticsearch.orders;-- Test HAVING clause
SELECT status, COUNT(*) as order_count, AVG(total) as avg_total
FROM my_elasticsearch.orders
GROUP BY status
HAVING COUNT(*) > 3;
-- Test multiple aggregations
SELECT customer, COUNT(*) as orders, SUM(total) as total_spent
FROM my_elasticsearch.orders
GROUP BY customer
ORDER BY total_spent DESC;# First, generate more test data
for i in {1..100}; do
curl -X POST "localhost:9200/orders/_doc" -H 'Content-Type: application/json' -d"
{
\"order_id\": \"BULK-${i}\",
\"customer\": \"bulk_customer_${i}\",
\"total\": $((RANDOM % 1000 + 10)).99,
\"status\": \"completed\",
\"order_date\": \"2024-$((RANDOM % 12 + 1))-$((RANDOM % 28 + 1))\",
\"items\": $((RANDOM % 10 + 1))
}"
done
curl -X POST "localhost:9200/_refresh"-- Test large result set handling
SELECT * FROM my_elasticsearch.orders;
-- Test pagination performance
SELECT * FROM my_elasticsearch.orders ORDER BY total DESC LIMIT 50;
-- Test aggregation performance on large dataset
SELECT status, COUNT(*) as count, AVG(total) as avg_total
FROM my_elasticsearch.orders
GROUP BY status;-- Test complex WHERE clauses
SELECT customer, order_id, total
FROM my_elasticsearch.orders
WHERE total > 100 AND status = 'completed'
ORDER BY total DESC LIMIT 20;
-- Test date range queries
SELECT COUNT(*) FROM my_elasticsearch.orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-12-31';-- List all tables
SHOW TABLES FROM my_elasticsearch;
-- Get column information
DESCRIBE my_elasticsearch.products;
DESCRIBE my_elasticsearch.users;
DESCRIBE my_elasticsearch.orders;-- Alternative schema discovery
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'my_elasticsearch';
-- Column details
SELECT column_name, data_type FROM information_schema.columns
WHERE table_name = 'products' AND table_schema = 'my_elasticsearch';-- Test non-existent table
SELECT * FROM my_elasticsearch.nonexistent_table;
-- Test invalid column
SELECT invalid_column FROM my_elasticsearch.users;
-- Test unsupported operations
SELECT * FROM my_elasticsearch.users u
JOIN my_elasticsearch.orders o ON u.username = o.customer;# Stop Elasticsearch temporarily
docker-compose stop elasticsearch
# Try to query (should fail gracefully)
curl -X POST "localhost:47334/api/sql/query" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM my_elasticsearch.users LIMIT 1"}'
# Restart Elasticsearch
docker-compose start elasticsearch-- Product catalog analysis
SELECT category, COUNT(*) as product_count, AVG(price) as avg_price
FROM my_elasticsearch.products
GROUP BY category;
-- Customer analysis
SELECT city, COUNT(*) as users,
SUM(CASE WHEN active = true THEN 1 ELSE 0 END) as active_users
FROM my_elasticsearch.users
GROUP BY city;-- Sales performance
SELECT
DATE_FORMAT(order_date, '%Y-%m') as month,
COUNT(*) as orders,
SUM(total) as revenue,
AVG(total) as avg_order_value
FROM my_elasticsearch.orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;
-- Customer segmentation
SELECT
CASE
WHEN total > 500 THEN 'High Value'
WHEN total > 200 THEN 'Medium Value'
ELSE 'Low Value'
END as segment,
COUNT(*) as customers
FROM my_elasticsearch.orders
GROUP BY segment;# Create performance test script
cat > performance_test.sh << 'EOF'
#!/bin/bash
echo "π MindsDB Elasticsearch Performance Test"
echo "========================================"
# Test 1: Simple query performance
echo "Test 1: Simple SELECT query"
time curl -s -X POST "localhost:47334/api/sql/query" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT COUNT(*) FROM my_elasticsearch.users"}' \
> /dev/null
# Test 2: Array field query (fallback) performance
echo "Test 2: Array field query (Search API fallback)"
time curl -s -X POST "localhost:47334/api/sql/query" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT name, tags FROM my_elasticsearch.products LIMIT 10"}' \
> /dev/null
# Test 3: Aggregation performance
echo "Test 3: Aggregation query"
time curl -s -X POST "localhost:47334/api/sql/query" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT status, COUNT(*), AVG(total) FROM my_elasticsearch.orders GROUP BY status"}' \
> /dev/null
# Test 4: Large result set
echo "Test 4: Large result set (100 records)"
time curl -s -X POST "localhost:47334/api/sql/query" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM my_elasticsearch.orders LIMIT 100"}' \
> /dev/null
echo "β
Performance tests completed!"
EOF
chmod +x performance_test.sh
./performance_test.sh# Monitor MindsDB memory usage during testing
cat > memory_monitor.sh << 'EOF'
#!/bin/bash
echo "π Memory Usage Monitor"
echo "======================"
# Find MindsDB process
PID=$(pgrep -f "python.*mindsdb")
if [ -z "$PID" ]; then
echo "β MindsDB process not found"
exit 1
fi
echo "π Monitoring MindsDB process: $PID"
# Monitor memory for 60 seconds
for i in {1..12}; do
MEMORY=$(ps -p $PID -o rss= | awk '{print $1/1024 " MB"}')
echo "$(date): Memory usage: $MEMORY"
sleep 5
done
EOF
chmod +x memory_monitor.sh
# Run in separate terminal while executing queries
./memory_monitor.sh &Symptoms:
pip install -e .fails with dependency errors- Missing development tools errors
Solutions:
# Update pip and setuptools
pip install --upgrade pip setuptools wheel
# Install system dependencies (Ubuntu/Debian)
sudo apt install python3.11-dev build-essential
# Install system dependencies (macOS)
xcode-select --install
# Clean and reinstall
pip cache purge
make install_mindsdbSymptoms:
ConnectionError: HTTPConnectionPool(host='localhost', port=9200)- MindsDB cannot connect to Elasticsearch
Diagnostic Steps:
# Check if Elasticsearch is running
curl -X GET "localhost:9200/"
# Check Docker containers
docker ps | grep elasticsearch
# Check system services
sudo systemctl status elasticsearch # Linux
brew services list | grep elasticsearch # macOSSolutions:
# Restart Elasticsearch Docker
docker-compose restart elasticsearch
# Start Elasticsearch service
sudo systemctl start elasticsearch # Linux
brew services start elastic/tap/elasticsearch-full # macOS
# Check Elasticsearch logs
docker-compose logs elasticsearch
sudo journalctl -u elasticsearch # LinuxSymptoms:
- "Arrays are not supported" errors
- Search API fallback not triggering
Verification:
# Check if test data has arrays
curl -X GET "localhost:9200/products/_search?pretty&size=1"
# Verify handler is detecting arrays
# Check MindsDB logs for "using Search API fallback"Solutions:
# Clear handler cache (restart MindsDB)
pkill -f "python.*mindsdb"
make run_mindsdb
# Recreate test data
./mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.shSymptoms:
- Queries taking >10 seconds
- High memory usage
- MindsDB becoming unresponsive
Diagnostic Tools:
# Monitor system resources
top -p $(pgrep -f "python.*mindsdb")
# Check Elasticsearch cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
# Monitor Elasticsearch performance
curl -X GET "localhost:9200/_nodes/stats?pretty"Optimization Steps:
# Increase JVM heap for Elasticsearch
echo "ES_JAVA_OPTS=-Xms2g -Xmx2g" >> docker-compose.yml
# Restart with more memory
docker-compose down
docker-compose up -d
# Use pagination for large queries
# SELECT * FROM table LIMIT 1000;Symptoms:
ImportError: No module named 'elasticsearch'- Handler not found errors
Solutions:
# Install handler dependencies explicitly
pip install elasticsearch>=7.13.4 elasticsearch-dbapi>=0.2.9
# Reinstall handler
make install_handler HANDLER_NAME=elasticsearch
# Verify installation
python -c "from elasticsearch import Elasticsearch; print('β
OK')"# Enable debug logging
export MINDSDB_LOG_LEVEL=DEBUG
# Start MindsDB with verbose logging
python -m mindsdb --api http --verbose
# Check logs
tail -f ~/.mindsdb/mindsdb.log# Check MindsDB version
python -c "import mindsdb; print(mindsdb.__version__)"
# Run MindsDB diagnostics
python -m mindsdb --help
# Check handler status
curl -X GET "localhost:47334/api/handlers"-- Production Elasticsearch connection
CREATE DATABASE prod_elasticsearch
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
"hosts": "https://your-cluster.com:9200",
"user": "readonly_user",
"password": "strong_password",
"verify_certs": true,
"ca_certs": "/path/to/ca.crt",
"timeout": 60
};# Production MindsDB startup
python -m mindsdb \
--api http \
--port 47334 \
--host 0.0.0.0 \
--no_studio \
--verbose
# Production Elasticsearch configuration
# - Increase heap size: -Xms4g -Xmx4g
# - Enable monitoring
# - Configure proper security
# - Set up clustering for high availability# Health check script
cat > health_check.sh << 'EOF'
#!/bin/bash
# Check MindsDB
curl -f http://localhost:47334/api/status || exit 1
# Check Elasticsearch
curl -f http://localhost:9200/_cluster/health || exit 1
# Test database connection
curl -X POST "localhost:47334/api/sql/query" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT 1"}' || exit 1
echo "β
All services healthy"
EOF
chmod +x health_check.sh# Backup MindsDB configuration
cp -r ~/.mindsdb/ ./mindsdb_backup/
# Backup Elasticsearch data
curl -X POST "localhost:9200/_snapshot/backup/snapshot_$(date +%Y%m%d)"
# Document your datasource configurations
echo "CREATE DATABASE my_elasticsearch WITH ENGINE = 'elasticsearch', PARAMETERS = {...};" > datasources_backup.sqlThis comprehensive guide covered:
β Complete MindsDB setup from source using Makefile β Elasticsearch installation and configuration β Handler installation and verification β Comprehensive test data creation β Extensive test scenarios covering all functionality β Performance testing and benchmarking β Troubleshooting guide for common issues β Production deployment best practices
# 1. Setup MindsDB
git clone https://github.com/mindsdb/mindsdb.git && cd mindsdb
python3.11 -m venv venv && source venv/bin/activate
make install_mindsdb
# 2. Setup Elasticsearch
docker-compose up -d elasticsearch
# 3. Load test data
./mindsdb/integrations/handlers/elasticsearch_handler/elasticsearch-test-data.sh
# 4. Start MindsDB
make run_mindsdb
# 5. Create datasource
curl -X POST "localhost:47334/api/databases" -H "Content-Type: application/json" -d '{"name": "my_elasticsearch", "engine": "elasticsearch", "parameters": {"hosts": "localhost:9200"}}'
# 6. Test query
curl -X POST "localhost:47334/api/sql/query" -H "Content-Type: application/json" -d '{"query": "SELECT * FROM my_elasticsearch.users LIMIT 5"}'You now have a complete development environment for working with MindsDB and Elasticsearch integration!
- MindsDB Documentation: https://docs.mindsdb.com/
- Elasticsearch Documentation: https://www.elastic.co/guide/
- MindsDB GitHub: https://github.com/mindsdb/mindsdb
- Community Support: https://community.mindsdb.com/
- Handler Source Code:
mindsdb/integrations/handlers/elasticsearch_handler/