sequenceDiagram
participant LLM as LLM Agent
participant MCP as MCP Tool
participant Auth as Session Token
participant Rewrite as Query Rewriter
participant DB as MongoDB
LLM->>MCP: Query with Bearer token
MCP->>Auth: Decode token
Auth-->>MCP: user_id
MCP->>MCP: Load user → company
MCP->>Rewrite: Rewrite query with company_id
Rewrite->>Rewrite: Validate + inject filter
Rewrite-->>MCP: Scoped query
MCP->>DB: Execute query
DB-->>MCP: Results
MCP-->>LLM: Scoped results
This approach balances flexibility with security:
- Admin-only access - Only company admins can use these MCP tools
- Allowlist collections - Only permit queries against pre-defined collections
- Auto-inject scoping filter - Based on collection type:
  - Company-scoped: inject `company_id = user.company_id`
  - User-scoped: inject `user_id IN (all company user IDs)`
- Validate aggregation pipelines - Ensure $match is first stage, inject scoping filter
- Block dangerous operators - Disallow $out, $merge, and unrestricted $lookup
- Read-only - Only support find, aggregate, count operations
flowchart TD
subgraph auth [Authorization]
A[Request with token] --> B[Decode token]
B --> C{Is user admin?}
C -->|No| D[Reject: 403]
C -->|Yes| E[Get company_id]
end
subgraph scope [Scoping]
E --> F{Collection type?}
F -->|company_id field| G["Inject: company_id = X"]
F -->|user_id field| H[Get all company user IDs]
H --> I["Inject: user_id IN [...]"]
end
G --> J[Execute query]
I --> J
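The authorization-then-scoping flow above can be sketched as a plain Ruby dispatch. This is a standalone illustration only: the method name, the hash-based user, and the pre-fetched `company_user_ids` argument are assumptions, not the real implementation.

```ruby
# Illustrative sketch of the flowchart: reject non-admins, then pick the
# scoping filter by collection type. Names and shapes are hypothetical.
def scope_filter_for(user, scope_type, company_user_ids)
  # Authorization: only company admins may query at all
  raise "403: admin access required" unless user[:admin]

  case scope_type
  when :company
    # Company-scoped: filter by the caller's company_id
    { "company_id" => user[:company_id] }
  when :user
    # User-scoped: restrict to the company's own user IDs
    { "user_id" => { "$in" => company_user_ids } }
  else
    raise ArgumentError, "Unknown scope type: #{scope_type}"
  end
end
```

A non-admin caller fails before any filter is built, matching the `Reject: 403` branch in the flowchart.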
Company-scoped collections (most collections):
- Have a `company_id` field
- Scoper injects: `{ company_id: company.id }`
- Any user-provided `company_id` is overwritten (prevents injection)
User-scoped collections (e.g., app_notifications, llm_runs):
- Have a `user_id` field but no `company_id`
- Caller must specify `user_id` or `{ user_id: { $in: [...] } }`
- Scoper validates that ALL specified user_ids belong to the company
- Rejects entire query if any user_id is outside the company
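These two rules can be sketched standalone. Plain strings stand in for BSON ObjectIds, and the constants and method names are illustrative, not the real classes described below.

```ruby
# Minimal sketch of the two scoping rules (hypothetical names).
COMPANY_ID = "company_1"
COMPANY_USER_IDS = ["user_1", "user_2"].freeze

# Company-scoped: overwrite any caller-supplied company_id so it cannot be spoofed.
def scope_company_filter(filter)
  filter.merge("company_id" => COMPANY_ID)
end

# User-scoped: require user_id(s) and validate every one against the company.
def scope_user_filter(filter)
  raw = filter["user_id"]
  ids = raw.is_a?(Hash) ? Array(raw["$in"]) : Array(raw)
  raise ArgumentError, "user_id is required" if ids.empty?
  raise ArgumentError, "user_id outside company" unless (ids - COMPANY_USER_IDS).empty?
  filter
end
```

Note the asymmetry: company scoping silently overwrites, while user scoping fails closed on any foreign ID.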
Location: lib/bizy/mongodb_query_executor.rb
This is the ONLY interface to MongoDB for MCP tools. It uses the read-only client and exposes only read methods:
module Bizy
class MongodbQueryExecutor
MAX_LIMIT = 100
DEFAULT_LIMIT = 20
QUERY_TIMEOUT_MS = 30_000
def initialize
@client = Mongoid.client(:mcp_readonly)
end
# The ONLY three public methods - no write operations exist
def find(collection:, filter:, projection: nil, sort: nil, limit: DEFAULT_LIMIT)
limit = limit.to_i.clamp(1, MAX_LIMIT) # clamp low end too: Mongo treats limit(0) as "no limit"
cursor = @client[collection]
.find(filter)
.projection(projection || {})
.sort(sort || { _id: -1 })
.limit(limit)
.max_time_ms(QUERY_TIMEOUT_MS)
cursor.to_a
end
def aggregate(collection:, pipeline:)
@client[collection].aggregate(
pipeline,
max_time_ms: QUERY_TIMEOUT_MS
).to_a
end
def count(collection:, filter:)
@client[collection]
.count_documents(filter, max_time_ms: QUERY_TIMEOUT_MS)
end
# NO insert, update, delete, bulk_write methods exist
# The class literally cannot write - it only has read methods
end
end

Location: lib/bizy/mongodb_query_scoper.rb
This service handles query rewriting for both company-scoped and user-scoped collections:
module Bizy
class MongodbQueryScoper
def initialize(company:)
@company = company
@company_id = company.id
end
def scope_find(collection:, filter:)
validate_collection!(collection)
merge_scope_filter(collection, filter)
end
def scope_aggregate(collection:, pipeline:)
validate_collection!(collection)
MongodbPipelineValidator.validate!(pipeline)
inject_scope_into_pipeline(collection, pipeline)
end
private
def validate_collection!(collection)
unless MongodbCollectionRegistry.allowed?(collection)
raise Bizy::UnauthorizedCollectionError,
"Collection '#{collection}' is not available for querying"
end
end
def merge_scope_filter(collection, filter)
config = MongodbCollectionRegistry.config(collection)
case config[:scope_type]
when :company
# Company-scoped: inject company_id, ignore any user-provided company_id
filter.merge(config[:scope_field] => @company_id)
when :user
# User-scoped: validate and scope user_ids from filter
scope_user_filter(config[:scope_field], filter)
else
raise "Unknown scope type: #{config[:scope_type]}"
end
end
def scope_user_filter(scope_field, filter)
requested_user_ids = extract_user_ids(filter[scope_field])
if requested_user_ids.present?
# Validate ALL requested user_ids belong to this company
validate_user_ids_in_company!(requested_user_ids)
# Keep the filter as-is (already validated)
filter
else
# No user_ids specified - require them for user-scoped queries
raise Bizy::InvalidFilterError,
"user_id is required for this collection. Provide a specific user_id or array of user_ids."
end
end
def extract_user_ids(user_id_filter)
case user_id_filter
when BSON::ObjectId, String
[normalize_id(user_id_filter)]
when Hash
# Handle { "$in" => [...] } syntax
user_id_filter["$in"]&.map { |id| normalize_id(id) }
when Array
user_id_filter.map { |id| normalize_id(id) }
else
nil
end
end
def validate_user_ids_in_company!(user_ids)
# Check that ALL user_ids belong to this company
valid_count = @company.users.where(:_id.in => user_ids).count
unless valid_count == user_ids.length
raise Bizy::UnauthorizedUserError,
"One or more user_ids do not belong to this company"
end
end
def normalize_id(id)
id.is_a?(BSON::ObjectId) ? id : BSON::ObjectId(id.to_s)
end
def inject_scope_into_pipeline(collection, pipeline)
# For aggregation, extract user_ids from first $match stage and validate
config = MongodbCollectionRegistry.config(collection)
case config[:scope_type]
when :company
inject_company_filter(config[:scope_field], pipeline)
when :user
validate_user_filter_in_pipeline(config[:scope_field], pipeline)
end
pipeline
end
def inject_company_filter(scope_field, pipeline)
if pipeline.first&.key?("$match")
pipeline.first["$match"][scope_field] = @company_id
else
pipeline.unshift({ "$match" => { scope_field => @company_id } })
end
end
def validate_user_filter_in_pipeline(scope_field, pipeline)
first_match = pipeline.first&.dig("$match")
raise Bizy::InvalidFilterError, "Pipeline must start with $match containing user_id" unless first_match
user_ids = extract_user_ids(first_match[scope_field])
raise Bizy::InvalidFilterError, "user_id is required in first $match stage" if user_ids.blank?
validate_user_ids_in_company!(user_ids)
end
end
end

Key design decisions for user-scoped collections:
- Caller must specify user_ids - No automatic "all company users" query (too expensive, rarely what you want)
- Validation before execution - All specified user_ids are validated to belong to the company
- Fail closed - If any user_id is invalid/outside company, the entire query fails
- Flexible input - Accepts a single ID, an array, or `{ "$in": [...] }` syntax
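The flexible-input handling can be sketched in isolation. Plain strings stand in for BSON ObjectIds here; the real `extract_user_ids` shown above also normalizes `BSON::ObjectId` instances.

```ruby
# Standalone sketch of flexible user_id input handling:
# a single ID, an array, or a { "$in" => [...] } hash all normalize to an array.
def extract_user_ids(value)
  case value
  when String then [value]          # single ID
  when Hash   then value["$in"]     # { "$in" => [...] } syntax
  when Array  then value            # explicit array of IDs
  end                               # anything else => nil (rejected upstream)
end
```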
Create three tools following the existing Bizy::BaseTool pattern.
All tools require admin authorization:
# Shared authorization helper in BaseTool or module
def require_admin!
unless user.company_admin? || user.can_administer?(:global)
raise Bizy::NotAuthorizedError, "Admin access required"
end
end

a. Find Tool - lib/bizy/tools/mongodb_find.rb
module Bizy
module Tools
class MongodbFind < Bizy::BaseTool
tool_name "mongodb_find"
description "Query documents from a MongoDB collection. Admin access required. Results are automatically scoped to your company."
input_schema do
{
type: "object",
properties: {
collection: { type: "string", description: "Collection name (e.g., 'bonuses', 'users')" },
filter: { type: "object", description: "MongoDB query filter" },
projection: { type: "object", description: "Fields to include/exclude" },
sort: { type: "object", description: "Sort order" },
limit: { type: "integer", description: "Max docs to return (default: 20, max: 100)" }
},
required: ["collection"]
}
end
def call
require_admin! # Only admins can use this tool
scoper = MongodbQueryScoper.new(company: company)
scoped_filter = scoper.scope_find(
collection: arguments[:collection],
filter: arguments[:filter] || {}
)
executor = MongodbQueryExecutor.new
results = executor.find(
collection: arguments[:collection],
filter: scoped_filter,
projection: arguments[:projection],
sort: arguments[:sort],
limit: arguments[:limit] || 20
)
success(
collection: arguments[:collection],
count: results.length,
documents: results
)
rescue Bizy::UnauthorizedCollectionError => e
error(e.message)
end
end
end
end

b. Aggregate Tool - lib/bizy/tools/mongodb_aggregate.rb
c. Count Tool - lib/bizy/tools/mongodb_count.rb
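The aggregate and count tools mirror the find tool's call flow: scope first, then delegate to the read-only executor. A minimal standalone sketch of the count tool's core, with the scoper and executor passed in as collaborators (the method name and hash shapes are hypothetical; the real tool would subclass Bizy::BaseTool and use MongodbQueryScoper and MongodbQueryExecutor as in MongodbFind above):

```ruby
# Hypothetical core of the count tool's #call: scope the filter, then count
# via the read-only executor. Collaborators are injected to keep this testable.
def count_call(arguments, scoper:, executor:)
  scoped = scoper.call(
    collection: arguments[:collection],
    filter: arguments[:filter] || {}
  )
  {
    collection: arguments[:collection],
    count: executor.call(collection: arguments[:collection], filter: scoped)
  }
end
```

Because scoping happens before the executor is touched, an unscoped count can never reach the database.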
Location: lib/bizy/mongodb_collection_registry.rb
Defines which collections are queryable and their scoping rules:
module Bizy
class MongodbCollectionRegistry
# Scoping types:
# - :company - Filter by company_id field
# - :user - Filter by user_id IN (all company user IDs)
COLLECTIONS = {
# Company-scoped collections (have company_id field)
"bonuses" => {
scope_type: :company,
scope_field: "company_id",
description: "Recognition given between employees"
},
"users" => {
scope_type: :company,
scope_field: "company_id",
description: "Employee records"
},
"check_ins" => {
scope_type: :company,
scope_field: "company_id",
description: "Employee check-ins"
},
"monthly_user_statistics" => {
scope_type: :company,
scope_field: "company_id",
description: "Monthly aggregated user metrics"
},
# User-scoped collections (have user_id but no company_id)
"app_notifications" => {
scope_type: :user,
scope_field: "user_id",
description: "User notifications"
},
"llm_runs" => {
scope_type: :user,
scope_field: "user_id",
description: "LLM query history"
},
"user_content_dismissals" => {
scope_type: :user,
scope_field: "user_id",
description: "Dismissed content by user"
},
# ... etc
}.freeze
def self.allowed?(collection_name)
COLLECTIONS.key?(collection_name)
end
def self.scope_type(collection_name)
COLLECTIONS.dig(collection_name, :scope_type)
end
def self.scope_field(collection_name)
COLLECTIONS.dig(collection_name, :scope_field)
end
def self.config(collection_name)
COLLECTIONS[collection_name]
end
end
end

CRITICAL: The MCP tool must have ZERO ability to modify data. This is enforced at multiple layers:
Create a dedicated MongoDB user with read-only permissions for this MCP tool:
// MongoDB shell command to create read-only user
db.createUser({
user: "bizy_mcp_readonly",
pwd: "<secure-password>",
roles: [
{ role: "read", db: "bonusly_production" }
]
})

Configure a separate Mongoid client for MCP queries with secondary-only reads:
# config/mongoid.yml
production:
clients:
default:
# ... normal read/write connection
mcp_readonly:
uri: <%= ENV['MONGODB_MCP_READONLY_URI'] %>
options:
read:
mode: :secondary # NOT secondary_preferred - ONLY secondary replicas
max_pool_size: 5
server_selection_timeout: 10 # Fail fast if no secondary available

Important: Use :secondary mode, NOT :secondary_preferred. This ensures:
- Queries NEVER touch the primary database
- If no secondary is available, the query fails (rather than falling back to primary)
- Production primary is completely isolated from MCP query load
# In the MCP tool - use read-only client explicitly
def readonly_client
Mongoid.client(:mcp_readonly)
end
def execute_query(collection_name, filter)
readonly_client[collection_name].find(filter).to_a
end

Why this is the strongest layer: Even if application code has bugs, MongoDB itself will reject any write operations with a permission error.
Only expose read operations - no insert, update, delete, or bulk write methods:
module Bizy
class MongodbQueryExecutor
# ONLY these methods exist - no write methods at all
def find(collection:, filter:, projection: nil, sort: nil, limit: 20)
# ...
end
def aggregate(collection:, pipeline:)
# ...
end
def count(collection:, filter:)
# ...
end
# NO insert, update, delete, bulk_write, etc.
end
end

Block ALL aggregation stages that can write or modify data:
FORBIDDEN_STAGES = [
  "$out",          # Writes results to a collection
  "$merge",        # Writes/updates a collection
  "$set",          # In-pipeline only, but blocked out of caution (read-only user blocks writes anyway)
  "$unset",        # In-pipeline only, but blocked out of caution
  "$replaceRoot",  # Read-safe, but blocked pending careful validation
  "$replaceWith"   # Read-safe, but blocked pending careful validation
].freeze
# $addFields, $project, $group are safe (in-pipeline transforms only) and remain allowed
def validate_pipeline!(pipeline)
pipeline.each do |stage|
stage_name = stage.keys.first.to_s
if FORBIDDEN_STAGES.include?(stage_name)
raise Bizy::ForbiddenOperatorError,
"Aggregation stage '#{stage_name}' is not allowed"
end
end
end

Reject any query operators that could have side effects (none exist in MongoDB find filters, but validate anyway):
def validate_filter!(filter)
# MongoDB find filters are inherently read-only
# But validate structure to prevent injection
raise Bizy::InvalidFilterError unless filter.is_a?(Hash)
end

| Layer | What it blocks | Failure mode |
|---|---|---|
| Read-Only DB User | All writes | MongoDB permission error |
| Secondary-Only Mode | Primary DB access | Query fails if no secondary available |
| API Surface | No write methods exposed | Method doesn't exist |
| Pipeline Validation | $out, $merge, $lookup | Application error before query runs |
All layers are mandatory:
- Read-only DB user: Prevents any data modification at the database level
- Secondary-only mode: Isolates primary from query load; MCP can never affect write performance
- API surface: Defense against code bugs - write methods don't exist
- Pipeline validation: Defense against LLM-crafted malicious aggregation pipelines
$lookup stages can join to other collections, potentially exposing cross-company data. Two options:
Option A: Disallow $lookup entirely (simplest, most secure)
FORBIDDEN_PIPELINE_OPERATORS = %w[$out $merge $lookup $graphLookup].freeze

Option B: Inject company filter into $lookup pipelines (more flexible, complex)
def secure_lookup(lookup_stage)
# Validate "from" is in allowed collections
# Inject company filter into lookup's pipeline
end

Recommendation: Start with Option A (disallow $lookup), then consider Option B if needed.
- Result size limits - Cap response size to prevent memory issues
- Query timeout - Set maxTimeMS to prevent runaway queries
- Rate limiting - Limit queries per minute per company
- Audit logging - Log all queries for security review
- Read from secondary - Use secondary DB to reduce primary load
def execute_query(collection, pipeline)
collection.aggregate(
pipeline,
max_time_ms: 30_000,
read: { mode: :secondary_preferred }
).to_a
end

Pre-define parameterized query templates that the LLM fills in:
TEMPLATES = {
"recognition_by_user" => {
collection: "bonuses",
filter: ->(params) { { receiver_ids: params[:user_id] } }
}
}

Pros: Maximum security, no query parsing
Cons: Very limited flexibility; new use cases require code changes
Rejected because it defeats the purpose of "arbitrary queries" and requires constant maintenance.
Create company-filtered views in MongoDB itself.
Pros: Database-level security
Cons: Can't create per-company views (impractical); views are static
Wrap the existing user-MongoDB-prod MCP server.
Pros: Less code to write
Cons: Still need to intercept and rewrite queries; harder to control
This section documents all considered attack vectors and their mitigations.
Attack:
filter: {
"$or": [
{ "company_id": "other_company_id", "amount": { "$gt": 0 } },
{ "amount": { "$lt": 0 } }
]
}

Problem: A simple filter.merge("company_id" => X) only sets the top-level company_id. The $or clause still contains the attacker's company_id and will match documents from other companies.
Mitigation: Recursively sanitize company_id from entire filter structure:
def sanitize_scope_field!(filter, scope_field)
filter.keys.each do |key| # iterate over a key snapshot so deletion mid-iteration is safe
value = filter[key]
if key.to_s == scope_field
filter.delete(key) # Remove any user-provided scope field
elsif value.is_a?(Hash)
sanitize_scope_field!(value, scope_field)
elsif value.is_a?(Array)
value.each { |v| sanitize_scope_field!(v, scope_field) if v.is_a?(Hash) }
end
end
end
# Then inject our scope field at top level
filter[scope_field] = @company_id

Attack:
filter: { "$where": "this.company_id == 'other_company_id'" }

Problem: $where executes JavaScript and can bypass any field-level filters.
Mitigation: Block dangerous filter operators:
FORBIDDEN_FILTER_OPERATORS = %w[$where $function].freeze
# Note: $expr is ALLOWED - it cannot bypass top-level field matches
def validate_filter_operators!(filter)
filter.each do |key, value|
if FORBIDDEN_FILTER_OPERATORS.include?(key.to_s)
raise Bizy::ForbiddenOperatorError, "Operator '#{key}' is not allowed"
end
validate_filter_operators!(value) if value.is_a?(Hash)
value.each { |v| validate_filter_operators!(v) if v.is_a?(Hash) } if value.is_a?(Array)
end
end

Attack:
pipeline: [
{ "$match": { "company_id": my_company } },
{ "$unionWith": { "coll": "bonuses" } }, # Unions ALL bonuses, unfiltered!
{ "$match": { "company_id": "other_company" } }
]Problem: $unionWith brings in data from another collection without applying the company filter.
Mitigation: Block in forbidden stages list:
FORBIDDEN_STAGES = [
  "$out",         # Writes to a collection
  "$merge",       # Writes to a collection
  "$lookup",      # Cross-collection join (data leakage)
  "$graphLookup", # Recursive cross-collection join
  "$unionWith",   # Unions in unfiltered collection data
  "$documents"    # Creates documents from scratch
].freeze

Initial concern:
filter: { "$expr": { "$eq": ["$company_id", "other_company_id"] } }

Analysis: After deeper review, $expr cannot bypass top-level field matches.
MongoDB query semantics treat all top-level conditions as an implicit AND:
# After our injection:
{
"company_id" => my_company, # Our top-level field match
"$expr" => { "$eq": ["$company_id", "other"] } # Attacker's $expr
}

Both conditions must be true:
- `company_id == my_company` (TRUE for the company's documents)
- `$company_id == "other"`, i.e. `my_company == "other"` (FALSE)
Result: No documents match. The attacker's $expr cannot override our field match.
Why we should ALLOW $expr:
- 60+ uses in existing analytics pipelines
- Required for `$lookup` variable comparisons (`$$user_id`)
- Required for field-to-field comparisons
- Required for complex arithmetic conditions
- Cannot bypass top-level field matches
Mitigation (minimal): Optionally sanitize scope field references inside $expr to prevent confusion, but not strictly required for security.
Decision: $expr is ALLOWED - removed from FORBIDDEN_FILTER_OPERATORS.
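The implicit-AND argument above can be illustrated in plain Ruby. This is a toy stand-in for MongoDB's matching, not the real evaluator: the two lambdas model the injected field match and the attacker's $expr, and a document must satisfy both.

```ruby
# Toy illustration of MongoDB's implicit AND across top-level conditions.
docs = [
  { "company_id" => "my_company" }, # in scope
  { "company_id" => "other" }       # out of scope
]

field_match   = ->(doc) { doc["company_id"] == "my_company" } # our injected filter
attacker_expr = ->(doc) { doc["company_id"] == "other" }      # attacker's $expr

# Both conditions are ANDed, so no document can satisfy them simultaneously.
matches = docs.select { |doc| field_match.call(doc) && attacker_expr.call(doc) }
```

The attacker's condition can only narrow the result set past our field match, never widen it.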
Attack:
# Attacker sends string ID
filter: { "user_id": "507f1f77bcf86cd799439011" }
# Validation might check ObjectId, but filter uses string

Problem: Mismatch between validation type and query type could allow bypass.
Mitigation: Normalize IDs in the filter itself, not just during validation:
def normalize_user_ids_in_filter!(filter, scope_field)
user_id_value = filter[scope_field]
return unless user_id_value
filter[scope_field] = case user_id_value
when String
BSON::ObjectId(user_id_value)
when BSON::ObjectId
user_id_value
when Hash
if user_id_value["$in"]
{ "$in" => user_id_value["$in"].map { |id| BSON::ObjectId(id.to_s) } }
else
user_id_value
end
when Array
{ "$in" => user_id_value.map { |id| BSON::ObjectId(id.to_s) } }
end
end
end

Attack:
filter: { "reason": { "$regex": "(a+)+$" } }

Problem: Catastrophic backtracking in regex causes CPU exhaustion (ReDoS).
Mitigation:
MAX_REGEX_LENGTH = 100
def validate_regex!(filter)
filter.each do |key, value|
if key.to_s == "$regex" && value.to_s.length > MAX_REGEX_LENGTH
raise Bizy::InvalidFilterError, "Regex pattern too long"
end
validate_regex!(value) if value.is_a?(Hash)
value.each { |v| validate_regex!(v) if v.is_a?(Hash) } if value.is_a?(Array)
end
end

Attack:
# Probe for collection names
collection: "secret_internal_collection"
# Error reveals: "Collection 'secret_internal_collection' is not available"
# Probe for user existence
filter: { "user_id": "..." }
# Different errors reveal user existence

Problem: Detailed error messages leak information about data existence.
Mitigation: Use generic error messages:
# Bad - reveals information
raise "User ID 'abc' does not belong to this company"
raise "Collection 'xyz' is not available"
# Good - generic errors
raise Bizy::NotAuthorizedError, "Query not permitted"

Attack:
# Systematically extract all company data
loop do
results = query(filter: { "_id": { "$gt": last_id } }, limit: 100)
break if results.empty?
last_id = results.last["_id"]
end

Problem: Even with proper scoping, attackers can extract entire datasets.
Mitigation: Rate limiting via rack-attack and audit logging:
# config/initializers/rack_attack.rb
Rack::Attack.throttle("mcp/company", limit: 30, period: 60) do |req|
if req.path.start_with?("/mcp/")
# Decode token to get company_id for throttle key
token = req.get_header("HTTP_AUTHORIZATION")&.sub("Bearer ", "")
if token
decoded = Bizy::SessionToken.decode(token) rescue nil
decoded&.dig(:company_id)&.to_s
end
end
end
# lib/bizy/mcp_audit_logger.rb
module Bizy
class McpAuditLogger
def self.log_query(user:, company:, collection:, filter:)
Rails.logger.info("[MCP Query Audit]", {
user_id: user.id.to_s,
company_id: company.id.to_s,
collection: collection,
filter_keys: filter.keys, # Log structure, not values (PII)
timestamp: Time.current.iso8601
})
end
end
end

Attack:
pipeline: [
{ "$match": { "company_id": my_company } },
{ "$facet": {
"legitimate": [{ "$match": { "amount": 10 } }],
"attack": [{ "$unionWith": { "coll": "users" } }] # Nested forbidden stage!
}}
]

Problem: $facet contains nested pipelines that might contain forbidden stages.
Mitigation: Recursively validate nested pipelines:
def validate_pipeline!(pipeline)
pipeline.each do |stage|
stage_name = stage.keys.first.to_s
raise ForbiddenOperatorError, "Stage '#{stage_name}' not allowed" if FORBIDDEN_STAGES.include?(stage_name)
# Recursively validate $facet sub-pipelines
if stage_name == "$facet"
stage["$facet"].each_value do |sub_pipeline|
validate_pipeline!(sub_pipeline)
end
end
end
end

Attack:
filter: { "company_id\x00": "other" } # Null byte
filter: { "company_id\u200B": "other" } # Zero-width space

Problem: Unicode tricks might bypass string matching for field names.
Mitigation: Validate field name characters:
VALID_FIELD_NAME = /\A[\w.$]+\z/
def validate_field_names!(obj)
return unless obj.is_a?(Hash)
obj.each do |key, value|
unless key.to_s.match?(VALID_FIELD_NAME)
raise Bizy::InvalidFilterError, "Invalid field name format"
end
validate_field_names!(value) if value.is_a?(Hash)
value.each { |v| validate_field_names!(v) if v.is_a?(Hash) } if value.is_a?(Array)
end
end
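A quick standalone check of the field-name pattern (the helper name is illustrative): `\w` matches only word characters, so null bytes and zero-width characters fail the anchored match while normal dotted paths pass.

```ruby
# The pattern from the mitigation above: word chars, dots, and $ only,
# anchored over the whole string.
VALID_FIELD_NAME = /\A[\w.$]+\z/

def valid_field_name?(name)
  name.to_s.match?(VALID_FIELD_NAME)
end
```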
endAttack: Steal or replay a valid session token to impersonate another user.
Mitigation: Already handled by existing Bizy::SessionToken:
- Tokens expire after 1 hour
- Tokens are signed with secret key (can't forge)
- Tokens are generated server-side (LLM never sees user_id directly)
| Priority | Attack Vector | Mitigation | Status |
|---|---|---|---|
| CRITICAL | Nested filter injection | Recursive sanitization of scope field | Required |
| CRITICAL | $where JavaScript | Block $where, $function | Required |
| CRITICAL | $unionWith bypass | Block $unionWith, $documents | Required |
| SAFE | $expr override | ALLOWED - cannot bypass field matches | Safe to allow |
| MEDIUM | Type confusion | Normalize IDs in filter | Required |
| MEDIUM | ReDoS | Limit regex length | Required |
| MEDIUM | Info disclosure | Generic error messages | Required |
| MEDIUM | Enumeration | Rate limiting + audit logging | Required |
| MEDIUM | $facet nesting | Recursive pipeline validation | Required |
| LOW | Field name tricks | Validate field name characters | Required |
| LOW | Token theft | Existing token expiration/signing | Already handled |
module Bizy
class MongodbSecurityValidator
# Filter operators that can bypass field-level security
FORBIDDEN_FILTER_OPERATORS = [
  "$where",    # JavaScript execution - can bypass any filter
  "$function"  # JavaScript execution - can bypass any filter
].freeze
# NOTE: $expr is ALLOWED - it cannot bypass top-level field matches
# and is required for 60+ existing analytics pipelines
# Aggregation stages that can leak or write data
FORBIDDEN_STAGES = [
  "$out",         # Writes to a collection
  "$merge",       # Writes to a collection
  "$lookup",      # Cross-collection join (data leakage)
  "$graphLookup", # Recursive cross-collection join
  "$unionWith",   # Unions in unfiltered collection data
  "$documents"    # Creates documents from scratch
].freeze
MAX_REGEX_LENGTH = 100
VALID_FIELD_NAME = /\A[\w.$]+\z/
end
end

- Unit tests for query scoper - Verify company_id is correctly injected
- Collection allowlist tests - Verify disallowed collections are rejected
- Pipeline validation tests - Verify dangerous operators are blocked
- Read-only security tests - Verify writes are impossible at multiple layers
- Integration tests - End-to-end query execution with auth
Query Scoper:
RSpec.describe Bizy::MongodbQueryScoper do
let(:company) { create(:company) }
let(:other_company) { create(:company) }
let(:user1) { create(:user, company: company) }
let(:user2) { create(:user, company: company) }
let(:other_user) { create(:user, company: other_company) }
describe "#scope_find" do
context "company-scoped collections" do
it "injects company_id into filter" do
scoper = described_class.new(company: company)
result = scoper.scope_find(collection: "bonuses", filter: { "amount" => 10 })
expect(result).to eq({ "amount" => 10, "company_id" => company.id })
end
it "overwrites any user-provided company_id" do
scoper = described_class.new(company: company)
result = scoper.scope_find(
collection: "bonuses",
filter: { "company_id" => other_company.id } # Attempted injection
)
expect(result["company_id"]).to eq(company.id) # Overwritten with correct company
end
end
context "user-scoped collections" do
it "allows querying with valid user_id" do
scoper = described_class.new(company: company)
result = scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => user1.id, "seen_at" => nil }
)
expect(result["user_id"]).to eq(user1.id)
end
it "allows querying with array of valid user_ids" do
scoper = described_class.new(company: company)
result = scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => { "$in" => [user1.id, user2.id] } }
)
expect(result["user_id"]).to eq({ "$in" => [user1.id, user2.id] })
end
it "rejects user_id from another company" do
scoper = described_class.new(company: company)
expect {
scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => other_user.id }
)
}.to raise_error(Bizy::UnauthorizedUserError)
end
it "rejects if any user_id in array is from another company" do
scoper = described_class.new(company: company)
expect {
scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => { "$in" => [user1.id, other_user.id] } }
)
}.to raise_error(Bizy::UnauthorizedUserError)
end
it "requires user_id for user-scoped collections" do
scoper = described_class.new(company: company)
expect {
scoper.scope_find(collection: "app_notifications", filter: {})
}.to raise_error(Bizy::InvalidFilterError, /user_id is required/)
end
end
it "rejects non-allowlisted collections" do
scoper = described_class.new(company: company)
expect { scoper.scope_find(collection: "secret_data", filter: {}) }
.to raise_error(Bizy::UnauthorizedCollectionError)
end
end
end

Admin Authorization:
RSpec.describe Bizy::Tools::MongodbFind do
let(:company) { create(:company) }
context "with non-admin user" do
let(:regular_user) { create(:user, company: company, company_admin: false) }
it "rejects access" do
result = described_class.execute(
session_token: "token",
user: regular_user,
company: company,
collection: "bonuses"
)
expect(result[:error]).to eq("Admin access required")
end
end
context "with admin user" do
let(:admin_user) { create(:user, company: company, company_admin: true) }
it "allows access and returns scoped results" do
result = described_class.execute(
session_token: "token",
user: admin_user,
company: company,
collection: "bonuses",
filter: { amount: 10 }
)
expect(result[:data]).to be_present
expect(result[:error]).to be_nil
end
end
end

Read-Only Security Tests (Critical):
RSpec.describe Bizy::MongodbQueryExecutor do
describe "read-only enforcement" do
it "does not expose insert method" do
executor = described_class.new
expect(executor).not_to respond_to(:insert)
expect(executor).not_to respond_to(:insert_one)
expect(executor).not_to respond_to(:insert_many)
end
it "does not expose update method" do
executor = described_class.new
expect(executor).not_to respond_to(:update)
expect(executor).not_to respond_to(:update_one)
expect(executor).not_to respond_to(:update_many)
end
it "does not expose delete method" do
executor = described_class.new
expect(executor).not_to respond_to(:delete)
expect(executor).not_to respond_to(:delete_one)
expect(executor).not_to respond_to(:delete_many)
end
it "uses the read-only MongoDB client" do
executor = described_class.new
client = executor.instance_variable_get(:@client) # no reader method is defined
# Verify it's the mcp_readonly client, not default
expect(client).to eq(Mongoid.client(:mcp_readonly))
end
it "enforces secondary-only read mode" do
executor = described_class.new
client = executor.instance_variable_get(:@client)
read_preference = client.options[:read]
# Must be :secondary (not :secondary_preferred) to never touch primary
expect(read_preference[:mode]).to eq(:secondary)
end
end
end
RSpec.describe Bizy::MongodbPipelineValidator do
describe "#validate!" do
it "blocks $out stage" do
pipeline = [{ "$match" => {} }, { "$out" => "output_collection" }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$out/)
end
it "blocks $merge stage" do
pipeline = [{ "$match" => {} }, { "$merge" => { into: "target" } }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$merge/)
end
it "blocks $lookup stage" do
pipeline = [{ "$lookup" => { from: "other", as: "joined" } }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$lookup/)
end
it "allows safe stages like $match, $group, $project" do
pipeline = [
{ "$match" => { status: "active" } },
{ "$group" => { _id: "$type", count: { "$sum" => 1 } } },
{ "$project" => { type: "$_id", count: 1 } }
]
expect { described_class.validate!(pipeline) }.not_to raise_error
end
end
end

A key challenge for the LLM is having enough context to formulate meaningful queries. Collection and field names alone don't explain what the data means or how it's calculated.
Expose a curated subset of documentation as MCP resources that the LLM can discover and fetch:
sequenceDiagram
participant User
participant LLM
participant MCP as MCP Server
participant Docs as doc/mcp_resources/
User->>LLM: "How are collaboration communities calculated?"
LLM->>MCP: list_resources(pattern: "collaboration*")
MCP-->>LLM: [collaboration_communities.md]
LLM->>MCP: fetch_resource("bonusly://docs/features/collaboration_communities")
MCP->>Docs: Read curated doc
Docs-->>MCP: Sanitized markdown
MCP-->>LLM: Document content
LLM->>LLM: Now understands data model, weights, etc.
LLM->>MCP: mongodb_aggregate(collection: "monthly_user_connections_weighted", ...)
Exposing internal docs introduces risks:
| Risk | Example | Mitigation |
|---|---|---|
| Implementation detail leakage | Class names, file paths, dependencies | Curated docs only - not entire /doc folder |
| Security mechanism exposure | "Filter by company_id at query layer" | Separate user-facing docs from engineering docs |
| Prompt injection | Malicious content in doc files | Treat doc content as untrusted input |
| Accidental secrets | API keys, hostnames in examples | Review process + automated scanning |
| Collection enumeration | References to non-allowlisted collections | Docs only reference allowlisted collections |
Create a dedicated folder for LLM-consumable documentation:
doc/
├── internal/ # Engineering docs (NOT exposed via MCP)
│ └── security/
│ └── how_scoping_works.md
│
└── mcp_resources/ # Curated for LLM consumption
├── collections/
│ ├── bonuses.md
│ ├── users.md
│ ├── check_ins.md
│ └── monthly_user_connections_weighted.md
├── features/
│ └── collaboration_communities.md # Sanitized version
└── examples/
└── common_queries.md
# lib/bizy/mcp_resource_handler.rb
module Bizy
class McpResourceHandler
RESOURCES_PATH = Rails.root.join("doc/mcp_resources")
# Only serve from curated folder - never expose /doc directly
def list_resources
Dir.glob("#{RESOURCES_PATH}/**/*.md").map do |path|
relative = path.sub("#{RESOURCES_PATH}/", "")
{
uri: "bonusly://docs/#{relative.delete_suffix('.md')}",
name: extract_title(path),
description: extract_description(path),
mimeType: "text/markdown"
}
end
end
def fetch_resource(uri)
# Validate URI format and prevent path traversal
relative_path = uri.sub("bonusly://docs/", "")
raise InvalidResourceError unless relative_path.match?(/\A[\w\-\/]+\z/)
full_path = RESOURCES_PATH.join("#{relative_path}.md")
# Ensure path is within RESOURCES_PATH (prevent traversal)
unless full_path.to_s.start_with?(RESOURCES_PATH.to_s)
raise InvalidResourceError, "Invalid resource path"
end
raise ResourceNotFoundError unless File.exist?(full_path)
File.read(full_path)
end
private
def extract_title(path)
first_line = File.foreach(path).first
first_line&.gsub(/^#\s*/, '')&.strip || File.basename(path, '.md').titleize
end
def extract_description(path)
# Extract first paragraph after title
content = File.read(path)
paragraphs = content.split(/\n\n+/)
paragraphs[1]&.gsub(/\n/, ' ')&.truncate(200) || ""
end
end
end

Link collections to their documentation in the collection registry:
COLLECTIONS = {
"bonuses" => {
scope_type: :company,
scope_field: "company_id",
description: "Recognition given between employees",
docs: ["collections/bonuses"] # MCP resource URIs
},
"monthly_user_connections_weighted" => {
scope_type: :company,
scope_field: "company_id",
description: "Pre-computed weighted edges for collaboration analysis",
docs: [
"collections/monthly_user_connections_weighted",
"features/collaboration_communities"
]
}
# ...
}
def self.with_docs(collection_name)
config = COLLECTIONS[collection_name]
return config unless config
config.merge(
doc_uris: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
)
end

Add dedicated discovery tools to help the LLM find relevant documentation:
# lib/bizy/tools/mongodb_list_collections.rb
module Bizy
module Tools
class MongodbListCollections < Bizy::BaseTool
tool_name "mongodb_list_collections"
description "List available MongoDB collections with descriptions and documentation links."
def call
require_admin!
collections = MongodbCollectionRegistry::COLLECTIONS.map do |name, config|
{
name: name,
description: config[:description],
scope_type: config[:scope_type],
doc_resources: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
}
end
success(collections: collections)
end
end
end
end
# lib/bizy/tools/mongodb_describe_collection.rb
module Bizy
module Tools
class MongodbDescribeCollection < Bizy::BaseTool
tool_name "mongodb_describe_collection"
description "Get detailed information about a collection including sample fields and related documentation."
input_schema do
{
type: "object",
properties: {
collection: { type: "string", description: "Collection name" }
},
required: ["collection"]
}
end
def call
require_admin!
collection = arguments[:collection]
config = MongodbCollectionRegistry.config(collection)
unless config
return error("Collection not available")
end
# Get sample document to show field structure
scoper = MongodbQueryScoper.new(company: company)
executor = MongodbQueryExecutor.new
sample = executor.find(
collection: collection,
filter: scoper.scope_find(collection: collection, filter: {}),
limit: 1
).first
success(
name: collection,
description: config[:description],
scope_type: config[:scope_type],
scope_field: config[:scope_field],
sample_fields: sample&.keys || [],
doc_resources: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
)
end
end
end
end

Curated docs in doc/mcp_resources/ should:
- Be written for end-user/admin audience - Not engineering implementation details
- Focus on data meaning - What does this field represent? What questions can it answer?
- Include example queries - Show common use cases with actual query patterns
- Reference only allowlisted collections - Don't mention internal collections
- Avoid implementation details - No class names, file paths, or internal architecture
- Be reviewed before merge - Security review for any new MCP resource docs
Example curated doc structure:
# Bonuses Collection
Recognition events where employees appreciate each other's contributions.
## Key Fields
- `giver_id` - User who gave the recognition
- `receiver_ids` - Users who received the recognition (array)
- `amount` - Points awarded (integer)
- `reason` - Message explaining the recognition
- `created_at` - When the recognition was given
- `hashtags` - Company values tagged (array of strings)
## Common Query Patterns
**Recognition given by a specific user:**
filter: { "giver_id": "user_id_here" }
**Recognition in a date range:**
filter: { "created_at": { "$gte": "2025-01-01", "$lte": "2025-01-31" } }
**Recognition mentioning a specific value:**
filter: { "hashtags": "#teamwork" }
## Related Documentation
- [Collaboration Communities](bonusly://docs/features/collaboration_communities) - How recognition patterns create collaboration maps

Before any doc is added to doc/mcp_resources/:
- Security review - No internal paths, class names, dependencies, or security mechanisms
- No secrets - Scan for API keys, tokens, hostnames, real user/company data
- Allowlisted references only - Only mention collections in the registry
- User-facing language - Written for admins, not engineers
- PR approval required - Standard review process with explicit MCP resource checkbox
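The "automated scanning" step in this checklist could be a simple pattern check run before merge. A sketch with assumed category names; the regexes are illustrative, not the actual review tooling:

```ruby
# Illustrative pre-merge scanner for doc/mcp_resources/ files. Categories and
# patterns are assumptions; a real scanner would be tuned to the codebase.
FORBIDDEN_PATTERNS = {
  internal_path: %r{\b(?:app|lib|config)/[\w/]+\.rb\b},   # source file paths
  ruby_class:    /\b[A-Z]\w+::[A-Z]\w+\b/,                # namespaced class names
  secret_hint:   /\b(?:api[_-]?key|secret|token)\s*[:=]\s*\S+/i,
  hostname:      /\b[\w.-]+\.internal\b/                  # internal hostnames
}.freeze

# Returns the categories of forbidden content found in a doc, empty if clean.
def scan_mcp_doc(content)
  FORBIDDEN_PATTERNS.select { |_, pattern| content.match?(pattern) }.keys
end
```

A CI job could run this over every file under doc/mcp_resources/ and fail the build with the offending categories, leaving the human review checklist as the final gate.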
Keep MCP docs in sync with collection schemas using schema-based detection (not source file tracking). The process runs locally first, with CI as a backup safety net.
Only regenerate docs when the actual data structure changes:
| Trigger | Detected By | Frequency |
|---|---|---|
| New field added to collection | Sample doc comparison | Rare |
| Field removed from collection | Sample doc comparison | Rare |
| Registry description changed | Registry vs frontmatter | Rare |
| New collection added to registry | Missing doc file | Rare |
| Source code refactoring | Not tracked | N/A |
---
collection: bonuses
documented_fields: [giver_id, receiver_ids, amount, reason, created_at, hashtags]
registry_description: "Recognition given between employees"
last_generated: 2025-01-29T14:30:00Z
---
# Bonuses Collection
...

# Check which docs are stale
rake mcp:check_docs
# Regenerate stale docs using LLM
rake mcp:generate_docs
# Regenerate a specific collection
rake mcp:generate_docs[bonuses]

# lib/tasks/mcp_docs.rake
namespace :mcp do
desc "Check for stale MCP documentation"
task check_docs: :environment do
checker = Bizy::McpDocFreshnessChecker.new
stale = checker.stale_docs
if stale.empty?
puts "All MCP docs are up to date."
else
puts "Stale documentation detected:"
stale.each do |s|
reasons = []
reasons << "new fields: #{s[:reason][:added_fields].join(', ')}" if s[:reason][:added_fields].any?
reasons << "removed fields: #{s[:reason][:removed_fields].join(', ')}" if s[:reason][:removed_fields].any?
reasons << "description changed" if s[:reason][:description_changed]
reasons << "doc missing" if s[:reason][:missing]
puts " #{s[:collection]}: #{reasons.join(', ')}"
end
puts "\nRun 'rake mcp:generate_docs' to update."
exit 1
end
end
desc "Generate/update MCP documentation for collections"
task :generate_docs, [:collection] => :environment do |_, args|
generator = Bizy::McpDocGenerator.new
collections = if args[:collection]
{ args[:collection] => Bizy::MongodbCollectionRegistry.config(args[:collection]) }
else
Bizy::MongodbCollectionRegistry::COLLECTIONS
end
collections.each do |name, config|
puts "Generating docs for #{name}..."
generator.generate(collection: name, config: config)
end
puts "Done. Review changes and commit."
end
end

# lib/bizy/mcp_doc_freshness_checker.rb
module Bizy
class McpDocFreshnessChecker
DOCS_PATH = Rails.root.join("doc/mcp_resources/collections")
def stale_docs
stale = []
MongodbCollectionRegistry::COLLECTIONS.each do |name, config|
doc_path = DOCS_PATH.join("#{name}.md")
unless File.exist?(doc_path)
stale << { collection: name, reason: { missing: true } }
next
end
doc = McpDocParser.parse(doc_path)
sample = fetch_sample_document(name, config)
current_fields = sample.keys.sort
documented_fields = doc.metadata[:documented_fields]&.sort || []
added = current_fields - documented_fields
removed = documented_fields - current_fields
description_changed = config[:description] != doc.metadata[:registry_description]
if added.any? || removed.any? || description_changed
stale << {
collection: name,
reason: {
added_fields: added,
removed_fields: removed,
description_changed: description_changed
}
}
end
end
stale
end
private
def fetch_sample_document(name, config)
Mongoid.client(:mcp_readonly)[name].find({}).limit(1).first || {}
end
end
end

# lib/bizy/mcp_doc_generator.rb
module Bizy
class McpDocGenerator
DOCS_PATH = Rails.root.join("doc/mcp_resources/collections")
def generate(collection:, config:)
sample = fetch_sample_document(collection, config)
content = generate_with_llm(
collection: collection,
description: config[:description],
fields: sample.keys,
sample: sanitize_sample(sample)
)
frontmatter = {
collection: collection,
documented_fields: sample.keys.sort,
registry_description: config[:description],
last_generated: Time.current.iso8601
}
write_doc(collection, frontmatter, content)
end
private

def fetch_sample_document(collection, config)
  # Same read-only sampling approach as the freshness checker
  Mongoid.client(:mcp_readonly)[collection].find({}).limit(1).first || {}
end
def generate_with_llm(collection:, description:, fields:, sample:)
prompt = <<~PROMPT
Generate user-facing documentation for a MongoDB collection.
Collection: #{collection}
Description: #{description}
Fields: #{fields.join(', ')}
Sample document (sanitized): #{sample.to_json}
Guidelines:
- Write for admin users, not engineers
- Explain what each field means semantically
- Include 3-5 common query patterns with example filters
- Do not mention internal class names, file paths, or implementation details
- Use markdown format with ## headers for sections
Required sections:
1. Brief description (1-2 sentences)
2. Key Fields (list with descriptions)
3. Common Query Patterns (with example filters)
PROMPT
Bizy::LlmClient.complete(prompt)
end
def sanitize_sample(sample)
# Remove PII, replace IDs with placeholders
sample.transform_values do |v|
case v
when BSON::ObjectId then "<ObjectId>"
when String then v.truncate(50)
when Array then v.first(3)
else v
end
end
end
def write_doc(collection, frontmatter, content)
path = DOCS_PATH.join("#{collection}.md")
FileUtils.mkdir_p(DOCS_PATH)
File.write(path, <<~DOC)
---
#{frontmatter.to_yaml.lines[1..].join}---
#{content}
DOC
end
end
end

# spec/lib/bizy/mcp_documentation_freshness_spec.rb
RSpec.describe "MCP Documentation Freshness" do
it "has up-to-date documentation for all collections" do
checker = Bizy::McpDocFreshnessChecker.new
stale = checker.stale_docs
if stale.any?
message = stale.map do |s|
if s[:reason][:missing]
"#{s[:collection]}: missing doc file"
else
reasons = []
reasons << "new fields: #{s[:reason][:added_fields]}" if s[:reason][:added_fields]&.any?
reasons << "removed fields: #{s[:reason][:removed_fields]}" if s[:reason][:removed_fields]&.any?
reasons << "description changed" if s[:reason][:description_changed]
"#{s[:collection]}: #{reasons.join(', ')}"
end
end.join("\n")
fail "Stale MCP docs detected:\n#{message}\n\nRun locally: rake mcp:generate_docs"
end
end
end

flowchart TD
A[Developer adds/changes collection] --> B[Run rake mcp:check_docs locally]
B --> C{Stale docs?}
C -->|No| D[Continue with PR]
C -->|Yes| E[Run rake mcp:generate_docs]
E --> F[LLM generates updated docs]
F --> G[Developer reviews & edits]
G --> H[Commit changes]
H --> D
D --> I[CI runs freshness spec]
I --> J{Passes?}
J -->|Yes| K[PR mergeable]
J -->|No| L[CI fails with instructions]
L --> B
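One piece the workflow relies on but never shows is `McpDocParser`, which `McpDocFreshnessChecker` calls to read frontmatter. A minimal sketch of what it might look like, assuming YAML frontmatter delimited by `---` as in the example doc above:

```ruby
require "yaml"
require "date"

# Hypothetical sketch of McpDocParser: splits YAML frontmatter from the
# markdown body and symbolizes metadata keys, matching how the freshness
# checker reads doc.metadata[:documented_fields].
module Bizy
  class McpDocParser
    ParsedDoc = Struct.new(:metadata, :body, keyword_init: true)

    FRONTMATTER = /\A---\n(.*?)\n---\n/m

    def self.parse_string(raw)
      match = raw.match(FRONTMATTER)
      return ParsedDoc.new(metadata: {}, body: raw) unless match

      meta = YAML.safe_load(match[1], permitted_classes: [Time, Date])
      ParsedDoc.new(
        metadata: meta.transform_keys(&:to_sym),
        body: match.post_match
      )
    end

    def self.parse(path)
      parse_string(File.read(path))
    end
  end
end
```

Using `YAML.safe_load` keeps a malformed or malicious doc file from instantiating arbitrary Ruby objects during the freshness check.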
- Create read-only MongoDB user (ops/infra) - Create `bizy_mcp_readonly` user with `read` role only
- Configure Mongoid client - Add `mcp_readonly` client to `config/mongoid.yml` with secondary-only reads
- Create `MongodbQueryExecutor` + spec
  - Read-only client, only find/aggregate/count methods
  - Spec: read-only enforcement, secondary-only mode
- Create `MongodbCollectionRegistry` + spec
  - Collection allowlist with scope types
  - Spec: allowed/disallowed collections
- Create `MongodbSecurityValidator` + spec
  - Forbidden operators, recursive sanitization, field validation
  - Spec: all 11 attack vectors (nested injection, $where, $unionWith, etc.)
- Create `MongodbQueryScoper` + spec
  - Company/user filter injection using SecurityValidator
  - Spec: company scoping, user scoping, validation failures
- Add rack-attack throttle for MCP endpoints + audit logging
  - Throttle by company_id (decoded from token)
  - Audit logging helper for query tracking
  - Spec: throttle enforcement via request specs
- Create `mongodb_find` tool + spec
  - Admin auth, uses scoper and executor
  - Spec: admin auth, scoped results, error handling
- Create `mongodb_aggregate` tool + spec
  - Admin auth, pipeline validation
  - Spec: pipeline validation, scoped aggregation
- Create `mongodb_count` tool + spec
  - Admin auth, scoped count
  - Spec: scoped count queries
- Create `mongodb_list_collections` tool + spec
  - Lists available collections with descriptions
- Create `mongodb_describe_collection` tool + spec
  - Collection details with sample fields
- Create `McpResourceHandler` + spec
  - Serve curated docs, path traversal protection
- Create `McpDocParser`
  - Parse frontmatter metadata from doc files
- Create `McpDocFreshnessChecker` + spec
  - Schema-based staleness detection
- Create `McpDocGenerator` + spec
  - LLM-assisted documentation generation
- Create rake tasks (`mcp:check_docs`, `mcp:generate_docs`)
- Generate initial `doc/mcp_resources/` docs for ALL allowlisted collections
- Create CI freshness spec as safety net
- Write end-to-end integration specs
  - Full flow: auth → scope → validate → execute → return
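The `MongodbSecurityValidator` referenced throughout the plan is not shown in this section. A minimal sketch of its operator-blocking core, with an assumed method name (`validate!`) and an illustrative operator list; the real class would also handle field validation and the unrestricted-`$lookup` check:

```ruby
# Sketch of the operator-blocking piece of MongodbSecurityValidator.
# Operator list and API are assumptions based on the rules described above.
module Bizy
  class MongodbSecurityValidator
    ForbiddenOperatorError = Class.new(StandardError)

    # Write and code-execution escape hatches the read-only tools must never see.
    FORBIDDEN_OPERATORS = %w[$out $merge $where $function $accumulator].freeze

    # Walks the filter/pipeline recursively so nested injections are caught too.
    def validate!(node)
      case node
      when Hash
        node.each do |key, value|
          if FORBIDDEN_OPERATORS.include?(key.to_s)
            raise ForbiddenOperatorError, "operator not allowed: #{key}"
          end
          validate!(value)
        end
      when Array
        node.each { |item| validate!(item) }
      end
      node
    end
  end
end
```

Recursing over hashes and arrays matters: a `$out` hidden inside a `$facet` sub-pipeline or a nested `$expr` must fail the same way as one at the top level.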
Infrastructure:
- `config/mongoid.yml` (modify) - Add `mcp_readonly` client configuration with secondary-only reads
- Environment variable: `MONGODB_MCP_READONLY_URI` - Read-only connection string
New Ruby files:
- `lib/bizy/mongodb_query_executor.rb` - Read-only query execution using mcp_readonly client
- `lib/bizy/mongodb_collection_registry.rb` - Collection allowlist with scope types and doc references
- `lib/bizy/mongodb_security_validator.rb` - Filter/pipeline validation, forbidden operators, sanitization
- `lib/bizy/mongodb_query_scoper.rb` - Company/user filter injection with security validation
- `config/initializers/rack_attack.rb` (modify) - Add MCP endpoint throttle by company_id
- `lib/bizy/mcp_audit_logger.rb` - Audit logging for MCP queries
- `lib/bizy/mcp_resource_handler.rb` - MCP resource handler for curated documentation
- `lib/bizy/mcp_doc_freshness_checker.rb` - Schema-based staleness detection
- `lib/bizy/mcp_doc_generator.rb` - LLM-assisted documentation generation
- `lib/bizy/mcp_doc_parser.rb` - Parse frontmatter metadata from doc files
- `lib/tasks/mcp_docs.rake` - Rake tasks for local doc management
- `lib/bizy/tools/mongodb_find.rb` - Find tool with admin auth
- `lib/bizy/tools/mongodb_aggregate.rb` - Aggregate tool with admin auth
- `lib/bizy/tools/mongodb_count.rb` - Count tool with admin auth
- `lib/bizy/tools/mongodb_list_collections.rb` - Discovery tool for listing collections
- `lib/bizy/tools/mongodb_describe_collection.rb` - Discovery tool for collection details
Curated Documentation (MCP Resources):
One documentation file per allowlisted collection, plus feature docs:
- `doc/mcp_resources/collections/{collection_name}.md` - One for EACH collection in the registry
  - `bonuses.md` - Recognition data model and query patterns
  - `users.md` - Employee records and fields
  - `check_ins.md` - Check-in data model
  - `monthly_user_statistics.md` - Monthly aggregated metrics
  - `monthly_user_connections_weighted.md` - Collaboration edge weights
  - `app_notifications.md` - User notifications
  - `llm_runs.md` - LLM query history
  - `user_content_dismissals.md` - Dismissed content tracking
  - (+ any additional collections added to registry)
- `doc/mcp_resources/features/collaboration_communities.md` - Cross-collection feature docs
- `doc/mcp_resources/examples/common_queries.md` - Example query patterns across collections
Specs:
- `spec/lib/bizy/mongodb_query_executor_spec.rb` - Read-only enforcement tests
- `spec/lib/bizy/mongodb_collection_registry_spec.rb` - Allowlist tests
- `spec/lib/bizy/mongodb_security_validator_spec.rb` - Attack vector tests (all 11 vectors)
- `spec/lib/bizy/mongodb_query_scoper_spec.rb` - Scoping tests for company and user types
- `spec/requests/mcp/rate_limiting_spec.rb` - Rack-attack throttle tests
- `spec/lib/bizy/mcp_resource_handler_spec.rb` - Resource handler with path traversal tests
- `spec/lib/bizy/mcp_doc_freshness_checker_spec.rb` - Staleness detection tests
- `spec/lib/bizy/mcp_doc_generator_spec.rb` - Generator tests
- `spec/lib/bizy/mcp_documentation_freshness_spec.rb` - CI safety net spec
- `spec/lib/bizy/tools/mongodb_find_spec.rb` - Integration tests with admin auth
- `spec/lib/bizy/tools/mongodb_aggregate_spec.rb` - Integration tests with pipeline validation
- `spec/lib/bizy/tools/mongodb_count_spec.rb` - Integration tests
- `spec/lib/bizy/tools/mongodb_list_collections_spec.rb` - Discovery tool tests
- `spec/lib/bizy/tools/mongodb_describe_collection_spec.rb` - Discovery tool tests