@swalke16
Last active January 29, 2026 20:24
Technical plan for a secure, company-scoped MongoDB query MCP tool for LLM agents

Secure MongoDB Query MCP Tool

Architecture Overview

sequenceDiagram
    participant LLM as LLM Agent
    participant MCP as MCP Tool
    participant Auth as Session Token
    participant Rewrite as Query Rewriter
    participant DB as MongoDB

    LLM->>MCP: Query with Bearer token
    MCP->>Auth: Decode token
    Auth-->>MCP: user_id
    MCP->>MCP: Load user → company
    MCP->>Rewrite: Rewrite query with company_id
    Rewrite->>Rewrite: Validate + inject filter
    Rewrite-->>MCP: Scoped query
    MCP->>DB: Execute query
    DB-->>MCP: Results
    MCP-->>LLM: Scoped results

Design Approach: Allowlisted Collections with Query Rewriting

This approach balances flexibility with security:

  1. Admin-only access - Only company admins can use these MCP tools
  2. Allowlist collections - Only permit queries against pre-defined collections
  3. Auto-inject scoping filter - Based on collection type:
  • Company-scoped: Inject company_id = user.company_id
  • User-scoped: Require user_id(s) and validate they all belong to the company
  4. Validate aggregation pipelines - Ensure $match is the first stage, inject the scoping filter
  5. Block dangerous operators - Disallow $out, $merge, and unrestricted $lookup
  6. Read-only - Only support find, aggregate, and count operations

Two Scoping Strategies

flowchart TD
    subgraph auth [Authorization]
        A[Request with token] --> B[Decode token]
        B --> C{Is user admin?}
        C -->|No| D[Reject: 403]
        C -->|Yes| E[Get company_id]
    end

    subgraph scope [Scoping]
        E --> F{Collection type?}
        F -->|company_id field| G["Inject: company_id = X"]
        F -->|user_id field| H[Validate requested user_ids belong to company]
        H --> I["Keep validated user_id filter"]
    end

    G --> J[Execute query]
    I --> J

Company-scoped collections (most collections):

  • Have a company_id field
  • Scoper injects: { company_id: company.id }
  • Any user-provided company_id is overwritten (prevents injection)

User-scoped collections (e.g., app_notifications, llm_runs):

  • Have a user_id field but no company_id
  • Caller must specify user_id or { user_id: { $in: [...] } }
  • Scoper validates ALL specified user_ids belong to the company
  • Rejects entire query if any user_id is outside the company
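
For illustration, the two strategies play out like this (a sketch of the scoper behaviour described above; the collection names are from the registry and the user variables are placeholders):

# Company-scoped: any caller-supplied company_id is overwritten
scoper = Bizy::MongodbQueryScoper.new(company: company)
scoper.scope_find(
  collection: "bonuses",
  filter: { "amount" => { "$gt" => 10 }, "company_id" => "attacker_company" }
)
# => { "amount" => { "$gt" => 10 }, "company_id" => company.id }

# User-scoped: user_ids are validated, not rewritten
scoper.scope_find(
  collection: "app_notifications",
  filter: { "user_id" => { "$in" => [user_in_company.id] } }
)  # passes validation, filter returned unchanged

scoper.scope_find(
  collection: "app_notifications",
  filter: { "user_id" => other_company_user.id }
)  # raises Bizy::UnauthorizedUserError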

Key Files to Create

1. Query Executor (Read-Only)

Location: lib/bizy/mongodb_query_executor.rb

This is the ONLY interface to MongoDB for MCP tools. It uses the read-only client and exposes only read methods:

module Bizy
  class MongodbQueryExecutor
    MAX_LIMIT = 100
    DEFAULT_LIMIT = 20
    QUERY_TIMEOUT_MS = 30_000

    def initialize
      @client = Mongoid.client(:mcp_readonly)
    end

    # The ONLY three public methods - no write operations exist
    def find(collection:, filter:, projection: nil, sort: nil, limit: DEFAULT_LIMIT)
      limit = [limit.to_i, MAX_LIMIT].min

      cursor = @client[collection]
        .find(filter)
        .projection(projection || {})
        .sort(sort || { _id: -1 })
        .limit(limit)
        .max_time_ms(QUERY_TIMEOUT_MS)

      cursor.to_a
    end

    def aggregate(collection:, pipeline:)
      @client[collection].aggregate(
        pipeline,
        max_time_ms: QUERY_TIMEOUT_MS
      ).to_a
    end

    def count(collection:, filter:)
      @client[collection]
        .count_documents(filter, max_time_ms: QUERY_TIMEOUT_MS)
    end

    # NO insert, update, delete, bulk_write methods exist
    # The class literally cannot write - it only has read methods
  end
end

2. Query Scoper Service

Location: lib/bizy/mongodb_query_scoper.rb

This service handles query rewriting for both company-scoped and user-scoped collections:

module Bizy
  class MongodbQueryScoper
    def initialize(company:)
      @company = company
      @company_id = company.id
    end

    def scope_find(collection:, filter:)
      validate_collection!(collection)
      merge_scope_filter(collection, filter)
    end

    def scope_aggregate(collection:, pipeline:)
      validate_collection!(collection)
      MongodbPipelineValidator.validate!(pipeline)
      inject_scope_into_pipeline(collection, pipeline)
    end

    private

    def validate_collection!(collection)
      unless MongodbCollectionRegistry.allowed?(collection)
        raise Bizy::UnauthorizedCollectionError,
          "Collection '#{collection}' is not available for querying"
      end
    end

    def merge_scope_filter(collection, filter)
      config = MongodbCollectionRegistry.config(collection)

      case config[:scope_type]
      when :company
        # Company-scoped: inject company_id, ignore any user-provided company_id
        filter.merge(config[:scope_field] => @company_id)
      when :user
        # User-scoped: validate and scope user_ids from filter
        scope_user_filter(config[:scope_field], filter)
      else
        raise "Unknown scope type: #{config[:scope_type]}"
      end
    end

    def scope_user_filter(scope_field, filter)
      requested_user_ids = extract_user_ids(filter[scope_field])

      if requested_user_ids.present?
        # Validate ALL requested user_ids belong to this company
        validate_user_ids_in_company!(requested_user_ids)
        # Keep the filter as-is (already validated)
        filter
      else
        # No user_ids specified - require them for user-scoped queries
        raise Bizy::InvalidFilterError,
          "user_id is required for this collection. Provide a specific user_id or array of user_ids."
      end
    end

    def extract_user_ids(user_id_filter)
      case user_id_filter
      when BSON::ObjectId, String
        [normalize_id(user_id_filter)]
      when Hash
        # Handle { "$in" => [...] } syntax
        user_id_filter["$in"]&.map { |id| normalize_id(id) }
      when Array
        user_id_filter.map { |id| normalize_id(id) }
      else
        nil
      end
    end

    def validate_user_ids_in_company!(user_ids)
      # Check that ALL user_ids belong to this company
      valid_count = @company.users.where(:_id.in => user_ids).count

      unless valid_count == user_ids.length
        raise Bizy::UnauthorizedUserError,
          "One or more user_ids do not belong to this company"
      end
    end

    def normalize_id(id)
      id.is_a?(BSON::ObjectId) ? id : BSON::ObjectId(id.to_s)
    end

    def inject_scope_into_pipeline(collection, pipeline)
      # For aggregation, extract user_ids from first $match stage and validate
      config = MongodbCollectionRegistry.config(collection)

      case config[:scope_type]
      when :company
        inject_company_filter(config[:scope_field], pipeline)
      when :user
        validate_user_filter_in_pipeline(config[:scope_field], pipeline)
      end

      pipeline
    end

    def inject_company_filter(scope_field, pipeline)
      if pipeline.first&.key?("$match")
        pipeline.first["$match"][scope_field] = @company_id
      else
        pipeline.unshift({ "$match" => { scope_field => @company_id } })
      end
    end

    def validate_user_filter_in_pipeline(scope_field, pipeline)
      first_match = pipeline.first&.dig("$match")
      raise Bizy::InvalidFilterError, "Pipeline must start with $match containing user_id" unless first_match

      user_ids = extract_user_ids(first_match[scope_field])
      raise Bizy::InvalidFilterError, "user_id is required in first $match stage" if user_ids.blank?

      validate_user_ids_in_company!(user_ids)
    end
  end
end
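
The custom error classes raised above (and by the tools below) are not defined anywhere in this plan; a minimal sketch, assuming they live in a shared lib/bizy/errors.rb (hypothetical path):

# lib/bizy/errors.rb (hypothetical location)
module Bizy
  class Error < StandardError; end

  class NotAuthorizedError < Error; end          # Admin check failed
  class UnauthorizedCollectionError < Error; end # Collection not in the allowlist
  class UnauthorizedUserError < Error; end       # user_id outside the caller's company
  class InvalidFilterError < Error; end          # Malformed or missing filter
  class ForbiddenOperatorError < Error; end      # Blocked operator or pipeline stage
end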

Key design decisions for user-scoped collections:

  1. Caller must specify user_ids - No automatic "all company users" query (too expensive, rarely what you want)
  2. Validation before execution - All specified user_ids are validated to belong to the company
  3. Fail closed - If any user_id is invalid/outside company, the entire query fails
  4. Flexible input - Accepts single ID, array, or { "$in": [...] } syntax

3. MCP Tools

Create three tools following the existing Bizy::BaseTool pattern.

All tools require admin authorization:

# Shared authorization helper in BaseTool or module
def require_admin!
  unless user.company_admin? || user.can_administer?(:global)
    raise Bizy::NotAuthorizedError, "Admin access required"
  end
end

a. Find Tool - lib/bizy/tools/mongodb_find.rb

module Bizy
  module Tools
    class MongodbFind < Bizy::BaseTool
      tool_name "mongodb_find"
      description "Query documents from a MongoDB collection. Admin access required. Results are automatically scoped to your company."

      input_schema do
        {
          type: "object",
          properties: {
            collection: { type: "string", description: "Collection name (e.g., 'bonuses', 'users')" },
            filter: { type: "object", description: "MongoDB query filter" },
            projection: { type: "object", description: "Fields to include/exclude" },
            sort: { type: "object", description: "Sort order" },
            limit: { type: "integer", description: "Max docs to return (default: 20, max: 100)" }
          },
          required: ["collection"]
        }
      end

      def call
        require_admin!  # Only admins can use this tool

        scoper = MongodbQueryScoper.new(company: company)
        scoped_filter = scoper.scope_find(
          collection: arguments[:collection],
          filter: arguments[:filter] || {}
        )

        executor = MongodbQueryExecutor.new
        results = executor.find(
          collection: arguments[:collection],
          filter: scoped_filter,
          projection: arguments[:projection],
          sort: arguments[:sort],
          limit: arguments[:limit] || 20
        )

        success(
          collection: arguments[:collection],
          count: results.length,
          documents: results
        )
      rescue Bizy::UnauthorizedCollectionError => e
        error(e.message)
      end
    end
  end
end

b. Aggregate Tool - lib/bizy/tools/mongodb_aggregate.rb
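
The aggregate tool is not spelled out in this plan; a sketch following the same pattern as MongodbFind above (the success/error helpers and BaseTool behaviour are assumed to match the find tool):

module Bizy
  module Tools
    class MongodbAggregate < Bizy::BaseTool
      tool_name "mongodb_aggregate"
      description "Run a read-only aggregation pipeline against a MongoDB collection. Admin access required. Results are automatically scoped to your company."

      input_schema do
        {
          type: "object",
          properties: {
            collection: { type: "string", description: "Collection name" },
            pipeline: { type: "array", description: "Aggregation pipeline stages" }
          },
          required: ["collection", "pipeline"]
        }
      end

      def call
        require_admin!  # Only admins can use this tool

        scoper = MongodbQueryScoper.new(company: company)
        scoped_pipeline = scoper.scope_aggregate(
          collection: arguments[:collection],
          pipeline: arguments[:pipeline] || []
        )

        results = MongodbQueryExecutor.new.aggregate(
          collection: arguments[:collection],
          pipeline: scoped_pipeline
        )

        success(
          collection: arguments[:collection],
          count: results.length,
          documents: results
        )
      rescue Bizy::UnauthorizedCollectionError, Bizy::ForbiddenOperatorError, Bizy::InvalidFilterError => e
        error(e.message)
      end
    end
  end
end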

c. Count Tool - lib/bizy/tools/mongodb_count.rb
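
And a similarly minimal sketch of the count tool, under the same assumptions:

module Bizy
  module Tools
    class MongodbCount < Bizy::BaseTool
      tool_name "mongodb_count"
      description "Count documents in a MongoDB collection. Admin access required. Counts are automatically scoped to your company."

      input_schema do
        {
          type: "object",
          properties: {
            collection: { type: "string", description: "Collection name" },
            filter: { type: "object", description: "MongoDB query filter" }
          },
          required: ["collection"]
        }
      end

      def call
        require_admin!

        scoped_filter = MongodbQueryScoper.new(company: company).scope_find(
          collection: arguments[:collection],
          filter: arguments[:filter] || {}
        )

        count = MongodbQueryExecutor.new.count(
          collection: arguments[:collection],
          filter: scoped_filter
        )

        success(collection: arguments[:collection], count: count)
      rescue Bizy::UnauthorizedCollectionError, Bizy::UnauthorizedUserError, Bizy::InvalidFilterError => e
        error(e.message)
      end
    end
  end
end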

4. Collection Registry

Location: lib/bizy/mongodb_collection_registry.rb

Defines which collections are queryable and their scoping rules:

module Bizy
  class MongodbCollectionRegistry
    # Scoping types:
    # - :company - Filter by company_id field
    # - :user    - Caller-provided user_id(s), validated to belong to the company

    COLLECTIONS = {
      # Company-scoped collections (have company_id field)
      "bonuses" => {
        scope_type: :company,
        scope_field: "company_id",
        description: "Recognition given between employees"
      },
      "users" => {
        scope_type: :company,
        scope_field: "company_id",
        description: "Employee records"
      },
      "check_ins" => {
        scope_type: :company,
        scope_field: "company_id",
        description: "Employee check-ins"
      },
      "monthly_user_statistics" => {
        scope_type: :company,
        scope_field: "company_id",
        description: "Monthly aggregated user metrics"
      },

      # User-scoped collections (have user_id but no company_id)
      "app_notifications" => {
        scope_type: :user,
        scope_field: "user_id",
        description: "User notifications"
      },
      "llm_runs" => {
        scope_type: :user,
        scope_field: "user_id",
        description: "LLM query history"
      },
      "user_content_dismissals" => {
        scope_type: :user,
        scope_field: "user_id",
        description: "Dismissed content by user"
      },
      # ... etc
    }.freeze

    def self.allowed?(collection_name)
      COLLECTIONS.key?(collection_name)
    end

    def self.scope_type(collection_name)
      COLLECTIONS.dig(collection_name, :scope_type)
    end

    def self.scope_field(collection_name)
      COLLECTIONS.dig(collection_name, :scope_field)
    end

    def self.config(collection_name)
      COLLECTIONS[collection_name]
    end
  end
end

Security Considerations

Read-Only Enforcement (Defense in Depth)

CRITICAL: The MCP tool must have ZERO ability to modify data. This is enforced at multiple layers:

Layer 1: Database-Level (Strongest)

Create a dedicated MongoDB user with read-only permissions for this MCP tool:

// MongoDB shell command to create read-only user
db.createUser({
  user: "bizy_mcp_readonly",
  pwd: "<secure-password>",
  roles: [
    { role: "read", db: "bonusly_production" }
  ]
})

Configure a separate Mongoid client for MCP queries with secondary-only reads:

# config/mongoid.yml
production:
  clients:
    default:
      # ... normal read/write connection
    mcp_readonly:
      uri: <%= ENV['MONGODB_MCP_READONLY_URI'] %>
      options:
        read:
          mode: :secondary  # NOT secondary_preferred - ONLY secondary replicas
        max_pool_size: 5
        server_selection_timeout: 10  # Fail fast if no secondary available

Important: Use :secondary mode, NOT :secondary_preferred. This ensures:

  • Queries NEVER touch the primary database
  • If no secondary is available, the query fails (rather than falling back to primary)
  • Production primary is completely isolated from MCP query load

# In the MCP tool - use read-only client explicitly
def readonly_client
  Mongoid.client(:mcp_readonly)
end

def execute_query(collection_name, filter)
  readonly_client[collection_name].find(filter).to_a
end

Why this is the strongest layer: Even if application code has bugs, MongoDB itself will reject any write operations with a permission error.

Layer 2: Code-Level (API Surface)

Only expose read operations - no insert, update, delete, or bulk write methods:

module Bizy
  class MongodbQueryExecutor
    # ONLY these methods exist - no write methods at all
    def find(collection:, filter:, projection: nil, sort: nil, limit: 20)
      # ...
    end

    def aggregate(collection:, pipeline:)
      # ...
    end

    def count(collection:, filter:)
      # ...
    end

    # NO insert, update, delete, bulk_write, etc.
  end
end

Layer 3: Pipeline Validation (Aggregation)

Block ALL aggregation stages that can write or modify data:

FORBIDDEN_STAGES = [
  "$out",    # Writes results to a collection
  "$merge"   # Writes/updates results into a collection
].freeze

# In-pipeline transform stages ($addFields, $set, $unset, $project, $group,
# $replaceRoot, $replaceWith) only reshape documents inside the pipeline and
# never persist changes, so they remain allowed. Cross-collection stages
# ($lookup, $unionWith, etc.) are covered in the complete forbidden list below.

def validate_pipeline!(pipeline)
  pipeline.each do |stage|
    stage_name = stage.keys.first.to_s
    if FORBIDDEN_STAGES.include?(stage_name)
      raise Bizy::ForbiddenOperatorError,
        "Aggregation stage '#{stage_name}' is not allowed"
    end
  end
end

Layer 4: Query Validation

Reject any query operators that could have side effects (none exist in MongoDB queries, but validate anyway):

def validate_filter!(filter)
  # MongoDB find filters are inherently read-only
  # But validate structure to prevent injection
  raise Bizy::InvalidFilterError unless filter.is_a?(Hash)
end

Summary: Defense in Depth

| Layer | What it blocks | Failure mode |
| --- | --- | --- |
| Read-Only DB User | All writes | MongoDB permission error |
| Secondary-Only Mode | Primary DB access | Query fails if no secondary available |
| API Surface | No write methods exposed | Method doesn't exist |
| Pipeline Validation | $out, $merge, $lookup | Application error before query runs |

All layers are mandatory:

  • Read-only DB user: Prevents any data modification at the database level
  • Secondary-only mode: Isolates primary from query load; MCP can never affect write performance
  • API surface: Defense against code bugs - write methods don't exist
  • Pipeline validation: Defense against LLM-crafted malicious aggregation pipelines

$lookup Handling

$lookup stages can join to other collections, potentially exposing cross-company data. Two options:

Option A: Disallow $lookup entirely (simplest, most secure)

FORBIDDEN_PIPELINE_OPERATORS = %w[$out $merge $lookup $graphLookup].freeze

Option B: Inject company filter into $lookup pipelines (more flexible, complex)

def secure_lookup(lookup_stage)
  # Validate "from" is in allowed collections
  # Inject company filter into lookup's pipeline
end

Recommendation: Start with Option A (disallow $lookup), then consider Option B if needed.

Additional Safety Measures

  1. Result size limits - Cap response size to prevent memory issues
  2. Query timeout - Set maxTimeMS to prevent runaway queries
  3. Rate limiting - Limit queries per minute per company
  4. Audit logging - Log all queries for security review
  5. Read from secondary - Use the secondary-only client to keep query load off the primary

def execute_query(collection, pipeline)
  collection.aggregate(
    pipeline,
    max_time_ms: 30_000,
    read: { mode: :secondary }  # Matches the secondary-only policy above
  ).to_a
end

Alternative Approaches Considered

Template-Based Queries (Rejected)

Pre-define parameterized query templates that the LLM fills in:

TEMPLATES = {
  "recognition_by_user" => {
    collection: "bonuses",
    filter: ->(params) { { receiver_ids: params[:user_id] } }
  }
}

Pros: Maximum security, no query parsing
Cons: Very limited flexibility, new use cases require code changes

Rejected because it defeats the purpose of "arbitrary queries" and requires constant maintenance.

MongoDB Views (Rejected)

Create company-filtered views in MongoDB itself.

Pros: Database-level security
Cons: Can't create per-company views (impractical), views are static

Proxy to Existing MongoDB MCP (Rejected)

Wrap the existing user-MongoDB-prod MCP server.

Pros: Less code to write
Cons: Still need to intercept and rewrite queries, harder to control

Security Audit: Attack Vectors and Mitigations

This section documents all considered attack vectors and their mitigations.

Attack 1: Nested Filter Injection (CRITICAL)

Attack:

filter: {
  "$or": [
    { "company_id": "other_company_id", "amount": { "$gt": 0 } },
    { "amount": { "$lt": 0 } }
  ]
}

Problem: Simple filter.merge("company_id" => X) only sets top-level company_id. The $or clause still contains the attacker's company_id and will match documents from other companies.

Mitigation: Recursively sanitize company_id from entire filter structure:

def sanitize_scope_field!(filter, scope_field)
  filter.each do |key, value|
    if key.to_s == scope_field
      filter.delete(key)  # Remove any user-provided scope field
    elsif value.is_a?(Hash)
      sanitize_scope_field!(value, scope_field)
    elsif value.is_a?(Array)
      value.each { |v| sanitize_scope_field!(v, scope_field) if v.is_a?(Hash) }
    end
  end
end

# Then inject our scope field at top level
filter[scope_field] = @company_id

Attack 2: $where JavaScript Injection (CRITICAL)

Attack:

filter: { "$where": "this.company_id == 'other_company_id'" }

Problem: $where executes JavaScript and can bypass any field-level filters.

Mitigation: Block dangerous filter operators:

FORBIDDEN_FILTER_OPERATORS = %w[$where $function].freeze
# Note: $expr is ALLOWED - it cannot bypass top-level field matches

def validate_filter_operators!(filter)
  filter.each do |key, value|
    if FORBIDDEN_FILTER_OPERATORS.include?(key.to_s)
      raise Bizy::ForbiddenOperatorError, "Operator '#{key}' is not allowed"
    end
    validate_filter_operators!(value) if value.is_a?(Hash)
    value.each { |v| validate_filter_operators!(v) if v.is_a?(Hash) } if value.is_a?(Array)
  end
end

Attack 3: $unionWith Collection Bypass (CRITICAL)

Attack:

pipeline: [
  { "$match": { "company_id": my_company } },
  { "$unionWith": { "coll": "bonuses" } },  # Unions ALL bonuses, unfiltered!
  { "$match": { "company_id": "other_company" } }
]

Problem: $unionWith brings in data from another collection without applying the company filter.

Mitigation: Block in forbidden stages list:

FORBIDDEN_STAGES = [
  "$out",          # Writes to collection
  "$merge",        # Writes to collection
  "$lookup",       # Cross-collection join (data leakage)
  "$graphLookup",  # Recursive cross-collection join
  "$unionWith",    # Unions unfiltered collection data
  "$documents"     # Creates documents from scratch
].freeze

Attack 4: $expr Filter Override (RE-EVALUATED: SAFE TO ALLOW)

Initial concern:

filter: { "$expr": { "$eq": ["$company_id", "other_company_id"] } }

Analysis: After deeper review, $expr cannot bypass top-level field matches.

MongoDB query semantics treat all top-level conditions as an implicit AND:

# After our injection:
{
  "company_id" => my_company,  # Our top-level field match
  "$expr" => { "$eq": ["$company_id", "other"] }  # Attacker's $expr
}

Both conditions must be true:

  1. company_id == my_company (TRUE for company's documents)
  2. The attacker's $expr requires $company_id == "other", i.e. my_company == "other" (FALSE)

Result: No documents match. The attacker's $expr cannot override our field match.

Why we should ALLOW $expr:

  • 60+ uses in existing analytics pipelines
  • Required for $lookup variable comparisons ($$user_id)
  • Required for field-to-field comparisons
  • Required for complex arithmetic conditions
  • Cannot bypass top-level field matches

Mitigation (minimal): Optionally sanitize scope field references inside $expr to prevent confusion, but not strictly required for security.

Decision: $expr is ALLOWED - removed from FORBIDDEN_FILTER_OPERATORS.
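
If that optional sanitization is ever wanted, one possible sketch (the helper names and behaviour are assumptions, not part of the required mitigations):

# Hypothetical helper: drops an $expr that references the scope field, purely
# to avoid confusing (always-false) conditions. Not required for security
# because the injected top-level field match already wins.
def strip_scope_refs_from_expr!(filter, scope_field)
  expr = filter["$expr"]
  return unless expr.is_a?(Hash)

  filter.delete("$expr") if references_field?(expr, "$#{scope_field}")
end

def references_field?(node, field_ref)
  case node
  when Hash  then node.any? { |_k, v| references_field?(v, field_ref) }
  when Array then node.any? { |v| references_field?(v, field_ref) }
  else node == field_ref
  end
end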


Attack 5: Type Confusion in User ID Validation (MEDIUM)

Attack:

# Attacker sends string ID
filter: { "user_id": "507f1f77bcf86cd799439011" }

# Validation might check ObjectId, but filter uses string

Problem: Mismatch between validation type and query type could allow bypass.

Mitigation: Normalize IDs in the filter itself, not just during validation:

def normalize_user_ids_in_filter!(filter, scope_field)
  user_id_value = filter[scope_field]
  return unless user_id_value

  filter[scope_field] = case user_id_value
  when String
    BSON::ObjectId(user_id_value)
  when BSON::ObjectId
    user_id_value
  when Hash
    if user_id_value["$in"]
      { "$in" => user_id_value["$in"].map { |id| BSON::ObjectId(id.to_s) } }
    else
      user_id_value
    end
  when Array
    { "$in" => user_id_value.map { |id| BSON::ObjectId(id.to_s) } }
  end
end

Attack 6: Regex Denial of Service (MEDIUM)

Attack:

filter: { "reason": { "$regex": "(a+)+$" } }

Problem: Catastrophic backtracking in regex causes CPU exhaustion (ReDoS).

Mitigation:

MAX_REGEX_LENGTH = 100

def validate_regex!(filter)
  filter.each do |key, value|
    if key.to_s == "$regex" && value.to_s.length > MAX_REGEX_LENGTH
      raise Bizy::InvalidFilterError, "Regex pattern too long"
    end
    validate_regex!(value) if value.is_a?(Hash)
    value.each { |v| validate_regex!(v) if v.is_a?(Hash) } if value.is_a?(Array)
  end
end

Attack 7: Information Disclosure via Errors (MEDIUM)

Attack:

# Probe for collection names
collection: "secret_internal_collection"
# Error reveals: "Collection 'secret_internal_collection' is not available"

# Probe for user existence
filter: { "user_id": "..." }
# Different errors reveal user existence

Problem: Detailed error messages leak information about data existence.

Mitigation: Use generic error messages:

# Bad - reveals information
raise "User ID 'abc' does not belong to this company"
raise "Collection 'xyz' is not available"

# Good - generic errors
raise Bizy::NotAuthorizedError, "Query not permitted"

Attack 8: Data Enumeration via Rate Abuse (MEDIUM)

Attack:

# Systematically extract all company data
loop do
  results = query(filter: { "_id": { "$gt": last_id } }, limit: 100)
  break if results.empty?
  last_id = results.last["_id"]
end

Problem: Even with proper scoping, attackers can extract entire datasets.

Mitigation: Rate limiting via rack-attack and audit logging:

# config/initializers/rack_attack.rb
Rack::Attack.throttle("mcp/company", limit: 30, period: 60) do |req|
  if req.path.start_with?("/mcp/")
    # Decode token to get company_id for throttle key
    token = req.get_header("HTTP_AUTHORIZATION")&.sub("Bearer ", "")
    if token
      decoded = Bizy::SessionToken.decode(token) rescue nil
      decoded&.dig(:company_id)&.to_s
    end
  end
end

# lib/bizy/mcp_audit_logger.rb
module Bizy
  class McpAuditLogger
    def self.log_query(user:, company:, collection:, filter:)
      Rails.logger.info(
        {
          event: "mcp_query_audit",
          user_id: user.id.to_s,
          company_id: company.id.to_s,
          collection: collection,
          filter_keys: filter.keys,  # Log structure, not values (PII)
          timestamp: Time.current.iso8601
        }.to_json
      )
    end
  end
end

Attack 9: $facet Nested Pipeline Bypass (MEDIUM)

Attack:

pipeline: [
  { "$match": { "company_id": my_company } },
  { "$facet": {
    "legitimate": [{ "$match": { "amount": 10 } }],
    "attack": [{ "$unionWith": { "coll": "users" } }]  # Nested forbidden stage!
  }}
]

Problem: $facet contains nested pipelines that might contain forbidden stages.

Mitigation: Recursively validate nested pipelines:

def validate_pipeline!(pipeline)
  pipeline.each do |stage|
    stage_name = stage.keys.first.to_s

    raise ForbiddenOperatorError, "Stage '#{stage_name}' not allowed" if FORBIDDEN_STAGES.include?(stage_name)

    # Recursively validate $facet sub-pipelines
    if stage_name == "$facet"
      stage["$facet"].each_value do |sub_pipeline|
        validate_pipeline!(sub_pipeline)
      end
    end
  end
end

Attack 10: Field Name Injection (LOW)

Attack:

filter: { "company_id\x00": "other" }      # Null byte
filter: { "company_id\u200B": "other" }    # Zero-width space

Problem: Unicode tricks might bypass string matching for field names.

Mitigation: Validate field name characters:

VALID_FIELD_NAME = /\A[\w.$]+\z/

def validate_field_names!(obj)
  return unless obj.is_a?(Hash)

  obj.each do |key, value|
    unless key.to_s.match?(VALID_FIELD_NAME)
      raise Bizy::InvalidFilterError, "Invalid field name format"
    end
    validate_field_names!(value) if value.is_a?(Hash)
    value.each { |v| validate_field_names!(v) if v.is_a?(Hash) } if value.is_a?(Array)
  end
end

Attack 11: Token Replay/Theft (LOW)

Attack: Steal or replay a valid session token to impersonate another user.

Mitigation: Already handled by existing Bizy::SessionToken:

  • Tokens expire after 1 hour
  • Tokens are signed with secret key (can't forge)
  • Tokens are generated server-side (LLM never sees user_id directly)

Attack Vector Summary

| Priority | Attack Vector | Mitigation | Status |
| --- | --- | --- | --- |
| CRITICAL | Nested filter injection | Recursive sanitization of scope field | Required |
| CRITICAL | $where JavaScript | Block $where, $function | Required |
| CRITICAL | $unionWith bypass | Block $unionWith, $documents | Required |
| SAFE | $expr override | ALLOWED - cannot bypass field matches | Safe to allow |
| MEDIUM | Type confusion | Normalize IDs in filter | Required |
| MEDIUM | ReDoS | Limit regex length | Required |
| MEDIUM | Info disclosure | Generic error messages | Required |
| MEDIUM | Enumeration | Rate limiting + audit logging | Required |
| MEDIUM | $facet nesting | Recursive pipeline validation | Required |
| LOW | Field name tricks | Validate field name characters | Required |
| LOW | Token theft | Existing token expiration/signing | Already handled |

Complete Forbidden Operators List

module Bizy
  class MongodbSecurityValidator
    # Filter operators that can bypass field-level security
    FORBIDDEN_FILTER_OPERATORS = [
      "$where",    # JavaScript execution - can bypass any filter
      "$function"  # JavaScript execution - can bypass any filter
    ].freeze
    # NOTE: $expr is ALLOWED - it cannot bypass top-level field matches
    #       and is required for 60+ existing analytics pipelines

    # Aggregation stages that can leak or write data
    FORBIDDEN_STAGES = [
      "$out",          # Writes to collection
      "$merge",        # Writes to collection
      "$lookup",       # Cross-collection join (data leakage)
      "$graphLookup",  # Recursive cross-collection join
      "$unionWith",    # Unions unfiltered collection data
      "$documents"     # Creates documents from scratch
    ].freeze

    MAX_REGEX_LENGTH = 100
    VALID_FIELD_NAME = /\A[\w.$]+\z/
  end
end
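
Bizy::MongodbPipelineValidator is referenced by the scoper and the specs below but never defined in this plan; a minimal sketch that wires these rules together (the class layout is an assumption):

module Bizy
  class MongodbPipelineValidator
    def self.validate!(pipeline)
      raise Bizy::InvalidFilterError, "Pipeline must be an array of stages" unless pipeline.is_a?(Array)

      pipeline.each do |stage|
        stage_name = stage.keys.first.to_s

        if MongodbSecurityValidator::FORBIDDEN_STAGES.include?(stage_name)
          raise Bizy::ForbiddenOperatorError, "Aggregation stage '#{stage_name}' is not allowed"
        end

        # Recursively validate $facet sub-pipelines (see Attack 9)
        stage["$facet"].each_value { |sub| validate!(sub) } if stage_name == "$facet"
      end
    end
  end
end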

Testing Strategy

  1. Unit tests for query scoper - Verify company_id is correctly injected
  2. Collection allowlist tests - Verify disallowed collections are rejected
  3. Pipeline validation tests - Verify dangerous operators are blocked
  4. Read-only security tests - Verify writes are impossible at multiple layers
  5. Integration tests - End-to-end query execution with auth

Example Specs

Query Scoper:

RSpec.describe Bizy::MongodbQueryScoper do
  let(:company) { create(:company) }
  let(:other_company) { create(:company) }
  let(:user1) { create(:user, company: company) }
  let(:user2) { create(:user, company: company) }
  let(:other_user) { create(:user, company: other_company) }

  describe "#scope_find" do
    context "company-scoped collections" do
      it "injects company_id into filter" do
        scoper = described_class.new(company: company)
        result = scoper.scope_find(collection: "bonuses", filter: { "amount" => 10 })
        expect(result).to eq({ "amount" => 10, "company_id" => company.id })
      end

      it "overwrites any user-provided company_id" do
        scoper = described_class.new(company: company)
        result = scoper.scope_find(
          collection: "bonuses",
          filter: { "company_id" => other_company.id }  # Attempted injection
        )
        expect(result["company_id"]).to eq(company.id)  # Overwritten with correct company
      end
    end

    context "user-scoped collections" do
      it "allows querying with valid user_id" do
        scoper = described_class.new(company: company)
        result = scoper.scope_find(
          collection: "app_notifications",
          filter: { "user_id" => user1.id, "seen_at" => nil }
        )
        expect(result["user_id"]).to eq(user1.id)
      end

      it "allows querying with array of valid user_ids" do
        scoper = described_class.new(company: company)
        result = scoper.scope_find(
          collection: "app_notifications",
          filter: { "user_id" => { "$in" => [user1.id, user2.id] } }
        )
        expect(result["user_id"]).to eq({ "$in" => [user1.id, user2.id] })
      end

      it "rejects user_id from another company" do
        scoper = described_class.new(company: company)
        expect {
          scoper.scope_find(
            collection: "app_notifications",
            filter: { "user_id" => other_user.id }
          )
        }.to raise_error(Bizy::UnauthorizedUserError)
      end

      it "rejects if any user_id in array is from another company" do
        scoper = described_class.new(company: company)
        expect {
          scoper.scope_find(
            collection: "app_notifications",
            filter: { "user_id" => { "$in" => [user1.id, other_user.id] } }
          )
        }.to raise_error(Bizy::UnauthorizedUserError)
      end

      it "requires user_id for user-scoped collections" do
        scoper = described_class.new(company: company)
        expect {
          scoper.scope_find(collection: "app_notifications", filter: {})
        }.to raise_error(Bizy::InvalidFilterError, /user_id is required/)
      end
    end

    it "rejects non-allowlisted collections" do
      scoper = described_class.new(company: company)
      expect { scoper.scope_find(collection: "secret_data", filter: {}) }
        .to raise_error(Bizy::UnauthorizedCollectionError)
    end
  end
end

Admin Authorization:

RSpec.describe Bizy::Tools::MongodbFind do
  let(:company) { create(:company) }

  context "with non-admin user" do
    let(:regular_user) { create(:user, company: company, company_admin: false) }

    it "rejects access" do
      result = described_class.execute(
        session_token: "token",
        user: regular_user,
        company: company,
        collection: "bonuses"
      )
      expect(result[:error]).to eq("Admin access required")
    end
  end

  context "with admin user" do
    let(:admin_user) { create(:user, company: company, company_admin: true) }

    it "allows access and returns scoped results" do
      result = described_class.execute(
        session_token: "token",
        user: admin_user,
        company: company,
        collection: "bonuses",
        filter: { amount: 10 }
      )
      expect(result[:data]).to be_present
      expect(result[:error]).to be_nil
    end
  end
end

Read-Only Security Tests (Critical):

RSpec.describe Bizy::MongodbQueryExecutor do
  describe "read-only enforcement" do
    it "does not expose insert method" do
      executor = described_class.new
      expect(executor).not_to respond_to(:insert)
      expect(executor).not_to respond_to(:insert_one)
      expect(executor).not_to respond_to(:insert_many)
    end

    it "does not expose update method" do
      executor = described_class.new
      expect(executor).not_to respond_to(:update)
      expect(executor).not_to respond_to(:update_one)
      expect(executor).not_to respond_to(:update_many)
    end

    it "does not expose delete method" do
      executor = described_class.new
      expect(executor).not_to respond_to(:delete)
      expect(executor).not_to respond_to(:delete_one)
      expect(executor).not_to respond_to(:delete_many)
    end

    it "uses the read-only MongoDB client" do
      executor = described_class.new
      client = executor.instance_variable_get(:@client)
      # Verify it's the mcp_readonly client, not default
      expect(client).to eq(Mongoid.client(:mcp_readonly))
    end

    it "enforces secondary-only read mode" do
      executor = described_class.new
      client = executor.instance_variable_get(:@client)
      read_preference = client.options[:read]
      # Must be :secondary (not :secondary_preferred) to never touch primary
      expect(read_preference[:mode]).to eq(:secondary)
    end
  end
end

RSpec.describe Bizy::MongodbPipelineValidator do
  describe "#validate!" do
    it "blocks $out stage" do
      pipeline = [{ "$match" => {} }, { "$out" => "output_collection" }]
      expect { described_class.validate!(pipeline) }
        .to raise_error(Bizy::ForbiddenOperatorError, /\$out/)
    end

    it "blocks $merge stage" do
      pipeline = [{ "$match" => {} }, { "$merge" => { into: "target" } }]
      expect { described_class.validate!(pipeline) }
        .to raise_error(Bizy::ForbiddenOperatorError, /\$merge/)
    end

    it "blocks $lookup stage" do
      pipeline = [{ "$lookup" => { from: "other", as: "joined" } }]
      expect { described_class.validate!(pipeline) }
        .to raise_error(Bizy::ForbiddenOperatorError, /\$lookup/)
    end

    it "allows safe stages like $match, $group, $project" do
      pipeline = [
        { "$match" => { status: "active" } },
        { "$group" => { _id: "$type", count: { "$sum" => 1 } } },
        { "$project" => { type: "$_id", count: 1 } }
      ]
      expect { described_class.validate!(pipeline) }.not_to raise_error
    end
  end
end

LLM Context: MCP Documentation Resources

A key challenge for the LLM is having enough context to formulate meaningful queries. Collection and field names alone don't explain what the data means or how it's calculated.

Approach: Curated Documentation as MCP Resources

Expose a curated subset of documentation as MCP resources that the LLM can discover and fetch:

sequenceDiagram
    participant User
    participant LLM
    participant MCP as MCP Server
    participant Docs as doc/mcp_resources/

    User->>LLM: "How are collaboration communities calculated?"
    LLM->>MCP: list_resources(pattern: "collaboration*")
    MCP-->>LLM: [collaboration_communities.md]
    LLM->>MCP: fetch_resource("bonusly://docs/features/collaboration_communities")
    MCP->>Docs: Read curated doc
    Docs-->>MCP: Sanitized markdown
    MCP-->>LLM: Document content
    LLM->>LLM: Now understands data model, weights, etc.
    LLM->>MCP: mongodb_aggregate(collection: "monthly_user_connections_weighted", ...)

Security Considerations for Documentation Resources

Exposing internal docs introduces risks:

| Risk | Example | Mitigation |
| --- | --- | --- |
| Implementation detail leakage | Class names, file paths, dependencies | Curated docs only - not entire /doc folder |
| Security mechanism exposure | "Filter by company_id at query layer" | Separate user-facing docs from engineering docs |
| Prompt injection | Malicious content in doc files | Treat doc content as untrusted input |
| Accidental secrets | API keys, hostnames in examples | Review process + automated scanning |
| Collection enumeration | References to non-allowlisted collections | Docs only reference allowlisted collections |

Implementation: Curated Resources Folder

Create a dedicated folder for LLM-consumable documentation:

doc/
├── internal/              # Engineering docs (NOT exposed via MCP)
│   └── security/
│       └── how_scoping_works.md
│
└── mcp_resources/         # Curated for LLM consumption
    ├── collections/
    │   ├── bonuses.md
    │   ├── users.md
    │   ├── check_ins.md
    │   └── monthly_user_connections_weighted.md
    ├── features/
    │   └── collaboration_communities.md  # Sanitized version
    └── examples/
        └── common_queries.md

Resource Handler Implementation

# lib/bizy/mcp_resource_handler.rb
module Bizy
  class McpResourceHandler
    RESOURCES_PATH = Rails.root.join("doc/mcp_resources")

    # Only serve from curated folder - never expose /doc directly
    def list_resources
      Dir.glob("#{RESOURCES_PATH}/**/*.md").map do |path|
        relative = path.sub("#{RESOURCES_PATH}/", "")
        {
          uri: "bonusly://docs/#{relative.sub('.md', '')}",
          name: extract_title(path),
          description: extract_description(path),
          mimeType: "text/markdown"
        }
      end
    end

    def fetch_resource(uri)
      # Validate URI format and prevent path traversal
      relative_path = uri.sub("bonusly://docs/", "")
      raise InvalidResourceError unless relative_path.match?(/\A[\w\-\/]+\z/)

      full_path = RESOURCES_PATH.join("#{relative_path}.md")

      # Ensure path is within RESOURCES_PATH (prevent traversal)
      unless full_path.to_s.start_with?(RESOURCES_PATH.to_s)
        raise InvalidResourceError, "Invalid resource path"
      end

      raise ResourceNotFoundError unless File.exist?(full_path)

      File.read(full_path)
    end

    private

    def extract_title(path)
      first_line = File.foreach(path).first
      first_line&.gsub(/^#\s*/, '')&.strip || File.basename(path, '.md').titleize
    end

    def extract_description(path)
      # Extract first paragraph after title
      content = File.read(path)
      paragraphs = content.split(/\n\n+/)
      paragraphs[1]&.gsub(/\n/, ' ')&.truncate(200) || ""
    end
  end
end

Collection Registry Integration

Link collections to their documentation:

COLLECTIONS = {
  "bonuses" => {
    scope_type: :company,
    scope_field: "company_id",
    description: "Recognition given between employees",
    docs: ["collections/bonuses"]  # MCP resource URIs
  },
  "monthly_user_connections_weighted" => {
    scope_type: :company,
    scope_field: "company_id",
    description: "Pre-computed weighted edges for collaboration analysis",
    docs: [
      "collections/monthly_user_connections_weighted",
      "features/collaboration_communities"
    ]
  }
  # ...
}

def self.with_docs(collection_name)
  config = COLLECTIONS[collection_name]
  return config unless config

  config.merge(
    doc_uris: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
  )
end

Discovery Tools for LLM

Add dedicated discovery tools to help the LLM find relevant documentation:

# lib/bizy/tools/mongodb_list_collections.rb
module Bizy
  module Tools
    class MongodbListCollections < Bizy::BaseTool
      tool_name "mongodb_list_collections"
      description "List available MongoDB collections with descriptions and documentation links."

      def call
        require_admin!

        collections = MongodbCollectionRegistry::COLLECTIONS.map do |name, config|
          {
            name: name,
            description: config[:description],
            scope_type: config[:scope_type],
            doc_resources: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
          }
        end

        success(collections: collections)
      end
    end
  end
end

# lib/bizy/tools/mongodb_describe_collection.rb
module Bizy
  module Tools
    class MongodbDescribeCollection < Bizy::BaseTool
      tool_name "mongodb_describe_collection"
      description "Get detailed information about a collection including sample fields and related documentation."

      input_schema do
        {
          type: "object",
          properties: {
            collection: { type: "string", description: "Collection name" }
          },
          required: ["collection"]
        }
      end

      def call
        require_admin!

        collection = arguments[:collection]
        config = MongodbCollectionRegistry.config(collection)

        unless config
          return error("Collection not available")
        end

        # Get sample document to show field structure
        scoper = MongodbQueryScoper.new(company: company)
        executor = MongodbQueryExecutor.new

        sample = executor.find(
          collection: collection,
          filter: scoper.scope_find(collection: collection, filter: {}),
          limit: 1
        ).first

        success(
          name: collection,
          description: config[:description],
          scope_type: config[:scope_type],
          scope_field: config[:scope_field],
          sample_fields: sample&.keys || [],
          doc_resources: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
        )
      end
    end
  end
end

Documentation Content Guidelines

Curated docs in doc/mcp_resources/ should:

  1. Be written for end-user/admin audience - Not engineering implementation details
  2. Focus on data meaning - What does this field represent? What questions can it answer?
  3. Include example queries - Show common use cases with actual query patterns
  4. Reference only allowlisted collections - Don't mention internal collections
  5. Avoid implementation details - No class names, file paths, or internal architecture
  6. Be reviewed before merge - Security review for any new MCP resource docs

Example curated doc structure:

# Bonuses Collection

Recognition events where employees appreciate each other's contributions.

## Key Fields

- `giver_id` - User who gave the recognition
- `receiver_ids` - Users who received the recognition (array)
- `amount` - Points awarded (integer)
- `reason` - Message explaining the recognition
- `created_at` - When the recognition was given
- `hashtags` - Company values tagged (array of strings)

## Common Query Patterns

**Recognition given by a specific user:**
filter: { "giver_id": "user_id_here" }

**Recognition in a date range:**
filter: { "created_at": { "$gte": "2025-01-01", "$lte": "2025-01-31" } }

**Recognition mentioning a specific value:**
filter: { "hashtags": "#teamwork" }

## Related Documentation

- [Collaboration Communities](bonusly://docs/features/collaboration_communities) - How recognition patterns create collaboration maps

Review Process for MCP Resources

Before any doc is added to doc/mcp_resources/:

  1. Security review - No internal paths, class names, dependencies, or security mechanisms
  2. No secrets - Scan for API keys, tokens, hostnames, real user/company data
  3. Allowlisted references only - Only mention collections in the registry
  4. User-facing language - Written for admins, not engineers
  5. PR approval required - Standard review process with explicit MCP resource checkbox

Documentation Automation

Keep MCP docs in sync with collection schemas using schema-based detection (not source file tracking). The process runs locally first, with CI as a backup safety net.

Detection Strategy

Only regenerate docs when the actual data structure changes:

| Trigger | Detected By | Frequency |
| --- | --- | --- |
| New field added to collection | Sample doc comparison | Rare |
| Field removed from collection | Sample doc comparison | Rare |
| Registry description changed | Registry vs frontmatter | Rare |
| New collection added to registry | Missing doc file | Rare |
| Source code refactoring | Not tracked | N/A |

Doc Frontmatter (Tracking Metadata)

---
collection: bonuses
documented_fields: [giver_id, receiver_ids, amount, reason, created_at, hashtags]
registry_description: "Recognition given between employees"
last_generated: 2025-01-29T14:30:00Z
---

# Bonuses Collection
...

Local Workflow (Primary)

# Check which docs are stale
rake mcp:check_docs

# Regenerate stale docs using LLM
rake mcp:generate_docs

# Regenerate a specific collection
rake mcp:generate_docs[bonuses]

Rake Tasks

# lib/tasks/mcp_docs.rake
namespace :mcp do
  desc "Check for stale MCP documentation"
  task check_docs: :environment do
    checker = Bizy::McpDocFreshnessChecker.new
    stale = checker.stale_docs

    if stale.empty?
      puts "All MCP docs are up to date."
    else
      puts "Stale documentation detected:"
      stale.each do |s|
        reasons = []
        reasons << "new fields: #{s[:reason][:added_fields].join(', ')}" if s[:reason][:added_fields].any?
        reasons << "removed fields: #{s[:reason][:removed_fields].join(', ')}" if s[:reason][:removed_fields].any?
        reasons << "description changed" if s[:reason][:description_changed]
        reasons << "doc missing" if s[:reason][:missing]
        puts "  #{s[:collection]}: #{reasons.join(', ')}"
      end
      puts "\nRun 'rake mcp:generate_docs' to update."
      exit 1
    end
  end

  desc "Generate/update MCP documentation for collections"
  task :generate_docs, [:collection] => :environment do |_, args|
    generator = Bizy::McpDocGenerator.new

    collections = if args[:collection]
      { args[:collection] => Bizy::MongodbCollectionRegistry.config(args[:collection]) }
    else
      Bizy::MongodbCollectionRegistry::COLLECTIONS
    end

    collections.each do |name, config|
      puts "Generating docs for #{name}..."
      generator.generate(collection: name, config: config)
    end

    puts "Done. Review changes and commit."
  end
end

Freshness Checker

# lib/bizy/mcp_doc_freshness_checker.rb
module Bizy
  class McpDocFreshnessChecker
    DOCS_PATH = Rails.root.join("doc/mcp_resources/collections")

    def stale_docs
      stale = []

      MongodbCollectionRegistry::COLLECTIONS.each do |name, config|
        doc_path = DOCS_PATH.join("#{name}.md")

        unless File.exist?(doc_path)
          stale << { collection: name, reason: { missing: true } }
          next
        end

        doc = McpDocParser.parse(doc_path)
        sample = fetch_sample_document(name, config)

        current_fields = sample.keys.sort
        documented_fields = doc.metadata[:documented_fields]&.sort || []

        added = current_fields - documented_fields
        removed = documented_fields - current_fields
        description_changed = config[:description] != doc.metadata[:registry_description]

        if added.any? || removed.any? || description_changed
          stale << {
            collection: name,
            reason: {
              added_fields: added,
              removed_fields: removed,
              description_changed: description_changed
            }
          }
        end
      end

      stale
    end

    private

    def fetch_sample_document(name, config)
      Mongoid.client(:mcp_readonly)[name].find({}).limit(1).first || {}
    end
  end
end
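
McpDocParser is used by the freshness checker but not defined in this plan; a minimal sketch, assuming the YAML frontmatter format shown earlier (the ParsedDoc struct is an assumption):

# lib/bizy/mcp_doc_parser.rb
require "yaml"

module Bizy
  class McpDocParser
    ParsedDoc = Struct.new(:metadata, :content, keyword_init: true)

    FRONTMATTER = /\A---\s*\n(.*?)\n---\s*\n/m

    def self.parse(path)
      raw = File.read(path)
      match = raw.match(FRONTMATTER)
      return ParsedDoc.new(metadata: {}, content: raw) unless match

      metadata = YAML.safe_load(
        match[1],
        permitted_classes: [Time, Symbol],
        symbolize_names: true
      )
      ParsedDoc.new(metadata: metadata, content: match.post_match)
    end
  end
end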

LLM-Assisted Generator

# lib/bizy/mcp_doc_generator.rb
module Bizy
  class McpDocGenerator
    DOCS_PATH = Rails.root.join("doc/mcp_resources/collections")

    def generate(collection:, config:)
      sample = fetch_sample_document(collection, config)

      content = generate_with_llm(
        collection: collection,
        description: config[:description],
        fields: sample.keys,
        sample: sanitize_sample(sample)
      )

      frontmatter = {
        collection: collection,
        documented_fields: sample.keys.sort,
        registry_description: config[:description],
        last_generated: Time.current.iso8601
      }

      write_doc(collection, frontmatter, content)
    end

    private

    def generate_with_llm(collection:, description:, fields:, sample:)
      prompt = <<~PROMPT
        Generate user-facing documentation for a MongoDB collection.

        Collection: #{collection}
        Description: #{description}
        Fields: #{fields.join(', ')}
        Sample document (sanitized): #{sample.to_json}

        Guidelines:
        - Write for admin users, not engineers
        - Explain what each field means semantically
        - Include 3-5 common query patterns with example filters
        - Do not mention internal class names, file paths, or implementation details
        - Use markdown format with ## headers for sections

        Required sections:
        1. Brief description (1-2 sentences)
        2. Key Fields (list with descriptions)
        3. Common Query Patterns (with example filters)
      PROMPT

      Bizy::LlmClient.complete(prompt)
    end

    def sanitize_sample(sample)
      # Remove PII, replace IDs with placeholders
      sample.transform_values do |v|
        case v
        when BSON::ObjectId then "<ObjectId>"
        when String then v.truncate(50)
        when Array then v.first(3)
        else v
        end
      end
    end

    def write_doc(collection, frontmatter, content)
      path = DOCS_PATH.join("#{collection}.md")
      FileUtils.mkdir_p(DOCS_PATH)

      File.write(path, <<~DOC)
        ---
        #{frontmatter.to_yaml.lines[1..].join}---

        #{content}
      DOC
    end
  end
end

CI Safety Net (Backup)

# spec/lib/bizy/mcp_documentation_freshness_spec.rb
RSpec.describe "MCP Documentation Freshness" do
  it "has up-to-date documentation for all collections" do
    checker = Bizy::McpDocFreshnessChecker.new
    stale = checker.stale_docs

    if stale.any?
      message = stale.map do |s|
        if s[:reason][:missing]
          "#{s[:collection]}: missing doc file"
        else
          reasons = []
          reasons << "new fields: #{s[:reason][:added_fields]}" if s[:reason][:added_fields]&.any?
          reasons << "removed fields: #{s[:reason][:removed_fields]}" if s[:reason][:removed_fields]&.any?
          reasons << "description changed" if s[:reason][:description_changed]
          "#{s[:collection]}: #{reasons.join(', ')}"
        end
      end.join("\n")

      fail "Stale MCP docs detected:\n#{message}\n\nRun locally: rake mcp:generate_docs"
    end
  end
end

Workflow Summary

flowchart TD
    A[Developer adds/changes collection] --> B[Run rake mcp:check_docs locally]
    B --> C{Stale docs?}
    C -->|No| D[Continue with PR]
    C -->|Yes| E[Run rake mcp:generate_docs]
    E --> F[LLM generates updated docs]
    F --> G[Developer reviews & edits]
    G --> H[Commit changes]
    H --> D
    D --> I[CI runs freshness spec]
    I --> J{Passes?}
    J -->|Yes| K[PR mergeable]
    J -->|No| L[CI fails with instructions]
    L --> B

Implementation Order

Phase 1: Infrastructure

  1. Create read-only MongoDB user (ops/infra) - Create bizy_mcp_readonly user with read role only
  2. Configure Mongoid client - Add mcp_readonly client to config/mongoid.yml with secondary-only reads

Phase 2: Core Query Components (with specs)

  1. Create MongodbQueryExecutor + spec
  • Read-only client, only find/aggregate/count methods
  • Spec: read-only enforcement, secondary-only mode
  2. Create MongodbCollectionRegistry + spec
  • Collection allowlist with scope types
  • Spec: allowed/disallowed collections
  3. Create MongodbSecurityValidator + spec
  • Forbidden operators, recursive sanitization, field validation
  • Spec: all 11 attack vectors (nested injection, $where, $unionWith, etc.)
  4. Create MongodbQueryScoper + spec
  • Company/user filter injection using SecurityValidator
  • Spec: company scoping, user scoping, validation failures
  5. Add rack-attack throttle for MCP endpoints + audit logging
  • Throttle by company_id (decoded from token)
  • Audit logging helper for query tracking
  • Spec: throttle enforcement via request specs

Phase 3: Query Tools (with specs)

  1. Create mongodb_find tool + spec
  • Admin auth, uses scoper and executor
  • Spec: admin auth, scoped results, error handling
  2. Create mongodb_aggregate tool + spec
  • Admin auth, pipeline validation
  • Spec: pipeline validation, scoped aggregation
  3. Create mongodb_count tool + spec
  • Admin auth, scoped count
  • Spec: scoped count queries

Phase 4: Discovery Tools (with specs)

  1. Create mongodb_list_collections tool + spec
  • Lists available collections with descriptions
  2. Create mongodb_describe_collection tool + spec
  • Collection details with sample fields

Phase 5: Documentation Resources

  1. Create McpResourceHandler + spec
  • Serve curated docs, path traversal protection
  2. Create McpDocParser
  • Parse frontmatter metadata from doc files
  3. Create McpDocFreshnessChecker + spec
  • Schema-based staleness detection
  4. Create McpDocGenerator + spec
  • LLM-assisted documentation generation
  5. Create rake tasks (mcp:check_docs, mcp:generate_docs)
  6. Generate initial doc/mcp_resources/ docs for ALL allowlisted collections
  7. Create CI freshness spec as safety net

Phase 6: Integration Testing

  1. Write end-to-end integration specs
  • Full flow: auth → scope → validate → execute → return

Files Changed/Created

Infrastructure:

  • config/mongoid.yml (modify) - Add mcp_readonly client configuration with secondary-only reads
  • Environment variable: MONGODB_MCP_READONLY_URI - Read-only connection string

New Ruby files:

  • lib/bizy/mongodb_query_executor.rb - Read-only query execution using mcp_readonly client
  • lib/bizy/mongodb_collection_registry.rb - Collection allowlist with scope types and doc references
  • lib/bizy/mongodb_security_validator.rb - Filter/pipeline validation, forbidden operators, sanitization
  • lib/bizy/mongodb_query_scoper.rb - Company/user filter injection with security validation
  • config/initializers/rack_attack.rb (modify) - Add MCP endpoint throttle by company_id
  • lib/bizy/mcp_audit_logger.rb - Audit logging for MCP queries
  • lib/bizy/mcp_resource_handler.rb - MCP resource handler for curated documentation
  • lib/bizy/mcp_doc_freshness_checker.rb - Schema-based staleness detection
  • lib/bizy/mcp_doc_generator.rb - LLM-assisted documentation generation
  • lib/bizy/mcp_doc_parser.rb - Parse frontmatter metadata from doc files
  • lib/tasks/mcp_docs.rake - Rake tasks for local doc management
  • lib/bizy/tools/mongodb_find.rb - Find tool with admin auth
  • lib/bizy/tools/mongodb_aggregate.rb - Aggregate tool with admin auth
  • lib/bizy/tools/mongodb_count.rb - Count tool with admin auth
  • lib/bizy/tools/mongodb_list_collections.rb - Discovery tool for listing collections
  • lib/bizy/tools/mongodb_describe_collection.rb - Discovery tool for collection details

Curated Documentation (MCP Resources):

One documentation file per allowlisted collection, plus feature docs:

  • doc/mcp_resources/collections/{collection_name}.md - One for EACH collection in the registry
    • bonuses.md - Recognition data model and query patterns
    • users.md - Employee records and fields
    • check_ins.md - Check-in data model
    • monthly_user_statistics.md - Monthly aggregated metrics
    • monthly_user_connections_weighted.md - Collaboration edge weights
    • app_notifications.md - User notifications
    • llm_runs.md - LLM query history
    • user_content_dismissals.md - Dismissed content tracking
    • (+ any additional collections added to registry)
  • doc/mcp_resources/features/collaboration_communities.md - Cross-collection feature docs
  • doc/mcp_resources/examples/common_queries.md - Example query patterns across collections

Specs:

  • spec/lib/bizy/mongodb_query_executor_spec.rb - Read-only enforcement tests
  • spec/lib/bizy/mongodb_collection_registry_spec.rb - Allowlist tests
  • spec/lib/bizy/mongodb_security_validator_spec.rb - Attack vector tests (all 11 vectors)
  • spec/lib/bizy/mongodb_query_scoper_spec.rb - Scoping tests for company and user types
  • spec/requests/mcp/rate_limiting_spec.rb - Rack-attack throttle tests
  • spec/lib/bizy/mcp_resource_handler_spec.rb - Resource handler with path traversal tests
  • spec/lib/bizy/mcp_doc_freshness_checker_spec.rb - Staleness detection tests
  • spec/lib/bizy/mcp_doc_generator_spec.rb - Generator tests
  • spec/lib/bizy/mcp_documentation_freshness_spec.rb - CI safety net spec
  • spec/lib/bizy/tools/mongodb_find_spec.rb - Integration tests with admin auth
  • spec/lib/bizy/tools/mongodb_aggregate_spec.rb - Integration tests with pipeline validation
  • spec/lib/bizy/tools/mongodb_count_spec.rb - Integration tests
  • spec/lib/bizy/tools/mongodb_list_collections_spec.rb - Discovery tool tests
  • spec/lib/bizy/tools/mongodb_describe_collection_spec.rb - Discovery tool tests