sequenceDiagram
participant LLM as LLM Agent
participant MCP as MCP Tool
participant Auth as Session Token
participant Rewrite as Query Rewriter
participant DB as MongoDB
LLM->>MCP: Query with Bearer token
MCP->>Auth: Decode token
Auth-->>MCP: user_id
MCP->>MCP: Load user → company
MCP->>Rewrite: Rewrite query with company_id
Rewrite->>Rewrite: Validate + inject filter
Rewrite-->>MCP: Scoped query
MCP->>DB: Execute query
DB-->>MCP: Results
MCP-->>LLM: Scoped results
This approach balances flexibility with security:
- Admin-only access - Only company admins can use these MCP tools
- Allowlist collections - Only permit queries against pre-defined collections
- Auto-inject scoping filter - Based on collection type:
  - Company-scoped: inject `company_id = user.company_id`
  - User-scoped: inject `user_id IN (all company user IDs)`
- Validate aggregation pipelines - Ensure $match is first stage, inject scoping filter
- Block dangerous operators - Disallow $out, $merge, and unrestricted $lookup
- Read-only - Only support find, aggregate, count operations
flowchart TD
subgraph auth [Authorization]
A[Request with token] --> B[Decode token]
B --> C{Is user admin?}
C -->|No| D[Reject: 403]
C -->|Yes| E[Get company_id]
end
subgraph scope [Scoping]
E --> F{Collection type?}
F -->|company_id field| G["Inject: company_id = X"]
F -->|user_id field| H[Get all company user IDs]
H --> I["Inject: user_id IN [...]"]
end
G --> J[Execute query]
I --> J
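The authorization-then-scoping flow above can be sketched as a plain Ruby dispatch. This is a standalone illustration only: the method name, the hash-based user, and the pre-fetched `company_user_ids` argument are assumptions, not the real implementation.

```ruby
# Illustrative sketch of the flowchart: reject non-admins, then pick the
# scoping filter by collection type. Names and shapes are hypothetical.
def scope_filter_for(user, scope_type, company_user_ids)
  # Authorization: only company admins may query at all
  raise "403: admin access required" unless user[:admin]

  case scope_type
  when :company
    # Company-scoped: filter by the caller's company_id
    { "company_id" => user[:company_id] }
  when :user
    # User-scoped: restrict to the company's own user IDs
    { "user_id" => { "$in" => company_user_ids } }
  else
    raise ArgumentError, "Unknown scope type: #{scope_type}"
  end
end
```

A non-admin caller fails before any filter is built, matching the `Reject: 403` branch in the flowchart.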
Company-scoped collections (most collections):
- Have a `company_id` field
- Scoper injects: `{ company_id: company.id }`
- Any user-provided `company_id` is overwritten (prevents injection)
User-scoped collections (e.g., app_notifications, llm_runs):
- Have a `user_id` field but no `company_id`
- Caller must specify `user_id` or `{ user_id: { $in: [...] } }`
- Scoper validates that ALL specified user_ids belong to the company
- Rejects entire query if any user_id is outside the company
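These two rules can be sketched standalone. Plain strings stand in for BSON ObjectIds, and the constants and method names are illustrative, not the real classes described below.

```ruby
# Minimal sketch of the two scoping rules (hypothetical names).
COMPANY_ID = "company_1"
COMPANY_USER_IDS = ["user_1", "user_2"].freeze

# Company-scoped: overwrite any caller-supplied company_id so it cannot be spoofed.
def scope_company_filter(filter)
  filter.merge("company_id" => COMPANY_ID)
end

# User-scoped: require user_id(s) and validate every one against the company.
def scope_user_filter(filter)
  raw = filter["user_id"]
  ids = raw.is_a?(Hash) ? Array(raw["$in"]) : Array(raw)
  raise ArgumentError, "user_id is required" if ids.empty?
  raise ArgumentError, "user_id outside company" unless (ids - COMPANY_USER_IDS).empty?
  filter
end
```

Note the asymmetry: company scoping silently overwrites, while user scoping fails closed on any foreign ID.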
Location: lib/bizy/mongodb_query_executor.rb
This is the ONLY interface to MongoDB for MCP tools. It uses the read-only client and exposes only read methods:
module Bizy
class MongodbQueryExecutor
MAX_LIMIT = 100
DEFAULT_LIMIT = 20
QUERY_TIMEOUT_MS = 30_000
def initialize
@client = Mongoid.client(:mcp_readonly)
end
# The ONLY three public methods - no write operations exist
def find(collection:, filter:, projection: nil, sort: nil, limit: DEFAULT_LIMIT)
limit = limit.to_i.clamp(1, MAX_LIMIT) # clamp low end too: Mongo treats limit(0) as "no limit"
cursor = @client[collection]
.find(filter)
.projection(projection || {})
.sort(sort || { _id: -1 })
.limit(limit)
.max_time_ms(QUERY_TIMEOUT_MS)
cursor.to_a
end
def aggregate(collection:, pipeline:)
@client[collection].aggregate(
pipeline,
max_time_ms: QUERY_TIMEOUT_MS
).to_a
end
def count(collection:, filter:)
@client[collection]
.count_documents(filter, max_time_ms: QUERY_TIMEOUT_MS)
end
# NO insert, update, delete, bulk_write methods exist
# The class literally cannot write - it only has read methods
end
end

Location: lib/bizy/mongodb_query_scoper.rb
This service handles query rewriting for both company-scoped and user-scoped collections:
module Bizy
class MongodbQueryScoper
def initialize(company:)
@company = company
@company_id = company.id
end
def scope_find(collection:, filter:)
validate_collection!(collection)
merge_scope_filter(collection, filter)
end
def scope_aggregate(collection:, pipeline:)
validate_collection!(collection)
MongodbPipelineValidator.validate!(pipeline)
inject_scope_into_pipeline(collection, pipeline)
end
private
def validate_collection!(collection)
unless MongodbCollectionRegistry.allowed?(collection)
raise Bizy::UnauthorizedCollectionError,
"Collection '#{collection}' is not available for querying"
end
end
def merge_scope_filter(collection, filter)
config = MongodbCollectionRegistry.config(collection)
case config[:scope_type]
when :company
# Company-scoped: inject company_id, ignore any user-provided company_id
filter.merge(config[:scope_field] => @company_id)
when :user
# User-scoped: validate and scope user_ids from filter
scope_user_filter(config[:scope_field], filter)
else
raise "Unknown scope type: #{config[:scope_type]}"
end
end
def scope_user_filter(scope_field, filter)
requested_user_ids = extract_user_ids(filter[scope_field])
if requested_user_ids.present?
# Validate ALL requested user_ids belong to this company
validate_user_ids_in_company!(requested_user_ids)
# Keep the filter as-is (already validated)
filter
else
# No user_ids specified - require them for user-scoped queries
raise Bizy::InvalidFilterError,
"user_id is required for this collection. Provide a specific user_id or array of user_ids."
end
end
def extract_user_ids(user_id_filter)
case user_id_filter
when BSON::ObjectId, String
[normalize_id(user_id_filter)]
when Hash
# Handle { "$in" => [...] } syntax
user_id_filter["$in"]&.map { |id| normalize_id(id) }
when Array
user_id_filter.map { |id| normalize_id(id) }
else
nil
end
end
def validate_user_ids_in_company!(user_ids)
# Check that ALL user_ids belong to this company
valid_count = @company.users.where(:_id.in => user_ids).count
unless valid_count == user_ids.length
raise Bizy::UnauthorizedUserError,
"One or more user_ids do not belong to this company"
end
end
def normalize_id(id)
id.is_a?(BSON::ObjectId) ? id : BSON::ObjectId(id.to_s)
end
def inject_scope_into_pipeline(collection, pipeline)
# For aggregation, extract user_ids from first $match stage and validate
config = MongodbCollectionRegistry.config(collection)
case config[:scope_type]
when :company
inject_company_filter(config[:scope_field], pipeline)
when :user
validate_user_filter_in_pipeline(config[:scope_field], pipeline)
end
pipeline
end
def inject_company_filter(scope_field, pipeline)
if pipeline.first&.key?("$match")
pipeline.first["$match"][scope_field] = @company_id
else
pipeline.unshift({ "$match" => { scope_field => @company_id } })
end
end
def validate_user_filter_in_pipeline(scope_field, pipeline)
first_match = pipeline.first&.dig("$match")
raise Bizy::InvalidFilterError, "Pipeline must start with $match containing user_id" unless first_match
user_ids = extract_user_ids(first_match[scope_field])
raise Bizy::InvalidFilterError, "user_id is required in first $match stage" if user_ids.blank?
validate_user_ids_in_company!(user_ids)
end
end
end

Key design decisions for user-scoped collections:
- Caller must specify user_ids - No automatic "all company users" query (too expensive, rarely what you want)
- Validation before execution - All specified user_ids are validated to belong to the company
- Fail closed - If any user_id is invalid/outside company, the entire query fails
- Flexible input - Accepts a single ID, an array, or `{ "$in": [...] }` syntax
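The flexible-input handling can be sketched in isolation. Plain strings stand in for BSON ObjectIds here; the real `extract_user_ids` shown above also normalizes `BSON::ObjectId` instances.

```ruby
# Standalone sketch of flexible user_id input handling:
# a single ID, an array, or a { "$in" => [...] } hash all normalize to an array.
def extract_user_ids(value)
  case value
  when String then [value]          # single ID
  when Hash   then value["$in"]     # { "$in" => [...] } syntax
  when Array  then value            # explicit array of IDs
  end                               # anything else => nil (rejected upstream)
end
```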
Create three tools following the existing Bizy::BaseTool pattern.
All tools require admin authorization:
# Shared authorization helper in BaseTool or module
def require_admin!
unless user.company_admin? || user.can_administer?(:global)
raise Bizy::NotAuthorizedError, "Admin access required"
end
end

a. Find Tool - lib/bizy/tools/mongodb_find.rb
module Bizy
module Tools
class MongodbFind < Bizy::BaseTool
tool_name "mongodb_find"
description "Query documents from a MongoDB collection. Admin access required. Results are automatically scoped to your company."
input_schema do
{
type: "object",
properties: {
collection: { type: "string", description: "Collection name (e.g., 'bonuses', 'users')" },
filter: { type: "object", description: "MongoDB query filter" },
projection: { type: "object", description: "Fields to include/exclude" },
sort: { type: "object", description: "Sort order" },
limit: { type: "integer", description: "Max docs to return (default: 20, max: 100)" }
},
required: ["collection"]
}
end
def call
require_admin! # Only admins can use this tool
scoper = MongodbQueryScoper.new(company: company)
scoped_filter = scoper.scope_find(
collection: arguments[:collection],
filter: arguments[:filter] || {}
)
executor = MongodbQueryExecutor.new
results = executor.find(
collection: arguments[:collection],
filter: scoped_filter,
projection: arguments[:projection],
sort: arguments[:sort],
limit: arguments[:limit] || 20
)
success(
collection: arguments[:collection],
count: results.length,
documents: results
)
rescue Bizy::UnauthorizedCollectionError => e
error(e.message)
end
end
end
end

b. Aggregate Tool - lib/bizy/tools/mongodb_aggregate.rb
c. Count Tool - lib/bizy/tools/mongodb_count.rb
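The aggregate and count tools mirror the find tool's call flow: scope first, then delegate to the read-only executor. A minimal standalone sketch of the count tool's core, with the scoper and executor passed in as collaborators (the method name and hash shapes are hypothetical; the real tool would subclass Bizy::BaseTool and use MongodbQueryScoper and MongodbQueryExecutor as in MongodbFind above):

```ruby
# Hypothetical core of the count tool's #call: scope the filter, then count
# via the read-only executor. Collaborators are injected to keep this testable.
def count_call(arguments, scoper:, executor:)
  scoped = scoper.call(
    collection: arguments[:collection],
    filter: arguments[:filter] || {}
  )
  {
    collection: arguments[:collection],
    count: executor.call(collection: arguments[:collection], filter: scoped)
  }
end
```

Because scoping happens before the executor is touched, an unscoped count can never reach the database.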
Location: lib/bizy/mongodb_collection_registry.rb
Defines which collections are queryable and their scoping rules:
module Bizy
class MongodbCollectionRegistry
# Scoping types:
# - :company - Filter by company_id field
# - :user - Filter by user_id IN (all company user IDs)
COLLECTIONS = {
# Company-scoped collections (have company_id field)
"bonuses" => {
scope_type: :company,
scope_field: "company_id",
description: "Recognition given between employees"
},
"users" => {
scope_type: :company,
scope_field: "company_id",
description: "Employee records"
},
"check_ins" => {
scope_type: :company,
scope_field: "company_id",
description: "Employee check-ins"
},
"monthly_user_statistics" => {
scope_type: :company,
scope_field: "company_id",
description: "Monthly aggregated user metrics"
},
# User-scoped collections (have user_id but no company_id)
"app_notifications" => {
scope_type: :user,
scope_field: "user_id",
description: "User notifications"
},
"llm_runs" => {
scope_type: :user,
scope_field: "user_id",
description: "LLM query history"
},
"user_content_dismissals" => {
scope_type: :user,
scope_field: "user_id",
description: "Dismissed content by user"
},
# ... etc
}.freeze
def self.allowed?(collection_name)
COLLECTIONS.key?(collection_name)
end
def self.scope_type(collection_name)
COLLECTIONS.dig(collection_name, :scope_type)
end
def self.scope_field(collection_name)
COLLECTIONS.dig(collection_name, :scope_field)
end
def self.config(collection_name)
COLLECTIONS[collection_name]
end
end
end

CRITICAL: The MCP tool must have ZERO ability to modify data. This is enforced at multiple layers:
Create a dedicated MongoDB user with read-only permissions for this MCP tool:
// MongoDB shell command to create read-only user
db.createUser({
user: "bizy_mcp_readonly",
pwd: "<secure-password>",
roles: [
{ role: "read", db: "bonusly_production" }
]
})

Configure a separate Mongoid client for MCP queries with secondary-only reads:
# config/mongoid.yml
production:
clients:
default:
# ... normal read/write connection
mcp_readonly:
uri: <%= ENV['MONGODB_MCP_READONLY_URI'] %>
options:
read:
mode: :secondary # NOT secondary_preferred - ONLY secondary replicas
max_pool_size: 5
server_selection_timeout: 10 # Fail fast if no secondary available

Important: Use :secondary mode, NOT :secondary_preferred. This ensures:
- Queries NEVER touch the primary database
- If no secondary is available, the query fails (rather than falling back to primary)
- Production primary is completely isolated from MCP query load
# In the MCP tool - use read-only client explicitly
def readonly_client
Mongoid.client(:mcp_readonly)
end
def execute_query(collection_name, filter)
readonly_client[collection_name].find(filter).to_a
end

Why this is the strongest layer: Even if application code has bugs, MongoDB itself will reject any write operations with a permission error.
Only expose read operations - no insert, update, delete, or bulk write methods:
module Bizy
class MongodbQueryExecutor
# ONLY these methods exist - no write methods at all
def find(collection:, filter:, projection: nil, sort: nil, limit: 20)
# ...
end
def aggregate(collection:, pipeline:)
# ...
end
def count(collection:, filter:)
# ...
end
# NO insert, update, delete, bulk_write, etc.
end
end

Block ALL aggregation stages that can write or modify data:
FORBIDDEN_STAGES = [
  "$out",          # Writes results to a collection
  "$merge",        # Writes/updates a collection
  "$set",          # In-pipeline only, but blocked out of caution (read-only user blocks writes anyway)
  "$unset",        # In-pipeline only, but blocked out of caution
  "$replaceRoot",  # Read-safe, but blocked pending careful validation
  "$replaceWith"   # Read-safe, but blocked pending careful validation
].freeze
# $addFields, $project, $group are safe (in-pipeline transforms only) and remain allowed
def validate_pipeline!(pipeline)
pipeline.each do |stage|
stage_name = stage.keys.first.to_s
if FORBIDDEN_STAGES.include?(stage_name)
raise Bizy::ForbiddenOperatorError,
"Aggregation stage '#{stage_name}' is not allowed"
end
end
end

Reject any query operators that could have side effects (none exist in MongoDB find filters, but validate anyway):
def validate_filter!(filter)
# MongoDB find filters are inherently read-only
# But validate structure to prevent injection
raise Bizy::InvalidFilterError unless filter.is_a?(Hash)
end

| Layer | What it blocks | Failure mode |
|---|---|---|
| Read-Only DB User | All writes | MongoDB permission error |
| Secondary-Only Mode | Primary DB access | Query fails if no secondary available |
| API Surface | No write methods exposed | Method doesn't exist |
| Pipeline Validation | $out, $merge, $lookup | Application error before query runs |
All layers are mandatory:
- Read-only DB user: Prevents any data modification at the database level
- Secondary-only mode: Isolates primary from query load; MCP can never affect write performance
- API surface: Defense against code bugs - write methods don't exist
- Pipeline validation: Defense against LLM-crafted malicious aggregation pipelines
$lookup stages can join to other collections, potentially exposing cross-company data. Two options:
Option A: Disallow $lookup entirely (simplest, most secure)
FORBIDDEN_PIPELINE_OPERATORS = %w[$out $merge $lookup $graphLookup].freeze

Option B: Inject company filter into $lookup pipelines (more flexible, complex)
def secure_lookup(lookup_stage)
# Validate "from" is in allowed collections
# Inject company filter into lookup's pipeline
end

Recommendation: Start with Option A (disallow $lookup), then consider Option B if needed.
- Result size limits - Cap response size to prevent memory issues
- Query timeout - Set maxTimeMS to prevent runaway queries
- Rate limiting - Limit queries per minute per company
- Audit logging - Log all queries for security review
- Read from secondary - Use secondary DB to reduce primary load
def execute_query(collection, pipeline)
collection.aggregate(
pipeline,
max_time_ms: 30_000,
read: { mode: :secondary_preferred }
).to_a
end

Pre-define parameterized query templates that the LLM fills in:
TEMPLATES = {
"recognition_by_user" => {
collection: "bonuses",
filter: ->(params) { { receiver_ids: params[:user_id] } }
}
}

Pros: Maximum security, no query parsing
Cons: Very limited flexibility; new use cases require code changes
Rejected because it defeats the purpose of "arbitrary queries" and requires constant maintenance.
Create company-filtered views in MongoDB itself.
Pros: Database-level security
Cons: Can't create per-company views (impractical); views are static
Wrap the existing user-MongoDB-prod MCP server.
Pros: Less code to write
Cons: Still need to intercept and rewrite queries; harder to control
This section documents all considered attack vectors and their mitigations.
Attack:
filter: {
"$or": [
{ "company_id": "other_company_id", "amount": { "$gt": 0 } },
{ "amount": { "$lt": 0 } }
]
}

Problem: A simple filter.merge("company_id" => X) only sets the top-level company_id. The $or clause still contains the attacker's company_id and will match documents from other companies.
Mitigation: Recursively sanitize company_id from entire filter structure:
def sanitize_scope_field!(filter, scope_field)
filter.keys.each do |key| # iterate over a key snapshot so deletion mid-iteration is safe
value = filter[key]
if key.to_s == scope_field
filter.delete(key) # Remove any user-provided scope field
elsif value.is_a?(Hash)
sanitize_scope_field!(value, scope_field)
elsif value.is_a?(Array)
value.each { |v| sanitize_scope_field!(v, scope_field) if v.is_a?(Hash) }
end
end
end
# Then inject our scope field at top level
filter[scope_field] = @company_id

Attack:
filter: { "$where": "this.company_id == 'other_company_id'" }

Problem: $where executes JavaScript and can bypass any field-level filters.
Mitigation: Block dangerous filter operators:
FORBIDDEN_FILTER_OPERATORS = %w[$where $function].freeze
# Note: $expr is ALLOWED - it cannot bypass top-level field matches
def validate_filter_operators!(filter)
filter.each do |key, value|
if FORBIDDEN_FILTER_OPERATORS.include?(key.to_s)
raise Bizy::ForbiddenOperatorError, "Operator '#{key}' is not allowed"
end
validate_filter_operators!(value) if value.is_a?(Hash)
value.each { |v| validate_filter_operators!(v) if v.is_a?(Hash) } if value.is_a?(Array)
end
end

Attack:
pipeline: [
{ "$match": { "company_id": my_company } },
{ "$unionWith": { "coll": "bonuses" } }, # Unions ALL bonuses, unfiltered!
{ "$match": { "company_id": "other_company" } }
]Problem: $unionWith brings in data from another collection without applying the company filter.
Mitigation: Block in forbidden stages list:
FORBIDDEN_STAGES = [
  "$out",         # Writes to a collection
  "$merge",       # Writes to a collection
  "$lookup",      # Cross-collection join (data leakage)
  "$graphLookup", # Recursive cross-collection join
  "$unionWith",   # Unions in unfiltered collection data
  "$documents"    # Creates documents from scratch
].freeze

Initial concern:
filter: { "$expr": { "$eq": ["$company_id", "other_company_id"] } }

Analysis: After deeper review, $expr cannot bypass top-level field matches.
MongoDB query semantics treat all top-level conditions as an implicit AND:
# After our injection:
{
"company_id" => my_company, # Our top-level field match
"$expr" => { "$eq": ["$company_id", "other"] } # Attacker's $expr
}

Both conditions must be true:
- `company_id == my_company` (TRUE for the company's documents)
- `$company_id == "other"`, i.e. `my_company == "other"` (FALSE)
Result: No documents match. The attacker's $expr cannot override our field match.
Why we should ALLOW $expr:
- 60+ uses in existing analytics pipelines
- Required for `$lookup` variable comparisons (`$$user_id`)
- Required for field-to-field comparisons
- Required for complex arithmetic conditions
- Cannot bypass top-level field matches
Mitigation (minimal): Optionally sanitize scope field references inside $expr to prevent confusion, but not strictly required for security.
Decision: $expr is ALLOWED - removed from FORBIDDEN_FILTER_OPERATORS.
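The implicit-AND argument above can be illustrated in plain Ruby. This is a toy stand-in for MongoDB's matching, not the real evaluator: the two lambdas model the injected field match and the attacker's $expr, and a document must satisfy both.

```ruby
# Toy illustration of MongoDB's implicit AND across top-level conditions.
docs = [
  { "company_id" => "my_company" }, # in scope
  { "company_id" => "other" }       # out of scope
]

field_match   = ->(doc) { doc["company_id"] == "my_company" } # our injected filter
attacker_expr = ->(doc) { doc["company_id"] == "other" }      # attacker's $expr

# Both conditions are ANDed, so no document can satisfy them simultaneously.
matches = docs.select { |doc| field_match.call(doc) && attacker_expr.call(doc) }
```

The attacker's condition can only narrow the result set past our field match, never widen it.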
Attack:
# Attacker sends string ID
filter: { "user_id": "507f1f77bcf86cd799439011" }
# Validation might check ObjectId, but filter uses string

Problem: Mismatch between validation type and query type could allow bypass.
Mitigation: Normalize IDs in the filter itself, not just during validation:
def normalize_user_ids_in_filter!(filter, scope_field)
user_id_value = filter[scope_field]
return unless user_id_value
filter[scope_field] = case user_id_value
when String
BSON::ObjectId(user_id_value)
when BSON::ObjectId
user_id_value
when Hash
if user_id_value["$in"]
{ "$in" => user_id_value["$in"].map { |id| BSON::ObjectId(id.to_s) } }
else
user_id_value
end
when Array
{ "$in" => user_id_value.map { |id| BSON::ObjectId(id.to_s) } }
end
end
end

Attack:
filter: { "reason": { "$regex": "(a+)+$" } }

Problem: Catastrophic backtracking in regex causes CPU exhaustion (ReDoS).
Mitigation:
MAX_REGEX_LENGTH = 100
def validate_regex!(filter)
filter.each do |key, value|
if key.to_s == "$regex" && value.to_s.length > MAX_REGEX_LENGTH
raise Bizy::InvalidFilterError, "Regex pattern too long"
end
validate_regex!(value) if value.is_a?(Hash)
value.each { |v| validate_regex!(v) if v.is_a?(Hash) } if value.is_a?(Array)
end
end

Attack:
# Probe for collection names
collection: "secret_internal_collection"
# Error reveals: "Collection 'secret_internal_collection' is not available"
# Probe for user existence
filter: { "user_id": "..." }
# Different errors reveal user existence

Problem: Detailed error messages leak information about data existence.
Mitigation: Use generic error messages:
# Bad - reveals information
raise "User ID 'abc' does not belong to this company"
raise "Collection 'xyz' is not available"
# Good - generic errors
raise Bizy::NotAuthorizedError, "Query not permitted"

Attack:
# Systematically extract all company data
loop do
results = query(filter: { "_id": { "$gt": last_id } }, limit: 100)
break if results.empty?
last_id = results.last["_id"]
end

Problem: Even with proper scoping, attackers can extract entire datasets.
Mitigation: Rate limiting via rack-attack and audit logging:
# config/initializers/rack_attack.rb
Rack::Attack.throttle("mcp/company", limit: 30, period: 60) do |req|
if req.path.start_with?("/mcp/")
# Decode token to get company_id for throttle key
token = req.get_header("HTTP_AUTHORIZATION")&.sub("Bearer ", "")
if token
decoded = Bizy::SessionToken.decode(token) rescue nil
decoded&.dig(:company_id)&.to_s
end
end
end
# lib/bizy/mcp_audit_logger.rb
module Bizy
class McpAuditLogger
def self.log_query(user:, company:, collection:, filter:)
Rails.logger.info("[MCP Query Audit]", {
user_id: user.id.to_s,
company_id: company.id.to_s,
collection: collection,
filter_keys: filter.keys, # Log structure, not values (PII)
timestamp: Time.current.iso8601
})
end
end
end

Attack:
pipeline: [
{ "$match": { "company_id": my_company } },
{ "$facet": {
"legitimate": [{ "$match": { "amount": 10 } }],
"attack": [{ "$unionWith": { "coll": "users" } }] # Nested forbidden stage!
}}
]

Problem: $facet contains nested pipelines that might contain forbidden stages.
Mitigation: Recursively validate nested pipelines:
def validate_pipeline!(pipeline)
pipeline.each do |stage|
stage_name = stage.keys.first.to_s
raise ForbiddenOperatorError, "Stage '#{stage_name}' not allowed" if FORBIDDEN_STAGES.include?(stage_name)
# Recursively validate $facet sub-pipelines
if stage_name == "$facet"
stage["$facet"].each_value do |sub_pipeline|
validate_pipeline!(sub_pipeline)
end
end
end
end

Attack:
filter: { "company_id\x00": "other" } # Null byte
filter: { "company_id\u200B": "other" } # Zero-width space

Problem: Unicode tricks might bypass string matching for field names.
Mitigation: Validate field name characters:
VALID_FIELD_NAME = /\A[\w.$]+\z/
def validate_field_names!(obj)
return unless obj.is_a?(Hash)
obj.each do |key, value|
unless key.to_s.match?(VALID_FIELD_NAME)
raise Bizy::InvalidFilterError, "Invalid field name format"
end
validate_field_names!(value) if value.is_a?(Hash)
value.each { |v| validate_field_names!(v) if v.is_a?(Hash) } if value.is_a?(Array)
end
end
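A quick standalone check of the field-name pattern (the helper name is illustrative): `\w` matches only word characters, so null bytes and zero-width characters fail the anchored match while normal dotted paths pass.

```ruby
# The pattern from the mitigation above: word chars, dots, and $ only,
# anchored over the whole string.
VALID_FIELD_NAME = /\A[\w.$]+\z/

def valid_field_name?(name)
  name.to_s.match?(VALID_FIELD_NAME)
end
```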
endAttack: Steal or replay a valid session token to impersonate another user.
Mitigation: Already handled by existing Bizy::SessionToken:
- Tokens expire after 1 hour
- Tokens are signed with secret key (can't forge)
- Tokens are generated server-side (LLM never sees user_id directly)
| Priority | Attack Vector | Mitigation | Status |
|---|---|---|---|
| CRITICAL | Nested filter injection | Recursive sanitization of scope field | Required |
| CRITICAL | $where JavaScript | Block $where, $function | Required |
| CRITICAL | $unionWith bypass | Block $unionWith, $documents | Required |
| SAFE | $expr override | ALLOWED - cannot bypass field matches | Safe to allow |
| MEDIUM | Type confusion | Normalize IDs in filter | Required |
| MEDIUM | ReDoS | Limit regex length | Required |
| MEDIUM | Info disclosure | Generic error messages | Required |
| MEDIUM | Enumeration | Rate limiting + audit logging | Required |
| MEDIUM | $facet nesting | Recursive pipeline validation | Required |
| LOW | Field name tricks | Validate field name characters | Required |
| LOW | Token theft | Existing token expiration/signing | Already handled |
module Bizy
class MongodbSecurityValidator
# Filter operators that can bypass field-level security
FORBIDDEN_FILTER_OPERATORS = [
  "$where",    # JavaScript execution - can bypass any filter
  "$function"  # JavaScript execution - can bypass any filter
].freeze
# NOTE: $expr is ALLOWED - it cannot bypass top-level field matches
# and is required for 60+ existing analytics pipelines
# Aggregation stages that can leak or write data
FORBIDDEN_STAGES = [
  "$out",         # Writes to a collection
  "$merge",       # Writes to a collection
  "$lookup",      # Cross-collection join (data leakage)
  "$graphLookup", # Recursive cross-collection join
  "$unionWith",   # Unions in unfiltered collection data
  "$documents"    # Creates documents from scratch
].freeze
MAX_REGEX_LENGTH = 100
VALID_FIELD_NAME = /\A[\w.$]+\z/
end
end

- Unit tests for query scoper - Verify company_id is correctly injected
- Collection allowlist tests - Verify disallowed collections are rejected
- Pipeline validation tests - Verify dangerous operators are blocked
- Read-only security tests - Verify writes are impossible at multiple layers
- Integration tests - End-to-end query execution with auth
Query Scoper:
RSpec.describe Bizy::MongodbQueryScoper do
let(:company) { create(:company) }
let(:other_company) { create(:company) }
let(:user1) { create(:user, company: company) }
let(:user2) { create(:user, company: company) }
let(:other_user) { create(:user, company: other_company) }
describe "#scope_find" do
context "company-scoped collections" do
it "injects company_id into filter" do
scoper = described_class.new(company: company)
result = scoper.scope_find(collection: "bonuses", filter: { "amount" => 10 })
expect(result).to eq({ "amount" => 10, "company_id" => company.id })
end
it "overwrites any user-provided company_id" do
scoper = described_class.new(company: company)
result = scoper.scope_find(
collection: "bonuses",
filter: { "company_id" => other_company.id } # Attempted injection
)
expect(result["company_id"]).to eq(company.id) # Overwritten with correct company
end
end
context "user-scoped collections" do
it "allows querying with valid user_id" do
scoper = described_class.new(company: company)
result = scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => user1.id, "seen_at" => nil }
)
expect(result["user_id"]).to eq(user1.id)
end
it "allows querying with array of valid user_ids" do
scoper = described_class.new(company: company)
result = scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => { "$in" => [user1.id, user2.id] } }
)
expect(result["user_id"]).to eq({ "$in" => [user1.id, user2.id] })
end
it "rejects user_id from another company" do
scoper = described_class.new(company: company)
expect {
scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => other_user.id }
)
}.to raise_error(Bizy::UnauthorizedUserError)
end
it "rejects if any user_id in array is from another company" do
scoper = described_class.new(company: company)
expect {
scoper.scope_find(
collection: "app_notifications",
filter: { "user_id" => { "$in" => [user1.id, other_user.id] } }
)
}.to raise_error(Bizy::UnauthorizedUserError)
end
it "requires user_id for user-scoped collections" do
scoper = described_class.new(company: company)
expect {
scoper.scope_find(collection: "app_notifications", filter: {})
}.to raise_error(Bizy::InvalidFilterError, /user_id is required/)
end
end
it "rejects non-allowlisted collections" do
scoper = described_class.new(company: company)
expect { scoper.scope_find(collection: "secret_data", filter: {}) }
.to raise_error(Bizy::UnauthorizedCollectionError)
end
end
end

Admin Authorization:
RSpec.describe Bizy::Tools::MongodbFind do
let(:company) { create(:company) }
context "with non-admin user" do
let(:regular_user) { create(:user, company: company, company_admin: false) }
it "rejects access" do
result = described_class.execute(
session_token: "token",
user: regular_user,
company: company,
collection: "bonuses"
)
expect(result[:error]).to eq("Admin access required")
end
end
context "with admin user" do
let(:admin_user) { create(:user, company: company, company_admin: true) }
it "allows access and returns scoped results" do
result = described_class.execute(
session_token: "token",
user: admin_user,
company: company,
collection: "bonuses",
filter: { amount: 10 }
)
expect(result[:data]).to be_present
expect(result[:error]).to be_nil
end
end
end

Read-Only Security Tests (Critical):
RSpec.describe Bizy::MongodbQueryExecutor do
describe "read-only enforcement" do
it "does not expose insert method" do
executor = described_class.new
expect(executor).not_to respond_to(:insert)
expect(executor).not_to respond_to(:insert_one)
expect(executor).not_to respond_to(:insert_many)
end
it "does not expose update method" do
executor = described_class.new
expect(executor).not_to respond_to(:update)
expect(executor).not_to respond_to(:update_one)
expect(executor).not_to respond_to(:update_many)
end
it "does not expose delete method" do
executor = described_class.new
expect(executor).not_to respond_to(:delete)
expect(executor).not_to respond_to(:delete_one)
expect(executor).not_to respond_to(:delete_many)
end
it "uses the read-only MongoDB client" do
executor = described_class.new
client = executor.instance_variable_get(:@client) # no reader method is defined
# Verify it's the mcp_readonly client, not default
expect(client).to eq(Mongoid.client(:mcp_readonly))
end
it "enforces secondary-only read mode" do
executor = described_class.new
client = executor.instance_variable_get(:@client)
read_preference = client.options[:read]
# Must be :secondary (not :secondary_preferred) to never touch primary
expect(read_preference[:mode]).to eq(:secondary)
end
end
end
RSpec.describe Bizy::MongodbPipelineValidator do
describe "#validate!" do
it "blocks $out stage" do
pipeline = [{ "$match" => {} }, { "$out" => "output_collection" }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$out/)
end
it "blocks $merge stage" do
pipeline = [{ "$match" => {} }, { "$merge" => { into: "target" } }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$merge/)
end
it "blocks $lookup stage" do
pipeline = [{ "$lookup" => { from: "other", as: "joined" } }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$lookup/)
end
it "allows safe stages like $match, $group, $project" do
pipeline = [
{ "$match" => { status: "active" } },
{ "$group" => { _id: "$type", count: { "$sum" => 1 } } },
{ "$project" => { type: "$_id", count: 1 } }
]
expect { described_class.validate!(pipeline) }.not_to raise_error
end
end
end

A key challenge for the LLM is having enough context to formulate meaningful queries. Collection and field names alone don't explain what the data means or how it's calculated.
Expose a curated subset of documentation as MCP resources that the LLM can discover and fetch:
sequenceDiagram
participant User
participant LLM
participant MCP as MCP Server
participant Docs as doc/mcp_resources/
User->>LLM: "How are collaboration communities calculated?"
LLM->>MCP: list_resources(pattern: "collaboration*")
MCP-->>LLM: [collaboration_communities.md]
LLM->>MCP: fetch_resource("bonusly://docs/features/collaboration_communities")
MCP->>Docs: Read curated doc
Docs-->>MCP: Sanitized markdown
MCP-->>LLM: Document content
LLM->>LLM: Now understands data model, weights, etc.
LLM->>MCP: mongodb_aggregate(collection: "monthly_user_connections_weighted", ...)
Exposing internal docs introduces risks:
| Risk | Example | Mitigation |
|---|---|---|
| Implementation detail leakage | Class names, file paths, dependencies | Curated docs only - not entire /doc folder |
| Security mechanism exposure | "Filter by company_id at query layer" | Separate user-facing docs from engineering docs |
| Prompt injection | Malicious content in doc files | Treat doc content as untrusted input |
| Accidental secrets | API keys, hostnames in examples | Review process + automated scanning |
| Collection enumeration | References to non-allowlisted collections | Docs only reference allowlisted collections |
Create a dedicated folder for LLM-consumable documentation:
doc/
├── internal/ # Engineering docs (NOT exposed via MCP)
│ └── security/
│ └── how_scoping_works.md
│
└── mcp_resources/ # Curated for LLM consumption
├── collections/
│ ├── bonuses.md
│ ├── users.md
│ ├── check_ins.md
│ └── monthly_user_connections_weighted.md
├── features/
│ └── collaboration_communities.md # Sanitized version
└── examples/
└── common_queries.md
# lib/bizy/mcp_resource_handler.rb
module Bizy
class McpResourceHandler
RESOURCES_PATH = Rails.root.join("doc/mcp_resources")
# Only serve from curated folder - never expose /doc directly
def list_resources
Dir.glob("#{RESOURCES_PATH}/**/*.md").map do |path|
relative = path.sub("#{RESOURCES_PATH}/", "")
{
uri: "bonusly://docs/#{relative.delete_suffix('.md')}",
name: extract_title(path),
description: extract_description(path),
mimeType: "text/markdown"
}
end
end
def fetch_resource(uri)
# Validate URI format and prevent path traversal
relative_path = uri.sub("bonusly://docs/", "")
raise InvalidResourceError unless relative_path.match?(/\A[\w\-\/]+\z/)
full_path = RESOURCES_PATH.join("#{relative_path}.md")
# Ensure path is within RESOURCES_PATH (prevent traversal)
unless full_path.to_s.start_with?(RESOURCES_PATH.to_s)
raise InvalidResourceError, "Invalid resource path"
end
raise ResourceNotFoundError unless File.exist?(full_path)
File.read(full_path)
end
private
def extract_title(path)
first_line = File.foreach(path).first
first_line&.gsub(/^#\s*/, '')&.strip || File.basename(path, '.md').titleize
end
def extract_description(path)
# Extract first paragraph after title
content = File.read(path)
paragraphs = content.split(/\n\n+/)
paragraphs[1]&.gsub(/\n/, ' ')&.truncate(200) || ""
end
end
end

Link collections to their documentation in the collection registry:
COLLECTIONS = {
"bonuses" => {
scope_type: :company,
scope_field: "company_id",
description: "Recognition given between employees",
docs: ["collections/bonuses"] # MCP resource URIs
},
"monthly_user_connections_weighted" => {
scope_type: :company,
scope_field: "company_id",
description: "Pre-computed weighted edges for collaboration analysis",
docs: [
"collections/monthly_user_connections_weighted",
"features/collaboration_communities"
]
}
# ...
}
def self.with_docs(collection_name)
config = COLLECTIONS[collection_name]
return config unless config
config.merge(
doc_uris: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
)
end

Add dedicated discovery tools to help the LLM find relevant documentation:
# lib/bizy/tools/mongodb_list_collections.rb
module Bizy
module Tools
class MongodbListCollections < Bizy::BaseTool
tool_name "mongodb_list_collections"
description "List available MongoDB collections with descriptions and documentation links."
def call
require_admin!
collections = MongodbCollectionRegistry::COLLECTIONS.map do |name, config|
{
name: name,
description: config[:description],
scope_type: config[:scope_type],
doc_resources: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
}
end
success(collections: collections)
end
end
end
end
# lib/bizy/tools/mongodb_describe_collection.rb
module Bizy
module Tools
class MongodbDescribeCollection < Bizy::BaseTool
tool_name "mongodb_describe_collection"
description "Get detailed information about a collection including sample fields and related documentation."
input_schema do
{
type: "object",
properties: {
collection: { type: "string", description: "Collection name" }
},
required: ["collection"]
}
end
def call
require_admin!
collection = arguments[:collection]
config = MongodbCollectionRegistry.config(collection)
unless config
return error("Collection not available")
end
# Get sample document to show field structure
scoper = MongodbQueryScoper.new(company: company)
executor = MongodbQueryExecutor.new
sample = executor.find(
collection: collection,
filter: scoper.scope_find(collection: collection, filter: {}),
limit: 1
).first
success(
name: collection,
description: config[:description],
scope_type: config[:scope_type],
scope_field: config[:scope_field],
sample_fields: sample&.keys || [],
doc_resources: config[:docs]&.map { |d| "bonusly://docs/#{d}" } || []
)
end
end
end
end

Curated docs in doc/mcp_resources/ should:
- Be written for end-user/admin audience - Not engineering implementation details
- Focus on data meaning - What does this field represent? What questions can it answer?
- Include example queries - Show common use cases with actual query patterns
- Reference only allowlisted collections - Don't mention internal collections
- Avoid implementation details - No class names, file paths, or internal architecture
- Be reviewed before merge - Security review for any new MCP resource docs
Example curated doc structure:
# Bonuses Collection
Recognition events where employees appreciate each other's contributions.
## Key Fields
- `giver_id` - User who gave the recognition
- `receiver_ids` - Users who received the recognition (array)
- `amount` - Points awarded (integer)
- `reason` - Message explaining the recognition
- `created_at` - When the recognition was given
- `hashtags` - Company values tagged (array of strings)
## Common Query Patterns
**Recognition given by a specific user:**
filter: { "giver_id": "user_id_here" }
**Recognition in a date range:**
filter: { "created_at": { "$gte": "2025-01-01", "$lte": "2025-01-31" } }
**Recognition mentioning a specific value:**
filter: { "hashtags": "#teamwork" }
## Related Documentation
- [Collaboration Communities](bonusly://docs/features/collaboration_communities) - How recognition patterns create collaboration maps

Before any doc is added to doc/mcp_resources/:
- Security review - No internal paths, class names, dependencies, or security mechanisms
- No secrets - Scan for API keys, tokens, hostnames, real user/company data
- Allowlisted references only - Only mention collections in the registry
- User-facing language - Written for admins, not engineers
- PR approval required - Standard review process with explicit MCP resource checkbox
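The "automated scanning" step in this checklist could be a simple pattern check run before merge. A sketch with assumed category names; the regexes are illustrative, not the actual review tooling:

```ruby
# Illustrative pre-merge scanner for doc/mcp_resources/ files. Categories and
# patterns are assumptions; a real scanner would be tuned to the codebase.
FORBIDDEN_PATTERNS = {
  internal_path: %r{\b(?:app|lib|config)/[\w/]+\.rb\b},   # source file paths
  ruby_class:    /\b[A-Z]\w+::[A-Z]\w+\b/,                # namespaced class names
  secret_hint:   /\b(?:api[_-]?key|secret|token)\s*[:=]\s*\S+/i,
  hostname:      /\b[\w.-]+\.internal\b/                  # internal hostnames
}.freeze

# Returns the categories of forbidden content found in a doc, empty if clean.
def scan_mcp_doc(content)
  FORBIDDEN_PATTERNS.select { |_, pattern| content.match?(pattern) }.keys
end
```

A CI job could run this over every file under doc/mcp_resources/ and fail the build with the offending categories, leaving the human review checklist as the final gate.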
Keep MCP docs in sync with collection schemas using schema-based detection (not source file tracking). The process runs locally first, with CI as a backup safety net.
Only regenerate docs when the actual data structure changes:
| Trigger | Detected By | Frequency |
|---|---|---|
| New field added to collection | Sample doc comparison | Rare |
| Field removed from collection | Sample doc comparison | Rare |
| Registry description changed | Registry vs frontmatter | Rare |
| New collection added to registry | Missing doc file | Rare |
| Source code refactoring | Not tracked | N/A |
---
collection: bonuses
documented_fields: [giver_id, receiver_ids, amount, reason, created_at, hashtags]
registry_description: "Recognition given between employees"
last_generated: 2025-01-29T14:30:00Z
---
# Bonuses Collection
...

# Check which docs are stale
rake mcp:check_docs
# Regenerate stale docs using LLM
rake mcp:generate_docs
# Regenerate a specific collection
rake mcp:generate_docs[bonuses]

# lib/tasks/mcp_docs.rake
namespace :mcp do
desc "Check for stale MCP documentation"
task check_docs: :environment do
checker = Bizy::McpDocFreshnessChecker.new
stale = checker.stale_docs
if stale.empty?
puts "All MCP docs are up to date."
else
puts "Stale documentation detected:"
stale.each do |s|
reasons = []
reasons << "new fields: #{s[:reason][:added_fields].join(', ')}" if s[:reason][:added_fields].any?
reasons << "removed fields: #{s[:reason][:removed_fields].join(', ')}" if s[:reason][:removed_fields].any?
reasons << "description changed" if s[:reason][:description_changed]
reasons << "doc missing" if s[:reason][:missing]
puts " #{s[:collection]}: #{reasons.join(', ')}"
end
puts "\nRun 'rake mcp:generate_docs' to update."
exit 1
end
end
desc "Generate/update MCP documentation for collections"
task :generate_docs, [:collection] => :environment do |_, args|
generator = Bizy::McpDocGenerator.new
collections = if args[:collection]
{ args[:collection] => Bizy::MongodbCollectionRegistry.config(args[:collection]) }
else
Bizy::MongodbCollectionRegistry::COLLECTIONS
end
collections.each do |name, config|
puts "Generating docs for #{name}..."
generator.generate(collection: name, config: config)
end
puts "Done. Review changes and commit."
end
end

# lib/bizy/mcp_doc_freshness_checker.rb
module Bizy
class McpDocFreshnessChecker
DOCS_PATH = Rails.root.join("doc/mcp_resources/collections")
def stale_docs
stale = []
MongodbCollectionRegistry::COLLECTIONS.each do |name, config|
doc_path = DOCS_PATH.join("#{name}.md")
unless File.exist?(doc_path)
stale << { collection: name, reason: { missing: true } }
next
end
doc = McpDocParser.parse(doc_path)
sample = fetch_sample_document(name, config)
current_fields = sample.keys.sort
documented_fields = doc.metadata[:documented_fields]&.sort || []
added = current_fields - documented_fields
removed = documented_fields - current_fields
description_changed = config[:description] != doc.metadata[:registry_description]
if added.any? || removed.any? || description_changed
stale << {
collection: name,
reason: {
added_fields: added,
removed_fields: removed,
description_changed: description_changed
}
}
end
end
stale
end
private
def fetch_sample_document(name, config)
Mongoid.client(:mcp_readonly)[name].find({}).limit(1).first || {}
end
end
end

# lib/bizy/mcp_doc_generator.rb
module Bizy
class McpDocGenerator
DOCS_PATH = Rails.root.join("doc/mcp_resources/collections")
def generate(collection:, config:)
sample = fetch_sample_document(collection, config)
content = generate_with_llm(
collection: collection,
description: config[:description],
fields: sample.keys,
sample: sanitize_sample(sample)
)
frontmatter = {
collection: collection,
documented_fields: sample.keys.sort,
registry_description: config[:description],
last_generated: Time.current.iso8601
}
write_doc(collection, frontmatter, content)
end
private

def fetch_sample_document(collection, config)
  # Same read-only sampling approach as the freshness checker
  Mongoid.client(:mcp_readonly)[collection].find({}).limit(1).first || {}
end
def generate_with_llm(collection:, description:, fields:, sample:)
prompt = <<~PROMPT
Generate user-facing documentation for a MongoDB collection.
Collection: #{collection}
Description: #{description}
Fields: #{fields.join(', ')}
Sample document (sanitized): #{sample.to_json}
Guidelines:
- Write for admin users, not engineers
- Explain what each field means semantically
- Include 3-5 common query patterns with example filters
- Do not mention internal class names, file paths, or implementation details
- Use markdown format with ## headers for sections
Required sections:
1. Brief description (1-2 sentences)
2. Key Fields (list with descriptions)
3. Common Query Patterns (with example filters)
PROMPT
Bizy::LlmClient.complete(prompt)
end
def sanitize_sample(sample)
# Remove PII, replace IDs with placeholders
sample.transform_values do |v|
case v
when BSON::ObjectId then "<ObjectId>"
when String then v.truncate(50)
when Array then v.first(3)
else v
end
end
end
def write_doc(collection, frontmatter, content)
path = DOCS_PATH.join("#{collection}.md")
FileUtils.mkdir_p(DOCS_PATH)
File.write(path, <<~DOC)
---
#{frontmatter.to_yaml.lines[1..].join}---
#{content}
DOC
end
end
end

# spec/lib/bizy/mcp_documentation_freshness_spec.rb
RSpec.describe "MCP Documentation Freshness" do
it "has up-to-date documentation for all collections" do
checker = Bizy::McpDocFreshnessChecker.new
stale = checker.stale_docs
if stale.any?
message = stale.map do |s|
if s[:reason][:missing]
"#{s[:collection]}: missing doc file"
else
reasons = []
reasons << "new fields: #{s[:reason][:added_fields]}" if s[:reason][:added_fields]&.any?
reasons << "removed fields: #{s[:reason][:removed_fields]}" if s[:reason][:removed_fields]&.any?
reasons << "description changed" if s[:reason][:description_changed]
"#{s[:collection]}: #{reasons.join(', ')}"
end
end.join("\n")
fail "Stale MCP docs detected:\n#{message}\n\nRun locally: rake mcp:generate_docs"
end
end
end

flowchart TD
A[Developer adds/changes collection] --> B[Run rake mcp:check_docs locally]
B --> C{Stale docs?}
C -->|No| D[Continue with PR]
C -->|Yes| E[Run rake mcp:generate_docs]
E --> F[LLM generates updated docs]
F --> G[Developer reviews & edits]
G --> H[Commit changes]
H --> D
D --> I[CI runs freshness spec]
I --> J{Passes?}
J -->|Yes| K[PR mergeable]
J -->|No| L[CI fails with instructions]
L --> B
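One piece the workflow relies on but never shows is `McpDocParser`, which `McpDocFreshnessChecker` calls to read frontmatter. A minimal sketch of what it might look like, assuming YAML frontmatter delimited by `---` as in the example doc above:

```ruby
require "yaml"
require "date"

# Hypothetical sketch of McpDocParser: splits YAML frontmatter from the
# markdown body and symbolizes metadata keys, matching how the freshness
# checker reads doc.metadata[:documented_fields].
module Bizy
  class McpDocParser
    ParsedDoc = Struct.new(:metadata, :body, keyword_init: true)

    FRONTMATTER = /\A---\n(.*?)\n---\n/m

    def self.parse_string(raw)
      match = raw.match(FRONTMATTER)
      return ParsedDoc.new(metadata: {}, body: raw) unless match

      meta = YAML.safe_load(match[1], permitted_classes: [Time, Date])
      ParsedDoc.new(
        metadata: meta.transform_keys(&:to_sym),
        body: match.post_match
      )
    end

    def self.parse(path)
      parse_string(File.read(path))
    end
  end
end
```

Using `YAML.safe_load` keeps a malformed or malicious doc file from instantiating arbitrary Ruby objects during the freshness check.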
- Create read-only MongoDB user (ops/infra) - Create `bizy_mcp_readonly` user with `read` role only
- Configure Mongoid client - Add `mcp_readonly` client to `config/mongoid.yml` with secondary-only reads
- Create `MongodbQueryExecutor` + spec
  - Read-only client, only find/aggregate/count methods
  - Spec: read-only enforcement, secondary-only mode
- Create `MongodbCollectionRegistry` + spec
  - Collection allowlist with scope types
  - Spec: allowed/disallowed collections
- Create `MongodbSecurityValidator` + spec
  - Forbidden operators, recursive sanitization, field validation
  - Spec: all 11 attack vectors (nested injection, $where, $unionWith, etc.)
- Create `MongodbQueryScoper` + spec
  - Company/user filter injection using SecurityValidator
  - Spec: company scoping, user scoping, validation failures
- Add rack-attack throttle for MCP endpoints + audit logging
  - Throttle by company_id (decoded from token)
  - Audit logging helper for query tracking
  - Spec: throttle enforcement via request specs
- Create `mongodb_find` tool + spec
  - Admin auth, uses scoper and executor
  - Spec: admin auth, scoped results, error handling
- Create `mongodb_aggregate` tool + spec
  - Admin auth, pipeline validation
  - Spec: pipeline validation, scoped aggregation
- Create `mongodb_count` tool + spec
  - Admin auth, scoped count
  - Spec: scoped count queries
- Create `mongodb_list_collections` tool + spec
  - Lists available collections with descriptions
- Create `mongodb_describe_collection` tool + spec
  - Collection details with sample fields
- Create `McpResourceHandler` + spec
  - Serve curated docs, path traversal protection
- Create `McpDocParser`
  - Parse frontmatter metadata from doc files
- Create `McpDocFreshnessChecker` + spec
  - Schema-based staleness detection
- Create `McpDocGenerator` + spec
  - LLM-assisted documentation generation
- Create rake tasks (`mcp:check_docs`, `mcp:generate_docs`)
- Generate initial `doc/mcp_resources/` docs for ALL allowlisted collections
- Create CI freshness spec as safety net
- Write end-to-end integration specs
  - Full flow: auth → scope → validate → execute → return
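The `MongodbSecurityValidator` referenced throughout the plan is not shown in this section. A minimal sketch of its operator-blocking core, with an assumed method name (`validate!`) and an illustrative operator list; the real class would also handle field validation and the unrestricted-`$lookup` check:

```ruby
# Sketch of the operator-blocking piece of MongodbSecurityValidator.
# Operator list and API are assumptions based on the rules described above.
module Bizy
  class MongodbSecurityValidator
    ForbiddenOperatorError = Class.new(StandardError)

    # Write and code-execution escape hatches the read-only tools must never see.
    FORBIDDEN_OPERATORS = %w[$out $merge $where $function $accumulator].freeze

    # Walks the filter/pipeline recursively so nested injections are caught too.
    def validate!(node)
      case node
      when Hash
        node.each do |key, value|
          if FORBIDDEN_OPERATORS.include?(key.to_s)
            raise ForbiddenOperatorError, "operator not allowed: #{key}"
          end
          validate!(value)
        end
      when Array
        node.each { |item| validate!(item) }
      end
      node
    end
  end
end
```

Recursing over hashes and arrays matters: a `$out` hidden inside a `$facet` sub-pipeline or a nested `$expr` must fail the same way as one at the top level.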
Infrastructure:
- `config/mongoid.yml` (modify) - Add `mcp_readonly` client configuration with secondary-only reads
- Environment variable: `MONGODB_MCP_READONLY_URI` - Read-only connection string
New Ruby files:
- `lib/bizy/mongodb_query_executor.rb` - Read-only query execution using mcp_readonly client
- `lib/bizy/mongodb_collection_registry.rb` - Collection allowlist with scope types and doc references
- `lib/bizy/mongodb_security_validator.rb` - Filter/pipeline validation, forbidden operators, sanitization
- `lib/bizy/mongodb_query_scoper.rb` - Company/user filter injection with security validation
- `config/initializers/rack_attack.rb` (modify) - Add MCP endpoint throttle by company_id
- `lib/bizy/mcp_audit_logger.rb` - Audit logging for MCP queries
- `lib/bizy/mcp_resource_handler.rb` - MCP resource handler for curated documentation
- `lib/bizy/mcp_doc_freshness_checker.rb` - Schema-based staleness detection
- `lib/bizy/mcp_doc_generator.rb` - LLM-assisted documentation generation
- `lib/bizy/mcp_doc_parser.rb` - Parse frontmatter metadata from doc files
- `lib/tasks/mcp_docs.rake` - Rake tasks for local doc management
- `lib/bizy/tools/mongodb_find.rb` - Find tool with admin auth
- `lib/bizy/tools/mongodb_aggregate.rb` - Aggregate tool with admin auth
- `lib/bizy/tools/mongodb_count.rb` - Count tool with admin auth
- `lib/bizy/tools/mongodb_list_collections.rb` - Discovery tool for listing collections
- `lib/bizy/tools/mongodb_describe_collection.rb` - Discovery tool for collection details
Curated Documentation (MCP Resources):
One documentation file per allowlisted collection, plus feature docs:
- `doc/mcp_resources/collections/{collection_name}.md` - One for EACH collection in the registry
  - `bonuses.md` - Recognition data model and query patterns
  - `users.md` - Employee records and fields
  - `check_ins.md` - Check-in data model
  - `monthly_user_statistics.md` - Monthly aggregated metrics
  - `monthly_user_connections_weighted.md` - Collaboration edge weights
  - `app_notifications.md` - User notifications
  - `llm_runs.md` - LLM query history
  - `user_content_dismissals.md` - Dismissed content tracking
  - (+ any additional collections added to registry)
- `doc/mcp_resources/features/collaboration_communities.md` - Cross-collection feature docs
- `doc/mcp_resources/examples/common_queries.md` - Example query patterns across collections
Specs:
- `spec/lib/bizy/mongodb_query_executor_spec.rb` - Read-only enforcement tests
- `spec/lib/bizy/mongodb_collection_registry_spec.rb` - Allowlist tests
- `spec/lib/bizy/mongodb_security_validator_spec.rb` - Attack vector tests (all 11 vectors)
- `spec/lib/bizy/mongodb_query_scoper_spec.rb` - Scoping tests for company and user types
- `spec/requests/mcp/rate_limiting_spec.rb` - Rack-attack throttle tests
- `spec/lib/bizy/mcp_resource_handler_spec.rb` - Resource handler with path traversal tests
- `spec/lib/bizy/mcp_doc_freshness_checker_spec.rb` - Staleness detection tests
- `spec/lib/bizy/mcp_doc_generator_spec.rb` - Generator tests
- `spec/lib/bizy/mcp_documentation_freshness_spec.rb` - CI safety net spec
- `spec/lib/bizy/tools/mongodb_find_spec.rb` - Integration tests with admin auth
- `spec/lib/bizy/tools/mongodb_aggregate_spec.rb` - Integration tests with pipeline validation
- `spec/lib/bizy/tools/mongodb_count_spec.rb` - Integration tests
- `spec/lib/bizy/tools/mongodb_list_collections_spec.rb` - Discovery tool tests
- `spec/lib/bizy/tools/mongodb_describe_collection_spec.rb` - Discovery tool tests