Plan: Generalize Meta Matching for Authors, Sections, and Tags

Context

The tag matching system (prefix matching, normalization, suggestions, tag_info metadata) currently only works for tags. When a user asks "How are posts by John Smith doing?" and the LLM sends any_author: "john smith", no resolution happens — the raw string goes straight to the API. If the casing or format doesn't match exactly, results may be empty.

The Mage API already supports find_keys for all meta types (ctx.mage["author"], ctx.mage["section"], ctx.mage["tag"]) with the same interface. We need to generalize the matching pipeline so authors and sections get the same treatment as tags.

Key differences: Authors/sections have no smart tag prefixes (parsely_smart:*) and no site-specific colon prefixes (tag:, ssts:). Their matching is simpler: search, normalize, exact match, suggestions.

Compare.py gets this for free since it calls ANALYTICS_TOOL.method() → query_analytics, which is where resolution happens.

Files to Modify

  1. apps/agent/tools/lib/tag_matcher.py — Add generic search_meta, normalize_for_matching, find_matching_meta
  2. apps/agent/tools/analytics.py — Add _resolve_meta_filter, update query_analytics to resolve authors/sections
  3. apps/agent/templates/agent/tools/query_analytics.md — Update LLM instructions for meta matching
  4. tests/agent/tools/test_tag_matcher.py — Tests for new generic functions
  5. tests/agent/tools/test_resolve_tag_filter.py — Tests for meta resolution in analytics

Implementation

Step 1: Add search_meta generic search function (tag_matcher.py)

def search_meta(ctx, aspect: str, query: str, limit: int = 20) -> list[str]:
    """Search for meta values using the Mage API.

    Works for any aspect: "tag", "author", or "section".
    Returns list of meta values (including prefixes for tags).
    """
    try:
        metas = ctx.mage[aspect].find_keys(query.lower(), limit=limit)
        results = []
        if "keys" in metas:
            for item in metas["keys"]:
                if isinstance(item, dict) and aspect in item:
                    value = item[aspect]
                    if value:
                        results.append(str(value))
        log.debug(f"Meta search ({aspect}): query='{query}', hits={metas.get('hits', 0)}, results={results[:10]}")
        return results
    except Exception as e:
        log.error(f"Error searching {aspect}: {e}", exc_info=True)
        return []

Make existing search_tags a thin wrapper:

def search_tags(ctx, query: str, limit: int = 20) -> list[str]:
    return search_meta(ctx, "tag", query, limit)

Step 2: Add normalize_for_matching with aspect-aware prefix handling (tag_matcher.py)

Authors/sections don't have colon prefixes. The existing normalize_tag_for_matching calls extract_tag_name which strips everything before the first colon. An author named "Dr. Smith: Expert" would wrongly become "expert". We need an aspect parameter:

def normalize_for_matching(value: str, aspect: str = "tag") -> str:
    """Normalize a meta value for consistent matching.

    For tags: strips smart tag and colon prefixes before normalizing.
    For authors/sections: only normalizes case, hyphens, and whitespace.
    """
    if aspect == "tag":
        value = extract_tag_name(value)
    normalized = value.lower()
    normalized = normalized.replace('-', ' ')
    normalized = ' '.join(normalized.split())
    return normalized

Keep normalize_tag_for_matching as a backward-compatible wrapper:

def normalize_tag_for_matching(tag: str) -> str:
    return normalize_for_matching(tag, aspect="tag")

Step 3: Add find_matching_meta generic matching function (tag_matcher.py)

For tags, delegates to existing find_matching_tags. For authors/sections, runs a simplified pipeline (no smart tags, no prefix discovery):

def find_matching_meta(ctx, query: str, aspect: str = "tag", smart_tag_display: str = "site") -> dict:
    """Find matching meta values for any aspect (tag, author, section).

    Returns dict with: "tags" (matched values), "match_type", "prefix_tags", and optionally "suggestions".
    """
    if aspect == "tag":
        return find_matching_tags(ctx, query, smart_tag_display)

    # Strip first so a None or whitespace-only query is treated as empty
    query = (query or "").strip()
    if not query:
        return {"tags": [], "match_type": "none", "prefix_tags": []}

    # Search Mage for candidates
    search_results = search_meta(ctx, aspect, query, limit=20)

    # Fallback: try hyphens instead of spaces
    if not search_results and ' ' in query:
        search_results = search_meta(ctx, aspect, query.replace(' ', '-'), limit=20)

    if not search_results:
        return {"tags": [], "match_type": "none", "prefix_tags": []}

    # Exact match using normalization
    query_normalized = normalize_for_matching(query, aspect=aspect)
    matches = [
        candidate for candidate in search_results
        if normalize_for_matching(candidate, aspect=aspect) == query_normalized
    ]

    if matches:
        return {"tags": matches, "match_type": "exact", "prefix_tags": []}

    # No exact match - return suggestions
    return {"tags": [], "match_type": "none", "prefix_tags": [], "suggestions": search_results[:10]}

Note: The "tags" key is reused for all aspects so the response shape stays identical to find_matching_tags. The name is misleading for authors and sections, but it avoids changing the consumer code.

Step 4: Generalize _resolve_tag_filter → _resolve_meta_filter (analytics.py)

Add constants and a generic resolver:

ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}
ENDPOINT_TO_ASPECT = {"tags": "tag", "authors": "author", "sections": "section"}

def _resolve_meta_filter(ctx, aspect: str, value: str, filters: dict[str, Any]) -> dict | None:
    """Resolve a meta value string into matched values for any aspect."""
    filter_key = ASPECT_FILTER_KEY[aspect]
    try:
        if aspect == "tag":
            result = find_matching_tags(ctx, value, "all")
        else:
            result = find_matching_meta(ctx, value, aspect)

        if result["tags"]:
            filters[filter_key] = result["tags"]

        meta_info = {
            "query": value,
            "aspect": aspect,
            "match_type": result["match_type"],
            "matched_values": result["tags"],
            "matched_count": len(result["tags"]),
        }
        if aspect == "tag":
            meta_info["matched_tags"] = result["tags"]
            meta_info["prefix_available"] = len(result["prefix_tags"])
            meta_info["prefix_sample"] = [extract_tag_name(t) for t in result["prefix_tags"][:5]]
        suggestions = result.get("suggestions", [])
        if suggestions:
            meta_info["suggestions"] = suggestions[:10]
            meta_info["total_suggestions"] = len(suggestions)
        return meta_info
    except (AttributeError, KeyError, TypeError) as e:
        log.error(f"Meta matching error ({aspect}): {e}", exc_info=True)
        return None

Keep _resolve_tag_filter as a wrapper for backward compatibility:

def _resolve_tag_filter(ctx, tag_string: str, filters: dict[str, Any]) -> dict | None:
    return _resolve_meta_filter(ctx, "tag", tag_string, filters)

Step 5: Update query_analytics resolution logic (analytics.py)

Replace the current tag-only resolution block (~lines 321-336) with a generalized loop:

# Handle meta matching for tags, authors, sections
meta_info = None

# 1) Meta parameter resolution (endpoint-specific detail views)
if endpoint in ENDPOINT_TO_ASPECT and meta:
    aspect = ENDPOINT_TO_ASPECT[endpoint]
    filter_key = ASPECT_FILTER_KEY[aspect]
    meta_info = _resolve_meta_filter(request.ctx, aspect, meta, filters)
    if filters.get(filter_key):
        meta = None
        params["meta"] = meta
        log.info(f"Meta matching from meta ({aspect}): cleared meta, filters.{filter_key}={filters.get(filter_key)}")

# 2) Filter parameter resolution (any_tag, any_author, any_section)
for aspect, filter_key in ASPECT_FILTER_KEY.items():
    if meta_info and meta_info.get("aspect") == aspect:
        continue  # Already resolved via meta above
    filter_value = filters.get(filter_key)
    if filter_value and isinstance(filter_value, str):
        info = _resolve_meta_filter(request.ctx, aspect, filter_value, filters)
        log.info(f"Meta matching from filters ({aspect}): filters.{filter_key}={filters.get(filter_key)}")
        if info and meta_info is None:
            meta_info = info
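
The behavior of the filter loop can be checked in isolation with a stub resolver. In this sketch fake_resolve and resolve_filters are hypothetical names; fake_resolve pretends every string resolves to a single title-cased value, which is an assumption made only so the snippet runs without Mage.

```python
ASPECT_FILTER_KEY = {"tag": "any_tag", "author": "any_author", "section": "any_section"}


def fake_resolve(aspect, value, filters):
    # Hypothetical stand-in for _resolve_meta_filter: every string "resolves"
    # to one title-cased value, written back into filters as a list.
    filters[ASPECT_FILTER_KEY[aspect]] = [value.title()]
    return {"query": value, "aspect": aspect, "match_type": "exact"}


def resolve_filters(filters, meta_info=None):
    # Mirrors the filter-parameter loop: only string values are resolved.
    for aspect, filter_key in ASPECT_FILTER_KEY.items():
        if meta_info and meta_info.get("aspect") == aspect:
            continue  # Already resolved via meta
        value = filters.get(filter_key)
        if value and isinstance(value, str):
            info = fake_resolve(aspect, value, filters)
            if info and meta_info is None:
                meta_info = info
    return meta_info


# A string is resolved; an already-resolved list is left untouched.
filters = {"any_author": "john smith", "any_section": ["World News"]}
info = resolve_filters(filters)
assert filters == {"any_author": ["John Smith"], "any_section": ["World News"]}
assert info["aspect"] == "author"

# With two string filters both are resolved, but only the first info is reported.
both = {"any_tag": "olympics", "any_author": "john smith"}
assert resolve_filters(both)["aspect"] == "tag"
assert both == {"any_tag": ["Olympics"], "any_author": ["John Smith"]}
```

Because meta_info is a single dict, a request that filters on several aspects at once reports match details for the first resolved aspect only, even though every string filter is rewritten.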

In response construction, replace tag_info references with meta_info:

if meta_info:
    results["meta_info"] = meta_info
    # Backward compat
    if meta_info.get("aspect") == "tag":
        results["tag_info"] = meta_info

Step 6: Update imports in analytics.py

from agent.tools.lib.tag_matcher import extract_tag_name, find_matching_tags, find_matching_meta, search_meta

Step 7: Update LLM prompt template (query_analytics.md)

Add instructions about meta_info for authors/sections alongside the existing tag instructions. The LLM should mention suggestions when no exact match is found for authors/sections too.
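
For reference while writing the template, this is roughly what an unmatched author lookup would hand to the LLM. The keys follow the meta_info dict built in Step 4; the values are invented for illustration, and suggestion_hint is a hypothetical helper that only spells out the sentence the template should lead the LLM to produce.

```python
from typing import Optional

# Illustrative values only; the keys follow the meta_info dict from Step 4.
meta_info = {
    "query": "jon smith",
    "aspect": "author",
    "match_type": "none",
    "matched_values": [],
    "matched_count": 0,
    "suggestions": ["John Smith", "Jon Smithson"],
    "total_suggestions": 2,
}


def suggestion_hint(info: dict) -> Optional[str]:
    # Hypothetical helper, not part of the plan: returns a hint only when
    # nothing matched and Mage offered alternatives.
    if info["matched_count"] or not info.get("suggestions"):
        return None
    names = ", ".join(info["suggestions"])
    return f'No exact {info["aspect"]} match for "{info["query"]}". Did you mean: {names}?'


print(suggestion_hint(meta_info))
```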

Test Instructions

New tests in test_tag_matcher.py:

TestSearchMeta: Parametrized tests for search_meta with different aspects (author, section, tag), mock ctx.mage[aspect].find_keys.
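
A minimal sketch of how the Mage client could be mocked for these tests, using only the standard library. It assumes the response shape implied by Step 1, {"hits": n, "keys": [{aspect: value}, ...]}. search_meta is repeated in condensed form (logging omitted) so the snippet runs standalone, and make_ctx and check_search_meta are hypothetical helpers.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def search_meta(ctx, aspect, query, limit=20):
    # Condensed copy of the Step 1 function, without logging.
    try:
        metas = ctx.mage[aspect].find_keys(query.lower(), limit=limit)
        return [
            str(item[aspect])
            for item in metas.get("keys", [])
            if isinstance(item, dict) and item.get(aspect)
        ]
    except Exception:
        return []


def make_ctx(aspect, response=None, error=None):
    # Hypothetical helper: a fake ctx whose mage[aspect].find_keys is a mock.
    client = MagicMock()
    if error is not None:
        client.find_keys.side_effect = error
    else:
        client.find_keys.return_value = response
    return SimpleNamespace(mage={aspect: client}), client


def check_search_meta(aspect, value):
    # One valid entry, one empty value, and one malformed item.
    response = {"hits": 2, "keys": [{aspect: value}, {aspect: ""}, "garbage"]}
    ctx, client = make_ctx(aspect, response=response)
    results = search_meta(ctx, aspect, "Some Query", limit=5)
    # The query is lowercased before it reaches Mage.
    client.find_keys.assert_called_once_with("some query", limit=5)
    return results


for aspect, value in [("author", "John Smith"), ("section", "World News"), ("tag", "tag:olympics")]:
    assert check_search_meta(aspect, value) == [value]

# Errors raised by the client are swallowed and yield an empty list.
failing_ctx, _ = make_ctx("author", error=RuntimeError("mage down"))
assert search_meta(failing_ctx, "author", "john") == []
```

In the real test module the loop would likely become a pytest.mark.parametrize over the three aspects.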

TestNormalizeForMatching:

  • Author with colon preserved: normalize_for_matching("Dr. Smith: Expert", "author") → "dr. smith: expert"
  • Author hyphen: normalize_for_matching("John-Smith", "author") → "john smith"
  • Tag strips prefix: normalize_for_matching("tag:Olympics", "tag") → "olympics"
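
The cases above can be written as a small table-driven check. In this sketch extract_tag_name is a stand-in that assumes the existing helper drops everything up to the first colon (its smart tag handling is not reproduced), and the section row is an extra case added for illustration.

```python
def extract_tag_name(tag: str) -> str:
    # Stand-in for the existing helper; assumed to drop everything up to the first colon.
    return tag.split(":", 1)[1] if ":" in tag else tag


def normalize_for_matching(value: str, aspect: str = "tag") -> str:
    # Same logic as Step 2: only tags go through prefix stripping.
    if aspect == "tag":
        value = extract_tag_name(value)
    normalized = value.lower().replace("-", " ")
    return " ".join(normalized.split())


CASES = [
    ("Dr. Smith: Expert", "author", "dr. smith: expert"),
    ("John-Smith", "author", "john smith"),
    ("  World   News ", "section", "world news"),
    ("tag:Olympics", "tag", "olympics"),
]

for value, aspect, expected in CASES:
    assert normalize_for_matching(value, aspect) == expected

# The failure the aspect parameter avoids: tag handling applied to an author.
assert normalize_for_matching("Dr. Smith: Expert", "tag") == "expert"
```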

TestFindMatchingMeta (parametrized):

  • Author exact match (case insensitive)
  • Author hyphen/space normalization
  • Author no match returns suggestions
  • Section exact match
  • Tag delegates to find_matching_tags
  • Empty query returns none

New tests in test_resolve_tag_filter.py:

  • test_resolve_author_filter — verifies any_author is populated
  • test_resolve_section_filter — verifies any_section is populated
  • test_authors_endpoint_meta_triggers_resolution — meta on authors endpoint
  • test_sections_endpoint_meta_triggers_resolution — meta on sections endpoint
  • test_any_author_list_skips_resolution — already-resolved list not re-resolved

Run tests:

docker compose exec backend pytest tests/agent/tools/test_tag_matcher.py -v
docker compose exec backend pytest tests/agent/tools/test_resolve_tag_filter.py -v
docker compose exec backend pytest tests/agent/ -v

Manual testing:

  1. Ask "How are posts by [known author] doing?" — verify author resolution in logs
  2. Ask "Show me top posts in [section name]" — verify section resolution
  3. Ask "How is Olympics performing?" — verify tag resolution still works
  4. Ask "Compare posts by [author1] vs [author2]" — verify compare gets resolution for free