@calebdre
Created January 27, 2026 20:01
Investigation: CATS AtsEventProcessingError (sc-33575)

Error Summary

Error: CATS::Materialization::AtsEventProcessingError: RuntimeError occurred while processing an ATS event

RuntimeError message: Unexpected event: type.googleapis.com/grayscale.protobufs.EventMessage

Context from request:

  • partitionKey: "org:6379" → Organization ID 6379
  • sequenceNumber: "49670965021542172069185908121919825437616631603731120498"
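The mapping from partition key to organization ID can be sketched as below. This is a minimal illustration: the helper name `org_id_from_partition_key` is hypothetical, and the `org:<id>` format is inferred only from the single key shown above.

```ruby
# Hypothetical helper: extracts the organization ID from a partition key
# such as "org:6379". Returns nil when the key does not match "org:<digits>".
def org_id_from_partition_key(partition_key)
  match = /\Aorg:(\d+)\z/.match(partition_key.to_s)
  match && Integer(match[1], 10)
end

org_id = org_id_from_partition_key("org:6379")
puts org_id.inspect
```

The result can be passed straight to `Organization.find` in the console commands later in this document.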

Root Cause: Double-Wrapped EventMessage

The error is triggered at kinesis_record_processor.rb:174:

raise "Unexpected event: #{event_message_pb.event.type_url}" unless event_pb.present?

The problem: The event field contains another EventMessage instead of an actual event type.

Expected structure:

EventMessage {
  uuid: "..."
  event: Any { type_url: "type.googleapis.com/grayscale.protobufs.events.CandidateFound", ... }
}

Actual (broken) structure:

EventMessage {
  uuid: "..."
  event: Any { type_url: "type.googleapis.com/grayscale.protobufs.EventMessage", ... }  ← WRONG
}

This is a producer-side bug - something is double-wrapping the event by packing an EventMessage inside another EventMessage's event field.
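The difference between the two structures can be modeled without the protobuf gem. In the sketch below, `FakeAny`, `FakeEventMessage`, and `double_wrapped?` are stand-ins invented for illustration; the real types are `Google::Protobuf::Any` and `Grayscale::Protobufs::EventMessage`, and the string payload is a placeholder.

```ruby
# Stand-ins for the protobuf types, for illustration only.
FakeAny = Struct.new(:type_url, :value)
FakeEventMessage = Struct.new(:uuid, :event)

EVENT_MESSAGE_TYPE_URL = "type.googleapis.com/grayscale.protobufs.EventMessage"

# True when the envelope's payload is itself an envelope.
def double_wrapped?(event_message)
  event_message.event.type_url == EVENT_MESSAGE_TYPE_URL
end

# Expected structure: the envelope carries a concrete event type.
good = FakeEventMessage.new(
  "uuid-1",
  FakeAny.new("type.googleapis.com/grayscale.protobufs.events.CandidateFound", "payload")
)

# Broken structure: the envelope carries another envelope.
bad = FakeEventMessage.new("uuid-2", FakeAny.new(EVENT_MESSAGE_TYPE_URL, good))

puts double_wrapped?(good)
puts double_wrapped?(bad)
```

The consumer only knows concrete event types, so the second shape falls through to the `raise` shown above.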

Key Files

  • /app/lib/cats/materialization/kinesis_record_processor.rb - Main error location
  • /app/services/events/received/process_events.rb - Defines supported event types (EVENTS constant)
  • /app/models/kcl_dead_letter.rb - Dead letter records for failed events

Rails Console Commands for Investigation

1. Look up Organization 6379

org = Organization.find(6379)
puts "Name: #{org.name}"
puts "ATS: #{org.ats_name}"
puts "ATS Integration ID: #{org.ats_integration_id}"
puts "Disabled: #{org.disabled?}"

2. Check if a dead letter was created

dead_letter = KCLDeadLetter.find_by(sequence_number: "49670965021542172069185908121919825437616631603731120498")
puts dead_letter.inspect

3. Check for other recent dead letters (look for a pattern)

# Lists recent dead letters across all orgs. Partition keys contain the org ID,
# so scan the results for "org:6379" to see whether this org recurs.
KCLDeadLetter.where("created_at > ?", 24.hours.ago).order(created_at: :desc).limit(20)
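To spot a pattern quickly, the partition keys from those records can be tallied per org. This is a sketch under assumptions: the sample keys are made up, the helper name `dead_letter_counts` is hypothetical, and how the partition key is read off a `KCLDeadLetter` record is not shown in this document.

```ruby
# Counts dead letters per partition key, most affected first.
# Requires Ruby 2.7+ for Enumerable#tally.
def dead_letter_counts(partition_keys)
  partition_keys.tally.sort_by { |_key, count| -count }
end

# Made-up sample input standing in for keys read from recent dead letters.
sample_keys = ["org:6379", "org:6379", "org:42", "org:6379"]

dead_letter_counts(sample_keys).each do |key, count|
  puts "#{key}: #{count}"
end
```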

4. Check the ATS integration details

org = Organization.find(6379)
integration = org.ats_integration
puts "Integration type: #{integration.class.name}"
puts "Integration details: #{integration.attributes}"

5. Check what's producing events (if cATs has relevant logs)

# Look for recent fetch requests or sync activities for this org
# This helps identify what triggered the malformed event
org = Organization.find(6379)
FetchRequest.where(organization_id: org.id).order(created_at: :desc).limit(5)

Next Steps

  1. Run the console commands above to identify:

    • What org 6379 is and which ATS they use
    • What integration is producing these events
  2. Investigate the producer. The malformed event is most likely created by cATs, the service that syncs ATS data and produces events:

    • Check cATs code for where EventMessage is constructed and published to Kinesis
    • Look for any code path that might accidentally wrap an EventMessage inside another
  3. Check for pattern:

    • Is this a one-off issue or recurring?
    • Are other orgs affected or just 6379?
    • What event type was the inner EventMessage supposed to be?
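For the last question, the inner type can be recovered by peeling envelopes until a non-envelope type URL appears. The sketch below uses invented stand-ins (`AnyStandIn`, `MessageStandIn`, `innermost_type_url`) and nests objects directly. With the real gem, each layer would have to be decoded first, for example with `Google::Protobuf::Any#unpack`. The `CandidateFound` URL is only the placeholder from the expected-structure example, not the actual inner type.

```ruby
# Stand-ins for Google::Protobuf::Any and the EventMessage envelope.
AnyStandIn = Struct.new(:type_url, :value)
MessageStandIn = Struct.new(:uuid, :event)

ENVELOPE_URL = "type.googleapis.com/grayscale.protobufs.EventMessage"

# Walks nested envelopes and returns the first type URL that is not the
# envelope itself, or nil if nesting exceeds max_depth.
def innermost_type_url(event_message, max_depth: 5)
  current = event_message
  max_depth.times do
    return current.event.type_url unless current.event.type_url == ENVELOPE_URL
    current = current.event.value
  end
  nil
end

inner = MessageStandIn.new(
  "uuid-inner",
  AnyStandIn.new("type.googleapis.com/grayscale.protobufs.events.CandidateFound", "payload")
)
outer = MessageStandIn.new("uuid-outer", AnyStandIn.new(ENVELOPE_URL, inner))

puts innermost_type_url(outer)
```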

Bug Found in cATs

File: /Users/caleb/grayscale/cATs/app/workers/webhook_processors/greenhouse/reject_candidate_worker.rb (lines 44-50)

Problem: When the new_greenhouse_candidate_rejected_event feature flag is enabled, the code creates an EventMessage and passes it to publish_event!, which wraps it again.

Buggy code:

publish_event!(
  Grayscale::Protobufs::EventMessage.new(  # ← Should NOT create EventMessage here
    uuid: UUID7.generate,
    timestamp: Time.now.to_pb,
    event: ::Google::Protobuf::Any.pack(event_pb)
  )
)

What publish_event! does (in commonalities.rb):

def publish_event!(event_pb, ...)
  event_message_pb = Grayscale::Protobufs::EventMessage.new(  # ← Wraps again!
    uuid:      UUID7.generate,
    event:     ::Google::Protobuf::Any.pack(event_pb),  # event_pb is already EventMessage
    ...
  )
end

The fix - remove the EventMessage.new wrapper:

publish_event!(event_pb)  # Just pass the raw event

This is the only instance of this bug - all other webhook processors correctly pass raw event protobufs.
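As an optional safeguard beyond the fix itself, a type check at the top of the publishing path would make this mistake fail loudly on the producer side. This is a hypothetical sketch, not code from cATs: `assert_raw_event!`, `EnvelopeStandIn`, and `CandidateRejectedStandIn` are invented names, and the real check would compare against `Grayscale::Protobufs::EventMessage`.

```ruby
# Stand-ins for the envelope and for a raw event, for illustration only.
EnvelopeStandIn = Struct.new(:uuid, :event)
CandidateRejectedStandIn = Struct.new(:candidate_id)

# Hypothetical guard: returns the event unchanged, or raises if the caller
# passed something that is already an envelope.
def assert_raw_event!(event_pb, envelope_class: EnvelopeStandIn)
  return event_pb unless event_pb.is_a?(envelope_class)

  raise ArgumentError, "expected a raw event, got an already-wrapped #{envelope_class}"
end

raw = CandidateRejectedStandIn.new(123)
assert_raw_event!(raw) # passes through

begin
  assert_raw_event!(EnvelopeStandIn.new("uuid-1", raw))
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Whether such a guard belongs in publish_event! is a design choice for the cATs maintainers; it trades a small runtime check for earlier detection.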

Recommended Fix (in cATs repo)

File to modify: /Users/caleb/grayscale/cATs/app/workers/webhook_processors/greenhouse/reject_candidate_worker.rb

Change lines 44-50 from:

publish_event!(
  Grayscale::Protobufs::EventMessage.new(
    uuid: UUID7.generate,
    timestamp: Time.now.to_pb,
    event: ::Google::Protobuf::Any.pack(event_pb)
  )
)

To:

publish_event!(event_pb)

This matches the pattern used by other webhook processors (e.g., delete_application_worker.rb, merge_candidate_worker.rb).

Verification

  1. Check that org 6379 uses Greenhouse ATS (console command #1)
  2. Confirm the new_greenhouse_candidate_rejected_event feature flag is enabled for them
  3. After fix is deployed, monitor for new AtsEventProcessingError errors

Notes

  • The KCLDeadLetter is created after error logging (lines 207-211), so the record should exist
  • The config raise_kcl_materialization_errors determines if errors halt processing or continue
  • The error was caught and logged; the event may then have been dead-lettered, with processing continuing afterward
  • This is a producer-side bug - go.grayscale is correctly rejecting malformed events