
@landonepps
Last active May 29, 2025 17:02
Markdown files for reverse engineering SwiftUI

AttributeGraph Function Analysis (macOS Binary)

This document contains analysis comments for key functions within the AttributeGraph framework, based on a macOS binary with image base 0x1B16F2000 as reported by IDA Pro. Offsets are relative to this base.

AG::LayoutDescriptor::Compare::operator()

Mangled Name: __ZN2AG16LayoutDescriptor7CompareclEPKhS3_S3_mmj

Relative Offset: 0x1C2C8

Overall Purpose: Compares two data buffers based on a layout command stream. This is the core comparison engine.

Important Note: This function expects layout_opcode_stream_ptr (a2) to be a valid, dereferenceable pointer to an opcode stream. It does not internally handle special sentinel values like nullptr or ValueLayoutTrivial ((const unsigned char*)1). If called with such values, it will crash when it attempts to read the first opcode from the stream (e.g., attempting to dereference address 0x0 or 0x1). The responsibility for handling nil or trivial layouts (e.g., by falling back to a bitwise comparison) lies with higher-level calling functions (like the static AG::LayoutDescriptor::compare).
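The caller-side guard described in the note can be sketched as follows. This is only an illustration under stated assumptions: ValueLayout is modelled as a plain opcode pointer, StreamCompareFn is a hypothetical stand-in for a call into operator(), and the bitwise fallback uses memcmp, which the notes mention only as an example of what a caller might do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical alias for the opcode stream pointer.
using ValueLayout = const unsigned char*;

// Hypothetical callback standing in for a call to Compare::operator().
using StreamCompareFn = bool (*)(ValueLayout layout, const unsigned char* lhs,
                                 const unsigned char* rhs, std::size_t size);

// True for the two sentinel values that must never be dereferenced:
// nullptr (0) and ValueLayoutTrivial (1).
inline bool is_sentinel_layout(ValueLayout layout) {
    return reinterpret_cast<std::uintptr_t>(layout) <= 1;
}

// Falls back to a bitwise comparison for sentinel layouts and forwards
// only real opcode streams to the stream comparator.
inline bool guarded_compare(ValueLayout layout, const unsigned char* lhs,
                            const unsigned char* rhs, std::size_t size,
                            StreamCompareFn stream_compare) {
    assert(lhs != nullptr && rhs != nullptr);
    if (is_sentinel_layout(layout)) {
        return std::memcmp(lhs, rhs, size) == 0;
    }
    return stream_compare(layout, lhs, rhs, size);
}
```

With a guard of this shape the sentinel check happens before any opcode is read, so the crash path described above is never reached.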

Parameters (from original decompilation, names might vary):

  • a1 (this_ptr): AG::LayoutDescriptor::Compare* - The this pointer for the Compare object instance. Contains state like the enum processing stack.
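  • a2 (layout_opcode_stream_ptr): const unsigned char* (PKh in the mangled name; the decompilation shows it as const char*, which is why opcodes are tested as signed values below) - Pointer to the current position in the layout command stream (opcodes).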
  • a3 (lhs_data_ptr_base): const unsigned char* - Base pointer to the first data buffer (LHS).
  • a4 (rhs_data_ptr_base): const unsigned char* - Base pointer to the second data buffer (RHS).
  • a5 (current_data_offset_val): uint64_t - Initial offset within the data buffers from which to start comparison for this call.
  • a6 (data_length_from_offset_val): int64_t - Length of data to compare starting at the initial offset a5, or -1 for unbounded (process until end of layout or data).
  • a7 (comparison_flags): uint32_t - Flags controlling comparison behavior (e.g., report failures, copy-on-write for enums).

Local Variable Aliases / Key Stack Variables (from decompilation):

  • v91 / rhs_data_base_cached: Stores a4.
  • v92 / lhs_data_base_cached: Stores a3.
  • v79 / this_ptr_cached1: Stores a1.
  • v89[1] / this_ptr_cached2_on_stack: Also stores a1.
  • v80 / initial_enum_stack_depth: Loaded from this_ptr + 0x208.
  • v9 / current_data_offset: Running offset within lhs_data_ptr_base and rhs_data_ptr_base, initialized from a5.
  • v90 / data_offset_limit_cached: Calculated from a5 + a6, upper bound for current_data_offset.
  • v89[0] / processed_flags: comparison_flags with MSB cleared. MSB likely controls failure reporting.
  • v85 / use_copied_payloads_for_enum: Boolean, true if (comparison_flags & 0x100) == 0. Controls enum payload copying.
  • v7 / current_opcode_or_status: Holds the current opcode byte or the function's return status (1 for equal, 0 for unequal). When it acts as the return status, the notes below refer to it as return_status_flag.

Initial Setup (from the entry point at 0x1C2C8 to around 0x1C2C8 + ~0x7D):

  • Caches parameters a3 (lhs_data) and a4 (rhs_data) into local stack variables (v92, v91).
  • Caches this pointer (a1) into v79 and v89[1].
  • Loads this->enum_stack_depth_member (offset 0x208) into v80.
  • Calculates data_offset_limit (v90) from a5 + a6. If a6 is -1, limit is effectively unbounded.
  • Initializes return_status_flag (v7) to 1 (true/equal).
  • If data_offset_limit <= a5 (no data to compare), jumps to final cleanup.
  • Initializes current_data_offset (v9) with a5.
  • Sets processed_flags (v89[0]) from a7 & 0x7FFFFFFF.
  • Sets use_copied_payloads_for_enum (v85) based on (a7 & 0x100) == 0.
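The setup steps above amount to a few lines of arithmetic. The sketch below mirrors them with hypothetical names (CompareSetup, make_setup). Saturating the limit for an unbounded length is an assumption made here for safety; the notes only say the limit is "effectively unbounded" in that case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the values computed during setup.
struct CompareSetup {
    std::uint64_t offset;      // current_data_offset (v9)
    std::uint64_t limit;       // data_offset_limit (v90)
    std::uint32_t flags;       // processed_flags (v89[0])
    bool copy_enum_payloads;   // use_copied_payloads_for_enum (v85)
    bool has_work;             // false when limit <= offset (jump to cleanup)
};

inline CompareSetup make_setup(std::uint64_t a5, std::uint64_t a6, std::uint32_t a7) {
    CompareSetup s{};
    s.offset = a5;
    // Assumption: a length of -1 (all bits set) means unbounded, so the
    // limit is saturated instead of being allowed to wrap around.
    const bool unbounded = (a6 == UINT64_MAX) || (a5 > UINT64_MAX - a6);
    s.limit = unbounded ? UINT64_MAX : a5 + a6;
    s.flags = a7 & 0x7FFFFFFFu;                 // clear the MSB
    s.copy_enum_payloads = (a7 & 0x100u) == 0;  // bit 0x100 disables copying
    s.has_work = s.limit > s.offset;
    return s;
}
```

For example, make_setup(8, 16, 0x80000102) yields a limit of 24, processed flags of 0x102, and payload copying disabled.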

Main Loop (starts around 0x1C2C8 + ~0x7D):

  • Continuously processes opcodes from layout_opcode_stream_ptr (a2).
  • current_opcode_or_status (v7) is loaded with the current byte from the stream.

Opcode Handling - Simple Byte Skips (opcodes 0x40-0x7F):

  • (Around 0x1C2C8 + ~0x8D to 0x1C2C8 + ~0xA3)
  • If opcode is in range [0x40, 0x7F]:
    • skip_count = opcode & 0x3F.
    • current_data_offset += (skip_count + 1).
    • layout_opcode_stream_ptr advances by 1.
    • If current_data_offset exceeds data_offset_limit, comparison ends (returns true).
    • Otherwise, loop continues.
  • If opcode not in this range, breaks to the main switch statement.
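The skip-opcode fast path can be sketched like this. The helper names are hypothetical, and the strict "greater than" test follows the wording "exceeds" above rather than a verified instruction.

```cpp
#include <cassert>
#include <cstdint>

// Opcodes 0x40-0x7F are simple skips.
inline bool is_skip_opcode(std::uint8_t op) {
    return op >= 0x40 && op <= 0x7F;
}

// Number of data bytes skipped: the low six bits plus one.
inline std::uint64_t skip_length(std::uint8_t op) {
    assert(is_skip_opcode(op));
    return static_cast<std::uint64_t>(op & 0x3F) + 1;
}

// Consumes a run of skip opcodes, advancing both the stream cursor and the
// data offset. Returns false once the offset passes the limit, which in the
// original ends the comparison with the status still "equal".
inline bool consume_skips(const std::uint8_t*& stream, std::uint64_t& offset,
                          std::uint64_t limit) {
    while (is_skip_opcode(*stream)) {
        offset += skip_length(*stream);
        ++stream;
        if (offset > limit) {
            return false;
        }
    }
    return true;  // the next opcode belongs to the main switch
}
```

A single skip opcode therefore covers between 1 byte (0x40) and 64 bytes (0x7F).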

Main Switch Statement (jump table at 0x1C2C8 + ~0xC1):

  • Opcode 0 (End of Layout):

    • (Branch from switch to 0x1C2C8 + ~0x603)
    • Sets return_status_flag to 1 (true).
    • Jumps to final cleanup.
    • Comment: Signifies successful comparison up to the end of the layout stream.
  • Opcode 1 (Equals - Swift Equatable):

    • (Starts around 0x1C2C8 + ~0x393)
    • Reads swift::metadata* type_metadata and swift::equatable_witness_table* witness_table from the layout stream (next 16 bytes).
    • Calculates item_size = type_metadata->vw_size().
    • next_data_offset = current_data_offset + item_size.
    • If next_data_offset <= data_offset_limit (enough data):
      • Calls _AGDispatchEquatable(lhs_data_at_offset, rhs_data_at_offset, type_metadata, witness_table). (Offset 0x35084)
      • If false, calls failed() and sets return_status_flag = 0.
    • Else (not enough data):
      • Calls AG::LayoutDescriptor::compare_bytes(lhs_data_at_offset, rhs_data_at_offset, remaining_data_length, &mismatch_offset, nullptr). (Offset 0x1BBA4)
      • If false, calls failed() and sets return_status_flag = 0.
    • Advances layout_opcode_stream_ptr by 17. Advances current_data_offset.
    • Comment: Compares Swift Equatable types. Uses witness table if full data available, otherwise byte-compares remainder.
  • Opcode 2 (Indirect):

    • (Starts around 0x1C2C8 + ~0x350)
    • Reads swift::metadata* target_type_metadata and ValueLayout* layout_cache_in_stream_val from stream.
    • container_metadata is derived from the parent enum on the Compare object's internal enum stack (e.g., this_ptr->enum_stack[depth-1].type).
    • field_size = container_metadata->vw_size().
    • next_data_offset = current_data_offset + field_size.
    • Calls AG::LayoutDescriptor::compare_indirect(&layout_cache_in_stream_val, container_metadata, target_type_metadata, processed_flags, lhs_data_at_offset, rhs_data_at_offset). (Offset 0x1BC40)
    • The (potentially updated) layout_cache_in_stream_val is written back to the layout stream.
    • If false, calls failed() and sets return_status_flag = 0.
    • Advances layout_opcode_stream_ptr and current_data_offset.
    • Comment: Compares indirect types. Resolves indirection using container metadata, then compares target data using target metadata and its layout (possibly cached in stream).
  • Opcode 3 (Existential):

    • (Starts around 0x1C2C8 + ~0x2E2)
    • Reads swift::existential_type_metadata* existential_meta from stream.
    • item_size = existential_meta->vw_size().
    • next_data_offset = current_data_offset + item_size.
    • Calls AG::LayoutDescriptor::compare_existential_values(existential_meta, lhs_data_at_offset, rhs_data_at_offset, processed_flags). (Offset 0x1BED8)
    • If false, calls failed() and sets return_status_flag = 0.
    • Advances layout_opcode_stream_ptr and current_data_offset.
    • Comment: Compares Swift existential types (e.g., any P). Projects contained values and compares them using their dynamic types.
  • Opcodes 4, 5 (HeapRef, Function/WeakRef):

    • (Starts around 0x1C2C8 + ~0x57D)
    • Reads lhs_object_ptr and rhs_object_ptr from data buffers (each 8 bytes).
    • If pointers differ, calls AG::LayoutDescriptor::compare_heap_objects(lhs_object_ptr, rhs_object_ptr, processed_flags, is_opcode_5_or_function_flag). (Offset 0x1CBA0)
    • If false, calls failed() and sets return_status_flag = 0.
    • Advances current_data_offset by 8. layout_opcode_stream_ptr advances by 1.
    • Comment: Compares Swift heap-allocated objects (classes, functions). Opcode 5 might denote special handling for weak/unowned or functions.
  • Opcode 6 (Nested Layout - Varint Length):

    • (Starts around 0x1C2C8 + ~0x1C5)
    • Reads nested_layout_stream_ptr (absolute pointer) from current stream.
    • Reads nested_layout_length (ULEB128) from current stream.
    • Calculates comparison_length = min(nested_layout_length, remaining_data_in_buffers).
    • Recursively calls this->operator()(nested_layout_stream_ptr, lhs_data_base_cached, rhs_data_base_cached, current_data_offset, comparison_length, processed_flags).
    • If false, sets return_status_flag = 0.
    • Advances layout_opcode_stream_ptr (past pointer and ULEB128). Advances current_data_offset by comparison_length.
    • Comment: Processes a nested layout. The nested layout stream itself is pointed to from the current stream.
  • Opcode 7 (Compact Nested Layout - Fixed Offset/Length):

    • (Starts around 0x1C2C8 + ~0x589)
    • Reads fixed_offset (int32) and fixed_length (uint16) from current stream.
    • nested_layout_stream_ptr = AG::LayoutDescriptor::base_address + fixed_offset.
    • Calculates comparison_length = min(fixed_length, remaining_data_in_buffers).
    • Recursively calls this->operator()(nested_layout_stream_ptr, ...).
    • If false, sets return_status_flag = 0.
    • Advances layout_opcode_stream_ptr (by 7 bytes). Advances current_data_offset by comparison_length.
    • Comment: Processes a nested layout. The nested layout stream is at a fixed offset from AG::LayoutDescriptor::base_address.
  • Opcode 8 (Enum Tag - ULEB128):

    • (Starts around 0x1C2C8 + ~0x571)
    • Reads enum_tag_value (ULEB128) from stream. This becomes current_opcode_or_status.
    • Jumps to common enum processing logic (LABEL_23 at 0x1C2C8 + ~0x554).
    • Comment: Start of an enum case block, tag read as ULEB128.
  • Opcodes 9, 10, 11 (Enum Start - Fixed Tags 0, 1, 2):

    • (Starts around 0x1C2C8 + ~0x52A)
    • current_opcode_or_status is adjusted (opcode - 9) to be 0, 1, or 2 (internal Enum::Mode).
    • (Common Enum Logic at LABEL_23, 0x1C2C8 + ~0x554):
      • Reads swift::metadata* enum_type_metadata from stream.
      • Gets get_enum_tag_func from enum_type_metadata->VWT.
      • Calls get_enum_tag_func for both LHS and RHS data at current_data_offset to get lhs_enum_tag, rhs_enum_tag.
      • If tags differ, calls failed() and sets return_status_flag = 0.
      • If tags match:
        • Prepares payload buffers: if use_copied_payloads_for_enum is true, allocates temporary buffers (stack or heap based on size) and sets needs_free_flag. Otherwise, uses original data pointers.
        • Constructs AG::LayoutDescriptor::Compare::Enum object on this_ptr's internal enum stack. (Constructor at 0x1BFD4)
          • Stores: enum_type_metadata, use_copied_payloads_for_enum (as mode), rhs_enum_tag, current_data_offset, original LHS/RHS data pointers, payload buffer pointers, needs_free_flag.
          • If copying, calls VWT initializeBufferWithCopyOfBuffer and destructiveProjectEnumData.
        • Increments this_ptr->enum_stack_depth_member.
        • Adjusts lhs_data_base_cached and rhs_data_base_cached to point to the (potentially copied and projected) enum payloads for subsequent opcodes within this enum case.
        • current_data_offset is effectively reset to 0 relative to these new base payload pointers.
      • Advances layout_opcode_stream_ptr.
    • Comment: Start of an enum case block. Compares enum tags. If match, pushes context onto enum stack and adjusts data pointers to enum payload.
  • Opcode 12 (Enum Continue Tag - ULEB128):

    • (Starts around 0x1C2C8 + ~0x2A5)
    • Reads enum_tag_value (ULEB128) from stream. This becomes current_opcode_or_status.
    • Jumps to common enum processing logic (LABEL_53 at 0x1C2C8 + ~0x28A).
    • Comment: Subsequent enum case, tag read as ULEB128.
  • Opcodes 13-21 (Enum Continue - Fixed Tags 0-8):

    • (Starts around 0x1C2C8 + ~0x533)
    • current_opcode_or_status is adjusted (opcode - 13) to be 0-8.
    • (Common Enum Logic at LABEL_54, 0x1C2C8 + ~0x280):
      • Retrieves top Enum object from this_ptr's internal enum stack.
      • If current_opcode_or_status (adjusted tag from current layout opcode) does not match the enum_tag stored in the top Enum object (which was the tag of the actual data), then it skips subsequent layout opcodes until the next enum-related opcode (another continue, or end).
    • Comment: Subsequent enum case. If this case's tag doesn't match data's tag, skip its layout.
  • Opcode 22 (Enum End):

    • (Starts around 0x1C2C8 + ~0x1F3)
    • Pops an Enum object from this_ptr's internal enum stack.
    • Restores lhs_data_base_cached and rhs_data_base_cached from the popped Enum object (if payloads were copied).
    • current_data_offset is updated to popped_enum_object.original_offset + enum_type_size.
    • Decrements this_ptr->enum_stack_depth_member.
    • Calls destructor for the popped Enum object (AG::LayoutDescriptor::Compare::Enum::~Enum at 0x1C0C0).
    • Comment: End of an enum case block. Pops context from enum stack, restores data pointers and offset.
  • Default Opcodes (includes 0x80-0xFF for byte comparison):

    • (Starts around 0x1C2C8 + ~0x5E5)
    • If opcode >= 0 (i.e., the remaining values 0x17-0x3F, since 0x00-0x16 are dispatched by the switch cases above and 0x40-0x7F were handled before the switch): This path is a simple byte skip of opcode + 1 bytes. current_data_offset += (opcode + 1).
    • If opcode < 0 (i.e., 0x80-0xFF):
      • compare_length = (opcode & 0x7F) + 1.
      • actual_compare_length = min(compare_length, remaining_data_in_buffers).
      • Calls AG::LayoutDescriptor::compare_bytes(lhs_data_at_offset, rhs_data_at_offset, actual_compare_length, &mismatch_offset, nullptr).
      • If false, calls failed() (passing mismatch_offset + current_data_offset as failure point) and sets return_status_flag = 0.
      • Advances current_data_offset by actual_compare_length.
    • Comment: Handles simple byte skips (positive opcodes not otherwise handled) or direct byte comparisons (negative opcodes).
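To make the dispatch concrete, here is a small interpreter for the operand-free subset of the opcodes: End of Layout (0x00), both skip ranges, and the byte-comparison range (0x80-0xFF). It is a sketch, not the decompiled function. The names are hypothetical, opcodes 0x01-0x16 are reported as unsupported, memcmp stands in for compare_bytes, and returning at the first mismatch is an assumption (the notes only say the status is set to 0).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

enum class SubsetResult { Equal, Unequal, Unsupported };

inline SubsetResult compare_subset(const std::uint8_t* stream,
                                   const std::uint8_t* lhs,
                                   const std::uint8_t* rhs,
                                   std::uint64_t offset,
                                   std::uint64_t limit) {
    while (offset < limit) {
        const std::uint8_t op = *stream++;
        if (op == 0x00) {
            return SubsetResult::Equal;                 // End of Layout
        }
        if (op >= 0x80) {                               // direct byte comparison
            const std::uint64_t want = (op & 0x7Fu) + 1u;
            const std::uint64_t len = std::min(want, limit - offset);
            if (std::memcmp(lhs + offset, rhs + offset, len) != 0) {
                return SubsetResult::Unequal;
            }
            offset += len;
        } else if (op >= 0x40) {                        // skip (op & 0x3F) + 1 bytes
            offset += (op & 0x3Fu) + 1u;
        } else if (op >= 0x17) {                        // default-case skip: op + 1 bytes
            offset += op + 1u;
        } else {
            return SubsetResult::Unsupported;           // 0x01-0x16 are not modelled
        }
    }
    return SubsetResult::Equal;
}
```

For instance, the stream {0x83, 0x43, 0x83, 0x00} compares bytes 0-3, ignores bytes 4-7, and compares bytes 8-11.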

Final Cleanup (around 0x1C2C8 + ~0x6A0):

  • Loops while this_ptr->enum_stack_depth_member > initial_enum_stack_depth.
  • In each iteration, pops an Enum object from the stack and calls its destructor.
  • Ensures all pushed enum contexts are cleaned up.
  • Returns current_opcode_or_status (which holds the final 1 or 0).
  • Comment: Ensures all enum processing contexts are unwound and destroyed before returning the comparison result.
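The cleanup loop behaves like a scope guard that unwinds the enum stack back to the depth recorded on entry. A minimal sketch, assuming a std::vector as the stack and a hypothetical EnumContext type (the real Enum object stores more state and its destructor frees copied payloads):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced stand-in for AG::LayoutDescriptor::Compare::Enum.
struct EnumContext {
    unsigned tag;
    std::size_t original_offset;
};

// Records the stack depth on construction (initial_enum_stack_depth) and
// pops every context pushed after that point when it goes out of scope.
class EnumStackGuard {
public:
    explicit EnumStackGuard(std::vector<EnumContext>& stack)
        : stack_(stack), initial_depth_(stack.size()) {}

    ~EnumStackGuard() {
        while (stack_.size() > initial_depth_) {
            stack_.pop_back();  // runs the context's destructor
        }
    }

    EnumStackGuard(const EnumStackGuard&) = delete;
    EnumStackGuard& operator=(const EnumStackGuard&) = delete;

private:
    std::vector<EnumContext>& stack_;
    std::size_t initial_depth_;
};
```

Contexts that were already on the stack before the call are left untouched, which matches the comparison against initial_enum_stack_depth rather than against zero.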

AG::LayoutDescriptor::fetch

Mangled Name: __ZN2AG16LayoutDescriptor5fetchEPKNS_5swift8metadataEji

Relative Offset: 0x1EC20 (Absolute in IDA: 0x1B1710C20 with base 0x1B16F2000)

Overall Purpose: This is a static function that serves as the primary public entry point for obtaining a ValueLayout. It ensures a singleton TypeDescriptorCache is initialized and then delegates the actual fetching logic to AG::(anonymous namespace)::TypeDescriptorCache::fetch. It passes the type_metadata, options, priority, and an explicit HeapMode(0) to the cache's fetch method.

Key Behavior from Decompilation:

  • Uses a dispatch_once like mechanism for TypeDescriptorCache::_shared_cache.
  • Directly returns the result of AG::(anonymous namespace)::TypeDescriptorCache::fetch(...).
  • This top-level fetch does not itself translate ValueLayoutTrivial (1) into nullptr. If TypeDescriptorCache::fetch returns 1, this function will also return 1.

AG::(anonymous namespace)::TypeDescriptorCache::fetch

Mangled Name: __ZN2AG12_GLOBAL__N_119TypeDescriptorCache5fetchEPKNS_5swift8metadataEjNS_16LayoutDescriptor8HeapModeEi

Relative Offset: 0x1EC90 (Absolute in IDA: 0x1B1710C90 with base 0x1B16F2000)

Parameters (effective): (TypeDescriptorCache* this, const swift::metadata* type, AGComparisonOptions options, LayoutDescriptor::HeapMode heap_mode, int priority). The mangled name encodes options as unsigned int (j) and priority as int (i).

Overall Purpose: Manages a cache of ValueLayouts. If a layout is cached, returns it. Otherwise, generates it, caches it, and returns it. Handles synchronous and asynchronous layout generation.

Key Behavior from Decompilation:

  1. Cache Key: Derived from type metadata pointer and the lower byte of options.
  2. Cache Lookup: Uses an os_unfair_lock and an UntypedTable for caching.
  3. Synchronous vs. Asynchronous Path:
    • Determined by a global atomic flag and bit 9 (0x200) of the options argument (likely AGComparisonOptionsFetchLayoutsSynchronously).
    • Synchronous Path (if options & 0x200 is nonzero, or the global flag dictates):
      • Calls AG::LayoutDescriptor::make_layout(type, (uint8_t)options, heap_mode).
      • The result from make_layout (which can be nullptr, ValueLayoutTrivial (1), or a valid layout pointer) is inserted into the cache and returned directly.
    • Asynchronous Path:
      • Inserts nullptr into the cache as a placeholder.
      • Queues the layout creation request.
      • Dispatches TypeDescriptorCache::drain_queue to a global queue.
      • Returns nullptr immediately.
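The two paths can be modelled with a small single-threaded cache. Everything here is a sketch with hypothetical names: a std::map replaces the UntypedTable, the os_unfair_lock and the global atomic flag are omitted, make_layout is injected as a callback, and replacing the placeholder when the queue is drained is an assumption about what drain_queue does.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <utility>

using ValueLayout = const unsigned char*;
// Cache key: (type metadata address, low byte of options).
using Key = std::pair<std::uintptr_t, std::uint8_t>;

class LayoutCacheSketch {
public:
    using MakeLayout = std::function<ValueLayout(const void* type, std::uint8_t options)>;

    explicit LayoutCacheSketch(MakeLayout make) : make_(std::move(make)) {}

    ValueLayout fetch(const void* type, std::uint32_t options) {
        const Key key(reinterpret_cast<std::uintptr_t>(type),
                      static_cast<std::uint8_t>(options));
        const auto it = table_.find(key);
        if (it != table_.end()) {
            return it->second;               // cached value or placeholder
        }
        if (options & 0x200u) {              // synchronous path
            const ValueLayout layout = make_(type, key.second);
            table_[key] = layout;
            return layout;
        }
        table_[key] = nullptr;               // asynchronous path: placeholder
        queue_.push_back(key);
        return nullptr;
    }

    // Stand-in for drain_queue; the original runs on a global dispatch queue.
    void drain_queue() {
        while (!queue_.empty()) {
            const Key key = queue_.front();
            queue_.pop_front();
            table_[key] = make_(reinterpret_cast<const void*>(key.first), key.second);
        }
    }

private:
    MakeLayout make_;
    std::map<Key, ValueLayout> table_;
    std::deque<Key> queue_;
};
```

Because the key uses only the low byte of options, a lookup with 0x202 and one with 0x002 share the same cache entry in this model.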

Observed Behavior with Accessor:

  • When called without the synchronous flag (0x200):
    • For Int.self (options 0), TestPoint.self (options 2), and ClosureHolder.self (options 2), this function (and thus the top-level fetch) returned nullptr. This was likely due to taking the asynchronous path.
  • When called with the synchronous flag (options | 0x200):
    • For Int.self (options 0 | 0x200), it still returned nullptr.
    • For TestPoint.self (options 2 | 0x200), it still returned nullptr.
    • For ClosureHolder.self (options 2 | 0x200), it returned a valid layout pointer (e.g., 0x000000013780dc03), which decodes to [0x87, 0x05, 0x87, 0x00].
    • Interpretation of nullptr return in synchronous mode (for Int/TestPoint): This means AG::LayoutDescriptor::make_layout itself is returning nullptr. This likely happens when make_layout (via type.visit(builder)) determines that no specific layout stream is necessary for that type/option combination, and a simpler comparison (e.g., bitwise) is sufficient, signaled by returning nullptr.
    • Layout for Non-Equatable Enums: For NonEquatableTestEnum (options 0 | 0x200), fetch generated [0x09, 0x87, 0x0e, 0x8f, 0x0f, 0x8f, 0x16, 0x00]. This omitted the .stringCase and used padded comparison sizes for payloads.
    • Layout for Equatable Types: For MyEquatableStruct and TestAssocEnum (options 3 | 0x200), fetch generated [0x01 (Equals), 0x00].
    • Layout for Structs with PODs (SimpleFieldsStruct, bitwise opts): fetch returned ValueLayoutTrivial (1).
    • Layout for Structs with Class/Weak Refs (StructWithClassField, StructWithWeakField, bitwise opts): fetch returned ValueLayoutTrivial (1). Opcodes 0x04 (HeapRef) or 0x05 (for weak ref field) were not generated.
    • Layout for Indirect Enums (NonEquatableIndirectEnum, bitwise opts): fetch generated a structural enum layout comparing the indirect case's pointer with 0x87, not Opcode 0x02.
    • Layout for Enums with Many Simple Cases (ManyCasesEnum, bitwise opts): fetch returned nil. ULEB128 enum tag opcodes (0x08, 0x0C) were not generated.
    • Layout for Struct with Closure (ViewWithAdvancedClosure, options 2 | 0x200): fetch generated [0x87 (Int field), 0x87 (closure func_ptr), 0x05 (closure context_ptr), 0x00].
    • Opcode 0x05 (Function/WeakRef): Observed in ClosureHolder's and ViewWithAdvancedClosure's layout for the closure context pointer. Not generated for simple weak var fields in structs that AG deemed bitwise comparable.
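When reading the byte listings above, a small opcode describer helps. The sketch below maps a single opcode byte to a label using the opcode table from the operator() analysis. It assumes the listings show opcodes only, with operand bytes such as the metadata pointer after 0x09 omitted; the function name and label strings are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline std::string describe_opcode(std::uint8_t op) {
    if (op >= 0x80) {
        return "compare " + std::to_string((op & 0x7F) + 1) + " bytes";
    }
    if (op >= 0x40) {
        return "skip " + std::to_string((op & 0x3F) + 1) + " bytes";
    }
    switch (op) {
        case 0x00: return "end";
        case 0x01: return "equals";
        case 0x02: return "indirect";
        case 0x03: return "existential";
        case 0x04: return "heap ref";
        case 0x05: return "function/weak ref";
        case 0x06: return "nested layout";
        case 0x07: return "compact nested layout";
        case 0x08: return "enum start (ULEB128 tag)";
        case 0x0C: return "enum continue (ULEB128 tag)";
        case 0x16: return "enum end";
        default: break;
    }
    if (op >= 0x09 && op <= 0x0B) {
        return "enum start tag " + std::to_string(op - 0x09);
    }
    if (op >= 0x0D && op <= 0x15) {
        return "enum continue tag " + std::to_string(op - 0x0D);
    }
    return "skip " + std::to_string(op + 1) + " bytes";  // 0x17-0x3F
}
```

Applied to the ClosureHolder layout [0x87, 0x05, 0x87, 0x00], this gives "compare 8 bytes", "function/weak ref", "compare 8 bytes", "end".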

Opcode Details (with observations from GeneratedLayoutCoverageTests)

  • Opcode 0x01 (Equals - Swift Equatable):

    • Generated by fetch for MyEquatableStruct and TestAssocEnum when EquatableAlways options were used.
  • Opcode 0x02 (Indirect):

    • Not observed in the layout for NonEquatableIndirectEnum (bitwise options); the indirect case's pointer was compared with 0x87. This suggests Opcode 0x02 is for more complex indirect scenarios or different comparison options.
  • Opcode 0x03 (Existential):

    • Testing was skipped due to complexity of getting metadata for any P. Generation conditions remain unknown from these tests.
  • Opcode 0x04 (HeapRef):

    • Not observed in the layout for StructWithClassField (bitwise options); fetch returned ValueLayoutTrivial. This suggests Opcode 0x04 is for specific heap object types or scenarios not covered by a simple class field in a POD-like struct under bitwise comparison.
  • Opcode 0x05 (Function/WeakRef):

    • Observed in the fetched layouts for ClosureHolder and ViewWithAdvancedClosure, specifically for their closure context pointer fields.
    • When processing this opcode, operator() reads an 8-byte context pointer from both LHS and RHS data buffers. It then calls AG::LayoutDescriptor::compare_heap_objects(lhs_ctx_ptr, rhs_ctx_ptr, ..., is_function_type=true).
    • Behavior of compare_heap_objects for function contexts (is_function_type=true):
      1. If lhs_ctx_ptr == rhs_ctx_ptr (pointers are identical, including both being nil), returns true.
      2. Else if one pointer is nil and the other is not, returns false.
      3. Else (pointers are different but both non-nil): It then compares the memory content of the context objects pointed to by lhs_ctx_ptr and rhs_ctx_ptr. If the content is identical (e.g., via memcmp of a size determined by the context object's type, or a fixed size for simple captures), it returns true. Otherwise, it returns false.
    • This means two distinct closure instances (different context object addresses) can still compare as equal via Opcode 0x05 if their function pointers match (handled by a preceding opcode like 0x87) AND the content of their capture contexts is identical.
    • Not observed for the weak var field in StructWithWeakField (bitwise options); fetch returned ValueLayoutTrivial. This implies Opcode 0x05 is specialized for function/closure contexts or other specific reference types that might involve content comparison, not general weak references if the containing struct is simple and compared bitwise.
  • Opcodes 0x06, 0x07 (Nested Layouts):

    • Not specifically targeted with types expected to generate them. Conditions for their generation remain to be explored.
  • Opcode 0x08 (Enum Tag - ULEB128), 0x0C (Enum Continue Tag - ULEB128):

    • Not observed in the layout for ManyCasesEnum (11 no-payload cases, bitwise options); fetch returned nil. This suggests these ULEB tag opcodes are for enums with many payload-bearing cases requiring distinct layouts, not just a high number of simple cases.
    • NonEquatableTestEnum used fixed tag opcodes (0x09, 0x0E, 0x0F).
  • Opcode 0x09 (Enum Start Fixed Tag 0):

    • Indicates the start of enum processing, expecting data tag 0.
    • Followed by 8 bytes in the layout stream for the swift::metadata* of the enum type.
    • operator() compares the actual tag of the LHS/RHS data. If it matches this opcode's tag (0), it proceeds to the payload layout for this case. Otherwise, it should skip to the next Enum Continue or Enum End.
    • Pushes an Enum context onto its internal stack.
  • Opcode 0x0E (Enum Continue Fixed Tag 1), 0x0F (Enum Continue Fixed Tag 2):

    • 0x0D + N. Indicates layout for case with tag N.
    • operator() checks if the data's actual tag (stored in the top Enum context) matches this case's tag N.
    • If it matches, it processes the subsequent payload layout.
    • If it does not match, it skips opcodes in the layout stream until the next enum control opcode (another Enum Continue or Enum End).
  • Opcode 0x16 (Enum End):

    • Signifies the end of all case layouts for the current enum being processed.
    • operator() pops the current Enum context from its internal stack, restoring data pointers and offsets to what they were before this enum block started, adjusted by the enum type's size.
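Opcodes 0x08 and 0x0C, as well as the nested-layout length of opcode 0x06, carry their operand as ULEB128. Since none of the tested types produced these opcodes, a decoder is useful mainly for hand-crafted streams. The following is a standard ULEB128 reader; the function name is hypothetical and it performs no bounds checking on the stream.

```cpp
#include <cassert>
#include <cstdint>

// Decodes one ULEB128 value starting at cursor and advances the cursor
// past the last byte of the encoding.
inline std::uint64_t read_uleb128(const std::uint8_t*& cursor) {
    std::uint64_t value = 0;
    unsigned shift = 0;
    for (;;) {
        const std::uint8_t byte = *cursor++;
        if (shift < 64) {
            // Each byte contributes its low seven bits, least significant first.
            value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        }
        if ((byte & 0x80) == 0) {
            return value;  // high bit clear marks the final byte
        }
        shift += 7;
    }
}
```

Tags below 128 occupy a single byte, so an enum would need at least 128 cases before the ULEB128 form grows beyond one operand byte.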

This markdown file should serve as a good reference for the analyzed functions.

Session Memory: AttributeGraph Function Accessor (AGBridge)

1. Primary Goal

  • Analyze key functions within the macOS AttributeGraph.framework (e.g., AG::LayoutDescriptor::Compare::operator(), AG::LayoutDescriptor::fetch).
  • Create a C++ accessor module (AGBridge) with a C interface for Swift to call these original AttributeGraph functions.
  • Purpose: Enable comparative testing, understand internal behaviors, and provide a stable way to call these non-public APIs.

2. Key Files and Structure (within OpenGraph Swift Package)

  • C++ Accessor Target (AGBridge): (Located in Sources/AGBridge/)

    • FindModule.h / FindModule.cpp: Implements module/symbol finding using _dyld_* and dlfcn.
    • AGAccessor.h: Declares the C++ AGAccessor class, which holds function pointers to AttributeGraph functions (e.g., m_fn_LayoutDescriptor_Compare_Operator, m_fn_LayoutDescriptor_Fetch) and any necessary placeholder data (like AG::LayoutDescriptor::ComparePlaceholder).
    • AGAccessor.cpp: Implements the AGAccessor class. Its constructor finds the AttributeGraph module, resolves function addresses (via dlsym or offset fallbacks), and initializes the function pointers.
    • AGBridge.h: Declares the public C API for Swift. This includes:
      • The opaque handle AGAccessorHandle.
      • C-compatible type definitions (e.g., CValueLayout, CAGComparisonOptions).
      • Bridged functions like AGBridge_AccessorCreate(), AGBridge_AccessorDestroy(), AGBridge_AccessorIsInitialized(), AGBridge_Call_LayoutDescriptor_Compare_Operator(), AGBridge_Call_LayoutDescriptor_Fetch(), AGBridge_FindAttributeGraphBaseAddress().
    • AGBridge.cpp: Implements the C bridge functions, which internally use the AGAccessor C++ class.
    • (Note: module.modulemap and a separate umbrella header like Accessor.h were removed as SwiftPM can often infer the module from AGBridge.h if publicHeadersPath in Package.swift is set correctly, e.g., to "." relative to Sources/AGBridge/ if AGBridge.h is directly there.)
  • Swift Test Target (OpenGraphCompatibilityTests):

    • Various test files (e.g., BasicTypeLayoutCompareTests.swift, SwiftUIViewLayoutTests.swift) that import AGBridge and use the C bridge functions.
    • Package.swift: Defines the AGBridge target (C++ capable) and makes test targets (and potentially other library targets like OpenGraph) depend on it.

3. Key Addresses and Offsets (for macOS AttributeGraph binary)

  • IDA Image Base (used for deriving these offsets): 0x1B16F2000

  • Runtime Module Base (example found by tests): 0x1c447a000 (ASLR means this changes per run)

  • AG::LayoutDescriptor::Compare::operator():

    • Mangled: __ZN2AG16LayoutDescriptor7CompareclEPKhS3_S3_mmj
    • Relative Offset: 0x1C2C8
    • Signature for pointer type AGLayoutDescriptorCompareOperatorFuncPtr: bool (*FuncPtr)(AG::LayoutDescriptor::ComparePlaceholder*, ValueLayout layout, const unsigned char* lhs, const unsigned char* rhs, uint64_t offset, int64_t length, AGComparisonOptions flags)
  • AG::LayoutDescriptor::fetch:

    • Mangled: __ZN2AG16LayoutDescriptor5fetchEPKNS_5swift8metadataEji
    • Relative Offset: 0x1EC20
    • Signature for pointer type AGLayoutDescriptorFetchFuncPtr: ValueLayout (*FuncPtr)(const void* swift_metadata, AGComparisonOptions options, uint32_t priority)
  • Other Function Relative Offsets (macOS binary, base 0x1B16F2000):

    • AG::LayoutDescriptor::Compare::failed: 0x1C1CC
    • AG::LayoutDescriptor::compare_bytes: 0x1BBA4
    • _AGDispatchEquatable: 0x35084
    • AG::LayoutDescriptor::compare_indirect: 0x1BC40
    • AG::LayoutDescriptor::compare_existential_values: 0x1BED8
    • AG::LayoutDescriptor::compare_heap_objects: 0x1CBA0
    • AG::LayoutDescriptor::Compare::Enum::Enum (constructor): 0x1BFD4
    • AG::LayoutDescriptor::Compare::Enum::~Enum (destructor): 0x1C0C0
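Turning these relative offsets into callable addresses is plain base-plus-offset arithmetic. The sketch below shows the rebasing and an offset fallback of the kind AGAccessor is described as using. The helper names are hypothetical, and symbol_lookup_result stands for whatever a symbol lookup such as dlsym returned (nullptr on failure).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kIdaImageBase = 0x1B16F2000ull;

// Offset of an IDA absolute address relative to the IDA image base.
constexpr std::uint64_t relative_offset(std::uint64_t ida_address) {
    return ida_address - kIdaImageBase;
}

// Address of the same code in a module loaded at runtime_base.
constexpr std::uint64_t runtime_address(std::uint64_t runtime_base, std::uint64_t rel) {
    return runtime_base + rel;
}

// Prefers a successful symbol lookup and falls back to base + offset.
inline std::uint64_t resolve_or_offset(const void* symbol_lookup_result,
                                       std::uint64_t runtime_base,
                                       std::uint64_t rel) {
    if (symbol_lookup_result != nullptr) {
        return reinterpret_cast<std::uintptr_t>(symbol_lookup_result);
    }
    return runtime_address(runtime_base, rel);
}

// Reinterprets a resolved address as a function pointer of type Fn.
template <typename Fn>
inline Fn function_at(std::uint64_t address) {
    return reinterpret_cast<Fn>(static_cast<std::uintptr_t>(address));
}
```

With the example runtime base 0x1c447a000, operator() at relative offset 0x1C2C8 resolves to 0x1C44962C8. The fallback is only valid while the offsets match the binary version actually loaded.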

4. Current Status and Key Findings

  • The AGBridge module (C++/C accessor) is complete and functional.
  • Symbol Discovery: dlsym (both with specific module handle and RTLD_DEFAULT) fails to find the mangled names for the private C++ operator() and fetch functions in the release AttributeGraph.framework.
  • Fallback to Offsets: The AGAccessor class correctly falls back to using the hardcoded relative offsets, which works when the offsets match the runtime binary version.
  • AGBridge_Call_LayoutDescriptor_Fetch Behavior:
    • When calling the original AG::LayoutDescriptor::fetch (via AGBridge_Call_LayoutDescriptor_Fetch) with synchronous flag (0x200 added to options):
      • For Int.self (options 0 | 0x200), it returns nil.
      • For TestPoint.self (struct of two Doubles, options 2 | 0x200), it also returns nil.
      • For ClosureHolder.self (struct with a closure and an Int, options 2 | 0x200), it returns a valid layout pointer, which decodes to [0x87, 0x05, 0x87, 0x00].
    • A nil return (for Int/TestPoint/ManyCasesEnum) or ValueLayoutTrivial (1) (for SimpleFieldsStruct, StructWithClassField, StructWithWeakField) indicates that AG::LayoutDescriptor::make_layout likely returned nullptr or 1 because the type/options combination was deemed simple enough for direct bitwise comparison, not requiring a complex layout stream for operator().
    • For MyEquatableStruct and TestAssocEnum (Equatable types, EquatableAlways sync options), AGBridge_Call_LayoutDescriptor_Fetch returned [0x01 (Equals), 0x00].
    • For ViewWithAdvancedClosure (struct with Int and capturing closure, EquatableUnlessPOD sync options), AGBridge_Call_LayoutDescriptor_Fetch returned [0x87 (id), 0x87 (func_ptr), 0x05 (context_ptr), 0x00].
    • For NonEquatableIndirectEnum (bitwise sync options), AGBridge_Call_LayoutDescriptor_Fetch returned [0x09, 0x87, 0x0e, 0x87, 0x16, 0x00], comparing the indirect case's pointer with 0x87 (not Opcode 0x02).
    • For NonEquatableTestEnum (bitwise sync options), AGBridge_Call_LayoutDescriptor_Fetch returned [0x09, 0x87, 0x0e, 0x8f, 0x0f, 0x8f, 0x16, 0x00], omitting .stringCase and using padded payload comparisons.
  • AGBridge_Call_LayoutDescriptor_Compare_Operator Testing:
    • Successfully tested with manually crafted and various fetched layouts.
  • Closure Comparison Insights:
    • Comparison relies on function pointer identity (e.g., via 0x87).
    • If function pointers are identical, context pointers are compared via Opcode 0x05 (which calls compare_heap_objects with is_function_type=true).
    • compare_heap_objects for function contexts:
      1. If context pointers are identical (e.g., both nil or both point to the same object), returns true.
      2. If one is nil and the other isn't, returns false.
      3. If context pointers are different but both non-nil, it compares the memory content of the context objects. If content is identical, returns true; otherwise false.
    • This means distinct closure instances (different context object addresses) can compare as equal if their function pointers are the same AND their captured states (context object content) are identical. This was confirmed with ViewWithAdvancedClosure Scenario B.
    • The makeValueCapturingClosure factory consistently produced closures with identical function pointers when called multiple times.
  • operator() with ValueLayoutTrivial: Confirmed to crash; operator() expects a valid, dereferenceable layout stream.
  • Elusive Opcodes: Opcodes 0x02 (Indirect), 0x03 (Existential), 0x04 (HeapRef), 0x06/0x07 (Nested Layouts), 0x08/0x0C (ULEB Enum Tags) were not generated by AGBridge_Call_LayoutDescriptor_Fetch for the straightforward Swift types tested with bitwise or Equatable-favoring options. For such types AttributeGraph often simplifies the result to nil, ValueLayoutTrivial, or plain byte comparisons.
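The closure comparison rules listed under Closure Comparison Insights can be written down as a short model. This is a sketch of the observed behavior, not of the decompiled compare_heap_objects. The type and function names are hypothetical, and the context size is passed in explicitly because the notes do not establish how the original determines it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Rules 1-3 for function contexts (is_function_type = true).
inline bool compare_function_contexts(const void* lhs_ctx, const void* rhs_ctx,
                                      std::size_t context_size) {
    if (lhs_ctx == rhs_ctx) {
        return true;                       // identical pointers, including both nil
    }
    if (lhs_ctx == nullptr || rhs_ctx == nullptr) {
        return false;                      // exactly one side is nil
    }
    // Assumption: content comparison is modelled as a memcmp over a
    // caller-supplied size.
    return std::memcmp(lhs_ctx, rhs_ctx, context_size) == 0;
}

// Hypothetical two-word closure representation.
struct ClosureSketch {
    const void* function_ptr;
    const void* context_ptr;
};

inline bool closures_equal(const ClosureSketch& a, const ClosureSketch& b,
                           std::size_t context_size) {
    if (a.function_ptr != b.function_ptr) {
        return false;                      // the 0x87 byte-compare step
    }
    return compare_function_contexts(a.context_ptr, b.context_ptr, context_size);  // the 0x05 step
}
```

Two closures with the same function pointer and equal captured bytes compare equal here even though their context addresses differ, which is the case described for ViewWithAdvancedClosure Scenario B.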

5. AG::LayoutDescriptor::ComparePlaceholder Details (used by AGAccessor)

  • Mimics the AG::LayoutDescriptor::Compare object for the this pointer when calling operator().
  • Contains unsigned char internal_data[592].
  • Constructor initializes enum stack related members.
  • Reinitialized using placement new before each call to AGBridge_Call_LayoutDescriptor_Compare_Operator via the accessor to ensure a clean state.
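A placeholder of this kind can be sketched as an opaque buffer that is reset with placement new. The 592-byte size and the 0x208 offset come from the notes above; the zero-initializing constructor, the 16-byte alignment, and the 8-byte width of the depth field are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

struct ComparePlaceholderSketch {
    static constexpr std::size_t kSize = 592;
    static constexpr std::size_t kEnumStackDepthOffset = 0x208;

    alignas(16) unsigned char internal_data[kSize];

    // Assumption: an all-zero buffer represents an empty enum stack.
    ComparePlaceholderSketch() { std::memset(internal_data, 0, kSize); }

    // Reads the depth field; its 8-byte width is assumed, not verified.
    std::uint64_t enum_stack_depth() const {
        std::uint64_t depth = 0;
        std::memcpy(&depth, internal_data + kEnumStackDepthOffset, sizeof depth);
        return depth;
    }
};

// Destroys and reconstructs the object in place so that each call into
// operator() starts from a clean state.
inline void reset(ComparePlaceholderSketch& placeholder) {
    placeholder.~ComparePlaceholderSketch();
    new (&placeholder) ComparePlaceholderSketch();
}
```

Reusing one buffer this way avoids a heap allocation per call while still guaranteeing that state left behind by a previous comparison cannot leak into the next one.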

6. Documentation Created (and updated for AGBridge refactor)

  • AttributeGraphFunctionAnalysis.md: Detailed comments on operator(), fetch, TypeDescriptorCache::fetch, and observed enum opcodes.
  • Reversing.md: General lessons learned from this reverse engineering process.

7. Key Observations Summary and Future Test Ideas

Key Observations So Far:

  1. AG::LayoutDescriptor::Compare::operator() Functionality:
    • Requires a valid, dereferenceable layout stream. Crashes if passed nullptr or ValueLayoutTrivial ((void*)1).
  2. AG::LayoutDescriptor::fetch Functionality (with Synchronous Flag 0x200):
    • Delegates to TypeDescriptorCache::fetch, which calls make_layout.
    • Simple PODs (Int, TestPoint, SimpleFieldsStruct with bitwise opts): Returns nullptr or ValueLayoutTrivial (1), indicating bitwise comparison is preferred by AG.
    • Equatable Types (MyEquatableStruct, TestAssocEnum with EquatableAlways opts): Returns [0x01 (Equals), 0x00], deferring to Swift's ==.
    • ClosureHolder & ViewWithAdvancedClosure (EquatableUnlessPOD opts): Returns structural layout like [0x87 (field1), 0x87 (closure_func_ptr), 0x05 (closure_context_ptr), 0x00].
    • Non-Equatable Enums:
      • NonEquatableTestEnum (bitwise opts): Returns structural enum layout [0x09, 0x87, 0x0e, 0x8f, 0x0f, 0x8f, 0x16, 0x00]. Omits .stringCase; uses padded payload comparisons.
      • NonEquatableIndirectEnum (bitwise opts): Returns structural enum layout [0x09, 0x87, 0x0e, 0x87, 0x16, 0x00]. The indirect case's pointer is compared with 0x87 (direct byte compare), not Opcode 0x02 (Indirect).
      • ManyCasesEnum (no payloads, bitwise opts): Returns nil. ULEB128 enum tag opcodes (0x08, 0x0C) not generated.
    • Structs with References (StructWithClassField, StructWithWeakField with bitwise opts): Return ValueLayoutTrivial (1). Opcodes 0x04 (HeapRef) or 0x05 (for weak ref field) not generated for these fields.
  3. Closure Comparison (Structural):
    • Relies on function pointer identity (e.g., compared via 0x87).
    • If function pointers match, context pointers are processed by Opcode 0x05 -> compare_heap_objects.
    • compare_heap_objects (for function contexts):
      • Checks for context pointer equality first.
      • If context pointers differ but are non-nil, it compares the content of the context objects.
    • This means closures with identical function pointers and identical captured state (context content) will compare as equal, even if their context objects are distinct instances at different memory addresses. This resolved the "flaky" test for ViewWithAdvancedClosure Scenario B, which now consistently passes with an expectation of true.
    • The makeValueCapturingClosure factory consistently produced closures with identical function pointers.
  4. Symbol Discovery: dlsym fails for private operator()/fetch; offset fallback is crucial.
  5. Elusive Opcodes (via fetch with tested types/options):
    • 0x02 (Indirect): Not for simple indirect enums with bitwise options.
    • 0x03 (Existential): Test skipped.
    • 0x04 (HeapRef): Not for simple class fields in POD-like structs with bitwise options.
    • 0x05 (Function/WeakRef): Specific to closure contexts so far; not for general weak var fields in POD-like structs with bitwise options.
    • 0x06/0x07 (Nested Layouts): Not specifically targeted.
    • 0x08/0x0C (ULEB Enum Tags): Not for enums with many simple cases without payloads.
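
Because fetch can return nullptr or ValueLayoutTrivial while operator() crashes on both, a caller needs a guard in front of the structural comparison. The sketch below shows one way to write it. StructuralCompare is a hypothetical, simplified wrapper signature (not the real operator() signature), and treating both sentinels as "compare bitwise" is an assumption based on the observations above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical simplified wrapper around the real operator() call.
using StructuralCompare = bool (*)(const unsigned char* layout,
                                   const unsigned char* lhs,
                                   const unsigned char* rhs,
                                   std::size_t size);

// nullptr and ValueLayoutTrivial ((void*)1) are not dereferenceable opcode streams.
inline bool is_sentinel_layout(const unsigned char* layout)
{
    return reinterpret_cast<std::uintptr_t>(layout) <= 1;
}

bool compare_values(const unsigned char* layout,
                    const unsigned char* lhs,
                    const unsigned char* rhs,
                    std::size_t size,
                    StructuralCompare structural_compare)
{
    assert(lhs != nullptr && rhs != nullptr);
    if (is_sentinel_layout(layout)) {
        return std::memcmp(lhs, rhs, size) == 0;   // bitwise fallback, never touches the layout
    }
    return structural_compare(layout, lhs, rhs, size);
}
```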

Future Test Ideas / Remaining Ambiguities:

  1. Triggering Elusive Opcodes:
    • Opcode 0x02 (Indirect): How is it generated? Try a non-POD struct containing an indirect enum, or different comparison options.
    • Opcode 0x03 (Existential): Test FetchOriginalLayoutForType with a struct field of type any Equatable or a custom any P.
    • Opcode 0x04 (HeapRef): What kind of class/heap object triggers this if not a simple class field in a bitwise-compared struct? Perhaps types with known custom retain/release (e.g., NSObject subclasses) or specific runtime flags?
    • Opcode 0x06/0x07 (Nested Layouts): Attempt with very large structs or deeply nested struct hierarchies.
    • Opcode 0x08/0x0C (ULEB Enum Tags): Test a non-Equatable enum with many (>8) cases that have distinct payload types requiring different sub-layouts.
  2. AG::LayoutDescriptor::compare_heap_objects Deep Dive:
    • Decompile this function again with the new understanding that it might compare context object content for function types. Verify this logic.
    • Determine the size/extent of memory it compares for context objects.
    • Consider exposing compare_heap_objects via the accessor to test it directly with various pointer pairs and context object states.
  3. NonEquatableTestEnum's .stringCase: Why omitted with bitwise options? Test with options: 2 | 0x200 (EquatableUnlessPOD) – will it try to use String's Equatable for the payload or still omit?
  4. Enum Payload Comparison Sizes: Further clarify the "fixed-slot" or "padded" comparison behavior for enum payloads.
  5. Other AGComparisonOptions Flags:
    • Test AGComparisonOptionsCopyOnWrite (0x100) with enums having payloads.
    • Investigate AGComparisonOptionsReportFailures (MSB).
  6. Layout for Standard Swift Collections: Array<T>, Dictionary<K,V>.
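
When a test finally produces 0x08/0x0C, a small decoder will help when reading the dumped stream by hand. The sketch below decodes standard unsigned LEB128. It assumes the operand uses the standard encoding, which these notes have not yet verified; read_uleb128 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Decode one unsigned LEB128 value at *cursor and advance the cursor past it.
// Stops at `end` if the stream is truncated.
std::uint64_t read_uleb128(const unsigned char** cursor, const unsigned char* end)
{
    assert(cursor != nullptr && *cursor != nullptr && end != nullptr);
    std::uint64_t value = 0;
    unsigned shift = 0;
    while (*cursor < end) {
        const unsigned char byte = *(*cursor)++;
        if (shift < 64) {
            value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;  // low 7 bits carry data
        }
        shift += 7;
        if ((byte & 0x80) == 0) break;                                  // high bit clear: last byte
    }
    return value;
}
```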

This summary should capture the essential context and progress of our session.

Reverse Engineering Learnings & Best Practices

This document summarizes key lessons learned, common pitfalls, and successful strategies encountered during reverse engineering efforts, particularly in the context of analyzing and interacting with compiled binaries like system frameworks.

1. Binary Versioning and Integrity

  • Match Runtimes and Analysis Targets: The single most critical factor is ensuring that the binary being analyzed (e.g., in IDA Pro, Ghidra) is the exact same version as the one being targeted at runtime (e.g., by a test harness or an application). Function offsets, internal structures, and even logic can change significantly between versions (e.g., OS updates, SDK updates).
  • dyld_shared_cache (macOS/iOS): Be aware that many system frameworks reside in the dyld_shared_cache. To analyze the correct version:
    • Identify the runtime path of the loaded module (e.g., using _dyld_get_image_name on macOS).
    • If the file at that path is directly analyzable, use it.
    • If it's part of the cache, use tools like dsc_extractor to extract the specific binary from the system's active shared cache for analysis.
  • Consequences of Mismatch: Using offsets or structural information from a mismatched binary version is a primary cause of crashes (EXC_BAD_ACCESS), incorrect behavior, and wasted debugging time.

2. Addresses, Offsets, and Pointers

  • Runtime Module Base Address: Always determine the actual base address where the target module is loaded into memory at runtime.
    • macOS: Use _dyld_image_count, _dyld_get_image_name, and _dyld_get_image_header.
    • Windows: GetModuleHandle().
    • Linux: Parse /proc/self/maps or use dl_iterate_phdr.
  • Disassembler Image Base: Note the image base address used by your disassembler (e.g., IDA's ida_nalt.get_imagebase() or idc.min_ea()).
  • Relative Offsets: Calculate relative offsets for functions and data: Relative Offset = Absolute Address in Disassembler - Disassembler Image Base.
  • Absolute Runtime Address: Absolute Runtime Address = Runtime Module Base + Relative Offset. This is the address to use for function pointers.
  • Function Pointer Signatures: Meticulously define function pointer types.
    • Verify parameter types, order, and count against disassembly and decompilation.
    • Verify the return type.
    • Be mindful of calling conventions (cross-module calls usually follow the platform's standard C calling convention).
    • For C++ member functions, the first (implicit) argument is the this pointer.
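
The address arithmetic above, together with a base-address lookup, can be sketched as follows. The function names are hypothetical; the macOS branch uses the dyld calls listed above and the Linux branch uses dl_iterate_phdr. On Linux, dlpi_addr is the load bias, which matches the module base only when the first loadable segment starts at virtual address 0, so treat that branch as an approximation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__APPLE__)
#include <mach-o/dyld.h>
#elif defined(__linux__)
#include <link.h>
#endif

// Relative Offset = Absolute Address in Disassembler - Disassembler Image Base
std::uint64_t relative_offset(std::uint64_t disassembler_address, std::uint64_t disassembler_image_base)
{
    assert(disassembler_address >= disassembler_image_base);
    return disassembler_address - disassembler_image_base;
}

// Absolute Runtime Address = Runtime Module Base + Relative Offset
std::uint64_t runtime_address(std::uint64_t runtime_module_base, std::uint64_t offset)
{
    return runtime_module_base + offset;
}

// Base of the first loaded image whose path contains `name_fragment`, or 0 if none matches.
std::uintptr_t find_module_base(const char* name_fragment)
{
    assert(name_fragment != nullptr);
#if defined(__APPLE__)
    for (std::uint32_t i = 0; i < _dyld_image_count(); ++i) {
        const char* name = _dyld_get_image_name(i);
        if (name != nullptr && std::strstr(name, name_fragment) != nullptr) {
            return reinterpret_cast<std::uintptr_t>(_dyld_get_image_header(i));
        }
    }
    return 0;
#elif defined(__linux__)
    struct Query { const char* fragment; std::uintptr_t base; } query{name_fragment, 0};
    dl_iterate_phdr(
        [](dl_phdr_info* info, std::size_t, void* data) -> int {
            auto* q = static_cast<Query*>(data);
            if (info->dlpi_name != nullptr && std::strstr(info->dlpi_name, q->fragment) != nullptr) {
                q->base = static_cast<std::uintptr_t>(info->dlpi_addr);  // load bias
                return 1;                                                // stop iterating
            }
            return 0;
        },
        &query);
    return query.base;
#else
    return 0;
#endif
}
```

With the image base 0x1B16F2000 used in these notes, a function shown by the disassembler at 0x1B170E2C8 has relative offset 0x1C2C8, which is then added to whatever find_module_base reports at runtime.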

3. Interacting with C++ Code

  • this Pointer Management: When calling a C++ member function by its address, a valid this pointer must be supplied.
    • Create a placeholder struct/class in your accessor code that mimics the memory layout of the original C++ class, at least for members accessed by the target function.
    • Initialize any stateful members in this placeholder if the original function relies on their initial state (e.g., stack pointers/depths, counters). Placement new can be used to re-initialize the placeholder before each call if the function modifies this and expects a fresh state.
  • Mangled Names: Use mangled names to reliably find C++ functions in IDA/Ghidra or with dlsym (if exported). Demangle them to understand their original C++ signature.
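
Demangling can also be done programmatically, which is handy when logging which symbol a bridge is about to resolve. The sketch below uses abi::__cxa_demangle from the Itanium C++ ABI support library; the demangle wrapper itself is a hypothetical helper. Mach-O symbol tables carry one extra leading underscore, which has to be dropped first.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
    assert(mangled != nullptr);
    // Mach-O form "__ZN..." -> Itanium form "_ZN...".
    if (mangled[0] == '_' && mangled[1] == '_' && mangled[2] == 'Z') {
        ++mangled;
    }
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && text != nullptr) ? text : mangled;
    std::free(text);   // __cxa_demangle allocates with malloc
    return result;
}
```

Applied to __ZN2AG16LayoutDescriptor7CompareclEPKhS3_S3_mmj, the result names AG::LayoutDescriptor::Compare::operator() with six explicit parameters: three pointers to const unsigned char, two unsigned long values, and one unsigned int. The return type is not encoded in the mangled name and still has to come from the disassembly.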

4. Debugging Crashes and Issues

  • EXC_BAD_ACCESS / Segmentation Faults:
    • At function entry (e.g., on push rbp):
      • Verify the calculated absolute runtime address of the function. Is it correct?
      • Check memory permissions of the target address page (e.g., using lldb's memory region <address>, or the vmmap command-line tool on macOS). It must be executable (r-x).
      • Suspect stack pointer (rsp) corruption before the call.
    • Inside the function: Usually due to dereferencing a nil or invalid pointer argument, a corrupted this pointer, or accessing an invalid offset from this.
  • lldb / Debugger Use:
    • image lookup -a <address>: Crucial for verifying addresses. If lldb symbolicates the address you're calling to a different function name than expected (or as an offset into a different function), it's a strong indicator that your function offset is incorrect for the runtime binary version. The (ModuleName.__TEXT.__text + segment_offset) part of the output confirms whether the address is at least within an executable segment.
    • register read: To inspect register values (arguments, stack pointer) before a call.
    • Breakpoints and si (step instruction): To trace execution flow.
  • Diagnostic Logging: Add extensive printf or NSLog/os_log statements in your bridge code and test code to trace values (base addresses, offsets, function pointers, arguments being passed, return values). This was key to diagnosing fetch behavior.

5. Understanding Target Function Behavior

  • Preconditions and State: Non-exported functions might rely on global state, thread-local state, or prior initialization steps within their framework that are not met when called externally. This can lead to unexpected behavior or crashes.
  • Special Return Values & Flags:
    • Functions like AG::LayoutDescriptor::fetch might return special values (e.g., nullptr or (void*)1 for ValueLayoutTrivial) to indicate specific conditions (e.g., "use bitwise compare" or "queued asynchronously"). The calling code must handle these.
    • Investigate the effect of input flags/options on function behavior (e.g., the 0x200 synchronous flag for fetch was critical to get actual layouts instead of nil from an async path).
  • Layout Generation Quirks:
    • Layout systems might omit parts of a type's layout if they can't be handled by the current comparison mode (e.g., a String field in a non-Equatable enum when bitwise comparison is requested).
    • Be aware of padded or fixed-slot comparisons, especially for enums with varying payload sizes, where the layout might compare a larger fixed slot rather than the exact payload size.

6. Inter-Language Interoperability (e.g., Swift and C++)

  • Swift Package Manager (SPM): For mixed Swift and C/C++ projects:
    • Create a separate C-family language target for your C/C++ bridge code.
    • Use an include directory within this target for public C headers.
    • Define a module.modulemap in the include directory to expose the public C API to Swift.
    • Make your Swift target depend on this C-family target and import the module.
  • Bridging Headers (Xcode Projects): For app targets in Xcode, a bridging header can directly expose C functions to Swift.
  • Data Marshalling: Be careful when passing data between Swift and C/C++ (e.g., pointers, strings, collections). Use Swift's UnsafePointer family and withUnsafeBufferPointer correctly.

7. Iterative Reverse Engineering Workflow: Decompile, Bridge, Test, Refine

This section outlines a structured, iterative approach that has proven effective for understanding and interacting with non-public functions in compiled libraries. This strategy emphasizes a tight loop between static analysis, dynamic interaction, and test-driven refinement.

Phase 1: Initial Reconnaissance and Target Selection

  1. Identify Target Function(s): Determine the function(s) of interest within the library based on observed behavior, crashes, specific features you want to understand/replicate, or areas identified for reimplementation.
  2. Obtain Correct Binary Version: Critically ensure that the binary being analyzed (e.g., in IDA Pro, Ghidra) is the exact same version as the one your test environment will target at runtime. (Refer to Section 1: Binary Versioning and Integrity).
  3. Initial Decompilation & Static Analysis:
    • Use a disassembler/decompiler to locate the target function(s) (e.g., via mangled names, known offsets from previous research, or string/code cross-references).
    • Perform an initial review of the decompiled code to understand:
      • Function signature (parameters, return type, calling convention).
      • High-level logic and control flow.
      • Key data structures manipulated or passed as arguments.
      • Calls to other internal or external functions.
    • MCP Tooling (if available): Leverage Model Context Protocol (MCP) tools (e.g., ida-pro-mcp) to automate fetching decompilation, disassembly, function addresses, type information, and other metadata directly from the disassembler. This can significantly accelerate the analysis process.
    • Document initial hypotheses about the function's purpose, inputs, outputs, and side effects.

Phase 2: Building the Interaction Bridge

  1. Determine Function Address:
    • Attempt to resolve the function's address dynamically using dlsym (if the symbol is unexpectedly exported or discoverable in the local symbol table). Verify that any symbol found by dlsym(RTLD_DEFAULT, ...) actually resides in the target module (e.g., using dladdr).
    • If dlsym fails (common for private symbols), calculate the absolute runtime address by adding the function's relative offset (derived from static analysis in your disassembler) to the target module's runtime base address.
    • Utilize helper functions (like those in our FindModule.cpp and exposed via AGBridge_FindAttributeGraphBaseAddress) to reliably find the module's base address at runtime.
  2. Define Function Pointer Type: Create an accurate C/C++ function pointer typedef that precisely matches the determined signature of the target function. Pay close attention to parameter types, const-correctness, return type, and calling convention. For C++ member functions, remember the implicit this pointer.
  3. Create an Accessor Module/Class:
    • Develop a dedicated module or class (e.g., our C++ AGAccessor class within the AGBridge module) to encapsulate the logic for interacting with the target library.
    • This module/class should store the resolved function pointer(s).
    • Its constructor or initialization routine should perform the address resolution (as in step 1).
    • Implement methods within this class that call the original library functions via these pointers. If calling C++ member functions, this class will also manage any necessary placeholder this objects (like AG::LayoutDescriptor::ComparePlaceholder).
  4. Create an Interaction Interface/Binding Layer:
    • Establish a mechanism to call the original library functions (via the resolved function pointers held in your Accessor Module/Class) from your primary testing or reimplementation language. This layer acts as the "glue."
    • Core Principle: The goal is to invoke the target function pointer with the correct arguments and calling convention, and to retrieve its results in a way that your testing language can understand.
    • Implementation Strategy: The specific technology for this interface depends on the source language of the target library and the language of your testing environment.
      • If the languages are different (e.g., target is C++/Objective-C, testing in Swift/Python/Java), you'll typically create bindings or a bridge. This might involve defining C-style functions (extern "C"), handling data type marshalling, managing memory across the boundary, and using native interop features (like Swift's C/Objective-C interop, JNI, ctypes).
      • If the target library and testing language are the same and access controls allow, direct calls might be possible. Otherwise, unsafe invocation via function pointers might be used.
    • Focus: Keep this interface clean, primarily concerned with faithfully transmitting calls and results.
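
Step 1 of this phase (try dlsym, verify with dladdr, fall back to base plus offset) can be sketched as below. resolve_function and its parameters are hypothetical names, not the AGAccessor API. On macOS, dlsym expects the symbol without the extra leading underscore shown in Mach-O symbol tables.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <dlfcn.h>

// Resolve `symbol`, accepting a dlsym hit only if it lives in a module whose
// path contains `module_fragment`. Otherwise fall back to module_base + offset.
// Returns nullptr when neither route is available.
void* resolve_function(const char* symbol,
                       const char* module_fragment,
                       std::uintptr_t module_base,
                       std::uintptr_t offset)
{
    assert(symbol != nullptr && module_fragment != nullptr);
    if (void* found = dlsym(RTLD_DEFAULT, symbol)) {
        Dl_info info{};
        if (dladdr(found, &info) != 0 && info.dli_fname != nullptr &&
            std::strstr(info.dli_fname, module_fragment) != nullptr) {
            return found;                         // exported and in the expected module
        }
        // Found, but in some other image: do not trust it.
    }
    if (module_base == 0) {
        return nullptr;                           // module not loaded, nothing to fall back to
    }
    return reinterpret_cast<void*>(module_base + offset);
}
```

An accessor's initialization routine could call this once per target function and store the result; a nullptr result should disable the corresponding tests rather than let them crash.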

Phase 3: Test-Driven Understanding and Refinement

  1. Write Initial Tests:
    • Develop test cases in your primary language (e.g., Swift using XCTest).
    • These tests should call the original library functions via the bridge/binding layer created in Phase 2.
    • Start with simple, well-understood inputs and clearly defined expected outputs based on your initial static analysis.
  2. Formulate Hypotheses and Expected Outputs: For each test case:
    • Clearly document the inputs being provided.
    • State the expected behavior or output based on your current understanding of the decompiled code.
    • Identify which specific aspects of the function's logic or which opcodes (if applicable) this test targets.
  3. Execute and Observe:
    • Run the tests.
    • Crucially, implement detailed logging at multiple levels:
      • Test Code (e.g., Swift): Log inputs passed to the bridge and outputs received. Print any intermediate data structures or states.
      • Bridge/Accessor Layer (e.g., C++): Add printf or other logging just before calling the original function pointer to see the exact arguments being passed from the bridge to the target function. Log the raw return value from the target function.
      • This multi-level logging helps pinpoint discrepancies (e.g., if data changes unexpectedly across the bridge).
  4. Compare Actual vs. Expected:
    • If the actual output/behavior matches your expectation: Your understanding for that specific case is likely correct. Proceed to design more complex test cases, explore edge conditions, or test different combinations of inputs/flags.
    • If the actual output/behavior differs from expected: This is a valuable learning opportunity. The discrepancy indicates an area where your current understanding is incomplete or incorrect.
  5. Refine Understanding (Iterate):
    • Re-examine Decompilation: With the specific failing test case in mind, go back to the decompiled code. Focus on the logic paths relevant to your test inputs. Look for:
      • Misinterpreted conditional branches or loops.
      • Unhandled edge cases in the original code.
      • Incorrect assumptions about data structure layouts, parameter meanings, or flag effects.
      • Subtle side effects or dependencies on global/external state.
    • MCP Tooling (if available): Use MCP tools to re-fetch decompilation for related functions, inspect memory, or query type information if your disassembler environment supports it.
    • Adjust Hypotheses: Formulate new hypotheses to explain the observed behavior.
    • Modify Tests or Expected Outputs: Adjust your test cases to probe the new hypotheses, or update the expected outcomes if your understanding of correct behavior has changed.
    • Document Findings: Meticulously record the observed behavior, the expected behavior, the reasons for the discrepancy, and how your understanding evolved (e.g., in dedicated analysis documents like Memory.md or AttributeGraphFunctionAnalysis.md). This documentation is critical for tracking progress and for future reference.
  6. Repeat: Continue this cycle of hypothesis, testing, observation, and refinement until the function's behavior is well understood for the range of inputs and scenarios relevant to your goals.
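
For the bridge-level logging in step 3, a small hex dump plus a logging wrapper is usually enough to show exactly what crosses the boundary. The sketch below is illustrative only: hex_dump and logged_compare are hypothetical helpers, and the callable passed as target stands in for whatever actually invokes the resolved function pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format bytes as space-separated lowercase hex, e.g. "01 00".
std::string hex_dump(const unsigned char* bytes, std::size_t size)
{
    assert(bytes != nullptr || size == 0);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", bytes[i]);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Log the exact arguments just before the call and the raw result just after it.
template <typename Fn>
bool logged_compare(const char* label,
                    const unsigned char* lhs,
                    const unsigned char* rhs,
                    std::size_t size,
                    Fn target)
{
    std::fprintf(stderr, "[bridge] %s lhs=[%s] rhs=[%s]\n",
                 label, hex_dump(lhs, size).c_str(), hex_dump(rhs, size).c_str());
    const bool result = target(lhs, rhs, size);
    std::fprintf(stderr, "[bridge] %s -> %d\n", label, result ? 1 : 0);
    return result;
}
```

Dumping the fetched layout with the same helper (for example, the [0x01, 0x00] stream returned for Equatable types) makes it easy to diff what the test sent against what the bridge received.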

Phase 4: Broader Application and Documentation

  1. Generalize Learnings: Once a set of functions or a subsystem is reasonably understood, try to extract general principles, common patterns, or architectural insights about the target library.
  2. Update Centralized RE Documentation (Reversing.md): Add these generalized learnings, along with the "Decompile, Bridge, Test, Refine" strategy itself, to your main reverse engineering best practices document to benefit future efforts.

Benefits of this Strategy:

  • Reduces Guesswork: Directly testing assumptions against the live binary provides concrete evidence and is more reliable than relying solely on static analysis.
  • Iterative and Focused Learning: Builds understanding incrementally. Failing tests precisely guide you to areas of the decompiled code that require closer attention and re-evaluation.
  • Actionable Insights: Moves beyond just "reading code" to actively verifying and correcting understanding.
  • Creates a Regression Suite: The resulting test suite becomes a valuable asset for detecting changes in behavior if the target library is updated or if your reimplementation evolves.
  • Facilitates Collaboration: Clear tests and documented findings make it easier to share understanding with others.
  • Automation Potential: MCP tools can automate parts of the decompilation interaction, and the test suite itself automates the verification of hypotheses.

By systematically applying this iterative workflow, you can build a robust and verified understanding of complex, non-public code.
