Reverse-Engineering the Quicken for Mac (2007-era) QDF Bundle Format

Background

A Quicken for Mac data file (circa 2007-2008, Quicken Mac 2007) was provided as a macOS bundle directory with no working copy of Quicken available to export from. The goal was to extract transaction data (dates, payees, check numbers, memos, and amounts) into a CSV.

Bundle Structure

The file presents as a macOS bundle (directory with a .qdf-like structure):

USAA/
  Data File Alias          # 0 bytes, macOS alias file
  Contents/
    PkgInfo                # 8 bytes: "BDATINTU" (bundle type identifier)
    Data File              # 175,888 bytes - the main binary data store
    Acronyms               # XML plist, list of acronyms (e.g. "ATM")
    PayeeSubstitution      # XML plist, empty array
    Quotes                 # 0 bytes
    FIDir/
      fidir.txt            # ~397 KB, financial institution directory
    Messages/              # empty
    Quicken State/
      DRAT#15967 Fixed
      External Notification Database
      Financial Institution Database
    User Settings/
      NumberField
      Update Schedule

The PkgInfo value BDATINTU identifies this as a Quicken data bundle, following the classic Mac OS type/creator code convention: the 4-byte type code (BDAT) comes first, followed by the 4-byte creator code (INTU, for Intuit).
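
Splitting the 8-byte PkgInfo confirms the identification. A minimal sketch, using the literal bytes quoted above:

```python
# PkgInfo holds the classic Mac 4-byte type code followed by the
# 4-byte creator code.
pkginfo = b"BDATINTU"  # literal contents of USAA/Contents/PkgInfo
type_code, creator_code = pkginfo[:4], pkginfo[4:]
print(type_code, creator_code)  # b'BDAT' b'INTU'
```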

The Data File: Not SQLite

Modern Quicken for Mac (2010+) uses SQLite. This older version does not. The file has no recognizable magic number -- it starts with 00 00 06 49 2f e8..., which is not SQLite (SQLite format 3\0), not a zip archive, and not any standard database format.
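
A quick sketch of that check, using the leading bytes quoted above:

```python
# First bytes of "Data File", as quoted above.
header = bytes.fromhex("00 00 06 49 2f e8")

# Magic numbers this file does NOT match.
assert not header.startswith(b"SQLite format 3\x00")  # SQLite 3 database
assert not header.startswith(b"PK\x03\x04")           # zip archive
```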

Running strings on the file revealed readable payee names, check numbers, and memo text interleaved with binary data, confirming that transaction records are embedded in a proprietary binary format.

Reverse-Engineering the Record Format

Discovery Method

  1. String extraction: strings "Data File" revealed payee names, check numbers (e.g. "1001", "1002"), and memo text, confirming transaction data was present.

  2. Anchor identification: Located known check number strings ("1001", "1005", etc.) in the hex dump and examined surrounding bytes for patterns.

  3. Date field discovery: Found that 32 bytes before each check number field, there was a consistent pattern: 00 00 [2-byte year] [month] [day]. For example, 00 00 07 d1 0a 0f decoded as 2001-10-15 (0x07D1 = 2001, 0x0A = 10, 0x0F = 15). Cross-referencing multiple records confirmed the pattern.

  4. Amount field discovery: Immediately after the date (at offset +0x06 in the record), a 4-byte value appeared that was repeated at a fixed offset later in the record (+0x88 from the record header). This duplication suggested it was an important value. Interpreting it as a signed 32-bit big-endian integer in cents produced plausible dollar amounts: checks were negative, deposits were positive.

  5. Field layout mapping: By comparing records of varying payee/memo lengths against the hex, the fixed-width field boundaries were established.
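
The date heuristic from step 3 (which the extraction script below also relies on) can be sketched as a small validator:

```python
import struct

def plausible_date(buf, i):
    """Step-3 heuristic: 00 00, big-endian year, then month and day bytes."""
    if buf[i] != 0 or buf[i + 1] != 0:
        return False
    year = struct.unpack('>H', buf[i + 2:i + 4])[0]
    month, day = buf[i + 4], buf[i + 5]
    return 2000 <= year <= 2020 and 1 <= month <= 12 and 1 <= day <= 31

# The worked example from step 3: 00 00 07 d1 0a 0f -> 2001-10-15.
assert plausible_date(bytes.fromhex("000007d10a0f"), 0)
```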

Record Layout

Each transaction record occupies a variable-size block but has a fixed header and field structure:

Offset  Size  Type                 Field
------  ----  -------------------  -------------------------
+0x00   2     bytes                Padding/separator (00 00)
+0x02   2     uint16 big-endian    Year
+0x04   1     uint8                Month (1-12)
+0x05   1     uint8                Day (1-31)
+0x06   4     int32 big-endian     Amount in cents (signed)
+0x0A   2     uint16 big-endian    Record sequence number
+0x0C   4     bytes                Flags/unknown
+0x10   2     bytes                Record type? (always 00 01)
+0x12   14    bytes                Zero padding
+0x20   8     null-padded ASCII    Check number (empty for non-check txns)
+0x28   40    null-padded ASCII    Payee name
+0x50   32    null-padded ASCII    Memo/description
+0x70   24    bytes                Zero padding
+0x88   4     int32 big-endian     Amount in cents (duplicate of +0x06)
+0x8C   var.  bytes                Zero padding to next record
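
Given this layout, decoding the fixed fields of a record reduces to slicing at the offsets above. The record here is synthetic (the check number, payee, and amount are invented for illustration):

```python
import struct

# Build a synthetic 0x8C-byte record following the layout table.
rec = bytearray(0x8C)
rec[0x02:0x06] = bytes.fromhex("07d10a0f")   # 2001-10-15
rec[0x06:0x0A] = struct.pack('>i', -12550)   # -$125.50, in cents
rec[0x20:0x24] = b'1001'                     # check number (hypothetical)
rec[0x28:0x32] = b'Acme Corp.'               # payee (hypothetical)
rec[0x88:0x8C] = struct.pack('>i', -12550)   # duplicated amount at +0x88

year = struct.unpack('>H', rec[0x02:0x04])[0]
month, day = rec[0x04], rec[0x05]
amount = struct.unpack('>i', rec[0x06:0x0A])[0] / 100.0
check = bytes(rec[0x20:0x28]).split(b'\x00')[0].decode('ascii')
payee = bytes(rec[0x28:0x50]).split(b'\x00')[0].decode('ascii')
print(year, month, day, check, payee, amount)  # 2001 10 15 1001 Acme Corp. -125.5
```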

Key Observations

  • Amount encoding: Signed 32-bit big-endian, in cents (hundredths of a dollar). Negative values represent payments/checks; positive values represent deposits and interest.

  • String fields: Fixed-width, null-padded. Payee is 40 bytes, memo is 32 bytes, check number is 8 bytes. Strings shorter than the field width are null-terminated and the remainder is zero-filled.

  • Amount duplication: The amount appears twice in each record (at +0x06 and +0x88 relative to the record header). Purpose unknown -- possibly a running balance vs. transaction amount, or an integrity check. In all observed records, both values were identical.

  • Non-check transactions: Deposits, interest earned, wire transfers, and other non-check transactions use the same record format but leave the check number field empty (all zeros).

  • Metadata records: Approximately 129 records had non-printable characters in the payee field. These appear to be internal Quicken metadata (account settings, reconciliation markers, etc.) rather than actual transactions. They were filtered out by checking that payee, memo, and check number fields contained only printable ASCII (bytes 32-126).
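
For instance, the cents encoding behaves like this (byte values invented for illustration):

```python
import struct

# Signed 32-bit big-endian cents -> dollars; the sign distinguishes
# payments/checks (negative) from deposits (positive).
assert struct.unpack('>i', bytes.fromhex('ffffcfc7'))[0] / 100.0 == -123.45
assert struct.unpack('>i', bytes.fromhex('00003039'))[0] / 100.0 == 123.45
```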

File Header

The first ~3,346 bytes (before the first transaction record) contain:

  • A file header starting with 00 00 06 49 2f e8 00 00 22 fa 01 ab...
  • A block of repeating FF FF bytes (possibly a color table, category list, or allocation bitmap)
  • The purpose of these header bytes was not fully decoded, as they were not needed for transaction extraction.

Extraction Script

import struct
import csv

with open('USAA/Contents/Data File', 'rb') as f:
    data = f.read()

def is_printable(s):
    # True only if every character is printable ASCII (32-126);
    # vacuously true for empty strings.
    return all(32 <= ord(c) < 127 for c in s)

records = []
i = 0
# Scan byte-by-byte for the record signature: 00 00 followed by a
# plausible big-endian year/month/day (see "Record Layout" above).
while i < len(data) - 0x90:
    if data[i] == 0 and data[i+1] == 0:
        year = struct.unpack('>H', data[i+2:i+4])[0]
        month = data[i+4]
        day = data[i+5]
        if 2000 <= year <= 2020 and 1 <= month <= 12 and 1 <= day <= 31:
            # Amount: signed 32-bit big-endian, in cents.
            amount_raw = struct.unpack('>i', data[i+6:i+10])[0]
            check_off = i + 0x20
            payee_off = i + 0x28
            memo_off  = i + 0x50

            if memo_off + 32 < len(data):
                # Fixed-width, null-padded ASCII fields: take the bytes
                # up to the first NUL, then decode.
                check_num = data[check_off:check_off+8] \
                    .split(b'\x00')[0].decode('ascii', errors='replace').strip()
                payee = data[payee_off:payee_off+40] \
                    .split(b'\x00')[0].decode('ascii', errors='replace').strip()
                memo = data[memo_off:memo_off+32] \
                    .split(b'\x00')[0].decode('ascii', errors='replace').strip()

                # Drop metadata records whose text fields aren't clean ASCII.
                if ((payee or check_num)
                    and is_printable(payee)
                    and is_printable(memo)
                    and is_printable(check_num)):
                    records.append({
                        'date': f'{year}-{month:02d}-{day:02d}',
                        'check_num': check_num,
                        'payee': payee,
                        'memo': memo,
                        'amount': amount_raw / 100.0,
                    })
            i += 0x100  # skip past this record
            continue
    i += 1

records.sort(key=lambda r: r['date'])

with open('transactions.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Date', 'Check #', 'Payee', 'Memo', 'Amount'])
    for r in records:
        writer.writerow([
            r['date'], r['check_num'], r['payee'], r['memo'],
            f"{r['amount']:.2f}"
        ])

Results

  • 424 clean transaction records extracted
  • Date range: 2001-09 through 2019-08
  • Transaction types: Checks, deposits, interest, wire transfers, pass-throughs
  • Amounts ranged from single-digit interest payments to six-figure deposits

Limitations and Caveats

  1. Category/account fields: Quicken supports categories (e.g. "Auto:Fuel", "Tax:Federal") and split transactions. These fields were not decoded. They may be stored in the unknown bytes or in a separate section of the file.

  2. Cleared/reconciled status: Quicken tracks whether transactions are cleared (c) or reconciled (R). This flag was not identified in the record structure, though it likely exists in the flags at offset +0x0C.

  3. Transfer linkage: Transfers between accounts in Quicken are linked records. The linkage mechanism was not decoded.

  4. Multi-account: If the file contained multiple accounts, this parser treats all records uniformly. Account boundaries were not identified.

  5. The ~129 filtered records: These may contain additional transaction data in a variant record format, or they may be purely internal metadata. Further analysis would be needed to determine if any real transactions were lost.

  6. No independent verification: Without a working Quicken installation to export the same file, the extracted amounts could not be cross-checked against a known-good export. The amounts appear plausible based on context (payee names, memo descriptions, date ordering).
