Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save inilim/1d512acc7f5f6b0c98a73c2ad76ecd0c to your computer and use it in GitHub Desktop.

Select an option

Save inilim/1d512acc7f5f6b0c98a73c2ad76ecd0c to your computer and use it in GitHub Desktop.
[Article] PHP, SQLite and Fiber: An unobvious trick to save memory

PHP 8.1 introduced Fibers, and SQLite 3.38.0 added built-in json_tree support. What happens when you combine these two technologies with a custom PDO function? You get a quite exotic, yet efficient method for data processing with minimal memory overhead.

What's the Problem?

The standard json_decode in PHP works great until you encounter megabyte-sized files.

Streaming parsers exist, but they are often slower and more complex to use. Our approach is a "golden middle": simpler than streaming, yet more economical than json_decode.

The Core Idea

The idea is to delegate the "heavy" work of traversing the data structure to SQLite (which is written in C and very efficient) and receive the data in PHP in portions via the Fiber::suspend() mechanism.

Here's how it looks in code:

<?php

/**
 * 1. Memory efficiency: If JSON weighs 5MB, this method will take about 12-15MB.
 *    Standard "json_decode" often requires x10 the file size.
 * 2. SQLite "Shadow RAM": In PHP 8.4, tracking memory consumed by SQLite
 *    might work differently and not always be directly accounted for in the script's memory_limit. (Win)
 */
function example_parse(string $json): \Generator
{
    // Use an in-memory database
    $pdo = new \PDO('sqlite::memory:');
    
    // Register a PHP function that SQLite will call
    // This is the key point: the function suspends query execution
    $pdo->sqliteCreateFunction('FETCH', static function ($key, $value, $type, $fullkey) {
        \Fiber::suspend([$key, $value, $type, $fullkey]);
        return null; // Return null so the WHERE condition doesn't execute and doesn't form a result set
    });

    $fiber = new \Fiber(static function (\PDO $pdo, string $json) {
        // The query should not produce output (SELECT 1 ...),
        // this simplifies the SQLite engine's work.
        // We "intercept" the data right during filtering.
        $stmt = $pdo->prepare('
            SELECT 1 
            FROM json_tree(:json)
            WHERE [key] IS NOT NULL 
            AND FETCH([key], [value], [type], [fullkey]) = 1
        ');

        $stmt->execute(['json' => $json]);
    });

    // Start the fiber and begin "pulling" data
    $value = $fiber->start($pdo, $json);

    while (!$fiber->isTerminated()) {
        yield $value;
        $value = $fiber->resume();
    }
}

// Usage example
$json = file_get_contents('big.json');

foreach(example_parse($json) as $node){
    [$key, $value, $type, $fullkey] = $node;
    // Process each node individually
    // echo "$fullkey: $value\n";
}

How It Works Under the Hood

  1. SQLite as an iterator: SQLite's json_tree function recursively traverses the JSON.
  2. sqliteCreateFunction + FETCH: We create a bridge. When SQLite finds a node, it calls our PHP FETCH function.
  3. Fiber pause: Inside FETCH, we don't return a result immediately. We tell the script: "Stop! Return control to the outer loop with the current data" (Fiber::suspend).
  4. Yield: The outer loop receives the data via the generator, processes it, and commands Fiber::resume(), forcing SQLite to continue from where it stopped.

An Important Nuance: Note the SELECT 1 ... AND FETCH(...) = 1. We specifically write the query so that SQLite's optimizer understands there is no selection. We use its WHERE clause engine as an iterator.

Even More Savings: File Database Instead of In-Memory

In the example above, we used sqlite::memory:, but the JSON itself is still in the $json variable in PHP memory. We can go further:

function example_parse_file(string $jsonFilePath): \Generator
{
    // Create a temporary file-based SQLite database
    $file = \tmpfile();
    $tempDb = \stream_get_meta_data($file)['uri']; // eg: /tmp/phpFx0513a
    $pdo = new \PDO("sqlite:$tempDb");
    
    $pdo->sqliteCreateFunction('FETCH', static function ($key, $value, $type, $fullkey) {
        \Fiber::suspend([$key, $value, $type, $fullkey]);
        return null;
    });

    // Create a table and load JSON directly from the file
    $pdo->exec('CREATE TABLE data (json_content TEXT)');
    // readfile() might not be available.
    $stmt = $pdo->prepare('INSERT INTO data (json_content) VALUES (readfile(:path))');
    $stmt->execute(['path' => $jsonFilePath]);
    $stmt = null;

    $fiber = new \Fiber(static function (\PDO $pdo) {
        $stmt = $pdo->prepare('
            SELECT 1 
            FROM data, json_tree(data.json_content)
            WHERE [key] IS NOT NULL 
            AND FETCH([key], [value], [type], [fullkey]) = 1
        ');
        $stmt->execute();
    });

    $value = $fiber->start($pdo);

    while (!$fiber->isTerminated()) {
        yield $value;
        $value = $fiber->resume();
    }
    
    // Cleanup
    $pdo = null;
    \fclose($file);
    @\unlink($tempDb);
}

Why This Is Even More Economical:

  • The JSON string is not loaded into PHP memory via file_get_contents().
  • SQLite reads the file directly and parses it within its own process.

It's Not Just About JSON: The "SQL-Iterator" Pattern

JSON here is just a clear example. The power of the approach is that you get direct access to the data stream inside an SQL query with the ability to control it from PHP.

PS: I'm a very poor writer, so I used an LLM to help format the article.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment