PHP 8.1 introduced Fibers, and SQLite 3.38.0 added built-in json_tree support. What happens when you combine these two technologies with a custom PDO function? You get a quite exotic, yet efficient method for data processing with minimal memory overhead.
The standard json_decode in PHP works great until you encounter megabyte-sized files.
Streaming parsers exist, but they are often slower and more complex to use. Our approach is a "golden middle": simpler than streaming, yet more economical than json_decode.
The idea is to delegate the "heavy" work of traversing the data structure to SQLite (which is written in C and very efficient) and receive the data in PHP in portions via the Fiber::suspend() mechanism.
Here's how it looks in code:
<?php
/**
* 1. Memory efficiency: If JSON weighs 5MB, this method will take about 12-15MB.
* Standard "json_decode" often requires x10 the file size.
* 2. SQLite "Shadow RAM": In PHP 8.4, tracking memory consumed by SQLite
* might work differently and not always be directly accounted for in the script's memory_limit. (Win)
*/
function example_parse(string $json): \Generator
{
// Use an in-memory database
$pdo = new \PDO('sqlite::memory:');
// Register a PHP function that SQLite will call
// This is the key point: the function suspends query execution
$pdo->sqliteCreateFunction('FETCH', static function ($key, $value, $type, $fullkey) {
\Fiber::suspend([$key, $value, $type, $fullkey]);
return null; // Return null so the WHERE condition doesn't execute and doesn't form a result set
});
$fiber = new \Fiber(static function (\PDO $pdo, string $json) {
// The query should not produce output (SELECT 1 ...),
// this simplifies the SQLite engine's work.
// We "intercept" the data right during filtering.
$stmt = $pdo->prepare('
SELECT 1
FROM json_tree(:json)
WHERE [key] IS NOT NULL
AND FETCH([key], [value], [type], [fullkey]) = 1
');
$stmt->execute(['json' => $json]);
});
// Start the fiber and begin "pulling" data
$value = $fiber->start($pdo, $json);
while (!$fiber->isTerminated()) {
yield $value;
$value = $fiber->resume();
}
}
// Usage example
$json = file_get_contents('big.json');
foreach(example_parse($json) as $node){
[$key, $value, $type, $fullkey] = $node;
// Process each node individually
// echo "$fullkey: $value\n";
}- SQLite as an iterator: SQLite's
json_treefunction recursively traverses the JSON. sqliteCreateFunction+FETCH: We create a bridge. When SQLite finds a node, it calls our PHPFETCHfunction.- Fiber pause: Inside
FETCH, we don't return a result immediately. We tell the script: "Stop! Return control to the outer loop with the current data" (Fiber::suspend). - Yield: The outer loop receives the data via the generator, processes it, and commands
Fiber::resume(), forcing SQLite to continue from where it stopped.
An Important Nuance: Note the SELECT 1 ... AND FETCH(...) = 1. We specifically write the query so that SQLite's optimizer understands there is no selection. We use its WHERE clause engine as an iterator.
In the example above, we used sqlite::memory:, but the JSON itself is still in the $json variable in PHP memory. We can go further:
function example_parse_file(string $jsonFilePath): \Generator
{
// Create a temporary file-based SQLite database
$file = \tmpfile();
$tempDb = \stream_get_meta_data($file)['uri']; // eg: /tmp/phpFx0513a
$pdo = new \PDO("sqlite:$tempDb");
$pdo->sqliteCreateFunction('FETCH', static function ($key, $value, $type, $fullkey) {
\Fiber::suspend([$key, $value, $type, $fullkey]);
return null;
});
// Create a table and load JSON directly from the file
$pdo->exec('CREATE TABLE data (json_content TEXT)');
// readfile() might not be available.
$stmt = $pdo->prepare('INSERT INTO data (json_content) VALUES (readfile(:path))');
$stmt->execute(['path' => $jsonFilePath]);
$stmt = null;
$fiber = new \Fiber(static function (\PDO $pdo) {
$stmt = $pdo->prepare('
SELECT 1
FROM data, json_tree(data.json_content)
WHERE [key] IS NOT NULL
AND FETCH([key], [value], [type], [fullkey]) = 1
');
$stmt->execute();
});
$value = $fiber->start($pdo);
while (!$fiber->isTerminated()) {
yield $value;
$value = $fiber->resume();
}
// Cleanup
$pdo = null;
\fclose($file);
@\unlink($tempDb);
}Why This Is Even More Economical:
- The JSON string is not loaded into PHP memory via
file_get_contents(). - SQLite reads the file directly and parses it within its own process.
JSON here is just a clear example. The power of the approach is that you get direct access to the data stream inside an SQL query with the ability to control it from PHP.
PS: I'm a very poor writer, so I used an LLM to help format the article.