Created
April 17, 2022 04:22
-
-
Save rodrigopedra/3fc18f8af4bb0ff49f760b2e2b278061 to your computer and use it in GitHub Desktop.
PHP import CSV
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <?php | |
| $file = fopen('php://output', 'w'); | |
| fputcsv($file, ['ID', 'NAME', 'AGE']); | |
| for ($index = 0; $index < 21_000_000; $index++) { | |
| fputcsv($file, [$index + 1, bin2hex(random_bytes(20)), random_int(18, 65)]); | |
| } | |
| fclose($file); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <?php | |
| define('BUFFER_SIZE', 500); | |
| $pdo = new PDO('mysql:host=127.0.0.1;dbname=dummy;charset=utf8mb4', 'root', 'password', [ | |
| PDO::ATTR_EMULATE_PREPARES => false, | |
| PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, | |
| ]); | |
| $pdo->exec('DROP TABLE IF EXISTS `import`'); | |
| $pdo->exec('CREATE TABLE `import` ( | |
| `id` BIGINT UNSIGNED NOT NULL, | |
| `name` VARCHAR(100) NOT NULL, | |
| `age` INT UNSIGNED NOT NULL | |
| )'); | |
| $statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES ' . implode(',', array_fill(0, BUFFER_SIZE, '(?, ?, ?)'))); | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $file = fopen($argv[1], 'r'); | |
| fgetcsv($file, 100); // skip first line | |
| $buffer = []; | |
| while (!feof($file)) { | |
| $fields = fgetcsv($file, 100); | |
| if ($fields === false) { | |
| continue; | |
| } | |
| $buffer[] = $fields; | |
| if (count($buffer) === BUFFER_SIZE) { | |
| $statement->execute(array_merge(...$buffer)); | |
| $buffer = []; | |
| } | |
| } | |
| fclose($file); | |
| $remaining = count($buffer); | |
| if ($remaining > 0) { | |
| $statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES ' . implode(',', array_fill(0, $remaining, '(?, ?, ?)'))); | |
| $statement->execute(array_merge(...$buffer)); | |
| } | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $value = $pdo->query('SELECT COUNT(*) FROM `import`')->fetchColumn(); | |
| echo number_format($value), PHP_EOL; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <?php | |
| $pdo = new PDO('mysql:host=127.0.0.1;dbname=dummy;charset=utf8mb4', 'root', 'password', [ | |
| PDO::ATTR_EMULATE_PREPARES => false, | |
| PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, | |
| ]); | |
| $pdo->exec('DROP TABLE IF EXISTS `import`'); | |
| $pdo->exec('CREATE TABLE `import` ( | |
| `id` BIGINT UNSIGNED NOT NULL, | |
| `name` VARCHAR(100) NOT NULL, | |
| `age` INT UNSIGNED NOT NULL | |
| )'); | |
| $statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES (:id, :name, :age)'); | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $file = fopen($argv[1], 'r'); | |
| fgetcsv($file, 100); // skip first line | |
| while (!feof($file)) { | |
| $fields = fgetcsv($file, 100); | |
| if ($fields === false) { | |
| continue; | |
| } | |
| $statement->execute([ | |
| ':id' => $fields[0], | |
| ':name' => $fields[1], | |
| ':age' => $fields[2], | |
| ]); | |
| } | |
| fclose($file); | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $value = $pdo->query('SELECT COUNT(*) FROM `import`')->fetchColumn(); | |
| echo number_format($value), PHP_EOL; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| $ php generate.php > data.csv | |
| $ ls -lh data.csv | |
| -rw-r--r-- 1 rodrigo rodrigo 1,1G abr 17 01:02 data.csv | |
| $ php import-linear.php data.csv | |
| 2022-04-17 01:02:50 | |
| 2022-04-17 01:17:00 | |
| 21,000,000 | |
| $ php import-buffered.php data.csv | |
| 2022-04-17 01:17:16 | |
| 2022-04-17 01:19:12 | |
| 21,000,000 |
Author
Author
@jamesmills a final disclaimer:
Although I spent some years (about 7 years) dealing with large datasets, I worked more on the ETL and analysis side of the job, and not the database management.
Of course along the years you learn a trick or two when dealing over and over with the same problem set. But the options I gave above, specially on the second comment on your additional questions, might no be the best solution.
Those are strategies I would do, from my experience, but if my team had an expert on database management, I would talk to them first to assess if this is the best strategy.
So if anyone else reading this knows better, please share in the comments how would you handle these. Thanks =)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
You can:
I usually, when I dealt with large databases, used a mix of both. I ingested data into a temporary table (actually a real table I create with the same data structure as the target table), once data is ingested and verified, I would drop the old one and rename the new one.
Note, that if you foreign indices that references the table being replaced you will need to:
There are other strategies you can use, if you can't afford any downtime:
Another one is using transactions, which will lock your app for a while, but both database and app can stay alive, requests are just going to be delayed. This is very similar to the drop/rename strategy above:
INSERTwithSELECTfor thisThe main difference here is the transaction usage, which will try to guarantee the database stays on a good state once data is committed.
This approach have some downsides:
Hope this ideas help =)