Skip to content

Instantly share code, notes, and snippets.

@akaak
Created March 10, 2026 02:29
Show Gist options
  • Select an option

  • Save akaak/85fe60d90b82d6ea2ff5882a73419612 to your computer and use it in GitHub Desktop.

Select an option

Save akaak/85fe60d90b82d6ea2ff5882a73419612 to your computer and use it in GitHub Desktop.
eRegs: MongoDB → PostgreSQL Data Migration Plan v2 (MigrationMapping idempotency)

MongoDB → PostgreSQL Data Migration Plan (v2)

Context

The existing eRegs production app stores data in MongoDB (5 collections: users, employers, usersubscriptions, transactions, employees). The new eRegs app uses PostgreSQL via Prisma. This plan migrates all paid subscriber data — users, organizations, subscriptions, transactions, and drivers — into the new schema. Scripts must be idempotent (safe to run multiple times) to support staging validation before production execution.


Phase 1: Schema Additions (Prisma)

Edit prisma/schema.prisma then run npx prisma migrate dev --name add-migration-fields.

1.1 Add to User model

accountNumber String?  @unique  // Human-readable ID preserved from MongoDB (SAN-0071, EC5LFY5LD4)

This is a real business field (appears on invoices/support), not migration scaffolding.

1.2 Add to Organization model

billingFrequency String?  // "YEARLY" | "MONTHLY"

1.3 Add to Subscription model

billingFrequency String?  // "YEARLY" | "MONTHLY"

1.4 Add to Driver model

userId String?  @unique
user   User?    @relation(fields: [userId], references: [id], onDelete: SetNull)

Links the driver's PG User row (employee login account) to their Driver record. Also add reverse relation to User: driverProfile Driver?

1.5 New Transaction model

enum TransactionType   { SUBSCRIPTION INVITE }
enum TransactionStatus { PAID PENDING FAILED REFUNDED }

model Transaction {
  id                      String            @id @default(cuid())
  mongoUserSubscriptionId String?           // denormalized ref to source usersubscription._id
  userId                  String?
  user                    User?             @relation(fields: [userId], references: [id], onDelete: SetNull)
  orgId                   String?
  org                     Organization?     @relation(fields: [orgId], references: [id], onDelete: SetNull)
  stripeInvoiceId         String?
  stripeInvoiceUrl        String?
  stripeEventId           String?
  stripePaymentIntentId   String?
  stripeReceiptUrl        String?
  totalAmountCents        Int?
  unitPriceCents          Int?
  quantity                Int?
  type                    TransactionType
  status                  TransactionStatus
  invoiceDate             DateTime?
  createdAt               DateTime          @default(now())
  updatedAt               DateTime          @updatedAt

  @@index([userId])
  @@index([orgId])
  @@index([stripeInvoiceId])
}

Add reverse relations:

// In User:         transactions Transaction[]
// In Organization: transactions Transaction[]

1.6 New MigrationMapping model — idempotency tracking only

model MigrationMapping {
  id        String   @id @default(cuid())
  mongoId   String               // MongoDB ObjectId string
  pgId      String               // PostgreSQL record id (cuid)
  entity    String               // "User" | "Organization" | "Subscription" | "Transaction" | "Driver"
  createdAt DateTime @default(now())

  @@unique([mongoId, entity])
  @@index([entity])
}

This table is the only idempotency mechanism. It is explicitly temporary — after production migration is validated and signed off, run DROP TABLE "MigrationMapping" and remove the model from schema. No business model is polluted with migration-only fields.

New records created by the running app after migration will never touch this table.


Phase 2: .env.datamigration File

Location: .env.datamigration (covered by .gitignore via .env* pattern — do not commit)

# Target PostgreSQL
DATABASE_URL="postgresql://USER:PASS@HOST:5432/DBNAME?sslmode=require"

# Source MongoDB
MONGODB_URI="mongodb+srv://mvpuser:...@cluster0.fxismfr.mongodb.net/eregsmvp_stagging"

# Migration control
DRY_RUN="false"     # "true" = log only, no writes
VERBOSE="false"     # "true" = per-document logs
LIMIT=""            # Max docs per collection for test runs ("" or "0" = no limit)

# Logging
LOG_FILE="scripts/data-migration/migration.log"

Phase 3: Script File Structure

All in scripts/data-migration/

lib/
  mongo.ts        # MongoClient singleton + typed collection accessors + TS interfaces
  prisma.ts       # PrismaClient singleton using .env.datamigration DATABASE_URL
  logger.ts       # MigrationLogger: console+file output, stats tracking
  transform.ts    # Pure field-mapping functions (testable, no DB calls)

00-validate-source.ts   # Read-only pre-flight: MongoDB counts + FK sanity
01-migrate-users.ts     # MongoDB users (EMPLOYER type) → PG User
02-migrate-orgs.ts      # MongoDB employers → PG Organization; update User.organizationId
03-migrate-subs.ts      # MongoDB usersubscriptions → PG Subscription
04-migrate-txns.ts      # MongoDB transactions → PG Transaction
05-migrate-drivers.ts   # MongoDB employees → PG User (role=PRO) + Driver
06-validate-target.ts   # Post-migration SQL validation queries

run-all.ts              # Orchestrator: runs 01→05 in order, halts on failure

output/                 # Created at runtime
  receipt-urls.csv      # S3 PDF URLs from employee receipts (for manual follow-up)
  migration.log         # Appended on each run

Pattern: Each script follows sync-local.ts convention — new PrismaPg({ connectionString }) + new PrismaClient({ adapter }). Each exports default async function main() and auto-runs when executed directly.

Invocation:

# Add mongodb driver (one-time)
npm install --save-dev mongodb

# Pre-flight (read-only, safe to run anytime)
npx tsx --env-file=.env.datamigration scripts/data-migration/00-validate-source.ts

# Dry run (no writes — logs what would happen)
DRY_RUN=true npx tsx --env-file=.env.datamigration scripts/data-migration/run-all.ts

# Real migration
npx tsx --env-file=.env.datamigration scripts/data-migration/run-all.ts

# Post-migration validation
npx tsx --env-file=.env.datamigration scripts/data-migration/06-validate-target.ts

# After production sign-off: clean up the mapping table
# DROP TABLE "MigrationMapping"; then remove model from schema.prisma

Phase 4: Migration Order (Dependency Graph)

01: MongoDB users (EMPLOYER) → PG User
        ↓
02: MongoDB employers → PG Organization  (needs User.id via MigrationMapping)
        ↓              ↓
03: Subscriptions   05: Drivers          (need Organization.id via MigrationMapping)
        ↓
04: Transactions                          (needs User.id + Organization.id)

Phase 5: Idempotency Strategy

Every script uses MigrationMapping as its sole idempotency gate. No mongoId column exists on any business model.

Idempotency check pattern (same for every script):

const mapping = await prisma.migrationMapping.findUnique({
  where: { mongoId_entity: { mongoId: doc._id.toString(), entity: 'User' } }
});
if (mapping) {
  logger.skip(doc._id, 'already migrated → pgId: ' + mapping.pgId);
  stats.skipped++;
  continue;
}

if (DRY_RUN) { logger.dryRun(doc._id, transformed); stats.dryRun++; continue; }

const created = await prisma.user.create({ data: transformed });

await prisma.migrationMapping.create({
  data: { mongoId: doc._id.toString(), pgId: created.id, entity: 'User' }
});
stats.inserted++;

Cross-collection lookups via MigrationMapping:

When a later script needs the PG id for a previously-migrated MongoDB document:

// e.g., in 02-migrate-orgs.ts: find PG User.id for the employer's MongoDB user._id
const mapping = await prisma.migrationMapping.findUnique({
  where: { mongoId_entity: { mongoId: employer.user.toString(), entity: 'User' } }
});
if (!mapping) { logger.warn('employer user not found in PG, skipping'); continue; }
const pgUserId = mapping.pgId;

Secondary guard (belt-and-suspenders):

Also catch Prisma P2002 (unique constraint violation) on fields like email and stripeSubscriptionId. Treat as skip + warning, not a fatal error. This handles partial runs where MigrationMapping was written but the record create failed, or vice versa.

Collection Entity name in MigrationMapping
users (EMPLOYER) "User"
employers "Organization"
usersubscriptions "Subscription"
transactions "Transaction"
employees (User part) "User"
employees (Driver part) "Driver"

Phase 6: Field Mapping

6.1 users (EMPLOYER only) → User

MongoDB PostgreSQL Transform
email email lowercase + trim
password password copy (bcrypt hash is portable)
name.firstName + lastName name concat with space
userType = "EMPLOYER" role "FLEET_MANAGER"
accountNumber accountNumber copy as-is (e.g. SAN-0071)
deleted = true deletedAt set to updatedAt value
createdAt, updatedAt same parse to Date
passwordResetLink, __v, employee, employer skip

Employee users (userType = "EMPLOYEEE") are handled in step 05.

6.2 employersOrganization + update User

MongoDB PostgreSQL Transform
user (→ Mongo User._id) ownerId MigrationMapping lookup → PG User.id
company name trim; fallback: extract from owner email domain
numberOfInvitesRemaining freeInvitesLeft direct int
billingFrequency billingFrequency copy ("YEARLY"|"MONTHLY")
tiralPeriodUsed (typo!) trialEndsAt false → createdAt+14d; true → createdAt
stripe.customer (log only) Subscription is authoritative source
cardDetails SKIP — PCI DATA never read or log this field
access, price, numberOfTotalInvitesRemaining skip

After creating Org, also update User.organizationId = org.id.

6.3 usersubscriptionsSubscription

MongoDB PostgreSQL Transform
employer (→ Mongo Employer._id) orgId MigrationMapping("Organization") → PG Org.id
user (→ Mongo User._id) userId MigrationMapping("User") → PG User.id
stripeDetails.customerId stripeCustomerId copy
stripeDetails.id (sub_xxx) stripeSubscriptionId copy
stripeDetails.planId stripePriceId copy
stripeDetails.status status see enum map below
billingFrequency billingFrequency copy
cancelAtPeriodEnd cancelAtPeriodEnd boolean
startDate currentPeriodStart parse to Date
endDate currentPeriodEnd parse to Date
subscriptionType = "employer" plan FLEET_MANAGER
cancelDate billingNotes "Legacy cancel date: YYYY-MM-DD"
transaction, stripeDetails.paymentMethod skip

Status map: activeACTIVE, trialingTRIALING, past_duePAST_DUE, canceled/cancelledCANCELED, pausedPAUSED, else→TRIALING (log warn).

6.4 transactionsTransaction

MongoDB PostgreSQL Transform
user (→ Mongo User._id) userId MigrationMapping("User") → PG User.id
employer (→ Mongo User._id of employer!) orgId MigrationMapping("User") → pgUserId → org where ownerId=pgUserId
stripeHostedInvoiceUrl stripeInvoiceUrl copy
stripeInvoiceId stripeInvoiceId copy
totalAmount totalAmountCents copy directly — confirmed cents (4000 = $40.00)
status (int) status 1PAID, 0PENDING, -1FAILED
invoiceDate invoiceDate parse to Date
type type "subscription"SUBSCRIPTION, "invite"INVITE
stripeEventId stripeEventId invite type only
paymentIntentId stripePaymentIntentId invite type only
stripeReceiptUrl stripeReceiptUrl invite type only
unit_price unitPriceCents copy directly (already cents)
quantity quantity invite type only
userSubscription (Mongo _id) mongoUserSubscriptionId store as string — denormalized audit ref

Note: Transaction.mongoUserSubscriptionId is the one intentional exception — it stores a Mongo reference as a plain string for audit/support purposes. It is not a FK and carries no idempotency role.

6.5 employees → PG User (role=PRO) + Driver

Step A — Create PG User for the employee:

MongoDB (users join) PostgreSQL User Transform
email email lowercase + trim
password password copy (bcrypt portable)
name.firstName + lastName name concat
userType = "EMPLOYEEE" role "PRO" (handles both EMPLOYEE and EMPLOYEEE typo)
accountNumber accountNumber copy (10-char alphanumeric, e.g. EC5LFY5LD4)
deleted = true deletedAt set to updatedAt

Record in MigrationMapping as entity "User".

Step B — Create Driver linked to Step A User:

MongoDB employees PostgreSQL Driver Transform
PG User.id from Step A userId direct FK
employer (→ employer's Mongo User._id) orgId MigrationMapping("User") → pgUserId → org.ownerId=pgUserId
phone phone trim; null if empty string
email from users join email from joined users doc
name from users join name firstName + lastName
dateAcknowledged non-null status ACKNOWLEDGED; populate acknowledgedAt
accountStatus = true + no ack status ACTIVE
else status PENDING
dateInvited createdAt parse to Date
dateAcknowledged acknowledgedAt parse to Date
phone present, email absent inviteMethod SMS; else EMAIL
receipt (S3 object) write to output/receipt-urls.csv
passwordSetup, access, company, __v skip

Record in MigrationMapping as entity "Driver".


Phase 7: Key Edge Cases

  1. EMPLOYEEE typo — script 05 normalizes userType.toUpperCase().startsWith('EMPLOY') to catch both spellings; both create PG User (role=PRO) + Driver
  2. cardDetails PCI data — explicitly destructured-out at the top of script 02, never read, logged, or processed: const { cardDetails: _PCI, ...safeEmployer } = mongoDoc
  3. tiralPeriodUsed typo — read exactly as-is from MongoDB; do not attempt to correct the key name
  4. employees.employer is a User._id, not Employer._id — lookup chain: employee.employer → MigrationMapping("User") → PG User.id → Organization.ownerId = PG User.id
  5. Duplicate Stripe IDs — catch P2002 on stripeSubscriptionId/stripeCustomerId; log detailed warning (both mongoIds), do not abort the run
  6. Missing cross-references — if a MongoDB FK has no MigrationMapping entry (source doc was deleted/corrupted), log warning with full context, write partial record where possible, do not abort
  7. deleted: true users — still migrated; User.deletedAt set to their updatedAt timestamp
  8. S3 receipt PDFs — employee receipt field collected into output/receipt-urls.csv (columns: employeeMongoId, email, s3Key, s3Bucket, s3Url) for manual follow-up

Phase 8: Post-Migration Validation (06-validate-target.ts)

Count parity — compare PG record counts (by MigrationMapping entity) against source MongoDB collection counts printed by 00-validate-source.ts:

SELECT entity, COUNT(*) FROM "MigrationMapping" GROUP BY entity;

FK integrity (all should return 0):

-- Orgs with invalid ownerId
SELECT COUNT(*) FROM "Organization" o
JOIN "MigrationMapping" m ON m."pgId" = o.id AND m.entity = 'Organization'
WHERE NOT EXISTS (SELECT 1 FROM "User" u WHERE u.id = o."ownerId");

-- Subscriptions with invalid orgId
SELECT COUNT(*) FROM "Subscription" s
JOIN "MigrationMapping" m ON m."pgId" = s.id AND m.entity = 'Subscription'
WHERE s."orgId" IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM "Organization" o WHERE o.id = s."orgId");

-- Drivers with invalid orgId
SELECT COUNT(*) FROM "Driver" d
JOIN "MigrationMapping" m ON m."pgId" = d.id AND m.entity = 'Driver'
WHERE NOT EXISTS (SELECT 1 FROM "Organization" o WHERE o.id = d."orgId");

Business logic warnings (non-zero = investigate):

-- Migrated orgs without a Subscription
SELECT COUNT(*) FROM "Organization" o
JOIN "MigrationMapping" m ON m."pgId" = o.id AND m.entity = 'Organization'
WHERE NOT EXISTS (SELECT 1 FROM "Subscription" s WHERE s."orgId" = o.id);

-- ACKNOWLEDGED drivers missing acknowledgedAt
SELECT COUNT(*) FROM "Driver" d
JOIN "MigrationMapping" m ON m."pgId" = d.id AND m.entity = 'Driver'
WHERE d.status = 'ACKNOWLEDGED' AND d."acknowledgedAt" IS NULL;

-- Duplicate Stripe customer IDs
SELECT "stripeCustomerId", COUNT(*) FROM "Subscription"
WHERE "stripeCustomerId" IS NOT NULL
GROUP BY "stripeCustomerId" HAVING COUNT(*) > 1;

-- Transaction amounts out of range (0–$10,000 in cents)
SELECT COUNT(*) FROM "Transaction" t
JOIN "MigrationMapping" m ON m."pgId" = t.id AND m.entity = 'Transaction'
WHERE t."totalAmountCents" IS NOT NULL
  AND (t."totalAmountCents" < 0 OR t."totalAmountCents" > 1000000);

Spot checks (printed for human review):

  • 5 sample migrated Users with accountNumber
  • 5 sample Organizations with freeInvitesLeft and billingFrequency
  • 10 sample Transactions with amounts and type

Phase 9: Cleanup (after production sign-off)

DROP TABLE "MigrationMapping";

Then remove the MigrationMapping model from prisma/schema.prisma and run:

npx prisma migrate dev --name remove-migration-mapping

Critical Files

File Action
prisma/schema.prisma Add accountNumber to User, billingFrequency to Org/Subscription, userId to Driver, new Transaction model, new MigrationMapping model
scripts/data-migration/lib/mongo.ts New — MongoClient singleton + typed interfaces for all 5 collections
scripts/data-migration/lib/transform.ts New — pure field-mapping functions (no DB I/O)
scripts/data-migration/lib/logger.ts New — MigrationLogger: console + file, stats tracking
scripts/data-migration/lib/prisma.ts New — PrismaClient using .env.datamigration DATABASE_URL
scripts/data-migration/00-validate-source.ts New — pre-flight read-only MongoDB check
scripts/data-migration/01-migrate-users.ts New
scripts/data-migration/02-migrate-orgs.ts New
scripts/data-migration/03-migrate-subs.ts New
scripts/data-migration/04-migrate-txns.ts New
scripts/data-migration/05-migrate-drivers.ts New
scripts/data-migration/06-validate-target.ts New — post-migration validation
scripts/data-migration/run-all.ts New — orchestrator
.env.datamigration New — gitignored config
package.json Add mongodb dev dependency

Reference pattern for Prisma client init: scripts/sync-local.ts (uses PrismaPg adapter).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment