Skip to content

Instantly share code, notes, and snippets.

@wiserfirst
Created October 3, 2025 04:56
Show Gist options
  • Select an option

  • Save wiserfirst/8b358675b75ba1d85990919ee8741b4b to your computer and use it in GitHub Desktop.

Select an option

Save wiserfirst/8b358675b75ba1d85990919ee8741b4b to your computer and use it in GitHub Desktop.

Service Region Test Investigation - Detailed Findings

Date: 2025-10-03 Test File: apps/admin/test/admin_web/integration/service_region_priority_test.exs:134 Test Name: "fetches only providers meeting criteria"

Executive Summary

Critical Finding: The insert_polygons function is COMPLETELY UNNECESSARY for this test. The test runs in 2.7 seconds, and the actual function being tested (available_providers) only takes 6.75ms. The remaining 2.6+ seconds is spent on WKT decoding and database inserts that serve no purpose.

Performance Breakdown (Instrumented)

Current Timing (with insert_polygons commented out)

Total Test Time: 2.7 seconds

Breakdown:
- Insert provider 'Orion Towing': 24.47ms
- Create 4 provider nodes for 'Orion Towing': 1,291.39ms  (48% of test time)
- Insert provider 'Battery City': 0.46ms
- Create 3 provider nodes for 'Battery City': 1,273.22ms  (47% of test time)
- Call @subject.available_providers: 6.75ms              (0.25% of test time!)

Key Insight: 95% of test time is spent creating provider nodes with WKT-decoded coordinates, but the actual function being tested only takes 6.75ms!

What Takes Time Creating Provider Nodes

Each provider node creation involves:

  1. Reading from CSV stream (geodata_stream)
  2. Filtering postcodes
  3. WKT decoding - Geo.WKT.decode!(geo_shape["shape_geom"]) (~25-30ms per postcode)
  4. Extracting random coordinates from decoded polygon
  5. Database insert

Total for 7 provider nodes: ~2,564ms (2.6 seconds)

Investigation: Does available_providers Need PostGIS?

The Function Implementation

File: apps/admin/lib/admin/service_regions.ex:115

def available_providers(region_id, service_type) do
  case Ecto.UUID.cast(region_id) do
    {:ok, service_region_id} ->
      query =
        from(p in Provider,
          distinct: p.id,
          join: pn in ProviderNode,
          on: pn.provider_id == p.id,
          join: pcode in Postcode,
          on: pcode.postcode == pn.postcode,  # ← String match only!
          join: sp in ServiceRegion,
          on: sp.id == pcode.service_region_id,
          where:
            sp.id == ^service_region_id and
              p.service_type == ^service_type and
              p.status == ^"Active"
        )

      Repo.all(query)

    error ->
      Logger.error("#{__MODULE__}.available_providers/2 UUID CAST ERROR...")
      []
  end
end

Analysis

CRITICAL: This function does ZERO geographic/spatial queries!

The join logic:

  1. ProviderProviderNode (provider_id match)
  2. ProviderNodePostcode (postcode STRING match: pcode.postcode == pn.postcode)
  3. PostcodeServiceRegion (service_region_id match)

No PostGIS operations:

  • ❌ No distance calculations
  • ❌ No ST_Within, ST_Contains, ST_Intersects
  • ❌ No geographic buffers or radius queries
  • ❌ No coordinate-based spatial queries
  • Only string-based postcode matching

What the Test Actually Needs

Current test setup:

# Setup creates:
1. Service region with postcodes: ["2224", "2225", "2226", "2227", "2228", "2229"]
2. Provider nodes with postcodes matching the region
3. Coordinates extracted from WKT geometries (longitude, latitude)

# What available_providers actually uses:
- Only the postcode STRING from ProviderNode
- NEVER uses longitude or latitude
- NEVER queries GeoPostcode table (with PostGIS geometries)

Proof: The Postcode table (joined in the query) is NOT the GeoPostcode table. They're different tables!

  • GeoPostcode - Contains PostGIS geometries, created by insert_polygons - NEVER QUERIED
  • Postcode - Simple table with postcode strings and service_region_id - ACTUALLY USED

Why WKT Decoding is Expensive and Unnecessary

WKT Geometry Example

"POLYGON((151.05 -33.90, 151.10 -33.90, 151.10 -33.95, 151.05 -33.95, 151.05 -33.90))"

Decoding process (~25-30ms per postcode):

  1. Parse WKT string
  2. Validate polygon structure
  3. Extract all coordinate pairs
  4. Flatten nested lists
  5. Select random coordinate

What the test uses from this:

  • A longitude value (e.g., 151.07)
  • A latitude value (e.g., -33.92)

What available_providers uses:

  • NOTHING - these coordinates are stored in provider_node but never queried!

The Smoking Gun: Database Tables

Looking at the query joins:

join: pcode in Postcode,  # ← This is Members.Schemas.Operations.Postcode
on: pcode.postcode == pn.postcode

But insert_polygons inserts into:

GeoPostcode.changeset(i)  # ← This is Members.Schemas.Operations.GeoPostcode
|> OperationsRepo.insert!(on_conflict: :nothing)

These are DIFFERENT TABLES:

  • Postcode - Simple lookup table (postcode string → service_region_id)
  • GeoPostcode - PostGIS geometry table with complex polygons

The test creates Postcode records via:

OperationsFactory.insert(:service_region_postcode,
  service_region: svc_region,
  postcode: pcode)

This creates records in the simple Postcode table, NOT GeoPostcode.

Conclusion

Facts

  1. available_providers NEVER queries GeoPostcode table

    • It queries the simple Postcode table (string lookups)
    • No PostGIS/spatial operations
  2. insert_polygons is completely irrelevant

    • Inserts into GeoPostcode (with complex geometries)
    • This table is never touched by the test logic
  3. WKT decoding wastes 2.5+ seconds

    • Extracts coordinates that are never used by the query
    • available_providers doesn't look at longitude/latitude
    • Only matches on postcode strings
  4. Test works WITHOUT insert_polygons

    • Currently runs in 2.7s with it commented out
    • Would still work with simpler coordinates

Recommended Optimizations

Quick Win: Use Fixed Coordinates (saves ~2.5 seconds)

Replace WKT decoding with simple pre-defined coordinates:

# Instead of:
{long, lat} = Geo.WKT.decode!(geo_shape["shape_geom"]).coordinates
              |> List.flatten()
              |> Enum.random()

# Use:
coords = %{
  "2224" => {151.15, -33.95},
  "2225" => {151.25, -33.95},
  "2226" => {151.20, -34.00},
  # ... etc
}
{long, lat} = coords[postcode]

Expected improvement: 2.7s → ~0.2s (92% faster!)

Alternative: Remove Coordinates Entirely (saves ~2.5 seconds)

Since available_providers doesn't use coordinates:

# Don't even set longitude/latitude - just use postcodes
OperationsFactory.insert(:provider_node,
  provider: other,
  locality: "Newcastle",
  postcode: "2225"
  # No longitude/latitude needed!
)

Expected improvement: 2.7s → ~0.2s (92% faster!)

Risk: If provider_node factory requires coordinates, use defaults:

longitude: 0.0,
latitude: 0.0

Action Items

  1. Remove WKT decoding - Use pre-defined coordinate map or defaults
  2. Verify insert_polygons is unused - Confirm GeoPostcode table is never queried
  3. Update test - Simplify to only create data that's actually used
  4. Document pattern - Share this finding for similar tests

Files to Update

  1. apps/admin/test/admin_web/integration/service_region_priority_test.exs:134
    • Replace WKT decoding with coordinate map
    • Remove dependency on CSV files
    • Expected: 2.7s → 0.2s

Questions Answered

Q: Does insert_polygons play a role in available_providers? A: NO. The function only does string-based postcode matching. It never queries the GeoPostcode table with PostGIS geometries.

Q: Why does the test take 2.7 seconds? A: Because it's doing expensive WKT geometry decoding (~25-30ms × 7 postcodes = ~200ms) plus stream processing overhead and database inserts (~2.3s total). The actual function being tested takes 6.75ms.

Q: Can we optimize this without changing behavior? A: YES. Replace WKT decoding with a simple coordinate lookup map. The test logic doesn't depend on actual geographic data - just postcode strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment