@HamdaanAliQuatil
Created September 27, 2025 14:05
Real-time Amazon Price Service

Goal: Given (a) Amazon credentials and (b) a product URL, return the real-time price — while staying fully legal and avoiding “suspicious login” alerts.

Below is a detailed design. I’ll first prefer no-login paths (legal + easy) and then, only if the business insists on account context, I’ll show a session-safe, device-sticky approach that minimizes alerts.


1) Assumptions (explicit)

  • We may or may not have PA-API (Product Advertising API) access. If yes, we’ll use it.
  • If PA-API is not an option, we attempt public, non-logged-in product pages (no credentials).
  • Only if someone absolutely requires user-specific price/context (e.g., Prime-only pricing) do we support account sessions — but with strict guardrails to avoid suspicious-login triggers.
  • If account access is required, we use session handoff (cookie/token import) with user consent.

2) Architecture (high-level)

2.1 Overview

flowchart LR
    U[Client / Caller] -- Product URL --> G[Gateway/API]
    subgraph C[Coordinator]
      R[Rules: choose path] -->|PA-API ok?| P[PA-API Fetcher]
      R -->|Public page ok?| S[Public Page Fetcher]
      R -->|Account context required| B[Session Browser Runner]
    end

    G --> R
    P --> N[Normalizer]
    S --> N
    B --> N
    N --> V[Validator & Policy]
    V --> G
    G --> U

    subgraph Sec[Secrets & Compliance]
      KMS[Key mgmt]:::sec
      Vault[Session Vault]:::sec
      Audit[Audit Log redacted]:::sec
    end

    classDef sec fill:#f6f8fa,stroke:#c9d1d9,color:#24292e;


Routing logic:

  1. PA-API (if allowed) - best path
  2. Public page (no login) - legal, no alerts
  3. Account session (last resort) - device-sticky, region-consistent, and throttled
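The routing order above can be sketched as a small pure function. This is a minimal illustration, not the author's implementation: `Path`, `RouteContext`, and `choose_path` are hypothetical names, and the reading that an explicit account-context requirement overrides the other two paths is my interpretation of the assumptions in section 1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Path(Enum):
    PA_API = "pa-api"
    PUBLIC = "public"
    SESSION = "session"


@dataclass(frozen=True)
class RouteContext:
    # Hypothetical flags the Router would receive from configuration
    # and from the caller's request.
    pa_api_available: bool
    public_page_allowed: bool
    account_context_required: bool


def choose_path(ctx: RouteContext) -> Optional[Path]:
    """Pick a fetch path; None means no acceptable path exists."""
    if ctx.account_context_required:
        # Assumption: user-specific pricing cannot be served by the
        # no-login paths, so this is the only case that reaches SESSION.
        return Path.SESSION
    if ctx.pa_api_available:
        return Path.PA_API
    if ctx.public_page_allowed:
        return Path.PUBLIC
    return None
```

Keeping the decision separate from the fetchers makes the priority order easy to unit-test without any network access.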

3) Primary path A — PA-API (when available)

Why this first: Official, ToS-compliant, no consumer login, lowest risk.

3.1 Flow

sequenceDiagram
    participant Client
    participant API Gateway
    participant Router
    participant PA as PA-API Fetcher
    participant N as Normalizer
    participant V as Validator/Policy

    Client->>API Gateway: GET /price?url=<product-url>
    API Gateway->>Router: Route request
    Router->>PA: Build request (ASIN, locale, fields)
    PA-->>Router: Offer/price payload
    Router->>N: Normalize (unify fields)
    N-->>V: Price object
    V-->>API Gateway: OK (price, currency, freshness)
    API Gateway-->>Client: 200 (JSON)

3.2 Key details

  • ASIN extraction: Parse from URL; if missing, resolve via product page parsing (no login, one fetch).
  • Region mapping: Derive marketplace (amazon.com, .in, .de, etc.) from URL.
  • Data model: price.value, currency, seller/offerId, buyBoxFlag, freshness_ts, source = pa-api.
  • Rate limiting: Respect PA-API quotas; apply adaptive backoff; surface 429 with retry-after.
  • Latency: Cache “last known price” for ≤30s with ETag/If-None-Match behavior to bound tail latency.
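The ASIN extraction and region mapping bullets could look roughly like the sketch below. The function names, the two URL shapes covered (`/dp/` and `/gp/product/`), and the small marketplace allowlist are all illustrative assumptions. A real service would need a fuller list and the page-parsing fallback the bullet mentions.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: ASINs are 10 uppercase alphanumeric characters and appear
# after /dp/ or /gp/product/ in the URL path. Other URL shapes exist.
_ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")

# Illustrative subset only; extend for every marketplace you support.
KNOWN_MARKETPLACES = {"amazon.com", "amazon.in", "amazon.de"}


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN embedded in the URL path, or None if absent."""
    match = _ASIN_RE.search(urlparse(url).path)
    return match.group(1) if match else None


def extract_marketplace(url: str) -> Optional[str]:
    """Return the marketplace host (e.g. 'amazon.in') if it is known."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # An exact allowlist match rejects look-alike hosts.
    return host if host in KNOWN_MARKETPLACES else None
```

Using an exact allowlist rather than a prefix check means the service does not treat an arbitrary host that merely starts with `amazon.` as a marketplace.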

Failure modes handled:

  • PA-API quota exhausted: fall back to the public page (if acceptable).
  • PA-API response is missing a specific price (edge cases): return reason_code = "price_unavailable_paapi" and (optionally) try the public page.
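One way to express these fallbacks is a loop over fetchers in priority order. The sketch below assumes each fetcher returns a `(price, reason_code)` pair; that contract, the name `fetch_with_fallback`, and the choice to surface the reason from the last path tried are all hypothetical. Only `price_unavailable_paapi` comes from the text above. `quota_exhausted_paapi` and `no_path_available` are made-up codes for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical contract: a fetcher returns (price, reason_code),
# where exactly one of the two is None.
FetchResult = Tuple[Optional[float], Optional[str]]
Fetcher = Callable[[str], FetchResult]


def fetch_with_fallback(asin: str, fetchers: List[Tuple[str, Fetcher]]) -> dict:
    """Try each (source, fetcher) in order; return the first price found."""
    last_reason = "no_path_available"  # used only if the list is empty
    for source, fetch in fetchers:
        price, reason = fetch(asin)
        if price is not None:
            return {"price": price, "source": source, "reason_code": None}
        # Remember why this path failed and move on to the next one.
        last_reason = reason or last_reason
    return {"price": None, "source": None, "reason_code": last_reason}
```

For example, passing `[("pa-api", paapi_fetch), ("public", public_fetch)]` tries the public page only when the PA-API fetcher reports a failure.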

4) Primary path B — Public page (no login)

Why this second: Still legal; no credentials; no suspicious-login risk.

4.1 Flow

sequenceDiagram
    participant Client
    participant API Gateway
    participant Router
    participant Pub as Public Page Fetcher
    participant N as Normalizer
    participant V as Validator/Policy

    Client->>API Gateway: GET /price?url=<product-url>
    API Gateway->>Router: Route request
    Router->>Pub: Fetch HTML (respect robots/legal constraints)
    Pub-->>Router: Parsed price (LD-JSON, offerbox, fallback)
    Router->>N: Normalize
    N-->>V: Validate (currency, numeric)
    V-->>API Gateway: OK
    API Gateway-->>Client: price JSON

4.2 Parsing approach (conservative)

  • Prefer structured data (LD-JSON) and offerbox DOM; minimal requests (1–2).
  • Human headers: realistic UA, Accept-Language, HTTP2; no aggressive crawling.
  • Pacing: single fetch per request, with jitter and connection reuse.
  • No CAPTCHA solving; if encountered, we stop and return reason_code = captcha_block.
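As a rough illustration of the "prefer structured data" step, the sketch below pulls a price out of LD-JSON blocks in an already-fetched HTML string using only the standard library. It assumes the page embeds a schema.org-style object with an `offers.price` and `offers.priceCurrency`. Whether a given product page actually does so is not guaranteed, which is why the text above keeps the offerbox DOM as a fallback. `price_from_ld_json` is a hypothetical name.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class _LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self._in_ld = False
            self.blocks.append("".join(self._buf))


def price_from_ld_json(html: str) -> Optional[dict]:
    """Return {'value', 'currency'} from the first usable offer, else None."""
    parser = _LdJsonCollector()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip rather than guess
        if not isinstance(doc, dict):
            continue
        offers = doc.get("offers")
        if isinstance(offers, list):
            offers = offers[0] if offers else None
        if isinstance(offers, dict) and "price" in offers:
            try:
                value = float(offers["price"])
            except (TypeError, ValueError):
                continue
            return {"value": value, "currency": offers.get("priceCurrency")}
    return None
```

Returning `None` instead of a best guess keeps the Validator step honest: a missing price becomes a reason code, not a wrong number.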

5) Last-resort path — Account session (credentials provided)

Only if someone insists on an account context. This is where “avoiding suspicious login” matters.

5.1 Session model (no raw passwords)

  • Session handoff: User logs into Amazon on their device and exports session cookies (short-lived).
  • We accept/import only session tokens via a secure client flow.
  • Tokens are encrypted at rest per user and rotated; TTLs are short.

5.2 Device & location stickiness (to look like the user’s normal device)

flowchart TB
  subgraph Profile["Per-User Headless Profile (long-lived)"]
    UA[User-Agent, CH-UA]
    TZ[Timezone/Locale/Langs]
    F[Fonts/Canvas/WebGL]
    LS[Cookies/Storage]
  end

  subgraph Egress["Location-aware Egress"]
    IP[Sticky Residential/ISP IP]
    City[Region near user's city]
    Hours[Run during normal local hours]
  end

  Session[Session Cookies TTL] --> Profile
  Profile --> Runner[Headless Browser]
  Egress --> Runner
  Runner --> Price[Fetch product page → extract price]

Implementation Details

I would treat the browser like a single, familiar device that belongs to the user and I would keep it that way. On the first successful run I would freeze the profile: same user-agent and Sec-CH-UA*, same timezone (e.g., Asia/Kolkata if that’s what the account would normally use), same language prefs (en-IN,en;q=0.9), same fonts, the same Canvas/WebGL fingerprint, and I would persist cookies, localStorage, and IndexedDB to a dedicated profile directory. Every call would reuse this exact profile so Amazon would keep seeing the same laptop, not a rotation of new machines.

Network-wise I would avoid hopping around. I would pin a sticky residential/ISP egress IP in the user’s city/region (Bengaluru, IN-KA, for example), keep the same ASN class, and hold that IP for roughly 30 days. Requests would run during the user’s normal window (08:00–22:00 local) rather than at odd hours. I would keep the transport boring and predictable: HTTP/2 with connection reuse, stable ALPN/JA3, and ordinary Accept/Accept-Encoding headers.

Navigation would be deliberately human. I would open the home page first and then move to the product page instead of deep-linking cold. Between steps I would add small randomized waits in the 300–900 ms range. I would avoid parallel tabs and keep the pace conservative, about 6–8 lookups per hour per account, unless a longer, clean history justified more. Timeouts would be tight but fair: 1 s connect, 1 s headers, 2 s body. On a transient error or a quota hit, I would serve a last-known price that is ≤ 30 s old with an explicit freshness timestamp.

I would never handle raw passwords. If account context were absolutely required, the user would provide a short-lived cookie jar through a secure client flow. I would envelope-encrypt that at rest, set a TTL (≤ 7 days), and restrict everything to read-only GETs. Cart, address, and payment endpoints would be hard-blocked. A first run might trigger OTP/MFA; I would surface that once in a secure prompt, the user would complete it, and I would persist the trusted-device state so it wouldn’t keep happening.

I would also back off quickly if anything looked wrong. If I detected a country change, an ASN-class flip (residential ↔ cloud), or a device-fingerprint drift (UA/timezone/canvas/fonts), I would stop and ask the user to re-establish the session. The same would apply if CAPTCHA or MFA appeared twice in 24 hours—I wouldn’t brute-force challenges. When a clean fetch wouldn’t be possible, I would return a clear reason code (for example, captcha_block, session_mfa_required, new_region_detected) instead of guessing.

The same device, in the same place, with the same behavior, at sane times, under strict read-only boundaries. That is how I would fetch the current price without tripping “suspicious login” alarms.

5.3 Session sequence

sequenceDiagram
    participant Client
    participant Vault as Session Vault
    participant Browser as Headless Browser (Pinned Profile)
    participant Egress as Sticky Egress (User Region)

    Client->>Vault: Upload short-lived cookies (encrypted)
    Client->>Browser: Request price for <URL>
    Browser->>Egress: Route via sticky regional IP
    Browser->>Amazon: Navigate (home -> product)
    Amazon-->>Browser: Page (+ any MFA/CAPTCHA)
    alt MFA/CAPTCHA shown
        Browser-->>Client: Present challenge (secure frame)
        Client-->>Browser: User completes challenge
    end
    Browser-->>Client: Price JSON (source=session)

5.4 What I would categorically want to avoid

  • Rotating IPs on every call: this would look like an account hijack.
  • Running at odd local hours: for example, a run at 3 a.m. local time is suspicious if the user usually shops at 6 p.m.
  • Hammering requests in parallel: I'd keep one active session per account.
  • Bypassing CAPTCHA/MFA programmatically: challenges are always solved by the user.

6) Suspicious-login prevention — policy engine

stateDiagram-v2
    [*] --> Evaluating
    Evaluating --> Abort : new_region || new_device || MFA_loop
    Evaluating --> Proceed : region_ok && device_ok && within_hours
    Proceed --> Throttle : rate_exceeds_budget
    Throttle --> Proceed : backoff_complete
    Proceed --> [*]
    Abort --> [*]

Inputs: last successful region/IP, device fingerprint, user’s local hours, per-minute request budget. Actions: throttle, require re-auth, or abort with a clear reason code.
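The state diagram can be read as a single decision function. The sketch below is one possible reading, not a definitive one: `PolicyInput`, `Decision`, and `evaluate` are hypothetical names, and the reason codes `new_device_detected` and `outside_local_hours` are invented for illustration. The diagram does not say what happens outside the user's normal hours, so deferring the request instead of aborting is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    PROCEED = "proceed"
    THROTTLE = "throttle"
    ABORT = "abort"


@dataclass(frozen=True)
class PolicyInput:
    region_ok: bool          # matches last successful region/IP
    device_ok: bool          # device fingerprint unchanged
    within_hours: bool       # inside the user's normal local hours
    mfa_loop: bool           # repeated MFA/CAPTCHA challenges
    requests_in_window: int  # requests already made in the budget window
    budget: int              # allowed requests per window


def evaluate(p: PolicyInput) -> Tuple[Decision, Optional[str]]:
    """Return (decision, reason_code); reason_code is None on PROCEED."""
    # Abort conditions come first: these require the user to re-establish
    # the session and are never retried automatically.
    if not p.region_ok:
        return Decision.ABORT, "new_region_detected"
    if not p.device_ok:
        return Decision.ABORT, "new_device_detected"  # hypothetical code
    if p.mfa_loop:
        return Decision.ABORT, "session_mfa_required"
    # Assumption: outside normal hours we defer instead of aborting.
    if not p.within_hours:
        return Decision.THROTTLE, "outside_local_hours"  # hypothetical code
    if p.requests_in_window >= p.budget:
        return Decision.THROTTLE, "rate_exceeds_budget"
    return Decision.PROCEED, None
```

Checking the abort conditions before the throttle conditions means a request that is both over budget and from a new region is stopped, not queued.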


7) API surface (what the caller sees)

  • GET /price?url=<product_url>

    • Query selection: Router decides PA-API, public page, or session runner.

    • Response:

      {
        "asin": "...",
        "price": {"value": 1299.00, "currency": "INR"},
        "source": "pa-api | public | session",
        "marketplace": "amazon.in",
        "freshness_ts": "2025-09-27T10:30:00Z",
        "reason_code": null
      }
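A typed model for this response might look like the sketch below. The field names follow the JSON above. The class names, the `is_fresh` helper, and its 30-second default (borrowed from the cache window mentioned in section 3.2) are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Price:
    value: float
    currency: str


@dataclass
class PriceResponse:
    asin: str
    price: Optional[Price]
    source: Optional[str]      # "pa-api", "public", or "session"
    marketplace: str
    freshness_ts: str          # ISO 8601, UTC, e.g. "2025-09-27T10:30:00Z"
    reason_code: Optional[str] = None

    def to_json(self) -> str:
        # asdict() recurses into the nested Price dataclass.
        return json.dumps(asdict(self))


def is_fresh(resp: PriceResponse, now: datetime, max_age_s: int = 30) -> bool:
    """True if the price was observed within max_age_s seconds before now."""
    ts = datetime.fromisoformat(resp.freshness_ts.replace("Z", "+00:00"))
    age = (now - ts).total_seconds()
    return 0 <= age <= max_age_s
```

A caller could use `is_fresh` to decide whether a cached last-known price is still acceptable to serve, passing a timezone-aware `now` such as `datetime.now(timezone.utc)`.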

8) Testing plan

  • Unit tests: URL/ASIN parsing, currency normalization, policy decisions.

  • Sandbox tests: PA-API quotas and backoff; public-page parsing on top-100 templates.

  • Session tests:

    • First-run MFA flow (human in the loop).
    • Device/region stickiness across days.
    • Throttle behavior (ensure we never spike).
  • Chaos: force CAPTCHA/MFA and confirm graceful abort + clear reason codes.


9) Quick decision matrix of my approach(es)

| Requirement | Path | Why |
|---|---|---|
| Legal + simple | PA-API | Official, stable, fastest |
| No credentials | Public page | Legal, no suspicious login risk |
| Account-specific price | Session (last resort) | Doable with device/location stickiness + human challenges |