Goal: Given (a) Amazon credentials and (b) a product URL, return the real-time price — while staying fully legal and avoiding “suspicious login” alerts.
Below is a detailed design. I’ll first prefer no-login paths (legal + easy) and then, only if the business insists on account context, I’ll show a session-safe, device-sticky approach that minimizes alerts.
- We may or may not have PA-API (Product Advertising API) access. If yes, we’ll use it.
- If PA-API is not an option, we attempt public, non-logged-in product pages (no credentials).
- Only if someone absolutely requires user-specific price/context (e.g., Prime-only pricing) do we support account sessions — but with strict guardrails to avoid suspicious-login triggers.
- If account access is required, we use session handoff (cookie/token import) with user consent.
flowchart LR
U[Client / Caller] -- Product URL --> G[Gateway/API]
subgraph C[Coordinator]
R[Rules: choose path] -->|PA-API ok?| P[PA-API Fetcher]
R -->|Public page ok?| S[Public Page Fetcher]
R -->|Account context required| B[Session Browser Runner]
end
G --> R
P --> N[Normalizer]
S --> N
B --> N
N --> V[Validator & Policy]
V --> G
G --> U
subgraph Sec[Secrets & Compliance]
KMS[Key mgmt]:::sec
Vault[Session Vault]:::sec
Audit[Audit Log redacted]:::sec
end
classDef sec fill:#f6f8fa,stroke:#c9d1d9,color:#24292e;
Routing logic:
- PA-API (if allowed) - best path
- Public page (no login) - legal, no alerts
- Account session (last resort) - device-sticky, region-consistent, and throttled
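The routing priority above could be sketched as a small pure function. This is a minimal illustration, not the actual Coordinator: the `RouteContext` fields and the `choose_path` name are assumptions introduced here, and the flags would be computed elsewhere (quota state, policy, user consent).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Path(Enum):
    PA_API = "pa-api"
    PUBLIC = "public"
    SESSION = "session"


@dataclass(frozen=True)
class RouteContext:
    # All four flags are hypothetical inputs for this sketch.
    pa_api_available: bool          # PA-API access exists and is usable
    public_page_allowed: bool       # policy permits a non-logged-in fetch
    account_context_required: bool  # e.g. Prime-only pricing was requested
    user_consented_session: bool    # user handed off a session with consent


def choose_path(ctx: RouteContext) -> Optional[Path]:
    """Return the first permitted path in priority order, or None."""
    if ctx.account_context_required:
        # Account-specific prices can only come from the session path,
        # and only when the user has explicitly consented.
        return Path.SESSION if ctx.user_consented_session else None
    if ctx.pa_api_available:
        return Path.PA_API
    if ctx.public_page_allowed:
        return Path.PUBLIC
    return None
```

Keeping the decision free of I/O makes the policy easy to unit-test separately from the fetchers.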
Why this first: Official, ToS-compliant, no consumer login, lowest risk.
sequenceDiagram
participant Client
participant API Gateway
participant Router
participant PA as PA-API Fetcher
participant N as Normalizer
participant V as Validator/Policy
Client->>API Gateway: GET /price?url=<product-url>
API Gateway->>Router: Route request
Router->>PA: Build request (ASIN, locale, fields)
PA-->>Router: Offer/price payload
Router->>N: Normalize (unify fields)
N-->>V: Price object
V-->>API Gateway: OK (price, currency, freshness)
API Gateway-->>Client: 200 (JSON)
- ASIN extraction: Parse from URL; if missing, resolve via product page parsing (no login, one fetch).
- Region mapping: Derive marketplace (`amazon.com`, `.in`, `.de`, etc.) from URL.
- Data model: `price.value`, `currency`, `seller/offerId`, `buyBoxFlag`, `freshness_ts`, `source = pa-api`.
- Rate limiting: Respect PA-API quotas; apply adaptive backoff; surface `429` with `Retry-After`.
- Latency: Cache “last known price” for ≤30s with ETag/If-None-Match behavior to bound tail latency.
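ASIN extraction and marketplace mapping from the URL might look like the sketch below. The path shapes in the regex and the `_MARKETPLACES` allowlist are illustrative assumptions, not an exhaustive list; `extract_asin` and `extract_marketplace` are names introduced here.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: ASINs are 10 uppercase alphanumerics appearing after one of
# these common path prefixes. Other URL shapes would need extra patterns.
_ASIN_RE = re.compile(r"/(?:dp|gp/product|gp/aw/d)/([A-Z0-9]{10})(?=/|$)")

# Illustrative allowlist; a real deployment would list every supported locale.
_MARKETPLACES = ("amazon.com", "amazon.in", "amazon.de", "amazon.co.uk")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in the URL path, or None if absent."""
    match = _ASIN_RE.search(urlparse(url).path)
    return match.group(1) if match else None


def extract_marketplace(url: str) -> Optional[str]:
    """Map the URL host to a known marketplace, or None if unrecognized."""
    host = (urlparse(url).hostname or "").lower()
    for marketplace in _MARKETPLACES:
        # Exact host or a subdomain of it; rejects look-alike hosts such as
        # "amazon.com.example.org".
        if host == marketplace or host.endswith("." + marketplace):
            return marketplace
    return None
```

When `extract_asin` returns `None`, the design falls back to resolving the ASIN from a single non-logged-in page fetch.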
Failure modes handled:
- PA-API quota exhausted - In this case we fall back to public page (if acceptable).
- PA-API missing specific price (edge cases) - Return `reason_code = "price_unavailable_paapi"` and (optionally) try public page.
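The two failure modes above can be sketched as a fallback wrapper. The fetchers are passed in as plain callables so the sketch stays independent of any real client; `QuotaExhausted`, `PriceResult`, `fetch_with_fallback`, and the `quota_exhausted_paapi` reason code are hypothetical names introduced here (only `price_unavailable_paapi` comes from the design).

```python
from dataclasses import dataclass
from typing import Callable, Optional


class QuotaExhausted(Exception):
    """Raised by the (hypothetical) PA-API fetcher when quota is used up."""


@dataclass
class PriceResult:
    value: Optional[float]
    currency: Optional[str]
    source: Optional[str]
    reason_code: Optional[str] = None


Fetcher = Callable[[str], Optional[PriceResult]]


def fetch_with_fallback(asin: str, pa_api: Fetcher, public: Fetcher,
                        allow_public: bool = True) -> PriceResult:
    """Try PA-API first; fall back to the public page when permitted."""
    try:
        result = pa_api(asin)
        if result is not None and result.value is not None:
            return result
        reason = "price_unavailable_paapi"
    except QuotaExhausted:
        reason = "quota_exhausted_paapi"  # hypothetical reason code

    if allow_public:
        result = public(asin)
        if result is not None and result.value is not None:
            return result

    # Neither path produced a price: report why the preferred path failed.
    return PriceResult(None, None, None, reason_code=reason)
```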
Why this second: Still legal; no credentials; no suspicious-login risk.
sequenceDiagram
participant Client
participant API Gateway
participant Router
participant Pub as Public Page Fetcher
participant N as Normalizer
participant V as Validator/Policy
Client->>API Gateway: GET /price?url=<product-url>
API Gateway->>Router: Route request
Router->>Pub: Fetch HTML (respect robots/legal constraints)
Pub-->>Router: Parsed price (LD-JSON, offerbox, fallback)
Router->>N: Normalize
N-->>V: Validate (currency, numeric)
V-->>API Gateway: OK
API Gateway-->>Client: price JSON
- Prefer structured data (LD-JSON) and offerbox DOM; minimal requests (1–2).
- Human headers: realistic UA, Accept-Language, HTTP/2; no aggressive crawling.
- Pacing: single fetch per request, with jitter and connection reuse.
- No CAPTCHA solving; if encountered, we stop and return `reason_code = captcha_block`.
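A structured-data-first parse could be sketched with the standard library alone. This assumes the page embeds a schema.org-style offer (`price` plus `priceCurrency`) in an `application/ld+json` script, which may not hold for every template. The `CAPTCHA_MARKER` string and the `price_not_found` reason code are assumptions made for this example.

```python
import json
from html.parser import HTMLParser

# Assumption: a substring that identifies a challenge page.
CAPTCHA_MARKER = "/errors/validateCaptcha"


class _LdJsonCollector(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld, self._buf = True, []

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self.blocks.append("".join(self._buf))
            self._in_ld = False


def _find_offer(node):
    """Depth-first search for the first dict with price and priceCurrency."""
    if isinstance(node, dict):
        if "price" in node and "priceCurrency" in node:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _find_offer(child)
        if found is not None:
            return found
    return None


def extract_price(html: str) -> dict:
    """Return value/currency from LD-JSON, or a reason code; never solves challenges."""
    if CAPTCHA_MARKER in html:
        return {"value": None, "currency": None, "reason_code": "captcha_block"}
    collector = _LdJsonCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            offer = _find_offer(json.loads(raw))
            value = float(offer["price"]) if offer else None
        except (ValueError, TypeError):
            continue  # malformed JSON or non-numeric price: try next block
        if value is not None:
            return {"value": value, "currency": offer["priceCurrency"],
                    "reason_code": None}
    return {"value": None, "currency": None, "reason_code": "price_not_found"}
```

The offerbox DOM fallback mentioned above is deliberately left out; it would slot in before the final `price_not_found` return.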
Only if someone insists on an account context. This is where “avoiding suspicious login” matters.
- Session handoff: User logs into Amazon on their device and exports session cookies (short-lived).
- We accept/import only session tokens via a secure client flow.
- Tokens are encrypted at rest per user and rotated; TTLs are short.
flowchart TB
subgraph Profile["Per-User Headless Profile (long-lived)"]
UA[User-Agent, CH-UA]
TZ[Timezone/Locale/Langs]
F[Fonts/Canvas/WebGL]
LS[Cookies/Storage]
end
subgraph Egress["Location-aware Egress"]
IP[Sticky Residential/ISP IP]
City[Region near user's city]
Hours[Run during normal local hours]
end
Session[Session Cookies TTL] --> Profile
Profile --> Runner[Headless Browser]
Egress --> Runner
Runner --> Price[Fetch product page → extract price]
I would treat the browser like a single, familiar device that belongs to the user and I would keep it that way.
On the first successful run I would freeze the profile: same user-agent and Sec-CH-UA*, same timezone (e.g., Asia/Kolkata if that’s what the account would normally use), same language prefs (en-IN,en;q=0.9), same fonts, the same Canvas/WebGL fingerprint, and I would persist cookies, localStorage, and IndexedDB to a dedicated profile directory.
Every call would reuse this exact profile so Amazon would keep seeing the same laptop, not a rotation of new machines.
Network-wise I would avoid hopping around. I would pin a sticky residential/ISP egress IP in the user’s city/region (Bengaluru, IN-KA, for example), keep the same ASN class, and hold that IP for roughly 30 days. Requests would run during the user’s normal window (08:00–22:00 local) rather than at odd hours. I would keep the transport boring and predictable: HTTP/2 with connection reuse, stable ALPN/JA3, and ordinary Accept/Accept-Encoding headers.
Navigation would be deliberately human. I would open the home page first and then move to the product page instead of deep-linking cold. Between steps I would add small randomized waits in the 300–900 ms range. I would avoid parallel tabs and keep the pace conservative, about 6–8 lookups per hour per account, unless a longer, clean history would justify more. Timeouts would be tight but fair: 1 s connect, 1 s headers, 2 s body. If there were a transient error or a quota hit, I would serve a last-known price that is ≤ 30 s old with an explicit freshness timestamp.
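The last-known-price behaviour mentioned above could be sketched as a tiny in-memory cache with a 30-second freshness bound. `LastKnownPriceCache` and `CachedPrice` are names introduced here; the injectable `clock` is only there to make the sketch testable, and a real service would likely use a shared store.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class CachedPrice:
    value: float
    currency: str
    fetched_at: float  # epoch seconds; doubles as the freshness timestamp


class LastKnownPriceCache:
    def __init__(self, max_age_s: float = 30.0,
                 clock: Callable[[], float] = time.time):
        self._max_age_s = max_age_s
        self._clock = clock
        self._entries: Dict[str, CachedPrice] = {}

    def put(self, key: str, value: float, currency: str) -> None:
        """Record a freshly fetched price under a key such as marketplace:ASIN."""
        self._entries[key] = CachedPrice(value, currency, self._clock())

    def get_fresh(self, key: str) -> Optional[CachedPrice]:
        """Return the entry only if it is within the freshness bound."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() - entry.fetched_at > self._max_age_s:
            del self._entries[key]  # too old to serve; drop it
            return None
        return entry
```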
I would never handle raw passwords. If account context were absolutely required, the user would provide a short-lived cookie jar through a secure client flow. I would envelope-encrypt that at rest, set a TTL (≤ 7 days), and restrict everything to read-only GETs. Cart, address, and payment endpoints would be hard-blocked. A first run might trigger OTP/MFA; I would surface that once in a secure prompt, the user would complete it, and I would persist the trusted-device state so it wouldn’t keep happening.
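The read-only boundary and session TTL described above could be enforced with a guard in front of every outgoing request. This is a sketch under assumptions: the blocked path prefixes are placeholders, not a verified list of endpoints, and `is_request_allowed` and `is_session_valid` are names introduced here.

```python
from urllib.parse import urlparse

# Placeholder prefixes for cart, checkout, address, and payment areas.
# These are illustrative assumptions and would need to be verified.
BLOCKED_PATH_PREFIXES = ("/cart", "/gp/cart", "/gp/buy", "/a/addresses", "/cpe/")

SESSION_TTL_S = 7 * 24 * 3600  # the design's upper bound: 7 days


def is_request_allowed(method: str, url: str) -> bool:
    """Permit only GET requests to paths outside the blocked areas."""
    if method.upper() != "GET":
        return False
    path = urlparse(url).path.lower()
    return not any(path.startswith(prefix) for prefix in BLOCKED_PATH_PREFIXES)


def is_session_valid(issued_at: float, now: float,
                     ttl_s: float = SESSION_TTL_S) -> bool:
    """A session is usable only within its TTL (timestamps in epoch seconds)."""
    return 0 <= now - issued_at <= ttl_s
```

A stricter variant would invert the check into an allowlist (home page and product pages only), which fails closed when new endpoints appear.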
I would also back off quickly if anything looked wrong. If I detected a country change, an ASN-class flip (residential ↔ cloud), or a device-fingerprint drift (UA/timezone/canvas/fonts), I would stop and ask the user to re-establish the session. The same would apply if CAPTCHA or MFA appeared twice in 24 hours—I wouldn’t brute-force challenges. When a clean fetch wouldn’t be possible, I would return a clear reason code (for example, captcha_block, session_mfa_required, new_region_detected) instead of guessing.
The same device, in the same place, with the same behavior, at sane times, under strict read-only boundaries. That is how I would fetch the current price without tripping “suspicious login” alarms.
sequenceDiagram
participant Client
participant Vault as Session Vault
participant Browser as Headless Browser (Pinned Profile)
participant Egress as Sticky Egress (User Region)
Client->>Vault: Upload short-lived cookies (encrypted)
Client->>Browser: Request price for <URL>
Browser->>Egress: Route via sticky regional IP
Browser->>Amazon: Navigate (home -> product)
Amazon-->>Browser: Page (+ any MFA/CAPTCHA)
alt MFA/CAPTCHA shown
Browser-->>Client: Present challenge (secure frame)
Client-->>Browser: User completes challenge
end
Browser-->>Client: Price JSON (source=session)
- Rotating IPs on every call: this looks like an account hijack.
- Running at odd hours: a run at 3 a.m. local time is suspicious if the user usually shops at 6 p.m.
- Hammering requests in parallel: I'd keep one active session per account.
- Bypassing CAPTCHA/MFA programmatically: challenges are always solved by the user.
stateDiagram-v2
[*] --> Evaluating
Evaluating --> Abort : new_region || new_device || MFA_loop
Evaluating --> Proceed : region_ok && device_ok && within_hours
Proceed --> Throttle : rate_exceeds_budget
Throttle --> Proceed : backoff_complete
Proceed --> [*]
Abort --> [*]
Inputs: last successful region/IP, device fingerprint, user’s local hours, per-minute request budget. Actions: throttle, require re-auth, or abort with a clear reason code.
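The state diagram could be reduced to one pure decision function, as in the sketch below. The field names and `evaluate` are assumptions introduced here. The diagram does not say what happens outside the user's local hours, so this sketch aborts with a hypothetical `outside_local_hours` code; `new_device_detected` is likewise hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    PROCEED = "proceed"
    THROTTLE = "throttle"
    ABORT = "abort"


@dataclass(frozen=True)
class RiskInputs:
    last_region: str
    current_region: str
    last_fingerprint: str
    current_fingerprint: str
    local_hour: int            # 0-23 in the user's timezone
    challenges_last_24h: int   # CAPTCHA/MFA prompts seen in the last day
    requests_in_window: int
    request_budget: int


def evaluate(inp: RiskInputs,
             active_hours: Tuple[int, int] = (8, 22)
             ) -> Tuple[Decision, Optional[str]]:
    """Return (decision, reason_code); abort checks run before throttling."""
    if inp.current_region != inp.last_region:
        return Decision.ABORT, "new_region_detected"
    if inp.current_fingerprint != inp.last_fingerprint:
        return Decision.ABORT, "new_device_detected"   # hypothetical code
    if inp.challenges_last_24h >= 2:
        return Decision.ABORT, "session_mfa_required"  # never retry challenges
    start, end = active_hours
    if not (start <= inp.local_hour < end):
        return Decision.ABORT, "outside_local_hours"   # assumption, see lead-in
    if inp.requests_in_window >= inp.request_budget:
        return Decision.THROTTLE, None
    return Decision.PROCEED, None
```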
- `GET /price?url=<product_url>`
- Query selection: Router decides PA-API, public page, or session runner.
- Response: `{ "asin": "...", "price": {"value": 1299.00, "currency": "INR"}, "source": "pa-api | public | session", "marketplace": "amazon.in", "freshness_ts": "2025-09-27T10:30:00Z", "reason_code": null }`
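The response shape could be modeled with dataclasses so the Validator can reject malformed payloads before they leave the gateway. Field names follow the JSON above; the validation rules in `__post_init__` (a known `source`, and a `reason_code` whenever `price` is missing) are assumptions for this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

ALLOWED_SOURCES = {"pa-api", "public", "session"}


@dataclass
class Price:
    value: float
    currency: str


@dataclass
class PriceResponse:
    asin: str
    price: Optional[Price]
    source: str
    marketplace: str
    freshness_ts: str  # ISO 8601 UTC, e.g. "2025-09-27T10:30:00Z"
    reason_code: Optional[str] = None

    def __post_init__(self):
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if self.price is None and self.reason_code is None:
            raise ValueError("a response without a price needs a reason_code")

    def to_json(self) -> str:
        # asdict() recurses into the nested Price dataclass.
        return json.dumps(asdict(self))
```

For example, `PriceResponse("B08N5WRWNW", None, "public", "amazon.in", "2025-09-27T10:30:00Z", "captcha_block")` serializes a failed lookup with `price` set to `null`.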
- Unit tests: URL/ASIN parsing, currency normalization, policy decisions.
- Sandbox tests: PA-API quotas and backoff; public-page parsing on top-100 templates.
- Session tests:
  - First-run MFA flow (human in the loop).
  - Device/region stickiness across days.
  - Throttle behavior (ensure we never spike).
- Chaos: force CAPTCHA/MFA and confirm graceful abort + clear reason codes.
| Requirement | Path | Why |
|---|---|---|
| Legal + simple | PA-API | Official, stable, fastest |
| No credentials | Public page | Legal, no suspicious login risk |
| Account-specific price | Session (last resort) | Doable with device/location stickiness + human challenges |