
@sam-artuso
Last active March 14, 2026 17:04
Statistically valid Amazon product scoring system using Bayesian Lower Confidence Bound

Amazon Product Rating Scorer

Score and rank Amazon search results using a statistically valid Bayesian system that accounts for average rating, review count, and star distribution.

When to use

When the user asks to score, rank, or compare Amazon products from a search results page open in their browser.

Prerequisites

  • Browser must be open with an Amazon search results page
  • Chrome DevTools MCP server must be connected

Methodology: Bayesian Lower Confidence Bound

The scoring system uses a Dirichlet-multinomial Bayesian model, in the same family of confidence-adjusted ranking techniques as Reddit's "best" sort (a Wilson lower bound) and IMDb's Top 250 (a Bayesian weighted mean).

Why this works

A single formula naturally handles three requirements:

  • Average rating: higher average = higher score
  • Number of reviews: more reviews = lower standard error = score stays close to the true mean
  • Star distribution: tight/consistent distributions have lower variance = higher score; bimodal distributions (lots of 5s and 1s) get penalized

The key insight

A Dirichlet prior of alpha=2 per star level adds 10 "phantom reviews" (2 at each star). With few real reviews, the prior dominates and pulls the score toward 3.0. With hundreds of reviews, the prior is negligible. This is why 2x 5-star reviews score much lower than 100x 5-star reviews.

Formula

For each product:
1. counts[s] = (histogram_pct[s] / 100) * total_reviews + alpha   (for s = 1..5)
2. N_adj = sum(counts)
3. mean = sum(s * counts[s] / N_adj)                               (posterior mean)
4. variance = sum((s - mean)^2 * counts[s] / N_adj)                (posterior variance)
5. SE = sqrt(variance / N_adj)                                     (standard error of the mean)
6. SCORE = mean - 1.65 * SE                                        (90% lower confidence bound)

Parameters:

  • alpha = 2 (prior pseudo-counts per star level)
  • z = 1.65 (90% one-tailed confidence z-score)
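As a standalone sketch of the formula above (bayesianScore is an illustrative name, not part of the step scripts below), including the phantom-review effect from the key insight:

```javascript
// Bayesian lower confidence bound from a star-percentage histogram.
// hist: {1: pct, ..., 5: pct} summing to ~100; n: total review count.
function bayesianScore(hist, n, alpha = 2, z = 1.65) {
  const counts = {};
  for (let s = 1; s <= 5; s++) counts[s] = (hist[s] / 100) * n + alpha;
  const nAdj = Object.values(counts).reduce((a, b) => a + b, 0);
  let mean = 0;
  for (let s = 1; s <= 5; s++) mean += s * (counts[s] / nAdj);
  let variance = 0;
  for (let s = 1; s <= 5; s++) variance += (s - mean) ** 2 * (counts[s] / nAdj);
  const se = Math.sqrt(variance / nAdj);
  return mean - z * se; // 90% lower confidence bound
}

// Two 5-star reviews: the 10 phantom reviews dominate, pulling the
// posterior mean to 40/12 ≈ 3.33 and the score down to ≈ 2.62.
const few = bayesianScore({ 1: 0, 2: 0, 3: 0, 4: 0, 5: 100 }, 2);

// One hundred 5-star reviews: the prior is negligible; score ≈ 4.71.
const many = bayesianScore({ 1: 0, 2: 0, 3: 0, 4: 0, 5: 100 }, 100);
```

This is why 2x 5-star ranks far below 100x 5-star despite both showing a 5.0 average on the search page.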

Fake Review Detection

After collecting histograms, run a suspicion analysis using 7 statistical signals. Each signal adds to a cumulative suspicion score; products are then classified as HIGH (>=40), MEDIUM (>=20), or LOW (>0) risk.

Signals

  1. Missing middle — organic dissatisfaction spreads across 2, 3, and 4 stars. If those combined are under 10%, it suggests the middle has been artificially hollowed out. Weight: (10 - middle%) * 3
  2. Unnatural 5-star concentration — 95%+ five-star is almost never organic. 85%+ with under 100 reviews is also suspicious. Weight: 40 (>=95%) or 20 (>=85%, <100 reviews)
  3. Zero 1-star with 50+ reviews — statistically improbable; even great products attract the odd unhappy buyer. Weight: 15
  4. No 2-star or 3-star at all (20+ reviews) — real unhappy customers leave a range of negative ratings, not just 1-star. Weight: 20
  5. Low distribution entropy — Shannon entropy measures how spread out ratings are. Fake reviews cluster tightly (low entropy <0.8; organic typically >1.2; max is 2.32). Weight: 25 (<0.8) or 10 (<1.0 with 30+ reviews)
  6. 5-star rate far above category average — compute the weighted average 5-star % across all products in the search. Products 15+ points above that are unusual. Weight: deviation - 10
  7. 1-star cliff — high 1-star (>=8%) but almost no 2-star (<=1%), with 30+ reviews, suggests competitor attack reviews rather than organic dissatisfaction. Weight: 15
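Signal 5's entropy check can be sketched in isolation (starEntropy is an illustrative helper name, and the example histograms are made up):

```javascript
// Shannon entropy (base 2) of a star-percentage histogram.
// 0 = all ratings in one bucket; log2(5) ≈ 2.32 = perfectly uniform.
function starEntropy(hist) {
  let h = 0;
  for (let s = 1; s <= 5; s++) {
    const p = hist[s] / 100;
    if (p > 0) h -= p * Math.log2(p);
  }
  return h;
}

// A plausible organic spread: entropy ≈ 1.62, above the ~1.2 organic range.
const organic = starEntropy({ 1: 5, 2: 4, 3: 8, 4: 23, 5: 60 });

// A suspiciously tight distribution: entropy ≈ 0.35, triggering the <0.8 flag.
const tight = starEntropy({ 1: 1, 2: 0, 3: 1, 4: 3, 5: 95 });
```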

Interpreting results

  • HIGH risk: likely manipulated — present these separately, do not recommend
  • MEDIUM risk: worth scrutinising — flag in the ranking table but don't exclude
  • LOW risk: minor anomaly — note but don't penalise
  • CLEAN: no flags triggered

Note: low entropy can also indicate a genuinely excellent or genuinely terrible product. MEDIUM flags on high-volume products (500+ reviews) are less concerning than on low-volume ones. Always recommend the user manually check review text for high-scoring products that flag MEDIUM+.

Step-by-step execution

Step 1: Extract product data from the search results page

Run this JavaScript via evaluate_script on the Amazon search results page:

() => {
  const allDivs = document.querySelectorAll('div[data-asin]');
  const seen = new Set();
  const results = [];
  allDivs.forEach((div) => {
    const asin = div.getAttribute('data-asin');
    if (!asin || asin === '' || seen.has(asin)) return;
    seen.add(asin);
    const titleEl = div.querySelector('h2');
    const ratingEl = div.querySelector('span.a-icon-alt');
    const reviewSpan = div.querySelector('span.a-size-mini.puis-normal-weight-text');
    let reviewCount = null;
    if (reviewSpan) {
      const match = reviewSpan.textContent.trim().match(/\(?([\d,]+)\)?/);
      if (match) reviewCount = parseInt(match[1].replace(/,/g, ''));
    }
    const priceEl = div.querySelector('.a-price .a-offscreen');
    if (titleEl && ratingEl) {
      const ratingText = ratingEl.textContent.trim();
      const ratingMatch = ratingText.match(/([\d.]+)/);
      results.push({
        asin,
        title: titleEl.textContent.trim().substring(0, 100),
        rating: ratingMatch ? parseFloat(ratingMatch[1]) : null,
        reviewCount,
        price: priceEl ? priceEl.textContent.trim() : null
      });
    }
  });
  return results;
}

Step 2: Fetch star distribution histograms

For each product ASIN, fetch its product page and extract the histogram. Pass ASINs to evaluate_script in batches of ~20 to avoid overwhelming the browser:

async () => {
  const asins = [/* array of ASINs from step 1 */];
  const results = {};
  for (const asin of asins) {
    try {
      const resp = await fetch(`/dp/${asin}/`);
      const html = await resp.text();
      const matches = [...html.matchAll(/aria-label="(\d+) percent of reviews have (\d) star/g)];
      if (matches.length >= 5) {
        const hist = {};
        matches.forEach(m => { hist[parseInt(m[2])] = parseInt(m[1]); });
        results[asin] = hist;
      }
    } catch (e) { /* fetch or parse failed: skip this ASIN and continue */ }
  }
  return results;
}

Step 3: Detect suspicious review patterns

Run after Step 2. Computes the category-average distribution as a baseline, then checks each product against the 7 signals. Pass the same merged product array.

() => {
  const products = [/* merged data from steps 1 and 2: {asin, title, price, rating, reviews, hist} */];

  // Compute category-average 5-star rate as baseline
  let totalReviews = 0;
  const avgHist = {1:0,2:0,3:0,4:0,5:0};
  products.forEach(p => {
    totalReviews += p.reviews;
    for (let s = 1; s <= 5; s++) avgHist[s] += p.hist[s] * p.reviews;
  });
  for (let s = 1; s <= 5; s++) avgHist[s] /= totalReviews;
  const avgTotal = Object.values(avgHist).reduce((a,b) => a+b, 0);
  for (let s = 1; s <= 5; s++) avgHist[s] = Math.round(avgHist[s] / avgTotal * 100);

  return products.map(p => {
    const signals = [];
    let sus = 0;

    const middle = p.hist[2] + p.hist[3] + p.hist[4];
    if (middle < 10) { signals.push(`Missing middle: ${middle}% in 2-4 stars`); sus += (10 - middle) * 3; }

    if (p.hist[5] >= 95) { signals.push(`${p.hist[5]}% five-star`); sus += 40; }
    else if (p.hist[5] >= 85 && p.reviews < 100) { signals.push(`${p.hist[5]}% five-star with ${p.reviews} reviews`); sus += 20; }

    if (p.hist[1] === 0 && p.reviews >= 50) { signals.push(`Zero 1-star across ${p.reviews} reviews`); sus += 15; }
    if (p.hist[2] === 0 && p.hist[3] === 0 && p.reviews >= 20) { signals.push(`No 2 or 3-star reviews`); sus += 20; }

    let entropy = 0;
    for (let s = 1; s <= 5; s++) { const x = p.hist[s]/100; if (x > 0) entropy -= x * Math.log2(x); }
    if (entropy < 0.8) { signals.push(`Very low entropy: ${entropy.toFixed(2)}`); sus += 25; }
    else if (entropy < 1.0 && p.reviews >= 30) { signals.push(`Low entropy: ${entropy.toFixed(2)}`); sus += 10; }

    const dev = p.hist[5] - avgHist[5];
    if (dev > 15) { signals.push(`5-star rate ${dev}pp above category avg`); sus += dev - 10; }

    if (p.hist[1] >= 8 && p.hist[2] <= 1 && p.reviews >= 30) { signals.push(`1-star cliff: ${p.hist[1]}% vs ${p.hist[2]}% two-star`); sus += 15; }

    return {
      asin: p.asin, title: p.title, reviews: p.reviews,
      entropy: parseFloat(entropy.toFixed(2)),
      suspicionScore: sus, signals,
      risk: sus >= 40 ? 'HIGH' : sus >= 20 ? 'MEDIUM' : sus > 0 ? 'LOW' : 'CLEAN'
    };
  }).filter(p => p.suspicionScore > 0).sort((a, b) => b.suspicionScore - a.suspicionScore);
}

Step 4: Compute scores and rank

() => {
  const products = [/* merged data from steps 1 and 2: {asin, title, price, rating, reviews, hist} */];
  const ALPHA = 2;
  const Z = 1.65;

  const scored = products.map(p => {
    const n = p.reviews || 1;
    const counts = {};
    for (let s = 1; s <= 5; s++) {
      counts[s] = (p.hist[s] / 100) * n + ALPHA;
    }
    const totalAdj = Object.values(counts).reduce((a, b) => a + b, 0);
    let mean = 0;
    for (let s = 1; s <= 5; s++) mean += s * (counts[s] / totalAdj);
    let variance = 0;
    for (let s = 1; s <= 5; s++) variance += (s - mean) ** 2 * (counts[s] / totalAdj);
    const se = Math.sqrt(variance / totalAdj);
    const score = mean - Z * se;
    const satisfaction = ((counts[4] + counts[5]) / totalAdj) * 100;
    return {
      title: p.title, price: p.price, reviews: n,
      avgRating: p.rating,
      posteriorMean: Math.round(mean * 100) / 100,
      score: Math.round(score * 1000) / 1000,
      satisfaction: Math.round(satisfaction),
      defectRate: Math.round((counts[1] / totalAdj) * 100)
    };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored;
}

Step 5: Get product links

() => {
  const asins = [/* top ASIN list */];
  const results = {};
  document.querySelectorAll('div[data-asin]').forEach(card => {
    const asin = card.getAttribute('data-asin');
    if (asins.includes(asin) && !results[asin]) {
      for (const a of card.querySelectorAll('a')) {
        const href = a.getAttribute('href') || '';
        if (href.includes('/dp/')) { results[asin] = href; break; }
      }
    }
  });
  return results;
}

Construct full URLs: https://www.amazon.co.uk + the returned href path (strip query params for cleanliness).
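A minimal sketch of that URL construction (productUrl and the ASIN are illustrative):

```javascript
// Build a clean product URL from an href returned by Step 5.
// Stripping the query string keeps the /dp/ASIN path, which is all Amazon needs.
function productUrl(href, base = 'https://www.amazon.co.uk') {
  return base + href.split('?')[0];
}

// e.g. productUrl('/dp/B0TEST1234/?keywords=widget')
//   → 'https://www.amazon.co.uk/dp/B0TEST1234/'
```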

Output format

Present results as a ranked table with columns: Rank, Score, Adjusted Avg, Reviews, Satisfaction%, Risk, Price, Product Name.

Always include:

  • Top N overall (by score) — exclude HIGH risk products from recommendations
  • Top N value picks (high score + low price)
  • Flagged products — separate table listing HIGH and MEDIUM risk products with their signals
  • Brief explanation of why high-rated low-review products rank lower
  • Note if any top-ranked products have MEDIUM risk flags, and recommend the user check the review text manually

Notes

  • The aria-label pattern for histogram extraction works on .co.uk and .com — the format is "X percent of reviews have Y star"
  • Products with < 5 matched histogram entries may have a different page layout; skip or flag them
  • Deduplicate by ASIN before scoring (Amazon shows the same product in multiple slots)
  • The review count on the search page sometimes differs from the product page; the search page count is sufficient for scoring
  • This approach works for any Amazon product category