
A/B Test Hypothesis Generator

ab testing, growth, experimentation

A/B Test Hypothesis Generator: 10 Tests Worth Running

A backlog of "change the button color" experiments is how growth teams stall. This AI tool starts from the real drop-off in your funnel and generates 10 hypotheses across the levers that matter — copy, hierarchy, social proof, pricing, form design, CTA — each structured for clean measurement.

Full Prompt
Generate a ranked list of 10 A/B test hypotheses for a specific page or flow, each structured so a growth team can prioritize, run, and learn from it cleanly.

HYPOTHESIS METHODOLOGY (follow in order):

1. Diagnose the Drop-Off
   Goal: Test where the friction actually is.
   - Restate the page or flow and the conversion goal.
   - From the data provided, identify the steepest drop-off or biggest friction point.
   - Note any qualitative signal (heatmaps, session recordings, support tickets).

2. Generate Across Levers
   Cover at least 5 of these levers:
   - Headline / hero copy
   - Hierarchy and information architecture
   - Social proof (placement, type, volume)
   - Pricing presentation and anchoring
   - Form design (fields, steps, defaults)
   - Visual proof (screenshots, video, demo)
   - CTA copy and placement
   - Loading and perceived performance

3. Structure Each Hypothesis
   Use the format:
   - We believe [change] will cause [metric] to [direction] because [reason grounded in user behavior].
   - We'll know it worked if [primary metric] moves by [magnitude] over [duration / sample size].

4. Rank by Impact and Effort
   For each:
   - Expected impact (Low / Med / High)
   - Effort to build (Low / Med / High)
   - Confidence based on existing evidence (Low / Med / High)
   - Pick the top 3 by ICE score (Impact, Confidence, Ease, where Ease is the inverse of Effort).

OUTPUT CONSTRAINTS:
- Return exactly 10 hypotheses.
- Every hypothesis ties to a specific behavior or data point — no random "change the button to red."
- Highlight the top 3 in a separate block with the recommended test order.
- Flag any test that needs more traffic than the page can realistically deliver.

---

MY INFO:

Page or Flow (required): [URL or description]

Primary Conversion Metric (required):

Current Conversion Rate and Volume (required):

What You've Already Tested (optional):

Qualitative Signal (optional): [heatmaps, session recordings, complaints]

What You Get

  • 10 hypotheses across at least 5 different levers
  • A standard format — "We believe X will cause Y because Z; we'll know it worked if..."
  • An ICE score (Impact, Confidence, Ease) for each, with Ease taken as the inverse of build effort
  • The top 3 in recommended run order with rationale
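
To sanity-check the ranking yourself, here is a minimal sketch of one way to turn the Low / Med / High ratings into a score. The 1-to-3 mapping, the multiplication, and the backlog entries are illustrative assumptions, not something the prompt prescribes; the tool may weigh things differently.

```python
from dataclasses import dataclass

# Assumed numeric mapping for the prompt's Low / Med / High labels.
LEVEL = {"Low": 1, "Med": 2, "High": 3}


@dataclass
class Hypothesis:
    name: str
    impact: str
    confidence: str
    effort: str


def ice_score(h: Hypothesis) -> int:
    # Effort is inverted into ease, so cheaper tests score higher.
    ease = 4 - LEVEL[h.effort]
    return LEVEL[h.impact] * LEVEL[h.confidence] * ease


def top_three(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    # Highest score first; keep the first three.
    return sorted(hypotheses, key=ice_score, reverse=True)[:3]


# Made-up backlog, for illustration only.
backlog = [
    Hypothesis("Rebuild pricing page", "High", "Low", "High"),
    Hypothesis("Shorten signup form", "High", "High", "Low"),
    Hypothesis("Add testimonials near CTA", "Med", "Med", "Low"),
    Hypothesis("Rewrite hero headline", "High", "Med", "Low"),
]

for h in top_three(backlog):
    print(ice_score(h), h.name)
```

Multiplying rather than adding means a single Low rating drags the whole score down, which keeps high-impact but low-confidence ideas out of the top 3 until there is evidence behind them.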

Why It Works

Every hypothesis ties to a specific behavior or data point — random ideas get rejected. The format forces a primary metric, expected magnitude, and a duration or sample size, so the test produces a real answer rather than another inconclusive result. Tests that would need more traffic than the page realistically delivers get flagged before they consume a quarter.
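
If you store hypotheses in a tracker, the same discipline can be enforced in code. The sketch below is one possible shape, assuming a hypothetical `HypothesisStatement` record whose field names mirror the bracketed slots in the prompt's format; it refuses to build a hypothesis with any slot left empty.

```python
from dataclasses import dataclass


@dataclass
class HypothesisStatement:
    # Field names are hypothetical; they mirror the slots in the prompt's format.
    change: str
    metric: str
    direction: str
    reason: str
    magnitude: str
    duration: str

    def __post_init__(self) -> None:
        # Reject any hypothesis with a blank slot.
        missing = [name for name, value in vars(self).items() if not value.strip()]
        if missing:
            raise ValueError(f"Incomplete hypothesis, missing: {', '.join(missing)}")

    def render(self) -> str:
        return (
            f"We believe {self.change} will cause {self.metric} to {self.direction} "
            f"because {self.reason}. We'll know it worked if {self.metric} moves by "
            f"{self.magnitude} over {self.duration}."
        )


# Example values are invented for illustration.
example = HypothesisStatement(
    change="cutting the signup form from 7 fields to 4",
    metric="signup completion rate",
    direction="increase",
    reason="session recordings show abandonment at the phone number field",
    magnitude="10% relative",
    duration="3 weeks",
)
print(example.render())
```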

Best Practices

  1. Show the data: Your current conversion rate and traffic volume determine which tests are even viable.
  2. Bring qualitative signal: A heatmap or session recording beats imagination.
  3. Don't test five things at once: One change per arm; one primary metric per test.
  4. Power-check it: Underpowered tests aren't tests — they're guesses with extra steps.
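
A rough power check takes only a few lines. The sketch below uses the standard normal-approximation formula for comparing two conversion rates; the defaults (5% two-sided significance, 80% power) and the function names are assumptions for illustration, and a dedicated calculator or your testing platform may give slightly different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per arm to detect the given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def weeks_needed(baseline: float, relative_lift: float,
                 weekly_visitors: int, arms: int = 2) -> float:
    """Weeks of traffic required if all visitors are split evenly across arms."""
    total = arms * sample_size_per_arm(baseline, relative_lift)
    return total / weekly_visitors


# Illustrative inputs: 4% baseline, 10% relative lift, 5,000 visitors per week.
n = sample_size_per_arm(0.04, 0.10)
print(n, round(weeks_needed(0.04, 0.10, 5000), 1))
```

With these made-up inputs the requirement lands on the order of 40,000 visitors per arm, which is several months of traffic at 5,000 visitors a week. That is exactly the kind of test worth flagging before it is scheduled.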

Run the tests that move the metric and skip the ones that move the meeting.