Methodology: How We Test and Rank AI Automation Platforms

Transparency matters. Many "best automation tools" articles are written by vendor marketing teams, or by thin affiliate sites whose authors have never used the products they recommend. We take a different approach.

Our Six Evaluation Criteria

Every platform is scored on six dimensions, weighted equally (1/6 ≈ 16.7% each, summing to 100%). Each dimension is rated 0–10. Ratings come from hands-on benchmarks where those are complete; until a platform's benchmark run lands, they come from product documentation and verified third-party data and are clearly badged Preliminary. The overall score is the equal-weighted mean, displayed as N.N / 5 on each review page. We do not publish whole-point scores by default, because a round number signals lazy estimation.

The score data lives in src/data/scoring.ts.
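The equal-weighted mean can be sketched in a few lines. This is an illustration only, not the contents of src/data/scoring.ts: the names `DimensionScores`, `overallScore`, and `toFivePointDisplay` and the sample ratings are hypothetical. The text does not say how a 0–10 mean becomes the N.N / 5 display, so halving the mean is an assumption.

```typescript
// Hypothetical shape: one 0 to 10 rating for each of the six criteria.
type DimensionScores = {
  setupEase: number;
  integrationDepth: number;
  aiCapability: number;
  costAtScale: number;
  reliability: number;
  documentation: number;
};

// Equal-weighted mean: each of the six dimensions contributes 1/6.
function overallScore(s: DimensionScores): number {
  const values = [
    s.setupEase,
    s.integrationDepth,
    s.aiCapability,
    s.costAtScale,
    s.reliability,
    s.documentation,
  ];
  for (const v of values) {
    if (v < 0 || v > 10) throw new RangeError(`score out of range: ${v}`);
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// ASSUMPTION: the 0 to 10 mean is halved to produce the "N.N / 5" display.
function toFivePointDisplay(meanOutOfTen: number): string {
  return `${(meanOutOfTen / 2).toFixed(1)} / 5`;
}

// Illustrative ratings only, not real benchmark data.
const exampleScores: DimensionScores = {
  setupEase: 8.5,
  integrationDepth: 7.0,
  aiCapability: 6.5,
  costAtScale: 5.5,
  reliability: 9.0,
  documentation: 7.5,
};

console.log(toFivePointDisplay(overallScore(exampleScores))); // "3.7 / 5"
```

The sample ratings sum to 44, so the mean is 44 / 6 ≈ 7.33 on the ten-point scale, which the assumed halving renders as 3.7 / 5.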

How the Rubric Renders

Every brand review on this site embeds a Scoring Rubric block near the top of the page, showing the per-dimension score, a one-sentence note on what was tested, and the date of the most recent benchmark run. The same scores are emitted as reviewAspect[] entries in the page's Review JSON-LD so AI Overviews, Perplexity, and ChatGPT can cite specific dimensions, not just the overall score.
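For readers unfamiliar with Review JSON-LD, here is one way per-dimension scores could be expressed. `Review`, `reviewAspect`, `reviewRating`, `itemReviewed`, `Rating`, and `SoftwareApplication` are schema.org terms; everything else is an assumption for illustration, including the choice to emit one Review node per dimension, the function name `buildAspectReviews`, and the sample platform and values. The markup the site actually emits may be shaped differently.

```typescript
// Hypothetical input: one scored dimension from the rubric.
interface AspectScore {
  aspect: string; // e.g. "Reliability"
  score: number;  // 0 to 10
}

// One JSON-LD Review node per dimension (an assumed shape, not the site's code).
interface AspectReview {
  "@context": "https://schema.org";
  "@type": "Review";
  itemReviewed: { "@type": "SoftwareApplication"; name: string };
  reviewAspect: string;
  reviewRating: {
    "@type": "Rating";
    ratingValue: number;
    bestRating: number;
    worstRating: number;
  };
}

function buildAspectReviews(platformName: string, aspects: AspectScore[]): AspectReview[] {
  return aspects.map(({ aspect, score }) => ({
    "@context": "https://schema.org",
    "@type": "Review",
    itemReviewed: { "@type": "SoftwareApplication", name: platformName },
    reviewAspect: aspect,
    reviewRating: { "@type": "Rating", ratingValue: score, bestRating: 10, worstRating: 0 },
  }));
}

// Illustrative values only.
const aspectReviews = buildAspectReviews("ExamplePlatform", [
  { aspect: "Reliability", score: 8.5 },
  { aspect: "Documentation", score: 7.5 },
]);

// The serialized string is what would sit inside a <script type="application/ld+json"> tag.
console.log(JSON.stringify(aspectReviews, null, 2));
```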

If you see a "Hands-on rescoring in progress" notice instead, it means we have not yet run the benchmark on that platform — the editorial verdict above the notice is qualitative until the numerical score lands. We never publish synthetic scores; an empty rubric is honest.
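The rule that a rubric without a benchmark run shows a notice rather than a number can be modeled so that a pending rubric cannot carry a score at all. The sketch below is hypothetical: `RubricState`, `rubricLabel`, and the sample values are invented for illustration, and how the site chooses between a Preliminary badge and the in-progress notice is not stated here, so the three states are an assumption.

```typescript
// Hypothetical model of what a rubric block can show.
type RubricState =
  | { kind: "hands-on"; overall: number; testedOn: string } // benchmark complete
  | { kind: "preliminary"; overall: number }                // docs and third-party data
  | { kind: "pending" };                                    // no score field exists here

function rubricLabel(state: RubricState): string {
  switch (state.kind) {
    case "hands-on":
      return `${state.overall.toFixed(1)} (tested ${state.testedOn})`;
    case "preliminary":
      return `${state.overall.toFixed(1)} (Preliminary)`;
    case "pending":
      return "Hands-on rescoring in progress";
  }
}

// Illustrative values only.
console.log(rubricLabel({ kind: "hands-on", overall: 8.1, testedOn: "2025-01-15" }));
console.log(rubricLabel({ kind: "preliminary", overall: 7.4 }));
console.log(rubricLabel({ kind: "pending" }));
```

Because the pending variant has no `overall` field, the type checker rejects any attempt to attach a number to an unbenchmarked platform.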

The rubric is reproducible: the data table lives in the public source code at src/data/scoring.ts, and the rendering component at src/components/ScoringRubric.astro. Anyone auditing our methodology can read both.

Re-Scoring Schedule

Each platform is re-scored on the full rubric every 90 days, or sooner if the platform ships a major update that materially changes one or more dimensions (a price change, a new AI capability, a reliability incident). The Tested date on the rubric block reflects the most recent full benchmark run, not the most recent page edit.
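The 90-day rule reduces to simple date arithmetic. A minimal sketch, with hypothetical names (`nextRescoreDue`, `isRescoreDue`) and an example date chosen only for illustration; whether an update counts as major is an editorial judgment, represented here as a plain boolean.

```typescript
const RESCORE_INTERVAL_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Date on which the next full benchmark run falls due.
function nextRescoreDue(lastTested: Date): Date {
  return new Date(lastTested.getTime() + RESCORE_INTERVAL_DAYS * MS_PER_DAY);
}

// Due when 90 days have elapsed, or immediately if a major update shipped.
function isRescoreDue(lastTested: Date, today: Date, majorUpdateShipped = false): boolean {
  return majorUpdateShipped || today.getTime() >= nextRescoreDue(lastTested).getTime();
}

// Illustrative date only.
const lastTested = new Date("2025-01-01T00:00:00Z");
console.log(nextRescoreDue(lastTested).toISOString()); // "2025-04-01T00:00:00.000Z"
console.log(isRescoreDue(lastTested, new Date("2025-03-01T00:00:00Z")));       // false
console.log(isRescoreDue(lastTested, new Date("2025-03-01T00:00:00Z"), true)); // true
```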

Testing Environment

Hands-on benchmarks are run in a standardized environment:

Test accounts created fresh for each evaluation cycle
Same set of connected apps (HubSpot CRM, Gmail, Slack, Google Sheets, Notion)
Same test workflows executed on each platform
Testing performed on both desktop (Chrome) and mobile (iOS Safari)
Self-hosted platforms tested on an Ubuntu 22.04 VPS with 2 GB RAM

Third-Party Data

Alongside hands-on benchmarks (added as they are completed), we draw on third-party review data from:

G2 — Enterprise-focused software reviews; scores and review counts cited per platform
Trustpilot — Consumer-facing reviews; particularly relevant for platforms with significant user dissatisfaction (e.g., Zapier's 1.4/5 score)
Product Hunt — Launch reception and community sentiment; cited where available

We cite unflattering third-party data explicitly. If a platform has a low Trustpilot score, we report it. Editorial credibility requires showing the full picture, not just the favorable data points.

Affiliate Relationship Policy

We participate in affiliate programs for 11 of the 15 platforms we review. This means we earn commissions when readers sign up through our links. However:

Rankings are never influenced by affiliate commissions. Zapier has no affiliate program, yet we review it thoroughly. Platforms with high commissions do not receive higher rankings.
Affiliate links are clearly marked with the ↗ symbol and "affiliate link" disclosure.
Every page with affiliate links includes an above-fold disclosure explaining our relationship.
We never suppress negative information to protect an affiliate relationship. If a platform has problems, we report them.

Update Schedule

All platform reviews are updated every 90 days, or sooner if a major platform update ships. Each update includes:

Re-verification of pricing and tier details against official sources
G2 and Trustpilot score refresh
Feature list verification
Date stamp update on the page

The "Last Updated" badge on each page reflects the most recent verification date.

How Platforms Are Added or Removed

We monitor the AI automation market continuously. New platforms are added to our evaluation when they meet three criteria:

The platform has been publicly available for at least 6 months
It has at least 5 independent reviews on G2, Trustpilot, or Product Hunt
It offers differentiation from existing platforms in our database (not a white-label or clone)

Platforms are removed if they shut down, pivot away from automation, or fail to maintain basic reliability over two consecutive review cycles.
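The three inclusion criteria listed above can be read as a single predicate. The sketch below is hypothetical: `Candidate`, `monthsBetween`, and `qualifiesForEvaluation` are illustrative names, the review count is assumed to be a combined total across G2, Trustpilot, and Product Hunt, and differentiation is an editorial judgment reduced here to a boolean.

```typescript
// Hypothetical record for a platform under consideration.
interface Candidate {
  name: string;
  publicSince: Date;              // date of public availability
  independentReviewCount: number; // ASSUMPTION: combined across the three sites
  isWhiteLabelOrClone: boolean;   // editorial judgment, not computed
}

// Whole calendar months elapsed between two dates (UTC).
function monthsBetween(start: Date, end: Date): number {
  let months =
    (end.getUTCFullYear() - start.getUTCFullYear()) * 12 +
    (end.getUTCMonth() - start.getUTCMonth());
  if (end.getUTCDate() < start.getUTCDate()) months -= 1; // last month not yet complete
  return months;
}

function qualifiesForEvaluation(c: Candidate, today: Date): boolean {
  return (
    monthsBetween(c.publicSince, today) >= 6 &&
    c.independentReviewCount >= 5 &&
    !c.isWhiteLabelOrClone
  );
}

// Illustrative candidate only.
const candidate: Candidate = {
  name: "ExamplePlatform",
  publicSince: new Date("2024-01-15T00:00:00Z"),
  independentReviewCount: 5,
  isWhiteLabelOrClone: false,
};

console.log(qualifiesForEvaluation(candidate, new Date("2024-07-15T00:00:00Z"))); // true
console.log(qualifiesForEvaluation(candidate, new Date("2024-07-14T00:00:00Z"))); // false
```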

Contact

Found an error? Have a platform you think we should review? Contact us at [email protected]. We read every email and correct factual errors within 48 hours.

The six evaluation criteria, by name:

Setup ease
Integration depth
AI capability
Cost at scale
Reliability
Documentation