AI Agents Guide

How We Test and Rank AI Automation Platforms

Transparency matters. Most "best automation tools" articles are written by vendor marketing teams or thin affiliate sites that have never used the products they recommend. We take a different approach.

Our Six Evaluation Criteria

Every platform is scored on six dimensions, weighted equally (≈16.67% each, summing to 100%). Each dimension is rated 0–10 from hands-on testing. The overall score is the weighted mean of the six ratings, converted to a five-point scale and displayed as N.N / 5 on each review page. We report scores to one decimal place and do not round to whole points by default, because whole-point scores signal lazy estimation.
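
As a rough illustration of the arithmetic, the sketch below averages six 0–10 ratings with equal weights and formats the result as a one-decimal score out of 5. The type and function names are hypothetical and are not taken from src/data/scoring.ts. Halving the 0–10 mean to reach the five-point scale is an assumption about how the conversion works.

```typescript
// Hypothetical names for illustration; not the site's actual scoring module.
type Dimension =
  | "setupEase"
  | "integrationDepth"
  | "aiCapability"
  | "costAtScale"
  | "reliability"
  | "documentation";

const DIMENSIONS: Dimension[] = [
  "setupEase",
  "integrationDepth",
  "aiCapability",
  "costAtScale",
  "reliability",
  "documentation",
];

// Each dimension carries a 0-10 rating from hands-on testing.
type DimensionScores = Record<Dimension, number>;

function overallScoreOutOfFive(scores: DimensionScores): string {
  let total = 0;
  for (const dimension of DIMENSIONS) {
    const rating = scores[dimension];
    // Reject NaN and anything outside the 0-10 range.
    if (!(rating >= 0 && rating <= 10)) {
      throw new RangeError(dimension + " must be between 0 and 10");
    }
    total += rating;
  }
  // With equal weights, the weighted mean is a plain average.
  const meanOutOfTen = total / DIMENSIONS.length;
  // Assumption: the 0-10 mean is halved to fit the N.N / 5 display.
  return (meanOutOfTen / 2).toFixed(1);
}
```

For ratings of 8, 7, 9, 6, 8, and 8, the mean is about 7.67 out of 10, so the function returns "3.8".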

  1. Setup ease — How long does it take to go from sign-up to a working automation? We measure time-to-first-automation for a standard workflow: "When a new row appears in Google Sheets, create a contact in HubSpot and send a Slack notification." Platforms that require documentation reading or community forum searches lose points.
  2. Integration depth — We don't just count integrations. We test how deeply each platform connects with HubSpot, Gmail, Slack, and Google Sheets — covering triggers, actions, search operations, and data mapping quality. A platform with 500 deep integrations scores better than one with 5,000 shallow ones.
  3. AI capability — Does the platform support AI as a core feature or as an afterthought? We test AI classification, text generation, and decision-making within workflows. Platforms with agent orchestration, RAG, and memory score highest.
  4. Cost at scale — We model costs at 1,000, 10,000, and 100,000 monthly executions. Per-task pricing, credit-based pricing, and self-hosted options are compared on equivalent workloads. Hidden costs (overages, premium integrations, per-user fees) are factored in.
  5. Reliability — Do automations run consistently? How does the platform handle API failures, rate limits, and data format changes? We deliberately introduce errors to test retry logic and alerting.
  6. Documentation — Is the documentation accurate, up-to-date, and useful for solving real problems? We use documentation as our primary resource during testing and note every instance where we need to resort to community forums or support.
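
The cost-at-scale comparison in criterion 4 can be sketched as a small model that prices the same three monthly volumes under each pricing style. Everything here is an assumption made for illustration: the type names, the shape of each pricing model, and the treatment of self-hosting as a flat server fee. No vendor's real prices appear in it.

```typescript
// Hypothetical pricing shapes; real plans are usually tiered and more complex.
type PricingModel =
  | { kind: "perTask"; baseMonthly: number; includedTasks: number; overagePerTask: number }
  | { kind: "credits"; creditsPerExecution: number; pricePerCredit: number }
  | { kind: "selfHosted"; serverMonthly: number };

// The three monthly execution volumes named in the methodology.
const MONTHLY_VOLUMES = [1000, 10000, 100000];

// hiddenMonthly stands in for extras such as premium integrations or per-user fees.
function monthlyCost(model: PricingModel, executions: number, hiddenMonthly: number = 0): number {
  switch (model.kind) {
    case "perTask": {
      const overageTasks = Math.max(0, executions - model.includedTasks);
      return model.baseMonthly + overageTasks * model.overagePerTask + hiddenMonthly;
    }
    case "credits":
      return executions * model.creditsPerExecution * model.pricePerCredit + hiddenMonthly;
    case "selfHosted":
      // Simplification: a fixed server bill regardless of volume.
      return model.serverMonthly + hiddenMonthly;
  }
}

// One cost per volume, in the same order as MONTHLY_VOLUMES.
function costTable(model: PricingModel, hiddenMonthly: number = 0): number[] {
  return MONTHLY_VOLUMES.map((volume) => monthlyCost(model, volume, hiddenMonthly));
}
```

Running costTable on each platform's model gives one row per platform over identical workloads, which is what makes per-task, credit-based, and self-hosted options comparable.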

How the Rubric Renders

Every brand review on this site embeds a Scoring Rubric block near the top of the page, showing the per-dimension score, a one-sentence note on what was tested, and the date of the most recent benchmark run. The same scores are emitted as reviewAspect[] entries in the page's Review JSON-LD so AI Overviews, Perplexity, and ChatGPT can cite specific dimensions, not just the overall score.
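
To make the structured-data claim concrete, here is one possible shape for that JSON-LD, built from schema.org's Review and Rating types (Rating carries the reviewAspect property). This is a sketch, not a copy of the site's markup: the function name, the input shape, and the choice to list per-dimension ratings under reviewRating are all assumptions.

```typescript
// One possible shape for the structured data; the site's real markup may differ.
interface AspectScore {
  aspect: string; // human-readable dimension name, e.g. "Setup ease"
  score: number;  // 0-10 rating for that dimension
}

function buildReviewJsonLd(platformName: string, aspects: AspectScore[]) {
  return {
    "@context": "https://schema.org",
    "@type": "Review",
    itemReviewed: { "@type": "SoftwareApplication", name: platformName },
    // One Rating per rubric dimension, each tagged with its reviewAspect.
    reviewRating: aspects.map((a) => ({
      "@type": "Rating",
      reviewAspect: a.aspect,
      ratingValue: a.score,
      bestRating: 10,
      worstRating: 0,
    })),
  };
}
```

Serializing the result with JSON.stringify and placing it in a script tag of type application/ld+json is the usual way to embed it in a page.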

If you see a "Hands-on rescoring in progress" notice instead, it means we have not yet run the benchmark on that platform — the editorial verdict above the notice is qualitative until the numerical score lands. We never publish synthetic scores; an empty rubric is honest.

The rubric is reproducible: the data table lives in the public source code at src/data/scoring.ts, and the rendering component at src/components/ScoringRubric.astro. Anyone auditing our methodology can read both.

Re-Scoring Schedule

Each platform is re-scored on the full rubric every 90 days, or sooner if the platform ships a major update that materially changes one or more dimensions (a price change, a new AI capability, a reliability incident). The Tested date on the rubric block reflects the most recent full benchmark run, not the most recent page edit.
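
The re-scoring rule reduces to a simple check: a platform is due when 90 days have passed since the Tested date, or earlier if a major update has shipped. A minimal sketch follows, with hypothetical names and with the "major update" judgment passed in as a flag because it is an editorial decision, not something code can infer.

```typescript
const RESCORE_INTERVAL_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// lastTested: date of the most recent full benchmark run (the Tested date).
// majorUpdateSinceTest: editorial flag for a price change, new AI capability, or reliability incident.
function isRescoreDue(lastTested: Date, today: Date, majorUpdateSinceTest: boolean): boolean {
  const elapsedDays = (today.getTime() - lastTested.getTime()) / MS_PER_DAY;
  return majorUpdateSinceTest || elapsedDays >= RESCORE_INTERVAL_DAYS;
}
```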

Testing Environment

All platforms are tested in a standardized environment:

  • Test accounts created fresh for each evaluation cycle
  • Same set of connected apps (HubSpot CRM, Gmail, Slack, Google Sheets, Notion)
  • Same test workflows executed on each platform
  • Testing performed on both desktop (Chrome) and mobile (iOS Safari)
  • Self-hosted platforms tested on a standard Ubuntu 22.04 VPS with 2 GB RAM

Third-Party Data

We supplement our hands-on testing with third-party review data from:

  • G2 — Enterprise-focused software reviews; scores and review counts cited per platform
  • Trustpilot — Consumer-facing reviews; particularly relevant for platforms with significant user dissatisfaction (e.g., Zapier's 1.4/5 score)
  • Product Hunt — Launch reception and community sentiment; cited where available

We cite unflattering third-party data explicitly. If a platform has a low Trustpilot score, we report it. Editorial credibility requires showing the full picture, not just the favorable data points.

Affiliate Relationship Policy

We participate in affiliate programs for 11 of the 15 platforms we review. This means we earn commissions when readers sign up through our links. However:

  • Rankings are never influenced by affiliate commissions. Zapier has no affiliate program, yet we review it thoroughly. Platforms with high commissions do not receive higher rankings.
  • Affiliate links are clearly marked with the ↗ symbol and "affiliate link" disclosure.
  • Every page with affiliate links includes an above-fold disclosure explaining our relationship.
  • We never suppress negative information to protect an affiliate relationship. If a platform has problems, we report them.

Update Schedule

All platform reviews are updated every 90 days, or sooner if a major platform update ships. Each update includes:

  • Re-verification of pricing and tier details against official sources
  • G2 and Trustpilot score refresh
  • Feature list verification
  • Date stamp update on the page

The "Last Updated" badge on each page reflects the most recent verification date.

How Platforms Are Added or Removed

We monitor the AI automation market continuously. New platforms are added to our evaluation when they meet three criteria:

  1. The platform has been publicly available for at least 6 months
  2. It has at least 5 independent reviews on G2, Trustpilot, or Product Hunt
  3. It offers differentiation from existing platforms in our database (not a white-label or clone)
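
The three criteria above can be read as a single conjunction. The sketch below encodes them with hypothetical field names. It assumes the review count is the combined total across G2, Trustpilot, and Product Hunt, and it treats differentiation as an editorial judgment supplied as a boolean.

```typescript
// Hypothetical record describing a candidate platform.
interface CandidatePlatform {
  monthsPubliclyAvailable: number;
  independentReviewCount: number; // assumed: combined across G2, Trustpilot, Product Hunt
  isDifferentiated: boolean;      // editorial judgment: not a white-label or clone
}

const MIN_MONTHS_AVAILABLE = 6;
const MIN_INDEPENDENT_REVIEWS = 5;

// A platform is added only when all three criteria hold.
function meetsInclusionCriteria(candidate: CandidatePlatform): boolean {
  return (
    candidate.monthsPubliclyAvailable >= MIN_MONTHS_AVAILABLE &&
    candidate.independentReviewCount >= MIN_INDEPENDENT_REVIEWS &&
    candidate.isDifferentiated
  );
}
```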

Platforms are removed if they shut down, pivot away from automation, or fail to maintain basic reliability over two consecutive review cycles.

Contact

Found an error? Have a platform you think we should review? Contact us at [email protected]. We read every email and correct factual errors within 48 hours.
