Our Methodology

How We Score and Review AI Agents

Every review on AI Agent Square follows the same structured process — hands-on testing, verified pricing research, and a transparent six-dimension scoring framework. Here's exactly how we do it.

Why Methodology Matters

Scores You Can Trust, Built on a Process You Can See

Too many AI agent reviews are based on a 30-minute demo and a vendor's talking points. We built AI Agent Square because enterprise IT buyers told us they needed something different: reviews they could actually rely on when justifying a significant software investment.

Our methodology is designed to be transparent, consistent, and buyer-focused. Every agent is evaluated against the same six dimensions, using the same testing scenarios, by reviewers with real enterprise experience. No shortcuts, no vendor favouritism.

We publish our methodology so you can challenge our conclusions, identify potential blind spots, and calibrate how much weight to give our scores for your specific use case. If you think we got something wrong, contact us with evidence and we will investigate.

See Agents We've Reviewed

The Framework

Six Dimensions, One Score

Every AI agent is evaluated across six dimensions. The Overall Score is a weighted composite — not a simple average — reflecting how enterprise buyers actually prioritise these factors. A sketch of the arithmetic follows the dimension breakdown below.

Features & Capabilities (25% weight)

Does the agent actually do what it claims? We test core features against real enterprise use cases, not just the scenarios vendors showcase in demos.

We assess:

  • Breadth and depth of core capabilities
  • Performance on standardised test tasks
  • Reliability and consistency over time
  • AI model quality and output accuracy
  • Unique differentiating features

Pricing & Value (20% weight)

We go beyond the published pricing page. We verify actual enterprise pricing, model true costs at different scales, and assess whether pricing is transparent and fair.

We assess:

  • Pricing transparency and predictability
  • Total cost of ownership at 100 seats
  • Free tier or trial quality
  • Enterprise pricing vs SMB pricing delta
  • Hidden costs (overages, add-ons, APIs)

Ease of Use (20% weight)

How long does it take a new user to get value? We assess onboarding friction, UI clarity, and the complexity of getting from deployment to measurable productivity.

We assess:

  • Time-to-first-value for new users
  • Onboarding flow and documentation quality
  • UI clarity and navigational logic
  • Admin experience and user management
  • Configuration complexity for IT teams

Support Quality (15% weight)

When something breaks at 11pm before a board presentation, what happens? We test support channels, escalation paths, SLA adherence, and documentation completeness.

We assess:

  • Response time across support tiers
  • Enterprise SLA availability
  • Documentation depth and search quality
  • Community and third-party resources
  • Dedicated account management (enterprise)

Integration Depth (15% weight)

AI agents don't work in isolation. We map out every integration, test the most critical connectors, and assess API quality and security for enterprise deployments.

We assess:

  • Number and quality of native integrations
  • API availability, docs, and rate limits
  • SSO / SAML support
  • Data residency and sovereignty options
  • Compliance certifications (SOC 2, GDPR, HIPAA)

Overall Value (5% modifier)

A holistic editorial judgment: does this agent deliver meaningful value relative to its price and complexity? This modifier can shift the overall score up or down based on our broader assessment.

We consider:

  • ROI potential for typical enterprise use cases
  • Market position and competitive value
  • Momentum and product velocity
  • Likelihood of successful enterprise deployment
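
To make the weighting concrete, here is a minimal sketch of how a composite like ours can be computed. The five dimension weights mirror the percentages above; treating the 5% Overall Value modifier as a signed adjustment (rather than a sixth weighted score) is an assumption for illustration, since the exact modifier mechanics are an editorial judgment.

```python
# Minimal sketch of a weighted composite score -- illustrative only,
# not our production scoring code.

WEIGHTS = {
    "features": 0.25,      # Features & Capabilities
    "pricing": 0.20,       # Pricing & Value
    "ease_of_use": 0.20,   # Ease of Use
    "support": 0.15,       # Support Quality
    "integration": 0.15,   # Integration Depth
}

def overall_score(scores: dict[str, float], value_modifier: float) -> float:
    """Combine five 0-10 dimension scores with the Overall Value modifier.

    ASSUMPTION: the modifier is a signed -10..+10 judgment contributing
    at most +/-0.5 points (5%) to the composite.
    """
    base = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    adjusted = base + 0.05 * value_modifier
    return round(min(max(adjusted, 0.0), 10.0), 1)  # clamp to the 0-10 scale

# Example: strong features, weaker integrations, a mildly positive value call.
print(overall_score(
    {"features": 9.0, "pricing": 8.0, "ease_of_use": 8.0,
     "support": 7.0, "integration": 6.0},
    value_modifier=2.0,
))  # 0.25*9 + 0.20*8 + 0.20*8 + 0.15*7 + 0.15*6 + 0.1 = 7.5
```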

What the Scores Mean

Scores are calibrated against the full population of agents we've reviewed — not an abstract ideal. A 9.0 is genuinely exceptional. Most good enterprise agents score between 7.0 and 8.5.

9.0 – 10 (Exceptional): Best-in-class across multiple dimensions. Rare. Represents a genuinely category-defining product that enterprise buyers should seriously evaluate.
8.0 – 8.9 (Excellent): Strong performance across the board with at most minor weaknesses. Confidently recommended for enterprise evaluation.
7.0 – 7.9 (Good): Solid product with clear strengths and some notable gaps. Recommended for specific use cases; compare carefully against alternatives.
6.0 – 6.9 (Average): Functional but not differentiated. May suit specific contexts, but buyers should explore alternatives before committing.
5.0 – 5.9 (Below Average): Significant gaps in one or more key dimensions. Proceed with caution and validate against your specific requirements.
Below 5.0 (Not Recommended): Critical issues with features, pricing transparency, support, or security that make enterprise deployment risky or inadvisable.

Our Testing Process

01. Agent Selection and Scoping

We prioritise agents based on search demand, enterprise relevance, reader submissions, and category coverage gaps. Before testing begins, we define the specific use cases we'll evaluate the agent against — drawn from real procurement briefs.

02. Account Setup and Onboarding Evaluation

We sign up as a new enterprise customer — no special vendor access or pre-configured demo environments. The onboarding experience ordinary buyers get is the one we review.

03. Structured Use Case Testing

We run each agent through a standardised battery of 20+ tasks relevant to its primary category, plus cross-category scenarios where applicable. Results are logged against defined success criteria. We run each test a minimum of three times to assess consistency.
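
As a sketch of how this repeated-run logging works in principle, the snippet below scores consistency per task. The task names, the pass/fail results, and the harness itself are invented for illustration, not our real test suite.

```python
# Minimal sketch of the repeated-run consistency check. The task names and
# the pass/fail results below are hypothetical stand-ins for logged results
# judged against each task's defined success criteria.
from statistics import mean

RUNS_PER_TASK = 3  # every task is executed at least three times

def consistency(results: list[bool]) -> float:
    """Fraction of runs that met the task's success criteria."""
    return mean(results)

# Hypothetical logged results for two tasks, three runs each.
logged = {
    "summarise_contract": [True, True, True],     # consistent pass
    "triage_support_queue": [True, False, True],  # flaky: 2 of 3 runs passed
}

for task, runs in logged.items():
    print(f"{task}: {consistency(runs):.0%} consistent over {len(runs)} runs")
```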

04. Pricing and Contract Analysis

We verify pricing directly with vendor sales teams, compare against published pricing pages, and model costs for three benchmark organisations: a 10-person startup, a 100-person scale-up, and a 1,000-person enterprise. We flag any discrepancy between advertised and actual pricing.
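
To illustrate the kind of cost modelling this step involves, here is a minimal sketch. The seat counts match our three benchmark organisations; the per-seat prices and platform fees are hypothetical, not any vendor's real pricing.

```python
# Hypothetical cost model for the three benchmark organisations.
# Per-seat prices and platform fees below are invented for illustration.

def annual_cost(seats: int, per_seat_monthly: float, platform_fee: float = 0.0) -> float:
    """Annual total for a per-seat plan plus an optional annual platform fee."""
    return seats * per_seat_monthly * 12 + platform_fee

benchmarks = [
    (10,    30.0, 0.0),        # 10-person startup on a self-serve tier
    (100,   25.0, 2_000.0),    # 100-person scale-up with a platform fee
    (1_000, 20.0, 10_000.0),   # 1,000-person enterprise at a negotiated rate
]

for seats, price, fee in benchmarks:
    print(f"{seats:>5} seats: ${annual_cost(seats, price, fee):,.0f}/year")
# 10 seats: $3,600/year; 100 seats: $32,000/year; 1000 seats: $250,000/year
```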

05. Integration and Security Audit

We map every published integration, test the five most common enterprise connectors (Slack, Salesforce, Microsoft 365, Jira, and one category-specific tool), and review publicly available security documentation including SOC 2 reports, GDPR policies, and data processing agreements.

06. Customer Interviews

Where possible, we interview three or more enterprise customers who have deployed the agent at scale. We seek out customers who have experienced both the product's strengths and its pain points. We do not rely solely on vendor-supplied reference customers.

07. Scoring, Drafting, and Fact-Checking

The lead reviewer scores each of the six dimensions independently before the overall score is calculated. The review draft is then fact-checked by a second team member. We give vendors a 48-hour window to flag factual errors (not editorial conclusions) before publication.

Our Transparency Commitments

Review Dates Are Always Shown

Every review displays the date it was last updated. AI agent products change rapidly — we want you to know exactly how current our information is.

Affiliate Links Are Always Disclosed

Where we include affiliate links to agent signup pages, this is disclosed prominently on every review page. Affiliate relationships never influence our scores or editorial content.

Sponsored Content Is Labelled

Sponsored listings, sponsored reviews, and promoted placements are clearly marked with a "Sponsored" badge. They are structurally separated from organic editorial content.

Corrections Are Acknowledged Publicly

When we publish a factual correction, we note the change on the review page with a date. We do not silently edit reviews without acknowledging the change.

No Pay-to-Play Scores

Vendors cannot pay to change, improve, or remove scores. This is a hard line — no exceptions. If you're ever told otherwise by someone claiming to represent us, please contact us immediately.

Reviewer Experience Is Disclosed

Our About page lists each reviewer's professional background. You can assess whether their experience is relevant to the category they're reviewing.

Methodology FAQ

Can vendors pay to improve their scores?
No. Scores are set exclusively by our editorial team based on hands-on evaluation against our published criteria. Vendors cannot pay to change scores, pros/cons lists, or editorial conclusions. This is a hard editorial policy with no exceptions.
How often are reviews updated?
We revisit our top 15 reviewed agents quarterly and update all reviews whenever a major pricing or feature change is announced. When a product ships a significant update that affects our evaluation, we re-test the affected dimensions and update scores accordingly. Review dates are shown prominently on every review page.
What does a 10/10 score mean? Is it achievable?
Scores are calibrated against the full population of reviewed agents. A 10/10 is effectively unreachable — it would require perfect performance on all six dimensions relative to every other agent we've reviewed. Scores above 9.0 are rare and represent genuinely exceptional products. Most good enterprise agents score between 7.0 and 8.5.
Do you accept free or discounted access from vendors?
We may accept a free trial or enterprise demo account for the purpose of review — this is standard practice in technology journalism. When we do, it is disclosed in the review. Accepting access for review purposes never influences our editorial conclusions or scores. We also regularly test using accounts paid for with our own funds to validate our findings.
How do you handle it when a vendor disputes your findings?
We welcome vendor feedback on factual errors and investigate all disputes with supporting documentation. If we find we made a factual error, we correct it publicly and acknowledge the change. We do not change editorial conclusions, opinions, or scores based on vendor pressure — only on evidence.

Put It to Work

Browse Reviews Built on This Methodology

Every review on AI Agent Square follows the framework described here. Compare agents head-to-head, filter by category, or start with our most-read reviews.

Compare AI Agents
Meet the Team