[Our data]

How we measure impact

We ran a range of questions across multiple companies, with and without Agent Fuel intelligence files, to measure how the files change AI output. Here's what we found and how we got there.

Test parameters

Model	Claude Sonnet
Temperature	0
Companies tested	TBD
Questions per company	TBD
Verticals	SaaS, E-commerce, Fintech, Healthcare

What we measure

Factual Accuracy

The percentage of claims in AI output that can be verified against real-world data sources. We manually fact-check each response against public records, industry reports, and verified databases.
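As a rough illustration of the calculation, the sketch below computes factual accuracy from a list of fact-checked claims. The `Claim` record, its `verified` flag, and the sample claims are hypothetical and stand in for whatever the manual review actually produces.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    verified: bool  # outcome of the manual fact-check (hypothetical field)


def factual_accuracy(claims: list[Claim]) -> float:
    """Return the percentage of claims that were verified."""
    if not claims:
        raise ValueError("at least one claim is required")
    verified = sum(1 for claim in claims if claim.verified)
    return 100.0 * verified / len(claims)


# Made-up claims for illustration only, not taken from the test set.
sample = [
    Claim("Founded in 2015", True),
    Claim("Headquartered in Austin", True),
    Claim("Revenue grew 40% last year", False),
    Claim("Operates in 12 countries", True),
]
accuracy = factual_accuracy(sample)  # 3 of 4 claims verified
```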

Hallucination Rate

The frequency of fabricated statistics, fake citations, or invented company details in AI output. Lower is better; we report the relative reduction when Agent Fuel context is supplied.
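One way to express this metric in code is sketched below. It assumes the rate is fabricated claims divided by total claims, which this page does not specify, and the counts are invented purely to show the arithmetic.

```python
def hallucination_rate(fabricated: int, total_claims: int) -> float:
    """Fraction of claims that were fabricated (assumed definition)."""
    if total_claims <= 0:
        raise ValueError("total_claims must be positive")
    return fabricated / total_claims


def relative_reduction(baseline: float, treated: float) -> float:
    """Percentage drop from the baseline rate to the treated rate."""
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return 100.0 * (baseline - treated) / baseline


# Invented counts for illustration only.
without_context = hallucination_rate(fabricated=10, total_claims=40)
with_context = hallucination_rate(fabricated=2, total_claims=40)
reduction = relative_reduction(without_context, with_context)
```

A reduction is only meaningful when the baseline rate is above zero, which is why the helper rejects a zero baseline instead of returning a number.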

Actionability Score

A human-rated score (1-5) measuring whether the AI output contains specific, implementable recommendations vs. generic advice. Scored by domain experts blind to condition.
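A minimal sketch of how several reviewers' ratings could be combined follows. It assumes each response is first averaged across its reviewers and that those per-response means are then averaged per condition; the condition labels and scores are hypothetical.

```python
from statistics import mean


def actionability_by_condition(
    ratings: list[tuple[str, list[int]]],
) -> dict[str, float]:
    """Average 1-5 reviewer scores per response, then per condition."""
    per_condition: dict[str, list[float]] = {}
    for condition, scores in ratings:
        if not all(1 <= score <= 5 for score in scores):
            raise ValueError("scores must be between 1 and 5")
        per_condition.setdefault(condition, []).append(mean(scores))
    return {cond: mean(values) for cond, values in per_condition.items()}


# Hypothetical ratings: (condition, scores from two blind reviewers).
ratings = [
    ("baseline", [1, 2]),
    ("baseline", [1, 1]),
    ("with_context", [4, 4]),
    ("with_context", [3, 5]),
]
scores = actionability_by_condition(ratings)
```

The condition label is attached only at aggregation time here, so reviewers would never see it while scoring.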

Response Specificity

The density of concrete data points (numbers, dates, company names, source citations) per response. More specific responses indicate better contextual grounding.
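For a sense of how such a count might be automated, the sketch below uses a regular expression to count numeric tokens only. This is a crude proxy and an assumption on our part: dates written in words, company names, and source citations would need manual annotation or a more capable extractor.

```python
import re
from statistics import mean

# Matches numbers such as 12%, 4.5, 1,200 and 2023 (numeric tokens only).
NUMERIC_TOKEN = re.compile(r"\d[\d,]*(?:\.\d+)?%?")


def count_numeric_points(response: str) -> int:
    """Count numeric tokens in one response as a rough specificity proxy."""
    return len(NUMERIC_TOKEN.findall(response))


def mean_points_per_response(responses: list[str]) -> float:
    """Average number of numeric tokens across responses."""
    return mean(count_numeric_points(text) for text in responses)


# Invented responses for illustration only.
responses = [
    "Revenue rose 12% to $4.5 million in 2023.",
    "The company should focus on growth.",
]
density = mean_points_per_response(responses)
```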

Results

Metric	Without Agent Fuel	With Agent Fuel	Change
Factual Accuracy	~47%	~89%	+42 pts (+89% relative)
Hallucination Rate	High	74% fewer	-74%
Actionability Score	1.2/5	3.8/5	3.2x
Data Points/Response	~2	~8.4	4.2x
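The Change column can be reproduced from the two measured columns. The sketch below shows the arithmetic, using the rounded figures from the table as inputs; the function names are ours. Note that for Factual Accuracy the relative change (about 89%) differs from the percentage-point change (42 points).

```python
def percentage_point_change(before: float, after: float) -> float:
    """Absolute difference between two percentages, in points."""
    return after - before


def relative_change_pct(before: float, after: float) -> float:
    """Change relative to the starting value, as a percentage."""
    return 100.0 * (after - before) / before


def ratio(before: float, after: float) -> float:
    """Multiplier from the starting value to the ending value."""
    return after / before


# Inputs are the rounded values shown in the results table.
accuracy_points = percentage_point_change(47, 89)
accuracy_relative = round(relative_change_pct(47, 89))
actionability_multiplier = round(ratio(1.2, 3.8), 1)
data_points_multiplier = round(ratio(2, 8.4), 1)
```

Because the table values are approximate, the derived changes carry the same uncertainty.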

Limitations

These results are preliminary and based on a limited sample set. We're transparent about what we don't yet know:
Sample size is currently small. We plan to expand testing across more verticals and company stages.
Actionability scoring is subjective, though we use multiple blind reviewers to reduce bias.
Results may vary depending on the AI model used, prompt structure, and specific use case.
We have not yet tested with all major LLM providers. Current results are based on Claude Sonnet only.
We'll update this page as we collect more data and refine our methodology.