How we measure impact
We tested a range of questions across multiple companies to measure how Agent Fuel intelligence files improve AI output. Here's what we found and how we got there.
Test parameters
Claude Sonnet
0
TBD
TBD
SaaS, E-commerce, Fintech, Healthcare
What we measure
Factual Accuracy
The percentage of claims in AI output that can be verified against real-world data sources. We manually fact-check each response against public records, industry reports, and verified databases.
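As a rough illustration of the arithmetic behind this metric, here is a minimal sketch. It assumes each claim has already been fact-checked by a person; the `Claim` class and `factual_accuracy` function are hypothetical names for illustration, not part of Agent Fuel.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One factual claim extracted from an AI response (hypothetical structure)."""
    text: str
    verified: bool  # set by a human fact-checker, not computed automatically


def factual_accuracy(claims: list[Claim]) -> float:
    """Return the percentage of claims marked as verified.

    Returning 0.0 for a response with no claims is an assumption of this
    sketch; a real pipeline might prefer to skip such responses instead.
    """
    if not claims:
        return 0.0
    verified = sum(1 for claim in claims if claim.verified)
    return 100.0 * verified / len(claims)


# Example: three of four claims check out against a source.
sample = [
    Claim("Claim A", True),
    Claim("Claim B", True),
    Claim("Claim C", False),
    Claim("Claim D", True),
]
print(factual_accuracy(sample))
```

The example data is made up to show the calculation and does not reflect any measured result.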
Hallucination Rate
The frequency of fabricated statistics, fake citations, or invented company details in AI output. Lower is better. We report the reduction in this rate when Agent Fuel context is provided, compared with output generated without it.
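One way the reduction could be expressed is as a relative change between two conditions. The sketch below assumes the rate is counted as fabricated items per response and that reviewers have already tallied those items by hand; both function names are hypothetical.

```python
def hallucination_rate(fabricated_counts: list[int]) -> float:
    """Mean number of fabricated items per response.

    Each entry is a hand-counted tally for one response. Counting per
    response (rather than per claim or per word) is an assumption here.
    """
    if not fabricated_counts:
        raise ValueError("need at least one response")
    return sum(fabricated_counts) / len(fabricated_counts)


def relative_reduction(baseline: float, with_context: float) -> float:
    """Percentage reduction of with_context relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline rate is zero; reduction is undefined")
    return 100.0 * (baseline - with_context) / baseline


# Illustrative tallies only, not measured data.
baseline_rate = hallucination_rate([2, 0, 1, 1])  # without context
context_rate = hallucination_rate([1, 0, 0, 1])   # with context
print(relative_reduction(baseline_rate, context_rate))
```

A negative value would mean the rate went up with context, so the sign is worth keeping rather than clamping to zero.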
Actionability Score
A human-rated score (1-5) measuring whether the AI output contains specific, implementable recommendations rather than generic advice. Domain experts score each response without knowing whether it was generated with or without Agent Fuel context.
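Combining ratings from several reviewers might look like the sketch below. It assumes a simple mean across reviewers and rejects ratings outside the 1-5 scale; the function name and the choice of a mean (rather than a median or another aggregate) are assumptions for illustration.

```python
from statistics import fmean


def actionability_score(ratings: list[int]) -> float:
    """Average the 1-5 ratings given to one response by several reviewers."""
    if not ratings:
        raise ValueError("need at least one rating")
    for rating in ratings:
        # Reject anything off the 1-5 scale instead of silently averaging it.
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return fmean(ratings)


# Three reviewers rate the same response (illustrative values only).
print(actionability_score([4, 5, 3]))
```

Keeping the per-reviewer ratings alongside the average also makes it possible to check how much reviewers disagree.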
Response Specificity
The density of concrete data points (numbers, dates, company names, source citations) per response. More specific responses indicate better contextual grounding.
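A very rough proxy for this density can be computed automatically. The sketch below counts only numeric tokens (plain numbers, decimals, percentages, years) and normalizes per 100 words. Both choices are assumptions of this example: company names and source citations would need manual annotation or a separate tool, and `numeric_density` is a hypothetical name.

```python
import re

# Matches integers, decimals or thousands-separated numbers, and an
# optional trailing percent sign, e.g. "2023", "4.5", "1,200", "12%".
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*%?")


def numeric_density(response: str) -> float:
    """Numeric tokens per 100 whitespace-separated words."""
    words = response.split()
    if not words:
        return 0.0
    hits = NUMBER_PATTERN.findall(response)
    return 100.0 * len(hits) / len(words)


# Illustrative sentences, not real output.
print(numeric_density("Revenue grew 12% in 2023 to 4.5 million"))
print(numeric_density("Growth was strong"))
```

Because it ignores names and citations, this proxy understates the specificity of responses that are concrete without being numeric.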
Results
Limitations
These results are preliminary and based on a limited sample set. We're transparent about what we don't yet know:
- Sample size is currently small — we plan to expand testing across more verticals and company stages.
- Actionability scoring is subjective, though we use multiple blind reviewers to reduce bias.
- Results may vary depending on the AI model used, prompt structure, and specific use case.
- We have not yet tested with all major LLM providers — current results are based on Claude Sonnet.
We'll update this page as we collect more data and refine our methodology.