SEPTEMBER 2025: Detailed has been acquired by Ahrefs (& I've joined the company) 🎉

AI Search Response Consistency | Updated hourly

February Report: How consistent are AI responses for the exact same query?

I asked ChatGPT the same five questions 100+ times each, then recorded how consistent the answers are.

Brand Consistency: How many recommended brands are consistent across answers for the same query.
86%
Brand Match: How many responses (for the same query) contain the exact same brands.
23%
Brand & Order Match: How many responses (for the same query) contain the same brands in the exact same order.
7%
Rank Consistency: How consistently brands are ranked in similar positions across answers.
43%
Sentiment Consistency: How consistent the sentiment toward a brand stays when the same query is asked 100+ times.
88%

Brand Consistency Across Responses for the Same Query

86%
When we repeatedly ask AI the same question, we look at how many of the brands it recommends are consistent across answers. If we find 7 brands in one response that also appear in another response with 10 brands (for the same query) then the brand consistency would be 70%. This approach is then taken across all responses. Currently, recommended brands are highly consistent across responses.
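As a rough sketch of the overlap calculation described above (not the page's actual code): each pair of responses is scored as shared brands divided by the size of the larger response, then averaged over every pair. Dividing by the larger response and averaging over all pairs are assumptions, as are the function names and placeholder brands.

```python
from itertools import combinations


def pair_consistency(a, b):
    """Shared brands divided by the size of the larger response (assumed definition)."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / max(len(set_a), len(set_b))


def brand_consistency(responses):
    """Mean pairwise consistency across all responses for one query."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    return sum(pair_consistency(a, b) for a, b in pairs) / len(pairs)


# The example from the text: 7 brands that all reappear in a 10-brand response.
first = ["A", "B", "C", "D", "E", "F", "G"]
second = first + ["H", "I", "J"]
print(pair_consistency(first, second))  # 0.7
```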

Responses Where Brands Exactly Match, in Any Order

Responses Where Brands And Their Order Match Exactly

As I write this (the page will keep updating), I'm glad I tracked multiple categories, as it's clear some are far more consistent than others. 'Accounting software' is doing a lot of the work pulling the numbers up. Some categories don't have a single instance of the same brands (in the same order) showing up twice.
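The two exact-match metrics in this section could be computed along these lines. This is a sketch under one assumption: a response counts as a match when at least one other response for the same query has identical brands (ignoring order for the first metric, respecting it for the second). The function names and sample data are illustrative.

```python
from collections import Counter


def repeat_rate(keys):
    """Fraction of responses whose key is shared with at least one other response."""
    keys = list(keys)
    if not keys:
        return 0.0
    counts = Counter(keys)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(keys)


def brand_match(responses):
    """Same brands, in any order."""
    return repeat_rate(frozenset(r) for r in responses)


def brand_and_order_match(responses):
    """Same brands in the exact same order."""
    return repeat_rate(tuple(r) for r in responses)


responses = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
]
print(brand_match(responses))            # 0.75
print(brand_and_order_match(responses))  # 0.5
```

With this definition, a category where no brand list ever repeats scores 0%, which lines up with the observation above.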

Brand "Ranking" Position Consistency

How consistently do brands appear in the same position?

Columns: Brand | Appearances | Avg Pos | Most Common | Consistency | Position Distribution

Consistency = % of appearances at most common position. Dot size = frequency at that position.

While some rankings are surprisingly consistent, it's fair to say that true AI "position data" is not something to take seriously. This shouldn't be a surprise to anyone who has been tracking LLM responses for a while. I'm biased, but I still thought it was cool to have this kind of position visual.
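For anyone wanting to reproduce the position table, here is a minimal sketch that follows the stated definition (consistency = share of a brand's appearances at its most common position). It assumes each response is an ordered list of brand names with no duplicates. The function name and sample data are made up for illustration.

```python
from collections import Counter, defaultdict


def position_consistency(responses):
    """Per-brand appearances, average position, most common position, and consistency."""
    positions = defaultdict(list)
    for brands in responses:
        for rank, brand in enumerate(brands, start=1):
            positions[brand].append(rank)

    stats = {}
    for brand, ranks in positions.items():
        # Ties for most common position resolve to the position seen first.
        most_common_pos, hits = Counter(ranks).most_common(1)[0]
        stats[brand] = {
            "appearances": len(ranks),
            "avg_pos": sum(ranks) / len(ranks),
            "most_common": most_common_pos,
            "consistency": hits / len(ranks),
        }
    return stats


stats = position_consistency([["A", "B"], ["A", "B"], ["B", "A"], ["A", "C"]])
print(stats["A"])
# {'appearances': 4, 'avg_pos': 1.25, 'most_common': 1, 'consistency': 0.75}
```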

Average Number of Brands Per Response

Average Number of URLs Per Response

Sentiment Scores By Brand

Commentary next to each brand mention was monitored for its sentiment over time.

Brand | Avg Score | Consistency
Logitech G335 | 8.6 | 90%
Quickbooks | 8.5 | 89%
Zoho Books | 8.1 | 90%
Monday | 8.1 | 87%
Rogue SR-2 | 8.0 | 86%
Siegel + Gale | 7.3 | 88%
I checked the text associated with each brand recommendation to see how consistent it was over time. The text was extracted manually (which is why there's only one example per brand), but the sentiment was calculated with AI as an average of multiple checks. There's a little more insight on this in the FAQ. I'll keep adding more examples to this list, but AI is unsurprisingly positive about its recommendations, which makes consistency here more likely.

Questions I Thought You Might Have

This page is new so go easy on me, but hopefully I've covered your questions below.

What was the inspiration behind this tool?

I've been tracking and analyzing AI search responses for over a year now, and it's a big part of my focus at Ahrefs where the team tracks hundreds of millions of prompts.

That said, full credit for the inspiration to create this page goes to Rand Fishkin, who shared a detailed guide on the Sparktoro blog about how consistent responses are.

Near the end of his article, Rand said "More data is needed. More people should look into these questions." I wanted some more answers for myself, so decided to start my own live report on the topic.

Won't answers naturally change, and impact consistency anyway?

Yes! Answers will change based on model updates, changes to data sources (such as Google updates) and more.

The current data is only looking at responses over the last few days, to try and focus on individual response consistency.

That said, going forward I should either check far more results over the course of a day, or separate data based on short time periods (such as daily, or weekly).

How do sentiment scores work?

There are lots of different APIs to assign sentiment to text.

Here I kept things simple.

I took the commentary associated with each brand and ran it through OpenAI's GPT 5.2 model, providing custom instructions on how to score responses.

I checked each snippet three times and generated an average score, then repeated that across dozens of brand mentions.
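The averaging step could look something like the sketch below. The scorer is passed in as a plain function, so `score_fn` here is a hypothetical stand-in for whatever wraps the model call. The custom scoring instructions and the score scale aren't described on this page, so neither is modelled.

```python
from statistics import mean


def snippet_score(snippet, score_fn, checks=3):
    """Score one snippet several times and return the average."""
    return mean(score_fn(snippet) for _ in range(checks))


def score_mentions(mentions, score_fn, checks=3):
    """Map each brand to the averaged score of its commentary snippet."""
    return {
        brand: snippet_score(snippet, score_fn, checks)
        for brand, snippet in mentions.items()
    }


# Fake scorer for demonstration: returns pre-set values instead of calling a model.
fake_scores = iter([8, 9, 7])
result = score_mentions(
    {"Example Brand": "Great value, reliable support."},
    score_fn=lambda text: next(fake_scores),
)
print(result)  # {'Example Brand': 8}
```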

Are you using the OpenAI API or something else?

I'm not using the OpenAI API, as the results are too different to what you get with the web interface.

I hope that one day their chat models will be much closer to the "real" thing.

For each query I start a new conversation in the web interface, from the US 🇺🇸.

I check results just a few times per hour, as I don't think there's a need to go overboard running this analysis. I'll likely slow this down much further now that the page is live and we have a lot of responses already.

How are you handling brand name variations?

I'm merging them 🙃

If one response says 'HubSpot' and one response says 'HubSpot CRM', I count them both as HubSpot. I do the same for Monday, Monday CRM and Monday.com.

In the past I built a system to help me manage these for other projects. I approve all variations manually, and skip those which aren't accurate. That might sound like a lot of work, but it's easy when they're close matches.

This process might help increase the number of brand consistency matches, but I'm careful not to merge anything which is not a correct match. The Nike Fundamental weighted rope and Nike Fundamental jump rope are not the same product.

Finally, I don't count a brand when it's only mentioned in a sentence alongside other companies. It must be a separate recommendation.
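A manually approved alias list like the one described above can be as simple as a lookup table. This is a minimal sketch, not the actual system: the `APPROVED_ALIASES` name, the case-insensitive matching, and the whitespace clean-up are all assumptions. Only the HubSpot, Monday, and Nike examples come from the text.

```python
# Hand-approved variations mapped to one canonical brand name.
APPROVED_ALIASES = {
    "hubspot": "HubSpot",
    "hubspot crm": "HubSpot",
    "monday": "Monday",
    "monday crm": "Monday",
    "monday.com": "Monday",
}


def canonical_brand(name):
    """Return the approved canonical name, or the cleaned-up name if no alias is approved."""
    cleaned = " ".join(name.split())
    return APPROVED_ALIASES.get(cleaned.lower(), cleaned)


print(canonical_brand("HubSpot CRM"))  # HubSpot
print(canonical_brand("Monday.com"))   # Monday

# Unapproved names pass through untouched, so distinct products stay distinct.
print(canonical_brand("Nike Fundamental weighted rope"))
print(canonical_brand("Nike Fundamental jump rope"))
```

Because nothing is merged unless it is in the approved table, a near match that was never approved simply stays as its own brand.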

What's the plan for this page going forward?

If there's enough interest, I'm happy to keep this page running, while being conscious of not going overboard with the tracking.

This page is purposefully only looking at one query per category, and responses in one platform (ChatGPT). I'm happy to add more sources, like Perplexity and Google's AI Mode, going forward.

If you have any comments or feature requests, please send an email to [email protected]
