
How AI Engines Decide Which Brands to Recommend, and What You Can Actually Control
After running GEO campaigns across more than a dozen clients in Lithuania and across Europe, we've mapped exactly what's happening under the hood when an AI engine decides to recommend one brand and ignore another. Some of it is within your control. Some of it isn't. Here's how to focus on the right things.
What "AI recommendation" actually means
When someone asks ChatGPT, Perplexity, or Gemini for a business recommendation - "which marketing agency in Vilnius should I hire?" or "what's the best CRM for a 10-person team?" - the AI isn't browsing the web the way a person would. It's drawing on what it was trained on and, in some cases, what it retrieves from indexed sources at that moment.
Your job is to make sure that when an AI engine is building its answer, your brand is the one it has enough information about to confidently include.
The five factors below are what we've found actually move that needle.
Factor 1: Structured data is the backbone, not a nice-to-have
If there's one thing that separates brands getting consistently cited by AI engines from those that aren't, it's structured data.
Schema markup - specifically Organization schema, Service schema, FAQ schema, and Article schema - tells AI crawlers exactly who you are, what you do, where you operate, and what you're authoritative on. The knowsAbout array in your Organization schema is particularly powerful: it's literally a list of the topics you're telling AI engines you're an expert in.
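As an illustration, a minimal Organization schema with a knowsAbout array could look like the sketch below - the company name, URL, and topics are placeholders, not a complete or prescriptive markup.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Agency",
  "url": "https://www.example-agency.com",
  "description": "Marketing agency specialising in GEO and SEO for B2B companies.",
  "areaServed": "Lithuania",
  "knowsAbout": [
    "Generative Engine Optimization",
    "Search Engine Optimization",
    "Structured data",
    "Content marketing"
  ]
}
</script>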
LLMs.txt is another lever that most businesses haven't heard of yet. Similar to how robots.txt tells search bots where they can and can't go, LLMs.txt gives AI language models a structured, human-readable summary of your site's content and purpose. Early adopters are getting a genuine edge here while adoption is still low.
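The format is still an emerging proposal rather than an official standard, but a minimal llms.txt - typically a plain Markdown file served at yourdomain.com/llms.txt - could look something like this, with every name and URL below a placeholder:

# Example Agency
> Marketing agency in Vilnius specialising in Generative Engine Optimization (GEO) and SEO for B2B companies.

## Services
- [GEO audit](https://www.example-agency.com/services/geo-audit): What we check and what clients get.
- [Content strategy](https://www.example-agency.com/services/content): How we plan and distribute content.

## Resources
- [Blog](https://www.example-agency.com/blog): Guides on GEO, structured data, and AI visibility.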
The sameAs field in your schema is equally underrated. Linking your Organization schema to your LinkedIn, founders' personal LinkedIn profiles, and social profiles creates a web of connected entity signals. AI engines use these connections to verify that your brand is real, established, and consistent across the web.
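In practice that just means adding a sameAs array to the same Organization object - for example (profile URLs are placeholders):

"sameAs": [
  "https://www.linkedin.com/company/example-agency",
  "https://www.linkedin.com/in/example-founder",
  "https://x.com/exampleagency"
]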
Structured data is the backbone. Get this wrong and everything else you do for GEO has a hard ceiling.
Factor 2: Your robots.txt might be blocking AI crawlers right now
This is the first thing we check with every new client, and it's shocking how often we find a problem.
Several clients came to us after working with other agencies, and their robots.txt files were actively blocking AI crawlers - GPTBot (ChatGPT), Google-Extended (Gemini), PerplexityBot - from accessing their website entirely. Not a single page visible to these tools.
A previous agency had added those rules - possibly to reduce server load, possibly by mistake - and left. The client had no idea. They were spending money on content, on ads, on everything else, and the most powerful recommendation engines in the world couldn't even see their site.
Check your robots.txt right now. Go to yourdomain.com/robots.txt and look for any Disallow rules applied to GPTBot, Google-Extended, PerplexityBot, ClaudeBot, or anthropic-ai. If you see them, remove them. This is a zero-effort fix with immediate impact.
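For reference, the blocking rules usually look like the first pattern below, and removing them (or replacing them with an explicit Allow) restores access. The user-agent names are the ones mentioned above; check each vendor's documentation for the current list.

# Rules like these block AI crawlers from the entire site - remove them if you want to be visible:
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Optional: explicitly allow them instead (crawling is allowed by default when no Disallow rule applies):
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /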
Factor 3: Content length and indexing discipline matter more than most people think
There's a sweet spot for blog content when it comes to GEO: around 1,500 words.
Under 1,000 words, AI engines tend not to treat the content as authoritative enough to cite. Over 2,000 words, you risk diluting the core signal - the content becomes harder for AI models to summarise and cleanly attribute. At around 1,500 words, with a clear structure, specific claims, and a well-defined topic, you hit the right balance between depth and precision.
But content length is only half of it. The other half is indexing discipline.
Every time you publish a new blog post, submit it manually to Google Search Console. Don't wait for Google to crawl it naturally - that can take days, weeks, or longer. In GSC, go to URL Inspection, paste the post URL, and hit Request Indexing. This gets your content into Google's index faster, which feeds directly into its availability to Google AI Overviews and other AI-powered surfaces.
We've seen newly published posts appear in AI-generated answers within 72 hours of manual indexing. Waiting for natural crawl can mean waiting months.
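Request Indexing itself can only be triggered from the GSC interface, but you can verify programmatically that a post has made it into the index via the URL Inspection API. A rough sketch, assuming a service account that has been added as a user on your Search Console property - the file name and URLs below are placeholders:

# pip install google-api-python-client google-auth
from google.oauth2 import service_account
from googleapiclient.discovery import build

SERVICE_ACCOUNT_FILE = "service-account.json"       # placeholder path to your credentials
SITE_URL = "https://yourdomain.com/"                # your GSC property
POST_URL = "https://yourdomain.com/blog/new-post"   # the post you just submitted

creds = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE,
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
search_console = build("searchconsole", "v1", credentials=creds)

# Ask Search Console how it currently sees this URL.
result = (
    search_console.urlInspection()
    .index()
    .inspect(body={"inspectionUrl": POST_URL, "siteUrl": SITE_URL})
    .execute()
)

status = result["inspectionResult"]["indexStatusResult"]
print(status.get("verdict"), "-", status.get("coverageState"))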
Factor 4: Ghost blogs have near-zero GEO value
Here's a hard truth: if you're publishing content that nobody reads, it's not going to help your GEO.
AI engines don't just look at whether content exists - they look at signals that indicate whether that content is trustworthy and worth citing. Traffic is one of those signals. A blog post with consistent readers, engagement, and backlinks tells AI engines this piece of content is considered valuable by humans. A post that's been live for six months with zero traffic tells a different story.
This means your content strategy can't be volume alone. It has to be distribution too. Share every post across LinkedIn, email, and wherever your audience is. The goal isn't just to publish - it's to make sure your content is actually being read, so AI engines treat it as a credible source worth citing.
Write for humans first. Then optimise for AI.
Factor 5: Reddit is one of the most powerful citation domains for AI engines
If you look at where AI engines like Perplexity and ChatGPT pull their citations from, Reddit appears constantly. It's a domain with enormous trust signals: high traffic, high engagement, strict moderation, and years of indexed content. When Reddit users mention your brand authentically, AI engines take notice.
The obvious play is to plant mentions yourself. We've seen it attempted. We don't recommend it. Reddit's moderation has become sophisticated enough to catch fabricated mentions, and accounts get banned fast. The short-term gain isn't worth the risk.
The play that actually works is harder but far more durable: spark real conversations from your actual customers.
We helped one client engineer a moment where their users - not us - started talking about the brand on Reddit. The post generated more than 200 comments. The overwhelming majority included the brand name, the domain URL, and real customer experiences - honest feedback, positive stories, and genuine debate. The result was a dense cluster of authentic brand mentions from real people, on a domain AI engines inherently trust.
That single Reddit thread drove measurable improvements in the client's AI visibility within weeks. It appeared in Perplexity answers. It got cited by ChatGPT when users asked about that product category.
Your goal isn't to be on Reddit. Your goal is to be talked about there by people who actually used your product.
What to do first
If you're starting from zero with GEO, the priority order is:
1. Check robots.txt - confirm AI crawlers aren't blocked
2. Implement structured data - Organization, Service, and FAQ schema at minimum
3. Publish consistent, indexed content - target 1,500 words, submit manually to GSC every time
4. Drive real traffic to that content - distribution is not optional
5. Earn third-party mentions - start with existing customers, guide them toward platforms AI engines trust
GEO is not a one-time fix. It's an ongoing process of building the signals that make AI engines trust your brand enough to recommend it. The businesses building those signals now are creating a compounding lead that will be very hard for competitors to close once they catch on.
Frequently Asked Questions
What is Generative Engine Optimization (GEO)?
GEO is the practice of optimizing your online presence so that AI tools like ChatGPT, Perplexity, and Gemini mention and recommend your business when users ask relevant questions. Unlike traditional SEO, which targets search rankings, GEO targets the answers AI models generate directly.
Does structured data actually affect AI recommendations?
Yes. Schema markup, particularly Organization schema with knowsAbout fields, FAQ schema, and Service schema, gives AI crawlers explicit, structured information about who you are and what you're authoritative on. It's one of the highest-leverage GEO investments you can make.
How long does it take to appear in AI recommendations?
Clients who fix robots.txt issues, implement structured data, and publish consistently indexed content typically see measurable changes in AI citation frequency within 6-12 weeks. Third-party mention campaigns, when executed authentically, can produce results faster.
Do I need to be on Reddit to do GEO?
Not necessarily. Reddit is one of the most powerful citation domains for AI engines, but not the only one. Quora, industry publications, LinkedIn, and trusted directories all contribute. Reddit is worth prioritising because of its domain authority and how frequently AI engines reference it.
Can I do GEO myself, or do I need an agency?
The basics - robots.txt check, schema implementation, GSC submission - can be done without an agency. The harder parts - earning third-party mentions, tracking AI visibility month over month, optimising content structure for citation - are where specialist support pays off.