llms.txt Explained: The New Robots.txt for AI Crawlers (Do You Actually Need One?)

What llms.txt actually is, which AI crawlers respect it (very few in 2026), and what robots.txt directives - GPTBot, ClaudeBot, OAI-SearchBot - actually control. Honest explainer.

Updated May 2026 - 5 min read

If you have heard "you need to add an llms.txt file to your site so AI crawlers know what to do" - that advice is half right and half marketing. The file exists. The proposal is real. The actual behavior of AI crawlers in response to llms.txt in 2026 is meaningfully different from what most pitches suggest.

This article explains what llms.txt actually is, which AI assistants respect it, what currently controls AI crawler access (it is not llms.txt), and whether you should add one to your site.

What llms.txt Actually Is

llms.txt was proposed by Jeremy Howard - the co-founder of fast.ai and Answer.AI - in September 2024 as a standard for helping large language models access and use website content efficiently. The proposal is straightforward: a Markdown-formatted file placed at the root of a website that summarizes the site's content, links to its most important pages, and provides context that an AI model could use to understand the site without crawling every page.
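To make the format concrete, here is a minimal sketch of generating such a file. It assumes the layout described at llmstxt.org (an H1 title, a blockquote summary, then H2 sections containing link lists); the function name `build_llms_txt`, the site name, and the example.com URLs are hypothetical and only for illustration.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt-style Markdown index.

    `sections` maps a section heading to a list of
    (link_text, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for text, url, note in links:
            lines.append(f"- [{text}]({url}): {note}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site, used purely for illustration.
example = build_llms_txt(
    "Example Docs",
    "Documentation for a hypothetical widget API.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md",
             "Install and first request"),
            ("API reference", "https://example.com/docs/api.md",
             "Endpoints and parameters"),
        ],
    },
)
print(example)
```

The output would be saved as /llms.txt at the site root. Nothing in this sketch affects crawler access; it only produces the summary index.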

The original proposal also includes an extended variant: llms-full.txt, which is intended to be a single consolidated dump of the site's actual content - a one-file representation of the site that an AI assistant could ingest in a single retrieval rather than crawling individual pages.

The proposal lives at llmstxt.org and has been adopted by a meaningful number of sites - particularly developer-tool documentation sites, AI-tool companies, and SaaS products where the file maps neatly to a logical content structure.

That much is real and uncontroversial.

What llms.txt Is Not

llms.txt is not a directive file in the way robots.txt is. It does not currently control AI crawler access. It does not currently determine which AI assistants can or cannot read a site. It is a courtesy structure - a way to expose site content in a format AI models could theoretically use efficiently if they chose to.

No major AI platform - OpenAI, Anthropic, Google, Microsoft - has committed to treating llms.txt as a primary input to their training or retrieval systems as of mid-2026. Some platforms may opportunistically use llms.txt when present, but it is not a guaranteed access control or a guaranteed signal pathway.

This is the gap between the marketing claim ("add llms.txt and AI assistants will use your site better") and the reality ("AI assistants might use it; mostly they will not in 2026").

What Actually Controls AI Crawler Access in 2026

The mechanism that actually controls whether and how AI crawlers can access a site is robots.txt - the same mechanism that has governed search engine crawler access since the mid-1990s. The major AI companies have published specific user-agent tokens for their crawlers and state that those crawlers honor robots.txt directives.

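The current set of significant AI crawler user-agents includes GPTBot, OAI-SearchBot, and ChatGPT-User (OpenAI); ClaudeBot and anthropic-ai (Anthropic); Google-Extended (Google); and PerplexityBot and Perplexity-User (Perplexity).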

A site that wants to block these crawlers blocks them in robots.txt with explicit user-agent directives. A site that wants to allow them does the same. The mechanism is robots.txt, not llms.txt.

Example robots.txt for AI Crawler Control

For a site that wants to block training-data crawlers while allowing on-demand search and citation crawlers:

```
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Allow: /

User-agent: *
Allow: /
```

This is an illustrative configuration - the right answer depends on the site's stance on AI training versus AI search citation. The point is that the mechanism is real, documented, and respected. llms.txt is not the right place to make these decisions.
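One way to sanity-check a robots.txt before deploying it is to parse it locally. The sketch below uses Python's standard-library `urllib.robotparser` on a shortened, illustrative rule set; the helper name `allowed_agents`, the `SomeOtherBot` agent, and the example.com URL are hypothetical. The parser's matching rules may differ in detail from how any given vendor's crawler interprets the file, so treat the result as a rough check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Shortened, illustrative rule set (not a recommendation).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: bool} saying whether each agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "SomeOtherBot"],  # last one is hypothetical
    "https://example.com/pricing",
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Here GPTBot is reported as blocked, while OAI-SearchBot and the unnamed fallback agent are reported as allowed.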

The Difference Between llms.txt and llms-full.txt

Worth a separate note because the two variants serve different purposes.

llms.txt is the summary index. It is a Markdown file with an overview of the site, links to key pages, and short descriptions. It is intended to help an AI model understand what the site is and where to find specific information.

llms-full.txt is the consolidated content dump. It is the entire site's meaningful content in a single file, formatted for AI ingestion. The intent is that an AI model could read llms-full.txt once and have a comprehensive understanding of the site without crawling individual pages.

For most sites, llms.txt is straightforward to produce. llms-full.txt is more involved - it requires generating and maintaining a synthesized representation of all important content, which becomes a non-trivial maintenance burden for content-heavy sites.
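To illustrate where the maintenance burden comes from, here is a minimal sketch of assembling an llms-full.txt by concatenating page content. The exact layout of llms-full.txt is an assumption here (one H2 per page with a source line); the function name `build_llms_full` and the page data are hypothetical.

```python
def build_llms_full(site_title, pages):
    """Concatenate page bodies into a single Markdown document.

    `pages` is a list of (title, url, markdown_body) tuples.
    The per-page layout is an assumption, not a published rule.
    """
    parts = [f"# {site_title}"]
    for title, url, body in pages:
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"


# Hypothetical pages, used purely for illustration.
full = build_llms_full(
    "Example Docs",
    [
        ("Quickstart", "https://example.com/docs/quickstart",
         "Install the client, then send a first request."),
        ("API reference", "https://example.com/docs/api",
         "Each endpoint accepts JSON and returns JSON."),
    ],
)
print(full)
```

Every content change on the site means regenerating this file, which is why the consolidated variant costs more to keep current than the summary index.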

Whether You Should Add llms.txt to Your Site

The honest answer: yes, but with calibrated expectations.

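The case for adding one: the file is cheap to produce, especially for sites whose content already maps to a logical structure, and it carries future-option value if AI platforms start consuming it.

The case against expecting transformative results: as of mid-2026 no major AI platform has committed to treating llms.txt as a primary input, and the file does not control crawler access.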

A reasonable framing: add llms.txt because it is low-cost and forward-looking, but do not expect it to be the lever that moves AI search visibility. The actual levers live elsewhere.

What Actually Moves AI Search Visibility

Since llms.txt is not it, what is?

Entity completeness across the open web. Wikipedia, Wikidata, LinkedIn, industry directories, consistent sameAs schema markup. AI assistants cite entities they recognize as distinct and well-established.

Citation infrastructure. Press coverage, inbound links from authoritative sources to the brand's content, Wikipedia citations, peer-industry mentions. AI assistants follow citation graphs that other authoritative sources have already built.

Content geometry. The primary answer in the first 100 words of an article, complete-sentence answers to each H2's implied question, statistics with date and source within two sentences, 15+ named entities per page. This is what AEO (answer engine optimization) and GEO (generative engine optimization) actually look like in practice.

robots.txt directives for the AI crawler user-agents listed above. This is the real access-control layer.

Schema.org markup at the page level. Article, Person, Organization, FAQPage, BreadcrumbList - this is the structured data layer AI assistants actually use to understand pages.
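As a small illustration of this layer, the sketch below builds Organization markup with a `sameAs` list and wraps it in a JSON-LD script tag. The property names are standard Schema.org vocabulary; the helper name `organization_jsonld`, the company name, and all URLs are hypothetical placeholders.

```python
import json


def organization_jsonld(name, url, same_as):
    """Return Schema.org Organization markup as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Profiles on other sites that refer to the same entity.
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


# Placeholder entity and profile URLs, for illustration only.
markup = organization_jsonld(
    "Example Co",
    "https://example.com",
    [
        "https://www.linkedin.com/company/example-co",
        "https://en.wikipedia.org/wiki/Example_Co",
    ],
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The printed script element would go in the page's HTML. Keeping the `sameAs` list consistent across pages is the part that ties back to entity completeness.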

llms.txt is a courtesy file that lives alongside this real infrastructure. It is not a substitute.

Bottom Line

Add llms.txt to your site if you want. The cost is low and the future-option value is real. Do not believe the pitches that frame it as the missing piece for AI search visibility. The actual missing pieces - if you have AI visibility gaps - are almost always elsewhere: in your entity layer, your citation infrastructure, your content structure, and your robots.txt configuration.

llms.txt is a reasonable addition to a well-built site. It is not a fix for a poorly-positioned brand. Anyone telling you otherwise is selling something.

Produced by the team at 10X Search.