llms.txt Explained: The New Robots.txt for AI Crawlers (Do You Actually Need One?)

If you have heard that "you need to add an llms.txt file to your site so AI crawlers know what to do", that advice is half right and half marketing. The file exists. The proposal is real. But the way AI crawlers actually respond to llms.txt in 2026 is meaningfully different from what most pitches suggest.

This article explains what llms.txt actually is, which AI assistants respect it, what currently controls AI crawler access (it is not llms.txt), and whether you should add one to your site.

What llms.txt Actually Is

llms.txt was proposed by Jeremy Howard - the co-founder of fast.ai and Answer.AI - in September 2024 as a standard for helping large language models access and use website content efficiently. The proposal is straightforward: a Markdown-formatted file placed at the root of a website that summarizes the site's content, links to its most important pages, and provides context that an AI model could use to understand the site without crawling every page.
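To make the structure concrete, here is a minimal Python sketch that renders a file in the shape the proposal describes: an H1 with the site name, a blockquote summary, and H2 sections containing Markdown link lists. The `Link`, `Section`, and `render_llms_txt` names and the example.com URLs are hypothetical, chosen only for illustration; check llmstxt.org for the authoritative format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description shown after the link


@dataclass
class Section:
    heading: str
    links: List[Link] = field(default_factory=list)


def render_llms_txt(site_name: str, summary: str, sections: List[Section]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, for illustration only.
example = render_llms_txt(
    "Example Docs",
    "Documentation for a hypothetical API product.",
    [
        Section("Docs", [Link("Quickstart", "https://example.com/quickstart.md",
                              "Install and first request")]),
        Section("Optional", [Link("Changelog", "https://example.com/changelog.md")]),
    ],
)
print(example)
```

The printed text is what would be served at /llms.txt. Generating it from the same data that drives the site navigation is one way to keep it from drifting out of date.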

The original proposal also includes an extended variant: llms-full.txt, which is intended to be a single consolidated dump of the site's actual content - a one-file representation of the site that an AI assistant could ingest in a single retrieval rather than crawling individual pages.

The proposal lives at llmstxt.org and has been adopted by a meaningful number of sites - particularly developer-tool documentation sites, AI-tool companies, and SaaS products where the file maps neatly to a logical content structure.

That much is real and uncontroversial.

What llms.txt Is Not

llms.txt is not a directive file in the way robots.txt is. It does not currently control AI crawler access. It does not currently determine which AI assistants can or cannot read a site. It is a courtesy structure - a way to expose site content in a format AI models could theoretically use efficiently if they chose to.

No major AI platform - OpenAI, Anthropic, Google, Microsoft - has committed to treating llms.txt as a primary input to its training or retrieval systems as of mid-2026. Some platforms may opportunistically use llms.txt when present, but it is neither a guaranteed access control nor a guaranteed signal pathway.

This is the gap between the marketing claim ("add llms.txt and AI assistants will use your site better") and the reality ("AI assistants might use it; mostly they will not in 2026").

What Actually Controls AI Crawler Access in 2026

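The mechanism that actually controls whether and how AI crawlers can access a site is robots.txt - the same mechanism that has governed search engine crawler access since 1994. The major AI companies have published specific user-agent tokens for their crawlers and state that those crawlers honor robots.txt directives. Compliance is voluntary on the crawler's side, so robots.txt is a published policy rather than a technical barrier.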

The current set of significant AI crawler user-agents includes:

•GPTBot - OpenAI's training data crawler. Documented and respected.
•OAI-SearchBot - OpenAI's search-related crawler that powers ChatGPT search citations. Distinct from GPTBot.
•ChatGPT-User - OpenAI's on-demand user-initiated retrieval (when a user asks ChatGPT to look at a specific URL).
•ClaudeBot - Anthropic's crawler. Documented.
•anthropic-ai - An additional Anthropic-related user-agent (Anthropic has documented several over time; verify current documentation for the active set).
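•Google-Extended - Google's robots.txt control token for generative AI training use, distinct from Googlebot. It has no separate HTTP user-agent string; crawling still happens under Google's existing user-agents.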
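•PerplexityBot - Perplexity's indexing crawler. Perplexity's documentation describes it as surfacing sites in its search results rather than collecting training data; verify the current wording.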
•Perplexity-User - Perplexity's user-initiated on-demand crawler.
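•Applebot-Extended - Apple's robots.txt token for opting out of AI training use. It does not crawl pages itself; it governs how content fetched by Applebot may be used.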
•CCBot - Common Crawl's crawler, used by many AI training datasets.

A site that wants to block these crawlers blocks them in robots.txt with explicit user-agent directives. A site that wants to allow them does the same. The mechanism is robots.txt, not llms.txt.
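Before deciding what to block, it helps to know which of these crawlers actually visit. The sketch below counts AI crawler hits by substring-matching the tokens listed above against User-Agent header values. The sample strings are simplified illustrations, not copied from vendor documentation, and the function names are hypothetical. Google-Extended and Applebot-Extended are left out because they are robots.txt control tokens and do not appear in request headers.

```python
from collections import Counter
from typing import Iterable, Optional

# Tokens from the list above that can appear in a User-Agent header.
AI_CRAWLER_TOKENS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Perplexity-User", "CCBot",
)


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first known AI crawler token found in the header, else None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def count_ai_crawlers(user_agents: Iterable[str]) -> Counter:
    """Count hits per AI crawler token across a sequence of User-Agent values."""
    hits = (match_ai_crawler(ua) for ua in user_agents)
    return Counter(token for token in hits if token is not None)


# Simplified, illustrative User-Agent values (assumed, not real log data).
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0",
]
counts = count_ai_crawlers(sample_user_agents)
print(counts)
```

A User-Agent header can be spoofed, so a substring match shows what a client claims to be, not what it is. Where it matters, cross-check against the IP ranges the vendors publish.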

Example robots.txt for AI Crawler Control

For a site that wants to block training-data crawlers while still allowing search and on-demand citation crawlers:

```
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Allow: /

User-agent: *
Allow: /
```

This is an illustrative configuration - the right answer depends on the site's stance on AI training versus AI search citation. The point is that the mechanism is real, documented, and respected. llms.txt is not the right place to make these decisions.
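A configuration like this is easy to get subtly wrong, so it is worth checking before deployment. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate the same illustrative rules for each user-agent token. The `allowed_agents` helper and the example.com URL are hypothetical. Python's parser applies the first matching rule within a group, which may differ from how a given crawler resolves overlapping rules, so treat this as a sanity check rather than a guarantee.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# The illustrative configuration from this section, one directive per line.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Allow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: List[str], url: str) -> Dict[str, bool]:
    """Map each user-agent token to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


agents = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "ChatGPT-User",
          "PerplexityBot", "Perplexity-User", "SomeOtherBot"]
result = allowed_agents(ROBOTS_TXT, agents, "https://example.com/blog/post")
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Agents without their own group, such as the made-up `SomeOtherBot`, fall through to the `User-agent: *` rules.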

The Difference Between llms.txt and llms-full.txt

The two variants serve different purposes, so they are worth a separate note.

llms.txt is the summary index. It is a Markdown file with an overview of the site, links to key pages, and short descriptions. It is intended to help an AI model understand what the site is and where to find specific information.

llms-full.txt is the consolidated content dump. It is the entire site's meaningful content in a single file, formatted for AI ingestion. The intent is that an AI model could read llms-full.txt once and have a comprehensive understanding of the site without crawling individual pages.

For most sites, llms.txt is straightforward to produce. llms-full.txt is more involved - it requires generating and maintaining a synthesized representation of all important content, which becomes a non-trivial maintenance burden for content-heavy sites.
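One way to keep that burden manageable is to generate llms-full.txt from the same Markdown sources that produce the site. The sketch below concatenates pages into one document. The layout (an H1 for the site, then an H2 and a "Source:" line per page) is an assumption made for illustration, not a format taken from the proposal, and the `build_llms_full` name and page data are hypothetical.

```python
from typing import List, Tuple

# Each page is (title, url, markdown_body).
Page = Tuple[str, str, str]


def build_llms_full(site_name: str, pages: List[Page]) -> str:
    """Concatenate page bodies into a single Markdown document."""
    parts = [f"# {site_name}"]
    for title, url, body in pages:
        # Assumed layout: one H2 per page, followed by its source URL and body.
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"


# Hypothetical page data, for illustration only.
full_text = build_llms_full(
    "Example Docs",
    [
        ("Quickstart", "https://example.com/quickstart", "Install the client.\n"),
        ("Authentication", "https://example.com/auth", "Send an API key header.\n"),
    ],
)
print(full_text)
```

Running a step like this in the site's build pipeline regenerates the file whenever a page changes, instead of leaving it as a separate manual task.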

Whether You Should Add llms.txt to Your Site

The honest answer: yes, but with calibrated expectations.

The case for adding one:

•The cost is low. A reasonably structured site can produce a serviceable llms.txt in a few hours.
•If AI platforms do begin treating llms.txt as a meaningful input over the next two to three years, having one in place produces a head start.
•For sites with clear hierarchical content (documentation, SaaS product sites, developer tools), llms.txt is a useful navigational aid for any AI model that does choose to read it.
•The act of producing llms.txt forces a thoughtful audit of your site's most important pages, which has value independent of AI crawler behavior.

The case against expecting transformative results:

•No major AI platform has committed to llms.txt as a primary signal.
•The behavior of AI assistants in 2026 is driven primarily by the underlying training data, the retrieval-augmented generation indices, and the platform-specific citation logic - not by llms.txt presence.
•Adding llms.txt will not measurably improve AI search visibility on its own. The structural drivers of AI visibility - entity completeness, citation infrastructure, content geometry - are far more important.

A reasonable framing: add llms.txt because it is low-cost and forward-looking, but do not expect it to be the lever that moves AI search visibility. The actual levers live elsewhere.

What Actually Moves AI Search Visibility

Since llms.txt is not it, what is?

Entity completeness across the open web. Wikipedia, Wikidata, LinkedIn, industry directories, consistent sameAs schema markup. AI assistants cite entities they recognize as distinct and well-established.

Citation infrastructure. Press coverage, links from authoritative sites to the brand's content, Wikipedia citations, peer-industry mentions. AI assistants follow citation graphs that other authoritative sources have already built.

Content geometry. The primary answer in the first 100 words of an article, complete-sentence answers to each H2's implied question, statistics with date and source within two sentences, 15+ named entities per page. This is what AEO and GEO optimization actually look like.

robots.txt directives for the AI crawler user-agents listed above. This is the real access-control layer.

Schema.org markup at the page level. Article, Person, Organization, FAQPage, BreadcrumbList - this is the structured data layer AI assistants actually use to understand pages.
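As a small illustration of this layer, the sketch below builds an Organization JSON-LD payload with a sameAs list and wraps it in the script tag that pages normally embed. The function names, the organization name, and the URLs are hypothetical placeholders; the property names (`@context`, `@type`, `name`, `url`, `sameAs`) are standard Schema.org vocabulary.

```python
import json
from typing import List


def organization_jsonld(name: str, url: str, same_as: List[str]) -> str:
    """Return an Organization JSON-LD payload as a formatted JSON string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # De-duplicate and sort so the output is stable across builds.
        "sameAs": sorted(set(same_as)),
    }
    return json.dumps(data, indent=2)


def jsonld_script_tag(payload: str) -> str:
    """Wrap a JSON-LD payload in the script element used for embedding."""
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical organization data, for illustration only.
payload = organization_jsonld(
    "Example Co",
    "https://example.com",
    [
        "https://www.linkedin.com/company/example-co",
        "https://www.wikidata.org/wiki/Q0000000",
        "https://www.linkedin.com/company/example-co",  # duplicate, removed
    ],
)
print(jsonld_script_tag(payload))
```

Using the same sameAs list on every page that describes the organization is what makes the markup "consistent" in the sense used above.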

llms.txt is a courtesy file that lives alongside this real infrastructure. It is not a substitute.

Bottom Line

Add llms.txt to your site if you want. The cost is low and the future-option value is real. Do not believe the pitches that frame it as the missing piece for AI search visibility. The actual missing pieces - if you have AI visibility gaps - are almost always elsewhere: in your entity layer, your citation infrastructure, your content structure, and your robots.txt configuration.

llms.txt is a reasonable addition to a well-built site. It is not a fix for a poorly positioned brand. Anyone telling you otherwise is selling something.