AI crawlers are reshaping web traffic, search, and discovery. If you are a B2B startup, building an llms.txt file for AI crawlers is no longer optional. This guide covers everything you need to implement it in under 30 minutes.
What Is llms.txt and Why It Matters Now
The llms.txt file is a plain-text Markdown file placed at the root of your domain. It tells AI systems like ChatGPT, Claude, and Perplexity which pages on your site are most important, what your business does, and where to find your best content.
Jeremy Howard, co-founder of fast.ai and Answer.AI, proposed the standard in September 2024. It spread fast.
- Over 844,000 websites have now implemented llms.txt (llmstxt.org / BuiltWith, October 2025)
- But only 3.2% of all websites globally have one (seoscore.tools)
- AI-related bot traffic increased over 300% between January 2025 and March 2026 (seoscore.tools server log analysis)
That gap is the opportunity. Early adopters are already seeing sites with llms.txt achieve 2.3x higher AI recall (lowtouch.ai) and up to 30% more AI-driven referral traffic (The Growth Terminal, 50-site study).
Major companies including Cloudflare, Anthropic, Stripe, Vercel, and Zapier have already implemented it. Anthropic directly requested that Mintlify build llms.txt and llms-full.txt support across its entire doc-hosting platform. That is a clear signal of where this is headed.
llms.txt vs robots.txt vs sitemap.xml
These three files each serve a different purpose. You need all three.
robots.txt controls access. It tells crawlers which URLs they can or cannot visit. It has been the standard since 1994 and all major search engines and AI companies respect it.
sitemap.xml lists all your URLs. It helps search engines discover and crawl your pages efficiently.
llms.txt controls context and understanding. It does not block anything. It guides AI language models to your most important content and explains what your site is about.
Think of it this way:
- robots.txt = the bouncer (who gets in)
- sitemap.xml = the directory (what exists)
- llms.txt = the briefing (what matters and why)
How AI Crawlers Actually Use Your Content
AI crawlers like GPTBot, ClaudeBot, and PerplexityBot scan the web to power real-time answers and train language models. The problem is that your website was built for humans.
When an AI agent fetches a page, it has to process navigation menus, cookie banners, JavaScript bundles, footer links, and ad scripts just to find the content it actually needs. That is inefficient. Research shows that Markdown requires approximately 10x fewer tokens to process than raw HTML (NeuronWriter). Every unnecessary token competes for space in the AI's context window.
llms.txt solves this by giving AI a clean, structured summary before it crawls. Instead of guessing what matters, the crawler reads your llms.txt and knows exactly where to go.
Developer tools like Cursor and Mintlify auto-fetch llms.txt files when indexing sites. Google included llms.txt in its Agent2Agent (A2A) protocol. Windsurf reports that llms.txt reduces token usage by pointing agents directly to relevant endpoints.
llms.txt vs llms-full.txt: Which One to Build First
The spec defines two companion files.
llms.txt is a lightweight index. It lists your key pages with short descriptions. Under 10KB. Takes 15-30 minutes to create.
llms-full.txt is the full-content version. It contains the actual Markdown text of your core pages, bundled into one file. Larger, but richer context for AI retrieval.
Data from Mintlify and Profound shows that llms-full.txt receives significantly more AI crawler traffic than the base llms.txt file. ChatGPT accounts for the majority of these visits.
According to aicrawlercheck.com, llms.txt contributes 20 points and llms-full.txt adds 15 points to your AI Visibility Score. Together the two files account for 35 points of that score.
Start with llms.txt. Add llms-full.txt once you have the core file working.
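Mechanically, llms-full.txt is just the Markdown of your core pages bundled into one document. Here is a minimal Python sketch of that bundling step — the function name, the one-H2-per-page separator convention, and the sample content are my own assumptions, not part of the spec:

```python
def build_llms_full(site_name, description, pages):
    """Bundle core pages into a single llms-full.txt document.

    pages: list of (title, markdown_body) tuples, in priority order.
    """
    parts = [f"# {site_name}", f"> {description}"]
    for title, body in pages:
        parts.append(f"\n## {title}\n")  # assumed convention: one H2 per page
        parts.append(body.strip())
    return "\n".join(parts) + "\n"

# Hypothetical example input
doc = build_llms_full(
    "Acme",
    "Acme builds billing tools for B2B SaaS teams.",
    [("Getting Started", "Install the CLI, then run `acme init`.")],
)
print(doc)
```

In practice you would feed this the same pages your llms.txt already links to, in the same priority order.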
The Exact Format of an llms.txt File
The file uses simple Markdown with a specific structure. Here is the spec.
# Your Company Name
> One or two sentences describing what your company does and who it serves.
## Product
- [Product Overview](https://yoursite.com/product): What your product does and who it is for.
- [Pricing](https://yoursite.com/pricing): Plans and pricing for small to enterprise teams.
## Resources
- [Blog](https://yoursite.com/blog): Guides and insights on GTM, AI search, and sales automation.
- [Case Studies](https://yoursite.com/case-studies): Real results from customers using the platform.
## Documentation
- [Getting Started](https://yoursite.com/docs/getting-started): Setup guide for new users.
- [API Reference](https://yoursite.com/docs/api): Full API documentation.
## Optional
- [About](https://yoursite.com/about): Company mission and team.
- [Contact](https://yoursite.com/contact): Get in touch with sales or support.
The rules are simple:
- H1 = company or site name
- Blockquote = a short description of what you do
- H2 = content categories
- Bullet items = links with a colon and a one-line description
Keep descriptions specific. Generic descriptions like "our blog" waste the opportunity. AI systems use those descriptions to decide whether a page is relevant to retrieve for a given query.
Step-by-Step: How to Build Your llms.txt File
Step 1: Identify your most important pages
Start with pages that represent your core business. For a B2B SaaS startup: homepage, product pages, pricing, key blog posts, case studies, and documentation.
Step 2: Group pages into logical categories
Create H2 headers for each category. Product, resources, documentation, and optional sections like about or careers. Do not try to list everything. This is a curated index, not a sitemap.
Step 3: Write useful descriptions
For each URL, write a single sentence describing what the page contains. Focus on what an AI would need to know to decide whether to retrieve it. Include relevant keywords naturally.
Step 4: Create the file
Create a file named llms.txt using any text editor. Save it as plain text with UTF-8 encoding.
Step 5: Validate your file
Paste the content into llmstxt.studio to check for formatting errors and broken URLs.
Step 6: Deploy it to your site root
The file must be accessible at yourdomain.com/llms.txt. See deployment instructions by platform below.
Step 7: Update it regularly
Article schema with dateModified is critical. Most ChatGPT citations come from content updated within the last 10-12 months (SerpNap AEO Technical Stack analysis). The same principle applies to llms.txt. Keep it current.
How to Deploy on Different Platforms
Next.js and static sites
Place llms.txt in the /public directory. It will be served automatically at yourdomain.com/llms.txt.
WordPress
Upload via FTP or cPanel to the root folder of your WordPress install. Alternatively, use a file manager plugin to add it directly. Set the Content-Type header to text/plain.
Webflow
Upload as a static asset in the Webflow Assets panel, then publish. Confirm the file is accessible at your root domain.
Any CMS or CDN
Add the file through your CDN's static file delivery configuration or server config. The goal is a publicly accessible URL at yourdomain.com/llms.txt.
The AI Crawler Landscape
Knowing which bots to support helps you configure both your llms.txt and your robots.txt correctly.
| Crawler | Operator | Powers |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training and AI search |
| ChatGPT-User | OpenAI | Real-time browsing during conversations |
| ClaudeBot | Anthropic | Claude AI training and retrieval |
| PerplexityBot | Perplexity | Perplexity AI search |
| Google-Extended | Google | Gemini AI training |
| CCBot | Common Crawl | Multiple LLM training datasets |
Note: GPTBot (training) and ChatGPT-User (real-time search) are separate user agents. You can block one while allowing the other in your robots.txt. This distinction matters if you want to appear in ChatGPT search without contributing to training data.
The general recommendation: allow all of the above unless you have a specific legal or commercial reason to block one. Every blocked crawler is a platform where your content cannot be cited.
Managing AI Crawlers in robots.txt
robots.txt and llms.txt work together. Here is how to configure your robots.txt for AI crawlers.
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
If you want to allow AI search but block AI training:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Allow: /
Do not accidentally block AI crawlers by using a catch-all Disallow: / under User-agent: *. Check your existing robots.txt now to confirm AI crawlers are not already being blocked.
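You can run that check programmatically. Python's standard library ships a robots.txt parser, urllib.robotparser, which the sketch below uses against the "allow search, block training" policy from above (the site URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The "allow AI search, block AI training" policy shown above
robots = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

for bot in ("GPTBot", "ChatGPT-User", "ClaudeBot"):
    print(bot, rp.can_fetch(bot, "https://yoursite.com/llms.txt"))
```

GPTBot comes back blocked, ChatGPT-User allowed, and ClaudeBot allowed as well, since a user agent with no matching group and no `User-agent: *` fallback defaults to allowed. Point the same check at your live robots.txt before you publish an llms.txt.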
How to Measure Whether It Is Working
Tracking llms.txt effectiveness is not yet as straightforward as tracking Google rankings. Here is what to monitor.
Server logs: Filter for GPTBot, ClaudeBot, and PerplexityBot. Increased visits to llms.txt confirm crawlers are reading the file.
AI visibility tools: Tools like aicrawlercheck.com calculate an AI Visibility Score and include llms.txt as a direct factor. Use them to benchmark before and after implementation.
AI referral traffic: Track sessions with referrer sources that include perplexity.ai, chat.openai.com, or claude.ai in Google Analytics or your analytics platform.
Brand mentions in AI outputs: Search for your company name in ChatGPT, Claude, and Perplexity. Note whether you are cited and what content they reference.
Some implementing sites report 30% increases in AI-driven referral traffic within weeks (The Growth Terminal). Results vary depending on content quality, domain authority, and crawl frequency.
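The server-log check is the easiest to automate: filter your access log for the crawler user-agent strings and count hits per bot. A minimal sketch — the log lines here are made up, and real logs may need the user-agent field parsed properly rather than substring-matched:

```python
from collections import Counter

AI_BOTS = ("GPTBot", "ChatGPT-User", "ClaudeBot",
           "PerplexityBot", "Google-Extended", "CCBot")

def count_ai_hits(log_lines):
    """Count requests per AI crawler by user-agent substring match."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
                break
    return hits

# Hypothetical combined-log-format samples
sample_log = [
    '1.2.3.4 - - [01/Oct/2025] "GET /llms.txt HTTP/1.1" 200 812 "-" "GPTBot/1.1"',
    '5.6.7.8 - - [01/Oct/2025] "GET /pricing HTTP/1.1" 200 9120 "-" "Mozilla/5.0"',
    '9.9.9.9 - - [01/Oct/2025] "GET /llms.txt HTTP/1.1" 200 812 "-" "ClaudeBot/1.0"',
]
print(count_ai_hits(sample_log))  # GPTBot: 1, ClaudeBot: 1
```

Run it weekly over rotated logs and you get a trend line for llms.txt fetches without any third-party tooling.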
Common Mistakes to Avoid
- Broken URLs: Every link in llms.txt should return a 200 status. Validate before deploying.
- Generic descriptions: Descriptions like "our blog" give AI nothing useful to work with. Be specific.
- Placing the file in the wrong directory: The file must be at the root domain. /public/llms.txt is correct for Next.js. Placing it in a subdirectory breaks it.
- Blocking AI crawlers in robots.txt: Check your robots.txt before creating llms.txt. A blocked crawler cannot read it.
- Treating it as a one-time task: Update your llms.txt whenever you publish significant new content or change your site structure.
- Listing outdated pages: Remove or update pages that are stale. AI systems favor fresh, recently updated content.
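The broken-URL check is easy to script. This sketch extracts every link target from an llms.txt document; the commented-out loop shows how you could then verify each returns 200 before deploying (the live check needs network access, and the timeout value is an arbitrary choice):

```python
import re
import urllib.request

LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

def extract_urls(llms_txt):
    """Pull every Markdown link target out of an llms.txt document."""
    return LINK_RE.findall(llms_txt)

def check_url(url, timeout=10):
    """Return the HTTP status code for url (200 means the link is healthy)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

doc = """## Product
- [Pricing](https://yoursite.com/pricing): Plans and pricing.
- [Docs](https://yoursite.com/docs): Setup guides.
"""
print(extract_urls(doc))
# → ['https://yoursite.com/pricing', 'https://yoursite.com/docs']

# To validate before deploying (requires network access):
# for url in extract_urls(doc):
#     assert check_url(url) == 200, f"broken link: {url}"
```

A HEAD request is enough here since you only care about the status code, not the body.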
Connecting llms.txt to Your GEO Strategy
llms.txt is a technical signal. It works best when paired with strong content.
AI citation rates correlate with content quality, topical authority, and freshness. A lean startup publishing consistent, expert content has a significant advantage. This is why AI content marketing and programmatic SEO strategies matter so much right now.
The playbook is straightforward:
- Publish high-quality, answer-first content on a consistent schedule
- Structure it well with proper schema markup
- Guide AI crawlers to your best work with llms.txt
- Monitor and iterate
Tools like Miniloop help you maintain the publishing cadence that keeps your llms.txt pointed at fresh, authoritative content. The combination of quality content plus proper AI crawler configuration is what drives sustainable GEO visibility.
For a deeper look at the broader shift from traditional SEO to AI search optimization, read GEO vs SEO: how generative engine optimisation works in 2026. For tactical citation strategies, see how to rank in Google AI Overviews and how to rank in Google AI Mode.
The lean startup AI tool stack and SEO strategy for lean teams posts cover how to prioritize this work with limited time.
TL;DR
- llms.txt is a plain-text Markdown file at your domain root that guides AI crawlers to your best content
- Only 3.2% of websites have one. That is the early-mover opportunity.
- It differs from robots.txt (access control) and sitemap.xml (URL listing). You need all three.
- Sites with llms.txt see 2.3x higher AI recall and up to 30% more AI referral traffic
- Build llms.txt first, then add llms-full.txt for full-content indexing
- Format: H1 site name, blockquote description, H2 categories, bullet links with descriptions
- Deploy at your domain root. Setup takes 15-30 minutes.
- Key AI crawlers: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended
- Manage crawler access in robots.txt. Guide content priorities in llms.txt.
- Update regularly. AI systems favor fresh content updated within the last 10-12 months.
- Track performance via server logs, AI visibility tools, and AI referral traffic
For the official spec, visit llmstxt.org. To generate and validate your file, use llmstxt.studio.
Frequently Asked Questions
What is an llms.txt file?
An llms.txt file is a plain-text Markdown file placed at the root of your website (yourdomain.com/llms.txt). It tells AI language models like ChatGPT, Claude, and Perplexity which pages on your site are most important, what your company does, and where to find your best content. It was proposed by Jeremy Howard (fast.ai) in September 2024 and has been adopted by over 844,000 websites as of October 2025.
How is llms.txt different from robots.txt?
robots.txt controls crawler access — it tells bots which URLs they can or cannot visit. llms.txt controls context and understanding — it guides AI language models to your most valuable content without blocking anything. Both are needed. robots.txt is the gatekeeper; llms.txt is the briefing that tells AI what matters most on your site.
Does llms.txt actually help with AI search rankings?
The evidence is promising but still emerging. Sites with llms.txt see 2.3x higher AI recall (lowtouch.ai) and some implementing sites report up to 30% more AI-driven referral traffic (The Growth Terminal, 50-site study). llms.txt and llms-full.txt together account for 35% of the AI Visibility Score on aicrawlercheck.com. No major AI platform has officially confirmed they read it, but Anthropic, Cloudflare, Stripe, Vercel, and Zapier have all implemented it — a strong signal it matters.
How long does it take to build an llms.txt file?
A basic llms.txt file takes 15 to 30 minutes to create and deploy. You need to list your most important pages organized into categories, write one-line descriptions for each URL, and place the file at your domain root. Use llmstxt.studio to generate and validate your file for free.
What is the difference between llms.txt and llms-full.txt?
llms.txt is a lightweight index file that lists your key pages with short descriptions. It is typically under 10KB. llms-full.txt is a companion file containing the full Markdown text of your core pages bundled into one document. Data from Mintlify and Profound shows that llms-full.txt receives heavier AI traffic preference. Both files together contribute 35% of your AI Visibility Score. Start with llms.txt, then add llms-full.txt.
Which AI crawlers read llms.txt?
The main AI crawlers that access llms.txt are: GPTBot (OpenAI, ChatGPT training and search), ChatGPT-User (OpenAI, real-time browsing), ClaudeBot (Anthropic, Claude AI), PerplexityBot (Perplexity search), Google-Extended (Google, Gemini training), and CCBot (Common Crawl, multiple LLM datasets). ChatGPT accounts for the majority of llms.txt visits based on data from Mintlify and Profound. Developer tools like Cursor and Mintlify also auto-fetch llms.txt when indexing sites.



