
![llms.txt](/storage/llms-middleware.png)

AI crawlers are indexing your blog whether you opt in or not. GPTBot, ClaudeBot, Perplexity — they all follow links and pull content. The problem is they receive the same response as a browser: HTML with navigation, scripts, and layout wrapped around the few paragraphs they actually care about.

Serving them clean markdown instead takes about 40 lines of code. This post walks through a two-part setup: a middleware that transparently redirects known AI crawlers from .html to .md, and a /llms.txt discovery endpoint so they can find your content without crawling at all.

## Part 1 — The Redirect Middleware

### Config

Create a dedicated config file at config/llm.php to hold the list of known AI User-Agents:

```php
<?php

// config/llm.php
// Substrings matched case-insensitively against the User-Agent header.
return [
    'agents' => [
        'GPTBot', 'OAI-SearchBot', 'ChatGPT-User',
        'ClaudeBot', 'Claude-User', 'Claude-SearchBot',
        'PerplexityBot', 'Perplexity-User',
        'Bytespider',
        'Google-Extended', 'Gemini-Deep-Research', 'Google-NotebookLM',
        'Meta-ExternalAgent',
    ],
];
```

Keeping this in config makes it easy to add new agents as they emerge, without touching the middleware itself.

### The Middleware

```bash
php artisan make:middleware LlmMarkdownRedirect
```

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class LlmMarkdownRedirect
{
    public function handle(Request $request, Closure $next): Response
    {
        $userAgent = $request->userAgent() ?? '';

        // Case-insensitive substring match against the configured agent names.
        $isLlm = collect(config('llm.agents', []))
            ->contains(fn ($agent) => stripos($userAgent, $agent) !== false);

        $acceptsMarkdown = str_contains($request->header('Accept', ''), 'text/markdown');

        if ($isLlm && ! $acceptsMarkdown && str_ends_with($request->getPathInfo(), '.html')) {
            // getRequestUri() includes the query string, so match ".html" right
            // before "?" or the end. Anchoring on "$" alone would leave URLs such
            // as /blog/post.html?ref=x unchanged and redirect them to themselves.
            $mdUrl = preg_replace('/\.html(?=\?|$)/', '.md', $request->getRequestUri(), 1);

            return redirect($mdUrl, 302);
        }

        return $next($request);
    }
}
```

Two conditions prevent unnecessary redirects. First, the path must end in .html — this keeps the middleware off the homepage, RSS feed, and any other routes. Second, if the client already sends Accept: text/markdown, it already knows how to ask for markdown directly, so no redirect is needed.
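
To try the redirect decision outside Laravel, the same checks can be written as a plain function. This is only a sketch: `markdownRedirectTarget()` is a hypothetical helper, not part of the middleware above, and it takes the headers and request URI as strings instead of a `Request` object. It also handles a query string by splitting it off before checking the `.html` suffix.

```php
<?php

// Hypothetical helper mirroring the middleware's checks, for illustration only.
// Returns the .md URI to redirect to, or null when no redirect applies.
function markdownRedirectTarget(string $userAgent, string $accept, string $requestUri, array $agents): ?string
{
    // Case-insensitive substring match against the configured agent names.
    $isLlm = false;
    foreach ($agents as $agent) {
        if (stripos($userAgent, $agent) !== false) {
            $isLlm = true;
            break;
        }
    }

    // Split the path from the query string so the suffix check ignores "?..." parts.
    $parts = explode('?', $requestUri, 2);
    $path = $parts[0];
    $query = $parts[1] ?? null;

    if (! $isLlm || str_contains($accept, 'text/markdown') || ! str_ends_with($path, '.html')) {
        return null;
    }

    // Swap the ".html" suffix for ".md" and re-attach the query string, if any.
    $target = substr($path, 0, -strlen('.html')) . '.md';

    return $query === null ? $target : $target . '?' . $query;
}

$agents = ['GPTBot', 'ClaudeBot'];

// A crawler asking for the HTML page is sent to the markdown URL.
var_dump(markdownRedirectTarget('Mozilla/5.0 (compatible; GPTBot/1.1)', '*/*', '/blog/my-post.html', $agents));

// A regular browser gets null, meaning the request continues to the HTML route.
var_dump(markdownRedirectTarget('Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0', 'text/html', '/blog/my-post.html', $agents));
```

A function like this is easy to unit test without booting the framework, which helps when the agent list grows.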

### Routes

Apply the middleware only to the .html post route, and add a parallel .md route that serves raw markdown:

```php
Route::get('/blog/{slug}.html', [BlogController::class, 'post'])
    ->middleware(\App\Http\Middleware\LlmMarkdownRedirect::class);

Route::get('/blog/{slug}.md', [BlogController::class, 'markdown']);
```

The .md route has no redirect logic — it just returns the content directly, which prevents any loop.

### The Markdown Endpoint

The markdown method in your controller should strip YAML front matter and return just the body. The important part is the response headers:

```php
public function markdown(Request $request, string $slug): Response
{
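    // findOrFail() looks up by primary key; if slug is an ordinary column,
    // use Post::where('slug', $slug)->firstOrFail() instead.
    $post = Post::findOrFail($slug);
    $body = $post->rawMarkdownBody(); // however you retrieve the plain markdown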

    return response($body, 200)
        ->header('Content-Type', 'text/markdown; charset=UTF-8')
        ->header('Vary', 'Accept');
}
```
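
How you strip the front matter depends on how your posts are stored. As a minimal sketch, assuming each post is a raw markdown string with an optional `---` delimited YAML block at the very top, a hypothetical `stripFrontMatter()` helper could look like this:

```php
<?php

// Hypothetical helper: drop a leading YAML front matter block ("---" ... "---")
// and return the remaining markdown body. Input without front matter is
// returned unchanged.
function stripFrontMatter(string $raw): string
{
    // The block must open on the very first line and close with a line
    // containing only "---". The middle group is optional so an empty
    // block ("---\n---") is handled too.
    $pattern = '/\A---[ \t]*\R(?:.*?\R)?---[ \t]*(?:\R|\z)/s';

    $body = preg_replace($pattern, '', $raw, 1);

    // preg_replace() returns null on a regex error; fall back to the original text.
    return ltrim($body ?? $raw, "\r\n");
}

$raw = "---\ntitle: My Post\ndate: 2024-01-01\n---\n\n# My Post\n\nHello.\n";

// Prints the post starting at the "# My Post" heading.
echo stripFrontMatter($raw);
```

One caveat with this approach: a post that has no front matter but opens with a `---` horizontal rule and contains a second one later would lose everything in between, so a real implementation may want a stricter check or a proper front matter parser.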

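The Vary: Accept header tells HTTP caches that the response depends on the request's Accept header. That matters if you also return markdown inline from the .html URL when a client sends Accept: text/markdown, because the same URL then has two representations. In that case the HTML response needs the same header; otherwise a cache may store the HTML copy without it and serve that copy to a client that asked for markdown.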

It is also worth checking the Accept header at the top of your HTML post method, so clients that negotiate correctly get markdown directly, without a redirect:

```php
public function post(Request $request, string $slug)
{
    if (str_contains($request->header('Accept', ''), 'text/markdown')) {
        return $this->markdown($request, $slug);
    }

    // normal HTML rendering...
}
```
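
The substring check is deliberately simple, but it also matches a header such as `text/markdown;q=0`, which by HTTP's rules means the client does not want markdown. If that matters for your traffic, a stricter check is possible. The following is a sketch under that assumption; `explicitlyAcceptsMarkdown()` is a hypothetical helper, and it intentionally ignores wildcards so that a browser's default Accept header never triggers the markdown response.

```php
<?php

// Hypothetical helper: true when the Accept header lists text/markdown
// explicitly with a non-zero quality value. Wildcard ranges are not
// treated as a request for markdown.
function explicitlyAcceptsMarkdown(string $accept): bool
{
    foreach (explode(',', $accept) as $range) {
        // "text/markdown; q=0.8" becomes ['text/markdown', 'q=0.8'].
        $params = array_map('trim', explode(';', $range));
        $type = strtolower(array_shift($params));

        if ($type !== 'text/markdown') {
            continue;
        }

        $q = 1.0; // quality defaults to 1 when the parameter is omitted
        foreach ($params as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }

        return $q > 0.0;
    }

    return false;
}

var_dump(explicitlyAcceptsMarkdown('text/html, text/markdown;q=0.8')); // bool(true)
var_dump(explicitlyAcceptsMarkdown('text/markdown;q=0'));              // bool(false)
var_dump(explicitlyAcceptsMarkdown('text/html,*/*;q=0.8'));            // bool(false)
```

If you adopt something like this, use the same function in both the middleware and the post method so the two checks cannot drift apart.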

## Part 2 — The /llms.txt Discovery Endpoint

[llms.txt](https://llmstxt.org) is an emerging convention for helping AI tools discover what content is available on a site. It is a markdown-formatted text file served at /llms.txt on your domain, listing your pages with links to their machine-readable versions.

Add the route:

```php
Route::get('/llms.txt', [BlogController::class, 'llms']);
```

And the controller method:

```php
public function llms(Request $request): Response
{
    $posts = Post::online()->get();

    $lines = [
        '# ' . config('app.name'),
        '',
        '## Posts',
    ];

    foreach ($posts as $post) {
        $lines[] = '- [' . $post->title . '](' . config('app.url') . '/blog/' . $post->slug . '.md)';
    }

    return response(implode("\n", $lines), 200)
        ->header('Content-Type', 'text/plain; charset=UTF-8');
}
```

The links point at your .md endpoints — so a crawler that reads /llms.txt already has a clean list of URLs that return markdown directly.
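
To see what the generated file looks like without a database, the string building can be pulled into a pure function. This is a sketch, not the controller above: `buildLlmsTxt()` is a hypothetical helper that takes plain arrays instead of `Post` models. It differs from the controller in three small ways: it adds a blank line after the heading, trims a trailing slash from the base URL, and escapes square brackets in titles so they cannot break the link syntax.

```php
<?php

// Hypothetical pure function: build the llms.txt body from a site name,
// a base URL, and a list of ['title' => ..., 'slug' => ...] arrays.
function buildLlmsTxt(string $siteName, string $baseUrl, array $posts): string
{
    $lines = ['# ' . $siteName, '', '## Posts', ''];

    foreach ($posts as $post) {
        // Escape square brackets so a title cannot break the markdown link.
        $title = str_replace(['[', ']'], ['\\[', '\\]'], $post['title']);
        $url = rtrim($baseUrl, '/') . '/blog/' . rawurlencode($post['slug']) . '.md';
        $lines[] = '- [' . $title . '](' . $url . ')';
    }

    return implode("\n", $lines) . "\n";
}

echo buildLlmsTxt('Example Blog', 'https://example.com/', [
    ['title' => 'Hello World', 'slug' => 'hello-world'],
    ['title' => 'Arrays [in PHP]', 'slug' => 'arrays-in-php'],
]);
```

With the sample data above, the output is:

```text
# Example Blog

## Posts

- [Hello World](https://example.com/blog/hello-world.md)
- [Arrays \[in PHP\]](https://example.com/blog/arrays-in-php.md)
```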

## The Full Flow

For a human browser, nothing changes. A GET to /blog/my-post.html returns HTML as before.

For GPTBot hitting the same URL: the middleware fires, detects the User-Agent, and issues a 302 to /blog/my-post.md. The crawler follows the redirect and gets raw markdown.

For a client that sends Accept: text/markdown: the middleware skips the redirect because of the Accept check, and the HTML post method sees the header and returns the markdown response from the same .html URL.

For a crawler that reads /llms.txt first: it gets a plain-text index of all posts with direct .md links — no HTML crawling needed at all.

## Conclusion

The whole thing is one middleware file, one config file, two new routes, and two controller methods, plus the middleware and a short Accept check on the existing post route. No package, no build step, and no change to the HTML your pages render. Human visitors see no difference; AI crawlers get content they can actually use.
