
Serving AI Crawlers Clean Markdown with a Laravel Middleware

Published 2026-05-19 · 4 min read · 791 words

AI crawlers are indexing your blog whether you opt in or not. GPTBot, ClaudeBot, Perplexity — they all follow links and pull content. The problem is they receive the same response as a browser: HTML with navigation, scripts, and layout wrapped around the few paragraphs they actually care about.

Serving them clean markdown instead takes about 40 lines of code. This post walks through a two-part setup: a middleware that transparently redirects known AI crawlers from .html to .md, and a /llms.txt discovery endpoint so they can find your content without crawling at all.

Part 1 — The Redirect Middleware

Config

Create a dedicated config file at config/llm.php to hold the list of known AI User-Agents:

<?php

// config/llm.php: substrings matched case-insensitively against the User-Agent header.
return [
    'agents' => [
        'GPTBot', 'OAI-SearchBot', 'ChatGPT-User',
        'ClaudeBot', 'Claude-User', 'Claude-SearchBot',
        'PerplexityBot', 'Perplexity-User',
        'Bytespider',
        'Google-Extended', 'Gemini-Deep-Research', 'Google-NotebookLM',
        'Meta-ExternalAgent',
    ],
];

Keeping this in config makes it easy to add new agents as they emerge, without touching the middleware itself.

The Middleware

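Generate the class with Artisan:

php artisan make:middleware LlmMarkdownRedirect

Then replace the contents of app/Http/Middleware/LlmMarkdownRedirect.php with:

<?php

namespace App\Http\Middleware;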

use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class LlmMarkdownRedirect
{
    public function handle(Request $request, Closure $next): Response
    {
        $userAgent = $request->userAgent() ?? '';

        $isLlm = collect(config('llm.agents', []))
            ->contains(fn ($agent) => stripos($userAgent, $agent) !== false);

        $acceptsMarkdown = str_contains($request->header('Accept', ''), 'text/markdown');

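        if ($isLlm && ! $acceptsMarkdown && str_ends_with($request->getPathInfo(), '.html')) {
            // getRequestUri() includes the query string, so match ".html" right
            // before "?" or the end of the URI. Anchoring on "$" alone would leave
            // /post.html?ref=x unchanged and redirect the crawler to itself forever.
            $mdUrl = preg_replace('/\.html(?=\?|$)/', '.md', $request->getRequestUri(), 1);

            return redirect($mdUrl, 302);
        }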

        return $next($request);
    }
}

Two conditions prevent unnecessary redirects. First, the path must end in .html — this keeps the middleware off the homepage, RSS feed, and any other routes. Second, if the client already sends Accept: text/markdown, it already knows how to ask for markdown directly, so no redirect is needed.
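
The User-Agent check is easy to try outside the framework. The sketch below pulls the same case-insensitive substring match into a plain function; the name isLlmAgent and the sample User-Agent strings are made up for illustration, not something the middleware or any crawler defines.

```php
<?php

declare(strict_types=1);

/**
 * Returns true when the User-Agent contains any configured token.
 * Case-insensitive substring match, mirroring the stripos() check in the middleware.
 */
function isLlmAgent(string $userAgent, array $agents): bool
{
    foreach ($agents as $agent) {
        if (stripos($userAgent, $agent) !== false) {
            return true;
        }
    }

    return false;
}

// Hypothetical subset of config('llm.agents').
$agents = ['GPTBot', 'ClaudeBot', 'PerplexityBot'];

// Illustrative User-Agent strings, not copied from real crawler traffic.
var_dump(isLlmAgent('Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2)', $agents)); // bool(true)
var_dump(isLlmAgent('mozilla/5.0 (compatible; claudebot/1.0)', $agents));                 // bool(true)
var_dump(isLlmAgent('Mozilla/5.0 (Macintosh) Safari/605.1.15', $agents));                 // bool(false)
var_dump(isLlmAgent('', $agents));                                                        // bool(false)
```

One thing to watch when editing the config: on PHP 8, stripos() treats an empty needle as a match at offset 0, so an empty string in the agents list would flag every request as a crawler.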

Routes

Apply the middleware only to the .html post route, and add a parallel .md route that serves raw markdown:

Route::get('/blog/{slug}.html', [BlogController::class, 'post'])
    ->middleware(\App\Http\Middleware\LlmMarkdownRedirect::class);

Route::get('/blog/{slug}.md', [BlogController::class, 'markdown']);

The .md route has no redirect logic — it just returns the content directly, which prevents any loop.
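
A loop can still sneak in through the query string. Symfony's getRequestUri() returns the path plus the query, so for /blog/my-post.html?ref=x the path ends in .html but the full URI does not, and a rewrite that leaves the URI unchanged sends the crawler back to the same URL. Below is a minimal sketch of a rewrite that handles both cases; markdownUriFor is a hypothetical helper name, not part of Laravel.

```php
<?php

declare(strict_types=1);

/**
 * Swap a trailing ".html" on the path for ".md", leaving any query string intact.
 * Returns null when the path does not end in ".html", so callers can skip the redirect.
 */
function markdownUriFor(string $requestUri): ?string
{
    // Split "/path?query" into its two halves; the query part may be absent.
    $parts = explode('?', $requestUri, 2);
    $path = $parts[0];
    $query = $parts[1] ?? null;

    if (! str_ends_with($path, '.html')) {
        return null;
    }

    $mdPath = substr($path, 0, -strlen('.html')) . '.md';

    return $query === null ? $mdPath : $mdPath . '?' . $query;
}

var_dump(markdownUriFor('/blog/my-post.html'));       // string(16) "/blog/my-post.md"
var_dump(markdownUriFor('/blog/my-post.html?ref=x')); // string(22) "/blog/my-post.md?ref=x"
var_dump(markdownUriFor('/feed.xml?next=a.html'));    // NULL
```

Returning null instead of the unchanged URI makes the "no redirect" case explicit, so the caller cannot issue a redirect to the URL it is already serving.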

The Markdown Endpoint

The markdown method in your controller should strip YAML front matter and return just the body. The important part is the response headers:

public function markdown(Request $request, string $slug): Response
{
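    // With Eloquent, findOrFail() looks up by primary key. If slug is a regular
    // column, use Post::where('slug', $slug)->firstOrFail() instead.
    $post = Post::findOrFail($slug);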
    $body = $post->rawMarkdownBody(); // however you retrieve the plain markdown

    return response($body, 200)
        ->header('Content-Type', 'text/markdown; charset=UTF-8')
        ->header('Vary', 'Accept');
}
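
How rawMarkdownBody() strips the front matter is left open above. One possible sketch, assuming posts are stored as markdown files that open with a YAML block delimited by --- lines; stripFrontMatter is a hypothetical helper, and it does no YAML parsing at all:

```php
<?php

declare(strict_types=1);

/**
 * Remove a leading YAML front matter block (--- ... ---) and return the body.
 * Input that does not start with a "---" line is returned as-is, minus leading newlines.
 */
function stripFrontMatter(string $markdown): string
{
    // Opening "---" line at the very start, then the shortest run of text
    // up to a closing "---" line (lazy match, "s" flag lets "." span newlines).
    $pattern = '/\A---\r?\n.*?\r?\n---[ \t]*(?:\r?\n|\z)/s';
    $body = preg_replace($pattern, '', $markdown, 1);

    // preg_replace() returns null on a regex error; fall back to the input.
    return ltrim($body ?? $markdown, "\r\n");
}

$source = "---\ntitle: Hello\ndate: 2026-05-19\n---\n\n# Hello\n\nBody text.\n";

echo stripFrontMatter($source);
// # Hello
//
// Body text.
```

The pattern only fires when the file starts with a --- line, so a horizontal rule further down the post is left alone. A post that opens with a horizontal rule and has no front matter would be misread, so this assumes every post carries a front matter block or starts with other content.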

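The Vary: Accept header tells HTTP caches that the response depends on the request's Accept header. That matters once the .html URL can return either HTML or markdown, as it does when a client sends Accept: text/markdown: without it, a shared cache could hand the markdown variant to a browser. For the same reason, the HTML response from that URL should carry Vary: Accept as well.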

It is also worth checking the Accept header at the top of your HTML post method, so that clients that negotiate correctly get markdown directly, without a redirect:

public function post(Request $request, string $slug)
{
    if (str_contains($request->header('Accept', ''), 'text/markdown')) {
        return $this->markdown($request, $slug);
    }

    // normal HTML rendering...
}
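
The str_contains() check is deliberately simple, but it also matches a header such as text/markdown;q=0, which by HTTP's quality-value rules means the client refuses markdown. If that matters for your traffic, a slightly stricter check could look like the sketch below. acceptsMarkdown is a hypothetical helper; it deliberately ignores wildcards like */*, because browsers send those and should keep getting HTML.

```php
<?php

declare(strict_types=1);

/**
 * Returns true when the Accept header lists text/markdown with a non-zero quality value.
 */
function acceptsMarkdown(string $acceptHeader): bool
{
    foreach (explode(',', $acceptHeader) as $range) {
        // Each range looks like "type/subtype;param=value;q=0.8".
        $params = array_map('trim', explode(';', $range));
        $mediaType = strtolower(array_shift($params));

        if ($mediaType !== 'text/markdown') {
            continue;
        }

        foreach ($params as $param) {
            // q=0 means "not acceptable".
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m) && (float) $m[1] === 0.0) {
                return false;
            }
        }

        return true;
    }

    return false;
}

var_dump(acceptsMarkdown('text/markdown'));                   // bool(true)
var_dump(acceptsMarkdown('text/html, text/markdown;q=0.9'));  // bool(true)
var_dump(acceptsMarkdown('text/html, text/markdown;q=0'));    // bool(false)
var_dump(acceptsMarkdown('text/html,*/*;q=0.8'));             // bool(false)
```

If you adopt something like this, use the same function in both the middleware and the post method so the two checks cannot disagree.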

Part 2 — The /llms.txt Discovery Endpoint

llms.txt is an emerging convention for helping AI tools discover what content is available on a site. It is a Markdown-formatted text file at the root of your domain that lists your pages with links to their machine-readable versions.

Add the route:

Route::get('/llms.txt', [BlogController::class, 'llms']);

And the controller method:

public function llms(Request $request): Response
{
    $posts = Post::online()->get();

    $lines = [
        '# ' . config('app.name'),
        '',
        '## Posts',
    ];

    foreach ($posts as $post) {
        $lines[] = '- [' . $post->title . '](' . config('app.url') . '/blog/' . $post->slug . '.md)';
    }

    return response(implode("\n", $lines), 200)
        ->header('Content-Type', 'text/plain; charset=UTF-8');
}

The links point at your .md endpoints — so a crawler that reads /llms.txt already has a clean list of URLs that return markdown directly.
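
To see what the endpoint produces, here is the same string-building logic as a plain function with sample data. This is a sketch under assumptions: buildLlmsTxt is a hypothetical name, the posts are plain arrays standing in for the Post model, and the site name and URL are placeholders. It also adds two small guards the controller version lacks: trimming a trailing slash from the base URL and escaping square brackets in titles.

```php
<?php

declare(strict_types=1);

/**
 * Build the llms.txt body from a site name, base URL, and a list of posts.
 * Each post is an array with 'title' and 'slug' keys (a stand-in for the Post model).
 */
function buildLlmsTxt(string $siteName, string $baseUrl, array $posts): string
{
    // Avoid "//blog" when the configured URL ends with a slash.
    $base = rtrim($baseUrl, '/');

    $lines = ['# ' . $siteName, '', '## Posts'];

    foreach ($posts as $post) {
        // Escape square brackets so a title like "Arrays [part 2]" keeps the link intact.
        $title = str_replace(['[', ']'], ['\\[', '\\]'], $post['title']);
        $lines[] = sprintf('- [%s](%s/blog/%s.md)', $title, $base, $post['slug']);
    }

    return implode("\n", $lines);
}

echo buildLlmsTxt('My Blog', 'https://example.com/', [
    ['title' => 'Hello World', 'slug' => 'hello-world'],
    ['title' => 'Arrays [part 2]', 'slug' => 'arrays-part-2'],
]);
// # My Blog
//
// ## Posts
// - [Hello World](https://example.com/blog/hello-world.md)
// - [Arrays \[part 2\]](https://example.com/blog/arrays-part-2.md)
```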

The Full Flow

For a human browser, nothing changes. A GET to /blog/my-post.html returns HTML as before.

For GPTBot hitting the same URL: the middleware fires, detects the User-Agent, and issues a 302 to /blog/my-post.md. The crawler follows the redirect and gets raw markdown.

For a client that sends Accept: text/markdown: the middleware skips the redirect (the Accept check), and the HTML post method detects the header and calls the markdown method inline.

For a crawler that reads /llms.txt first: it gets a plain-text index of all posts with direct .md links — no HTML crawling needed at all.
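
The request flows for the post URLs can be condensed into one decision function, which is handy as a checklist when writing feature tests. This is a framework-free sketch: resolveOutcome and its return labels are invented for illustration and do not appear in the actual middleware or controller.

```php
<?php

declare(strict_types=1);

/**
 * Classify a request to the blog post routes.
 * Returns one of: 'html', 'redirect-to-md', 'markdown'.
 */
function resolveOutcome(string $path, string $userAgent, string $accept, array $agents): string
{
    // The .md route always serves markdown and never redirects.
    if (str_ends_with($path, '.md')) {
        return 'markdown';
    }

    // Clients that negotiate via Accept get markdown inline from the .html route.
    if (str_contains($accept, 'text/markdown')) {
        return 'markdown';
    }

    // Known crawlers that did not ask for markdown are redirected to the .md URL.
    foreach ($agents as $agent) {
        if (stripos($userAgent, $agent) !== false) {
            return 'redirect-to-md';
        }
    }

    // Everyone else is treated as a browser.
    return 'html';
}

$agents = ['GPTBot', 'ClaudeBot'];

echo resolveOutcome('/blog/my-post.html', 'Mozilla/5.0 Safari/605.1.15', 'text/html', $agents), "\n"; // html
echo resolveOutcome('/blog/my-post.html', 'GPTBot/1.2', '*/*', $agents), "\n";                        // redirect-to-md
echo resolveOutcome('/blog/my-post.html', 'GPTBot/1.2', 'text/markdown', $agents), "\n";              // markdown
echo resolveOutcome('/blog/my-post.md', 'GPTBot/1.2', '*/*', $agents), "\n";                          // markdown
```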

Conclusion

The whole thing is one middleware file, one config file, two new routes, and two controller methods, plus a short Accept check in the existing post method. No package, no build step, and no change to what your existing HTML route returns to browsers. Human visitors see no difference; AI crawlers get content they can actually use.
