Knowledge · By Jeremy DeBarros · Published 2026-06-12
How AI Crawlers Read the Web: A Technical Breakdown
Most AI crawlers don't run JavaScript. They read your raw HTML and leave. Here is how GPTBot, ClaudeBot, and PerplexityBot actually read the web.
Most AI crawlers do not run JavaScript. They request your page, read whatever HTML the server returns, and move on. If your site builds its content in the browser with JavaScript, those crawlers see an almost empty page. This is the single most important thing to understand about how AI reads the web.
The two steps every crawler takes
Crawling a page involves up to two steps: fetching the HTML and, optionally, rendering it. Fetching is cheap. Rendering, which means actually running your JavaScript the way a browser does, is expensive. It takes real computing power and time. Search engines like Google invested years and enormous infrastructure to render pages at scale. Most AI crawlers have not.
So when GPTBot, ClaudeBot, or PerplexityBot visits your site, they typically stop at step one. They read the raw HTML your server sends and never run the code that fills the page in. Whatever is not in that first response is invisible to them.
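As a rough illustration of step one, here is a minimal Python sketch of a fetch-only request. The function name and the user agent string are hypothetical. Real crawlers send their own user agents and add politeness logic, such as robots.txt checks, that is omitted here. The point is only that nothing in this path executes JavaScript: the function returns the body exactly as the server sent it.

```python
import urllib.request


def fetch_raw_html(url: str,
                   user_agent: str = "example-fetch-only-bot/0.1",  # hypothetical UA
                   timeout: float = 10.0) -> str:
    """Fetch a URL and return the response body as text, with no rendering.

    This mirrors step one only. Any <script> tags in the body are returned
    as plain text and are never executed.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example usage (performs a real network request, so it is left commented out):
# html = fetch_raw_html("https://example.com/")
# print(len(html), "characters of raw HTML")
```

Whatever this function returns for your URL is a reasonable approximation of what a non-rendering crawler has to work with.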
Why Googlebot renders but AI crawlers usually don't
Googlebot uses a two-wave system. It crawls your raw HTML first, then queues the page for rendering later, where it runs your JavaScript and sees the finished result. This is why a JavaScript-heavy site can still rank on Google, eventually.
The crawlers feeding AI answers are newer and leaner. GPTBot from OpenAI, ClaudeBot from Anthropic, PerplexityBot, and several others generally do not run a rendering pass the way Googlebot does. Applebot is a partial exception: Apple's documentation says it may render pages in a browser. For the rest, your raw HTML is taken at face value. If your content is not there, it does not exist as far as they are concerned.
What this means for a single-page app
Many modern sites are built as single-page applications. The server sends a nearly empty HTML shell with a link to a JavaScript bundle, and the browser builds the page. A human sees a complete site. A non-rendering crawler sees a shell: a title, maybe an empty div, and little else.
You can see this yourself. Open your site, right-click, and choose View Page Source. That is roughly what a non-rendering crawler sees. If the source is missing your headlines, your copy, and your proof, AI is missing them too.
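To make the difference concrete, the sketch below extracts the readable text from two raw HTML documents without running any scripts. Both documents, the "Acme Widgets" copy, and the helper names are invented for illustration. It uses Python's standard `html.parser` and is only an approximation of how a non-rendering reader might pull text from a page, not a description of any specific crawler's parser.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text found in raw HTML, ignoring script and style contents."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text present in the raw HTML, with whitespace collapsed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical single-page app shell: content is built later by app.js.
SPA_SHELL = """<!doctype html>
<html><head><title>Acme Widgets</title>
<script src="/assets/app.js" defer></script></head>
<body><div id="root"></div></body></html>"""

# The same page with its content already present in the HTML.
PRERENDERED = """<!doctype html>
<html><head><title>Acme Widgets</title></head>
<body><div id="root"><h1>Widgets that last</h1>
<p>Trusted by 400 workshops.</p></div>
<script src="/assets/app.js" defer></script></body></html>"""

print(visible_text(SPA_SHELL))    # Acme Widgets
print(visible_text(PRERENDERED))  # Acme Widgets Widgets that last Trusted by 400 workshops.
```

The shell yields nothing beyond its title, while the second document yields the headline and the proof point, even though both load the same JavaScript bundle.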
The three ways to fix it
There are three accepted ways to make sure your real content lives in the raw HTML:
- Server-side rendering. The server builds the full HTML for each request, so crawlers and users both receive complete content.
- Static prerendering. At build time, you generate a complete HTML file for every page, so the crawler gets the finished page instantly. This is the approach I use for my own sites.
- Hybrid hydration. The page ships complete in HTML, then JavaScript takes over for interactivity. The content is present from the first byte.
What does not work is shipping an empty shell and hoping the crawler renders it. Most AI crawlers will not.
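Of the three options listed above, static prerendering is the simplest to sketch. The Python below is a toy build step under stated assumptions: the route names, the page data, the `/assets/app.js` path, and the function names are all hypothetical, and a real site would normally use its framework's own prerendering or static export feature rather than hand-written templates. It shows the core idea only: each route gets a complete HTML file, with the content in the markup and the script left to add interactivity afterwards.

```python
from html import escape
from pathlib import Path


def render_page(title: str, paragraphs: list[str]) -> str:
    """Build a complete HTML document with the content already in the markup."""
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!doctype html>\n"
        '<html lang="en">\n'
        f'  <head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"{body}\n"
        # Hypothetical bundle path: the script enhances the page, it does not build it.
        '    <script src="/assets/app.js" defer></script>\n'
        "  </body>\n"
        "</html>\n"
    )


def prerender_site(pages: dict[str, dict], out_dir: Path) -> list[Path]:
    """Write one index.html per route at build time and return the written paths."""
    written = []
    for route, page in pages.items():
        target = out_dir / route.strip("/") / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(
            render_page(page["title"], page["paragraphs"]), encoding="utf-8"
        )
        written.append(target)
    return written


# Example usage with made-up content:
# prerender_site(
#     {"/": {"title": "Acme Widgets", "paragraphs": ["Trusted by 400 workshops."]}},
#     Path("dist"),
# )
```

Server-side rendering applies the same idea at request time instead of build time: the complete document is produced before the response leaves the server.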
How to check your own site in two minutes
You do not need a tool. Pick your most important page and view the page source. Use your browser's find function to search for a sentence you know is on the page. If the sentence appears in the source, AI crawlers can read it. If it does not, they cannot. Do this for your homepage, a key product or service page, and an article. The gaps you find are the work.
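The manual check is enough, but if you want to repeat it across many pages, it can be scripted. The sketch below assumes you already have the raw HTML as a string, for example saved from View Page Source. The function names and sample sentences are hypothetical. It behaves like a find in the page source, with whitespace and HTML entities normalized, so it shares the same limits: a sentence interrupted by inline tags will be reported as missing, and text that only appears inside a script will count as found.

```python
import html
import re


def _normalize(text: str) -> str:
    """Decode HTML entities, collapse whitespace, and ignore letter case."""
    return re.sub(r"\s+", " ", html.unescape(text)).strip().casefold()


def missing_from_source(raw_html: str, sentences: list[str]) -> list[str]:
    """Return the sentences that do not appear in the raw HTML source."""
    haystack = _normalize(raw_html)
    return [s for s in sentences if _normalize(s) not in haystack]


# Example usage with a made-up page source and key sentences:
source = """<html><body>
<h1>Widgets that last</h1>
<p>Trusted by 400
   workshops &amp; studios.</p>
</body></html>"""

gaps = missing_from_source(
    source,
    ["Trusted by 400 workshops & studios.", "Free shipping worldwide"],
)
print(gaps)  # ['Free shipping worldwide']
```

Anything the function returns is a sentence that a non-rendering crawler would not find on that page.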
This is the foundation of everything I write about. The full framework is the core of my book, The Two-Reality Web, and the rest of the Knowledge Hub goes deeper on each piece.