Robots.txt Generator
Build a valid robots.txt file with multiple user-agent blocks, allow/disallow rules, crawl-delay, sitemap, and host directives.
About robots.txt
robots.txt is a plain-text file placed at the root of your website (e.g., example.com/robots.txt) that tells web crawlers which pages or sections they may access. It follows the Robots Exclusion Protocol, first proposed in 1994 and formalized as an Internet standard in 2022 (RFC 9309).
Each User-agent block targets a specific crawler (or all crawlers with *). Disallow prevents crawling of specific paths, while Allow overrides a broader Disallow rule for specific sub-paths. The Sitemap directive points crawlers to your XML sitemap for efficient discovery.
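A generated file combining these directives might look like the sketch below (all paths and URLs are illustrative):

```text
# Block all crawlers from private areas, but keep the public API crawlable
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/public/

# Give one specific crawler stricter rules
User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Note that Allow: /api/public/ carves an exception out of the broader Disallow: /api/ rule, and that the Sitemap directive stands outside any User-agent block because it applies to all crawlers.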
Important: robots.txt is advisory, not a security mechanism. Well-behaved crawlers like Googlebot respect it, but malicious bots may ignore it entirely. Never rely on robots.txt to protect sensitive content — use authentication and access controls instead.
Specify the crawler name — use '*' to target all bots, or name a specific crawler such as Googlebot, GPTBot, or Bingbot
List URL paths the crawler must not access — e.g., /admin/, /api/, /private/
Override a broader Disallow for specific sub-paths — e.g., Allow: /api/public/ under Disallow: /api/
Optional extras: Crawl-delay (seconds between requests; not part of RFC 9309, and ignored by Googlebot), Host (preferred domain; a non-standard Yandex extension), and a Sitemap URL for discovery
Plain-text robots.txt file — upload to your site root (example.com/robots.txt) for crawlers to read
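Before uploading, you can sanity-check a generated file with Python's standard-library urllib.robotparser. The rules below are illustrative, not the tool's actual output; note that Python's parser applies the first matching rule, so the Allow override is listed before the broader Disallow it carves an exception out of:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical generated robots.txt content
rules = """\
User-agent: *
Allow: /api/public/
Disallow: /api/
Disallow: /admin/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/api/public/data"))  # Allow override applies
print(rp.can_fetch("*", "https://example.com/admin/login"))      # blocked by Disallow
print(rp.can_fetch("*", "https://example.com/about"))            # no rule matches: allowed
print(rp.crawl_delay("*"))                                       # 10
```

One ordering caveat: urllib.robotparser uses first-match semantics, while RFC 9309 specifies that the longest matching rule wins. Placing the more specific Allow line first keeps both interpretations in agreement.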
Spec: RFC 9309 (Robots Exclusion Protocol), Google Robots.txt Specification