Robots.txt Generator
Build a valid robots.txt file with multiple user-agent blocks, allow/disallow rules, crawl-delay, sitemap, and host directives.
About robots.txt
robots.txt is a plain-text file placed at the root of your website (e.g., example.com/robots.txt) that tells web crawlers which pages or sections they may access. It follows the Robots Exclusion Protocol, first proposed in 1994 and formalized as an Internet standard in 2022 (RFC 9309).
Each User-agent block targets a specific crawler (or all crawlers with *). Disallow prevents crawling of specific paths, while Allow overrides a broader Disallow rule for specific sub-paths. The Sitemap directive points crawlers to your XML sitemap for efficient discovery.
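A generated file combining these directives might look like the sketch below (all paths and URLs are illustrative):

```text
# Block all crawlers from private areas, but keep the public API crawlable
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/public/

# Give one specific crawler stricter rules
User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Note that Allow: /api/public/ carves an exception out of the broader Disallow: /api/ rule, and that the Sitemap directive stands outside any User-agent block because it applies to all crawlers.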
Important: robots.txt is advisory, not a security mechanism. Well-behaved crawlers like Googlebot respect it, but malicious bots may ignore it entirely. Never rely on robots.txt to protect sensitive content — use authentication and access controls instead.
Specify the crawler name — use '*' to target all bots, or name a specific crawler such as Googlebot, GPTBot, or Bingbot
List URL paths the crawler must not access — e.g., /admin/, /api/, /private/
Override a broader Disallow for specific sub-paths — e.g., Allow: /api/public/ under Disallow: /api/
Optional extras: Crawl-delay (seconds between requests; not part of RFC 9309, and ignored by Googlebot), Host (preferred domain; a non-standard Yandex extension), and a Sitemap URL for discovery
Plain-text robots.txt file — upload to your site root (example.com/robots.txt) for crawlers to read
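Before uploading, you can sanity-check a generated file with Python's standard-library urllib.robotparser. The rules below are illustrative, not the tool's actual output; note that Python's parser applies the first matching rule, so the Allow override is listed before the broader Disallow it carves an exception out of:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical generated robots.txt content
rules = """\
User-agent: *
Allow: /api/public/
Disallow: /api/
Disallow: /admin/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/api/public/data"))  # Allow override applies
print(rp.can_fetch("*", "https://example.com/admin/login"))      # blocked by Disallow
print(rp.can_fetch("*", "https://example.com/about"))            # no rule matches: allowed
print(rp.crawl_delay("*"))                                       # 10
```

One ordering caveat: urllib.robotparser uses first-match semantics, while RFC 9309 specifies that the longest matching rule wins. Placing the more specific Allow line first keeps both interpretations in agreement.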
Spec: RFC 9309 (Robots Exclusion Protocol), Google Robots.txt Specification