Robots.txt Parser API - Fetch and Structure Crawl Rules
Fetches /robots.txt from the target domain, parses the directives, and returns structured rules per user-agent (allow and disallow paths), the list of sitemap URLs, and a crawl_delay value when present. Returns found:false when the file is missing or unreachable instead of returning an error.
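To make "structured rules per user-agent" concrete, here is a minimal Python sketch of one way to group robots.txt directives. It follows the grouping rules of RFC 9309: consecutive User-agent lines share one group, and Sitemap lines are global. This is not the API's implementation. The function name `parse_robots` and the keys `rules`, `user_agents`, `allow` and `disallow` are hypothetical. The sketch also records crawl-delay per group, which may not match where the API places `crawl_delay`.

```python
from typing import List, Optional


def parse_robots(text: str) -> dict:
    """Group robots.txt directives per user-agent (hypothetical output shape)."""
    groups: List[dict] = []
    sitemaps: List[str] = []
    current: Optional[dict] = None
    last_was_agent = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()

        if field == "user-agent":
            # Consecutive User-agent lines share one rule group.
            if current is None or not last_was_agent:
                current = {"user_agents": [], "allow": [], "disallow": [], "crawl_delay": None}
                groups.append(current)
            current["user_agents"].append(value)
            last_was_agent = True
            continue

        last_was_agent = False
        if field == "sitemap":
            sitemaps.append(value)  # sitemaps are global, not tied to a group
        elif current is None:
            continue  # rule before any User-agent line: ignore it
        elif field in ("allow", "disallow") and value:
            current[field].append(value)  # an empty Disallow means "allow all", so skip it
        elif field == "crawl-delay":
            try:
                current["crawl_delay"] = float(value)
            except ValueError:
                pass  # ignore non-numeric crawl-delay values
    return {"found": True, "rules": groups, "sitemaps": sitemaps}
```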
Code examples
curl -X POST https://api.botoi.com/v1/robots \
-H "Content-Type: application/json" \
-d '{"url":"https://github.com","domain":"github.com"}'
When to use this API
Respect crawl rules in your own spider
Before fetching any URL from a third-party site, parse its robots.txt and check whether your user-agent is allowed to fetch that path. This keeps your crawler compliant and reduces the risk of IP bans.
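A compliance check on top of parsed rules could look like the sketch below. It assumes the hypothetical group shape (`user_agents`, `allow`, `disallow`), not a documented response schema, and applies RFC 9309 precedence: the longest matching pattern wins, Allow wins ties, `*` is a wildcard, and a trailing `$` anchors the end of the path. Pass your crawler's product token (for example `mybot`), not the full User-Agent header string. The function name `is_allowed` is illustrative.

```python
import re
from typing import List


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # '*' matches any character sequence; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: List[dict], user_agent: str, path: str) -> bool:
    """Longest-match check in the style of RFC 9309 (a sketch, not the API's logic)."""
    agent = user_agent.lower()
    # Prefer a group naming this agent; fall back to the '*' group.
    group = next((g for g in rules if any(a.lower() == agent for a in g["user_agents"])), None)
    if group is None:
        group = next((g for g in rules if "*" in g["user_agents"]), None)
    if group is None:
        return True  # no applicable group: everything is allowed

    best_len, allowed = -1, True
    for verdict, patterns in ((True, group["allow"]), (False, group["disallow"])):
        for pattern in patterns:
            if _pattern_to_regex(pattern).match(path):
                # Longer patterns win; on a tie the earlier Allow match is kept.
                if len(pattern) > best_len:
                    best_len, allowed = len(pattern), verdict
    return allowed
```

RFC 9309 says that several groups naming the same agent should be merged; this sketch uses only the first matching group.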
SEO audit of your site's robots file
Schedule a daily check that alerts when the allow or disallow set changes unexpectedly. Catches regressions where a deploy accidentally blocks Googlebot from /blog or the sitemap.
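For the daily audit, one way to detect changes is to flatten each snapshot into a set of (user-agent, directive, path) tuples and compare the two sets. This is a minimal sketch that again assumes the hypothetical group shape. Storing yesterday's snapshot and sending the alert are left to you, and `flatten` and `diff_rules` are illustrative names.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str, str]  # (user_agent, directive, path)


def flatten(rules: List[dict]) -> Set[Rule]:
    """Turn grouped rules into a flat set so two snapshots can be compared."""
    out: Set[Rule] = set()
    for group in rules:
        for agent in group["user_agents"]:
            for directive in ("allow", "disallow"):
                for path in group[directive]:
                    out.add((agent.lower(), directive, path))
    return out


def diff_rules(old: List[dict], new: List[dict]) -> Dict[str, List[Rule]]:
    """Report rules that appeared or disappeared between two snapshots."""
    before, after = flatten(old), flatten(new)
    return {"added": sorted(after - before), "removed": sorted(before - after)}
```

An alert rule could then flag any added entry such as `("googlebot", "disallow", "/blog")` or `("*", "disallow", "/")`.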
Discover sitemaps for content crawling
Extract the sitemaps array and feed each URL into /v1/sitemap to get structured URL lists. This is more reliable than guessing /sitemap.xml, since many sites declare sitemaps at other paths.
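Here is a Python sketch of the same call as the curl example, plus a helper that pulls out the sitemaps array. It sends the same `url` and `domain` fields, deriving `domain` from the URL's hostname, and it omits authentication just as the curl example does. It assumes the parsed JSON exposes `found` and `sitemaps` at the top level; if the API wraps its payload in another object, adjust `sitemap_urls`. The names `build_robots_request`, `sitemap_urls` and `fetch_sitemaps` are illustrative. The request format for /v1/sitemap is not documented here, so the sketch stops at the list of URLs.

```python
import json
import urllib.request
from typing import List
from urllib.parse import urlparse

ROBOTS_ENDPOINT = "https://api.botoi.com/v1/robots"


def build_robots_request(url: str) -> urllib.request.Request:
    """Build the same POST as the curl example (no auth header, as in that example)."""
    payload = {"url": url, "domain": urlparse(url).hostname or ""}
    return urllib.request.Request(
        ROBOTS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sitemap_urls(response: dict) -> List[str]:
    """Return the sitemaps array, or [] when robots.txt was not found."""
    if response.get("found") is False:
        return []
    return [u for u in response.get("sitemaps", []) if isinstance(u, str)]


def fetch_sitemaps(url: str, timeout: float = 10.0) -> List[str]:
    """Call /v1/robots over the network and return any sitemap URLs."""
    with urllib.request.urlopen(build_robots_request(url), timeout=timeout) as resp:
        return sitemap_urls(json.load(resp))
```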
Frequently asked questions
How does this handle the multiple-user-agent block pattern?
What happens when robots.txt is missing?
Is crawl-delay included?
What user-agent does the fetcher use?
Does this follow redirects on robots.txt?
Get your API key
Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.
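A crawler that checks many domains can reach the free tier's 5 requests per minute quickly. A client-side sliding-window throttle is one way to stay under that limit. This is a generic sketch, not part of the API: the class name and defaults are illustrative, and the server may count requests differently, so leave some margin.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (client-side sketch)."""

    def __init__(self, limit: int = 5, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can use a fake clock
        self.calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if a slot is free)."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()  # forget calls that have left the window
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def acquire(self) -> None:
        """Sleep (with real time.sleep) until a slot is free, then record the call."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.calls.append(self.clock())
```

Call `limiter.acquire()` before each request to /v1/robots.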