POST /v1/robots

Robots.txt Parser API - Fetch and Structure Crawl Rules

Fetches /robots.txt from the target domain, parses its directives, and returns structured rules for each user-agent (allow and disallow paths), the list of sitemap URLs, and a crawl_delay value when one is present. If the file is missing or unreachable, the response is found:false instead of an error.

Parameters

url (string)

URL or bare domain. If a full URL is given, the hostname is extracted.

domain (string)

Alternative to url. A bare domain name such as github.com. Send one of url or domain, not both.

Code examples

curl -X POST https://api.botoi.com/v1/robots \
  -H "Content-Type: application/json" \
  -d '{"url":"https://github.com"}'
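The same request can be made from Python with only the standard library. This is a minimal sketch: it sends exactly one of url or domain, since domain is documented as an alternative to url, and it assumes the endpoint accepts the call without an authentication header, as the curl example does. If your plan needs an API key, add the header your dashboard specifies; its name is not shown on this page. build_body, make_request and fetch_robots are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.botoi.com/v1/robots"


def build_body(url=None, domain=None):
    """Return the JSON body for /v1/robots, carrying exactly one of url or domain."""
    if (url is None) == (domain is None):
        raise ValueError("pass exactly one of url or domain")
    return {"url": url} if url is not None else {"domain": domain}


def make_request(body):
    """Build (but do not send) a POST request equivalent to the curl example.

    Assumption: no API-key header is required; add one here if your plan needs it.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_robots(url=None, domain=None, timeout=10.0):
    """Send the request and decode the JSON response (needs network access)."""
    req = make_request(build_body(url=url, domain=domain))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Typical use is fetch_robots(domain="github.com"); the returned dict is whatever JSON the endpoint sends back.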

When to use this API

Respect crawl rules in your own spider

Before fetching any URL from a third-party site, parse its robots.txt and check whether your user-agent is allowed to fetch that path. Keeps your crawler compliant and lowers the risk of IP bans.
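A minimal sketch of that check, assuming each entry in the rules array looks like {"user_agent": ..., "allow": [...], "disallow": [...]} (these key names are an assumption; inspect a real response first). It applies RFC 9309 matching: use the entries whose user-agent equals your product token case-insensitively, fall back to the * entry, let the longest matching pattern win, prefer allow on a tie, and treat * as a wildcard and a trailing $ as an end anchor. Percent-encoding normalization is left out, and the function names are hypothetical.

```python
import re


def _compile(pattern):
    """Turn a robots.txt path pattern into a regex: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def select_group(rules, product_token):
    """Merge all entries whose user_agent equals our token (case-insensitive); else use '*'.

    Assumption: entries use the keys user_agent, allow and disallow.
    """
    token = product_token.lower()
    matched = [r for r in rules if r.get("user_agent", "").lower() == token]
    if not matched:
        matched = [r for r in rules if r.get("user_agent") == "*"]
    allow = [p for r in matched for p in r.get("allow", []) if p]
    disallow = [p for r in matched for p in r.get("disallow", []) if p]
    return allow, disallow


def _longest_match(patterns, path):
    """Length of the longest pattern matching path from its start, or -1 if none match."""
    return max((len(p) for p in patterns if _compile(p).match(path)), default=-1)


def is_allowed(rules, product_token, path):
    """RFC 9309-style decision: the longest match wins, and allow wins a tie."""
    if path == "/robots.txt":  # robots.txt itself is always fetchable
        return True
    allow, disallow = select_group(rules, product_token)
    return _longest_match(allow, path) >= _longest_match(disallow, path)
```

Pass only the product token (MyCrawler, not MyCrawler/2.1) and the path including any query string, for example /search?q=x. A crawler with its own entry ignores the * entry entirely, which is why the fallback only applies when nothing matches.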

SEO audit of your site's robots file

Schedule a daily check that alerts when the allow or disallow set changes unexpectedly. Catches regressions where a deploy accidentally blocks Googlebot from /blog or the sitemap.
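A sketch of the comparison step for such a check, assuming you keep yesterday's JSON response and compare it with today's. The rules entry shape ({"user_agent", "allow", "disallow"}) is an assumption, and rule_set and diff_rules are hypothetical names.

```python
def rule_set(result):
    """Flatten a /v1/robots result into {(user_agent, kind, path)} tuples.

    Assumption: each rules entry has user_agent, allow and disallow keys.
    """
    entries = set()
    for rule in result.get("rules", []):
        agent = rule.get("user_agent", "")
        for kind in ("allow", "disallow"):
            for path in rule.get(kind, []):
                entries.add((agent, kind, path))
    return entries


def diff_rules(old, new):
    """Return (added, removed) directive tuples between two snapshots, sorted for stable alerts."""
    before, after = rule_set(old), rule_set(new)
    return sorted(after - before), sorted(before - after)
```

Alert whenever either list is non-empty; a new ("Googlebot", "disallow", "/blog") tuple is exactly the kind of deploy regression described above.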

Discover sitemaps for content crawling

Extract the sitemaps array and feed each URL into /v1/sitemap to get structured URL lists. More reliable than guessing /sitemap.xml, because Sitemap lines point at the locations the site actually publishes.
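A minimal sketch that turns the sitemaps array into a work list. It assumes /v1/sitemap takes a JSON body with a url field, which this page does not confirm, so check that endpoint's parameters first. Duplicates and anything that is not an http(s) URL are dropped before queuing; sitemap_jobs is a hypothetical name.

```python
from urllib.parse import urlsplit


def sitemap_jobs(result):
    """Build one request body per unique http(s) sitemap URL, preserving order.

    Assumption: /v1/sitemap accepts {"url": ...}; adjust if its parameters differ.
    """
    seen = set()
    jobs = []
    for url in result.get("sitemaps", []):
        url = url.strip()
        if url in seen or urlsplit(url).scheme not in ("http", "https"):
            continue
        seen.add(url)
        jobs.append({"url": url})
    return jobs
```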

Frequently asked questions

How does this handle the multiple-user-agent block pattern?
When several User-agent lines precede the directives (a common way to group bots), each user-agent gets its own rules entry with the same allow/disallow set. The parser follows RFC 9309 grouping semantics.
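To illustrate the grouping rule, here is a small local parser that produces one entry per user-agent from raw robots.txt text. It is not the endpoint's implementation, only a sketch of the RFC 9309 behavior described above: consecutive User-agent lines share the rules that follow them, and a User-agent line appearing after a rule starts a new group. The output keys mirror an assumed response shape, and parse_groups is a hypothetical name.

```python
def parse_groups(text):
    """Return one {"user_agent", "allow", "disallow"} dict per user-agent in text.

    Consecutive User-agent lines form one group (RFC 9309); each agent in the
    group gets its own copy of that group's allow/disallow lists.
    """
    entries = []
    agents, allow, disallow = [], [], []
    seen_rule = False  # becomes True once the current group has a rule line

    def emit(group_agents, group_allow, group_disallow):
        for agent in group_agents:
            entries.append({"user_agent": agent,
                            "allow": list(group_allow),
                            "disallow": list(group_disallow)})

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if seen_rule:  # a User-agent after rules starts a new group
                emit(agents, allow, disallow)
                agents, allow, disallow, seen_rule = [], [], [], False
            agents.append(value)
        elif key in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty value (e.g. a bare "Disallow:") restricts nothing
                (allow if key == "allow" else disallow).append(value)
        # Other records (Sitemap, Crawl-delay, ...) are ignored in this sketch.
    emit(agents, allow, disallow)
    return entries
```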
What happens when robots.txt is missing?
The endpoint returns found:false with empty rules and sitemaps arrays. 404 responses and network errors are both treated as "no robots.txt" rather than hard errors.
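RFC 9309 treats these two cases differently: a 4xx response lets a crawler access any resource, while an unreachable file (server or network error) means it must assume complete disallow. Because found:false covers both, a crawler may want an explicit policy. A minimal sketch, with crawl_policy and its flag as hypothetical names:

```python
def crawl_policy(result, treat_missing_as_allow=True):
    """Map a /v1/robots result to "rules", "allow-all" or "disallow-all".

    found:false does not say whether robots.txt was absent (4xx) or unreachable,
    so the caller decides; set the flag to False to follow the stricter
    RFC 9309 reading for unreachable files and retry later.
    """
    if result.get("found"):
        return "rules"
    return "allow-all" if treat_missing_as_allow else "disallow-all"
```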
Is crawl-delay included?
Yes, when the file contains a Crawl-delay directive. Only the first value encountered is returned. Google ignores Crawl-delay, but some other crawlers, such as Bingbot, honor it.
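If you act on crawl_delay, a per-host throttle is enough. This sketch assumes the value is in seconds (the usual reading of Crawl-delay) and that it may be absent; HostThrottle is a hypothetical name, and the clock and sleep hooks exist only so it can be tested without real waiting.

```python
import time


class HostThrottle:
    """Space out requests to one host by at least crawl_delay seconds."""

    def __init__(self, crawl_delay, clock=time.monotonic, sleep=time.sleep):
        # Assumption: crawl_delay is seconds; None or 0 means no throttling.
        self.delay = max(0.0, float(crawl_delay or 0))
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until the host may be hit again, then record this request time."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.delay - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Keep one HostThrottle per host and call wait() before each fetch to that host.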
What user-agent does the fetcher use?
The fetcher identifies itself as BotoiBot/1.0, with +https://botoi.com as its contact URL. Some sites serve different robots.txt responses based on user-agent, so the returned rules reflect what this bot sees.
Does this follow redirects on robots.txt?
Yes. Redirects are followed automatically, so http-to-https and www redirects are handled transparently. Cross-domain redirects are also followed, but the result is still reported for the domain you originally requested.

Get your API key

Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.