Question 1

How does this handle the multiple-user-agent block pattern?

Accepted Answer

When several User-agent lines precede a shared set of directives (a common way to group bots), each user-agent gets its own rules entry containing the same allow/disallow set. This mirrors the group semantics of RFC 9309.
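To make the grouping concrete, here is a minimal Python sketch of that behavior. It is not the service's actual parser: the function names and the output keys (userAgent, allow, disallow) are assumptions chosen for illustration.

```python
def _entries_for(agents, allow, disallow):
    """Build one entry per user-agent, each with its own copy of the shared rules."""
    return [
        {"userAgent": agent, "allow": list(allow), "disallow": list(disallow)}
        for agent in agents
    ]


def parse_groups(text):
    """Return one rules entry per user-agent; grouped agents share the same rule set."""
    entries = []
    agents, allow, disallow = [], [], []
    in_rules = False  # True once the current group has seen an Allow/Disallow line

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line
        field, value = field.strip().lower(), value.strip()

        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                entries.extend(_entries_for(agents, allow, disallow))
                agents, allow, disallow, in_rules = [], [], [], False
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            (allow if field == "allow" else disallow).append(value)
            in_rules = True

    entries.extend(_entries_for(agents, allow, disallow))
    return entries
```

Given a file where "User-agent: a" and "User-agent: b" sit above a single "Disallow: /x" line, this sketch yields two entries, one per agent, each carrying that same disallow rule.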

Question 2

What happens when robots.txt is missing?

Accepted Answer

The endpoint returns found:false with empty rules and sitemaps arrays. 404 responses and network errors are both treated as "no robots.txt" rather than hard errors.
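The mapping from fetch outcome to response can be sketched as follows. This is a hedged illustration only: load_robots and the injected fetch callable are hypothetical names, rule parsing is left out, and the handling of statuses other than 200 and 404 is an assumption because the answer above does not describe it.

```python
def _missing():
    """Response shape described above for an absent robots.txt."""
    return {"found": False, "rules": [], "sitemaps": []}


def load_robots(url, fetch):
    """fetch(url) is a hypothetical callable returning (status_code, body)
    and raising OSError (or a subclass) on network failure."""
    try:
        status, body = fetch(url)
    except OSError:
        return _missing()  # network error: treated as "no robots.txt"
    if status == 404:
        return _missing()  # not found: treated the same way
    if status != 200:
        # Assumption: other statuses are surfaced to the caller in this sketch.
        raise ValueError(f"unexpected status {status} for {url}")

    sitemaps = [
        line.split(":", 1)[1].strip()
        for line in body.splitlines()
        if line.strip().lower().startswith("sitemap:")
    ]
    # Rule parsing is omitted here; only the found flag and sitemaps are filled in.
    return {"found": True, "rules": [], "sitemaps": sitemaps}
```

A caller can then branch on the found flag instead of wrapping every lookup in error handling.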

Question 3

Is crawl-delay included?

Accepted Answer

Yes, when the file contains a Crawl-delay directive. Only the first value encountered is returned. Googlebot ignores Crawl-delay, but some other crawlers, such as Bingbot, honor it.
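A rough sketch of the "first value wins" behavior, assuming the value is numeric. The function name is hypothetical, and skipping unparseable values is an assumption rather than documented behavior.

```python
from typing import Optional


def first_crawl_delay(text: str) -> Optional[float]:
    """Return the first parseable Crawl-delay value in the file, or None if absent."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "crawl-delay":
            try:
                return float(value.strip())
            except ValueError:
                continue  # assumption: ignore values that are not numbers
    return None
```

Because the loop returns on the first match, a later Crawl-delay in another user-agent group never overrides it.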

Question 4

What user-agent does the fetcher use?

Accepted Answer

The fetcher identifies itself as BotoiBot/1.0, with +https://botoi.com included as the bot's info URL. Some sites serve different robots.txt responses depending on the user-agent, so the returned rules reflect what this bot sees.
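If the returned rules differ from what your own crawler sees, one way to investigate is to request robots.txt yourself under each user-agent and compare the bodies. The sketch below only builds the request. The exact header value is an assumption (the answer names only "BotoiBot/1.0" and "+https://botoi.com"), and build_robots_request is a hypothetical helper.

```python
import urllib.request

# Assumed exact format; only the product token and URL come from the text above.
ASSUMED_BOT_UA = "BotoiBot/1.0 (+https://botoi.com)"


def build_robots_request(domain: str, user_agent: str) -> urllib.request.Request:
    """Prepare a GET for https://<domain>/robots.txt with the given User-Agent."""
    url = f"https://{domain}/robots.txt"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Passing the result to urllib.request.urlopen once with ASSUMED_BOT_UA and once with your own crawler's string shows whether the site varies its response by user-agent.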

Question 5

Does this follow redirects on robots.txt?

Accepted Answer

The fetch uses the default redirect-following behavior, so redirects from http to https or to a www variant are handled transparently. Cross-domain redirects are also followed, but the domain reported in the result stays the one originally requested.
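The distinction between the requested domain and the final URL can be illustrated with a small sketch. The finalUrl and crossDomain keys are hypothetical and not part of the API response; they only show how a caller might detect that the rules came from a different host.

```python
from urllib.parse import urlsplit


def _strip_www(host: str) -> str:
    """Treat www.example.com and example.com as the same site."""
    return host[4:] if host.startswith("www.") else host


def describe_fetch(requested_url: str, final_url: str) -> dict:
    """Keep the originally requested host as the domain, whatever the redirects did."""
    requested_host = urlsplit(requested_url).hostname or ""
    final_host = urlsplit(final_url).hostname or ""
    return {
        "domain": requested_host,  # stays the original, as described above
        "finalUrl": final_url,     # hypothetical field
        "crossDomain": _strip_www(requested_host) != _strip_www(final_host),
    }
```

A caller that needs to know which host actually served the file would have to track the final URL separately, as this sketch does.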

Robots.txt Parser API - Fetch and Structure Crawl Rules

Parameters

Code examples

When to use this API

Respect crawl rules in your own spider

SEO audit of your site's robots file

Discover sitemaps for content crawling

Frequently asked questions
