POST /v1/sitemap

Sitemap Parser API - Extract URLs from XML Sitemaps

Fetches a sitemap URL and parses either <urlset> (regular sitemap) or <sitemapindex> (index of sub-sitemaps). Returns each entry's loc, with lastmod, changefreq, and priority when present. The type field indicates which format was detected.

Parameters

url (string, required)

HTTP or HTTPS URL of the sitemap.

limit (number)

Maximum number of URLs to return. Defaults to 100, max 1000.

Code examples

curl -X POST https://api.botoi.com/v1/sitemap \
  -H "Content-Type: application/json" \
  -d '{"url":"https://vercel.com/sitemap.xml","limit":3}'
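For callers working in Python, the same request can be sketched with the standard library alone. The endpoint URL and the url and limit fields come from the curl example; the function names, the client-side validation, the lower bound of 1 on limit, and the 30-second client timeout are assumptions made for illustration, and no authentication header is included because the curl example does not show one.

```python
import json
import urllib.request
from urllib.parse import urlparse

ENDPOINT = "https://api.botoi.com/v1/sitemap"  # taken from the curl example


def build_sitemap_request(sitemap_url: str, limit: int = 100) -> urllib.request.Request:
    """Build (but do not send) the POST request for the sitemap endpoint."""
    # Client-side checks are an assumption; the API may validate differently.
    if urlparse(sitemap_url).scheme not in ("http", "https"):
        raise ValueError("url must be an HTTP or HTTPS URL")
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    body = json.dumps({"url": sitemap_url, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_sitemap(sitemap_url: str, limit: int = 100, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_sitemap_request(sitemap_url, limit)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_sitemap("https://vercel.com/sitemap.xml", limit=3) would send the same payload as the curl command.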

When to use this API

Crawl a site respecting its sitemap

Fetch /sitemap.xml, recurse into any sitemapindex children, and queue each URL for crawling. This is usually more efficient than discovering pages by following hyperlinks, and it surfaces pages that nothing links to from the homepage.
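The recursion described here might look like the following minimal sketch. It assumes a response shaped like {"type": ..., "urls": [{"loc": ...}]}; the type field and loc are documented on this page, but the "urls" key, the function name, and the max_sitemaps cap are hypothetical. The fetch argument is any callable that takes a sitemap URL and returns the parsed response, so the traversal stays independent of how the HTTP call is made.

```python
from collections import deque
from typing import Callable, Iterator

# Assumed response shape (the "urls" key is hypothetical):
# {"type": "urlset" | "sitemapindex", "urls": [{"loc": "..."}, ...]}
SitemapFetcher = Callable[[str], dict]


def iter_page_urls(root: str, fetch: SitemapFetcher, max_sitemaps: int = 50) -> Iterator[str]:
    """Yield page URLs breadth-first, expanding sitemapindex children."""
    pending = deque([root])
    seen = {root}
    fetched = 0
    while pending and fetched < max_sitemaps:
        result = fetch(pending.popleft())
        fetched += 1
        is_index = result.get("type") == "sitemapindex"
        for entry in result.get("urls", []):
            loc = entry["loc"]
            if not is_index:
                yield loc  # a regular page URL, ready to queue for crawling
            elif loc not in seen:
                seen.add(loc)  # a child sitemap, fetched on a later iteration
                pending.append(loc)
```

The max_sitemaps cap bounds the number of API calls, which matters on rate-limited plans and guards against an index that references itself.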

SEO content audit

Pull the full sitemap for your site and compare each entry's lastmod with the page's actual last-modified date. Inaccurate lastmod values can lead search engines to distrust the field and spend crawl budget on unchanged pages; fix them in your sitemap generator.
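One way to run that comparison is sketched below. It assumes you already have the entries (dicts with loc and an optional lastmod, as this endpoint returns) and a mapping from URL to the real modification time as a timezone-aware datetime, for example from your CMS. The function names and the one-day tolerance are illustrative choices, and the parser only handles full dates and full timestamps, not the year-only or year-month forms that sitemaps also allow.

```python
from datetime import datetime, timedelta, timezone


def parse_lastmod(value: str) -> datetime:
    """Parse a full-date or full-timestamp lastmod value as an aware datetime."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Date-only values carry no zone; treating them as UTC is an assumption.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def find_stale_lastmod(entries, actual_modified, tolerance=timedelta(days=1)):
    """Return (loc, lastmod, actual) for entries whose lastmod lags the real edit time."""
    stale = []
    for entry in entries:
        lastmod = entry.get("lastmod")
        actual = actual_modified.get(entry["loc"])
        if not lastmod or actual is None:
            continue  # nothing to compare for this entry
        if actual - parse_lastmod(lastmod) > tolerance:
            stale.append((entry["loc"], lastmod, actual))
    return stale
```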

Diff sitemaps between deploys

Capture the sitemap during each deploy and diff it against the previous deploy's sitemap to see which URLs were added or removed. This catches accidental unpublishing or inadvertent page exposure.
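The comparison itself reduces to two set differences. The sketch below assumes you have stored the loc values from each deploy's sitemap as plain lists of strings; the function name and the returned dictionary keys are illustrative.

```python
from typing import Iterable


def diff_sitemaps(previous_locs: Iterable[str], current_locs: Iterable[str]) -> dict:
    """Report which URLs appeared or disappeared between two deploys."""
    previous, current = set(previous_locs), set(current_locs)
    return {
        "added": sorted(current - previous),    # newly exposed pages
        "removed": sorted(previous - current),  # possibly unpublished pages
    }
```

A deploy pipeline could fail the build, or just post a warning, whenever "removed" is non-empty without a matching redirect.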

Frequently asked questions

What is the difference between urlset and sitemapindex?
urlset is a regular sitemap containing direct page URLs. sitemapindex is a list of sub-sitemaps (common for large sites that split content across blog.xml, products.xml, etc). The type field distinguishes them so your code can recurse when needed.
Does this follow sitemapindex children automatically?
No. Each sub-sitemap is returned as a URL in the response. Make a follow-up call for each child to get the actual page URLs. This gives you control over recursion depth and rate-limiting.
What is the URL limit per request?
The default is 100 and the max is 1000. For sitemaps larger than 1000 URLs, call the endpoint multiple times or request the individual sub-sitemaps from a sitemapindex.
Can I parse compressed (.gz) sitemaps?
Not yet. The endpoint expects uncompressed XML. If the origin sends the response with a Content-Encoding: gzip header, the fetch decompresses it transparently; a standalone .gz file served without that header is not decompressed.
What timeout applies to the fetch?
15 seconds. Sitemaps hosted on slow origins or behind heavy firewalls may occasionally time out; retry or fetch during off-peak hours.
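The retry advice can be sketched as a small wrapper with exponential backoff. This page does not document how the API reports a fetch timeout, so the sketch assumes your calling code translates that condition into Python's built-in TimeoutError; the function name, attempt count, and delays are likewise illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TimeoutError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
    raise AssertionError("unreachable")
```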

Get your API key

Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.
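To stay under a per-minute quota such as the free tier's 5 requests per minute, a client can space its calls evenly. The class below is a minimal sketch of that idea, not part of any official client; its name and interface are hypothetical, and it assumes even spacing (one call every 12 seconds at 5 per minute) is acceptable even though the API may allow short bursts.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int = 5,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Call wait() immediately before each request; the first call returns at once and later calls pause only as long as needed.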