Robots.txt

Stop content theft. We'll whizz you up a robots.txt tailor-made for your requirements, not theirs.

We'll also email you any updates when things change.

Work through each section below, then scroll to the bottom of the page to send your robots.txt.

Details

We'll use your email to send you a robots.txt file that implements the usage choices you select now.

We'll also email you an updated robots.txt whenever we spot new crawlers or usage changes that impact your choices.

We might also send you information related to our complimentary services.

You can unsubscribe at any time via a link in the email.


Licensing

Terms Document Locator (TDL) lines reference immutable legal documents that define the terms under which crawlers may access your site. Reference a standard agreement, your own, or both. Selected entries are emitted as TDL: lines in the generated robots.txt between User-agent: and Allow: / Disallow:.
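As a rough illustration of that placement, the sketch below renders one robots.txt group with the TDL: lines sitting between User-agent: and the access rule. The function name, the crawler name ExampleBot and the example.com URL are hypothetical, and the exact spacing and ordering of the file we generate may differ.

```python
def render_group(user_agent, tdl_urls, allow=True):
    """Render one robots.txt group as text.

    Assumption: one "TDL: <url>" line per selected terms document,
    placed after User-agent: and before the Allow:/Disallow: rule.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"TDL: {url}" for url in tdl_urls]
    lines.append("Allow: /" if allow else "Disallow: /")
    return "\n".join(lines)


# Hypothetical crawler name and terms URL, for illustration only.
print(render_group("ExampleBot", ["https://example.com/terms/v1.html"]))
```

Passing both a standard agreement URL and your own would simply produce two TDL: lines in the same group.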

Standard TDLs are maintained by external organisations. We check daily for updated versions and use the latest available automatically.

For a custom TDL, enter one URL per line or separate URLs with a comma. Each URL must be an absolute http or https URL pointing to an immutable legal document defining the terms under which crawlers may access your site.
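The input rules above can be sketched as a small parser. This is only an illustration of the stated rules, not the validation we actually run: the function name parse_custom_tdls is hypothetical, and immutability of the target document cannot be checked from the URL alone.

```python
import re
from urllib.parse import urlsplit


def parse_custom_tdls(raw):
    """Split form input into a list of absolute http(s) URLs.

    Entries may be separated by newlines or commas. Blank entries are
    skipped. Anything that is not an absolute http or https URL raises
    ValueError. Whether the document is immutable is NOT checked here.
    """
    urls = []
    for entry in re.split(r"[,\n]", raw):
        entry = entry.strip()  # also drops a trailing "\r" from CRLF input
        if not entry:
            continue
        parts = urlsplit(entry)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {entry!r}")
        urls.append(entry)
    return urls


# Hypothetical URLs, for illustration only.
print(parse_custom_tdls("https://example.com/terms-a, https://example.com/terms-b"))
```

Because commas act as separators here, a URL that itself contains a comma would need to be percent-encoded before entry.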


Allowed Crawler Categories

Select the crawler categories you want to allow. Crawlers in those categories will be allowed access. Read more about each category and how to set these values in the crawler documentation.

Permitted use Your choice

Train

Indicates that the crawler is used to train AI models.

Input

Indicates that the crawler is used to collect content for generative AI and search summaries.

Index

Indicates that the crawler is used for internal indexing of AI models.

Search

Indicates that the crawler is used to build search indexes and provide search results.

Monitor

Indicates that the crawler is used for monitoring websites.

Archiving

Indicates that the crawler is used for archiving data and websites.

Preview

Indicates that the crawler is used to create content previews.

Security

Indicates that the crawler is a security-focused web crawler that scans domains for vulnerabilities.

Analytics

Indicates that the crawler is used to gather data for marketing analytics.

Feed

Indicates that the crawler is used for aggregating news, information, or data.

Discovery

Indicates that the crawler is used to gain an understanding of the discoverability or search ranking of the crawled website or web page.


Handling Conflicts

Sometimes your choices can't be expressed through robots.txt alone. For example, a single crawler may use content for both general search and AI training. The robots.txt we generate allows such crawlers and includes a warning in the comments indicating the conflict. Where possible, the comment also gives a URL for contacting the crawler operator, so you can ask them to restrict their use of your content via contractual agreement.

Where we see the same conflict affecting many users, we may attempt to contact the operator and ask them to provide a way of resolving it via robots.txt.
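The conflict rule can be sketched as a set comparison between the uses a crawler is known for and the categories you selected. Everything named here is hypothetical (the function decide, the wording of the warning comment), and disallowing a crawler with no selected category is an assumption drawn from the category section rather than a statement of how the generator is implemented.

```python
def decide(crawler_uses, allowed):
    """Return (rule, warning) for one crawler.

    crawler_uses: set of category names the crawler is used for.
    allowed:      set of category names the site owner selected.
    """
    permitted = crawler_uses & allowed
    unwanted = crawler_uses - allowed
    if not permitted:
        # Assumption: no selected category applies, so block the crawler.
        return "Disallow: /", None
    if unwanted:
        # Conflict: robots.txt cannot allow one use and block another for
        # the same crawler, so allow it and flag the unwanted uses.
        warning = "# WARNING: also used for: " + ", ".join(sorted(unwanted))
        return "Allow: /", warning
    return "Allow: /", None


# A crawler used for both Search and Train, where only Search is selected.
print(decide({"Search", "Train"}, {"Search"}))
```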


Please read our Privacy policy before submitting.