Robots.txt is a small text file that lives at the root of your domain (example.com/robots.txt). It tells search engine crawlers which parts of your site they can and can't access.
Basic structure
A robots.txt file is made up of one or more 'groups'. Each group specifies a user-agent (the crawler) and the URLs it's allowed or disallowed to crawl.
A simple example
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
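To see how a crawler might read the example above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `load_rules` helper and the test URLs are made up for illustration. Python's parser applies the first matching rule in file order, while Google documents a most-specific-match rule; for this file both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The example file, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Hypothetical URLs used only to exercise the rules.
admin_allowed = rules.can_fetch("*", "https://example.com/admin/users")
blog_allowed = rules.can_fetch("*", "https://example.com/blog/post")
sitemaps = rules.site_maps()  # available in Python 3.8 and later

print(admin_allowed, blog_allowed, sitemaps)
```

The admin URL is refused because it starts with /admin/, and everything else falls through to Allow: /.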
What robots.txt can and can't do
- It can prevent crawling of specific paths.
- It can point crawlers to your sitemap.
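- It cannot reliably keep a page out of Google's index: a blocked URL can still be indexed if other pages link to it. Use noindex for that, and leave the page crawlable so Google can read the tag.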
- It cannot protect private content — use authentication.
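The noindex instruction mentioned above can be sent as a meta tag in the page or as an `X-Robots-Tag` HTTP response header. The following is only a sketch of the header approach, written as a bare WSGI app. The `/drafts/` prefix, the `app` function and the `response_headers` helper are assumptions for illustration, not part of any particular framework.

```python
# Hypothetical path prefix for pages that should stay out of search results.
NOINDEX_PREFIX = "/drafts/"


def app(environ, start_response):
    """Minimal WSGI app: pages stay crawlable, but some ask not to be indexed."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIX):
        # Crawlers that honor this header drop the page from their index.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<h1>Hello</h1>"]


def response_headers(path: str) -> dict:
    """Call the app once with a fake request and return its response headers."""
    captured = {}

    def start_response(status, headers):
        captured["headers"] = dict(headers)

    app({"PATH_INFO": path}, start_response)
    return captured["headers"]


draft_headers = response_headers("/drafts/new-post")
public_headers = response_headers("/blog/post")
```

For this to work, the same paths must not be disallowed in robots.txt, otherwise the crawler never requests the page and never sees the header.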
Common mistakes
- Disallow: / — blocks the entire site from being crawled.
- Blocking CSS or JS that Google needs to render the page.
- Forgetting to add the sitemap location.
- Copying a staging robots.txt (often Disallow: / to keep the test site out of search) to production unchanged.
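Most of the mistakes above can be caught with a quick automated check before deploying. The function below is a rough heuristic sketch, not a full validator: `lint_robots` and its warning messages are invented for this example, and it only looks at a sitewide Disallow: /, rules that appear to target CSS or JS, and a missing Sitemap line. It cannot tell whether a file came from staging.

```python
def lint_robots(text: str) -> list[str]:
    """Return warnings for a few common robots.txt mistakes (heuristic only)."""
    warnings = []
    agents = []        # user-agents named by the group currently being read
    in_rules = False   # True once the current group has at least one rule
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field == "sitemap":
            has_sitemap = True
        elif field in ("allow", "disallow"):
            in_rules = True
            if field != "disallow":
                continue
            if value == "/" and "*" in agents:
                warnings.append("Disallow: / blocks the whole site for all crawlers")
            # Ignore trailing wildcard and end-of-URL markers when matching.
            target = value.lower().rstrip("$*")
            if target.endswith((".css", ".js")) or "/css/" in target or "/js/" in target:
                warnings.append(f"Disallow: {value} may block CSS or JS needed for rendering")

    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


problems = lint_robots("User-agent: *\nDisallow: /\nDisallow: /static/*.js$\n")
for message in problems:
    print(message)
```

Running a check like this as a deployment step is one way to stop a blocking staging file from reaching production unnoticed.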
Generate a clean, valid robots.txt in seconds with the Robots.txt Generator.
Frequently asked questions
Where does robots.txt live?
Always at the root of the host it applies to, for example https://example.com/robots.txt. Each subdomain needs its own file.
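As a small illustration, the robots.txt location for any page can be derived from that page's URL by keeping the scheme and host and replacing the rest. `robots_url` is a hypothetical helper name, and the URLs are examples only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


main_site = robots_url("https://example.com/blog/post?page=2")
shop_site = robots_url("https://shop.example.com/cart")
print(main_site)
print(shop_site)
```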
Does every site need a robots.txt?
It's recommended. Without one, crawlers assume every URL may be crawled. That is usually fine, but you give up the chance to steer them away from low-value paths and to point them at your sitemap.
Will robots.txt remove a page from Google?
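No. robots.txt only controls crawling, not indexing. To remove a page from search results, use a noindex tag (and keep the page crawlable so the tag can be read) or Google's removal tool, which hides the URL temporarily.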
Can I block specific bots?
Yes — set the user-agent to the bot's name (e.g. 'AhrefsBot') and add Disallow rules under it.
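Here is a minimal sketch of a per-bot group, again checked with Python's standard-library `urllib.robotparser`. The file contents and URLs are illustrative. Keep in mind that robots.txt is advisory: it only stops bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# One group blocks AhrefsBot everywhere; the catch-all group allows everyone else.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page URL used only to exercise the rules.
ahrefs_allowed = parser.can_fetch("AhrefsBot", "https://example.com/pricing")
other_allowed = parser.can_fetch("Googlebot", "https://example.com/pricing")
print(ahrefs_allowed, other_allowed)
```

A crawler uses the group that names it and ignores the catch-all group, so the Allow: / under User-agent: * does not override the block for AhrefsBot.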