Robots.txt Explained for Beginners

A plain-English guide to robots.txt — what it does, how to write one, and common mistakes that can block your site from Google.

Jaymar SEO · 6 min read

Robots.txt is a small text file that lives at the root of your domain (example.com/robots.txt). It tells search engine crawlers which parts of your site they can and can't access.

Basic structure

A robots.txt file is made up of one or more 'groups'. Each group starts with a User-agent line naming the crawler it applies to, followed by Allow and Disallow rules listing the URL paths that crawler may or may not crawl.

A simple example

    User-agent: *
    Disallow: /admin/
    Allow: /
    Sitemap: https://example.com/sitemap.xml
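To see how a parser reads the example above, here is a minimal sketch using Python's standard-library urllib.robotparser. The test URLs are illustrative, and site_maps() assumes Python 3.8 or later. One caveat: Python's parser applies the first matching rule in file order, while Google picks the most specific match, so the two can disagree on files with overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, one directive per line.
RULES = """\
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_rules(RULES)

# Illustrative URLs, not taken from a real site.
blog_allowed = parser.can_fetch("*", "https://example.com/blog/")
admin_allowed = parser.can_fetch("*", "https://example.com/admin/settings")
sitemaps = parser.site_maps()

print(blog_allowed)   # True
print(admin_allowed)  # False
print(sitemaps)       # ['https://example.com/sitemap.xml']
```

Running a few representative URLs through a check like this before publishing is a cheap way to catch a rule that blocks more than you meant.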

What robots.txt can and can't do

  • It can prevent crawling of specific paths.
  • It can point crawlers to your sitemap.
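  • It cannot reliably hide a page from Google: a blocked URL can still be indexed if other pages link to it. Use noindex for that, and leave the page crawlable so Google can see the tag.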
  • It cannot protect private content — use authentication.
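Since noindex is the tool for keeping a page out of search results, it helps to confirm a page actually carries it. The sketch below is one rough way to check, assuming you already have the page's HTML and response headers in hand. The has_noindex helper is hypothetical, and it ignores crawler-specific forms such as a "googlebot: noindex" header value.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers=None):
    """Return True if the meta robots tag or X-Robots-Tag header says noindex."""
    header_value = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))                              # True
print(has_noindex("<html><head></head></html>"))      # False
print(has_noindex("", {"X-Robots-Tag": "noindex"}))   # True
```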

Common mistakes

  • Disallow: / — blocks the entire site from being crawled.
  • Blocking CSS or JS that Google needs to render the page.
  • Forgetting to add the sitemap location.
  • Pushing a staging robots.txt (which often blocks everything) to production.
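The first three mistakes above can be caught with a simple check before a file goes live. The following is a minimal sketch, not a full validator: lint_robots_txt is a hypothetical helper, the CSS/JS test is a crude substring heuristic, and it cannot tell whether a file came from staging.

```python
def lint_robots_txt(text):
    """Return a list of warnings for common robots.txt mistakes."""
    warnings = []
    has_sitemap = False
    agents = []        # user-agents of the group currently being read
    seen_rule = False  # True once the current group has an Allow/Disallow

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value)
        elif field == "sitemap":
            has_sitemap = True
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow":
                if value == "/" and "*" in agents:
                    warnings.append("Disallow: / blocks the entire site for all crawlers")
                elif any(h in value.lower() for h in (".css", ".js", "/css/", "/js/")):
                    warnings.append(f"Disallow: {value} may block CSS or JS needed for rendering")

    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


risky = "User-agent: *\nDisallow: /\nDisallow: /assets/js/\n"
for warning in lint_robots_txt(risky):
    print(warning)
```

A blanket Disallow: / under a named bot is left alone here, since blocking one specific crawler is usually deliberate.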

Generate a clean, valid robots.txt in seconds with the Robots.txt Generator.

Frequently asked questions

Where does robots.txt live?

Always at the root of the host it applies to, for example https://example.com/robots.txt. A subdomain such as blog.example.com needs its own file at its own root.
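If you want to work out where a crawler will look for the file given any page URL, a small sketch like this does it with Python's standard library. The robots_txt_url name and the sample URLs are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Build the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?x=1"))
# https://example.com/robots.txt
print(robots_txt_url("https://shop.example.com/cart"))
# https://shop.example.com/robots.txt
```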

Does every site need a robots.txt?

It's recommended. Without it, crawlers assume everything is open to crawling. That is usually fine, but you have no way to steer them away from paths such as admin pages.

Will robots.txt remove a page from Google?

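No. To remove a page from search results, use a noindex tag or Google's removal tool. The page has to stay crawlable for the noindex tag to be seen, so don't block it in robots.txt at the same time.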

Can I block specific bots?

Yes — set the user-agent to the bot's name (e.g. 'AhrefsBot') and add Disallow rules under it.
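As a rough illustration, the sketch below defines one group for AhrefsBot and one for everyone else, then checks both with Python's standard-library urllib.robotparser. The rules and the /pricing URL are made up for the example, and robots.txt only works for bots that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block AhrefsBot everywhere, allow all other crawlers.
BOT_RULES = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Allow: /
"""

bot_parser = RobotFileParser()
bot_parser.parse(BOT_RULES.splitlines())

ahrefs_ok = bot_parser.can_fetch("AhrefsBot", "https://example.com/pricing")
google_ok = bot_parser.can_fetch("Googlebot", "https://example.com/pricing")

print(ahrefs_ok)  # False
print(google_ok)  # True
```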
