Understanding robots.txt: What to Allow and Disallow
The robots.txt file sits at the root of your domain and tells search engine crawlers which parts of your site they can and cannot access. It is one of the first files crawlers request, and mistakes here cut both ways: because the file is publicly readable, listing sensitive paths advertises them (robots.txt is not access control), while an overly broad rule can accidentally block your entire site from being indexed.
How robots.txt Works
When a search engine bot visits your site, it first checks yourdomain.com/robots.txt. The file contains rules (directives) that specify which URL paths are off-limits. A basic robots.txt looks like this:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
The User-agent: * line means the rules apply to all crawlers. You can also write rules for specific bots like Googlebot or Bingbot.
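For instance, a hypothetical file could give Googlebot different rules than everyone else (the paths here are illustrative):

```
User-agent: Googlebot
Disallow: /staging/

User-agent: *
Disallow: /staging/
Disallow: /beta/
```

Note that a crawler obeys only the most specific group that names it, so in this sketch Googlebot follows its own group and ignores the * group entirely, meaning /beta/ stays crawlable for Google. Rules do not accumulate across groups.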
What to Disallow
Block pages that waste crawl budget or should not appear in search results:
- Admin and login pages — /admin/, /wp-admin/, /login/
- Internal search results — /search? pages create infinite crawl traps
- Duplicate content paths — print versions, filtered views, sorted versions of the same content
- Staging and development paths — /staging/, /dev/
- Shopping cart and checkout — /cart/, /checkout/
- User-generated content areas — if they produce low-quality pages at scale
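Before deploying a rule set like this, you can sanity-check it locally. This sketch uses Python's standard-library urllib.robotparser against a hypothetical rule set modeled on the patterns above (example.com and the sample paths are assumptions, not from your site):

```python
import urllib.robotparser

# Hypothetical rules mirroring the disallow patterns discussed above.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Check a few representative URLs against the parsed rules.
for path in ["/admin/settings", "/cart/", "/blog/robots-guide"]:
    allowed = parser.can_fetch("*", f"https://example.com{path}")
    print(path, "->", "allowed" if allowed else "blocked")
```

One caveat: the stdlib parser's matching semantics are not identical to Google's, so treat this as a first-pass check and confirm important URLs in Google Search Console's robots.txt tester as well.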
What to Always Allow
Never block these resources — doing so prevents Google from properly rendering and evaluating your pages:
- CSS files — Google needs to render your page as users see it
- JavaScript files — Required for rendering dynamic content
- Images — Blocked images mean no image search traffic and poor rendering
- Your main content pages — This sounds obvious, but overly broad disallow rules accidentally block important sections
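Allow rules let you carve an exception out of an otherwise blocked directory, which is often how you keep a required resource crawlable. A common WordPress-style example (shown here as an illustration, not a universal recommendation):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

Google resolves the overlap by applying the most specific (longest) matching rule, so the Allow wins for that one file while the rest of the directory stays blocked.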
Common Mistakes to Avoid
- Disallow: / under User-agent: * — this single line blocks your entire site from all crawlers; it should only ever appear deliberately, such as on a staging host
- Blocking CSS/JS — Legacy advice from the early 2000s that now actively hurts SEO
- Forgetting the sitemap directive — Always include your sitemap URL for faster discovery
- Using noindex in robots.txt — Google no longer supports the noindex directive in robots.txt. Use meta robots tags instead.
- Conflicting rules — when Allow and Disallow overlap, Google applies the most specific (longest) matching rule, but other crawlers may resolve conflicts differently. Test thoroughly.
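On the noindex point above: to keep a page out of search results, the page itself must carry the directive, for example a meta robots tag in its head (an X-Robots-Tag HTTP header works too):

```
<!-- In the page's <head>: keeps the page out of results
     while still letting crawlers follow its links -->
<meta name="robots" content="noindex, follow">
```

The common gotcha is combining this with a robots.txt Disallow for the same URL: if crawlers cannot fetch the page, they never see the noindex tag, and the URL can still end up indexed from external links.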
Generate a properly formatted robots.txt file with the Robots.txt Generator. It provides a template with common disallow patterns and reminds you to include your sitemap reference.
Speaking of sitemaps, create your XML sitemap alongside your robots.txt using the XML Sitemap Generator. The two files work together — robots.txt tells crawlers what to skip, and the sitemap tells them what to prioritize.
After deploying changes, verify that your important pages are still accessible by checking for redirect issues with the Redirect Code Generator — a redirect to a disallowed URL is a common and hard-to-diagnose problem.
Review your robots.txt after every major site change. A five-minute check can prevent months of invisible indexing problems.