Prevent Crawler from Indexing Sandbox

  1. via robots.txt file:

     ```
     User-agent: *
     Disallow: /
     ```
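As a quick sanity check, the rules above can be verified offline with Python's standard `urllib.robotparser` (a minimal sketch; the example URL is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules shown above
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# With "Disallow: /", no path is crawlable for any user agent
print(parser.can_fetch("Googlebot", "https://example.com/any/page"))  # False
```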
  2. (Apache) via .htaccess file:

     ```
     Header set X-Robots-Tag "noindex, nofollow, noarchive"
     ```
  3. (Nginx) via nginx.conf file:

     ```
     add_header X-Robots-Tag "noindex, nofollow, noarchive" always;
     ```
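Both server configurations send the same header; crawlers split its value into comma-separated directives. A minimal sketch of how such a header value breaks down (the helper function here is illustrative, not part of any library):

```python
def parse_x_robots_tag(value: str) -> set[str]:
    """Split an X-Robots-Tag header value into its individual directives."""
    return {directive.strip().lower() for directive in value.split(",")}

directives = parse_x_robots_tag("noindex, nofollow, noarchive")
print(directives)  # a set containing 'noindex', 'nofollow', 'noarchive'
```

The `noindex` directive alone is enough to keep the page out of search results; `nofollow` and `noarchive` additionally stop link following and cached copies.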

(Optional) Extra Steps if File/Page Already Indexed

If a page or file already appears in Google Search results, and you want to remove it and prevent arbitrary users from accessing it, you can do the following:

  1. Forbid public file access:

    • The Shield module only locks down requests that Drupal responds to. That excludes static assets (e.g. public files inside the <web-root>/sites/default/files/ directory), which the web server serves directly without bootstrapping Drupal.

    • To block public access to those files, edit the .htaccess file in the web root:

      ```
      # Deny access to /sites/default/files/ and its subdirectories
      <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteRule ^sites/default/files/ - [F,L]
      </IfModule>
      ```

      ```
      # Block direct access to PDF files inside /sites/default/files/
      <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteRule ^sites/default/files/.*\.pdf$ - [F,L,NC]
      </IfModule>
      ```

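The two rewrite patterns above can be tested offline with Python regular expressions (a sketch for checking the patterns only; Apache itself matches against the URL path relative to the web root, and the example paths are hypothetical):

```python
import re

# Mirrors of the RewriteRule patterns above
deny_files = re.compile(r"^sites/default/files/")
deny_pdf = re.compile(r"^sites/default/files/.*\.pdf$", re.IGNORECASE)  # [NC] flag

print(bool(deny_files.match("sites/default/files/report.pdf")))  # True
print(bool(deny_pdf.match("sites/default/files/Report.PDF")))    # True (case-insensitive)
print(bool(deny_pdf.match("sites/default/files/image.png")))     # False
```

Note that the first rule returns 403 Forbidden for everything under the directory, while the second only blocks PDFs; use whichever matches how much you want to lock down.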
  2. Remove Index from Google Search Console:

    • If your page already appears in search results, you can remove it via Google Search Console’s Removals tool:


Reference