Robots
Generally, you cannot block access to your website if it is reachable on the web. Protect your pages with a password if needed.
robots.txt is used to manage crawler traffic and to keep crawlers from reaching pages through links within your own site.
A disallowed page that is linked from external pages can still show up in search engine result pages.
To manage what is shown in the search results, use the robots meta tag, the data-nosnippet attribute and the X-Robots-Tag HTTP header.
Code example
- robots.txt:
    Sitemap: https://www.yourwebsite.ch/sitemap.xml
    User-agent: *
    Disallow: /kasse
    Disallow: /cart
    Disallow: /account
- Robots meta HTML tag:
    <meta name="robots" content="noindex" />
- X-Robots-Tag HTTP header:
    HTTP/1.1 200 OK
    Date: Tue, 25 May 2010 21:42:43 GMT
    (…)
    X-Robots-Tag: noarchive
    X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST
- data-nosnippet HTML attribute:
    <p>This text can be shown in a snippet <span data-nosnippet>and this part would not be shown</span>.</p>
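To check which directives a page actually sends, the header and the meta tag can be read together. The following is a minimal Python sketch using only the standard library; the names `RobotsMetaParser` and `page_directives` are made up for this illustration, and the headers are assumed to arrive as a list of (name, value) pairs.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # A content value may hold several comma-separated directives.
        for directive in content.split(","):
            if directive.strip():
                self.directives.append(directive.strip().lower())


def page_directives(headers, html):
    """Return directives from X-Robots-Tag headers, then from the robots meta tag.

    `headers` is assumed to be a list of (name, value) tuples.
    """
    found = [value.strip() for name, value in headers
             if name.lower() == "x-robots-tag"]
    parser = RobotsMetaParser()
    parser.feed(html)
    return found + parser.directives


if __name__ == "__main__":
    sample_headers = [("X-Robots-Tag", "noarchive")]
    sample_html = '<head><meta name="robots" content="noindex" /></head>'
    print(page_directives(sample_headers, sample_html))
```

The sketch only reports what the page declares; how a given search engine interprets each directive is documented by that search engine.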
Best practice
- If a specific page should not be crawled, the use of "disallow" in the robots.txt file is recommended.
- Annotate your content with tags, attributes and headers
- Check the Search Console Help
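Whether a Disallow rule covers a given page can be tested locally before deploying. The sketch below uses Python's standard `urllib.robotparser` with the rules from the robots.txt example above; the helper names `load_rules` and `is_crawlable` and the path `/products` are assumptions made for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt example in this section.
ROBOTS_TXT = """\
Sitemap: https://www.yourwebsite.ch/sitemap.xml
User-agent: *
Disallow: /kasse
Disallow: /cart
Disallow: /account
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)


def is_crawlable(path, user_agent="*"):
    """Return True if the rules allow `user_agent` to fetch `path`."""
    return rules.can_fetch(user_agent, "https://www.yourwebsite.ch" + path)


if __name__ == "__main__":
    # /products is a hypothetical path that no rule mentions.
    for path in ("/cart", "/account", "/products"):
        print(path, is_crawlable(path))
```

Disallow rules match by path prefix, so `/cart/items` is treated the same way as `/cart` in this sketch.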
Check
- Synchronize crawler access rules with your sitemap
- Check your robots.txt for entries you don't want to have indexed and annotate the content with meta tags.
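One way to keep the sitemap and the crawler rules in sync is to list every sitemap URL that robots.txt disallows. This is a rough Python sketch under stated assumptions: the function name `blocked_sitemap_urls` and the sample sitemap content are invented for the example, and the sitemap is assumed to be a plain urlset file rather than a sitemap index.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace used by sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="*"):
    """Return the sitemap URLs that the robots.txt rules disallow."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]
    return [url for url in urls if not rules.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /cart\n"
    # Hypothetical sitemap content, for illustration only.
    sitemap_xml = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.yourwebsite.ch/</loc></url>"
        "<url><loc>https://www.yourwebsite.ch/cart</loc></url>"
        "</urlset>"
    )
    print(blocked_sitemap_urls(robots_txt, sitemap_xml))
```

Any URL the function returns is listed in the sitemap yet blocked for crawlers, so one of the two files probably needs to change.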