Google, Bing and Qwant need access: here is how to set up your robots.txt file correctly.
How to check your robots.txt file: this entry explains how to do it right and, in turn, improve your site’s SEO results and traffic.
We address three questions:
1. What do robots or crawlers do?
2. What does it mean if a robot cannot crawl the content of your blog or webpage?
3. How do you best set up your robots.txt file?
Improve your search positioning and get the latest news on your mobile.
By default, everything on your blog that visitors can see can be indexed by search engines like Google or Qwant. Indexed content will show up in search results. Most blogs receive about 40 to 70 percent of their visitor traffic from search engines.
You can prevent search engines from indexing certain pages by editing your robots.txt file. However, this is usually not in the blogger’s best interest. Below we outline what your robots.txt file must contain to allow Google or DrKPI to crawl your blog.
What do robots do?
Web robots are sometimes also called web wanderers, crawlers or spiders.
Robots perform various tasks. In this context, we are interested in their work regarding:
1. Site indexing – they take a copy of a website they find and store this information on the search engine’s servers.
2. Validating the site code – this means comparing the website code to W3C standards and grading the code according to accuracy.
3. Link checking – this includes tracing incoming and outgoing links.
What should you check for?
While the robots.txt file is a great thing, we can inadvertently make errors that result in outcomes we do not want. For instance, I recently informed a blogger that we could not scan his site. He wrote in reply:
“All allowed, only sub-directories are not.”
The above indicates, however, that much more is being disallowed than the scanning of sub-directories. For example:
This command prevents search engines from indexing the blog’s posts (that is, if they respect the blogger’s wish – DrKPI does, Google…?). To allow indexing of the blog’s content, the command needs to be changed to:
In turn, Google and DrKPI can index the blog posts of this site (read also WordPress – section on robots.txt optimization).
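As a rough illustration of the difference, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the blocking rule was a blanket `Disallow: /` and the fix an empty `Disallow:`. The blogger's actual file is not reproduced here, and `example.com`, the paths and the `is_allowed` helper are placeholders introduced for this example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed example of an over-broad rule: blocks the whole site for every crawler.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# An empty Disallow value means "nothing is disallowed".
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Blocks only one sub-directory (hypothetical path); posts stay crawlable.
BLOCK_SUBDIR = """\
User-agent: *
Disallow: /private/
"""

POST = "https://example.com/2014/my-post/"  # placeholder URL

print(is_allowed(BLOCK_ALL, "Googlebot", POST))     # False
print(is_allowed(ALLOW_ALL, "Googlebot", POST))     # True
print(is_allowed(BLOCK_SUBDIR, "Googlebot", POST))  # True
print(is_allowed(BLOCK_SUBDIR, "Googlebot", "https://example.com/private/notes"))  # False
```

If the intention really is "all allowed, only sub-directories are not", the third variant is closer to that wish than a blanket `Disallow: /`.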
Do I have a choice?
In order to allow us to index your blog, while possibly still preventing others from doing so, we need you to add the following two lines to your robots.txt file:
If the above has been added, the DrKPI-bot can then go ahead and crawl your blog, even if you use the command:
Therefore, by entering the above command, you allow us to crawl your blog’s content and index it. In return, you get the actionable metrics you want in order to improve the blog’s performance for your business.
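The idea of admitting one crawler while turning the others away can be sketched as follows. This is only an illustration under stated assumptions: `DrKPI-bot` is used here as a hypothetical user-agent token (the exact token is not given in this post), and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# "DrKPI-bot" is a hypothetical user-agent token used for illustration only.
RULES = """\
User-agent: DrKPI-bot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

URL = "https://example.com/2014/my-post/"  # placeholder URL

# The named bot matches its own group, whose empty Disallow permits everything.
named_bot_ok = parser.can_fetch("DrKPI-bot", URL)
# Any other bot falls through to the catch-all group, which blocks the site.
other_bot_ok = parser.can_fetch("SomeOtherBot", URL)

print(named_bot_ok, other_bot_ok)  # True False
```

A crawler that honours robots.txt applies the group that names it and only otherwise the `*` group, which is why the specific entry takes effect despite the blanket block.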
PS: 99.68% of all bloggers allow us to crawl their site. Accordingly, we provide them with the actionable metrics needed to improve their blog.
Register free and join other top bloggers that allow DrKPI to benchmark their content – you will be glad you did.
Interesting read: The robots.txt website with tips and tricks
What can you do?
1. Go to your robots.txt file and check whether it allows your site to be crawled. Make sure it does (see above on how to do it).
2. Allow trackbacks and pingbacks on your blog.
3. Check again: have you set things up properly?
If you want to see your site’s or blog’s robots.txt file, just add /robots.txt to your domain, such as:
QUICK CHECK: http://blog.DrKPI.com/robots.txt (click now to view)
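The same quick check can be scripted. The sketch below builds the robots.txt address from any page URL and, optionally, downloads the file to test a page against it. The function names `robots_txt_url` and `live_check` and the path `/some-post/` are assumptions made for this example; `live_check` needs network access and is therefore only defined here, not called.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt address for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def live_check(page_url: str, agent: str = "*") -> bool:
    """Download the site's robots.txt and report whether `agent` may fetch the page.

    Requires network access, so it is not run when this file is loaded.
    """
    parser = RobotFileParser()
    parser.set_url(robots_txt_url(page_url))
    parser.read()
    return parser.can_fetch(agent, page_url)


print(robots_txt_url("http://blog.DrKPI.com/some-post/"))
# http://blog.DrKPI.com/robots.txt
```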
Interesting read: Make use of the robots.txt file & ensure it’s working by Patrick Sexton
Have we forgotten to mention something?
How much of your blog’s or website’s traffic comes via search engines?
Thanks again for sharing your insights – I always appreciate your very helpful feedback.
Hooray – you read the whole post by author Urs E. Gattiker – aka DrKPI! Want to hang out more? Check out the news updates on Twitter, join our Social Media Monitoring discussion group on Xing, chat with us on Google+, and receive fortnightly updates and behind-the-scenes scoops through our newsletter.