Have you heard about robots.txt? Not really? Right, it sounds technical. But don't worry. I know about it, so you're saved from cracking your head. I will explain it to the level I know and understand its workings.
If you're wondering what this robots.txt has to do with websites or blogs, let's get a bit technical. It's a simple protocol we define for our blogs/websites that tells crawlers how to behave in the internet world. Or we can say it's a file through which various crawlers and search engines are directed how to access and index the contents of our websites.
Robots.txt basically does the following:
- Helps Google crawl the right content on our blogs/websites.
- Blocks Google from accessing selected posts/pages.
- Blocks unwanted URLs.
Every Blogger blog (and most websites) has an inbuilt default robots.txt. To see the default robots.txt of a blog, just add /robots.txt after its URL.
My blog's URL is www.stcbmtl.blogspot.com. If I want to see its robots.txt, I add /robots.txt after that URL, so the address bar shows www.stcbmtl.blogspot.com/robots.txt. Once I press Enter, the robots.txt of my blog will be displayed.
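The exact file can vary, but the default robots.txt Blogger generates looks something like this (with the sitemap line carrying your own blog address):

```
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://www.stcbmtl.blogspot.com/sitemap.xml
```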
The first stanza addresses Mediapartners-Google, which is Google's AdSense crawler. One thing to note here: an empty Disallow: line blocks nothing, so by default this crawler is actually allowed to access all the contents. It is only Disallow: / (with a slash) that would debar it from the whole blog, and that would hurt how well ads match my content. If I write Allow: / instead of the empty Disallow:, the meaning stays the same but the permission becomes explicit: Mediapartners-Google may crawl and access all the contents.
In the second stanza of my robots.txt file, a * signifies all other robots and search engines. When it says Disallow: /search, it's blocking any URL whose path begins with /search (Blogger's search and label pages). Allow: / then directs all those search engines to crawl and index everything else except the URLs under /search.
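To see how a crawler actually reads these two stanzas, here is a small sketch using Python's built-in robots.txt parser. The rules below mirror the typical default file, and the post URLs are just made-up examples from my blog's address:

```python
from urllib import robotparser

# Rules mirroring a typical default Blogger robots.txt.
rules = """
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# An ordinary bot ("*") may fetch a normal post URL...
print(parser.can_fetch("*", "http://www.stcbmtl.blogspot.com/2020/01/my-post.html"))

# ...but not a label/search page, because of Disallow: /search.
print(parser.can_fetch("*", "http://www.stcbmtl.blogspot.com/search/label/news"))

# The empty Disallow: means Mediapartners-Google is blocked from nothing.
print(parser.can_fetch("Mediapartners-Google", "http://www.stcbmtl.blogspot.com/search/label/news"))
```

The first and third checks come back allowed, while the second is refused, exactly the behaviour the two stanzas describe.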
Let me make it simple. Say I want to let all bots access and crawl my blog's contents because I want to increase pageviews. I would then change the first two stanzas of my robots.txt file. I don't want to add any exceptions, so I change the empty Disallow: into Allow: / in the first stanza and remove the whole Disallow: /search line from the second. As I said, I want all search engines to crawl my blog's contents to improve its pageviews and traffic.
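With those edits applied, the file would end up looking roughly like this (again with your own blog address in the sitemap line):

```
User-agent: Mediapartners-Google
Allow: /

User-agent: *
Allow: /

Sitemap: http://www.stcbmtl.blogspot.com/sitemap.xml
```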
Bonus: How can you change the default robots.txt file of your blog?
Login > Settings > Search preferences > Crawlers and indexing > Custom robots.txt > Yes > paste your default robots.txt > edit it > Save changes.
We also have a Sitemap line in the last stanza of the default robots.txt file of our blogs. What is this? It's a map of our content that search engines use when they crawl the blog. It's important, so you shouldn't remove it.
But if you have quite a number of videos and images on your blog and you don't want them to be left un-crawled and un-indexed, you can add another two sitemaps for them just below your content sitemap.
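Something like the following, where the image and video sitemap filenames are only placeholders (check what sitemap URLs your platform actually generates before pasting them in):

```
Sitemap: http://www.stcbmtl.blogspot.com/sitemap.xml
Sitemap: http://www.stcbmtl.blogspot.com/sitemap-image.xml
Sitemap: http://www.stcbmtl.blogspot.com/sitemap-video.xml
```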
Therefore, we can say that robots.txt is a powerful tool for controlling how different search engines and robots access your blog. You can add many exceptions too: you can block specific crawlers, certain posts and pages, images and also videos. Keep in mind, though, that robots.txt is only a request that well-behaved bots honor, not a security wall. The only thing is that you should know how to edit it, because incorrect usage of robots.txt will badly hurt your blog's ranking in various search engines. Better safe than sorry.
I hope you'll look at your default robots.txt file and then make the necessary changes. If things don't turn out as you'd like, leave a comment. I will try to help to the level I know about it. Thanks!