What is the use of a robots.txt file?
The robots.txt file is a plain text file that tells web robots (most often search engine crawlers) which pages on your site they may crawl, and which pages they may not. In other words, by adding Allow and Disallow rules to robots.txt, you can control which parts of your website crawlers visit.
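As a sketch, a minimal robots.txt might look like this (the `/private/` directory and sitemap URL are hypothetical, used here only for illustration):

```
# These rules apply to all crawlers
User-agent: *
# Block the (hypothetical) private section
Disallow: /private/
# Everything else may be crawled
Allow: /

# Optional: point crawlers at the sitemap
Sitemap: https://www.example.com/sitemap.xml
```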
What is a Web crawler?
A Web crawler, sometimes called a spider, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).
Web search engines and some other sites use crawling or spidering software to update their own content or their indices of other sites' web content. Crawlers copy pages for processing by a search engine, which indexes the downloaded pages so that users can search them more efficiently.
Search engines such as Google continuously send bots across the World Wide Web to browse websites and index their content in advance. When a user then searches for a term or keyword, the engine looks up this index and returns the most relevant results. The software that does the browsing and collecting is the web crawler.
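Well-behaved crawlers consult robots.txt before fetching pages. As a small illustration, Python's standard-library `urllib.robotparser` can check whether a given URL is allowed; the rules and URLs below are hypothetical, modelled on a typical WordPress setup:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, similar to a typical WordPress robots.txt.
rules = """
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The admin area is disallowed, ordinary pages are not.
print(parser.can_fetch("*", "https://example.com/wp-admin/"))     # False
print(parser.can_fetch("*", "https://example.com/blog/post-1/"))  # True
```

In a real crawler you would call `parser.set_url(".../robots.txt")` and `parser.read()` to load the live file instead of parsing a hard-coded string.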
Optimizing robots.txt for SEO
How you optimize robots.txt depends on the content of your site. There are all kinds of ways to use robots.txt to your advantage.
I’ll go over some of the most common ways to use it.
(Keep in mind that you should not rely on robots.txt to hide pages from search engines: a disallowed page can still appear in search results if other sites link to it. Use a noindex directive for pages that must stay out of results.)
One of the best uses of the robots.txt file is to maximize search engines’ crawl budgets by telling them to not crawl the parts of your site that aren’t displayed to the public.
For example, if you visit the robots.txt file for this site (sunilbutolia.com), you’ll see that it disallows the login page (wp-admin).
Since that page is just used for logging into the backend of the site, it wouldn’t make sense for search engine bots to waste their time crawling it.
(If you run WordPress, you can use that exact same Disallow line.)
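For reference, this is roughly what the default robots.txt generated by WordPress looks like (the exact output can vary by version and by installed plugins):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

The extra Allow line keeps `admin-ajax.php` reachable, since some front-end features depend on it even for visitors who never log in.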