
Robots.txt Configuration

The robots.txt file, defined by the Robots Exclusion Standard, is essential for how your Magento 2 store interacts with search engine crawlers. It tells bots which pages of your online store should be excluded from scanning and which are open for crawling. Hence, the robots.txt file is necessary for website indexation and search visibility.

Magento 2 allows you to generate and configure this file out of the box. You can also define custom instructions for different search engines or use the default indexation settings.

Follow these steps to configure the robots.txt file:

  • Navigate to Content -> Design -> Configuration and edit the desired store view.
  • Expand the Search Engine Robots section and choose a value for Default Robots from the drop-down.
  • Add your custom instructions to the robots.txt file in the corresponding field.
  • Click Save Configuration to complete the process.

We recommend using the following custom robots.txt for your Magento 2 store:

User-agent: *
Disallow: /*?
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /wishlist/
Disallow: /admin/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /review/product/
Disallow: /sendfriend/
Disallow: /enable-cookies/
Disallow: /LICENSE.txt
Disallow: /LICENSE.html
Disallow: /skin/
Disallow: /js/
Disallow: /directory/
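Before deploying rules like these, you can sanity-check the simple prefix rules with Python's standard `urllib.robotparser`. Note that the stdlib parser matches plain path prefixes only and does not understand wildcard patterns such as `Disallow: /*?`, so the sketch below uses a prefix-only subset of the recommendations above; the URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# A prefix-only subset of the recommended rules (the stdlib parser
# ignores wildcard patterns such as "Disallow: /*?").
rules = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
Disallow: /admin/
Disallow: /catalogsearch/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Ordinary catalog pages stay crawlable:
print(rp.can_fetch("*", "https://example.com/mens-shirts.html"))  # True
# Checkout and account pages are blocked:
print(rp.can_fetch("*", "https://example.com/checkout/cart/"))    # False
```

The same parser can also point at your live file via `set_url(...)` and `read()` once the configuration is saved.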
Let's consider each group of commands separately. Blocking user account and checkout pages from search engine robots:

Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/

Blocking native catalog and search pages:

Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/

Webmasters sometimes also block pages with filter and sorting parameters, since such URLs multiply duplicate content:

Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*
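Another option for such parameterized URLs is a canonical tag that points crawlers at the clean category URL. A minimal illustration (the domain and path are hypothetical):

```html
<!-- In the <head> of a filtered page such as /shoes?dir=asc&limit=all -->
<link rel="canonical" href="https://www.domain.com/shoes" />
```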
This is one more reason to use the canonical tag on such pages.

Blocking CMS directories:

Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /pub/

These commands are not strictly necessary: search engines are smart enough to avoid including CMS files in their index.

Blocking duplicate content:

Disallow: /tag/
Disallow: /review/

Don't forget about domain and sitemap pointing:

Host: www.domain.com
Sitemap: http://www.domain.com/sitemap_en.xml

Meta tags: NOINDEX, NOFOLLOW

The NOINDEX and NOFOLLOW tags hide unwanted parts of a site from crawlers. NOINDEX excludes a piece of text or an entire page from indexation, while NOFOLLOW, as an attribute of the <a> tag, prohibits the transfer of page weight to an unverified source.

To apply NOFOLLOW or NOINDEX to your current configuration, you can update the robots.txt file or use the meta name="robots" tag.

All possible combinations:
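The robots meta tag accepts the index/noindex and follow/nofollow values, which yields the four standard combinations:

```html
<meta name="robots" content="index, follow" />    <!-- default: index the page, follow its links -->
<meta name="robots" content="noindex, follow" />  <!-- keep the page out of the index, follow links -->
<meta name="robots" content="index, nofollow" />  <!-- index the page, do not follow its links -->
<meta name="robots" content="noindex, nofollow" /><!-- keep out of the index, do not follow links -->
```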

Add the following code to the robots.txt file to hide specific pages:

User-agent: *
Disallow: /myfile.html
Alternatively, you can prohibit indexation with the robots meta tag in the page's <head> section:

<meta name="robots" content="noindex, nofollow" />

Vital notification

Noindex and Nofollow tags have advantages over blocking a page with robots.txt:

  • robots.txt only prevents a page from being crawled during a scheduled crawl of your site; the page can still be reached and indexed through links from other websites.
  • With the meta tags, if a page has inbound links, its weight is still passed on to other pages of the website through internal links.

Use the instructions above to manually configure the robots.txt file for your Magento 2 store, keep unnecessary sections out of the index, and control how page weight is distributed.


Last update: 2022-06-29