Monitoring your site in Google Webmaster Tools regularly is critical to the success of your site. For the purpose of this article, we are focusing specifically on monitoring the number of pages on your site that appear in Google’s index.
This number will naturally fluctuate as you add pages to and remove pages from your site. However, we are approaching this from a troubleshooting perspective. This article is a resource to help you identify significant, unplanned spikes and dips in the number of indexed pages.
This chart is a basic overview of some standard troubleshooting to isolate and resolve the root cause(s) of indexation issues. Supporting documentation and resources can be found below.
Check Index Status
Index Status is a module in Google Webmaster Tools that displays the total number of indexed pages over a rolling 90-day period. Look for any significant spikes or dips in the number of pages.
Index Status is located under Google Index in the GWT sidebar navigation
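As a rough illustration of what "significant" might mean, the sketch below flags week-over-week changes in indexed page counts that exceed a chosen percentage. The function name, the 20% threshold and the sample numbers are all assumptions made for this example. GWT does not provide this calculation, so the counts would need to be recorded by hand from the Index Status chart.

```python
from typing import List, Tuple


def find_significant_changes(
    counts: List[int], threshold: float = 0.2
) -> List[Tuple[int, str, float]]:
    """Flag changes between consecutive samples whose relative size
    meets or exceeds `threshold` (0.2 means 20%, an arbitrary choice)."""
    flagged = []
    for i in range(1, len(counts)):
        previous, current = counts[i - 1], counts[i]
        if previous == 0:
            continue  # relative change is undefined when starting from zero
        change = (current - previous) / previous
        if abs(change) >= threshold:
            kind = "spike" if change > 0 else "dip"
            flagged.append((i, kind, change))
    return flagged


# Hypothetical weekly indexed-page counts, oldest first.
weekly_counts = [1000, 1010, 1005, 700, 710, 1100]
for index, kind, change in find_significant_changes(weekly_counts):
    print(f"week {index}: {kind} of {change:+.0%}")
```

The right threshold depends on how often pages are normally added to or removed from the site, so treat 20% as a starting point rather than a rule.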
If the number of indexed pages in GWT dips, something is preventing Google from crawling the site or from indexing certain pages. This can be the result of the following:
- Site Errors
- Robots.txt Settings
- Google Penalties
Resolving these errors is important because you want to ensure that all of your content is visible to crawlers so it can be indexed and accessed by users.
Site Errors in GWT are errors that prevent the crawler from accessing the site. They include DNS communication issues, server connectivity problems (the server is down or Google is being blocked) and failures to retrieve the robots.txt file.
Site Errors is at the top of the Crawl Errors module, under Crawl in GWT
Site Errors are also displayed in the Crawl Errors section of the GWT Site Dashboard.
The three types of Site Errors are displayed in tabs at the top of the window. Tabs will have a green check mark if no errors are found or a red exclamation point to designate errors. Click on each tab to display the specific errors.
DNS Errors are produced when Google cannot resolve the hostname of a site or the request times out.
Server Connectivity Errors occur when Google has trouble connecting to the server, loses an established connection or receives a bad response from the server.
Robots.txt Failures occur when a request for the robots.txt file returns neither a 200 response (the file exists) nor a 404 response (the site doesn’t have one). If Google cannot confirm whether or not a robots.txt file exists, it will postpone crawling the site to mitigate the risk of crawling and indexing content that is restricted by the robots.txt file.
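The rule described above can be sketched as a small decision function. This is a simplified model for illustration only, not Google's actual implementation. The function name and the returned labels are assumptions made for this example.

```python
def robots_txt_outcome(status_code: int) -> str:
    """Simplified model of how a crawler might react to the HTTP status
    returned for /robots.txt, following the description in this article."""
    if status_code == 200:
        # The file exists, so its rules can be read and obeyed.
        return "crawl, obeying robots.txt rules"
    if status_code == 404:
        # The site has no robots.txt, so nothing is restricted.
        return "crawl, no robots.txt restrictions"
    # Any other response leaves the file's existence unconfirmed.
    return "postpone crawl"


for code in (200, 404, 503):
    print(code, "->", robots_txt_outcome(code))
```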
Blocked URLs in GWT are caused by robots.txt commands that prevent Google from crawling and indexing certain pages and directories.
Blocked URLs is under Crawl in GWT
This module includes an overview of the robots.txt file and lets you modify and test it. However, changes made here do not affect the actual robots.txt file, so be sure to apply any necessary changes to the file on your server.
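Robots.txt rules can also be tested outside of GWT. The sketch below uses Python's standard urllib.robotparser module to list which URLs a set of rules would block. The helper name, the sample rules and the example.com URLs are assumptions made for this example. Note that this parser does simple prefix matching and may not interpret wildcard patterns the way Googlebot does, so treat it as a first check rather than a substitute for the GWT tester.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_urls(
    robots_txt: str, urls: Iterable[str], user_agent: str = "Googlebot"
) -> List[str]:
    """Return the URLs that the given robots.txt text disallows
    for `user_agent`, according to Python's standard parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, for illustration only.
rules = """\
User-agent: *
Disallow: /private/
"""
urls = [
    "https://www.example.com/",
    "https://www.example.com/private/report.html",
]
print(blocked_urls(rules, urls))
```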
Check for Google Penalties
To ensure quality content is being delivered to its users, Google created its Webmaster Guidelines. These guidelines are enforced by both manual reviewers and automated functions such as the Panda and Penguin algorithms.
More information about how Google manually reviews sites can be found in our blog article, Understanding Google Search Quality Rating Guidelines.
Manual Actions are penalties which are applied after a manual review of the problem. These penalties can be isolated to a specific section or applied to the entire site.
Manual Actions is under Search Traffic in GWT
Once the underlying issues are fixed, you can ask Google to lift a Manual Action by submitting a Reconsideration Request for review.
Algorithmic Penalties occur automatically when changes to the Google algorithms go live. Unfortunately, there is no reporting of this kind of penalty in GWT.
If you believe you’ve been impacted by Algorithmic Penalties, compare your Index Status trends to see if they correspond with the algorithm updates. A comprehensive list of updates can be found in Moz’s Google Algorithm Change History article.
If the drops correspond to the updates, then you will need to review the details of the update and find the violations of the Quality Guidelines that correspond with the update (content, links, etc.).
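The comparison between drops and updates can be sketched as a simple date check. The function name, the 14-day window and every date below are placeholders invented for this example. They are not real algorithm update dates. Real dates would come from a published change history, and the drop dates from your own Index Status chart.

```python
from datetime import date, timedelta
from typing import List


def updates_near_drop(
    drop_date: date, update_dates: List[date], window_days: int = 14
) -> List[date]:
    """Return the updates released on, or up to `window_days` before,
    the date on which a drop in indexed pages was observed."""
    window = timedelta(days=window_days)
    return [
        update
        for update in update_dates
        if timedelta(0) <= drop_date - update <= window
    ]


# Placeholder dates, NOT real algorithm updates.
hypothetical_updates = [date(2014, 1, 10), date(2014, 3, 1)]
observed_drop = date(2014, 3, 8)
print(updates_near_drop(observed_drop, hypothetical_updates))
```

A match only shows that the timing lines up. It does not prove the update caused the drop, so the details of the update still need to be reviewed.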
Unfortunately, Algorithmic Penalties cannot be resolved with a Reconsideration Request. After you resolve the issues, the affected URLs will be crawled again and added to the index if they meet the Quality Guidelines. You can expedite this process by manually submitting URLs to Google using the “Fetch as Google” feature in the “Crawl” menu.
If the number of indexed pages has increased unexpectedly, additional pages are crowding into the index alongside your real content. These pages are usually the result of the following:
- URL Errors
- Other factors that generate duplicate URLs on the site
Resolving increased pages is important because it ensures the search engines are crawling and indexing valuable content instead of wasting their bandwidth chasing error pages.
URL Errors are also located in the Crawl Errors module, just like Site Errors. URL Errors provides specific examples of URLs that return errors or of bad URL requests to the site.
URL Errors is at the bottom of the Crawl Errors module, under Crawl in GWT
First, select the tab that corresponds to the specific error. Then, look below the chart for the specific URLs. Clicking on them will provide details about the errors and from where they are linked.
Specific URLs are listed below the URL Errors graph
Clicking a specific URL will provide Error details and the Linked from information
Sudden spikes in errors can have several causes. A few examples are errors in the CMS that generate URLs with errors or an increase in external links to non-existent URLs on the site.
HTML Improvements displays any title tags and meta descriptions that are duplicated, missing or not properly optimized. For the purpose of researching spikes in indexed pages, we will focus on the duplicated tags.
HTML Improvements is under Search Appearance in GWT
Identical meta descriptions or titles shared by several versions of the exact same page can indicate that duplicate pages are being generated on the site.
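The same check can be run on your own crawl data. The sketch below groups URLs by title and reports any title used by more than one URL. The function name and the sample pages are assumptions made for this example, and the input list would need to come from your own crawl or export.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def duplicate_titles(pages: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (url, title) pairs by normalized title and keep only the
    titles that appear on more than one URL."""
    by_title = defaultdict(list)
    for url, title in pages:
        # Ignore case and surrounding whitespace when comparing titles.
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical pages: the same product page reachable with and
# without a tracking parameter.
pages = [
    ("https://www.example.com/widgets", "Blue Widgets"),
    ("https://www.example.com/widgets?ref=home", "Blue Widgets"),
    ("https://www.example.com/about", "About Us"),
]
print(duplicate_titles(pages))
```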
Continuously monitoring Index Status is one way to ensure your site is consistently performing and all of your content is available. This also allows you to identify and resolve any issues quickly to get your site back online or mitigate any spammy or low-value content.