A crawler, also known as a spider or a bot, is the software Google uses to process and index the content of webpages. The content crawler visits your site to determine its content in order to provide relevant ads.
Here are some important facts to know about the content crawler:
- The crawler report is updated weekly.
The crawl is performed automatically and we're not able to accommodate requests for more frequent crawling.
- The content crawler is different from the Google crawler.
The two crawlers are separate, but they do share a cache. We do this to avoid both crawlers requesting the same pages, which helps publishers conserve their bandwidth. Similarly, the Search Console crawler is separate.
- Resolving content crawler issues will not resolve issues with the Google crawl.
Resolving the issues listed on your Crawler access page will have no impact on your placement in Google search results. For more information on your site's ranking on Google, review the AdSense article on getting included in Google search results.
- The crawler indexes by URL.
Our crawler will access site.com and www.site.com separately. However, our crawler will not count site.com and site.com/#anchor separately.
- The crawler won't access pages or directories prohibited by a robots.txt file.
Both the Google and AdMob Mediapartners crawlers honor your robots.txt file. If your robots.txt file prohibits access to certain pages or directories, they will not be crawled. Note that if you're serving ads on pages that are blocked for all crawlers with the line User-agent: *, the content crawler will still crawl those pages. To prevent the content crawler from accessing your pages, you need to specify User-agent: Mediapartners-Google in your robots.txt file, as in the example below.
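As a minimal sketch, assuming you want to block the content crawler from your entire site and other crawlers only from a hypothetical /private/ directory (adjust the Disallow paths to your own site; a User-agent line has no effect without Disallow rules):

    # Block the AdSense/AdMob content crawler from the whole site.
    User-agent: Mediapartners-Google
    Disallow: /

    # Generic rules like this apply to other crawlers, but pages that
    # serve your ads can still be fetched by the content crawler unless
    # it is named explicitly, as above.
    User-agent: *
    Disallow: /private/

The file must sit at the root of your domain, for example https://example.com/robots.txt.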
- The crawler will attempt to access URLs only where our ad tags are implemented.
Only pages displaying Google ads should send requests to our systems and be crawled.
- The crawler will attempt to access pages that redirect.
When you have "original pages" that redirect to other pages, our crawler must access the original pages to determine that a redirect is in place. Therefore, our crawler's visits to the original pages will appear in your access logs, as in the sample entries below.
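For illustration only (Apache-style combined log format; the IP address, timestamps, and page paths are made-up placeholders), a redirect discovered by the content crawler might show up in your logs as two entries:

    203.0.113.7 - - [01/Mar/2024:10:15:30 +0000] "GET /old-page HTTP/1.1" 301 - "-" "Mediapartners-Google"
    203.0.113.7 - - [01/Mar/2024:10:15:31 +0000] "GET /new-page HTTP/1.1" 200 5120 "-" "Mediapartners-Google"

The first entry is the visit to the original page that confirms the redirect; the second is the fetch of the destination page.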
- Re-crawling sites
At this time, we're unable to control how often our crawlers index the content on your site. Crawling is done automatically by our bots. If you make changes to a page, it may take up to 1 to 2 weeks before the changes are reflected in our index.