Google Sitemaps - One Month Later

I have been generating and submitting Google Sitemaps for a month now, and I am very pleased with the results. My results are specific to generating the Sitemap using the weblog node method, which extracts URLs from my web server log files, and to re-generating and re-submitting every three days. I have not yet tried changing the submission interval. I have approximately 33 202 real pages on Belchfire.net.

The number of indexed pages on Google for Belchfire.net has increased incrementally since my first Sitemap submission: from about 50 000 before using Google Sitemaps (fairly consistent over the previous 3 years), to approximately 150 000 after the first submission, to over 250 000 after four weeks. This seems to indicate that Google records the submitted URLs and adds new URLs without deleting the ones from previous submissions, or at least not within the four weeks I’ve been using it. If things change I’ll let you know.

I have configured my Sitemaps config file to submit .php, .html and image files. As 250 000 indexed pages is over 7 times the actual number of real pages on my site, I am assuming that Google is indexing the same pages via several URLs. For example, each page on my site can be accessed via .php and .html, and in the case of the themes gallery, via 12 sorting options (6 sort fields: name, author, date, views, downloads, rating, each either ascending or descending). So quick math time:

Forums:
(5 449 posts X 4 [regular, printer friendly, lo-fi, email topic]) + 23 282 member info pages = 45 078 URLs

Themes Gallery:
4 470 pages + (4 470 X 12 optional sorting options) = 58 110 URLs

Total = (45 078 + 58 110) X 2 [.php and .html] = 206 376 URLs
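For anyone who wants to re-run the math with their own numbers, here is a minimal sketch of the same calculation in Python. The counts come straight from the figures above; the variable names are just my own labels, and the 250 000 figure is the approximate number reported by Google.

```python
# Counts taken from the post; the variable names are illustrative only.
forum_posts = 5449
forum_views_per_post = 4      # regular, printer friendly, lo-fi, email topic
member_pages = 23282

theme_pages = 4470
sort_fields = 6               # name, author, date, views, downloads, rating
sort_directions = 2           # ascending, descending
extensions = 2                # .php and .html

forum_urls = forum_posts * forum_views_per_post + member_pages
sort_variants = sort_fields * sort_directions
theme_urls = theme_pages + theme_pages * sort_variants
total_urls = (forum_urls + theme_urls) * extensions

reported_indexed = 250000     # approximate figure reported by Google
unexplained = reported_indexed - total_urls

print(forum_urls, theme_urls, total_urls, unexplained)
# prints: 45078 58110 206376 43624
```

That leaves roughly 43 600 indexed URLs that this estimate does not account for.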

Still below the reported 250 000 indexed. I can’t explain the discrepancy, other than to speculate that it could be due to images being indexed as URLs, or I’ve missed some forums view mode or variable that Google is finding.

This all assumes, of course, that all possible URLs have been submitted to Google. Under normal circumstances, using the weblog method, this would not be the case, since only URLs that were actually requested appear in the log. But I believe it is indeed the case here. Coincidentally, the night before generating my first Google Sitemap, I had used a third-party software program to generate static sitemap files by spidering my site. Assuming that software followed every available URL on my site, every URL would have been recorded in the weblog used to generate the Google Sitemap the next day. It was not my intention to do this, and it was not an attempt to inflate the number of URLs in my Google Sitemap, but it seems to have done just that. So just an FYI: if you use the weblog method to generate your Sitemap file(s), you would seem to be able to get a jump-start by spidering your own site before generating your Google Sitemap.
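To illustrate why a spider run ends up in the Sitemap, here is a rough sketch of the general idea behind a weblog-based approach: collect every successfully requested URL from the access log. This is not the actual Sitemap generator code. It assumes a Common Log Format access log, and the function name `urls_from_log`, the sample log lines and the `http://www.example.com` base URL are all made up for the example.

```python
import re

# Matches the request and status of a Common Log Format line,
# e.g. "GET /index.php HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def urls_from_log(lines, base_url, extensions=(".php", ".html")):
    """Return the sorted, de-duplicated URLs that were served with status 200."""
    urls = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        path, status = match.group(1), match.group(2)
        if status != "200":
            continue
        # Check the extension on the path only, ignoring any query string.
        if path.split("?", 1)[0].lower().endswith(extensions):
            urls.add(base_url + path)
    return sorted(urls)

# Hypothetical log lines, invented for this example.
sample_log = [
    '192.0.2.1 - - [01/Jul/2005:10:00:00 +0000] "GET /index.php HTTP/1.1" 200 5120',
    '192.0.2.1 - - [01/Jul/2005:10:00:05 +0000] "GET /gallery.php?sort=name&order=asc HTTP/1.1" 200 7300',
    '192.0.2.2 - - [01/Jul/2005:10:01:00 +0000] "GET /missing.html HTTP/1.1" 404 312',
    '192.0.2.1 - - [01/Jul/2005:10:02:00 +0000] "GET /index.php HTTP/1.1" 200 5120',
]

print(urls_from_log(sample_log, "http://www.example.com"))
```

Note that the query string is kept, so each sort variation of a gallery page counts as its own URL. A spider that follows every link generates one log line per URL variation, which is how they would all land in the next Sitemap.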
