by Maskil on May 22, 2008

The little bit I know about search engine optimisation (SEO) convinced me that I needed a sitemap for my baby (Altneuland). I was therefore delighted to discover the XML Sitemaps website, and submitted Altneuland to the Free Online Sitemap Generator, which indexes a maximum of 500 pages per sitemap. As I’ve only just reached the 100-post milestone, I wasn’t concerned about the limit.

The initial scan reported +/- 100 pages to scan and gave a reasonable estimate of the time to complete. During what appeared to be a second phase, however, it reported close to 500 pages to be indexed, and the whole process eventually took about 30 minutes to complete! Processing stopped once the 500-page limit was reached.

The output looked a little suspect to me, so I’ve posted a query on their forum. Judging by the poor quality of the responses to other queries, however, I’m not confident that I’ll get a solution. I certainly wouldn’t want to use the output as my sitemap until my concerns have been addressed. Yes, I know it’s a freebie, but I assume they have a business model for generating revenue from the free service. I certainly won’t be signing up for any premium service until I’m sure that the output is usable.

Of course, there’s still the question of how to tell Google Webmaster Tools where the sitemap sits, given that I don’t have access to the "public_html/" folder of the site. One must also ask whether a sitemap is really necessary; presumably Google already has a good handle on crawling Blogger blogs. The alternative seems to be using the Atom/RSS feed as the sitemap. Again, would this be any better than simply allowing Google to do its stuff?
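To make the feed idea a little more concrete, here is a rough sketch of how the post URLs in an Atom feed could be turned into a sitemap file. It is only an illustration under assumptions: the function names `post_urls_from_atom` and `build_sitemap` are made up for this example, it expects the feed text to have been downloaded already, and it says nothing about how Google itself treats a feed submitted as a sitemap.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def post_urls_from_atom(feed_xml):
    """Return one page URL per <entry> in an Atom feed (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    urls = []
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            # In Atom, a link without a rel attribute counts as "alternate",
            # i.e. the web page for the entry itself.
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
                break  # one URL per entry is enough
    return urls


def build_sitemap(urls):
    """Wrap a list of URLs in a minimal sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Because the feed lists each post exactly once, a sitemap built this way would contain roughly one entry per post, though only for as many posts as the feed actually exposes.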

Here’s my query on the Free Online Sitemaps Generator forum:

My site is a Blogger-hosted blog. I have posted just over 100 posts, and would therefore not expect my sitemap to exceed maybe 120 pages. Instead, I’ve been given the error message "Maximum 500 pages Limit Exceeded". When I view the HTML Sitemap, I see many pages that are duplicated numerous times. Any idea why this should be occurring?

My results appeared here:

*** Update ***

As expected, the response to my query was not worth much. It read:

sitemap generator indexes all pages found on the site, so you get more URLs found (including archives etc).

Huh? I replied:

That explanation is not really satisfactory. This blog has been in existence since +/- July 2007. Posts are archived on a monthly basis, giving a maximum of perhaps 12 archive pages. That still doesn’t get us anywhere near 500 pages; perhaps 120 at the most. Even the most cursory look at the HTML sitemap you generated will show that most pages appear multiple times in the listing.

Does this have anything to do with other sites I link to? The widgets in the sidebar?

Please could you investigate further? You may find that this is a problem affecting other sitemaps generated, e.g. other Blogger blogs.

This in turn brought the following response:

Hi, yes you are getting multiple listings for your posts. The reason being is the script is picking up several links in your code. For example, if you look at the source code for you will see the following valid links:

You can view the forum topic here.

OK, so the answer is basically that XML Sitemaps indexes all the pages you link to from your site and checks them off against your limit. Is this the way a sitemap is supposed to work? Would it do more harm than good to submit such a thing to Google (even assuming one overcomes the other obstacles)? I’ll let you know once I find out.
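For comparison, this is roughly what I would have expected a generator to do with the links it finds: keep only the pages on my own site, and count each of them once. The sketch below is my own assumption, not a description of how XML Sitemaps works; `own_pages` is a hypothetical helper, and the only normalisation it applies is dropping the "#..." fragment from each URL.

```python
from urllib.parse import urldefrag, urlsplit


def own_pages(found_links, site_host):
    """Keep each on-site URL once, in first-seen order (hypothetical helper)."""
    seen = set()
    pages = []
    for link in found_links:
        # Drop "#comments"-style anchors so they don't count as separate pages.
        url, _fragment = urldefrag(link)
        if urlsplit(url).hostname != site_host:
            continue  # a link to some other site, not one of my pages
        if url not in seen:
            seen.add(url)
            pages.append(url)
    return pages
```

Run over the links found on a blog with about 100 posts and a dozen monthly archive pages, a filter like this should land somewhere near the 120 pages I was expecting, rather than 500.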

In the meantime, my take is that XML Sitemaps is NOT a workable way to create a sitemap for a Blogger blog.


