In this edition of Path TV, Avinash Conda and Michael Stearne discuss XML sitemaps: What are they, what are they for, and who cares?
An XML sitemap is a necessary part of your website because it gives search engines a more efficient way to crawl your site. Essentially, a sitemap is a plain text file that sits on your server and gives search engines a list of all the pages on your website. Rather than having a search engine crawl your site and eventually find those landing pages, a sitemap directs the search engine straight to them. How can an XML sitemap improve your indexation? Google uses the data in your sitemaps to learn about your site’s structure and may use them as a factor in determining the canonical version of your URL.
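To make the idea of "a plain text file that lists your pages" concrete, here is a minimal sketch in Python that builds such a file. The element names (`urlset`, `url`, `loc`) and the namespace come from the sitemaps.org protocol; the function name `build_sitemap` and the example.com URLs are placeholders chosen for illustration, not something discussed in the video.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given page URLs.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in urls:
        url_el = ET.SubElement(urlset, "url")
        # <loc> holds the address of one page on the site.
        ET.SubElement(url_el, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs; a real site would list its own pages.
    print(build_sitemap([
        "https://www.example.com/",
        "https://www.example.com/about",
    ]))
```

The resulting text would typically be saved as a file such as sitemap.xml on the server, where search engines can fetch it.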
Without an XML sitemap, Google and other search engines have to crawl your website organically and hopefully locate the pages on your site. Avinash also adds that one limitation of a sitemap is that it can list only 50,000 URLs. Sites such as Amazon or Zappos are forced to create a sitemap index, which is essentially a sitemap of sitemaps. This entails making sitemaps for all of the products and listing those sitemaps in the sitemap index. Ultimately, your sitemap index points to your sitemaps, which point to the pages on your website.
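As a rough sketch of how a large site might work within the 50,000-URL limit, the Python below splits a long URL list into sitemap-sized chunks and builds a sitemap index that points to each chunk. The `sitemapindex`, `sitemap` and `loc` elements come from the sitemaps.org protocol; the function names, the `sitemap-N.xml` naming convention and the example.com base URL are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000    # URL limit per sitemap, as discussed above
MAX_SITEMAPS_PER_INDEX = 50_000  # an index can list up to 50,000 sitemaps


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def plan_sitemaps(urls, base_url="https://www.example.com"):
    """Map each (hypothetical) sitemap file URL to the page URLs it would list."""
    return {
        f"{base_url}/sitemap-{number}.xml": part
        for number, part in enumerate(chunk(urls), start=1)
    }


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) pointing to each sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # 120,000 placeholder product URLs need three sitemaps.
    pages = [f"https://www.example.com/product/{i}" for i in range(120_000)]
    plan = plan_sitemaps(pages)
    print(build_sitemap_index(list(plan)))
    # Capacity of a single index: 50,000 sitemaps x 50,000 URLs each.
    print(MAX_SITEMAPS_PER_INDEX * MAX_URLS_PER_SITEMAP)
```

The last line reproduces the arithmetic from the video: 50,000 sitemaps of 50,000 URLs each gives 2.5 billion URLs per sitemap index.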
To learn more about SEO tips and best practices, follow the Path Interactive blog.
0:13 Michael Stearne – Hello and welcome to another exciting edition of Path TV.
0:15 I am your co-host Michael Stearne and this is my co-host Avinash Conda.
And we wanted to talk today about XML sitemaps. What are they? What are they for? Who cares?
0:35 Avinash Conda – Who cares? The search engines care. All right, what are XML sitemaps? They are nothing but an index to your website.
4:47 Who looks at them? The search engines do. If you have, like, a hundred pages, you want search engines to index all of those pages. You create a sitemap and tell the search engines that this is the index to your website. So that’s the definition of an XML sitemap.
1:05 Michael Stearne – So if you go to sitemap.org or sitemaps.org, either one of them, there is a specification there that defines what should be in the sitemap.
1:19 A sitemap is simply a plain text file on your server that gives Google or Bing or Yahoo or whatever search engine a list of all the pages that exist on your website.
1:38 So if there is no explicit list like this, Google will have to organically sort of crawl around your website and hopefully find every single page that’s in your site.
1:50 Giving Google an XML sitemap directly helps it know exactly what you want indexed, more quickly and efficiently than if it were just crawling around the website.
2:05 Avinash Conda – Yeah, and there are actually a few limitations for a sitemap: you can give it only fifty thousand URLs. There is also a size limit. It’s 10 MB.
2:20 You can’t go over 10 MB per sitemap. So what about those huge sites with, like, millions of pages, like Amazon and all?
2:28 They actually have something called a sitemap index, which is an index for the sitemaps. So suppose you have products in ten different categories; then you can have millions of products in each category.
2:45 You make different sitemaps for all of the products. Segregate them with a good naming convention and put all of that in the sitemap-index, saying sitemap one is this, sitemap two is this and sitemap three is this.
3:00 So basically you can have a sitemap-index which then points to a sitemap which then points to your URLs. Because we have that 10 MB limit, each sitemap-index can index around 2.5 billion URLs, which is really huge.
3:17 Michael Stearne – Oh, it’s 10 megabytes per sitemap in total?
3:24 Avinash Conda – No, 10 megabytes per sitemap, so a sitemap-index is again a sitemap, which again accommodates 10 megabytes. So basically 10 megabytes of fifty thousand URLs pointing to another fifty thousand….
3:39 Michael Stearne – Pointing to thousands of other sitemaps, up to fifty thousand other sitemaps.
3:45 Avinash Conda – Yeah, fifty thousand times fifty thousand gives you 2.5 billion. So that’s the limit you can get.
3:53 Michael Stearne – So that’s the total addressable namespace for all sitemaps for a domain, for all URLs of a domain. So I guess the search engine will only index 2.5 billion pages.
4:06 Avinash Conda – No, the search engine will index more.
4:11 Michael Stearne – Yeah, we can only give them 2.5 billion.
4:12 Avinash Conda – per sitemap-index.
4:14 Michael Stearne – But you can have one level of sitemap-index. Correct?
4:19 Avinash Conda – Well, we can have more than that.
4:20 Michael Stearne – So if you have more than 2.5 billion pages on your site, you should be watching more advanced videos.
4:41 But that is a start as to what a sitemap is for and why you would go through the trouble: to really get Google and Bing more aware of pages on your site without having them organically crawl and try to find their way around your site from a page.
5:00 That’s about it.
5:03 And Avinash with his brand new sign-off. Don’t you have a sign-off?
5:08 Avinash Conda – I don’t have one. Thank you for watching, whenever you are watching, whether it is good morning, good evening or goodnight. Thank you.