まめーじぇんと@Tech

About tech topics (Android, GAE, Angular). Twitter: @mame01122

How to make the Google crawler crawl and index your Ajax (AngularJS) based web application

Let the Google crawler index your Ajax-based web application

Recently, I have been looking for a way to make the Google crawler crawl and index my SPA (Single Page Application) built on AngularJS. Today it finally indexed my web application, so I'd like to share how I got it to work.

I know there are plenty of web sites about SEO (Search Engine Optimization), but very few of them cover Ajax. I hope this article helps anyone who is having trouble getting an Ajax-based application crawled and indexed.

What is the minimum you need to let the Google crawler index your application?

As you might know, there are a lot of tasks and options for SEO, such as:

robots.txt
Sitemap
URL parameters
Structured data
Data Highlighter
HTML5Mode vs Hashbang

And so on.

I struggled to figure out which of these are mandatory and which are optional, but I have now reached my goal. It seems that the minimum work needed to get your application indexed is:

Sitemap.

That's it. Everything else is optional and not needed.

The following document does state that a sitemap is mandatory for indexing, but I think the statement is hard to notice:
https://developers.google.com/webmasters/ajax-crawling/docs/specification

"In order to crawl your site's URLs, a crawler must be able to find them"

(It is neither in bold nor highlighted in red.)

Some web pages about SEO say that robots.txt is mandatory. But as you can see below, robots.txt is not a mandatory item.
https://support.google.com/webmasters/answer/6062608?hl=ja

You only need a robots.txt file if your site includes content that you don't want Google or other search engines to index.

How to prepare sitemap.xml

Here is how to prepare sitemap.xml. The format of each entry looks like this:

<url>
	<loc>https://xxx.yyy.com/#!/</loc>
	<lastmod>2015-06-18</lastmod>
	<priority>1.0</priority>
	<changefreq>daily</changefreq>
</url>

You have to write one entry for each page you have.
You can refer to other web sites if you want more details. A sitemap can also be generated by many free services (if your application is NOT an Ajax-based application).

But...

I didn't know how to write sitemap.xml for an Ajax-based (AngularJS) application...
When I used an automatic generation service for my Ajax-based application (whose URLs contain # and !), it showed an error...

I had to work around these problems.

So I'd like to show you my sitemap.xml format. It looks like this:
sitemap_mobile.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
	xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0">
	<url>
		<loc>https://xxx.yyy.com/#!/</loc>
		<lastmod>2015-06-18</lastmod>
		<priority>1.0</priority>
		<changefreq>daily</changefreq>
	</url>
	<url>
		<loc>https://xxx.yyy.com/#!/signin</loc>
		<lastmod>2015-06-18</lastmod>
		<priority>1.0</priority>
		<changefreq>daily</changefreq>
	</url>
</urlset>

sitemap_pc.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
	<url>
		<loc>https://xxx.yyy.com/#!/</loc>
		<lastmod>2015-06-18</lastmod>
		<priority>1.0</priority>
		<changefreq>daily</changefreq>
	</url>
	<url>
		<loc>https://xxx.yyy.com/#!/signin</loc>
		<lastmod>2015-06-18</lastmod>
		<priority>1.0</priority>
		<changefreq>daily</changefreq>
	</url>
</urlset>
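
Since the automatic generation services did not accept URLs with # and !, a small script can write the same kind of file instead. The following is only a minimal sketch in Python, not the way the files above were actually made: the function name build_sitemap and its parameters are hypothetical, and it assumes every page lives under the "#!" prefix of a single base URL.

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, routes, lastmod, changefreq="daily", priority="1.0"):
    """Return sitemap XML (as a str) with one <url> entry per hashbang route.

    base_url and routes are illustrative inputs, e.g.
    "https://xxx.yyy.com" and ["/", "/signin"].
    """
    # Serialize the sitemap namespace as the default xmlns (no "ns0:" prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for route in routes:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # The "#!" prefix is an assumption; adjust it to your own settings.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = f"{base_url}/#!{route}"
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = priority
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap("https://xxx.yyy.com", ["/", "/signin"], date(2015, 6, 18)))
```

The output is written on a single line without indentation, which is still valid XML. The extra xmlns:mobile and xsi:schemaLocation attributes from the files above are not emitted by this sketch.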

As you might already know, Google has announced that it prioritizes mobile web sites, so I'd recommend creating a sitemap.xml for mobile as well.

Once you have created the XML file, deploy it to your service and submit its URL to Google (you can do this from Google Webmaster Tools).
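
Before deploying, it may be worth checking that the file parses as XML and lists the URLs you expect. Here is a minimal sketch using Python's standard library; the function name list_sitemap_urls is hypothetical, and the check only covers well-formedness and the loc entries, not full schema validation.

```python
from xml.etree import ElementTree as ET

# Namespace used by the <urlset> element of a sitemap file.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Parse sitemap XML and return the text of every <loc> entry.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    # Encode to bytes so the encoding declaration in the file is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


if __name__ == "__main__":
    sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://xxx.yyy.com/#!/</loc></url>
  <url><loc>https://xxx.yyy.com/#!/signin</loc></url>
</urlset>"""
    for url in list_sitemap_urls(sample):
        print(url)
```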

In my case, my web application was indexed in about one day. You can check whether it is indexed by searching on Google with the "http" or "https" part of the URL changed to "site", like this: "site://xxx.yyy.com/#!/". If your application is indexed, a list of the indexed pages will be shown.
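
If you have many pages to check, the query strings can be built with a few lines of Python. This is only a sketch of the scheme swap described above; the helper names site_query and google_search_url are hypothetical.

```python
from urllib.parse import quote_plus, urlsplit


def site_query(page_url):
    """Swap the "http"/"https" scheme of a URL for "site"."""
    scheme = urlsplit(page_url).scheme
    if scheme not in ("http", "https"):
        raise ValueError("expected an http or https URL: " + page_url)
    # Keep everything after the scheme, including "://" and the "#!" part.
    return "site" + page_url[len(scheme):]


def google_search_url(page_url):
    """Return a Google search URL whose query is the site check string."""
    return "https://www.google.com/search?q=" + quote_plus(site_query(page_url))


if __name__ == "__main__":
    print(site_query("https://xxx.yyy.com/#!/"))
    print(google_search_url("https://xxx.yyy.com/#!/"))
```

The query has to be percent-encoded because "#" would otherwise be treated as the start of a fragment in the search URL itself.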

The "#" and "!" part depends on your settings, so it may vary.

Now you have finished the minimum preparation needed to let the crawler index your web application!