Mobile SEO Dynamic Serving Tests

In December 2011 Google announced they were adding smartphone crawling to their mobile crawler Googlebot-Mobile, which at the time handled feature phone crawling. The key sentence from the announcement above was:

The content crawled by smartphone Googlebot-Mobile will be used primarily to improve the user experience on mobile search. For example, the new crawler may discover content specifically optimized to be browsed on smartphones as well as smartphone-specific redirects.

This new functionality from Google produced some very interesting behaviour in the mobile search results. In March 2012, I provided the research that helped Cindy Krum put together a piece for Search Engine Land on the impacts of the new smartphone crawler from Google.

Fast forward to January 2014 and Google announced another substantial change to how they were going to handle smartphone crawling moving forward. To simplify configuration for webmasters and in response to the prolific growth in smartphone usage, Google moved smartphone crawling from Googlebot-Mobile into Googlebot.

Google subsequently published new recommendations and guidelines for making a website mobile friendly. After reading through this documentation, there were still quite a lot of questions unanswered and it wasn’t clear if anything about Google’s position in 2011 had changed now that smartphone crawling had a new home within Googlebot.

Current State Of Play

Since June 2012 responsive web design has been the recommended approach by Google, however they also support mobile specific websites and dynamic serving. Websites that use responsive web design are helpful for Google, as they crawl the site once with Googlebot and get all the information needed. It gets more complicated for website owners and Google when mobile specific websites or dynamic serving is involved and this is where Googlebot (smartphone) plays a role in helping Google understand what user experience a website delivers.

The role of Googlebot-Mobile when it was crawling with a smartphone user agent, or now Googlebot (smartphone), is quite well understood for mobile specific websites. The smartphone crawler from Google will detect user agent based redirects, faulty redirects, mobile app download interstitials and a variety of other elements. Google uses this information to optimise the search experience for users by linking directly to mobile content and avoiding redirects where possible, correctly returning mobile optimised URLs thanks to rel=”alternate” tags and so forth.

What isn’t that well understood is how Google handles dynamic serving and what role Googlebot (smartphone) might play in that now that crawling responsibilities have been moved from Googlebot-Mobile over to Googlebot.

To help gain some additional clarity on the impact of dynamic serving in SEO, not just for mobile SEO but search engine optimisation in general – I put together a series of tests. The tests weren’t meant to be exhaustive but aimed to cover off enough functionality to better understand the impacts and risks of using dynamic serving and what role Googlebot and Googlebot (smartphone) may play in it.

Algorithms Determine If Smartphone Crawling Is Needed

Initially the tests were deployed onto Convergent Media, which runs WordPress and uses a responsive web design template. It only took a couple of days for Googlebot to discover and begin crawling through the test setup. After waiting a week, still no Googlebot (smartphone) – which I thought was odd at the time. Waiting, more waiting and more waiting, still no Googlebot (smartphone) crawling of any of the test URLs.

I reached out to John Mueller to ask about the situation I was seeing unfold and he said:

We don’t crawl everything as smartphone, but when we recognize it makes sense, we’ll do that. For responsive design, the good thing is that we don’t need to crawl it with a smartphone — once crawl is enough to get all versions.

Now it makes sense why Googlebot (smartphone) wasn’t visiting the test setup: Convergent Media uses a responsive web design and Google’s algorithms had decided it wasn’t needed.

It is worth noting that despite Googlebot discovering the test URLs and those URLs sending signals such as HTTP Vary response headers, that wasn’t a strong enough signal to trigger Googlebot (smartphone) to visit the site. If there were more pages in the site that were not using responsive web design, maybe that would have caused Googlebot (smartphone) to visit, but it wasn’t happening as part of the test setup.

Google have a vast amount of computing resource but double crawling every URL on the web was obviously out of the question. The comment from John, ‘when we recognize it makes sense’, got me thinking about the traits that Google might be looking for in a website that might trigger Googlebot (smartphone) to begin crawling a site, such as:

discover common ‘mobile website’ style links on the desktop website
discover links to mobile app stores, suggesting website owner is switched on/aware of mobile specific user experiences
discover rel=”alternate” mobile tags on desktop
discover HTTP Vary response header on desktop
m/mobile subdomain verified in Google Webmaster Tools
discover m/mobile subdomain XML sitemap referenced via desktop robots.txt file as a cross domain submission
discover m/mobile subdomain
crawl m/mobile subdomain taking note of key HTML elements like meta viewport
crawl m/mobile subdomain taking note of common HTML/CSS/JavaScript frameworks in use, such as jQuery Mobile
…

If you’re wondering why your site isn’t getting the attention you think it deserves moving forward from Googlebot (smartphone), it’d be worth considering some of the above points and others with respect to sending Google the right kind of signals that you’re in the mobile space.

Googlebot (smartphone) Mobile SEO Tests

Not wanting to be dissuaded from completing the test, I reached out to Dan Petrovic of Dejan SEO. While the Dejan SEO website runs on WordPress, it doesn’t currently have a responsive web design implemented. I asked Dan if he could check for Googlebot (smartphone) activity and if he’d be willing to host my test files, and the answer was yes on both counts!

Eight tests were implemented with a goal to determine:

if there are crawling differences with/without the HTTP Vary response header
if URLs served only to Googlebot (smartphone) are used for discovery
if URLs crawled only by Googlebot (smartphone) are indexed
if meta robots noindex tags served to Googlebot (smartphone) are actioned
if rel=”canonical” tags served to Googlebot (smartphone) are actioned
if HTTP X-Robots-Tag noindex headers served to Googlebot (smartphone) are actioned
if HTTP Link rel=”canonical” response headers served to Googlebot (smartphone) are actioned
if anchor text seen by only Googlebot (smartphone) has an impact on rankings

Test 1

Aim: Determine if there are crawling differences with/without the HTTP Vary response header.

Implementation:

Four files were created. Files 1 & 2 serve the same content to both Googlebot and Googlebot (smartphone), with file 2 adding a Vary header. Files 3 & 4 serve different content based on the user agent, with file 4 adding a Vary header.
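For illustration, the four files can be thought of as one handler with two switches. The following Python sketch is not the code used in the tests; the function names, the crude user agent check and the placeholder markup are all assumptions made for the example.

```python
def is_smartphone(user_agent: str) -> bool:
    """Crude user agent sniffing (an assumption for this sketch, not a robust detector)."""
    ua = user_agent.lower()
    return "iphone" in ua or ("android" in ua and "mobile" in ua)


def serve(user_agent: str, vary_content: bool, send_vary: bool):
    """Return (headers, body) for one test file.

    vary_content=False models files 1 & 2, vary_content=True models files 3 & 4.
    send_vary=True models the second file of each pair.
    """
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if send_vary:
        # Signals that the response depends on the User-Agent request header.
        headers["Vary"] = "User-Agent"
    if vary_content and is_smartphone(user_agent):
        body = "<p>mobile version</p>"
    else:
        body = "<p>desktop version</p>"
    return headers, body
```

Calling serve() with a desktop and a smartphone user agent for each combination of the two switches reproduces the four cases: same content without Vary, same content with Vary, different content without Vary and different content with Vary.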

Results:

No measurable crawling differences in the URLs. This could simply be that the sample size is very small or that adding a HTTP Vary response header on its own isn’t a sufficiently strong signal to influence crawl rate of Googlebot (smartphone).

Test 2

Aim: Determine if URLs served only to Googlebot (smartphone) are used for discovery.

Implementation:

Two test URLs were set up using dynamic serving, the latter of the two also uses the HTTP Vary header. The mobile versions seen by Googlebot (smartphone) both link to unique URLs not seen by Googlebot. The unique URLs are available for both Googlebot and Googlebot (smartphone) to crawl.

Results:

Only the two desktop URLs were indexed. The dynamic served content on those URLs can’t be queried for in Google successfully. The unique URLs only seen by Googlebot (smartphone) have been crawled by Googlebot (smartphone) only but have not been indexed.

Test 3

Aim: Determine if URLs crawled only by Googlebot (smartphone) are indexed.

Implementation:

Both tests use dynamic serving, the latter of the two also uses the HTTP Vary header. The mobile versions of both tests link to unique URLs not seen by Googlebot. The unique URLs are crawlable by Googlebot (smartphone), however Googlebot will receive a HTTP 403 (Forbidden) response code.

Results:

Only the two desktop URLs were indexed. Googlebot (smartphone) accessed the unique URL associated to file 1, however like Test 2 – it wasn’t indexed. Unfortunately Googlebot did not access either of the unique mobile URLs to receive a HTTP 403 response code however I suspect the outcome wouldn’t have changed.

Test 4

Aim: Determine if meta robots noindex tags served to Googlebot (smartphone) are actioned.

Implementation:

Both tests use dynamic serving, the latter also uses the HTTP Vary header. The mobile versions of both tests include a meta noindex tag that isn’t in the desktop HTML counterparts.

Results:

Both the desktop URLs are indexed. It appears the meta noindex tag served to Googlebot (smartphone) was ignored.

Test 5

Aim: Determine if rel=”canonical” tags served to Googlebot (smartphone) are actioned.

Implementation:

Tests 1 & 2 have no rel=”canonical” tag on the desktop content, but do on the mobile version, and the latter also has a Vary header. Tests 3 & 4 have different rel=”canonical” tags on the desktop and mobile versions, and the latter also includes a Vary header. Tests 5 & 6 return HTTP 403 (Forbidden) to Googlebot, while Googlebot (smartphone) can crawl the URLs, which both include a rel=”canonical” tag, and the latter also has a Vary header.

Results:

The desktop URLs of test 1 and 2 were indexed, ignoring the rel=”canonical” served to Googlebot (smartphone). The desktop URLs of test 3 and 4 were indexed, ignoring the rel=”canonical” tag served to Googlebot (smartphone). Test URLs 5 and 6 which served HTTP 403 response codes to Googlebot were not indexed and Googlebot (smartphone) didn’t crawl those URLs.

Test 6

Aim: Determine if HTTP X-Robots-Tag noindex headers served to Googlebot (smartphone) are actioned.

Implementation:

Tests 1 & 2 use dynamic serving and have HTTP X-Robots-Tag noindex response headers added to the mobile version with test 2 also including a HTTP Vary response header.

Results:

Both desktop URLs were indexed, the HTTP X-Robots-Tag noindex response header was ignored when served to Googlebot (smartphone).
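Tests 4 and 6 both come down to a crawler receiving a noindex directive that the other crawler never sees. As a rough way to compare what each user agent is served, the sketch below collects the robots directives from a response's headers and HTML. The helper names are hypothetical, and it deliberately ignores the user agent prefixed form of X-Robots-Tag.

```python
from html.parser import HTMLParser


class _MetaRobots(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(headers: dict, html: str) -> set:
    """Return every robots directive found in the headers and the HTML."""
    parser = _MetaRobots()
    parser.feed(html)
    found = set(parser.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found.update(d.strip().lower() for d in value.split(",") if d.strip())
    return found
```

Running it once against the desktop response and once against the mobile response for the same URL shows whether the two versions disagree, which is exactly the condition these tests set up on purpose.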

Test 7

Aim: Determine if HTTP Link rel=”canonical” response headers served to Googlebot (smartphone) are actioned.

Implementation:

Tests 1 & 2 use dynamic serving and include a HTTP Link rel=”canonical” response header in the mobile version of the page, the latter also includes HTTP Vary response headers.
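For readers who haven't seen the header form of rel=”canonical”, this minimal sketch builds the header value and reads the canonical URL back out of it. The function names and URL are placeholders, and the parser is a simplification that would break on URLs containing commas.

```python
import re


def canonical_link_header(url: str) -> str:
    """Build the value of an HTTP Link header carrying a canonical URL."""
    return '<{}>; rel="canonical"'.format(url)


def canonical_from_link_header(value: str):
    """Return the canonical URL from a Link header value, or None."""
    for part in value.split(","):
        match = re.match(r"\s*<([^>]*)>\s*;(.*)", part)
        if match and re.search(r'rel\s*=\s*"?canonical"?', match.group(2), re.I):
            return match.group(1)
    return None
```

In the tests this header was only added to the mobile version of the response, so the desktop response would yield None from canonical_from_link_header().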

Results:

The desktop URLs of both tests were indexed, ignoring the HTTP Link rel=”canonical” response headers served to the mobile versions. Like Test 5 above that was testing rel=”canonical” meta tags, the presence of the HTTP Link rel=”canonical” response header did trigger Googlebot (smartphone) to visit the referenced canonical URL.

Test 8

Aim: Determine if anchor text seen by only Googlebot (smartphone) has an impact on rankings.

Implementation:

Tests 1 & 2 use dynamic serving, the latter also has HTTP Vary headers. The mobile versions of each test link to a URL not seen by Googlebot with anchor text unrelated to the content on the linked URL. The unique URL linked to by the mobile versions of the content is accessible to Googlebot (smartphone), however Googlebot will receive a HTTP 403 (Forbidden) error trying to crawl the unique URLs.

Results:

The desktop URLs for each test were indexed, content served into the mobile specific versions couldn’t be queried. Like earlier tests, the HTTP 403 response code served to Googlebot for the unique mobile URLs meant those URLs weren’t indexed.

Conclusion

Now that the tests have been completed, it is helpful to have a much better understanding of the role of Googlebot (smartphone) and its capabilities now that smartphone crawling has been moved over to Googlebot.

The key take away points from the above:

Simply adding a HTTP Vary response header on its own didn’t appear to have an impact on crawl rate or the outcomes of any subsequent tests. However, it should be added when user agent detection is being used as it is a strong recommendation from Google and it represents best practice to help intermediate HTTP caches on the internet.
Googlebot (smartphone) is being used for URL discovery, via in-content links, meta rel=”canonical” tags and HTTP Link rel=”canonical” response headers.
Googlebot (smartphone) appears to ignore meta robots noindex, HTTP X-Robots-Tag noindex response headers, meta rel=”canonical” and HTTP Link rel=”canonical” directives. While an exhaustive list of all possible options was not tested, it is reasonable to assume that if common directives like meta noindex are ignored (something that Google will always honour via Googlebot), all other meta style directives will also be ignored.
Googlebot (smartphone) does not appear to index unique content that it processes via dynamic serving. It does not appear as though it is possible to query Google and return that unique content via a desktop browser or mobile device.
Googlebot (smartphone) appears to be used primarily for understanding web site user experience (ie, does website X provide a mobile experience) and optimising the user experience where possible (ie, skipping redirects leading to m.domain.com, returning the correct URLs in search results via rel=”alternate” meta tags).
Despite the amazing growth of mobile, it appears that crawling, indexing and ranking is largely based upon data processed by Googlebot.

Budweiser – Rapid Fire SEO Audit

SEMrush recently published an article titled Why Budweiser Gets An “F” In SEO.

A brand as massive as Budweiser should have no problems ranking in search engines for all manner of relevant terms, however based on the comments from Ryan Johnson – clearly that isn’t the case.

Following are some of the items identified after completing a rapid fire SEO website audit of budweiser.com to see what other sorts of issues might be causing them problems:

Domains
robots.txt
XML sitemaps
Internal Redirects
URL Canonicalisation
rel=”nofollow”
<title> tags
<hX> tags
Content
Structured Markup
Load Time Performance
Conclusion

Domains

Budweiser have a lot of sub-domains configured, which in and of itself isn’t a problem. However, it does become a problem when settings aren’t configured properly. In a few seconds the following list of sub-domains showed up:

www
m
mobile
riseasone
p12
madeinamerica
new
qa
qa.m
qa.new
qa.qr
qa.riseasone
origin
origin-www
qr

Many of the sub-domains are development versions of the site such as the qa.* or new. In an ideal world only the primary website would be crawled and indexed by search engines. Having so many copies of budweiser.com indexed poses a duplicate content issue for the site and could lead to problems down the road.

robots.txt

robots.txt files are used to control what content spiders are allowed to crawl, but they don’t control indexing (a common misconception). The robots.txt files used across many of the sub-domains listed above are incorrectly configured.

A few issues that appeared at a glance:

multiple blocks for the same user-agent
attempting to disallow a domain instead of a URL on the current domain
incorrect usage of the * and specific spider user agent blocks

In the first issue above, since the directives are split across multiple blocks, a spider could pick either block of directives without combining them, leaving half of the URLs that were meant to be blocked available for crawling.

The second issue is a massive problem, as Budweiser have disallow directives that attempt to block a domain from being crawled which isn’t a supported feature of the Robots Exclusion Protocol. As such, those domains which they had intended to block from crawling are available for crawling except for the URLs correctly specified within the robots.txt file.
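The first two issues are easy to check for mechanically. The sketch below is a rough lint, not a full robots.txt parser: it flags a user-agent name that appears in more than one place and any disallow value that doesn't start with a path character. The function name and the sample rules are made up for the example.

```python
def lint_robots(text: str) -> list:
    """Return (line number, message) pairs for two common robots.txt mistakes."""
    problems = []
    seen_agents = set()
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if value.lower() in seen_agents:
                problems.append((number, "user-agent already has a block: " + value))
            seen_agents.add(value.lower())
        elif field == "disallow":
            # Disallow values are paths on the current host, not domains or full URLs.
            if value and not value.startswith(("/", "*")):
                problems.append((number, "disallow value is not a path: " + value))
    return problems


sample = "User-agent: *\nDisallow: /tmp/\n\nUser-agent: *\nDisallow: qa.example.com\n"
issues = lint_robots(sample)
```

For the sample above, issues contains one entry for the repeated wild card block on line 4 and one for the domain style disallow value on line 5.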

Spiders that honor robots.txt will pick the most specific user-agent block for their crawler, falling back onto less specific, then into the wild card and if nothing is present they’ll assume the website is fully available for crawling.

The Budweiser robots.txt file has a block for Googlebot-Image, however it has no disallow directive. That block may be considered invalid and ignored entirely, causing Googlebot-Image to fall back to a less specific block if one exists. In this particular instance, since the less specific blocks allow images to be crawled, it is unlikely to be causing a problem but it should be corrected as a matter of hygiene.

Following on from the above, there are two Google-specific blocks defined within the robots.txt:

Googlebot-Image
Google

The support documentation by Google on their list of crawlers doesn’t mention a crawler named ‘Google’. All of the spiders that do support falling back to a less specific Google spider fall back to ‘Googlebot’. This behaviour documented by Google means that all directives specified in the block for the user agent ‘Google’ will be ignored and they’ll fall back to the wild card * entry.
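The fallback behaviour described above can be sketched as a lookup over the names a crawler answers to, most specific first. This is a simplified model with hypothetical paths, not Google's implementation, and it treats each block as a plain list of disallowed paths.

```python
def select_block(blocks: dict, tokens: list) -> list:
    """Return the disallow list that applies to a crawler.

    blocks maps a lowercased user-agent name to its disallowed paths.
    tokens lists the names the crawler answers to, most specific first.
    """
    for token in tokens:
        if token.lower() in blocks:
            return blocks[token.lower()]
    # No named block matched: use the wild card, or allow everything if absent.
    return blocks.get("*", [])


# Hypothetical rules shaped like the situation described above.
blocks = {
    "google": ["/never-applied/"],
    "*": ["/wildcard-only/"],
}
googlebot_rules = select_block(blocks, ["googlebot"])
image_rules = select_block(blocks, ["googlebot-image", "googlebot"])
```

Neither crawler answers to the name ‘google’, so in this model both end up with the wild card rules and the paths in the ‘google’ block are never applied.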

Next on the agenda are disallow directives in the wild card block not present in the name specific blocks. Again, while not a problem in and of itself – after reviewing the content of the blocks, Budweiser’s intention versus what is actually happening aren’t in sync.

XML Sitemap

Good news, Budweiser are generating an XML sitemap.

Even more good news, it is linked from robots.txt for easy discovery by all relevant bots.

Bad news, crawling www.budweiser.com returned 81 web pages however only 20 of those pages are listed within the XML sitemap to help search engines discover, crawl and index the Budweiser content.

More bad news, the XML sitemap links to broken URLs.
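A comparison like this is straightforward to script once you have a crawl. The sketch below assumes you already hold the set of crawled URLs and the sitemap XML as a string; the function names and report keys are my own, and it only handles a plain urlset rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set:
    """Extract every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def coverage(crawled: set, listed: set) -> dict:
    """Compare crawled URLs against the URLs listed in the sitemap."""
    return {
        "missing_from_sitemap": sorted(crawled - listed),
        "in_sitemap_not_crawled": sorted(listed - crawled),
    }
```

URLs under in_sitemap_not_crawled are worth requesting directly, since that is where broken sitemap entries tend to show up.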

Internal Redirects

Crawling www.budweiser.com with a tool identified over 6,500 internal redirects within the site.

Each time Google processes a redirect, a small amount of the equity that Google would have passed to the linked URL, in the 10-20% range, is needlessly lost.

While Budweiser are correctly using HTTP 301 permanent redirects, they should simply update their internal links to point directly to the intended URL. In time the site will recover the lost equity and it has the added benefit of speeding the site up slightly for users as well.
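Updating internal links means knowing where each redirecting URL finally lands. Assuming the crawl has been exported as a simple mapping of source URL to redirect target, a small helper like this hypothetical one can follow each chain to its end.

```python
def final_target(url: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow a redirect mapping until a URL that does not redirect is reached."""
    seen = {url}
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise ValueError("too many redirects")
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError("redirect loop at " + url)
        seen.add(url)
    return url
```

Each internal link can then be rewritten to final_target(link, redirects), which removes every intermediate hop in one pass.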

URL Canonicalisation

URL canonicalisation is a process where one true URL is defined for a given resource.

To provide an example, search engines might find thousands of links to the home page of www.budweiser.com with marketing campaign tracking codes, which they consider completely separate URLs by default. Correctly configuring the rel=”canonical” meta tag or HTTP response header provides a mechanism to instruct search engines to merge all of the equity split over thousands of URLs into the true home page URL, boosting its strength and capacity to rank in the search results.
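To make the tracking code example concrete, here is a minimal sketch of deriving the URL you would put in the rel=”canonical” tag. It assumes the campaign parameters all start with utm_, which is a common convention rather than anything specific to Budweiser, and it also drops the fragment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str, tracking_prefix: str = "utm_") -> str:
    """Strip tracking parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(tracking_prefix)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Every tagged variation of a page then maps back to the same value, which is what the canonical tag on that page should contain.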

Budweiser are canonicalising the content throughout their site inconsistently, some URLs include a rel=”canonical” meta tag while others don’t. Crawling www.budweiser.com yielded 81 web pages however only 37 of them appeared to have the rel=”canonical” meta tag specified.

Additionally, there were examples where Budweiser have a rel=”canonical” tag specified with the wrong URL. For example the brewery locations page, www.budweiser.com/our-brand/brewery-locations.html has a canonical value of http://www.budweiser.com/our-brand/brew-location.html which produces a 404 error.

Since approximately 50% of the site doesn’t have a rel=”canonical” tag specified, and in some instances the tag is incorrectly configured, it’s possible there is a lot of equity or PageRank being squandered through poor configuration.

rel=”nofollow”

The rel=”nofollow” meta tag or attribute instructs Google to drop any links affected by the nofollow directive from their link graph. This is commonly used for links to third party websites that might be untrusted (ie, submitted via user generated content) or also for advertising.

By removing the affected links from the link graph, it has a knock on effect that those links inherently cannot play a role in boosting the search engine rankings of the linked URL, since the link is removed from the graph and no equity or PageRank can flow through that link.

Simplistically, when Google calculates how much PageRank or equity to flow through a link, they look at the equity of the linking URL and divide it equally among the out links.

In years gone by, applying a rel=”nofollow” to an internal link meant that the equity that Google had originally allocated to that link would be reallocated to all other equity flowing out links, increasing the amount of equity flowing through those links. This technique of maximising the equity flowing through specific links within a site became known as PageRank sculpting.

Several years ago Google changed the behaviour of how internal rel=”nofollow” links were handled and instead of reallocating the equity of the affected out links to all other equity flowing out links, that equity now vanishes or evaporates.
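The difference between the two behaviours is easiest to see with numbers. This sketch uses the same simplistic model as above (equity split equally across out links); the function name and figures are illustrative only.

```python
def equity_per_followed_link(page_equity: float, total_links: int,
                             nofollowed_links: int, reallocate: bool) -> float:
    """Equity passed through each followed link under the simplistic model.

    reallocate=True models the old behaviour (PageRank sculpting worked).
    reallocate=False models the current behaviour (nofollowed share evaporates).
    """
    followed = total_links - nofollowed_links
    if followed <= 0:
        return 0.0
    divisor = followed if reallocate else total_links
    return page_equity / divisor


old_share = equity_per_followed_link(1.0, 10, 2, reallocate=True)
new_share = equity_per_followed_link(1.0, 10, 2, reallocate=False)
```

With ten out links and two of them nofollowed, each followed link used to receive an eighth of the page's equity and now receives a tenth, so the nofollow gains nothing for the remaining links.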

On quick inspection, Budweiser have internal rel=”nofollow” links pointing to the following URLs (maybe more):

/content/budweiser/en/loginpage.html
/content/budweiser/en/registerpage.html

<title> tags

The <title> tag is a strong indicator to Google about the content they should expect to find in a given page and is displayed prominently in the search results for users to evaluate whether or not a given URL would yield the content they are looking for.

Broadly speaking the <title> tags in use throughout the Budweiser website are okay. For example, reviewing the <title> tags used throughout the Budweiser Clydesdales blog shows that they lead with a descriptive title of the page, they aren’t bloated, nor are they keyword stuffed.

However, there are a number of high priority pages that could be improved, such as:

/shop.html: Shop
/our-beers/product-locator.html: Product Locator

<hX> tags

The rules of headings are pretty basic, no rocket science needed:

use one <h1> tag per page that describes the primary content of the page
if you need more headings, use <h2> through <h6>
nest heading tags as needed to give hierarchy to the document
use descriptive headings to help users and search engines alike understand it better

These simple rules are being broken throughout the Budweiser website:

first <h1> is actually wrapped around the logo
there are multiple <h1> tags in each page
there is no nesting of <hX> tags to create hierarchy if and when needed
<h1> tags for the primary body copy area often aren’t descriptive or relevant to the content

Fortunately, this is a fast and simple problem to correct throughout the Budweiser website.
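The first few rules are simple enough to check automatically. The sketch below is a rough checker using Python's built in HTML parser; the function name and messages are my own, and it can't judge whether a heading is descriptive.

```python
import re
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level of every <h1> to <h6> tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list:
    """Report a wrong number of <h1> tags and any skipped heading levels."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append("expected one <h1>, found {}".format(levels.count(1)))
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append("<h{}> followed by <h{}> skips a level".format(previous, current))
    return problems
```

Running it over each crawled page gives a quick list of templates to fix first.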

Content

Completing in depth* keyword research using Ubersuggest highlights a variety of topics that users want information on related to the Budweiser brand which is great news.

Unfortunately the Budweiser website suffers from an all too common condition of being brochureware, in that it looks good but has no real substance or content to help search engines.

Take for instance the Budweiser product Chelada. As a consumer, it’d be a reasonable expectation to head to Google and type in ‘chelada’ and find Budweiser in the top 5-10 positions but no. No problem, the consumer refines their query to ‘chelada beer’, still nothing. More refinement, ‘chelada beer budweiser’ and even with the Budweiser keyword – budweiser.com still doesn’t have a position 1 ranking.

However when reviewing the Chelada web page, Budweiser are giving Google nothing to work with – the only way they could have reasonably given Google less was to delete the page entirely from their website.

As an immediate step for Budweiser, they should perform keyword research for all of their products and build out the relevant content consumers are seeking. Budweiser could expect to see the Budweiser website bounce to the top of the search results if they do a good job of this.

* submit it once with the keyword ‘budweiser’ and scan the results

Structured Markup

Structured markup such as schema.org allows a publisher to provide rich meta data about the content on the page, which search engines like Google use to augment the search results. Common use cases that are very visible are elements like reviews that can produce star ratings in the search results.

Quickly clicking through budweiser.com, it appears they have a few opportunities for this:

brewery locations
beer nutritional information

With respect to the first point, the brewery page indicates that they have 12 brewery locations across the United States, however no additional information is provided about those locations. It’d be practical and helpful to users to provide their address, phone number, opening hours, if they offer tours, sell products and so forth. Some of this information could be marked up using the LocalBusiness schema.org element.
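As a sketch of what that markup could look like, the snippet below generates a LocalBusiness description. JSON-LD is only one of the syntaxes schema.org supports and is my choice for the example; every value passed in is a placeholder, not real Budweiser data.

```python
import json


def local_business_jsonld(name, street, city, region, postal_code, phone) -> str:
    """Build a schema.org LocalBusiness description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
    }
    return json.dumps(data, indent=2)


# Placeholder values only.
markup = local_business_jsonld(
    "Example Brewery", "1 Example Street", "Exampleville", "MO", "00000", "+1-555-0100"
)
```

The resulting string would be embedded in the page for each brewery location, one block per location.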

The extensive keyword research performed clearly indicated that consumers are interested in the nutritional information of the Budweiser products. If Budweiser were to provide detailed nutritional information, it’d help Google, help users and also allow them to mark up that information using the NutritionInformation schema.org object – which may lead to interesting universal objects appearing in the search results.

Load Time Performance

Users don’t like slow websites.

Assessing the Budweiser website with a variety of performance testing tools such as:

Google PageSpeed Insights
webpagetest.org
YSlow
GTmetrix

all reveal a common theme, budweiser.com could do with some serious attention.

Conclusion

This fast paced SEO audit has identified a variety of technical and on-site issues that are holding Budweiser back in the search results. No doubt if a more structured and rigorous audit was completed, the list would be even longer but the above items certainly represent an excellent starting point to improve the search engine rankings for budweiser.com.

Why Doesn’t Google Ignore Manipulative Links?

Recently Dan Petrovic published an excellent article titled The Great Link Paradox which discusses the changing behaviour of website owners regarding how they link from one page to another, the role Google is playing in this change and their fear mongering.

The key takeaway from the article is a call to action for Google:

At a risk of sounding like a broken record, I’m going to say it again, Google needs to abandon link-based penalties and gain enough confidence in its algorithms to simply ignore links they think are manipulative. The whole fear-based campaign they’re going for doesn’t really go well with the cute brand Google tries to maintain.

I’d like to talk about one part of the above quote, ‘simply ignore links they think are manipulative’, but before that, let’s take a step back.

Fighting Web Spam

As Google crawls and indexes the internet, ingesting over 20 billion URLs daily, their systems identify and automatically take action on what they consider to be spam. In the How Google Search Works micro-site, Google published an interesting page about their spam fighting capabilities. Set out in the article are the different types of spam that they detect and take action on:

cloaking and/or sneaky redirects
hacked sites
hidden text and/or keyword stuffing
parked domains
pure spam
spammy free hosts and dynamic DNS providers
thin content with little or no added value
unnatural links from a site
unnatural links to a site
user generated spam

In addition to the above, Google also use humans as part of their spam fighting tools through the use of manual website reviews. If a website receives a manual review by a Google employee and after review it is deemed to have violated their guidelines, a manual penalty can be applied to the site.

Manual penalties come in two forms, site-wide or a partial penalty. The former obviously affects an entire website, all pages are going to be subject to the penalty, while the latter might affect a sub-folder or an individual page within the website.

What the actual impact of the penalty is varies as well, some penalties might see an individual page drop in rankings, an entire folder drop in rankings, an entire site drop in rankings, it might only affect non-brand keywords, it could be all keywords or it might cause Google to ignore a subset of inbound links to a site – there are a litany of options at Google’s disposal.

Devaluing Links

Now that Google can detect unnatural links from and to a given website, the next part of the problem is being able to devalue those links.

Google has had the capacity to devalue links and has been able to do so since at least January 2005 when they announced the rel=”nofollow” attribute in an attempt to curb comment spam.

For the uninitiated, normally when Google discovers a link from one page to another, they will calculate how much PageRank or equity should flow through that specific link to the linked URL. If a link has a rel=”nofollow” attribute applied, Google will completely ignore the link, as such no PageRank or equity flows to the linked URL and it does not impact organic search engine rankings.

In addition to algorithmically devaluing links explicitly marked with rel=”nofollow”, Google can devalue links via a manual action. For websites that have received a manual review, if Google doesn’t feel confident that the unnatural inbound links are deliberate or designed to manipulate the organic search results, they may devalue those inbound links without penalising the entire site.

Google Webmaster Tools: Manual Action – Partial Match For Unnatural Inbound Links

Why Aren’t Google Simply Ignoring Manipulative Links?

At this stage, it seems all of the ingredients exist:

can detect unnatural links from a site
can detect unnatural links to a site
can devalue links algorithmically via rel=”nofollow”
can devalue links via a manual penalty

With all of this technological capability, why don’t Google simply ignore or reduce the effect of any links that they deem to be manipulative? Why go to the effort of orchestrating a scare campaign around the impact of good, bad or indifferent links? Why scare webmasters half to death about linking out to relevant third party websites, such that their readers are disadvantaged because relevant links aren’t forthcoming?

The two obvious reasons that immediately come to mind:

Google can’t identify unnatural links with enough accuracy
Google doesn’t want to

Point 1 above doesn’t seem a likely candidate, the Google Penguin algorithm which rolled out in April 2012 was designed to target link profiles that were low quality, irrelevant or had over optimised link anchor text. If they are prepared to penalise a website by Google Penguin, it seems reasonable to assume that they have confidence in identifying unnatural links and taking action on them.

Point 2 at this stage remains the likely candidate, Google simply don’t want to flatly ignore all links that they determine are unnatural, whether by accident of poorly configured advertising, black hat link building tactics, over enthusiastic link building strategies or simply bad judgement.

What would happen if Google did simply ignore manipulative inbound links? Google would only count links that their algorithms determined were editorially earned. Search quality wouldn’t change, Google Penguin is designed to clean house periodically through algorithmic penalties and if Google simply ignored the very same links that Penguin targeted – those same websites wouldn’t haven’t risen to the top of the rankings only to get cut down by Penguin at a later date.

Google organic search results are meant to put forward the most relevant, best websites to meet a user’s query. Automatically ignoring manipulative links doesn’t change the search result quality but it also doesn’t provide any deterrent for spammers. With irrelevant links simply being ignored, a spammer is free to push their spam efforts into overdrive without consequence and Google wants there to be a consequence for deliberately violating their guidelines.

How effective was the 2005 addition of rel=”nofollow” in fighting comment spam by removing the reward of improved search rankings? It had no impact. Spam levels recorded by Akismet, the free spam detection service from Automattic, haven’t eased since it launched – in fact comment spam levels are growing at an alarming rate despite the fact that most WordPress blogs have rel=”nofollow” comment links. The parallels between removing the reward for comment spam and removing the reward for spammy links are striking and it didn’t work last time – why would Google expect it to work this time?

Why aren’t Google ignoring unnatural links automatically?

Google doesn’t like being manipulated, period.

Buffer Automatic Author Social Profile Detection

I use Buffer and their various extensions and integrations while I’m browsing as a simple/easy way for me to share content. I love it, it’s a great tool in my opinion – I highly recommend you give it a go if you haven’t already.

A little while ago, I shot the Buffer guys an email about a suggested feature, which they were good enough to respond to me on – awesome customer service. I thought I’d put it out in the public forum, maybe if it gets a little more attention it’ll make it off their ‘to do’ list onto the ‘must do’ list.

Before using Buffer, if I manually shared something – I’d try and find the social handle/profile for the author and include them in the share so they get notified such as the below if it was a tweet:

An excellent article about X by @author

One of the benefits of Buffer is that it allows me to share the content into multiple social networks in one go – awesome! However the shared snippet that goes into Twitter/LinkedIn defaults to the shared page’s <title> tag and you end up with a basic share containing a link and text description. For example, if you were to share my article about using hashed keywords instead of (not provided) keyword, it might look like this by default:

Using Hashed Keywords Instead Of (not provided) Keyword | Convergent Media http://buff.ly/S1MzQt

While you can customise the shared content, one of my primary reasons for using Buffer is about sharing it into multiple social networks and saving me time – as such I don’t spend a lot of time editing the default shared snippet. Typically I’d remove fluff, such as “| Convergent Media” in the above or rewrite it entirely if I didn’t feel it conveyed the message clearly enough but it’s typically a light touch approach.

As a result of my laziness (I’m just being honest), using Buffer has meant that I share into Twitter and LinkedIn (great) but what I share is a little less optimal than when I was only manually sharing into Twitter.

Suggested Feature To The Rescue

My feature suggestion for the Buffer guys was to look at the HTML and social widget configuration on a page that is being shared and see if they can determine the handle/profile of the author. If they can, do something smart, if they can’t – fall back to the default behaviour.

In an ideal world, if this was possible, when I share a piece of content Buffer would customise the share for each social network, such that it’d use @author notation in Twitter and include the author’s profile in LinkedIn, facebook and Google+ if it ever arrives. Using this mechanism, the author of the content would be notified about the share in every network the content was shared into where Buffer could identify him/her.

This has a couple of benefits:

It helps the person sharing the content as the author is going to get notified of the share. Often shares that don’t include a social profile for attribution will sadly go unnoticed by the author. Having the author being aware of the share will help strike up relationships with other awesome folks – network building, thank you very much.
The author is more likely to re-share my shared content with their followers as they get to show their followers other people love their product/service. A small amount of self horn tooting is okay in my book.
It’s good for the person sharing the content as they get their name in front of the author’s followers. Just like in point one above, this exposure – especially if it is semi regular – can lead to the author’s followers becoming your followers – superb.

Determining The Author’s Social Profiles

This isn’t going to work all the time, in many cases it might not be possible for Buffer to accurately determine the author’s social profiles for inclusion into the shared snippet but I think it’d be worth a shot.

Following are some ways I can think of that Buffer could determine the author:

Check if the Twitter tweet button is on the page and if they have customised the tweet text to something like “{postname} by @author”.
Following on from the above, they could go one step further (not sure if this would be as relevant) but many websites also use the ‘related’ attribute in the Twitter share button to suggest to people sharing the content who to follow immediately after sharing the content.
Check if the author has an author profile on the site, in which case it is likely that they’ll have linked off to their various social media profiles.
Check if there is an author bio associated to the article, which might also include links to their various social profiles.
If Google+ sharing becomes an option in the future, Buffer could check for links to Google+ with authorship markup attached.
Crawl all social profiles you can find associated to the author (if none are for Twitter/LinkedIn/facebook directly as an example) and see if they list their ‘other’ social profiles within the social profiles you were able to crawl.

In the case of Twitter, there are official tweet text parsing libraries provided by Twitter in various languages and there are many open source implementations of the Twitter text parsing libraries available in different languages as well.

So there you have it, a feature upgrade for Buffer that I think would be really worthwhile as it benefits authors and social sharers alike.

Do you think it’s a good idea, should Buffer add it to their queue of work or am I mad?

Unbeatable SEO Tips – Online Retailer Roadshow

At the beginning of September, Reed Exhibitions contacted me about a public speaking opportunity at Online Retailer Roadshow in Brisbane which came about via Dan Petrovic of Dejan SEO.

The one day conference went off without a hitch and Reed Exhibitions should be commended on finding such a great list of speakers for the event. I really enjoyed the presentations from Faye, Steve and Josh – they are all doing some really interesting stuff:

Faye Ilhan
Head of Online – Dan Murphy’s
Steve Toth
Director of Omni-Channel & Mobility – Dick Smith Electronics
Joshua McNicol
Head of Marketing – Temple & Webster

The suggested topic for my presentation was great SEO tips for 2014 that’d genuinely help the attendees run a better online business and hopefully make more money online.

Given that the conference was specifically about online retailing, I focused a lot of my tips and suggestions on taking care of common issues with ecommerce product websites, website performance so users don’t have to wait for the site to load, producing great content that outpaces your competitors, link building and a couple suggestions around leveraging rich snippets and advanced social sharing implementations such as using Twitter Cards for example.

This was my first public speaking engagement with an audience of over 100 other professionals. I was quite nervous listening to the other presenters and watching the clock ticking down to my start time but once I got onto the stage – that disappeared and I think I did quite well. Two things I’ve learned out of the experience: speaking to a time allocation is an art (I went over my 30 minute limit), and there is an impressive amount of time & effort required to produce the incredible slide decks you see on slideshare. Mine left me wanting, but I will definitely improve upon it for my next presentation.

Unbeatable SEO Tips from Alistair Lattimore

Using Hashed Keywords Instead Of (not provided) Keyword

Google started providing encrypted search back in 2010 and while the connection between the user and Google was encrypted, Google were still passing the user’s search query through to websites. In October 2011, Google made a change whereby users logged into their Google Account on google.com would be automatically switched over to HTTPS and in March 2012, Google announced that they were rolling that same change out globally through all of their regional Google portals such as www.google.com.au.

Importantly, unlike the encrypted search product from Google released in 2010 that still passed the user’s search query through to the destination website, Google are not passing the user’s search query through to websites as of the changes rolled out in 2011 and subsequently in 2012.

(not provided) Keyword

(not provided) Keyword Growth

The lack of the keyword information being passed through to the destination website manifests itself in web statistics products like Google Analytics with a pseudo-search term known as (not provided).

To provide a high level example of what is happening, if a website received 5000 visits from 5000 different users, each with unique search phrases and all users were using Google secure search – a product like Google Analytics will report all of those 5000 visits against a single (not provided) keyword and aggregate all of the individual user metrics against that one keyword.

In more specific terms, below are some of the issues faced not having search query data:

you won’t know how many unique search queries and their respective volumes are entering a site
you can’t analyse keyword level metrics like pages/visit, bounce rate, conversion rate
you can’t find pages competing with one another inside a site and providing a poor user experience
you can’t optimise a landing page based on the users keyword
you won’t be able to understand user search behaviour in terms of their research/buy cycle
you’ll lose the ability to understand how your brand, product and generic phrases are related to one another
you’ll lose the ability to understand how different devices play a role in your marketing efforts to know that the research/buy cycle is different
you can’t report on goal completions or goal funnel completion by keyword
you can’t report on transactions, average order value or revenue by keyword
attribution for a major percentage of a site’s traffic is greatly impacted

Hashed Keywords

Example Google Analytics Organic Keyword Report Using SHA-1 Hashing Function

I wondered long ago if Google might consider taking a small step back from their current stance and, instead of sending no value for the query through to the destination website in the HTTP REFERER header, provide a unique hash for every keyword.

For those unaware, hashing algorithms take variable length inputs and produce a fixed length output that is, for practical purposes, unique to the input. There are a variety of different hashing functions available, but as an example of their use – SHA-1 is used in cryptography and is part of the security for HTTPS web traffic.

The important thing to understand about this idea, whether it is done through a hashing function or another mechanism, is that the goal would be to replace the user’s actual query with another unique value that doesn’t disclose or leak the user’s actual query for privacy reasons.
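To make the idea a little more concrete, here is a minimal Python sketch of what a hashed keyword could look like and why it would still be useful for reporting. It assumes SHA-1 over a trimmed, lower-cased query; the function name hash_keyword, the normalisation step and the sample visits are all my own illustration and not something Google has proposed or implemented.

```python
import hashlib
from collections import Counter


def hash_keyword(query: str) -> str:
    """Return a SHA-1 hex digest for a search query (illustrative only)."""
    # Normalising is an assumption, so "Red Shoes " and "red shoes" share a hash.
    normalised = query.strip().lower()
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


# Hypothetical visits: the analytics tool would only ever see the hash.
visits = ["red shoes", "Red Shoes ", "blue shoes"]
visits_per_hash = Counter(hash_keyword(q) for q in visits)

for digest, count in visits_per_hash.items():
    print(digest, count)
```

The website owner never sees the words "red shoes", but visits, bounce rate or revenue could still be grouped against each distinct hash instead of a single (not provided) bucket.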

Using an approach like this isn’t going to address all of the issues raised in the bullet point list above or the longer list of issues the (not provided) keyword introduces, however it improves a business’s understanding of their website and their visitors’ behaviour without compromising a user’s right to privacy.

Unintended Side Effects

History will show that as we make advances in one area, often with only the best of intentions, that those best intentions are ultimately twisted, bent and adapted to drive some less than ideal outcomes.

The same can be seen with user privacy. The HTTP REFERER header was designed to help a website owner understand how users move through the internet at large and an individual website. When the HTTP specification was first developed, I’m sure that the inventors didn’t imagine that this simple concept would ultimately become a tool to attack a user’s privacy.

Now the question to ask would be, if Google were to take a couple of steps back from where they are currently and provide a hashed representation of the user’s query instead of no query data at all – could a website owner, opportunistic marketer or nefarious hacker misuse the hashed query against the user in some way? Could the hashed keyword value be reverse engineered to ascertain what the user’s original query was?
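One way to reason about the reverse engineering question is a simple dictionary lookup. The sketch below assumes a plain, unsalted SHA-1 hash of the query, which is my assumption rather than anything Google has described; the candidate phrases and function names are hypothetical. It doesn’t invert the hash at all, it just hashes a list of guesses and compares.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Hash a phrase the same way the hypothetical referrer value would be."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """Map hash -> phrase for every phrase an attacker is willing to guess."""
    return {sha1_hex(phrase): phrase for phrase in candidates}


# Hypothetical list of guesses, e.g. phrases out of a keyword research tool.
lookup = build_lookup(["cheap flights", "red shoes", "seo tips"])

observed = sha1_hex("red shoes")  # what the referrer would carry
unknown = sha1_hex("a very unusual long tail query")

print(lookup.get(observed))  # the guess that matches
print(lookup.get(unknown))   # None, as it is not in the guess list
```

Under those assumptions, popular phrases would be the easiest to recover because they are the easiest to guess, while unusual long tail queries would stay hidden only for as long as nobody thought to guess them.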

Is there hope for the future?

Social Media Isn’t Dead Yet

Recently Barry Adams wrote an article titled Social Media is Dead; Long Live SEO in which he puts forward the case that social media is a waste of time for most businesses and they should focus on what works. It should be noted that it isn’t Barry just making up sensational headlines, those comments are supported by research conducted by Forrester in late 2012 and also by Custora in 2013.

I’m not here to dispute that fact directly but I thought it was worth throwing another discussion point into the melting pot for everyone to consider and that is, while the internet is a highly measurable place which marketers and businesses alike love, it does have limitations and one of those limitations is uniquely identifying a person.

While the technology wasn’t as advanced, the ability to identify an individual user 10 years ago was simpler – people had less frequent access to the internet and from fewer computers. Fast forward ten years and we are living in a multi-screen world, where an individual person might switch between phone, tablet, laptop, desktop, TV and more across the course of the day, all the while continuing what the person considers to be a single, unified experience.

All the different devices used by consumers today complicate the problem of uniquely identifying a person, as the unique identification is generally done through the use of browser cookies. That means that the same user viewing a website on their phone, tablet, laptop, desktop and TV are normally counted as separate users within web statistics software such as Google Analytics.

Barry replied to a comment on his article, mentioning that he has seen many different multi-channel attribution reports from Google Analytics that never register social media traffic sources in any significant way, even when looking at the assisted conversion report.

Google Analytics Multi-channel Funnel Assisted Conversions Report

Your mother would have told you never to believe everything you see on TV, read in the newspapers or view in Google Analytics – okay, I’ll concede the last point. What many don’t realise when seeing a headline from companies like Forrester or a neat table like the one above, is that it is increasingly difficult to measure the impact of different traffic sources end to end due to the browser cookie issue I briefly mentioned above.

The Difficulty In Measuring Social Media Impact

For the sake of discussion, we’ll focus on both facebook and Twitter as they are the most widely used social networks. It may or may not come as a surprise, but both of these social networks report over 50% of their usage is via mobile devices.

Imagine that you’re Forrester and you’re trying to compile research about the impact of social media on businesses. When over 50% of the usage of the two biggest social networks in the world is on mobile and mobile conversion rates are well below their desktop browser counterparts, that alone provides a reason why it’s hard to directly measure the impact of social media.

Now consider the absurd scenario where a user returns to the same website they visited on their mobile via facebook but this time on their computer via a brand query in Google search, that ultimately leads to a conversion. It looks like search earned the conversion and it did play a role but so did facebook, however because it was across two different devices – multi channel attribution within Google Analytics fails, even when looking at the assisted conversions.

Worthy Case Study Material

Recently Google announced a major upgrade to Google Analytics named Universal Analytics. One of the big changes with Universal Analytics is that you can provide a unique user identifier into the tracking and use the identifier across devices.

The case study I want to see from someone like Forrester is a collection of big businesses who implement Universal Analytics alongside a raft of user interface components throughout their sites designed to capture something unique about those users and across all devices.

As an example, a user views your website after a referral from facebook on their mobile but doesn’t convert. The website could ask the user to sign up for an account with an incentive or to join an email database.

Now that you’ve got a unique identifier for the user, you’re in a position to track the impact of the facebook referral if the user happens to come back, either on the same device or a different device (tablet, laptop, computer, TV, ..) and purchases using the same unique identifier they provided on their mobile, such as their email address.
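The difference a shared identifier makes can be shown with a small Python sketch. This is not the Universal Analytics API; the visit records, field names and the assisting_sources function are all hypothetical, and the logic is deliberately simplified to only look at whether the last visit on a path converted.

```python
from collections import defaultdict

# Hypothetical visit records in chronological order; field names are illustrative.
visits = [
    {"cookie": "phone-1", "user_id": "jo@example.com", "source": "facebook", "converted": False},
    {"cookie": "laptop-9", "user_id": "jo@example.com", "source": "google", "converted": True},
]


def assisting_sources(visits, key):
    """Group visits by `key` and list the earlier sources on each converting path."""
    paths = defaultdict(list)
    for visit in visits:
        paths[visit[key]].append(visit)
    assists = []
    for path in paths.values():
        if path[-1]["converted"]:
            assists.extend(v["source"] for v in path[:-1])
    return assists


print(assisting_sources(visits, "cookie"))   # []
print(assisting_sources(visits, "user_id"))  # ['facebook']
```

Grouped by browser cookie, the phone and the laptop look like two unrelated people and facebook gets no credit. Grouped by the identifier the user supplied, the facebook visit shows up as an assist on the converting path.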

I don’t know if social media is thriving or dying as far as businesses are concerned but I know that we won’t have that answer until everyone gets a lot better at media attribution across the board.

Visualising Googlebot Crawl With Excel

For most websites, search engines and more specifically Google represent a critical part of their traffic breakdown. It is commonplace to see Google delivering anywhere from 25% to over 80% of the traffic to different sized sites in many different verticals.

Matt Cutts was recently asked what the most common SEO mistakes were and he led off the list with the crawlability of a website. If Google can’t crawl through a website, it prevents Google from indexing the content and will therefore have a serious impact on the discoverability of that content within Google search.

With the above in mind, it is important to understand how search engines crawl through a website. While it is possible to scan through log files manually, it isn’t very practical and it doesn’t provide an easy way to discover sections of your site that aren’t being crawled or are being crawled too heavily (spider traps) and this is where a heat map of crawl activity is useful:

Visualising Googlebot Crawl Activity With Excel & Conditional Formatting

In this article, we’ll briefly cover the following topics:

Microsoft Log Parser
Gaining Access To Your Log Files
Organising Your Log Files
Microsoft Log Parser Primer
Identifying Googlebot Crawl Activity
- Query: Find All URLs
- Query: Find All URLs Googlebot Accessed
Microsoft Excel
- VLOOKUP
- PivotTable
Visualisation

Microsoft Log Parser

Microsoft Log Parser is an old, little known general purpose utility to analyse a variety of log style formats, which Microsoft describe as:

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®.

You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. The world is your database with Log Parser.

The latest version of Log Parser, version 2.2, was released back in 2005 and is available as a 1.4MB MSI from the Microsoft Download Centre. Operating system compatibility is stated as being Windows 2000, Windows XP Professional Edition & Windows Server 2003 but I run it on Windows 7, which suggests to me that it’ll probably run on Windows Vista and maybe even Windows 8.

In case you missed the really important point above that makes Microsoft Log Parser a great little utility, it allows you to run SQL like statements against your log files. A simple and familiar exercise might be to find broken links within your own website or to find 404 errors from broken inbound links.

Gaining Access To Your Log Files

Depending on the type of website you’re running and what environment you run it in, getting access to your log files can be the single biggest hurdle in this endeavor but you just need to be patient and persevere.

If you have your own web hosting, it is likely that you’ll have access to your server log files via your web hosting control panel software such as cPanel or Plesk. That doesn’t necessarily mean that your hosting has been configured to actually log website access, as a lot of people turn it off to save a little disk space.

If your hosting doesn’t have logging enabled currently, first port of call is configuring that as it is obviously a prerequisite to visualising Googlebot crawl activity through your websites. Once configured, depending on the size of your website and how important it is in Google’s eyes – you may need to wait 4-6 weeks to get sufficient data to understand how Googlebot is accessing your website.

Corporate websites will invariably have web traffic logging enabled as it is helpful for debugging and compliance reasons. Getting access to the log files might require an email or two to your IT department or maybe a phone call to a senior system administrator. You’ll need to explain to them why you want access to the log files, as it will normally take some time for them to either organise security access for you to access that part of your corporate network or they may need to download/transfer them from your external web hosting to a convenient place for you to access them from.

Organising Your Log Files

To get the most out of this technique, you’ll want access to as many weeks or months of log files as possible. Once you download them from your web hosting provider or your IT department provides access to the log files, place them all in the same directory for log analysis by Microsoft Log Parser.

Directory Showing Daily Web Server Logs Broken Into 100MB Incremental Files

As you can see in the image above, the server I was working with generates log files with a consistent naming convention per day and produces a new incremental file for every 100MB of access logs. Your web server will probably generate a different sequence of daily, weekly or monthly log files but you should be able to put all of them into a directory without any hassle.

Microsoft Log Parser Primer

Log Parser by Microsoft is a command line utility which accepts arguments from the command prompt to instruct it how to perform the log analysis. In the examples below, I’ve passed in three arguments to Log Parser: -e (the maximum number of parse errors to tolerate), -i (the input format) and the query itself, but you can provide as many as you need to get the desired output.

Within the query itself, the columns you’d SELECT are the column headings out of the log file, so in my example below I have a column heading named cs-uri-stem representing the URL without the domain information. Open one of your log files in a text editor and review the headings at the top of the log file to find out what column headings to use within your SELECT statement.
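If you’d rather not eyeball the file, the headings can be pulled out with a few lines of Python. This sketch assumes the logs are in the W3C extended format (which the -i:W3C argument in the queries further down implies), where the column names usually sit on a line starting with #Fields:. The function name and the sample lines are made up for illustration.

```python
def field_names(log_lines):
    """Return the column names from the first '#Fields:' directive found."""
    for line in log_lines:
        if line.startswith("#Fields:"):
            return line[len("#Fields:"):].split()
    return []


# Made-up sample in the W3C extended log layout.
sample = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Fields: date time cs-method cs-uri-stem cs-uri-query cs(User-Agent) sc-status",
    "2013-05-01 00:00:01 GET /products/ - Mozilla/5.0+(compatible;+Googlebot/2.1) 200",
]

print(field_names(sample))
```

To run it against a real file, pass in an open file handle in place of the sample list, since iterating a file yields its lines one at a time.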

Just like a SQL query in a relational database, you need to specify where to select from which under normal circumstances is a SQL database table. Log Parser maintains that same idiom, except you can select from an individual log file, where you’d provide the file name or you can select from a group of log files identified by a pattern. In the examples I’ve used below, you can see that the FROM statement has ex*, which matches the pattern in the Organising Your Log Files section above.

As you’d expect, Log Parser provides a way to restrict the set of log records to analyse with a WHERE statement and it works exactly the same way it does in a traditional SQL database. You can join multiple statements together with brackets to provide precedence along with AND or OR statements.

Conveniently Microsoft Log Parser also provides aggregate functions like COUNT, MAX, MIN, AVG and many more. This in turn suggests that Log Parser also supports other related aggregate functionality like GROUP BY and HAVING, which it does in addition to ORDER BY and a raft of other more complex functionality.

Importantly for larger log analysis, Log Parser also supports storing the output of the analysis somewhere, which can be achieved by using the INTO keyword after the SELECT clause as you can see in the examples below. If you use the INTO keyword, the output of the SELECT statement will be stored in the file specified, whether it is a single value or a multi-column, multi-row table of data.

Microsoft provide a Windows help document with Log Parser, which is located in the installation directory and provides a lot of help about the various options and how to combine them to get the output that you need.

Now that the super brief Log Parser primer is over and done with, time to charge forward.

Identifying Googlebot Crawl Activity

While Microsoft Log Parser is an incredible utility, it has a limitation that a normal SQL database doesn’t – it does not support joining two or more tables or queries together on a common value. That means to get the data we need to perform a Googlebot crawl analysis, we’ll need to perform two queries and merge them in Microsoft Excel using a simple VLOOKUP.

Some background context so the Log Parser queries below make sense: the website that the log files are from uses a human friendly URL structure with descriptive words in the URLs in a directory like structure which end with a forward slash. While it doesn’t happen a lot on this site, I’m lower casing the URLs to consolidate crawl activity into fewer URLs to get a better sense of Googlebot’s activity as it crawls through the site. Similarly I am deliberately ignoring query string arguments for this particular piece of analysis to consolidate crawl activity into fewer URLs. If there is a lot of crawl activity around a group of simplified URLs, it’ll show up in the visualisation and be easier to query for the specifics later.

Next up, the queries themselves – open a Command prompt by going START->RUN and entering cmd. Change directory to where you’ve stored all of your log files. Microsoft Log Parser is installed in the default location on my machine but change that accordingly if needed.

Query: Find all URLs

"c:\program files\log parser 2.2\LogParser.exe" -e:5 -i:W3C "SELECT TO_LOWERCASE(cs-uri-stem), date, count(*) INTO URLs.csv FROM ex* WHERE cs-uri-stem like '%/' GROUP BY TO_LOWERCASE(cs-uri-stem), date ORDER BY TO_LOWERCASE(cs-uri-stem)"

Query: Find all URLs that Googlebot accessed

"c:\program files\log parser 2.2\LogParser.exe" -e:5 -i:W3C "SELECT TO_LOWERCASE(cs-uri-stem), date, count(*) INTO googlebot.csv FROM ex* WHERE cs-uri-stem like '%/' AND cs(User-Agent) LIKE '%googlebot%' GROUP BY TO_LOWERCASE(cs-uri-stem), date ORDER BY TO_LOWERCASE(cs-uri-stem)"
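For anyone who wants to sanity check what the two queries are doing, or who doesn’t have Log Parser handy, the same counting can be approximated in Python. This is a rough sketch rather than a replacement: it assumes each log line has already been split into a dictionary keyed by the W3C field names, and the function name and sample rows are made up.

```python
from collections import Counter


def crawl_counts(rows, user_agent_contains=None):
    """Count hits per (lower-cased URL, date) for URLs ending in '/'."""
    counts = Counter()
    for row in rows:
        url = row["cs-uri-stem"].lower()
        if not url.endswith("/"):
            continue  # mirrors: WHERE cs-uri-stem LIKE '%/'
        agent = row["cs(User-Agent)"].lower()
        if user_agent_contains and user_agent_contains not in agent:
            continue  # mirrors: AND cs(User-Agent) LIKE '%googlebot%'
        counts[(url, row["date"])] += 1
    return counts


rows = [
    {"date": "2013-05-01", "cs-uri-stem": "/Products/", "cs(User-Agent)": "Mozilla/5.0+(compatible;+Googlebot/2.1)"},
    {"date": "2013-05-01", "cs-uri-stem": "/products/", "cs(User-Agent)": "Mozilla/5.0+(Windows+NT+6.1)"},
    {"date": "2013-05-01", "cs-uri-stem": "/logo.png", "cs(User-Agent)": "Mozilla/5.0+(Windows+NT+6.1)"},
]

all_urls = crawl_counts(rows)
googlebot = crawl_counts(rows, user_agent_contains="googlebot")
print(all_urls)
print(googlebot)
```

The two URL variations collapse into /products/ once lower cased, and the image request is dropped because it doesn’t end with a forward slash.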

You could be as specific with the user agent string as you like; I’ve been very broad above. If you felt it necessary, you could filter out fake Googlebot traffic by performing a reverse DNS lookup on the IP address to verify it is a legitimate Googlebot crawler per the recommendation from Google.
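A minimal Python sketch of that verification is below. It follows the reverse then forward lookup pattern and accepts hostnames under googlebot.com or google.com; the function name is my own, and the resolvers are passed in as parameters purely so the example can be run offline with stand-ins (the IP address and hostname in the stubs are made up).

```python
import socket


def is_real_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Reverse DNS the IP, check the domain, then forward-confirm the host."""
    try:
        host = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # The hostname must resolve back to the IP we started with.
        return ip in forward_lookup(host)[2]
    except OSError:
        return False


# Stub resolvers so the example runs offline; the IP and host are made up.
fake_reverse = lambda ip: ("crawl-203-0-113-7.googlebot.com", [], [ip])
fake_forward = lambda host: (host, [], ["203.0.113.7"])
print(is_real_googlebot("203.0.113.7", fake_reverse, fake_forward))
```

Called with just an IP address, the function uses the real resolvers from the socket module. DNS lookups are slow, so for a large log file it would make sense to check each distinct IP once rather than every log line.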

Microsoft Excel

Open both the CSV files output from the queries above. Add a new Excel Worksheet named “Googlebot” to the URLs.csv file and paste into it the contents of googlebot.csv. This will allow you to merge the two queries easily into a single sheet of data that you can generate the visualisation from.

VLOOKUP

Since the queries above result in more than one line per URL for each day they were accessed, a new column needs to be added to work as a primary key for the VLOOKUP. Insert a new column at column A and title it “PK” in both worksheets. In cell A2 in both worksheets, add the following function and copy it down for all rows in both worksheets:

=CONCATENATE(B2, C2)

The CONCATENATE function will join two strings together in Excel. In our instance we want to join together the URL and the date it was accessed, so that the VLOOKUP function can access the correct Googlebot daily crawl value.

Sort both Excel worksheets by the newly created PK column A->Z. Make sure this step is carried out, as a VLOOKUP function doesn’t work as you expect if the tables of data you’re looking up data from aren’t sorted.

Add a new column named Googlebot to your URLs worksheet and in the first cell we’re going to add a VLOOKUP function to fetch the number of times a given URL was crawled by Googlebot on a given date from the Googlebot worksheet:

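=IFERROR(VLOOKUP(A2,Googlebot!$A$2:$D$8281, 4, FALSE), 0)

The outer IFERROR says if there is an error with the VLOOKUP function, return a 0. This is helpful since not all URLs within the URLs worksheet have been accessed by Googlebot. The inner VLOOKUP function looks up the value for A2, the URL & date value you added earlier in the first column, in the rows and columns of the Googlebot worksheet minus the column headings, and returns the fourth column. The final FALSE argument asks for an exact match; without it VLOOKUP performs an approximate match and can return the count from a neighbouring row for URL and date combinations that Googlebot never crawled, instead of the error that IFERROR turns into a 0. If you’re not familiar with the $ characters in the Excel cell references, they cause the range to remain static when the function is copied down the worksheet. Adjust the 8281 to match the last row of your own Googlebot worksheet.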
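If it helps to see the intent of the formula outside of Excel, it is an exact match on the URL and date key that returns 0 when there is no match. A small Python sketch of the same idea, using made-up keys and counts:

```python
# Googlebot hits keyed by URL + date, mirroring the PK column (sample data).
googlebot_hits = {
    "/products/2013-05-01": 3,
    "/products/2013-05-02": 1,
}

# Every URL/date seen in the logs, with total hits (sample data).
all_hits = [
    ("/products/2013-05-01", 40),
    ("/about/2013-05-01", 12),
]

# dict.get with a default of 0 plays the role of IFERROR(VLOOKUP(...), 0).
merged = [(pk, total, googlebot_hits.get(pk, 0)) for pk, total in all_hits]
print(merged)
```

The /about/ row gets a Googlebot count of 0 because its key doesn’t exist in the Googlebot data, which is exactly what the spreadsheet column should show.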

The image above shows, left to right, the URL with numeric date appended, actual URL, date the URL was crawled, number of times Googlebot crawled the URL and the total number of times the URL was accessed.

PivotTable

Microsoft Excel provides a piece of functionality named PivotTable, which essentially allows you to rotate or pivot your spreadsheet of information around a different point and perform actions on the pivoted information such as aggregate functions like sum, max, min or average.

In our example, we don’t need to perform calculations on the data – that was performed by Log Parser. Instead the pivot table is going to turn the date column from the URLs worksheet that has a unique date for each day within your log files and transform them by making each unique date a new column. For example, if you were analysing 30 days of crawl information, you’ll go from one column containing all 30 dates to having 30 columns representing each date.

Within the URLs worksheet, select the columns representing the URL, date and the number of times Googlebot crawled the URLs. Next click Insert from the Excel navigation and select Pivot Table (left most icon within the ribbon navigation in recent versions of Microsoft Office). Once selected, Excel will automatically select all rows and columns that you highlighted in the worksheet and pressing Ok will create a new worksheet with the pivot table in it ready for action.

Within the PivotTable Field List in the right column, place a check in each of the three columns of information imported into the pivot table. In the bottom of the right column, drag the fields around so that the date field is in the Column Labels section, URL is within the Row Labels section and the Googlebot crawl is within the Sum Values section. Initially Excel will default to using a count aggregate function, but this needs to be updated to SUM by clicking the small down arrow to the right of the item, selecting Value Field Settings and picking SUM from the list.

Visualising Googlebot Crawl Excel Pivot Table Options

Visualisation

Now that the data has been prepared using the PivotTable functionality within Excel, we’re able to apply some sort of visual cue to the data to make it easier to understand what is happening. To do that quickly and easily, we’re going to use Conditional Formatting, which allows you to apply different visual cues to data based on the data itself.

Select the rows and columns that represent the daily crawl activity. Don’t include the headings or the total column or it’ll skew the visualisation due to the large numbers in those columns. Once selected, click the Home primary navigation item and then Conditional Formatting, expand out Colour Scales and choose one you like. I chose the second item in the first row, so that URLs with lots of crawl activity appear red or hot.

Tip
To increase the density of the visualisation in case you’ve chosen to visualise a large date range, select the columns that represent the dates, right click and choose Format Cells, then go into the Alignment tab and set the text direction to 90 degrees or vertical.

Use the zoom functionality in the bottom right corner of Excel to zoom out if necessary and what you’re left with is a heat map showing Googlebot crawl activity throughout the different URLs within your website over time.

Visualising Googlebot Crawl Activity With Excel & Conditional Formatting

Without a mechanism to visualise the crawl rate of Googlebot, it would be impossible to understand why the three URLs in the middle of the image were repeatedly crawled by Googlebot. Could this have been a surge in links off the back of a press release? Maybe there was press coverage that didn’t link, which would represent a fast, easily identifiable link building opportunity.

It is now dead simple to see what sections of your website aren’t getting crawled very often, what sections are getting crawled an appropriate amount and what sections could be burning up Googlebot crawl resources needlessly that could be spent crawling useful content in other sections of the website.

Go forth and plunder your web server logs!

Remarketing With Gmail Search Field Trial

The following article describes how to take advantage of changes in search behaviour across various Google products to provide free remarketing to potential customers who have shown an interest in your product or service.

Google+ has been an incredible source of inspiration for me since it came about. The number of really intelligent conversations I’ve read or been part of has been amazing. What I’m about to describe is the result of such a discussion started by Dan Manahan when he asked if anyone had tried to leverage Google Drive now that it was part of the personalised search experience.

Before jumping into how it all works, a quick bit of background about Google personalised search.

Search Plus Your World

Google began personalising the search results in 2007 when it started leveraging a user’s search history. In 2009 Google announced a product called social search, which used a signed in user’s social connections through various social networks to help find higher quality, more relevant information from within their greater network of friends online.

Fast forward to January 2012 and Google announced Search Plus Your World (SPYW) as the next major evolution in personalising the search results. The idea behind SPYW is simple: Google want to surface as much contextually relevant information about a user’s query from as many different sources as possible.

Search Plus Your World currently supports three types of personalisation:

Personal Results
Profiles in Search
People & Pages

As an example of what Google Search Plus Your World can do, if a person has uploaded photos to Google+ of their pet, a search for the pet name in Google will return an array of photos which include those personal pet photos alongside more generic images that Google thinks are relevant for that query. It could also include any Google+ posts from that person or their network that are relevant to the query. Google+ profiles will be shown directly in the search results, allowing a user to follow them quickly. Similarly, generic queries such as [music] would yield suggestions for people or pages to follow surrounding that topic.

Gmail Search Field Trial

During August 2012 Google announced and opened up a limited beta feature named Gmail Search Field Trial with little fanfare. The goal of the Gmail Search Field Trial is to remove the need for users to have to remember where to search for something.

Currently a user who is a heavy Google product user needs to search within each of the different Google products for resources that are of interest, for instance searching in Gmail for an order confirmation from Amazon, checking what the weekly sales were in a Google Spreadsheet and so on.

To address this issue, users who signed up for the Gmail Search Field Trial see information from various Google products in several different search boxes throughout Google’s vast product offering. For example, searching for [amazon] would show a user their Amazon emails in the right-hand sidebar of the Google search results, [my flights] would show their upcoming flights in great detail, [my events] would access Google Calendar and more. Searching within Gmail would yield results from Google Drive such as documents, spreadsheets and so on. In the image below you can see two Google Spreadsheets showing up in the Gmail search box after searching for [30 day].

A search in Gmail for [30 day] showing two relevant Google Spreadsheets in the search results

Remarketing

Remarketing allows an advertiser to show ads to users after they’ve had some amount of contact with the advertiser. Consider a user doing research for a holiday many weeks or months in advance of actually booking the holiday itself. After the user has been to Holiday Website A but not purchased the holiday, Holiday Website A could use remarketing to show that user ads as they browse the internet.

There are many different forms of remarketing available on the market but they all fundamentally rely on third party browser cookies as a mechanism to identify an individual user after they’ve left the advertiser’s website and are browsing elsewhere on the internet.

Google provide remarketing options through Google AdWords, there are companies built solely around remarketing such as AdRoll, and more recently Facebook entered the fray with their own product as well.

It is worth noting that while the internet has historically made a big deal when Google updates their terms of service, Search Plus Your World and Gmail Search Field Trial are the kinds of services that Google can provide their customers by reducing the number of terms of service and allowing customer data to seamlessly flow across or between different Google products.

How To Use Google Remarketing Without Paying For It

Now that you know that information stored within various Google products can show up for relevant queries by a user in various Google products, how do you leverage that to help your business? The answer: get your business’s content into your potential customers’ Google Accounts.

A practical example might be in order to really help crystallise the idea.

Scenario

Imagine an everyday bank that provides home loans. Home loans are a big deal for most people and not the kind of decision people leap into. They take a lot of time to think through, are well researched and the purchase cycle could be several months.

To help keep the bank’s products and services in the user’s mind, they might provide various PDF documents for the user to download, such as comparisons of the various home loan products that they provide.

Traditionally a website would provide a call to action to download a PDF. However a savvy marketer might include an additional call to action of “Save to Google Drive” and thus provide an easy and natural way of getting the content into a user’s Google Account.

As soon as the user saves the home loan comparison PDF to their Google Drive, subsequent searches related to the content within the PDF will trigger the PDF to show up in various places throughout Google products as outlined above.

Reducing Friction

Google have done a lot of the hard work for marketers by providing useful entry points to Google Drive. For example, if you wanted to link to a research paper by Google describing their distributed storage system known as Bigtable – you can link to Google Drive with a URL such as:

http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.google.com%2Farchive%2Fbigtable-osdi06.pdf

When the user clicks the link, the document will be opened in Google Drive and provide the user with a 1-click option to save the document to their Google Drive storage. Of course you don’t have to do this with only PDF documents. Google Drive supports many different file types that would be useful to the bank in distributing their content and to a user, such as a home loan repayment calculator in Microsoft Excel.
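If you have more than a handful of documents, building these links by hand gets tedious. A minimal Python sketch of generating them follows; the viewer address is the one from the example link above, while the function name is simply one I’ve made up for illustration.

```python
from urllib.parse import quote

# Viewer entry point taken from the example link in this article.
VIEWER_BASE = "http://docs.google.com/viewer?url="


def drive_viewer_link(document_url):
    """Percent-encode the whole document URL so it can travel safely
    inside the viewer's url= query parameter."""
    return VIEWER_BASE + quote(document_url, safe="")


link = drive_viewer_link("http://research.google.com/archive/bigtable-osdi06.pdf")
print(link)
```

Passing safe="" matters here, otherwise the slashes in the document URL would be left unencoded.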

Measuring For Success

Google Drive uses HTTPS or Secure Sockets Layer (SSL) to encrypt the connection between the user and Google, just like internet banking or ecommerce stores do. This means that if a PDF document contained a hyperlink back to the bank website in the example above, no referral data will be leaked and the bank website won’t know the click originated via someone who stored the document in Google Drive.

To get around that problem, include links within the PDF documents that contain campaign tracking from your favourite web analytics package such as Google Analytics.

Since we’re studious marketers and want to measure the benefits of our efforts, I’d recommend using two near identical copies of a given PDF:

Standard PDF with standard campaign tracking
Standard PDF with Google Drive campaign tracking

The standard call to action within the website links to file number one above, while the “Save to Google Drive” option links to file number two.
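As a rough sketch of how the links inside the two copies of the PDF might be generated, the Python below appends the standard Google Analytics campaign parameters (utm_source, utm_medium, utm_campaign) to a landing page URL. The landing page address, the function name and the campaign values are placeholders I’ve invented for the example, so swap in whatever naming your reporting already uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign_tracking(url, source, medium, campaign):
    """Return url with Google Analytics campaign parameters appended,
    keeping any query string the URL already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical landing page that the links inside the PDF point to.
landing_page = "http://www.example.com/home-loans/"

# File one: the standard PDF linked from the normal call to action.
standard_link = add_campaign_tracking(
    landing_page, "pdf", "document", "home-loan-comparison")

# File two: the copy offered through "Save to Google Drive".
drive_link = add_campaign_tracking(
    landing_page, "google", "retargeting", "google-drive-home-loan-comparison")

print(standard_link)
print(drive_link)
```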

To address duplicate content issues with the two nearly identical versions of each PDF document, ideally a rel="canonical" Link HTTP response header would be served with the Google Drive version of the file, pointing at the standard version. This should keep the Google Drive version of the PDF documents out of Google searches and consolidate any link equity that the Google Drive PDF documents might accrue into the standard PDF document.
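For non-HTML files such as PDFs, Google supports a canonical declared through the HTTP Link response header with rel="canonical". How you attach the header depends entirely on your web server, so the Python below only sketches what the header itself looks like. The file address and the function name are hypothetical.

```python
def canonical_link_header(canonical_url):
    """Build the (name, value) pair for a rel="canonical" Link header."""
    return ("Link", '<{}>; rel="canonical"'.format(canonical_url))


# Hypothetical address of the standard version of the PDF.
header = canonical_link_header("http://www.example.com/home-loan-comparison.pdf")
print("{}: {}".format(*header))
```

The header would be sent with the response for the Google Drive copy of the PDF, pointing at the standard copy.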

Understanding Purchase Cycles

Different products purchased online have different length sales cycles. As consumers move through the different steps in the purchase cycle, their behaviour will change as they become more familiar with the product they are researching or looking to buy.

Throughout this journey, consumer search behaviour also changes. At the start of the process users might search for very broad terms like [buy new house], which might morph into [buy new house with swimming pool in Melbourne]. Once the user knows what type of house they want and the approximate price, they’ll start searching for [mortgage repayment calculators], which will lead them into researching mortgages and home loans at large, first broadly but later with very specific requirements.

Businesses that have their online content development in tune with this varied and changing research behaviour could also use remarketing through Gmail Search Field Trial to understand where their content fits into the equation by including a date in each link. Google Analytics campaign tracking might be configured with the following options:

Source: google
Campaign: Google Drive Home Loan Comparison 20130315
Medium: retargeting

When reviewing campaign traffic to their site through Google Analytics, analysts will see campaign dates from before the reporting window. Some simple math and suddenly the bank will know that a certain collection of PDF documents is useful to potential clients eight weeks before they make an online enquiry but is not helping to close the deal within two weeks, for instance.
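The simple math can be sketched in Python as below. It assumes the campaign name ends in a date written as YYYYMMDD, as in the example campaign above. The function name and the visit date are made up for illustration.

```python
import re
from datetime import date, datetime


def weeks_since_tagged(campaign_name, visit_date):
    """Return whole weeks between the date embedded at the end of the
    campaign name and the visit, or None if no date is present."""
    match = re.search(r"(\d{8})$", campaign_name.strip())
    if match is None:
        return None
    tagged_on = datetime.strptime(match.group(1), "%Y%m%d").date()
    return (visit_date - tagged_on).days // 7


# Hypothetical visit recorded on 10 May 2013.
weeks = weeks_since_tagged(
    "Google Drive Home Loan Comparison 20130315", date(2013, 5, 10))
print(weeks)
```

Because the PDFs are only re-tagged periodically, the result is an approximation of how long ago the user saved the document rather than an exact figure.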

This obviously requires more work than simply adding campaign tracking once and forgetting about it, but for certain verticals this may well be worth the effort. The bank in the example might edit the PDFs once a fortnight to update the campaign tracking dates, keeping the same filenames.

Content Development

Since Gmail Search Field Trial provides an avenue to keep your product front and centre when users are actively looking for your services, it makes sense that as a marketer you capitalise on that exposure wherever possible.

This presents an interesting exercise to leverage existing data from existing sources such as paid search campaigns or web analytics to understand user intent and lead time before micro and macro level goals within their website.

Building out custom multi-channel funnels within Google Analytics for different organic and paid search campaigns will help identify where in the purchase cycle a business should try getting their content into their potential customers’ Google Accounts for continued exposure.

Remember, just like standard Google search has best practices, providing maximum exposure through Gmail Search Field Trial has its own rules. For example the title of a Google Spreadsheet is critical to it showing up, so businesses should test adding different documents into Google Drive with different settings to understand how Google indexes those for search.

Conclusion

If your business already has a collection of non-HTML content for users to consume, check to see if the current format of those documents is supported by Google Drive. If it is, follow the steps above and see if your business can benefit from passive remarketing through Gmail Search Field Trial. If you don’t have non-HTML content and your business has a medium to long sales cycle, now would be a great time to start considering what resources you could develop that would slot nicely into Google Drive and help your potential customers, which could lead to additional sales and exposure in the future.

Google Accounts Have Doubled In Under 12 Months

Why would anyone care about how many Google Accounts exist in the world? It turns out that the number of active Google Accounts has knock-on effects for website owners and internet marketing at large right around the world.

Many don’t realise it, but when you perform a search for [nike running shoes] on Google or most other search engines and click through to a website, the website you visit knows that you searched for [nike running shoes]. This isn’t anything malicious, deceptive or devious that Google or most other search engines are doing; it is part of how the internet has worked since the dawn of time.

Website owners can use the search phrase that a user typed to find their site to better understand their customers’ needs, improve their website content, tailor the user experience based on the phrase the user typed in, understand how different advertising interacts with one another and much much more.
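The search phrase arrives as part of the referring page address that the browser hands over, with Google carrying the query in the q parameter. A minimal Python sketch of how a website might pull the phrase out follows. The function name and the sample referrers are made up for illustration, and real referrers carry many more parameters.

```python
from urllib.parse import parse_qs, urlsplit


def search_phrase_from_referrer(referrer):
    """Return the search phrase from a Google referrer, or None when
    the referrer isn't Google or carries no query."""
    parts = urlsplit(referrer)
    if "google." not in parts.netloc:
        return None
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


# Hypothetical referrer from a standard, unencrypted Google search.
phrase = search_phrase_from_referrer(
    "http://www.google.com/search?q=nike+running+shoes")
print(phrase)
```

When the query is missing or blank, the function returns None, which is the case web analytics packages report as (not provided).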

Unfortunately, there is a prerequisite for the user’s query to pass across to the website in question: the user needs to have done a search on http://www.google.com and not https://www.google.com. Notice the addition of the letter s in https in the second Google URL, which signifies that Google is using Secure Sockets Layer (SSL) to encrypt the connection between the user and Google, just like internet banking websites do.

Google started providing encrypted search back in 2010 and while the connection between the user and Google was encrypted, through technical jiggery-pokery Google were still passing the user’s search query through to websites. In October 2011, Google made a change whereby users logged into their Google Account on google.com would be automatically switched over to HTTPS. In March 2012, Google announced that they were rolling that same change out globally through all of their regional Google portals such as www.google.com.au. Importantly, unlike the encrypted search product from Google that was driven by user security, the most recent changes are about user privacy and as such, Google isn’t using technical jiggery-pokery to pass the user’s search query through to websites and instead they get no keyword data at all.

Enter the unassuming Google Account, which provides a pathway into a vast array of Google products. Once a user logs into their Google Account, the user will remain signed in until they specifically log themselves out. As such, as more and more Google Accounts are created, more and more users begin using Google secure search by default and website owners and internet marketers receive less and less search query data to help improve their websites and user experience with.

Microsoft Internet Explorer – Measurement Tool Of Choice

Measuring how fast Google Accounts are growing without being on the inside of Google is a little tricky. No one network provider or ISP can see all traffic on the internet, which means that anytime someone outside of Google makes a claim about the size of Google or any other business on the internet, it is a guess, educated maybe but a guess none the less. Compounding that is the fact that if a user has a Google Account, there is a good chance that they’ll be logged into it, which means that the user’s internet traffic is encrypted between them and Google, making it a little harder again for network providers or third parties to provide an accurate figure.

Under normal circumstances I’d be one of the first to knock Microsoft Internet Explorer as a browser: it is slow to load, slow to render, not that great on memory consumption, is prone to crashing, most versions aren’t web standards compliant and it doesn’t provide an ecosystem of plugins in the same way that other more modern browsers like Firefox or Chrome do. However I’m actually happy that Internet Explorer is a slow moving beast in this instance, as it makes measuring how fast Google Accounts are growing possible.

When Google announced that users signed into their Google Account would use HTTPS by default, it signaled a changing of the tides; Google was going to begin moving more and more Google products over to HTTPS. It was now but a matter of time before other businesses and vendors began using Google secure search around the world.

Internet marketers’ worst fears were realised when Firefox 14 moved the search box over to Google secure search in July 2012. Later, Apple released iOS 6 in September 2012, which also set the Safari browser to use Google secure search by default. Most recently, Google announced that when Chrome version 25 is released shortly it will move searches in the omnibox over to Google secure search as well.

Importantly, each of the browsers in the announcements above is using Google secure search whether the user is logged into their Google Account or not. As such, it isn’t possible to discern through web analytics packages whether a Firefox user is logged into their Google Account or if they’ve simply performed a search in the latest version of Firefox.

Fortunately, Microsoft haven’t enabled Google secure search by default yet, which makes it a good measurement tool. That doesn’t mean that some users of Internet Explorer don’t manually use the HTTPS version of Google search, which would skew the numbers a little but I don’t think it’d be significant enough to break the trend.

Astute readers may be wondering why not produce a complex analysis of different browsers and browser versions, trying to capture the largest percentage of traffic to perform the analysis on? The answer is straightforward: the growth rate of Google Accounts in browsers other than Internet Explorer is even higher, which I put down to those users being more internet savvy and probably more likely to have a Google Account as a by-product of that. As such, as Firefox and Chrome continue to eat market share from Internet Explorer, it is reasonable to assume that the growth rate will accelerate even faster.

Measuring Google Account Growth With Google Analytics

What I’ve proposed below won’t tell you how many Google user accounts exist as an absolute number but it does provide a guide as to how fast Google Accounts might be growing in the wild.

To do this you’ll want to create two Google Analytics Advanced Segments:

Source Google, Medium Organic, Browser Internet Explorer
Source Google, Medium Organic, Browser Internet Explorer, Keyword (not provided)

Enable both of the newly created Google Analytics Advanced Segments, pick a date range from January 2012 until present and use weekly data points to smooth out the graph a little.

Google Analytics Google Account Growth Advanced Segments

Use the Google Analytics Export feature and export the current view into a format you’re happy to work with in your spreadsheet application, I chose CSV but use whatever works for you. The way the data is exported from Google Analytics isn’t quite what we need, with a row for each weekly data point per advanced segment:

Google Account Growth Google Analytics Raw Export

In the screenshot below you can see I’ve made three changes to help with data formatting:

used a function to copy the values from the “Google, Organic, IE” rows into a new column
added a new column showing the percentage of Google, Organic, Internet Explorer, (not provided) against Google, Organic, Internet Explorer
filtered the Segment column and turned off Google, Organic, IE since the values are now in a new column thanks to point 1 above

Google Account Growth Excel

Highlight the two Google Analytics Advanced Segment columns and the percentage, chart them in Excel and you’ll quickly see the growth rate of Google Accounts for the audience relevant to that particular website.
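If you’d rather not reshape the export by hand, the same percentage can be worked out with a short Python sketch. It assumes the export has been reduced to one (week, segment, visits) row per weekly data point per segment. The segment labels, the function name and every number below are invented for illustration and are not figures from my data.

```python
def not_provided_share(rows, all_segment, hidden_segment):
    """Return {week: percentage} of (not provided) visits against all
    Google organic Internet Explorer visits for that week."""
    totals, hidden = {}, {}
    for week, segment, visits in rows:
        if segment == all_segment:
            totals[week] = visits
        elif segment == hidden_segment:
            hidden[week] = visits
    return {
        week: round(100.0 * hidden.get(week, 0) / total, 1)
        for week, total in sorted(totals.items())
        if total  # skip weeks with no visits to avoid dividing by zero
    }


# Made-up sample rows in the assumed export layout.
rows = [
    ("2012-03-05", "Google, Organic, IE", 2000),
    ("2012-03-05", "Google, Organic, IE, (not provided)", 150),
    ("2012-12-31", "Google, Organic, IE", 2400),
    ("2012-12-31", "Google, Organic, IE, (not provided)", 360),
]
share = not_provided_share(
    rows, "Google, Organic, IE", "Google, Organic, IE, (not provided)")
print(share)
```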

In the graph below the purple line represents the percentage of (not provided) keyword visits against the total organic visits. You can see that in March 2012, when Google rolled out Google secure search globally, the percentage was at around 7% and toward the end of 2012 and into 2013 it has climbed over 14%.

Google Account Growth 2012

Does the above graph showing the growth of Google Accounts mean that Google Accounts have actually doubled? No, not directly. It could also mean that the total number of Google Accounts remained unchanged and twice as many people started using their Google Account. Of course both of those scenarios are equally unlikely. More likely, the steady growth of (not provided) above represents a combination of Google Account growth and interface changes and the push of Google+ driving higher levels of Google Account usage.

If the trend above holds over 2013, website owners and internet marketers are set to lose more keyword data from their web analytics packages through natural growth of Google Accounts. Of course the rate at which the growth in Google Accounts impinges on internet marketing efforts will be insignificant compared to Google Chrome switching over to Google secure search on the next update and the inevitability of Internet Explorer receiving an upgrade in a future service pack that uses Google secure search by default.

Do you have a no keyword internet marketing analysis strategy in place yet?