The Ultimate Guide to SEO Testing

SEOs love to blame their bad reputation on the spammers, but you probably know as well as we do that this blame is misplaced. Businesses, in general, really don’t care how they get results. They just care about results. And that’s where most SEOs are screwing up – they suck at metrics.

Ecommerce and SaaS businesses already know why PPC gets so much more attention. It’s simple. PPC is easy to test. You split test your ads, headlines, and landing pages, and you get irrefutable evidence that improvements have been made in ROI.

You don’t get those kinds of clean experiments in SEO. In fact, most SEOs will tell you it’s flat out impossible to test SEO theories and that you have to approach the whole thing with pure intuition and the touchy-feely advice of industry experts.

What a load of crap.

I don’t want to downplay the crucial role that intuition plays in SEO (and marketing in general), but if you aren’t testing and proving your theories, you’re going to end up wasting resources on ineffective, and even counterproductive, strategies.

All of the legitimate experts in SEO know that tests, experiments, and metrics are the key to everything, but the subject almost never sees the light of day. I think it is time to change that, so today we’re going to talk about how to test and experiment with SEO ideas, stop wasting time, and start getting results.

How to “Split Test” an SEO Theory

You’ve probably heard a million times that it’s important to put your keyword in the title tag. I’ll tell you right now that, yes, you should. But here’s the thing. You shouldn’t take this on faith. Granted, if you throw everything at the wall, some things are going to stick. But, without tests, you’re going to do a lot more work than you need to do.

What if you want to test the notion that it is a good idea to put the keyword in the title tag? How would you go about it?

I’m about to explain how, but let me begin with this preface: it’s for testing an SEO theory. It’s not the kind of thing you want to do on your main sites, because it’s slow. Don’t worry; we’ll get to “real world” methods later in this post. For now, just realize that this is all about discovering individual changes you can make that you won’t necessarily read about on top industry blogs. It’s the kind of thing you can test on your microsites.

The biggest obstacle we face here is the fact that you can’t really split test in SEO. In a genuine split test, you need to have two identical samples, change just one condition for one of your samples, and then measure the results. That doesn’t happen in SEO because you can’t set up two identical websites with identical link profiles and then change just one thing. Odds are one of the sites wouldn’t even be indexed because of duplicate content issues. In fact, the identical link profiles would be so suspicious that it’s likely neither of the sites would rank.

So, yeah, we’re not going to tell you to do that.

Instead, you would go through a process that looks something like this:

I can’t stress enough that this laborious process isn’t for your main sites; it’s for testing your pet theories about how to boost traffic (or other metrics). This is crucial, because you can’t make any other changes to the site while you’re performing the test.

So, why do all this? Well, the ten day minimum comes from statistics. You need a total sample size of at least forty before you can run a test without having to worry about skewed data or anything like that. This method guarantees at least forty days of sample data to work with.

Why switch it back and forth? If you don’t, a change in traffic could be the result of a seasonal shift or a general trend. It’s still possible that some other random factor could cause a change, but the odds of it just happening to coincide with four site changes that don’t follow a weekly cycle or any other obvious pattern are pretty low.

And, finally, you need to pay attention to Google’s cached page because that’s the copy they’re using to rank you. For the most accurate results, you’ll ignore the days between when you made the change and when the change showed up in the cache, because nobody’s quite sure if Google might be using the data before it shows up in the public cache.
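If you keep your daily numbers in a spreadsheet export, the bookkeeping for "ignore the days before the change was cached" can be sketched roughly like this. Treat it as an illustration only: the `split_samples` function, the `periods` layout, and the dates and visit counts are all made up for the example, and you would feed in whatever your analytics tool actually gives you.

```python
from datetime import date, timedelta


def split_samples(daily_visits, periods):
    """Sort daily visit counts into control and experimental samples.

    daily_visits: dict mapping a date to organic visits for the test URL.
    periods: list of (variant, cached_on, ends_on) tuples, where cached_on
             is the first day the change showed up in Google's cache.
    Days that fall outside every period (for example, while a change was
    live but not yet cached) end up in neither sample.
    """
    samples = {"control": [], "experimental": []}
    for variant, cached_on, ends_on in periods:
        day = cached_on
        while day <= ends_on:
            if day in daily_visits:
                samples[variant].append(daily_visits[day])
            day += timedelta(days=1)
    return samples


# Hypothetical data: 24 days of visits, with two uncached days
# (Jan 11 and Jan 12) between the control and experimental periods.
visits = {date(2024, 1, 1) + timedelta(days=i): 100 + i for i in range(24)}
periods = [
    ("control", date(2024, 1, 1), date(2024, 1, 10)),
    ("experimental", date(2024, 1, 13), date(2024, 1, 22)),
]
samples = split_samples(visits, periods)
```

Each time you flip the page back and forth, you would add another entry to `periods` with the date the new version appeared in the cache.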

So, what’s a two sample t-test? Well, it basically uses the same math as a traditional split test, but your split test software probably isn’t going to be able to handle it for you, and that’s about all you need to know about it. I’m not trying to turn you into graduate level statisticians here.

There is a two sample t-test calculator here. This is how to use it:

In the calculator above you would enter daily visits to your test URL as your data points.

On the right side, you’ll see results like this:

In this example, we can be 95 percent sure that the average traffic in the experimental sample is between 17 and 30 visits higher than in the control sample. That’s a measurable difference.

As long as this interval doesn’t include zero, we have a conclusive result (at the 95 percent confidence level, anyway). If the whole interval is negative, like it is here, it means that the experimental group won, because the difference is reported as the control average minus the experimental average. So, yes, putting the keyword in the title tag was a good move.
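If you would rather compute the interval yourself than paste numbers into a calculator, here is a minimal sketch in plain Python. A few assumptions to flag: it uses Welch's version of the two sample t-test (which does not assume equal variances, so it may differ slightly from what a given online calculator does), the function names are my own, the t critical value is found by simple numerical integration rather than a stats library, and the visit numbers are invented. The difference is taken as control minus experimental, so a fully negative interval means the experimental version won.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, confidence=0.95):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_interval(control, experimental, confidence=0.95):
    """Confidence interval for mean(control) - mean(experimental).

    Needs at least two values per sample and some variation in the data
    (if both samples are constant, the standard error is zero).
    """
    n1, n2 = len(control), len(experimental)
    v1, v2 = variance(control) / n1, variance(experimental) / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = mean(control) - mean(experimental)
    margin = t_critical(df, confidence) * se
    return diff - margin, diff + margin


# Made-up daily visits, 20 days per condition.
control = [100 + (i % 5) for i in range(20)]
experimental = [125 + (i % 7) for i in range(20)]
low, high = welch_interval(control, experimental)
print(f"95% interval for control minus experimental: {low:.1f} to {high:.1f}")
```

If the printed interval sits entirely below zero, the experimental days averaged more visits than the control days.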

A few thoughts on what you should be using for your data before we move on:

- Go to Advanced Segments in Google Analytics and choose “Non-paid Search Traffic.”
- If you’re changing only a single page, limit the data to that page. Otherwise, the noise from the rest of your site is going to make it hard to find anything useful.
- Ten days for each measurement is an absolute minimum, and sometimes Google’s algorithm is slow to react, even after the page is cached. You’ll want to at least eyeball a noticeable difference before switching cases back and forth. (And, yes, that makes this only quasi-scientific; this is business after all.)
- If you want to get things done a bit quicker, you can avoid the back and forth and just take 20 straight days of measurements for each case. This means there’ll be less time waiting for things to cache. Just keep in mind that the possibility of other factors playing a part is going to be a lot higher.

So this is how you can discover new ways to boost search traffic. Like we said, this is a slow method that you really can’t employ on your main sites. It’s also impossible to test external signals, like links, without walking into “black hat” tactics like private link networks.

So what about the messier world of live sites that need to be constantly updated, and where high risk tactics just aren’t acceptable? Is there any way to test and experiment with these kinds of sites?

Yes.

SEO Tests for the Ugly Real World

The first thing I’m going to say about “real world” SEO testing is that it’s not about the algorithm. You will never reverse engineer the algorithm, not even using the method discussed above. What you can do is figure out how to maximize metrics that truly matter, like lifetime visitor value.

So let’s get this out of the way. SEO testing is not about testing “ranking factors.” It’s about testing strategies and tactics in order to get the biggest boost in your KPIs. (Hint: “traffic,” “Twitter followers,” and “email subscribers” aren’t KPIs.)

This can be done without bringing progress on your site to a halt and then waiting for results from a single tweak. However, it can’t be done without clearly defined strategies and keen project management. Do not approach SEO marketing as a “throw everything against the wall and hope something sticks” free-for-all.

Here’s what you do:

There’s a lot going on here, so let’s go over these one by one in more detail:

Define a List of Tactics You Use

This is all about doing the hard work of figuring out what you actually do. While it’s a bad idea to get too granular, you’ll want to be specific. These are tactics that your team will need to act on. Here are a few examples:

Spend X hours contacting influencers on social networks
Spend X hours building links with guest posts
Spend X hours on keyword research
Incorporate Dr. Robert Cialdini’s 6 principles of influence into the content
Incorporate at least one image
Use a number in the headline
Use the keyword in the headline
Incorporate Dr. Jonah Berger’s 6 principles of virality into the content

Defining what tactics you use is beneficial outside of these kinds of experiments and helps you get consistent about what you do, so there’s no excuse for skipping this step.

A word of caution. Focus on principles or small, individual tweaks. Do not attempt to plan out exactly how each piece of content should be developed and promoted. There is a danger of falling into “big design up front,” and that’s a very bad idea.

It’s absolutely crucial to give your team flexibility with how these tactics are implemented, but it’s equally crucial to measure which tactics are being used and which are not.

Define an Experimental Tactic

This is the tactic you want to test. You want to test it for one of these two possible reasons:

A tactic is particularly costly, and you want to measure whether it is effective enough to justify the cost (or even effective at all)
You have an idea, and you want to test whether it is worth incorporating into your main strategy

Either reason is justifiable. However, it’s worth noting that, according to a massive study by the Harvard Business Review, the most successful businesses put increased revenue ahead of reduced costs.

You also should recognize the possibility that some of the tactics you use may hurt revenue.

And keep in mind that you also can define an experimental tactic as a ramped up or toned down version of a tactic you already use. In other words, instead of adding or removing a tactic from your arsenal to see if it makes a difference, you can try changing the “amplitude” to see if you get improved results (or reduced costs without losing results).

Brainstorm 40+ Content Ideas

There is no reason to change your brainstorming strategy here (unless it’s what you’re testing). Just do what you always do.

As you might have guessed, 40+ is the number we need in order to get a reliable statistical test, so that’s where this comes from.

If you’re a more “news” driven site, you may need to skip this step and just take the ideas as they come. You’ll still need to find a way to incorporate this next step, though.

Randomly Assign the Ideas to Two Groups

Split the ideas into two groups. Randomness is important here. We need to make sure we aren’t biasing things by pairing our favorite content ideas with our favorite tactical approach, or we could get a false result.

The same goes for your team. They need to be randomly assigned to the topics, or at least assigned before anybody knows which topics will employ which tactics.

You can use this random assignment tool to do this for you. Just assign each idea to a number. If you entered this data into the tool, you would get numbers 1 to 40 assigned to group A or B:

And these inputs would assign ideas 1 through 40 to one of 5 employees (or teams):

If you think that randomly assigning your teams is taking things a bit too far in the name of science, you can go ahead and let the teams pick their own ideas, or assign them yourself. If you do this, though, you should make the assignments before you split the ideas into two groups. (I know I’m repeating myself, but it’s important!)

Remember, the point is to avoid biasing the results by accidentally putting your best employees with the set of tactics you really want to succeed.
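If you don't want to rely on an online tool, the same random assignment can be sketched with Python's standard `random` module. The function names, the placeholder idea titles, and the five team names are all assumptions for the example. Passing a `seed` just makes the shuffle repeatable so you can show your work later.

```python
import random


def assign_groups(ideas, seed=None):
    """Shuffle the ideas and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(ideas)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


def assign_owners(ideas, owners, seed=None):
    """Shuffle the ideas, then deal them out to owners like a deck of cards."""
    rng = random.Random(seed)
    shuffled = list(ideas)
    rng.shuffle(shuffled)
    return {idea: owners[i % len(owners)] for i, idea in enumerate(shuffled)}


# Placeholder data: 40 content ideas and 5 employees (or teams).
ideas = [f"idea {n}" for n in range(1, 41)]
owners = ["team 1", "team 2", "team 3", "team 4", "team 5"]

# Owners are assigned first, before anyone knows which group an idea lands in.
owner_of = assign_owners(ideas, owners, seed=1)
groups = assign_groups(ideas, seed=2)
```

Because the two shuffles are independent, knowing who owns an idea tells you nothing about which set of tactics it will get.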

Alternate between Content Types

As previously mentioned, a sudden improvement in success may not be because of your strategy. The reason you alternate between the two strategies is so that you’re comparing results at the same time.

(In an ideal situation, you would publish and promote them at exactly the same time, but this isn’t usually very realistic.)
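One simple way to picture the alternation is as a publishing queue that takes one piece from each group in turn. This is just a sketch with made-up names (`alternating_schedule`, the sample titles), not a prescribed workflow.

```python
from itertools import zip_longest


def alternating_schedule(group_a, group_b):
    """Return a publishing order that alternates A, B, A, B, ..."""
    schedule = []
    for a, b in zip_longest(group_a, group_b):
        if a is not None:
            schedule.append(("A", a))
        if b is not None:
            schedule.append(("B", b))
    return schedule


# Hypothetical content titles for each group.
queue = alternating_schedule(["a1", "a2", "a3"], ["b1", "b2"])
```

If one group is longer than the other, its leftover pieces simply go at the end of the queue.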

Run a Two Sample T-Test

We already covered how to do this, but it’s worth addressing a few extra pointers here:

Use a more meaningful metric than traffic as the outcome of your experiment, like “lifetime revenue minus lifetime cost” or even “employee satisfaction.” The point is to choose metrics you genuinely care about as a business, so that you can make informed, purposeful decisions.
Consider revisiting these tests later when you have more data to work with. The ultimate value of each strategy isn’t necessarily measurable immediately after the test.
It’s important to understand the difference between “statistical significance” and practical significance. If your confidence interval doesn’t include zero, you’re 95 percent sure something real is happening. But that doesn’t mean that what’s happening is important. Look at the size of the effect from your confidence interval as well. Is it big enough to matter? You want to focus on the tactics that make the biggest impact, not pour resources into minute, inconsequential gains.
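To make the last pointer concrete, here is a rough sketch of how you might label a confidence interval once you have decided on the smallest effect you would actually care about. The `interpret_interval` function, its labels, and the threshold of 10 are assumptions for illustration; the right threshold depends entirely on your business.

```python
def interpret_interval(low, high, min_effect):
    """Classify a confidence interval for a difference between two groups.

    low, high: the ends of the interval (in whatever unit your metric uses).
    min_effect: the smallest absolute difference worth acting on.
    """
    if low <= 0 <= high:
        return "inconclusive"
    smallest = min(abs(low), abs(high))
    largest = max(abs(low), abs(high))
    if smallest >= min_effect:
        return "conclusive and large enough to matter"
    if largest < min_effect:
        return "conclusive but too small to matter"
    return "conclusive, but the size is uncertain"


# The interval from the earlier example (17 to 30 visits, reported as negative),
# judged against a hypothetical threshold of 10 visits per day.
verdict = interpret_interval(-30, -17, min_effect=10)
```

The point of the third label is that a result can be real without you knowing yet whether it is big enough to justify the cost, which is a good reason to revisit the test when you have more data.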

It’s also not immediately obvious what data you should be pasting into each column of the calculator in this case, so let me elaborate on that. There actually are three basic ways you could approach this:

Treat each piece of content as one observation
Treat each day as one observation, split into the two groups
Treat each day for each piece of content as one observation

All of these are valid approaches, depending on what you’re going for. It’s even worth running the test all three ways to see if you get the same basic conclusion.

Just remember that in each of these three scenarios, you’re not quite measuring the same thing. This is what you’re doing with each test, respectively:

You’re comparing the average total value (to date) of a piece of content from one group with that of the other group
You’re comparing the average daily value of one strategy against another
You’re comparing the average daily value of a piece of content from one group with that of the other group

This can get a bit taxing on the brain so I’d rather you didn’t over think this. It’s better to test than not to test. If it doesn’t seem obvious which one you should test, test all of them.
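If your data lives in a flat export with one row per piece of content per day, the three approaches boil down to three different ways of grouping the same rows. The sketch below assumes a made-up record layout of `(content_id, group, day, value)`, where `value` is whatever metric you settled on; the function names are mine.

```python
from collections import defaultdict


def _by_group(totals):
    """Turn {(group, key): total} into {group: [totals...]}."""
    samples = defaultdict(list)
    for (group, _key), total in totals.items():
        samples[group].append(total)
    return dict(samples)


def per_content(records):
    """One observation per piece of content: its total value to date."""
    totals = defaultdict(float)
    for content_id, group, _day, value in records:
        totals[(group, content_id)] += value
    return _by_group(totals)


def per_day(records):
    """One observation per day: the combined value of a group's content."""
    totals = defaultdict(float)
    for _content_id, group, day, value in records:
        totals[(group, day)] += value
    return _by_group(totals)


def per_content_day(records):
    """One observation per piece of content per day."""
    samples = defaultdict(list)
    for _content_id, group, _day, value in records:
        samples[group].append(value)
    return dict(samples)


# Tiny made-up export: (content_id, group, day, value)
records = [
    ("post-1", "A", 1, 10.0),
    ("post-1", "A", 2, 5.0),
    ("post-2", "A", 1, 3.0),
    ("post-3", "B", 1, 4.0),
    ("post-3", "B", 2, 6.0),
]
```

Each function returns one list of numbers for group A and one for group B, which are the two columns you would paste into the calculator.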

What about Flywheel Strategies?

Oh boy, I opened Pandora’s box when I started talking about SEO testing, didn’t I?

We have talked about comparing individual pieces of content (and the strategies behind them) as a way to determine which tactics work and which ones don’t. But this ignores a big part of the value of SEO – cumulative effects.

If you’ve been building or attracting links for years, you don’t necessarily have to do any link building to get significant traffic on a new piece of content. This is where things start to get very difficult to measure.

It’s next to impossible to pinpoint which cumulative tactics are making things easier as time goes by. Does the traffic come easier because you have a better link profile? Because Google has more positive user data on you? Because your social presence is larger? Because you’ve literally grown the demand for content like yours on the web?

At the beginning of this article, I said I didn’t want to downplay the role of intuition. That applies here. Intuition about your audience, the future of your industry, the future of search engines, and the future of the market as a whole is going to play a major part in how successful you are in the long term.

(Not to mention luck, the most underrated thing that separates winners from losers.)

As a data-driven marketer, I tend to sidestep the issue of flywheel tactics. It’s not because I don’t believe they exist. They do, unquestionably. But no matter what you tell yourself, immediate results always take precedence. And no matter what you claim, you can’t predict the future.

I like to focus on things we can do right now that have a directly measurable impact on lifetime value. The strategies I’ve discussed so far address how to do this.

Cumulative effects are higher order factors. They’re basically impossible to measure in a real world setting. You can demonstrate that they do, in fact, exist by pointing to case studies or cumulative results over time, but you can’t identify which strategies were most responsible for the gains.

Don’t even try. Just know they’re there.

Does Traditional Testing Play a Part in SEO?

The punch line – yes.

First, it’s important to understand that user experience and design come before link building. Put simply, these factors increase the value of traffic from any source, including search engines. If you aren’t doing split tests and usability tests to maximize the value of your site, what’s the point of doing SEO in the first place?

But there’s even more to it than that. You can use traditional tests like these to improve your “pure” SEO.

First off, if you don’t know anything about split testing, take a look at this beginner’s guide. It should be easier to follow than some of the more advanced stuff we’ve been talking about today. I also have to say that I completely agree with Peter Sandeen: you should test strategies, not page elements. (Hopefully, this article has made that clear already.) The basic structure of split testing looks like this:

If you don’t have software for this, Firepole Marketing has a free tool for split tests.

Now, most people run split tests to maximize conversions or other financial factors, but there’s also another interesting possibility:

You can split test landing pages for their ability to earn links.

For example:

You can run ads on two different pages to see which earns the most natural links
You can alternate between two different pages during outreach to see which page gets the most links
You can split test landing pages for their ability to propagate through social networks

Remember to use the randomization tool I mentioned earlier to keep these tests fair.

My personal preference is to use the value of referral traffic from these links as the primary metric. In the long run, I suspect that is the most useful way to measure the value of a link in Google’s eyes. In the short run, it’s clearly the most useful metric financially.
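For the outreach version of this test, the record keeping might look something like the sketch below. Everything here is hypothetical: the log layout of `(page, links_earned, referral_value)`, the function names, and the numbers. It picks which page to pitch at random and then tallies links and referral value per pitch for each page.

```python
import random


def pick_page(pages, rng=random):
    """Randomly choose which page to pitch to the next prospect."""
    return rng.choice(pages)


def summarize_outreach(log):
    """Tally pitches, links, and referral value for each page."""
    summary = {}
    for page, links_earned, referral_value in log:
        stats = summary.setdefault(
            page, {"pitches": 0, "links": 0, "referral_value": 0.0}
        )
        stats["pitches"] += 1
        stats["links"] += links_earned
        stats["referral_value"] += referral_value
    for stats in summary.values():
        stats["value_per_pitch"] = stats["referral_value"] / stats["pitches"]
    return summary


# Made-up outreach log: one row per prospect contacted.
log = [
    ("page A", 1, 12.0),
    ("page A", 0, 0.0),
    ("page B", 1, 30.0),
    ("page B", 1, 10.0),
]
summary = summarize_outreach(log)
```

With enough rows per page, the per-prospect referral values are the kind of data you could feed into the two sample t-test from earlier.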

There are two different ways you can approach these kinds of split tests:

Use two different versions of the same page
Use two entirely different pages

Again, either one of these is perfectly fine, depending on what you’re trying to accomplish. If you’re testing the message, two completely different pages will work great. If you’re testing the amplitude or the mix of the messages, it’s better to test two different versions of the same page. Again, take a look at Peter Sandeen’s article to learn how this works.

If you do end up testing two different versions of the same page, you’ll want to make sure one of them is noindexed. After the test, you can remove the noindex tag and redirect it over to the other page, assuming the pages were similar enough to deserve the same links.

These tests can help you gain direct insight into the kinds of pages that are most likely to earn links, so be sure to record and learn from your results.

You Made It

This is a monster of a post, so congratulations on wading through this. The central point here is that SEO is testable. You can test theories, you can test strategies, and you can test individual pieces of content. As long as you recognize that people and direct results come first, it’s actually easier to measure than you probably think.

Use these tests to weed out what doesn’t work, and discover the truly remarkable road forward. I hope you find this useful enough to pass along to your teams and your audiences. Thanks for reading.

About the Author: Manish Dudharejia is the co-founder of E2M Solutions, an internet marketing agency specializing in ethical and organic search engine optimization, natural links acquisition, content marketing, and more. You can follow Manish on Twitter or Quora.
