Website optimization is not just about installing an A/B testing tool, creating some (random) tests and hoping for the best. Testing is one element of a process that builds on consistently gathering and testing knowledge about your customers, and implementing all of its elements can seem a daunting task.
Last year I created a workflow to highlight the moving parts involved with website optimization to explain the process and resources needed and why it can take some time to implement properly. I used it in a public presentation (slide 29) as an illustration of the complexity of the testing process and how it was implemented.
Last week I found out that at least one agency is using this model for their optimization projects. And they told me they were quite happy with following my model. This was the first time I realized I had created a model for the website optimization workflow. WOW! ;).
The model might not be clear to everyone at first glance (besides the fact that the design could be improved). I think it is a good idea to highlight the different parts and explain the intention behind them. I tried to make it a model that is generally applicable, but I haven't flattened it or tried "dumbing it down". Dumbed down versions won't help you in real life and I want you to be able to actually apply this. You need to know the nitty-gritty of the system and its use in real life.
PS1: I don't claim this to be a perfect system, just my practical working experience with website optimization. If you have any ideas on optimizing my optimization model, please let me know in the comments!
PS2: When I say A/B testing, you can just as easily read Multivariate testing.
All ideas we get from the above input sources are collected into one system (we use Podio), which basically functions as an (advanced) spreadsheet with all ideas. We even have a public form, accessible to everyone internally, that can be used to add ideas to this list.
We collect all ideas in one big list, check if they are feasible tests and enrich them with some basic information about what we need to know to be able to evaluate this testing idea. The 3 main parts we need to know are:
Potential: How much can you improve this specific aspect of your site in business terms (revenue, profit). Do you expect 5% uplift from this? 50%? 500%? (note: uplift flowing from the specific part you are testing, not the overall numbers). In my eyes, ‘potential’ is the hardest of the 3 to estimate and put into a number. The more experienced you get, the better you'll get at this one.
Importance: How big is this part in the overall system? For example: if you look at something in the header or menu, that will be seen by 100% of all users. If a product detail page gets 50% of all pageviews, you use 50% as a number. I basically just use the pageview share % for this number (I calculate the total number of pageviews over ALL websites and see how big a certain part is, helps to prioritize over multiple websites).
Ease: How easy is it for the development team to implement this feature (both as a test and eventually on production). Makes a big difference if it's just 1 hour or 3 weeks to develop, especially when development resources are scarce.
As you can see, the first letters form the word PIE. This model is taken from Widerfunnel. Read their blog post How to Prioritize Conversion Rate Optimization Tests Using PIE for more detailed information about the PIE Score. Widerfunnel gives every letter a number from 1 to 10, adds all scores and divides the sum by 3 to get to the eventual PIE score. But you can also multiply them and/or give different parts a different weight. For instance: if you have an ample supply of developer resources, the importance of Ease can be limited by dividing it by 2 first. You'll have to play around with this to get to an optimal formula for your process. After calculating the PIE scores for the tests (which any spreadsheet program can do automatically for you) you can simply sort your tests on that column et voila: a list of test ideas in order of priority :).
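To make the scoring concrete, here is a minimal Python sketch of the PIE calculation described above. The idea names and their scores are made up for illustration, and `ease_weight` is a hypothetical parameter standing in for the "divide Ease by 2" tweak; treat it as one possible formula, not the official one.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    potential: float   # 1-10: expected uplift of the tested part
    importance: float  # 1-10: e.g. derived from pageview share
    ease: float        # 1-10: higher means easier to build


def pie_score(idea: Idea, ease_weight: float = 1.0) -> float:
    """Sum the three scores and divide by 3.

    ease_weight is an assumed knob: 0.5 halves Ease before averaging,
    which is useful when developer time is not the bottleneck.
    """
    return (idea.potential + idea.importance + idea.ease * ease_weight) / 3


# Illustrative ideas only; the numbers are invented.
ideas = [
    Idea("High-contrast CTA button", potential=4, importance=10, ease=9),
    Idea("New checkout flow", potential=9, importance=5, ease=2),
    Idea("Reviews on product page", potential=6, importance=5, ease=6),
]

# Plain PIE: sort on the score, highest priority first.
ranked = sorted(ideas, key=pie_score, reverse=True)

# Same list, but with Ease counting only half as much.
ranked_low_ease = sorted(
    ideas, key=lambda idea: pie_score(idea, ease_weight=0.5), reverse=True
)

for idea in ranked:
    print(f"{pie_score(idea):.2f}  {idea.name}")
```

With these example numbers, halving the weight of Ease moves the hard-to-build checkout idea above the reviews idea, which is exactly the kind of effect to look for when playing around with the formula.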
I'm skipping 2 orange blocks here, but will get to those later.
So for the simple (stupid) tests the flow pretty much looks like this:
The above steps are what I call dumb testing: you're testing different variants of some aspect of your site but you have no idea why. If you have loads of traffic, don't intend to learn anything about your visitors and don't need to be able to replicate improvements anywhere else: go ahead. Take a button, give it 64 different colors and see which color works best. You might get a lift. You probably won't. Either way: you'll have no idea how to build upon your new findings.
I believe there is a better way.
If you do some research about the specific problem you are trying to solve and create a proper hypothesis for it, you can test that (probably with multiple tests) and actually learn something that you can even apply to other communication channels and build upon with further tests.
For example: "When a CTA button has a high contrast with the rest of the page, this will create a clear next step for the customer with a higher click-through rate."
Now thát is something you can test and can learn from. Remember: conversion uplifts are nice, but if you want long-term success, aim for tests that are producing learnings you can build upon.
Note: button color tests are kinda simple/basic (and usually don't do that much) but it's just an easy example to use here.
Besides getting learnings, you need to know up front when a test counts as producing an uplift in site performance and when it doesn't. And what you will do with the result either way.
If you're in this line of work (and you are, or else why are you reading this?), you know that this work is usually referred to as Conversion Rate Optimization (CRO), and indeed we usually look at conversion rates as one of the KPIs for A/B testing success. Sure, it's important, but don't base your success/fail verdict only on the conversion rate. At least also consider average order value and revenue per user. (Companies that are really ahead on this don't even look at single orders; they look at impact on NPS scores, profitability per user, succeeding orders, reviews posted etc., so they are more aimed at lifetime value.) Conversion rate on its own will never tell you the complete story and is not enough to base your decisions on.
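As a small illustration of why conversion rate alone can mislead, the sketch below computes the three KPIs mentioned above for two variants. All numbers are invented, and the `kpis` helper is a hypothetical name, not part of any testing tool.

```python
def kpis(users: int, orders: int, revenue: float) -> dict:
    """Return the three basic KPIs for one test variant."""
    return {
        "conversion_rate": orders / users,
        # Guard against a variant without any orders.
        "average_order_value": revenue / orders if orders else 0.0,
        "revenue_per_user": revenue / users,
    }


# Invented example data: the variant converts better but sells cheaper baskets.
control = kpis(users=10_000, orders=300, revenue=24_000.0)
variant = kpis(users=10_000, orders=330, revenue=23_100.0)

for name in control:
    print(f"{name:22s} control={control[name]:.4f}  variant={variant[name]:.4f}")
```

In this made-up case the variant "wins" on conversion rate while revenue per user goes down, so judging it on conversion rate alone would lead to the wrong decision.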
Other than KPIs, write down the intended audience for this test, specific site segments, the optimal duration of the test, how much traffic you will send to it and whether it will have enough statistical power. And what do you expect to happen?
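For the duration and power question, a rough estimate can be made before the test starts. The sketch below uses the standard normal-approximation formula for comparing two conversion rates; the defaults (5% significance, two-sided, 80% power) and the function names are my assumptions for illustration, and your testing tool may calculate this differently.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_uplift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant to detect the given uplift.

    Normal approximation for a two-sided test on two proportions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_variant: int, variants: int, daily_users_in_test: int) -> int:
    """Days needed when daily_users_in_test users enter the test each day."""
    return math.ceil(n_per_variant * variants / daily_users_in_test)


# Invented example: 3% baseline conversion, hoping to detect a 10% relative uplift.
n = sample_size_per_variant(baseline_rate=0.03, relative_uplift=0.10)
days = duration_days(n, variants=2, daily_users_in_test=4_000)
print(f"{n} users per variant, roughly {days} days")
```

The main takeaway is how quickly the required traffic grows when the expected uplift gets smaller: halving the uplift you want to detect roughly quadruples the sample size.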
We keep everything in 1 system (Podio), so all information is always in one place (like different variants used, test parameters, screenshots etc.). Just reporting that a test was 'significant' is not enough and doesn't give others enough information about the validity of your test. So if you share the test with other departments, at least give them insight into the following metrics:
The reports/learnings have a feedback loop back to the input for new A/B tests.
If you've set the parameters up front, the ROI for this one should be easy to calculate, and you can make a (small) business case for it. You now know how much uplift a certain change will (probably) give and how long it will take for a developer to implement. If it's worth deploying, you can forward it to the sprint backlog for the development team to deploy; otherwise you can go back to the drawing board.
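A back-of-the-envelope version of such a business case could look like the sketch below. Everything in it is an assumption for illustration: the `business_case` function, the idea of scaling yearly revenue by the share that flows through the tested part, and all the numbers. Your own case will likely use profit rather than revenue and a shorter or longer time horizon.

```python
def business_case(yearly_revenue: float, revenue_share: float,
                  measured_uplift: float, dev_hours: float,
                  hourly_rate: float) -> dict:
    """Compare the expected extra yearly revenue with the build cost."""
    # Only the revenue flowing through the tested part gets the uplift.
    extra_revenue = yearly_revenue * revenue_share * measured_uplift
    cost = dev_hours * hourly_rate
    return {
        "extra_revenue": extra_revenue,
        "cost": cost,
        "roi": (extra_revenue - cost) / cost,
    }


# Invented example: 2% uplift on a part that carries half of 1M yearly revenue,
# costing one developer week to put on production.
case = business_case(yearly_revenue=1_000_000, revenue_share=0.5,
                     measured_uplift=0.02, dev_hours=40, hourly_rate=100)
print(case)
```

A positive ROI is an argument for the sprint backlog; a negative one sends the idea back to the drawing board.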
I'm amazed you made it till the end of the article, you must be really awesome! Although its abbreviation does make it sound impressive, I don't claim this Website Optimization Workflow to be a perfect system. It's just my experience working in the field. Since you're awesome you probably have some ideas on optimizing my WOW-model or have questions about it. What does your process look like? Please let me know in the comments or reach out on Twitter!
Thx to Arnout Hellemans, Andy Copleman, AJ Dichmann and Marcus Klinge for proofreading this article!