Website optimization is not just about installing an A/B testing tool, creating a few (random) tests and hoping for the best. It's one element of a testing process that builds on consistently gathering and testing knowledge about your customers, and implementing all of its elements can seem a daunting task.
Last year I created a workflow that highlights the moving parts involved in website optimization, to explain the process, the resources needed, and why it can take some time to implement properly. I used it in a public presentation (slide 29) to illustrate the complexity of the testing process and how it was implemented.
Last week I found out that at least one agency is using this model for their optimization projects, and they told me they were quite happy following it. That was the first time I realized I had actually created a model for the website optimization workflow. WOW! ;)
The model might not be clear to everyone at first glance (not to mention that the design could be improved), so I think it's a good idea to highlight the different parts and explain the intention behind them. I tried to make the model generally applicable, but I haven't flattened it or tried to "dumb it down". Dumbed-down versions won't help you in real life, and I want you to be able to actually apply this. You need to know the nitty-gritty of the system and how it's used in practice.
PS1: I don't claim this to be a perfect system; it's just my practical working experience with website optimization. If you have any ideas on optimizing my optimization model, please let me know in the comments!
PS2: When I say A/B testing, you can just as easily read multivariate testing.
1) Input for your optimization
Business Intelligence: If you have a BI department, they should be able to serve you loads of information on how different parts of the business are performing. In most companies these departments start out focused mainly on financial metrics, but more and more (detailed) product and user information is finding its way into BI systems, which can be really useful. And of course the analytics system running on your website will tell you a lot about on-site behavior.
General Research (outside your company): Research on everything concerning user behavior (online and offline) can be used for this. Do try to focus on the more scientific journals/blogs: there's a lot of crap out there that has no validity at all and won't help you much.
Corporate needs: The boss will want to have a say ;). It's great to have a HiPPO (Highest Paid Person's Opinion), since they are (or should be) the ones with an overview of many aspects of the business and the industry, and can provide valuable input. Just don't take your boss's word for it: put their ideas through the testing process, just like all other ideas.
User Research: BI/analytics is great when you want to know WHAT people are doing, but for the WHY you need different tools. Both qualitative research (like lab testing and customer service feedback) and quantitative research (like online questionnaires) can give you very useful information for explaining user behavior and intent, and also give you a better feeling for how to prioritize your tests (more about that later on). It's also wise to set up a feedback loop from customer support to your inbox.
Product roadmap/changes: New features on the roadmap? Changes to the current offering? Great! Try to get them tested before they go live.
Strategy: Don't just wait for other departments to give you input. Your website/product/service should have a strategy and objectives, which lead to website goals and KPIs to test and optimize.
Brainstorming: Great for getting ideas from different perspectives within the company. Get multiple disciplines in here, leave out managers (or keep a different session with them) and don't forget to include someone from the support staff. Besides ideas, this is also great to get more involvement and buy-in for your optimization efforts in the company.
Test results: Your own tests will probably lead to optimized or segmented versions of those tests, or raise new questions/hypotheses about customer behavior. Besides that, published tests from other sites can serve as great input for your own tests.
All ideas from the above input sources are collected in one system (we use Podio), which basically functions as an (advanced) spreadsheet of all ideas. We even have a form that everyone in the company can access to add ideas to this list.
2) Prioritizing ideas
We collect all ideas in one big list, check whether they are feasible tests, and enrich them with the basic information we need to evaluate each testing idea. The 3 main things we need to know are:
Potential: How much can you improve this specific aspect of your site in business terms (revenue, profit)? Do you expect a 5% uplift from this? 50%? 500%? (Note: this is the uplift for the specific part you are testing, not for the overall numbers.) In my eyes, 'potential' is the hardest of the 3 to estimate and put into a number. The more experienced you get, the better you'll get at this one.
Importance: How big is this part in the overall system? For example: something in the header or menu will be seen by 100% of all users. If a product detail page gets 50% of all pageviews, you use 50% as the number. I basically just use the pageview share (%) for this number: I calculate the total number of pageviews over ALL websites and see how big a certain part is, which helps to prioritize across multiple websites.
Ease: How easy is it for the development team to implement this change (both as a test and eventually in production)? It makes a big difference whether it takes 1 hour or 3 weeks to develop, especially when development resources are scarce.
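To make the Importance number concrete, here's a minimal Python sketch of the pageview-share calculation described above. The site names, page types and pageview counts are made up for illustration; in practice you'd pull them from your analytics system.

```python
from __future__ import annotations

# Hypothetical pageview counts per (website, page type), for illustration only.
PAGEVIEWS = {
    ("site_a", "product_detail"): 40_000,
    ("site_a", "checkout"): 10_000,
    ("site_b", "product_detail"): 30_000,
    ("site_b", "home"): 20_000,
}


def importance_share(pageviews: dict[tuple[str, str], int], page_type: str) -> float:
    """Return the share (in %) of all pageviews, summed over ALL websites,
    that land on the given page type."""
    total = sum(pageviews.values())
    part = sum(views for (_site, page), views in pageviews.items() if page == page_type)
    return 100 * part / total


if __name__ == "__main__":
    for page in ("product_detail", "checkout", "home"):
        print(page, importance_share(PAGEVIEWS, page))
```

With these numbers, product detail pages account for 70% of all pageviews across both sites. An element in the header or menu appears on every page, so its share is 100% by definition.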
As you can see, the first letters spell PIE; this model is taken from Widerfunnel. Read their blog post "How to Prioritize Conversion Rate Optimization Tests Using PIE" for more detailed information about the PIE score.
Widerfunnel gives each letter a score from 1 to 10, adds up the scores and divides the total by 3 to get the final PIE score. But you can also multiply them and/or give the parts different weights. For instance, if you have an ample supply of developer resources, you can limit the weight of Ease by dividing it by 2 first. You'll have to play around with this to find the optimal formula for your process.
After calculating the PIE scores for the tests (which any spreadsheet program can do automatically for you) you can simply sort your tests on that column et voila: a list of test ideas in the order of priority :).
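If you'd rather script it than use a spreadsheet, the Python sketch below shows the PIE calculation three ways: the plain average Widerfunnel describes, the multiplied version, and one possible weighting that halves Ease before averaging. The test ideas and their 1-10 scores are hypothetical; the point is that a different formula can change the order of your list.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Idea:
    name: str
    potential: float   # 1-10
    importance: float  # 1-10
    ease: float        # 1-10 (10 = very easy to build)


def pie_average(idea: Idea) -> float:
    """Widerfunnel-style PIE score: add the three scores and divide by 3."""
    return (idea.potential + idea.importance + idea.ease) / 3


def pie_product(idea: Idea) -> float:
    """Alternative: multiply the scores, which punishes a single low score harder."""
    return idea.potential * idea.importance * idea.ease


def pie_ease_halved(idea: Idea) -> float:
    """Example weighting for teams with ample developer capacity:
    divide Ease by 2 before averaging so it counts for less."""
    return (idea.potential + idea.importance + idea.ease / 2) / 3


def prioritize(ideas: list[Idea], score: Callable[[Idea], float] = pie_average) -> list[Idea]:
    """Sort ideas from highest to lowest score, like sorting a spreadsheet column."""
    return sorted(ideas, key=score, reverse=True)


ideas = [
    Idea("Button color tweak", potential=5, importance=8, ease=10),
    Idea("Rewrite product page copy", potential=9, importance=8, ease=5),
    Idea("New checkout flow", potential=8, importance=4, ease=2),
]

if __name__ == "__main__":
    for idea in prioritize(ideas):
        print(f"{pie_average(idea):.2f}  {idea.name}")
```

With the plain average, the button tweak comes out on top (7.67 vs 7.33). Halve Ease and the first two swap places (6.0 vs 6.5), which is exactly the kind of effect you'll want to check when tuning your own formula.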
Two notes on the PIE score
As the Widerfunnel blog says: "By using the PIE Framework, you’ll remove gut feeling from the decision and focus your team on an objective, relative ranking." The PIE score is certainly not perfect (see my second note below), but it really helps to explain internally why you are working on test idea X and not on test idea Y. If people still don't agree, they can take it up with their/your manager, and if they turn out to be right, think about how you can improve your PIE calculation.
Maybe your company is developing a new product/strategy, or even doing a small pivot in its offering. New products/features don't easily get a high PIE score, because they lack I (no current traffic) and usually take long to develop (low E). Talk to the business/managers about how to handle this in your process, and whether and how much of your time and tests should go to it.
3) Creating tests
I'm skipping 2 orange blocks here, but will get to those later.
So for the simple (stupid) tests the flow pretty much looks like this:
Pick the idea with the highest PIE score, determine the test parameters (more on that later) and describe your test variants (so the developer knows what to do, and so that you and others still know what this test was about months after it has ended).
Create the variants. This is usually done by a front-end developer with jQuery skills, since that is how most testing tools work. It's the perfect time for you as a test manager to inform the different stakeholders (business, content managers, support staff) of the upcoming tests.
Deploy your test according to your pre-set duration and segments (I assume that if you're still reading this article, you know you shouldn't stop a test as soon as it becomes significant, or let it run forever).
Analyze your test. Preferably, you perform a second validation test to check if the effect is real.
The above steps are what I call dumb testing: you're testing different variants of some aspect of your site, but you have no idea why. If you have loads of traffic, don't intend to learn anything about your visitors and don't need to be able to replicate improvements anywhere else: go ahead. Take a button, give it 64 different colors and see which color works best. You might get a lift. You probably won't. Either way, you'll have no idea how to build upon your new findings.
I believe there is a better way.
If you do some research about the specific problem you are trying to solve and create a proper hypothesis for it, you can test that (probably with multiple tests) and actually learn something that you can even apply to other communication channels and build upon with further tests.
For example: "When a CTA button has a high contrast with the rest of the page, this will create a clear next step for the customer with a higher click-through rate."
Now thát is something you can test and learn from. Remember: conversion uplifts are nice, but if you want long-term success, aim for tests that produce learnings you can build upon.
Note: button color tests are kinda simple/basic (and usually don't do that much) but it's just an easy example to use here.
Besides getting learnings, you need to define up front when a test counts as an uplift in site performance and when it doesn't, and what you will do with it either way.
If you're in this line of work (and you are, or else why are you reading this?), you know that this work is usually referred to as Conversion Rate Optimization (CRO), and indeed we usually look at conversion rate as one of the KPIs for A/B testing success. Sure, it's important, but don't judge success or failure on conversion rate alone. At least also consider average order value and revenue per user. (Companies that are really ahead on this don't even look at single orders; they look at the impact on NPS scores, profitability per user, subsequent orders, reviews posted etc., so they are more focused on lifetime value.) Conversion rate on its own will never tell you the complete story and is not enough to base your decisions on.
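A small sketch of why conversion rate alone can mislead: the Python below computes conversion rate, average order value and revenue per user for two variants. The numbers are invented, and it assumes conversion rate is measured as orders per user; your analytics setup may define it differently.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    users: int      # unique users bucketed into this variant
    orders: int     # orders placed by those users
    revenue: float  # total revenue from those orders

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.users

    @property
    def average_order_value(self) -> float:
        return self.revenue / self.orders if self.orders else 0.0

    @property
    def revenue_per_user(self) -> float:
        return self.revenue / self.users


# Hypothetical results: the variant converts better but sells smaller orders.
control = VariantResult(users=10_000, orders=300, revenue=30_000.0)
variant = VariantResult(users=10_000, orders=330, revenue=29_700.0)

if __name__ == "__main__":
    for label, r in (("control", control), ("variant", variant)):
        print(label, f"CR={r.conversion_rate:.2%}",
              f"AOV={r.average_order_value:.2f}", f"RPU={r.revenue_per_user:.2f}")
```

Here the variant lifts conversion rate from 3.00% to 3.30%, but average order value drops from 100 to 90 and revenue per user from 3.00 to 2.97. Declaring it the winner on conversion rate alone would cost you money.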
Besides KPIs, write down the intended audience for this test, specific site segments, the optimal duration of the test, how much traffic you will send to it and whether it will have enough statistical power. And what do you expect to happen?
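Whether a test will have enough statistical power can be estimated before you start. Below is a minimal Python sketch using the standard normal-approximation sample-size formula for comparing two conversion rates. It assumes a two-sided test, a relative minimum detectable lift and equal traffic per variant. Rounding the duration up to whole weeks, so every weekday is covered equally, is my own assumption here. Dedicated calculators or your testing tool may use slightly different formulas.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, min_detectable_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant to detect a relative lift in
    conversion rate with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant: int, n_variants: int,
                            daily_users: int, traffic_share: float = 1.0) -> int:
    """Days needed to fill all variants, rounded up to whole weeks.
    traffic_share is the fraction of daily users you send into the test."""
    users_needed = n_per_variant * n_variants
    days = math.ceil(users_needed / (daily_users * traffic_share))
    return math.ceil(days / 7) * 7


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.03, min_detectable_lift=0.10)
    print(n, "users per variant")
    print(estimated_duration_days(n, n_variants=2, daily_users=10_000), "days")
```

With a 3% baseline and a 10% relative lift to detect, this comes out at roughly 53,000 users per variant, or two weeks at 10,000 users a day. Send only half your traffic into the test and the duration roughly doubles.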
4) Test Report
We keep everything in 1 system (Podio), so all information is always in one place (the different variants used, test parameters, screenshots etc.). Just reporting that a test was 'significant' is not enough and doesn't give others enough information about the validity of your test. So if you share the test with other departments, at least give them insight into the following metrics:
Number of participants per variant
Statistical significance (p-value)
Number of test repetitions
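To show where the p-value in such a report can come from, here's a Python sketch of a pooled two-proportion z-test, computed from the number of participants and conversions per variant, plus a plain dict bundling the three metrics listed above. The z-test is one common choice for conversion-rate tests, not necessarily what your testing tool uses. The numbers are made up, and `build_report` is a hypothetical helper for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (A = control, B = variant)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no conversions at all (or only conversions): nothing to compare
        return 1.0
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def build_report(conv_a: int, n_a: int, conv_b: int, n_b: int, repetitions: int) -> dict:
    """Bundle the minimum metrics to share with other departments."""
    return {
        "participants_per_variant": {"control": n_a, "variant": n_b},
        "p_value": round(two_proportion_p_value(conv_a, n_a, conv_b, n_b), 4),
        "test_repetitions": repetitions,
    }


if __name__ == "__main__":
    print(build_report(conv_a=300, n_a=10_000, conv_b=360, n_b=10_000, repetitions=2))
```

For 300 vs 360 conversions out of 10,000 participants each, the p-value lands around 0.018. Reporting it next to the participant counts and the number of repetitions lets others judge for themselves how solid the result is.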
The reports/learnings feed back into the input for new A/B tests.
If you've set the parameters up front, the ROI of this change should be easy to calculate, so you can make a (small) business case for it. You now know how much uplift a certain change will (probably) give and how long it will take a developer to implement. If it's worth deploying, you can forward it to the development team's sprint backlog; otherwise, you go back to the drawing board.
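A back-of-the-envelope version of that business case, as a Python sketch. All inputs are assumptions you'd fill in yourself: the yearly revenue flowing through the part of the site you tested, the uplift measured in the test, a gross margin to turn revenue into profit, and the developer hours and hourly cost to build it properly. The "deploy if ROI is positive" rule is a simplification; your company may want a higher bar.

```python
def first_year_roi(affected_revenue_per_year: float, measured_uplift: float,
                   gross_margin: float, dev_hours: float, hourly_cost: float) -> float:
    """ROI over the first year: (extra profit - build cost) / build cost."""
    extra_profit = affected_revenue_per_year * measured_uplift * gross_margin
    build_cost = dev_hours * hourly_cost
    return (extra_profit - build_cost) / build_cost


def next_step(roi: float, threshold: float = 0.0) -> str:
    """Route the change: sprint backlog if it pays off, else back to the drawing board."""
    return "sprint backlog" if roi > threshold else "back to the drawing board"


if __name__ == "__main__":
    roi = first_year_roi(affected_revenue_per_year=500_000, measured_uplift=0.04,
                         gross_margin=0.30, dev_hours=40, hourly_cost=75)
    print(f"ROI: {roi:.0%} -> {next_step(roi)}")
```

In this example, a 4% uplift on 500,000 of yearly revenue at a 30% margin earns 6,000 in extra profit against a build cost of 3,000, so the change pays for itself twice over in the first year.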
Don't test everything: you'll probably have limited resources in the development department, so you can't test everything you'd want and have to make choices. I often encounter technical bugs or UI/UX issues that are simply broken, annoy users or cause them to make mistakes. I don't put these fixes through testing; they go straight to the developers' sprint backlog.
I'm amazed you made it to the end of the article; you must be really awesome! Although its abbreviation does make it sound impressive, I don't claim this Website Optimization Workflow to be a perfect system. It's just my experience working in the field. Since you're awesome, you probably have some ideas on optimizing my WOW model or have questions about it. What does your process look like? Please let me know in the comments or reach out on Twitter!
Thx to Arnout Hellemans, Andy Copleman, AJ Dichmann and Marcus Klinge for proofreading this article!
A hierarchy of evidence (or levels of evidence) is a heuristic used to rank the relative strength of results obtained from scientific research. I've created a version of this chart/pyramid applied to CRO which you can see below. It contains the options we have as optimizers and tools and methods we often use to gather data.
This is a bonus episode with Emily Robinson (Senior Data Scientist at Warby Parker) and Lukas Vermeer (Director of Experimentation at Booking.com).
In her earlier session that day, Emily said that real progress starts when you put your work online for others to see and comment on, in this case on GitHub. Someone from the audience wondered how that works in larger companies, where a manager or even a legal department might not be overly joyous about it (to say the least), so I asked Emily for her thoughts on that.
Recorded live with an audience, pre-COVID-19, at the Conversion Hotel conference in November 2019 on the island of Texel in the Netherlands.
(originally published at https://www.cro.cafe/)