Experimentation is vital: Improving apps with data-driven decisions
31st Oct 2016
With everything we build here at Passenger we ask ourselves, where’s the value? We like to map out objectives, run tests and evaluate the outcome. Successful tests become part of the wider product suite, while unsuccessful tests get rolled back and the lessons learnt incorporated into future features.
Experimentation is vital to the continued progression and evolution of the products we develop, but it doesn’t have to be as scary as it sounds. One technique we use is A/B testing, in which a percentage of users are split into two groups. Users in the A control group do not receive any changes whilst users in the B variation group get to trial the new feature.
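The split described above is often done deterministically, so a user stays in the same group for the whole experiment. A minimal sketch of one common approach, hash-based bucketing (the function and experiment names here are illustrative assumptions, not Passenger's actual code):

```python
import hashlib

def assign_group(user_id: str, experiment: str, variation_pct: int = 50) -> str:
    """Deterministically bucket a user into control (A) or variation (B).

    Hashing the user ID together with the experiment name keeps each
    user's assignment stable across sessions, and keeps assignments
    for different experiments independent of one another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash to a value in 0-99
    return "B" if bucket < variation_pct else "A"
```

Because the assignment is a pure function of the user ID and experiment name, no per-user state needs to be stored, and re-running the function always returns the same group.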
We like to work to the KISS methodology (Keep It Simple, Stupid), promoting simplicity over unnecessary complexity. It may seem counter-intuitive when working with customers on behalf of their users, but a lot of experimentation is about removing functionality and streamlining any perceived complexity where possible. In a world where developers are usually paid to deliver features, this approach champions the users and keeps what they are trying to achieve with the apps firmly at the forefront of our minds.
During our tests both groups of users are still able to use the app in the way they know and love, but a few lucky ones will get an extra feature or tweaked interface. For the most part the user has no knowledge that they are participating in an experiment and at the end of a defined period we draw our conclusions based on the data collected instead of rolling the dice based on a gut feeling.
To make an informed decision, a metric has to be recorded and a time frame agreed. Conversion rate is one of the more popular metrics to measure, as it registers whether a user has been able to complete the desired task (e.g. a purchase).
Possible outcomes of an experiment are:
- Conversion rate of B is higher than A, and importantly the difference is statistically significant. This means that the new feature is actually helpful to users and therefore it is worth deploying.
- Conversion rate of B is slightly higher than A, but the difference is not statistically significant. In this case the new feature seems to add some value, but results are not conclusive. From the statistical point of view, this situation is similar to the case of conversion rate A being only slightly better than B. Therefore, the final decision of deploying or discarding the new feature lies with the product owner.
- Conversion rate of A (the control) is higher than B, and the test results are statistically significant. We can discard the new feature. From the development point of view this means less code to maintain and therefore less complex (and less expensive) software.
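The "statistically significant" judgement in the outcomes above is usually made with a standard test, such as a two-proportion z-test on the two conversion rates. A minimal sketch (this is an illustrative implementation, not the specific tooling used at Passenger):

```python
from math import erf, sqrt

def two_proportion_z_test(conversions_a: int, n_a: int,
                          conversions_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value); a p-value below the chosen threshold
    (commonly 0.05) is treated as statistically significant.
    """
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # pooled conversion rate under the null hypothesis of no difference
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# With hypothetical sample sizes of 1,000 users per group, an 18% vs 17%
# split (as in the experiment described below) yields p well above 0.05,
# i.e. the difference is not statistically significant.
z, p = two_proportion_z_test(180, 1000, 170, 1000)
```

With small differences like this, only a much larger sample would let the test distinguish a real effect from noise, which is why inconclusive results are handed back to the product owner rather than decided by the data alone.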
Let’s have a look at an example from one of our experiments. We recently added a search indicator to the result tabs, as we weren’t convinced that users were aware that results were being filtered in all the tabs simultaneously. We thought that showing the number of search results on each tab, with that number dynamically changing as the user types, would lead to a higher number of users switching tabs to select results from the non-default tab. After running the experiment for two weeks the results were not conclusive.
The conversion rate in the control group proved slightly better than in the variation (18% vs 17%), disproving our expectation. Because the new indicator hadn’t provided a significant improvement, we decided that it didn’t make sense to add more noise to the interface.
Through careful experimentation in the right areas we’ve seen an uplift in usage, mobile ticket purchases and improved app store ratings.
If you have an app in the market and you’re considering how you and your users can get more value from it, get in touch. We’d be happy to talk more about how we work and help to keep our customers continually moving forward through experimentation.
Start your journey with Passenger
If you want to learn more, request a demo or talk to someone who can help you take the next step forwards, just drop us a line.