The Novelty Effect: An Important Factor to Be Aware of When Running A/B Tests


Last week, I wrote an article running through a framework product managers can use to gain more confidence in the success of a feature without building the actual functionality behind it. This is a game-changing concept for any Product Manager looking to move faster and learn more effectively.


To learn more about this way of thinking more broadly, a few people have recommended The Lean Startup as a great book that walks you through the concept of experimentation and ways you can apply it to building anything.


In discussing last week’s article, @stephen.m.delaney brought up a great point: while this approach relies on lightweight experimentation to reduce cost and increase learnings, it carries distinct risks, especially around the Novelty Effect.


The Novelty Effect



The novelty effect is the idea that people who regularly interact with your website or service may notice what you’re testing and interact with the feature just because it’s new. This of course skews test results; you don’t want someone to interact with something because it is shiny, but instead want to gauge legitimate user interaction during an A/B test. I’ve also seen the inverse happen, where returning customers completely ignore a cool new feature because they are unfamiliar with it and instead stick to what they know.


Basically, when running A/B tests, returning customers can behave very differently from how they normally would, just because the feature is new. Stephen Delaney also calls this the curiosity factor.


This is a serious challenge for getting accurate A/B test results you can actually use to make decisions that impact the prioritization of your roadmap.


So how do you counteract the novelty effect?


3 Ways to Overcome the Novelty Effect


1. Test for long enough


One of the simplest ways of overcoming the novelty effect is to try to remove the novelty. In other words, you launch your A/B test and keep it live long enough for repeat customers to no longer be surprised by the new feature. Hopefully, they’ve gotten their initial curiosity clicks out of the way, and as they go through their user journey they are now interacting with the feature as if it were an embedded part of their experience.


At Nordstrom, we generally try to run our tests for two full weeks. We see significantly different performance in week one versus week two. Running for two full calendar weeks also helps even out natural differences in user behavior between days of the week.
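One way to see whether week one and week two really differ is to compute the treatment's lift separately for each week of the test. The snippet below is only a minimal sketch, not how any particular team does it: the `weekly_lift` function, the `(day, variant, converted)` event shape, and every number in the sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_lift(events, start):
    """Relative lift of treatment over control, per week of the test.

    events: iterable of (day, variant, converted) tuples, where variant is
    "control" or "treatment" (a hypothetical event shape).
    start:  first day of the test.
    Returns {week_number: lift}, with week 1 being the first seven days.
    """
    counts = defaultdict(lambda: {"control": [0, 0], "treatment": [0, 0]})
    for day, variant, converted in events:
        week = (day - start).days // 7 + 1
        tally = counts[week][variant]
        tally[0] += 1                # visitors
        tally[1] += int(converted)   # conversions

    lifts = {}
    for week in sorted(counts):
        c_n, c_conv = counts[week]["control"]
        t_n, t_conv = counts[week]["treatment"]
        if c_n == 0 or t_n == 0 or c_conv == 0:
            continue  # not enough data to compute a lift for this week
        c_rate, t_rate = c_conv / c_n, t_conv / t_n
        lifts[week] = (t_rate - c_rate) / c_rate
    return lifts


def fake_day(day, variant, visitors, conversions):
    """Build made-up events for one day: `conversions` out of `visitors` convert."""
    return [(day, variant, i < conversions) for i in range(visitors)]


# Illustrative data only: treatment converts better in week one than in week two.
start = date(2024, 3, 4)
events = []
for offset in range(14):
    day = start + timedelta(days=offset)
    treated = 15 if offset < 7 else 11
    events += fake_day(day, "control", 100, 10)
    events += fake_day(day, "treatment", 100, treated)

lifts = weekly_lift(events, start)
for week, lift in lifts.items():
    print(f"week {week}: treatment lift {lift:+.0%}")
```

If the lift in week one is much larger than in week two, that is a hint (not proof) that some of the early result came from curiosity clicks rather than lasting behavior.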


A point to keep in mind here is that user traffic and behavior differ greatly between a Monday and a Saturday. If you’re thinking about keeping a brand new feature live on your experience year round based on just one short A/B test, ideally you’d get a sample of data that represents the full year as closely as possible.


The length of time you choose to run your test can change for other reasons. For example, if the test is reliant on an external team, promotion, or feature, we might need to cut the test short. Additionally, if we are still seeing interesting results that don’t align with expectations, we’ll sometimes extend the length of our tests.


It is always a balance: shortening tests increases team agility but decreases test confidence because of factors like the novelty effect. Lengthening tests decreases team agility but increases confidence that the results can be extrapolated to the majority of the user base or the remainder of the year.


2. Pay attention to what time of year you are testing


While some businesses, SaaS companies, or services are resistant to seasonal changes and holiday impacts, the majority are not. It is no secret that user behavior for Nordstrom (just like any other retailer) goes off the charts between Black Friday and Christmas.


When you are launching your test, make sure to ask yourself how external factors of seasonality might impact your test and increase the impact of the Novelty Effect.


If you can, test in a time of year that is not seasonally impacted, ideally one with a normal level of site traffic and typical customer purchase behavior. As an example in the retail space, a gift-heavy season like the holidays can drive an increase in extremely price-sensitive shoppers looking for gifts and stocking stuffers. This has many impacts, including lower average order values. Most importantly in this case, the data you would get from an A/B test at this time wouldn’t be clearly representative of year-round performance.





Or maybe you are at a SaaS company that services the tax industry. Of course, year-round behavior can’t be extrapolated from a test run right in the middle of tax preparation season.


If you do have concerns about seasonality, don’t let that stop you from testing, but be prepared to re-test in a non-seasonal time to confirm the results you’ve seen during the season.


3. Look at different customer cohorts


Stephen Delaney brought up a great example of another way to protect against the Novelty Effect: looking at different customer cohorts.


For example, Stephen mentions specifically separating new users from repeat or existing users. New users have a fresh perspective and generally shouldn’t be influenced by the addition of the new feature, whereas a repeat customer might click on a new feature just because it is new, OR stay away from a feature because it is different from what they know.
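To make the new versus repeat split concrete, here is a rough sketch of computing the treatment's lift separately for each cohort. The function names, the `(cohort, variant, converted)` row shape, and the sample numbers are all assumptions for illustration; your analytics data will look different.

```python
from collections import defaultdict


def conversion_by_cohort(rows):
    """Conversion rate for each (cohort, variant) pair.

    rows: iterable of (cohort, variant, converted) tuples, e.g.
    ("new", "treatment", True). This row shape is hypothetical.
    """
    totals = defaultdict(lambda: [0, 0])
    for cohort, variant, converted in rows:
        totals[(cohort, variant)][0] += 1               # visitors
        totals[(cohort, variant)][1] += int(converted)  # conversions
    return {key: conv / n for key, (n, conv) in totals.items()}


def lift_by_cohort(rows):
    """Relative lift of treatment over control within each cohort."""
    rates = conversion_by_cohort(rows)
    lifts = {}
    for cohort in {cohort for cohort, _ in rates}:
        control = rates.get((cohort, "control"))
        treatment = rates.get((cohort, "treatment"))
        if control is None or treatment is None or control == 0:
            continue  # cannot compute a lift without both variants
        lifts[cohort] = (treatment - control) / control
    return lifts


def fake_rows(cohort, variant, visitors, conversions):
    """Made-up rows: `conversions` out of `visitors` convert."""
    return [(cohort, variant, i < conversions) for i in range(visitors)]


# Illustrative data only: returning users react to the change, new users do not.
rows = (
    fake_rows("new", "control", 10, 2)
    + fake_rows("new", "treatment", 10, 2)
    + fake_rows("returning", "control", 10, 2)
    + fake_rows("returning", "treatment", 10, 4)
)

cohort_lifts = lift_by_cohort(rows)
for cohort in sorted(cohort_lifts):
    print(f"{cohort}: treatment lift {cohort_lifts[cohort]:+.0%}")
```

A large lift for returning users paired with little or no lift for new users is the pattern you would expect if novelty, rather than the feature itself, is driving the result.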


Besides new versus repeat users, which is likely the most helpful split for monitoring the Novelty Effect, you can also look at many other cohorts. Bracketing users into buckets of customer lifetime value could be another way to split the data in post-analysis to understand how the change is impacting your most valuable customers.
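As a small sketch of what lifetime value bucketing could look like in post-analysis, the snippet below assigns each user to a bucket and reports conversion per bucket and variant. The thresholds, bucket labels, and row shape are hypothetical; pick cut-offs that make sense for your own customer base.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical lifetime value thresholds (in dollars) and matching labels.
CLV_EDGES = [100, 500, 2000]
CLV_LABELS = ["under_100", "100_to_499", "500_to_1999", "2000_plus"]


def clv_bucket(lifetime_value):
    """Map a lifetime value to its bucket label.

    A value equal to a threshold falls into the higher bucket.
    """
    return CLV_LABELS[bisect_right(CLV_EDGES, lifetime_value)]


def conversion_by_clv(rows):
    """Conversion rate for each (bucket, variant) pair.

    rows: iterable of (lifetime_value, variant, converted) tuples,
    a made-up shape for illustration.
    """
    totals = defaultdict(lambda: [0, 0])
    for lifetime_value, variant, converted in rows:
        key = (clv_bucket(lifetime_value), variant)
        totals[key][0] += 1               # visitors
        totals[key][1] += int(converted)  # conversions
    return {key: conv / n for key, (n, conv) in totals.items()}


# Illustrative rows only.
sample = [
    (50, "control", False),
    (50, "control", True),
    (3000, "treatment", True),
]
for key, rate in sorted(conversion_by_clv(sample).items()):
    print(key, f"{rate:.0%}")
```

Keep in mind that the more buckets you create, the fewer users land in each one, so small buckets can show big swings that are just noise.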





Lastly you can look at the product level. This goes for both folks in eCommerce selling physical products AND people selling SaaS or Services. In both types of businesses, there will be different categories, or different product lines / features people are interacting with. Generally users visiting your experience won’t interact with every single one, so bucketing users into common groups of product interaction can help provide more valuable insight.


For example, let’s pretend you are the Product Manager for Google Drive and just implemented a new way of organizing files. However, you see that people who primarily use Google Docs are reacting significantly differently than people who use Google Slides. This differentiation between products can tell you about potential use cases you’re missing, or raise the right questions to dig into the data further.





A/B testing is critical for any fast-moving technology company that wants to actually know how successful a new feature is. Keeping the impact of the Novelty Effect in mind can really make or break the future of your roadmap or the success of your company.


About the author:

Ben Staples has over 7 years of Product Management and Product Marketing eCommerce experience. He is currently employed at Nordstrom as a Senior Product Manager responsible for their product pages on Nordstrom.com. Previously, Ben was a Senior Product Manager for Trunk Club responsible for their iOS and Android apps. Ben started his Product career as a Product Manager for Vistaprint where he was responsible for their cart and checkout experiences. Before leaving Vistaprint, Ben founded the Vistaprint Product Management guild with over 40 members. Learn more at www.Ben-Staples.com

