In the popular business book The Lean Startup, author Eric Ries encourages companies to test their way into new businesses – to create a minimum viable product (MVP), get customer reaction, iterate and reiterate, and learn as much as possible before investing the whole wad in a finished design.
Vertical specialty fashion retailing already lends itself to this type of continual hypothesis testing. Each retailer designs and introduces hundreds to thousands of new products a year and creates new campaigns, windows and floorsets at least monthly. Test and control groups are easily created among our scores of stores, the millions of names in our customer file, and tens of millions of page views. Most of us talk the testing talk and walk it, too.
In some companies, the testing function is a well-oiled machine, rooted deep in planning, merchandising, ecom and marketing processes. The key danger for these companies is that some users will become too dependent on testing and less confident in their own abilities to pick messages or construct cohesive lines. But most companies in our industry face the opposite problem: their testing disciplines and infrastructure are less developed and exercised infrequently, and their tests are often executed inexpertly, yielding results that are less accurate and less trusted.
Most companies in our industry treat testing far too casually. Few articulate their testing strategy, and none that I know of systematically captures its costs and benefits. While most tests are relatively easy to execute, consistently beneficial testing requires rigor across several disciplines. Consider these rather typical test-deflating experiences at each step of a test:
|STEP|TYPICAL TEST-DEFLATING EXPERIENCE|
|---|---|
|1. Select the test and control groups|Can’t find enough predictive, high-traffic, “clean” stores, resulting in too small a test cell|
|2. Ensure the test is fully executed|Not enough hours allocated to some test stores, so the “treatment” was days late|
|3. Read for a sufficient period of time|I know the order needs to be placed Monday, but, no, Sunday sales alone will not make a good read.|
|4. Control for “circumstances”|A blizzard in Atlanta? Then Portland?|
|5. Make the right decision based on results|Wait, you are buying the teal and fuchsia anyway?|
Note that your test is only as good as its weakest link. You might execute perfectly on steps 2-5, but if the test or control group stinks, your whole test stinks. And no matter how good the test, if your merchant ignores the results (step 5), then you’ve just wasted the company’s time and money.
Guess what else? By the time the test-revised items, colors, depth, collection, floorset, campaign or pricing land, your perfectly executed test may end up invalidated because weather, fashion or competitive conditions have changed drastically in the interim.
Given these challenges, it’s very important to use tests smartly and manage the function deliberately.
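To make steps 3 and 4 concrete, here is a minimal Python sketch of one common way to read a test: compare the change in the test stores with the change in the control stores over the same weeks, so that swings hitting both groups (weather, a chain-wide promotion) are netted out. The function name, the weekly unit figures and the choice of a simple difference-in-differences read are all illustrative assumptions, not a method prescribed in this article.

```python
from statistics import mean

def lift_vs_control(test_pre, test_post, ctrl_pre, ctrl_post):
    """Return the test cell's lift over control as a fraction (0.15 = 15 points).

    Each argument is a list of average weekly units per store, for the weeks
    before ("pre") and after ("post") the treatment landed.
    """
    # Percentage change in each group between the pre and post periods.
    test_change = mean(test_post) / mean(test_pre) - 1
    ctrl_change = mean(ctrl_post) / mean(ctrl_pre) - 1
    # Whatever moved the control stores presumably moved the test stores too,
    # so only the difference is credited to the treatment.
    return test_change - ctrl_change

# Hypothetical figures: three weeks before and three weeks after the change.
test_pre, test_post = [100, 104, 96], [118, 122, 120]
ctrl_pre, ctrl_post = [200, 190, 210], [212, 208, 210]

lift = lift_vs_control(test_pre, test_post, ctrl_pre, ctrl_post)
print(f"Lift attributable to the test: {lift:.1%}")
```

A read like this is only as trustworthy as the match between the two groups: if the control stores are poor predictors of the test stores, the subtraction removes the wrong amount.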
Managing the Testing Function
- Use first for strategy. The first and best use for testing should be for reducing the financial and operating risks of major strategic initiatives. As part of the annual planning cycle, some companies develop an explicit “learning agenda” for the upcoming year: for every Year 2 launch goal, start testing for answers in Year 1. Start systematically applying the principles of lean startup.
- Ration tactical testing. In organizations with a good testing function and curious merchants, there is typically much demand for tactical tests (color, graphic, style, depth, price, promotion, message, etc). These should be encouraged, but there is a natural limit to how many test and control cells a company can execute effectively. A chief merchant, for example, should prioritize tests on ways to grow or save strategic categories or enter new businesses and then leave the balance of testing capacity to picking styles or buy depth.
- Combine tests with other research. With some tests, you’ll want to know why something didn’t sell well or how the products can be improved. Consider combining the selling test with other research methods like store associate feedback, store intercepts or website pop-up surveys.
- Review in hindsight. Testing involves several steps (each prone to its own complications) across several functions (planning, merchandising, marketing, supply chain, stores, ecom, finance). When discussing business results, mention the role (or lack of a role) of testing in those results. Of particular value is identifying how the merchant acted on the information provided by testing (and, if the merchant did not act, why) and then what the final results were. Testing will get better with this type of feedback, but it will not improve without it.
- Be patient. It often takes time to find the right sets of test and control stores for particular types of tests. Which “warm weather” stores are good predictors of Spring I? Are they the same for Spring II? Which stores best project denim sales? It may take 2-3 years to find good predictor stores for every type of test.
- Budget and track. Create a testing budget. Track expenses vs. the budget. Track test results and business impacts.
- Hire the right team. Skill in probability and statistics is important. (Yes, you, too, can have Nate Silver-like projections: “The longer-inseam trouser had a 14% better sell-thru, but there’s only a 68% chance that it will outsell the shorter one.”) Equally important, the testers must understand the business challenges of the merchants and marketers and have the confidence to push back and suggest the best ways to design a test to solve a business problem. It’s also essential that the testing function have a very good relationship with Supply Chain, Allocation and Stores – the functions typically most responsible for executing the tests and providing updates on the quality and timing of the execution.
- Develop a testing culture and become a learning organization. Once you have the seven practices above in place, make testing a regular part of your brand growth and process improvement initiatives. Make every function develop its learning agenda and execute tests. For every “I wonder if…”, there is a test (and, perhaps 68% of the time, an answer).
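
The trouser example in the hiring advice above pairs a measured sell-through gap with a probability that the gap is real. As a rough illustration of where such a probability can come from, the Python sketch below applies a normal approximation to the difference between two sell-through rates. The function name and the unit counts are hypothetical, and the sketch assumes each stocked unit sells or fails to sell independently, which real store data only loosely satisfies.

```python
from math import erf, sqrt

def prob_a_outsells_b(sold_a, stocked_a, sold_b, stocked_b):
    """Approximate chance that style A's true sell-through exceeds style B's."""
    rate_a = sold_a / stocked_a
    rate_b = sold_b / stocked_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(rate_a * (1 - rate_a) / stocked_a
              + rate_b * (1 - rate_b) / stocked_b)
    if se == 0:
        # Degenerate case (both rates are exactly 0 or 1): no sampling noise.
        return 0.5 if rate_a == rate_b else float(rate_a > rate_b)
    z = (rate_a - rate_b) / se
    # Standard normal CDF evaluated at z.
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical test cell: 57 of 100 units sold vs. 50 of 100,
# which is a 14% relative lift in sell-through.
p = prob_a_outsells_b(57, 100, 50, 100)
print(f"Chance the first style truly outsells the second: {p:.0%}")
```

The same 14% gap measured on a smaller test cell yields a lower probability, which is one way to put a number on the "too small a test cell" problem from the table of test-deflating experiences.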