The Sweethome and The Wirecutter (part of The New York Times Company) are lists of the best stuff. When readers buy our independently chosen editorial picks, we earn affiliate commissions that support our work.

For Product Reviews, Tests Aren’t Everything

Testing is a great way to learn about a product, especially how it compares with its competitors: which detergents remove stains best, for instance, or which drill has the strongest battery. But controlled tests tell only part of a product’s story, and it’s just as important to understand what information they leave out.

Test data may not reflect reality

For example, editorial testing labs in the US all review washing machines using tests that resemble the procedures set out by the Association of Home Appliance Manufacturers. Those procedures include the use of identical prestained test strips, wear-and-tear sheets, cotton “filler” linens, and powdered detergent. All of it, down to how the testing materials are loaded into the drum, follows an industry-standard method.

But each editorial lab uses a slightly different procedure to test cleaning ability. CNET bases its washing machine scoring on an 8-pound load on the normal cycle, run three times to suss out anomalies. Consumer Reports also scores based on an 8-pound, normal-cycle load. In contrast, Reviewed.com scores based on 8-pound loads on the normal, heavy, cotton/whites, and delicates cycles, plus one 4-pound quick wash.

Taking every good editorial source’s test results, we end up with data on five wash environments. In the real world, however, there are more than “900 possible environments” inside a washing machine, “between water temperature, the size of the load, the mix of the fabrics, the level of the soil, the temperature of the water,” said Tracey Long, senior communication manager at P&G.
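The count of five can be checked with a short sketch. The Python below is illustrative only: the lab names, load sizes, and cycles come from the paragraphs above, while the data layout and variable names are assumptions made for this example. The figure of 900 is simply the number quoted by P&G, treated here as a lower bound.

```python
# Wash conditions each lab scores on, as described in the text.
# Each condition is a (load in pounds, cycle) pair; this layout is an
# assumption made for illustration, not any lab's actual data format.
lab_conditions = {
    "CNET": [(8, "normal")],
    "Consumer Reports": [(8, "normal")],
    "Reviewed.com": [
        (8, "normal"),
        (8, "heavy"),
        (8, "cotton/whites"),
        (8, "delicates"),
        (4, "quick wash"),
    ],
}

# Pool every lab's conditions and drop the duplicates.
distinct_conditions = set()
for conditions in lab_conditions.values():
    distinct_conditions.update(conditions)

tested = len(distinct_conditions)
possible = 900  # "more than 900 possible environments," per the P&G quote

print(f"Distinct tested environments: {tested}")
print(f"Share of possible environments covered: at most {tested / possible:.1%}")
```

Because all three labs share the 8-pound normal cycle, the pooled data covers only five distinct conditions, which is well under 1 percent of the environments P&G describes.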

Here are just a few of the ways in which your laundry loads probably differ from a lab’s:

  • Your loads are probably much larger than 8 pounds. The latest high-efficiency washers have at least double the capacity of older, agitator-style washers. The amount of fabric in the drum affects the tumbling motion and can thus affect cleaning performance.
  • You probably wash mixed loads of cotton and polyester (or other synthetic materials). Since polyester is more abrasive than cotton, it might actually improve cleaning performance but could also cause more wear on cotton items.
  • You’re definitely using a different detergent than the testers, because their blend is not commercially available. They also use about 59 grams of their powder per load. That’s roughly equivalent to 4 tablespoons of liquid, whereas the usual recommendation is to use no more than 2 tablespoons per load.
  • You probably wash more colors than just plain white.
  • You won’t be cleaning anything nearly as filthy as the stain strips used for testing. Most people need to wash away sweat (an oil-based stain), tannin (think coffee and red wine), dye (most juices, grass stains), or protein (like blood), and real-life clothes are never as saturated with those stains as the test strips are. The proportion of each stain type can affect the way detergent performs, and each type responds differently to mechanical agitation from your washer. So the test-strip results may not say much about how a washer will perform on your dirty clothes.

In the tests’ defense, Keith Barry of Reviewed.com (a friend and former colleague) said the strips represent the real world “in the aggregate,” even if the test conditions don’t resemble an actual load of laundry. There’s just no substitute for the data that Consumer Reports and Reviewed.com publish, even with the process’s flaws.

But as I tried to figure out why some of our readers told us their new washers sucked, I kept stumbling on data showing that laundry is much more complicated than one test or even a couple of tests can predict. As Barry told me, “There are so many different factors at play inside of a washing machine. There’s so much chemistry, so much physics, so much biology.” And that’s just about wash performance—we could go deeper on tests for gentleness, noise, and vibration, too.

Good test results don’t matter if a product doesn’t last

Like that dope Pontiac Firebird your friend bought through the Auto Trader mag back in high school, some things start out strong but turn into a frustrating money pit after a couple of years.

Most reviews consider only how a product performs in the near term, not how it will hold up over time. They’ll tell you that a washing machine could scrub barnacles off the hull of the Titanic, but not that its control board has a good chance of fizzling after two years. For expensive, hard-to-move items that you’d plan on owning for many years, such as appliances, that knowledge gap is a major shortcoming.

To be fair, nobody can reasonably, accurately test for long-term reliability. We keep the products that we recommend to get long-term impressions, and some of our staff members buy the things we recommend. But that’s only a few people’s experience over a few years before the product is discontinued and we have to recommend something else. And even if several of us test the same item at the same time, we can’t try enough production models to weed out potential lemons.

That’s where customer reviews and comments paint a fuller picture. If a product has issues, patterns emerge in the ratings on retailers’ websites. Consumer Reports and J.D. Power also publish brand-reliability data based on surveys. Customer data is not perfect—it can take a few months for clear patterns of poor quality to appear, and customer reviews can also be manipulated (here’s how we spot fake Amazon reviews). But overall this customer feedback can supplement testing data quite nicely.

You and a reviewer may not agree on what makes a product good

Consider the $500 Maytag MDB4949x dishwasher. Reviewed.com rated it a 4 out of 10, which makes it one of the worst-rated models in the site’s database, and called it “truly a disappointment.” Consumer Reports gave it an overall score of 69, which is decent considering that the top-rated model earned an 85. Actual owners rate it an average of 4.5 stars (out of five), across more than 10,000 reviews. That’s one of the best customer ratings for any dishwasher. How can people have three wildly different takes on the same machine? Probably because they all had their own ideas about what you should expect from a $500 dishwasher.
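To see how far apart those three takes are, it helps to put them on one scale. The Python sketch below is a rough illustration, not a method any of these sources uses: the scores are the ones quoted above, the helper `to_percent` is a hypothetical name, and the 100-point maximum for Consumer Reports is an assumption, since the text gives only the score and the top-rated model’s 85.

```python
# Ratings for the Maytag dishwasher as quoted in the text.
# Each entry is (score, scale maximum). The 100-point maximum for
# Consumer Reports is an assumption; the text gives only the score.
ratings = {
    "Reviewed.com": (4, 10),
    "Consumer Reports": (69, 100),
    "Owner reviews": (4.5, 5),
}


def to_percent(score, maximum):
    """Express a score as a percentage of its scale's maximum."""
    return 100 * score / maximum


normalized = {source: to_percent(*pair) for source, pair in ratings.items()}

for source, pct in normalized.items():
    print(f"{source}: {pct:.0f}%")

# Gap between the most and least favorable rating, in percentage points.
spread = max(normalized.values()) - min(normalized.values())
print(f"Spread between highest and lowest: {spread:.0f} points")
```

Under these assumptions the same machine lands at 40, 69, and 90 percent. A simple rescaling like this ignores how each source distributes its scores, so it shows the size of the disagreement rather than who is right.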

At The Sweethome, we make it our goal to understand what most people are looking for and how the price figures into their expectations, as well as why other people might be looking for something else. But to be honest, sometimes it takes us a couple of tries to get there. Our air conditioner coverage is one example: In previous years we thought ease of installation and extra cooling modes were very important factors for people who are thinking of buying an air conditioner. As it turns out, most people really just want something quiet, so this time around, that’s what we looked for.

There’s no perfect lab

We respect the work of our peers at places such as Consumer Reports, Reviewed.com, and Cook’s Illustrated. They approach product research with rigor and discipline, with repeatable results that can withstand scrutiny. For some categories, we use similar methods, including experiments, lab tests, and data-driven observations. For other categories, we think it’s better to live with something—and be bothered by its beeps and smells in our own homes or in our test kitchen—than to only measure things in a lab. These kinds of tests are not always strictly repeatable, because of the nonsterile conditions. But they do represent real life, and we hope they tell readers something meaningful about how a product performs in the wild.

(Top photo from Shutterstock/Everett Collection.)
