The Limits of Testing

In the chapter on Lorenzo Ghiberti in his famed account of the lives of the great artists, Vasari writes about how the sculptor won the competition set up by the Signoria of Florence and the Merchants’ Guild to make two bronze doors for the city’s Baptistery of San Giovanni. The task was to create a bronze panel for one door, depicting the story of Abraham’s sacrifice of Isaac, and the sculptor who made the best panel would win the job of finishing the rest of the two doors¹. In this competition Ghiberti faced seven other artists, including his fellow Florentine, the famous Donatello.

Ghiberti ended up winning the competition and getting the job. What’s interesting is how he won it. Unlike the other artists, who worked in complete secrecy, Ghiberti was happy to show his work to the public: both the residents of Florence, who would get to enjoy the final product, and foreigners who were just passing through. He valued their opinions and took them on board. Defects he might have missed after spending so long with his work were caught by independent observers, and so the finished panel ended up truly beautiful and well made.

This echoes the way software is designed today, where the main usability advice is to test, test, and test some more. Yesterday I argued against designers and journalists surrendering their authority and experience to test-driven results, so I think it important to follow the thought further and separate good testing from bad.

Roughly speaking, people run two types of user interface and web design tests: 1) those aimed at improving the usability of their product, and 2) those aimed at increasing conversion rates. It’s important to stress the difference between the two, because a higher conversion rate does not make for a better product. In fact, there is very little relationship between them: tests aimed at improving usability benefit the user, while tests that increase conversion rates benefit the producer. There is some overlap, since a conversion rate can measure almost anything, but for the most part, when producers want to increase a conversion rate they mean some sort of customer acquisition or sales process, that is, getting more people to sign up or more people to use the features they want them to use.

Usability tests differ in that you do not necessarily care whether more people use a specific feature; you only care about making that feature easy to use and understand. This is also the category of testing that resembles the story of Lorenzo Ghiberti. The designer never surrenders their authority and intelligence; instead, the data they receive helps them make smarter decisions, and in turn they grow as a designer. By gaining a better understanding of how users act in specific contexts, or what they prefer, the designer can tailor the design to be more attractive and easier to use, and thus make it objectively better.

The conversion test can be used in the same way, but unfortunately it is also liable to misuse. The sort of conversion testing that aims to sell a product or generate more clicks is concerned only with user manipulation, that is, pushing people forward to the next step in the funnel. It has nothing to do with how good the product actually is. The product may very well be good, but the test itself is not concerned with that. The design decisions it leads to are therefore not necessarily in the best interests of the user, nor do they always reflect well on the designer who lets an A/B test dictate the aesthetic of their work, or on the journalist who leaves headline selection to the outcome of a test.

I will end by saying that I recognize there is a time and place for conversion testing, and that it is an inseparable tool of modern commerce, both online and off. My point is that the results it produces reflect the primal impulses of the majority, not the reason and intellect of the experienced few. Work shaped that way will always be of the lowest order: it can be called effective design, but not good design, and effective design is not always good.

¹ This was no spec work by today’s standards; the competitors were all paid for their sample panels.