August 26, 2016

The DRY Principle

One of the first sessions I joined at SoCraTes 2016 was on the DRY principle in software design: when to apply it, and -- maybe more importantly -- when not to. DRY stands for "Don't Repeat Yourself", an important guideline when we want to write maintainable code or documentation. Having stuff duplicated comes with two problems:

  • we have to make changes to more than one place (or risk inconsistencies);
  • to do so, we have to first find all those places.

A common way to implement DRY in programming is to extract duplicate pieces of code into a single separate component. If we then give that component a suitable name, we have created a new abstraction. In this case DRY is more formally known as the abstraction principle.
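As a toy sketch of that extraction (the pricing domain and all names are mine, not from the session), two callers that each repeated the same tax computation can share one named function instead:

```python
def gross_price(net: float, tax_rate: float = 0.19) -> float:
    # The extracted abstraction: the one place that knows how tax is applied.
    return round(net * (1 + tax_rate), 2)

def invoice_total(net_amounts: list[float]) -> float:
    # Before extraction, this caller repeated `net * (1 + tax_rate)` inline.
    return round(sum(gross_price(n) for n in net_amounts), 2)

def quote_line(net: float) -> str:
    # ... and so did this one, in a slightly different context.
    return f"{gross_price(net):.2f} EUR"
```

The name `gross_price` is what turns the mechanical deduplication into an abstraction: a tax-rate change now has exactly one home.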

Extracting commonalities into a separate component removes duplication and addresses the two maintainability problems above. It also introduces at least two dependencies on the new component, new indirections that can also make the code harder to understand and maintain. So while DRY stands as a useful design principle, its application is generally a trade-off.

Sometimes the additional component actually represents a part of the problem domain, which we previously didn't explicitly model in our program. In this case, the new component will probably make it easier to understand the code.

Sometimes we fare better if we accept some duplication and keep such dependencies implicit. For example, in a typical CRUD-heavy business application with a layered architecture, a lot of data is mapped between the API layer and the business logic layer. DTOs and entities often represent the exact same concept, although they fulfill very different technical purposes. Adding a single field (e.g. a VIP flag to a Customer entity) leads to changes on all layers of the application. Nevertheless, we often favor this duplication over alternatives like model-driven generative approaches.
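A minimal sketch of that accepted duplication (class and field names are hypothetical, chosen to match the VIP example):

```python
from dataclasses import dataclass

@dataclass
class CustomerEntity:
    # Business-logic layer.
    id: int
    name: str
    vip: bool   # the new field: added here ...

@dataclass
class CustomerDto:
    # API layer: structurally almost identical, deliberately independent.
    id: int
    name: str
    vip: bool   # ... duplicated here ...

def to_dto(entity: CustomerEntity) -> CustomerDto:
    # ... and mapped here. Three edits for one field -- but each layer
    # stays free to diverge later (e.g. the DTO could hide `vip`).
    return CustomerDto(id=entity.id, name=entity.name, vip=entity.vip)
```

The repetition is the price we pay for keeping the API contract decoupled from the domain model.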

Microservice architectures are another place where we strongly prefer duplication over common code. One selling point of microservices is that different teams can independently develop and deploy their services. We want to minimize dependencies between development teams, which minimizes communication overhead and potentially maximizes team velocity. Sharing code would no longer allow a team to change that code independently. This is why in companies that use these architectures, teams tend to only share code by treating it as a third-party dependency with all the expectations of stability that entails.

So we can either remove some duplication by introducing abstractions or embrace some duplication to avoid inconvenient dependencies and overly complicated code. Making the wrong trade-off leads to a less understandable and maintainable code base. In the end, it boils down to finding just the right abstractions to create great code. I'm not saying this is easy!

Some static analysis tools help us to discover and remove code duplication, and they work well for obvious copy & pasting of code segments. But these tools usually cannot warn us when we go too far to the other side of the trade-off and desiccate our software with over-abstraction. Code segments can also look quite different, even though they actually duplicate a concept of the domain, e.g. perform exactly the same computation.
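To illustrate what a textual clone detector misses, here are two functions (my own example) that share not a single line yet perform exactly the same computation:

```python
def total_loop(n: int) -> int:
    # Accumulator loop: one way to sum the integers 1..n.
    acc = 0
    for i in range(1, n + 1):
        acc += i
    return acc

def total_formula(n: int) -> int:
    # Gauss's closed form: textually unrelated, semantically identical.
    return n * (n + 1) // 2
```

A copy-paste detector sees no similarity here; only a human (or a semantics-aware tool) notices that one domain concept is implemented twice.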

I think a much better way to find weaknesses in a software design is to watch for evolutionary coupling: if there are parts of a system that always seem to change simultaneously, they may contain a bad abstraction or be missing an important abstraction entirely.
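Evolutionary coupling can be mined from version-control history. A toy sketch (file names and the commit data are invented for illustration) that counts how often pairs of files change in the same commit:

```python
from collections import Counter
from itertools import combinations

def co_change_pairs(commits: list[set[str]]) -> Counter:
    # Count, for every pair of files, how many commits touched both.
    pairs: Counter = Counter()
    for files in commits:
        for pair in combinations(sorted(files), 2):
            pairs[pair] += 1
    return pairs

# Invented history: each set is the files touched by one commit.
commits = [
    {"customer.py", "customer_dto.py"},
    {"customer.py", "customer_dto.py", "billing.py"},
    {"billing.py"},
]

# Pairs that almost always move together are candidates for a missing
# (or badly drawn) abstraction.
hotspots = co_change_pairs(commits).most_common(1)
```

Real tools build the commit sets from `git log` instead of a hard-coded list, but the pair-counting idea is the same.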

I also find it immensely helpful to concentrate on names. A new abstraction that helps understanding should feel natural and be easy to name. If it is not, chances are my team will be better off without it.

Tags: Design, Metrics