To make it work? Or to make it right?
Ah, one of my favorite topics. Alright, the title is a bit provocative: making something right doesn’t mean it’ll never work[1], but you get the idea. When we build something, should we first aim for making it work? Or should we strive straight away for correctness, modularity, and the ability to gracefully withstand future requirements and/or context changes?
I think this question is one of the most fundamental aspects of design and architecture, at every layer of software, be it systems, containers, components, or code. It underpins everything. It is also one of the hardest compromises to get completely right.
So which one is it?
You know it’s coming
Well. As for many things… “It depends”.
The first thing that must sink in is that it isn’t a dichotomy, but a spectrum. At one end, there’s spitting out code as fast as possible, without considering any edge case, and using every single ugly hack possible. The concept is very simple: the code, architecture, etc. don’t matter. There’s a single goal: getting it out, working in some fashion. At the other end of the spectrum, there’s carefully and thoroughly designing and implementing with a focus on anticipating the future and how the software might have to evolve in the next three months, three quarters, three years.
Things can then sit anywhere on this spectrum. It can even vary within a single solution, with different choices for different parts. I’ll draw from an example I’ve been through, where decisions landed at different places on the spectrum for different parts of the same software.
I led the effort on a system that produced videos with some motion design rendered on top of them. The actual graphical elements, text, shapes, etc. were specified in a JSON format, which hardcoded a lot of things. One or two enum-like fields, plus a few extra parameters, would define pretty much everything that had to be drawn: the shapes, the animations, the rules about where and when an element would appear and disappear.
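To give a rough feel for it (this is a hypothetical reconstruction, not the actual schema; the field names are made up for illustration), a single element could look something like this, with the `type` field implying the shape, the animation, and the timing rules all at once:

```json
{
  "type": "lower_third_title",
  "text": "Breaking news",
  "start_s": 2.0,
  "duration_s": 5.0
}
```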
When designing that solution, we spent some time making the rendering code implementation-agnostic. That meant we had a graphical representation and could render it with OpenGL, with SDL, or as text, basically with whatever we wanted, as long as a backend provided a handful of functions to render some primitives.
Starting right away with a generic way to specify an arbitrary rendering implementation wasn’t as easy as just hardcoding one. It was useful though, as we could iterate and try out stuff very easily. We could start with something simple (based on SDL) before working on the canonical, more complex implementation. We then added some debug renderers. One output text, which let us compare the input JSON with a text representation of our internal tree. Another rendered to an image, which later on ended up… as a product feature. This genericity eventually turned out to be useful a couple of years later, when we decided to switch some middleware, although we hadn’t anticipated that this would come[2].
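To illustrate the shape of that abstraction, here is a minimal sketch, not our actual code; the interface and names are invented for the example. The idea is that backends only implement a few primitives, and the drawing logic never knows which backend it talks to:

```cpp
#include <iostream>
#include <string>

// Hypothetical primitive-level interface: concrete backends (SDL, OpenGL,
// text, image) only have to provide these few functions.
struct Renderer {
    virtual ~Renderer() = default;
    virtual void drawRect(int x, int y, int w, int h) = 0;
    virtual void drawText(int x, int y, const std::string& text) = 0;
};

// Debug backend: prints primitives instead of rasterizing them, so the
// output can be compared against the input JSON by eye.
struct TextRenderer : Renderer {
    void drawRect(int x, int y, int w, int h) override {
        std::cout << "rect(" << x << "," << y << "," << w << "," << h << ")\n";
    }
    void drawText(int x, int y, const std::string& text) override {
        std::cout << "text(" << x << "," << y << ",\"" << text << "\")\n";
    }
};

// The scene-drawing code only ever sees the abstract interface.
void drawLowerThird(Renderer& r) {
    r.drawRect(0, 400, 640, 80);
    r.drawText(20, 430, "Breaking news");
}

int main() {
    TextRenderer debug;
    drawLowerThird(debug);  // swap in an SDL/OpenGL backend without touching drawLowerThird
}
```

Adding an image or debug backend then costs one more implementation of the interface, without touching any of the drawing logic.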
On the other hand, the input JSON format was a bit frustrating to work with: it forced us into weird contortions in the drawing code to make sure everything fell into place based on what the customers had configured. There was a temptation to design a completely new format, way more open and customizable. It even got to the drafting phase. But we never went beyond that, because it was unclear whether customers would be able to master it and use it to its full potential. It would have had a huge cost for other teams to integrate, and there was too much work and risk involved, so we settled on the legacy format, trying to fit into it and only changing it lightly based on our needs.
So we went with some extra work to make the rendering implementation choice somewhat generic, but quickly gave up on reworking the legacy, limited JSON input format, keeping only some ugly glue that converted between it and our internal representation. In the same system. It already took long enough to release; had we also chosen to completely rework the format, I’m not sure it would be in production today, and it rolled out in Q4 2021.
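That glue was essentially a big mapping from the enum-like legacy types to our internal tree. Something in the spirit of this sketch (again entirely hypothetical; the types and rules are invented for illustration):

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical internal representation: an open description of what to draw,
// instead of one opaque enum value that implies everything.
struct Element {
    std::string shape;      // e.g. "rect"
    std::string animation;  // e.g. "slide_in_bottom"
    double start_s = 0.0;
    double duration_s = 0.0;
};

// Glue: expand one legacy enum-like type into the internal representation.
// Every hardcoded rule the old format implied lives in branches like these.
Element fromLegacyType(const std::string& type, double start_s, double duration_s) {
    if (type == "lower_third_title") return {"rect", "slide_in_bottom", start_s, duration_s};
    if (type == "centered_caption")  return {"text", "fade_in", start_s, duration_s};
    throw std::invalid_argument("unknown legacy element type: " + type);
}

int main() {
    Element e = fromLegacyType("lower_third_title", 2.0, 5.0);
    std::cout << e.shape << " / " << e.animation << "\n";  // rect / slide_in_bottom
}
```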
But wait, what is “right”?
It’s something that can be overlooked, maybe because developers tend to focus too much on the technical aspects. This isn’t bad in itself, but it’s also important to look over the fence at the product and business considerations. One problem is that defining what’s “right” isn’t always easy, and having a working technical implementation doesn’t mean that the problem at hand has been solved.
Sometimes it takes more research; sometimes it takes literally putting an implementation out in the wild and seeing how it goes, how users react, whether they like it. Sometimes it will even take multiple iterations before enough observations can be made and conclusions can be reached.
In that sort of situation, “making it right” is not really a possibility. The only solution is to first “make it work”, because that’s what will eventually let us figure out what right is. Otherwise we risk a huge waste of time, working on something that might end up being thrown away when confronted with reality.
This means that code quality should be regarded as a feature, not as a permanent goal. If the best approach at a point in time is to explore and put out multiple implementations, then code quality will get in the way, typically in the way of a product team waiting for answers.
Way too often, code quality is heralded in isolation as a must-have[3]. As something that should be a given, that all developers should agree to pursue and cultivate. Sometimes it looks like it has become some sort of mantra of software development. People complain when they’re not given time to ensure some minimal, or even the best possible, code quality. Poor code quality is blamed for many woes a developer or a team might encounter. As a result, it seems as if it’s been elevated to an absolute, universal consensus. I disagree[4].
Dealing with poor code quality isn’t great. That doesn’t mean that writing code in such a way, at the time and in the context it was written, was a bad decision[5].
I would also argue that it’s easier to agree on whether software works than on what actually correct software is, and on how much “right” is right[6].
Finding the balance
Yet, with all that said about avoiding the trap of code quality for its own sake, there’s an equally dangerous trap lying at the other end of the spectrum: the “PoC into production”. Going back to exploring with quick implementations, there’s always a risk that if such an implementation ticks all the boxes and is deemed the way forward, then that exact implementation will be released into production, because “the job is done, right?”[7]
When the job is, in fact, not done. Because there’s a big difference between trying something out to see if it works, if users like it, if needs are met, and a fully-fledged implementation that will keep being appreciated after running for a while in production. The latter usually takes at least some effort and focus on reliability, and overlooking that is one of the ways we slide back to just “making it work”. It’s easy to think that because something has been put into production, it’s production-ready.
One way I think this can be avoided is to accept the difference and commit to two different sets of requirements: one that applies to exploratory work, and one to production-grade work.
Exploratory work can skip checking for edge cases and have hard-coded stuff all over the place, as long as it clears a minimal bar of functionality. It could be made available only to power users, or to a specifically selected cohort, to verify a hypothesis. But it should probably not be released as-is to an entire user base.
Production-grade work can be, though, provided it meets different requirements: reliability, modularity, genericity, anything that’s deemed relevant and has a justified impact down the line. Many traits that often don’t matter when just getting something to work.
Maybe this starts to look like a hard problem. Product people have to convince engineers to forgo code quality to iterate quickly and gather data about features, usage, and users. Engineers have to convince product people to let them work on reliability once the path forward has been discovered, so that the experience doesn’t end up severely degraded by a pile of code thrown together without sufficient refinement.
Creating software with empathy
This is a hard problem because it involves talking to people outside of our field. It can be a bit daunting, or may feel like a waste of time. But seriously, we could use some more of that. In this case, I really believe the best results can only be achieved by teams where the people who decide what to code talk to the people who decide how to code it, and vice versa. Teams that try to put themselves in each other’s shoes, to genuinely understand their needs, how they think, and what their incentives are, so that everyone can work towards the best outcome. It takes engineers with some product awareness, and product people with an affinity for the technical details, for what the work involves, and for what drives engineers.
It is already hard enough to find the right balance on this specific aspect. I think teams with a strong degree of respect, curiosity, and empathy are the ones that can strike the best compromises in that regard.
By the way, this is the first time I’m trying to coalesce some rather deep convictions I have about software into a coherent article; feedback is most welcome!
1. Eeeh, sometimes there are exceptions: systems that are developed for years and never make it into production… ↩︎
2. Our decision to do something “right” turned out to be fruitful, but while the time invested was worth it at the component level, at the code level we implemented it a bit too quick and dirty. That, on the other hand, didn’t turn out so well down the line… but at least we had something working and somewhat generic. ↩︎
3. When I say that, it’s based on personal experience from work, friends, and internet communities. I know it varies, and that on the web the most vocal people are not always (or even often) the majority, even though they’re the most visible. Hopefully this contributes a perspective that contrasts with some of the most visible discourse online. ↩︎
4. I won’t deny that inheriting low-quality code is really not fun. Untangling a mess of spaghetti, a low-abstraction heap of code thrown together hastily: these aren’t really something to wish for. Unless you’re the specific kind of person who loves solving that sort of chaos, and if you are, kudos, I don’t know how you do it. ↩︎
5. In this case, I think the negativity around it stems from conflating the pain that comes with having to deal with tech debt and the question of whether incurring it was a good decision. The presence of tech debt does not imply that the choices that led to it were wrong at the time, independently of how frustrating it is now. ↩︎
6. Sorry, bear with me on this one… ↩︎
7. This makes me think of the topic of getting to an MVP. The term MVP has been used extensively throughout the 2010s, even though it was coined in 2001. It carries a “quick and dirty” vibe, while that is not always the goal. I like the alternative term (and approach) that is SLC (pronounced “slick”), for “Simple, Lovable, and Complete”. In this day and age, having a minimal product doesn’t seem like enough, especially given all the frameworks, SDKs, and solutions out there that allow people to put out some great and slick work, very feature-rich, without spending years on it. Cutting too many corners doesn’t cut it anymore, it seems. ↩︎