Rules are handed down from authority figures. Best practices, though, are born of community consensus on good ways to get things done. There are usually good reasons for certain practices to surface out of our collective consciousness. One simple, sometimes painful example is white space in code. The example is painful because there are so many ways white space can be used (or abused). The broader community prefers a handful of styles; these make code easier to read and digest for other developers (and even our future selves). Proper use of white space, thus, has become a best practice. Tools now exist to enforce proper spacing; languages like Python even go so far as to define scope using white space.
Okay, let's get into it. I have a piece of gold-stamped parchment declaring me "Master" of computer science. I'm not so sure about that. In any case, there are certain best practices that even I can recognize as sensible, useful, and even important. When these practices are ignored (or willfully disregarded), it gives me the fingernails-on-a-blackboard shivers.
One such best practice is known as separation of concerns. There are many levels, high and low, at which it can (and should) be observed. The basic idea, however, is that you want a given piece of code or architecture to:
- do its job
- do its entire job
- do no one else's job
Once you've called a function, you should be able to expect its job to be completed. Sure, the function might delegate responsibility to smaller helper functions, breaking the task into workable chunks; but when that function eventually returns, the job should be done. Importantly, nothing else should have been done along the way. When extra stuff gets done along the way - either intentionally or by accident - we call that a side effect. According to the tenets of functional programming and general coding best practices, side effects are bad.
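To make the idea concrete, here's a minimal sketch (the report data and function names are invented for illustration) contrasting a function that quietly does extra work with one that just does its job:

```python
# Hypothetical example: two ways to total up report page counts.

def total_pages_with_side_effect(reports, log):
    """Returns the total, but ALSO mutates the caller's log list -
    a side effect the caller may not expect."""
    total = 0
    for report in reports:
        total += report["pages"]
        log.append(report["name"])  # surprise: modifies state outside the function
    return total

def total_pages_pure(reports):
    """Does its one job - computing the total - and nothing else."""
    return sum(report["pages"] for report in reports)

reports = [{"name": "q1", "pages": 3}, {"name": "q2", "pages": 5}]
audit_log = []
total_pages_with_side_effect(reports, audit_log)  # audit_log is now ["q1", "q2"]
total_pages_pure(reports)                         # no hidden extra work
```

Both functions return 8 for this input, but only the first one leaves fingerprints on the caller's state. Six months later, the person debugging a mysteriously growing `audit_log` will not thank you.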
My mission today is to use a frustrating, real-life example to illustrate the importance of this concept, leaving as much of my emotional baggage at the door as possible.
A company I used to work for had a design review process in place. All tasks required project lead approval before development could begin. It meant some extra overhead, but I liked the sentiment. I think it's a fine idea to have strategies vetted by leadership and scoured for weaknesses before mistakes are made. Cool.
So my task was to add a reporting feature to our existing product. The existing batch printing feature retrieved documents from the back end, looping over them and calling a helper function to print each one. There were weaknesses in this design to begin with - you may wonder, as I did, why each individual document had to be retrieved with a separate query - but it wasn't my design and I just had to work with it.
Let's pause in brief review. Someone wants to print a batch of reports. They call the batch printing function, whose one job is to print a batch of things. We expect our function to:
- do its job (print the batch)
- do its entire job (read each document file and print it correctly, i.e. perform sub-tasks that the main task depends on)
- do no one else's job (don't randomly delete user accounts, etc)
Still with me? Okay. Now we get to my task and the problem I encountered.
At the batch printing level, I was to gather data on each individual report in the batch. This was to be aggregated and added to the final batch report. I came up with a few angles of attack. With separation of concerns and best practices in mind, I avoided solutions that leaned on side effects. What do I mean here? I mean that I wanted to be clear about what each part of the code was doing; and I wanted the code to do nothing more than it was asked.
For example, my new function would return data to be aggregated, rather than modifying aggregation data by reference. This meant I had to add another function to manage the aggregation itself. A little extra work? Maybe. But the result was crystal clear code, functions with crystal clear purpose and methodology:
Function A, aggregator
- do its job (aggregate data from multiple calls of getter)
- do its entire job (return aggregated data)
- do no one else's job (don't email users' credit card numbers to other users)
Function B, getter
- do its job (gather data about a document)
- do its entire job (query back end and return the result)
- do no one else's job (don't drunk dial Function X)
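The real code is long gone (and was proprietary anyway), but the shape of the design looked something like this - the function names and the stats format are stand-ins:

```python
# Hypothetical sketch of the getter/aggregator design.

def get_document_stats(doc_id, query_backend):
    """Function B, the getter: query the back end for one document's
    stats and return the result. Touches no shared state."""
    return query_backend(doc_id)

def aggregate_stats(doc_ids, query_backend):
    """Function A, the aggregator: call the getter once per document
    and return the combined totals. Nothing is modified by reference."""
    totals = {"documents": 0, "pages": 0}
    for doc_id in doc_ids:
        stats = get_document_stats(doc_id, query_backend)
        totals["documents"] += 1
        totals["pages"] += stats["pages"]
    return totals

# A dict stands in for the back end here.
backend = {"a": {"pages": 3}, "b": {"pages": 4}}
aggregate_stats(["a", "b"], backend.get)  # {'documents': 2, 'pages': 7}
```

Note that the aggregation totals live entirely inside Function A and come back as a return value; the getter never reaches out and mutates them. That's the "extra function" trade-off from above, bought at the price of a few more lines.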
Without getting into details, I can tell you that even this solution felt a little messy to me. Within the project and resource constraints, however, and in light of what followed next, this was a positively glowing example of separation of concerns.
The project lead rejected my design proposal wholesale. I was instructed to write a system of helper functions into the ORM class for the documents.
He specified hard-coded mock-ups of data from unrelated classes to be written into the ORM. This would, he argued, allow us to perform testing and comparisons of data and models from within the ORM class. In other words, outside code that called the ORM class constructor would have no idea that a partial testing framework and a whole suite of hard-coded data was being instantiated, parsed, judged, and passed around, complete with database interactions, none of which had anything to do with the ORM itself.
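Reconstructed from memory, the anti-pattern looked roughly like this - class names, tables, and the fake database are all invented, but the shape of the problem is faithful:

```python
# Hypothetical reconstruction of the anti-pattern (not the real code):
# the ORM constructor quietly builds test fixtures and hits the database,
# none of which its callers asked for.

class FakeDB:
    """Stand-in back end so the sketch runs; records every table touched."""
    def __init__(self, tables):
        self.tables = tables
        self.calls = []

    def fetch(self, table, key):
        self.calls.append(table)
        return self.tables[table][key]

class DocumentORM:
    def __init__(self, doc_id, db):
        self.doc_id = doc_id
        self.row = db.fetch("documents", doc_id)    # the one job callers expect
        # Side effects smuggled into construction:
        self.mock_user = {"id": 1, "name": "test"}  # hard-coded data from an unrelated class
        self.comparison = db.fetch("users", 1)      # surprise database call
        self.matches = self.comparison == self.mock_user

db = FakeDB({"documents": {"d1": {"pages": 2}},
             "users": {1: {"id": 1, "name": "test"}}})
doc = DocumentORM("d1", db)
# Constructing ONE document touched TWO tables; no caller asked for "users".
print(db.calls)
```

Every single instantiation of a document - anywhere in the product - now pays for a query against an unrelated table and carries test fixtures around, invisibly.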
Imagine troubleshooting the reason for data coming in from the wrong table. Imagine database calls being made when you aren't making them. Imagine your report's performance profile degrading, only to find out that someone had hard-coded your document's ORM class with sample data and testing infrastructure ... for some random batch document report printing functionality. Can you imagine that?
I can; I lived it. This was an extreme example, but let the allegory of the hard-coded ORM stand as your reminder to keep best practices and separation of concerns close to your heart.