2020-03-17

The March of Progress

In ancient times, I was placed on a legacy project that was, without exaggeration, 30 years old. Much of the team on that project had been on it for just as long. These are not inherently bad things. There is great value to a team with rich experience and intimate project knowledge. The danger is in technology, tooling, processes - basically, attitudes - stagnating unnecessarily despite easy access to free education and resources. In these situations, I find myself wondering if there is a problem, and what its depth must be.

As I said, the age of that project and the developers assigned to it was not, in itself, a problem. But when I joined, the team had still not transitioned to or even considered source control at any point in their 30-year tenure. Keeping a project alive for that long without implementing source control struck me as both miraculous and dangerous. Not to mention wasteful, considering the process overhead in manually managing a large code base among a growing team of many developers. The merge process for the flagship product at the time I left the company was still to literally copy the entire code base onto a network drive with your new changes in it. The team lead would then pick it up and manually copy your changes among his own for the next release.

Now I suspected, despite being a junior on the project, that this might have been a process that was inefficient and error prone. I voiced my concerns early, and gently, but with no lack of gravity. Management responded with confidence that their project was not suited to Git. Besides, they said, the transition would be too much of a risk and they couldn't afford it. I carried out a proof of concept in a virtual machine to show both that the project was suited to Git and that the transition had taken me only a couple of days. They still refused to make an attempt at picking up Git (or any alternative), saying the project was not suited, and the transition was too risky. At that point, all I could do was stare blankly and start looking for a new job.

Management's hand was finally forced, though, by the growth of the team. Including me, the team nearly doubled in size in about half a year. The old process of copying gigabytes of project code and assets back and forth for every significant change was no longer sustainable. The project lead, however, continued to fight the transition to source control. He voiced concerns over "poor reliability," and worried that it was "too slow" and restrictive in its capabilities. When pressed, he admitted that he had not read up on it, did not understand how it worked, and had not looked at my proof of concept.

Hey, I'm not a project lead, let alone a world-renowned thought leader or technologist. But if you genuinely believe source control is inferior to copying a .ZIP of the entire code base to a network drive every time a change is made, and you are not willing to learn what source control is before making that judgment call, then I am not sure how to help you.

To be fair, I was able to convince this team to make the transition to source control, with effort, consistent pressure, and demonstrations of my proofs of concept over a period of about a year. I can't count how many times the word "cool" came out of the team lead's mouth when he finally saw what Git could do. Their manual merge process of up to multiple weeks (not kidding) was reduced to the typical hour or less that it takes to review, discuss, and merge a pull request on GitHub.

Newfangled tech might not be right for your project. But it's worth knowing for yourself.

2020-03-10

Inverted Mouse Scrolling in Windows w/o RegEdit

I love inverted mouse scrolling. Meaning, when I spin my scroll wheel away from myself (up), I want the UI to scroll down. When I spin the wheel toward myself (down), I want the UI to scroll up. This feels intuitive to me because it's how touch screens (phones, tablets, etc), and touch pads (Mac and PC laptops, notebooks, etc) all work. When you slide your finger upward along the screen or touch pad, the thing you're touching follows your finger upward. I never understood why scroll wheels behaved in reverse. To me, the default Windows scroll wheel behavior is the "inverted" one.

Unfortunately, Windows provides no native method of flipping this simple input. If you know what you're doing, and you have enough system access, you can modify the Windows system registry using regedit.exe. This strikes me as a pretty invasive way to go about flipping something as simple as mouse wheel scrolling. But for some reason, this is the primary method I keep finding in searches. It turns out there is an easier way that doesn't require opening the registry editor at all (though it still edits the registry under the hood, will likely need an elevated PowerShell prompt, and comes with a potential downside).

PS C:\> Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Enum\HID\*\*\Device` Parameters FlipFlopWheel -EA 0 | ForEach-Object { Set-ItemProperty $_.PSPath FlipFlopWheel 1 }

Shamelessly but usefully ripped from this post on StackOverflow.

The downside to this method, as you can see, is that the FlipFlopWheel property is being set to 1 for every single object in our query, meaning every HID device entry that has that property. This may not be ideal for your situation. For me, I just needed a way to get my mouse wheel flipped, and to have it persist, without needing to dig through the registry by hand. I'm surprised that this issue has persisted so long, over so many iterations of Windows itself, that this StackOverflow question from over 8 years ago is still relevant, but here's the solution!

Separation of Concerns

Rules are handed down from authority figures. Best practices, though, are born of community consensus on good ways to get things done. There are usually good reasons for certain practices to surface out of our collective consciousness. One simple, sometimes painful example is white space in code. The example is painful because there are so many ways white space can be used (or abused). The broader community prefers a handful of styles; these make code easier to read and digest for other developers (and even our future selves). Proper use of white space, thus, has become a best practice. Tools now exist to enforce proper spacing; languages like Python even go so far as to define scope using white space.

Okay, let's get into it. I have a piece of gold-stamped parchment declaring me "Master" of computer science. I'm not so sure about that. In any case, there are certain best practices that even I can recognize as sensible, useful, and even important. When these practices are ignored (or willfully disregarded) it gives me the fingernails-on-a-blackboard shivers.

One such best practice is known as separation of concerns. There are many levels, high and low, at which it can (and should) be observed. The basic idea, however, is that you want a given piece of code or architecture to:

  • do its job
  • do its entire job
  • do no one else's job

Once you've called a function, you should be able to expect its job to be completed. Sure, the function might delegate responsibility to smaller helper functions, breaking the task into workable chunks; but when that function eventually returns, the job should be done. Importantly, nothing else should have been done along the way. When extra stuff gets done along the way - either intentionally or by accident - we call that a side effect. According to the tenets of functional programming and general coding best practices, side effects are bad.
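
To make the idea concrete, here is a minimal Python sketch. The function names and data are invented for illustration and are not taken from any real project. The first function does its job and nothing else; the second returns the same answer but quietly changes its caller's data along the way.

```python
def total_pages(page_counts):
    """Do one job: return the sum of the page counts. Nothing else changes."""
    return sum(page_counts)


def total_pages_with_side_effect(page_counts):
    """Return the same sum, but also empty the caller's list along the way."""
    total = 0
    while page_counts:
        total += page_counts.pop()  # side effect: mutates the argument
    return total


counts = [3, 5, 2]
print(total_pages(counts), counts)                   # 10 [3, 5, 2]
print(total_pages_with_side_effect(counts), counts)  # 10 []
```

Both calls return 10, but only the first leaves the caller's list the way it found it.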

My mission today is to use a frustrating, real-life example to illustrate the importance of this concept, leaving as much of my emotional baggage at the door as possible.

A company I used to work for had a design review process in place. Every task required project lead approval before development could begin. It meant some extra overhead, but I liked the sentiment. I think it's a fine idea to have strategies vetted by leadership and scoured for weaknesses before mistakes are made. Cool.

So my task was to add a reporting feature to our existing product. The existing feature retrieved documents from the back end, looping over them and calling a helper function to print each individual one. There were weaknesses in this design to begin with - you may wonder, as I did, why each individual document had to be retrieved with a separate query - but it wasn't my design and I just had to work with it.

Let's pause in brief review. Someone wants to print a batch of reports. They call the batch printing function, whose one job is to print a batch of things. We expect our function to:

  • do its job (print the batch)
  • do its entire job (read each document file and print it correctly, i.e. perform sub-tasks that the main task depends on)
  • do no one else's job (don't randomly delete user accounts, etc)

Still with me? Okay. Now we get to my task and the problem I encountered.

At the batch printing level, I was to gather data on each individual report in the batch. This was to be aggregated and added to the final batch report. I came up with a few angles of attack. With separation of concerns and best practices in mind, I avoided solutions that leaned on side effects. What do I mean, here? I mean that I wanted to be clear with what each part of the code was doing; and I wanted the code to do nothing more than it was asked.

For example, my new function would return data to be aggregated, rather than modifying aggregation data by reference. This meant I had to add another function to manage the aggregation itself. A little extra work? Maybe. But the result was crystal clear code, functions with crystal clear purpose and methodology:

Function A, aggregator

  • do its job (aggregate data from multiple calls of getter)
  • do its entire job (return aggregated data)
  • do no one else's job (don't email users' credit card numbers to other users)

Function B, getter

  • do its job (gather data about a document)
  • do its entire job (query back end and return the result)
  • do no one else's job (don't drunk dial Function X)
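
The split above could be sketched roughly as follows. This is not the original code: the back end is faked with a dictionary, and every name here (FAKE_BACKEND, get_document_stats, aggregate_stats) is hypothetical, as are the fields being counted.

```python
# Stand-in for the real back end; the data is made up for this sketch.
FAKE_BACKEND = {
    "doc-1": {"pages": 3, "errors": 0},
    "doc-2": {"pages": 5, "errors": 1},
}


def get_document_stats(doc_id):
    """Function B, getter: look up one document and return a copy of its data."""
    return dict(FAKE_BACKEND[doc_id])


def aggregate_stats(doc_ids):
    """Function A, aggregator: combine the getter's results into new totals."""
    totals = {"documents": 0, "pages": 0, "errors": 0}
    for doc_id in doc_ids:
        stats = get_document_stats(doc_id)
        totals["documents"] += 1
        totals["pages"] += stats["pages"]
        totals["errors"] += stats["errors"]
    return totals


print(aggregate_stats(["doc-1", "doc-2"]))
# {'documents': 2, 'pages': 8, 'errors': 1}
```

The getter returns data instead of writing into something it was handed, so the aggregator is the only place where totals are built up.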

Without getting into details, I can tell you that even this solution felt a little messy to me. Within the project and resource constraints, however, and in light of what followed next, this was a positively glowing example of separation of concerns.

The project lead rejected my design proposal wholesale. I was instructed to write a system of helper functions into the ORM class for the documents.

Confused?

me too

He specified hard-coded mock-ups of data from unrelated classes to be written into the ORM. This would, he argued, allow us to perform testing and comparisons of data and models from within the ORM class. In other words, outside code that called the ORM class constructor would have no idea that a partial testing framework and a whole suite of hard-coded data was being instantiated, parsed, judged, and passed around, complete with database interactions, that had nothing to do with the ORM itself.

Imagine troubleshooting the reason for data coming in from the wrong table. Imagine database calls being made when you aren't making them. Imagine your report's performance profile degrading, only to find out that someone had hard-coded your document's ORM class with sample data and testing infrastructure ... for some random batch document report printing functionality. Can you imagine that?

I can; I lived it. This was an extreme example, but let the allegory of the hard-coded ORM stand as your reminder to keep best practices and separation of concerns close to your heart.

2016-10-11

Switching to React - First Impressions

In earlier posts I wrote that I planned to use Handlebars as my templating engine for the CYOAG project. Things happened since then that caused me to change directions towards ReactJS. I'll take a moment here to say that I've had no contact with or sponsorship from the Handlebars or React teams. This is all just my opinion, formulated based on personal experience working with these technologies on the side-project level. Incidentally, though I'm enjoying working with React, this is notwithstanding my opinion of Facebook, who we have to thank for React. So there's some food for thought.

So why did I make the switch?

I realized Handlebars was more challenging to learn than a templating system needs to be. I didn't find community support to be as extensive or helpful as I'd like. Handlebars also hasn't enjoyed industry adoption as extensive as React's, so from a career growth perspective React seemed a better choice. Ironically, the reason I chose Handlebars originally was that we used it on a project at work. But after talking to the lead dev on that project, I learned it wasn't selected for its quality or utility, but due to circumstance. Finally, having built my first working Handlebars prototype, it just didn't feel right. There's no remedy for that.

So, React. It's already big in the web dev scene, has been for some time, and continues to grow. Whether it'll become a longstanding mainstay in industry remains to be seen, but signs point to yes.

I'm surprised to say it, but React was easier to pick up than Handlebars. It surprises me because React has lots of new ideas and syntax you have to learn, while Handlebars has relatively few and advertises itself as a "no frustration" templating system; which, I can attest, is patently false. React, likewise, advertises itself as "painless," also a lie. I'm being a little harsh here, but point being that I find React less painful than Handlebars, despite the fact that neither is a walk in the park.

To be fair, Handlebars lends itself better to smaller projects. The infrastructure and planning required to use it is lighter. To get Hello World up in React, you have to do more work, download more libraries, and learn more technology: JSX, props, state, rendering, components and their lifecycles, etc. In Handlebars, you slap up a template and you're off to the races (relatively-ish). Build processes are another story. I found precompilation to be roughly equivalent in difficulty from Handlebars to React. The documentation is better for React, though, due to superior community support. More community support means more flavors of help, and a greater likelihood one of them will match your style of learning.

The appeal of React is reusability, structure, and organization. From a mile-high perspective, let's say you're the app lead for Facebook, and you want a UI component to let people "Like" things. All kinds of things; wall posts, images, events, shared links, comments, everything. With React and some proper planning, you can make a single component displaying a "Like" link, that dynamically hooks up to whatever needs to be "Like"d. The component itself doesn't care what it belongs to. You can plug-and-play it wherever. One downside to this, as mentioned above, is that the initial development of this component is relatively heavy. Once you have it built, though, you can reuse it dozens, hundreds, millions of times, at very little dev cost.

Another downside is that you have to be conscious of your component structure when building larger applications. CYOAG is not a large application, but if I didn't think about component structure in advance - and thus plan out how state was to be managed and communicated among components - I'd find myself having a real bad time. The complexity of component interactions isn't hard to wrap your head around, but once you've built them, that complexity makes maintenance and modification a chore, and a considerable risk.

Part of this is because state and props "cascade" (my term) from parent components to child components when you build an application The React Way. So if anything changes along the cascade path, you have to make sure you account for it at every step and in every handler. Of course, things change as we go, this is life; but I think we can all agree it's to our advantage to minimize this, and with React, best practices in design amplify this risk/advantage relationship.

In short, React encourages better software engineering principles. Other technologies allow you to start hacking out a solution before thinking it through, which can lead to other kinds of headaches down the road.

Non-technical Example

We can use my (thus far incomplete) React component design doc for CYOAG as an example for this. I won't bother quoting the doc here; you can read it if you like. In short, the application state is best managed at the top level. One easily explained reason is that if you allow children to manage state, you'll quickly find yourself making more calls in integration, and having a hard time communicating among components. So we manage the state at the top level, and (in short) provide getters and setters for particular parts of the state to children through props.

When building your components, it's helpful to know what state you'll need to be passing down to children from the ancestral point of view; and what props to expect to receive from the descendant's point of view; and what functions will be needed in order to manage and pass these around. And later, if changes need to be made, you need to keep this "flow" of state in mind to preserve it.
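
As a rough, framework-free illustration of that flow, here is a sketch in plain Python rather than React or JSX. The class names, the "props" dictionary, and the node IDs are all invented for this example; only the shape of the idea (state lives at the top, and children receive a getter and a setter) comes from the text above.

```python
class App:
    """Top-level component: the only place that owns state."""

    def __init__(self):
        self.state = {"current_node": "start"}
        # Children get narrow access to one slice of state through "props".
        self.reader = Reader(props={
            "get_node": self.get_node,
            "set_node": self.set_node,
        })

    def get_node(self):
        return self.state["current_node"]

    def set_node(self, node_id):
        self.state["current_node"] = node_id


class Reader:
    """Child component: holds no state of its own, only the props it was given."""

    def __init__(self, props):
        self.props = props

    def render(self):
        return "You are at: " + self.props["get_node"]()

    def choose(self, node_id):
        # Ask the parent to change state instead of changing it locally.
        self.props["set_node"](node_id)


app = App()
print(app.reader.render())  # You are at: start
app.reader.choose("cave")
print(app.reader.render())  # You are at: cave
```

If the shape of the state changes later, the getter and setter at the top are the places to update, which is the "flow" worth keeping in mind.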

These are my early days with React and my opinions and understanding will undoubtedly develop as I develop applications using React. Once I start using this code "in production" on CYOAG, I'll have plenty more to say, I'm sure.

Thanks for reading!
- Steven Kitzes

2016-10-03

Circular Dependency in Relational Databases

This post is going to have an element of storytelling to it, because while some elementary understanding of SQL will be required, the Cliff's Notes seem to tell it all:

You probably want to avoid circular dependencies in relational databases.

These Cliff's Notes are incomplete, a little vague, a bit of an oversimplification, and up for debate besides. Some RDBMSs (relational database management systems) handle circular dependency better than others, and some problem domains necessitate circular dependency. Take, for example, table representations of nodes in a graph that allows cyclical paths.

However, not all problems necessitate circular dependency, and some RDBMSs prohibit, or aren't capable of handling, circular dependencies. In these cases, we need to design around the limitations of our RDBMS, and in any case, it's good to be aware of circular dependencies, just on principle. The purpose of this post is thus to share my own early explorations of database design, and to describe how I broke out of the circular dependency trap on a small side project.

Some project context will help here. I've been working on a passion project for a little while now. The high level idea is to build a "Create Your Own Adventure" game, where all content is user-generated, and the best story line branches rise to the top through a voting system like that used by Reddit and StackOverflow. An important early part of this project involved designing a decent database schema to build the rest of the project on top of.

You may wonder why I chose an RDB (relational database) for this in the first place, rather than an alternative such as some flavor of NoSQL. I plan to host this on the AWS (Amazon Web Services) free tier, so I can learn more about AWS. Unfortunately, at the time I started this project, the AWS free tier didn't offer any option other than RDBs. Incidentally, an RDB is almost certainly best for this type of application due to the interrelationships of different types of data, but that's a topic for another time.

So, how did a circular dependency surface in my design?

One feature I want this app to have is position persistence. When a visitor is reading the story, I want to remember where they left off, so they can conveniently come back and resume reading the story at a later date. My plan was to give the Users table a column called Current, to track a given user's position in the story. In other words, a story node would appear as a foreign key in the Users table.

The Nodes table was designed with an Author column to record who had authored a given story node. This was to support a variety of features, such as prohibiting authors from voting on their own posts, and allowing an author to edit their contributions, while protecting those posts from being edited by others. Unfortunately, you'll notice this means a user (author) is now a foreign key in the Nodes table.

Uh oh.

Now I have a node ID as a foreign key in the Users table, and a user ID as a foreign key in the Nodes table. This is a circular dependency of sorts, and is a blocker in SQL. You'll find some argument as to whether circular dependencies are fundamentally bad practice, or whether SQL just suffers some unhappy limitations. After all, what's conceptually wrong with tracking current node in the Users table, and author in the Nodes table? There's no conflict in authority over the data, since a user doesn't own its current node in this read-only context; it's just a position marker. To be fair, SQL doesn't understand our motivations, and to argue that an RDBMS's designers could have done more toward enabling us to convey these human intuitions to the machine is too philosophical for this post.

Whatever the reason and our opinion of it, MySQL forbids this kind of interdependency. The solution I came to was to create a simple table called Positions, with a compound primary key consisting of two foreign keys, one being a user ID from the Users table, the other being a node ID from the Nodes table. Despite some apparent duplication of information, this works for the simple reason that the Positions table relies on Users and Nodes, but neither Users nor Nodes relies on Positions. I'll try to illustrate the improvement in design without opening Visio:

An arrow (->) implies dependency

Before:

Users -> Nodes
Nodes -> Users

As you can see, this yields Users -> Nodes -> Users, and so on, ad infinitum. This is bad.
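
The loop can also be checked mechanically. The sketch below uses Python's standard graphlib module (Python 3.9 or later). The has_cycle helper and the dictionary layout are my own illustration, with each table mapped to the set of tables it depends on.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(dependencies):
    """Return True if the table -> dependencies mapping contains a loop."""
    try:
        # prepare() raises CycleError when no valid ordering exists.
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return True
    return False


# The first design: each table holds a foreign key into the other.
before = {"Users": {"Nodes"}, "Nodes": {"Users"}}
print(has_cycle(before))  # True
```

The same helper can be pointed at any other set of arrows to see whether a loop remains.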

After:

Nodes -> Users
Positions -> Users
Positions -> Nodes

We can combine two of these lines together to yield:

Positions -> Nodes -> Users
Positions -> Users

but you can see that this doesn't introduce any loops, or circular dependencies, as my first design did.
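
For the curious, the reworked design might look something like this. It is a sketch only: it uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, and apart from the table names and the Author column mentioned above, the column names, types, and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Tables are created in dependency order: Users, then Nodes, then Positions.
conn.executescript("""
CREATE TABLE Users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Nodes (
    id     INTEGER PRIMARY KEY,
    Author INTEGER NOT NULL REFERENCES Users(id),
    body   TEXT NOT NULL
);
CREATE TABLE Positions (
    user_id INTEGER NOT NULL REFERENCES Users(id),
    node_id INTEGER NOT NULL REFERENCES Nodes(id),
    PRIMARY KEY (user_id, node_id)
);
""")

conn.execute("INSERT INTO Users (id, name) VALUES (1, 'author'), (2, 'reader')")
conn.execute("INSERT INTO Nodes (id, Author, body) VALUES (10, 1, 'You wake up.')")
conn.execute("INSERT INTO Positions (user_id, node_id) VALUES (2, 10)")

# Where did the reader leave off?
row = conn.execute(
    "SELECT Nodes.body FROM Positions "
    "JOIN Nodes ON Nodes.id = Positions.node_id "
    "WHERE Positions.user_id = ?",
    (2,),
).fetchone()
print(row[0])  # You wake up.
```

Because Positions is created last and nothing points back at it, the tables can be built (and dropped, in reverse) in a plain linear order.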

Thanks for reading!
- Steven Kitzes