Visual code bumps

How we write our code matters, especially when many other people are going to read it. Lack of readability can affect how well we can understand the complexity that the code is hiding. The more code gets written, the more it needs to be edited and the harder it becomes to change it.

Most programmers read sequentially, from left to right and from top to bottom. This creates a context of its own, in the same way that a rectangular monitor leads to a preference for rectangular designs that fully use the available space. It means that everything we place in the path of our eyes matters. Through our choice of characters, we influence how quickly or slowly a piece of code can be digested.

Visual code bumps are things that prevent us from moving our eyes further as we normally would. They can slow us down in the same way jank affects the usability of a web page or stops us from being able to scroll further. Since we are much slower than the average machine, such pauses are perhaps even more important than the pure speed with which we are able to execute our code.

What makes the situation more complex is that every programming language comes with its own syntax choices. Many languages use a semicolon to end or separate statements, while others omit it. In normal writing, we use semicolons to visually separate related but different ideas. This partially explains why we don’t put a period at the end of each line instead.

Functional languages like Haskell or OCaml are very punctuation-rich, which can make them especially hard to read. Someone who has to move over lots of characters that add noise to the code will find it hard to read for an extended period of time. This may be the main reason why functional languages haven’t become more popular (and rightfully so).

While no language is completely free of punctuation, a language like Python has shown that some measures can be taken to reduce the number of meaningless characters that appear in the reader’s view. It approaches code from a readability perspective and avoids unnecessary clutter.

One of the most widespread code bumps is camel case. Java is the main example here, with function names like "someReallyLongFunctionNameHere". Every time we have to switch from lowercase to uppercase, we encounter a code bump. Here there are five of them in a single function name. Since this is a well-accepted practice in Java, you can expect that reading such code for an entire day will quickly tire you. The more of these names appear in front of you, the more you start to appreciate function names in lowercase letters. Unfortunately, Java isn’t the only example; JavaScript has also adopted camel case, with methods like addEventListener, which is the recommended way to attach events to elements. While it is understandable that an element may sometimes need more than one handler for the same event, we need to ask ourselves whether this is truly needed. If not, we can use onclick, onmousedown, onkeyup, onsubmit or anything else that is low-profile and won’t scream at the reader. If someone tells you that Java and JavaScript have nothing in common, think again of camel case. forEach in JavaScript is a classic example of taking a concept too far.

What we put in front of our code matters too, because it might prevent us from reading an important label. For instance, if we define Queue.prototype.add, Queue.prototype.remove, Queue.prototype.clear and 10 other functions like this, the labels add, remove and clear become much less apparent, so we may have a hard time understanding what this code is responsible for, even when we can visually distinguish the separate blocks. An alternative is to type Queue.prototype = {code_block} and then, within the code_block, start each line that defines a function with its name. This is easier to read and avoids duplicating code beyond necessity. The fact that Queue.prototype.remove is much longer and can be more easily perceived as a separate component doesn’t necessarily mean that this is the right way to introduce it. Additionally, here we have two code bumps in front of the label name, which slow down our reading quite a bit. The period is also called a full stop for a reason: it introduces a longer pause, normally between sentences. Here we need to know which item each dot refers to, which adds thinking time. Many libraries have deep hierarchies, which make them architecturally more stable, but also more tedious to use on the client side. There is always a need to strike a proper balance between the two.
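A minimal sketch of the object-literal layout described above. The array-backed Queue and its items field are hypothetical and exist only to show where the labels end up:

```javascript
// Hypothetical array-backed queue, used only to illustrate the layout.
function Queue() {
  this.items = [];
}

// Each method name is the first thing on its line,
// instead of hiding behind a repeated "Queue.prototype." prefix.
Queue.prototype = {
  constructor: Queue, // restore the link that the reassignment overwrites

  add: function (item) {
    this.items.push(item);
  },

  remove: function () {
    return this.items.shift();
  },

  clear: function () {
    this.items.length = 0;
  }
};

var queue = new Queue();
queue.add("a");
queue.add("b");
var first = queue.remove();
```

One trade-off of this layout is that assigning a new object replaces the default prototype, which is why the sketch sets constructor by hand.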

Order of arrangement matters as well. If we have an array of items consisting of three values each, it would be better to have them in the order of importance for the queries that will be made on this data. This is also the reading direction, so seeing the most important items first allows us to stop looking early if needed. It is also more natural to say that we need to find the minimum of a sequence of values than it is to say that we need the sequence’s minimum (min(seq) vs seq.min()). In the second case, with object-oriented programming, we first have to see the object to be able to say which operations are allowed on it. Since not every object supports every operation, we’ll have one code bump to discover the details.

Like the period, the comma is used to pause too, but for a shorter time period. When we encounter a long parameter list of a function, we cannot avoid pausing. Each comma adds a light code bump, because we have to slightly reduce our reading speed to distinguish that these parameters are separate things. At the same time, commas can aid our understanding if used properly. It also helps to think in terms of whether three commas may be less disturbing than two periods inside our code.

Colons are nice, because they make us wish for more, effectively speeding up our reading. But their effect diminishes over time. For instance, when we define a CSS selector with its properties and values, we use colons to separate them, and when each pair is on a separate line, this allows for very fast reading. When we define a JavaScript object with its key-value pairs, this can be true as well. But if we make the value side very hard to read, then the effect of the colon becomes negligible. This indicates that mixing fast-to-read and slow-to-read parts may not lead to good results. The colon is used to clarify things, so we need to ensure that our explanations are easy to understand. Having a very long explanation doesn’t contribute to this.

The characters {}, (), [] that dominate much of our code are another source of visual code bumps. Our eyes come from the left and ricochet off these walls every time we encounter them. These code bumps are particularly bad, because there are plenty of them and we usually don’t think much before introducing them. Each code block, each simple function call or array access adds code bumps, slowing our reading as we move through the code. This is one reason why, whenever we can, it is probably better to batch their use. For instance, add({key1: value1, key2: value2}) will probably be easier on the eyes than var href = {prop: $(this).prop("href"), attr: $(this).attr("href")}. In the first case we keep the problematic characters close together, so they can be perceived as a single unit, while in the second case they are separated from each other by words, which makes the line very hard to read. Additionally, our thinking is restricted when in block context, which shatters our attention into small pieces every time we switch to a different one. This can make it very hard to see the big picture when the countless small details overwhelm our thinking.
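One way to batch the brackets, sketched under assumptions: wrap, prop and attr below are a hypothetical stand-in for a jQuery-like wrapper and are defined in the sketch itself, not taken from a real library.

```javascript
// Hypothetical stand-in for a jQuery-like wrapper; prop and attr are
// illustrative methods defined here, not a real library API.
function wrap(element) {
  return {
    prop: function (name) { return element.resolved[name]; },
    attr: function (name) { return element.raw[name]; }
  };
}

var element = {
  resolved: {href: "https://example.com/page"},
  raw: {href: "/page"}
};

// Scattered: brackets and quotes are interleaved with words.
var scattered = {prop: wrap(element).prop("href"), attr: wrap(element).attr("href")};

// Batched: the calls get their own lines and names,
// so the object literal reads as a single unit.
var link = wrap(element);
var resolved = link.prop("href");
var raw = link.attr("href");
var batched = {prop: resolved, attr: raw};
```

Both objects hold the same values; the second version only moves the bumpy parts to where they can be read one at a time.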

Long regular expressions like /^\b_((?:[^_]|__)+?)_\b|^\*((?:\*\*|[\s\S])+?)\*(?!\*)/, which contain lots of bumpy characters, are hard to read. This is why many people have started to use multiple simple regexes instead of a single complex one where possible. Base64-encoded images produce lots of meaningless characters within our code, even when they save an HTTP request. The character sequence acts as a visual code bump, because it splits our other code in half. The more such images we have, the more fragmented our code will be.
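A rough sketch of the "several simple regexes" idea, assuming the goal is to recognize text emphasized with underscores or asterisks at the start of a string. It is deliberately simpler than the long pattern above and is not equivalent to it; the names are made up for the example.

```javascript
// Two simple patterns in place of one combined emphasis pattern.
// Simplification (an assumption of this sketch): doubled markers such as
// __ or ** inside the emphasized text are not handled.
var underscoreEmphasis = /^_([^_]+)_/;
var asteriskEmphasis = /^\*([^*]+)\*/;

// Returns the emphasized text at the start of the input, or null.
function matchEmphasis(text) {
  var match = underscoreEmphasis.exec(text) || asteriskEmphasis.exec(text);
  return match ? match[1] : null;
}
```

Each pattern can now be read, tested and changed on its own, at the cost of one extra line that combines them.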

The dollar sign is an additional code bump. In PHP, for instance, it has to appear in front of every variable for it to be recognized as one. This also makes typing variables harder than average, with the need to press a key combination (Shift + 4 on a US keyboard layout) every single time.

Other code bumps are || and &&, !, !! and ?, so we need to use them sparingly whenever we can. If it is not clear what the boolean check represents, we should assign a descriptive variable to it and use that variable every time the check is needed.
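A small sketch of replacing a dense boolean check with descriptive variables. The user record and its field names are hypothetical:

```javascript
// Hypothetical user record; the field names are made up for this example.
var user = {email: "ana@example.com", age: 34, banned: false};

// Dense: the reader has to decode every operator in place.
var denseResult = !!user.email && !(user.age < 18) && !user.banned;

// Named: each check is decoded once, then reused by name.
var hasEmail = Boolean(user.email);
var isAdult = user.age >= 18;
var canSubscribe = hasEmail && isAdult && !user.banned;
```

The named version is longer, but every later use of canSubscribe reads as a word instead of a row of operators.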

Characters like -, =, ==, ―, _, >, >>, >>>, right arrow and others that are oriented in the direction of reading effectively increase its speed. The slash may be beneficial too, because it places all alternatives next to each other, saving on the usage of long words. Whenever we can express something with fewer words, we should probably do it. Single quotes can be less noisy than double quotes, so in languages where they have the same meaning, it is better to prefer the former.

Although the intention behind comments is always good, they can act as code bumps when used inappropriately. For instance, a lengthy multi-line comment in front of every function within a class is almost always a bad idea. Although such comments exist for documentation purposes, they distract from the actual code and chop the reader’s attention into small pieces. Comment, function, comment, function, repeated a hundred times, is always more tiring to read than just function, function. Having related functions close to each other allows us to see how they interact, which is not possible with a long comment standing in the way. A better place for long documentation would be a file separate from the code. Only code, only documentation is another instance of separating concerns. The documentation needs to be accessible even to people who for some reason cannot go online. Offering it as a PDF allows everyone with a book reader to open it. Some complex software products come only with online documentation, but this indicates that they aren’t finished yet. Possibly the most useful comments within the code are the ones that are short, specific and single-line, added every 10-15 lines of code if necessary to clarify the structure rather than to explain the details in full. But even they can be omitted if we choose to organize our code around functions that are about that long and have descriptive names. Comments that are very long may not be read at all, which effectively turns them into waste, even if we have a hard time admitting it.

Multiple deeply nested if-statements that contain lots of code can act as code bumps, because we have to find all entry points to understand which conditions apply. Once the paths become very hard to track, the project may never be completed unless there is some test code that checks that all paths give the expected output. It is then preferable to keep the contents of these statements small.
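One possible way to keep the contents of each branch small is to move the branch bodies into named functions, so that every condition is followed by a single line. The order object and the cost rules below are hypothetical:

```javascript
// Hypothetical order handling; all names and rules are illustrative.
function shippingCost(order) {
  if (order.items.length === 0) {
    return 0;
  }
  if (order.express) {
    return expressCost(order);
  }
  return standardCost(order);
}

// Express orders pay a base fee plus a fee per item.
function expressCost(order) {
  return 10 + order.items.length * 2;
}

// Standard shipping is free from a total of 50, otherwise a flat fee.
function standardCost(order) {
  return order.total >= 50 ? 0 : 5;
}
```

With the conditions flattened like this, each path through the function can be listed and tested separately.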

The module pattern in JavaScript allows us to make some variables accessible within the module, but their values are passed in as arguments at the very end of it. We effectively have to start reading at the bottom and then come back to the top, which is another case of a code bump.
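A minimal sketch of the module pattern that shows where the bump occurs. The settings object and the counter module are hypothetical:

```javascript
// Hypothetical settings object that the module depends on.
var settings = {prefix: "app"};

var counter = (function (config, start) {
  // config and start are used here, long before we can see their values.
  var count = start;

  return {
    next: function () {
      count += 1;
      return config.prefix + "-" + count;
    }
  };
})(settings, 0); // the actual values only appear on the last line

var firstId = counter.next();
```

To learn what config and start are, the reader has to jump to the closing line and then return to the top.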

Certain variables, whose place of definition or purpose is not clear, can act as code bumps when used. For instance, every time we see a strange constant name or perhaps a magic number, we pause longer than usual to think about where it comes from. Good code is organized to keep such pauses to a minimum.
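A short sketch of a magic number and a named alternative. The expiry check is a hypothetical example:

```javascript
// Bump: where does 86400000 come from?
function isExpiredUnclear(createdAt, now) {
  return now - createdAt > 86400000;
}

// The name and the visible arithmetic answer the question in place.
var MILLISECONDS_PER_DAY = 24 * 60 * 60 * 1000;

function isExpired(createdAt, now) {
  return now - createdAt > MILLISECONDS_PER_DAY;
}
```

Both functions behave the same; only the second one tells the reader what the number means.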

Other code bumps are functions that do too much for the average person to follow. An example would be a function that updates one big matrix element-wise with a long list of additions and multiplications, each element having its own separate line. While such code may be very efficient for the machine, understanding what each factor or summand represents, when they all vary slightly on each line, would be incredibly time-consuming. We should prefer writing code that is easy for the reader, not necessarily code that is efficient for the machine. Failing to understand this leads to cryptic libraries and the need to transfer them over the web. Sometimes new variables are defined in the middle of the calculation, which mixes the two and obscures the end result.
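As a rough illustration, assume a small matrix product stands in for the big element-wise update described above. The unrolled version spells out every element, while the looped version states the rule once. Both functions are hypothetical examples:

```javascript
// Unrolled: every element has its own expression, each slightly different.
function multiplyUnrolled(a, b) {
  return [
    [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
    [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]
  ];
}

// Looped: the rule "row times column" is written once.
// Works for square matrices given as arrays of rows.
function multiply(a, b) {
  var size = a.length;
  var result = [];
  for (var row = 0; row < size; row++) {
    result.push([]);
    for (var col = 0; col < size; col++) {
      var sum = 0;
      for (var k = 0; k < size; k++) {
        sum += a[row][k] * b[k][col];
      }
      result[row].push(sum);
    }
  }
  return result;
}
```

For a 2x2 matrix the unrolled form is still manageable, but it grows with the square of the matrix size, while the looped form stays the same length.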

As we know, the number of small elementary operations can be used to define the running time of our code, but we often forget that each and every separate piece of code is an elementary operation on the reader, one that requires their attention. If they typed the wrong character, they need to return and fix it. If they omitted a semicolon and got an internal server error, they need to return and fix it. If something is positioned five pixels to the left, they need to return and fix it. The number of possible mistakes is a function of the length of the code. Therefore it depends only on us to seek, examine and dissect the best practices of structuring our code to make it as self-explanatory and attention-preserving as possible. To question every misplaced character, code bump, accepted or promoted practice. To understand the psychological effect of the code, not only to improve its function, but to add beauty to it as well. Because over time only what is considered beautiful and clear will tend to be maintained.

I hope that this will help you to identify potential code bumps within the libraries you are using. If you pay attention to where the longest pauses occur and look for the reasons behind them, your code can only become better.

bit.ly/2behouR