Automation candidates

We execute some tasks so often that they have the potential to keep us busy for a long time. Take email checking: tasks such as filling in the username and password every time we wish to log in, without storing the credentials on our machine. Even when the data we enter is correct, we may be asked additional questions before we get a chance to reach our inbox. All of this stands in the way of using the service, in a subtle way. The idea that we have to “go” to where the content is, compared to where we already are, carries an embedded time cost, similar to commuting to a workplace. Anyone using such an email service on a daily basis wastes a lot of time filling forms, checking checkboxes, selecting radio buttons and clicking buttons. In other words, on the attributes of the service, but not the service itself.

Having a local email client configured to pull our emails automatically from that service as soon as they arrive lets us stop constantly worrying about access rights. Once we download the latest emails, we can switch between them at native speed, without having to wait for HTTP requests to finish and without the need to reload heavy JavaScript libraries. The increased speed of handling email also makes it more manageable. The slower things are, the less they can be used. If we have gone through the last 300 emails, holding the Delete key for 2-3 seconds gives us an empty inbox. With an online service, we would need to load several pages, depending on how long they are, then click “Select all” and “Delete” on each of these pages. Some older email services had the problem of hiding one of these options, so the only way to delete emails in them was to select all checkboxes individually. You can imagine the kind of effort this required.
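The automatic pulling of new messages could be sketched with Python's standard imaplib module. This is a minimal illustration, not any particular client's implementation; the host name and credentials are placeholders, and a real client would of course store them securely and poll on a schedule.

```python
import email
import imaplib

def summarize(raw_bytes):
    """Extract sender and subject from a raw RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes)
    return msg.get("From", ""), msg.get("Subject", "")

def fetch_unseen(host, user, password):
    """Download all unseen messages from the INBOX over IMAP."""
    messages = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX")
        _, data = conn.search(None, "UNSEEN")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            messages.append(summarize(parts[0][1]))
    return messages
```

Once the messages are local, everything else (reading, searching, bulk deletion) happens at native speed.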

Email isn’t the only automation candidate. As we already mentioned, anything that we do periodically and consistently should make us think about automation. We just need to identify what these things are in our environment. For instance, if our website has hundreds of images that need to be optimized, it does not scale to optimize them one by one. On the other hand, optimizing only a couple of images lets us easily compare different services and the compression ratios they achieve. Then we can choose the best solution where it matters, even when the process cannot be automated.
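Batch optimization could be a short script like the following sketch. It assumes the jpegoptim command-line tool is installed; any optimizer with a similar interface would do, and the compression-ratio helper is the same measure we would use when comparing services by hand.

```python
import subprocess
from pathlib import Path

def ratio(original_size, optimized_size):
    """Fraction of bytes saved (0.25 means the file got 25% smaller)."""
    return 1 - optimized_size / original_size

def optimize_all(directory):
    """Optimize every JPEG in a directory in place using jpegoptim."""
    for path in Path(directory).glob("*.jpg"):
        before = path.stat().st_size
        subprocess.run(["jpegoptim", "--strip-all", str(path)], check=True)
        after = path.stat().st_size
        print(f"{path.name}: saved {ratio(before, after):.0%}")
```

Running `optimize_all("static/images")` then handles hundreds of files as easily as two.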

If there is a high probability that a server component will fail, we can take precautions and install some redundancy, whose cost corresponds to the frequency and severity of the problem. When this component eventually fails, the redundant one takes over its functions automatically, with zero downtime for the clients having data on that server.
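In its simplest form, such a takeover is a health-checked failover. The sketch below illustrates the idea only; the endpoints are hypothetical, and real systems usually delegate this to a load balancer or cluster manager rather than application code.

```python
import urllib.request

def http_health(url, timeout=2):
    """Return True if the endpoint answers its health check with 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def first_healthy(endpoints, is_healthy):
    """Pick the first endpoint that passes the health check."""
    for url in endpoints:
        if is_healthy(url):
            return url
    return None

# Hypothetical primary component and its redundant standby:
# first_healthy(["http://primary.example.com/health",
#                "http://standby.example.com/health"], http_health)
```

When the primary stops answering, clients are simply routed to the standby.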

Another example is website usage statistics. Few are the businesses that aren’t constantly checking their numbers and using them to adjust their strategies. This brings the danger of giving too much attention to the numbers and too little to the actions that actually define them. Even though some analysis is important, doing too much of it can be paralyzing. The first step is to recognize how often we are doing this. If we check the statistics four times a day and it takes us 20 minutes in total, then this is probably an automation candidate. Then we could try to understand what we are looking at in terms of the data. A program that plots visually appealing charts has to take its data from somewhere. The “no program without data” idea leads us to the access logs. We can observe what they contain and find where the data most important in our case appears and how it is formatted. Each access log is accessible through FTP, where we could type our credentials once and connect and disconnect as often as we need by executing code rather than by clicking on GUI controls. After the connection is established, we could retrieve the files in binary mode, copy their contents to separate files and close the connection. A separate script can then parse and extract the data we need, using regular expressions. It can also plot this data, so that for each logging period separate time series are created for each variable of interest. Finally, another script can be responsible for presenting these data plots, accessible through a simple bookmark in the browser. This script can also do some simple sorting and filtering, so that, for instance, only plots with data from the last year are shown in reverse chronological order. Now, any time we click on the bookmark, the log file for the last month will be updated, and so will the plot we see.
All without having to log in to a control panel, without seeking the icon of the log analysis software, without having to choose a subdomain for which we want statistical data, without having to dig through potentially irrelevant data, without having to stare at a design we are not happy with. The automated solution gives us the flexibility to organize and present the data in the most sensible way for our concrete case. What it can’t do, however, is generate data that isn’t present in the logs. For that we would need to write our own statistical software. The result can look something like this:
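The retrieval and parsing steps of such a pipeline could be sketched as follows. The FTP host and credentials are placeholders, and the regular expression assumes the common Apache access-log format; a real script would match whatever format its own logs use, and a further script would plot the extracted counts.

```python
import ftplib
import re
from collections import Counter

# Common Apache access-log format (an assumption about the log layout).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def download_log(host, user, password, remote_path, local_path):
    """Fetch one access log over FTP in binary mode."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)

def requests_per_path(lines):
    """Count requests per URL path: the data a later script could plot."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            counts[m.group("path")] += 1
    return counts
```

Connecting and disconnecting then happens by executing code, exactly as often as needed, with the credentials typed once into the script's configuration.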

Using unit tests to automate software testing is another area where a lot can be gained. But this often requires learning more libraries, tools and thought patterns. Sometimes the cost of automation can be too high, so we should not blindly assume that just because automation is possible, it will pay off in every situation.
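With Python's built-in unittest module, the entry cost can be quite low. The function under test here is hypothetical, standing in for any small unit whose behavior we would otherwise re-verify by hand after every change.

```python
import unittest

def normalize_email(address):
    """Lowercase an email address and strip surrounding whitespace."""
    return address.strip().lower()

class TestNormalizeEmail(unittest.TestCase):
    def test_strips_whitespace_and_lowercases(self):
        self.assertEqual(normalize_email("  Bob@Example.COM "),
                         "bob@example.com")

    def test_already_normalized_input_is_unchanged(self):
        self.assertEqual(normalize_email("alice@example.com"),
                         "alice@example.com")

if __name__ == "__main__":
    unittest.main()
```

Once such checks exist, they run in milliseconds on every change, which is precisely the periodic, consistent activity that deserves automating.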