One idea we have learned from neural networks is that using more layers lets us capture detail at different levels of granularity and explore more fully the attributes of the instance we are trying to learn from. Convolutional neural networks combine many layers to analyze images, accepting a three-dimensional input. But training a CNN can be quite slow, and we often get no feedback until the final results appear. And to develop a good learning algorithm, we very often need to be learning fast ourselves.

We can try to work with images slightly differently. We know that each pixel is an RGB triple of values between 0 and 255, and that the image itself is a matrix of such pixels. Two pixels such as [0, 30, 253] and [128, 230, 8] are each within bounds, but if we take their sum to form a new pixel, the result is [128, 260, 261], and two of the three channels exceed the maximum of 255. This results in "burnt" pixels, producing artefacts in our images. So we need to keep our pixel values in check after each and every operation on the image.
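The arithmetic above can be checked directly. This is a minimal sketch, assuming the pixels are held in NumPy arrays (the text does not say which array library is used). With 8-bit storage the sum never actually reaches 260: it wraps around modulo 256. One option is to widen the type before adding and clip afterwards.

```python
import numpy as np

# The two pixels from the text, stored as 8-bit RGB values.
a = np.array([0, 30, 253], dtype=np.uint8)
b = np.array([128, 230, 8], dtype=np.uint8)

# Adding uint8 arrays wraps around modulo 256 instead of raising an error.
wrapped = a + b

# Widening to a larger integer type first gives the true sum ...
true_sum = a.astype(np.int16) + b.astype(np.int16)

# ... which can then be clipped back into the valid 0..255 range.
clipped = np.clip(true_sum, 0, 255).astype(np.uint8)

print(wrapped.tolist())   # [128, 4, 5]
print(true_sum.tolist())  # [128, 260, 261]
print(clipped.tolist())   # [128, 255, 255]
```

Clipping keeps the values legal but saturates both overflowing channels at 255, which is one way the burnt look can arise.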

There are two simple ideas we could use to stabilize our values. First, we could scale down the initial values in the image by a given amount and apply operations to the scaled values, which gives us more headroom before a value goes out of bounds. In effect we use the image at a reduced opacity level, and we might have to apply a floor function to keep the values integers. If we have the image as a matrix, we could divide this matrix by the number of layers we plan to stack in our final image. Broadcasting then ensures that the R, G and B values of all pixels are scaled down accordingly in a single operation. If we plan to use ten layers, the opacity of each layer would be 0.1. And if we later chose to add these layers back together, we would return to our original image, up to any rounding introduced by the floor.
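The scaling step might look like the sketch below, again assuming the image is a NumPy array of shape (height, width, 3). The helper names `split_into_layers` and `merge_layers` are hypothetical and chosen only for illustration.

```python
import numpy as np

def split_into_layers(image, n_layers):
    """Return n_layers identical layers, each holding image / n_layers (floored)."""
    # Broadcasting divides every R, G and B value by the scalar in one operation.
    scaled = np.floor(image / n_layers).astype(np.uint8)
    return [scaled.copy() for _ in range(n_layers)]

def merge_layers(layers):
    """Add the layers back together, clipping to the valid 0..255 range."""
    total = np.sum(np.stack(layers), axis=0, dtype=np.uint16)
    return np.clip(total, 0, 255).astype(np.uint8)

# A small synthetic 4x4 RGB image with values 0, 5, 10, ..., 235.
image = (np.arange(48, dtype=np.uint8) * 5).reshape(4, 4, 3)

layers = split_into_layers(image, n_layers=10)
merged = merge_layers(layers)

# Flooring loses at most n_layers - 1 per channel, so the merged image is
# close to, but not always identical to, the original.
max_loss = int((image.astype(np.int16) - merged.astype(np.int16)).max())
```

In a real pipeline each layer would be filtered differently before merging. Here the layers are left untouched, purely to show the round trip.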

The second idea, which is optional, is to blur a single layer and use that as the starting point for further filtering. This has a positive effect: it reduces the tendency of filters that come later in the pipeline to produce burnt pixels. The downside is that the image will look slightly blurry compared to the original. This initial blur has been used extensively to create the images on this page.
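An initial blur could be applied with Pillow as in this sketch. The text does not name the blur it uses, so `GaussianBlur` and the radius of 2 are assumptions, and the test image is synthetic.

```python
from PIL import Image, ImageFilter

# Synthetic test image: a white square on a black background.
image = Image.new("RGB", (32, 32), "black")
image.paste((255, 255, 255), (8, 8, 24, 24))

# Gaussian blur is an assumption here; the text does not name the blur used.
blurred = image.filter(ImageFilter.GaussianBlur(radius=2))

# A pixel on the edge of the square is now somewhere between black and white,
# so later filters start from softer transitions.
edge_pixel = blurred.getpixel((8, 16))
```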

We could combine different types of filters with different sizes. On each iteration we could vary the filter size, similarly to how the size of the convolution kernel is adjusted in a neural network. Each kernel must have an odd size. If we start from an odd size and add an even number on each iteration, the size stays odd. As the kernel size grows, each output pixel summarizes a larger region of the image, so we work at a coarser scale, and processing time usually increases.
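The odd-size rule can be sketched as follows. The helper `odd_kernel_sizes` is hypothetical, and `MedianFilter` stands in as one example of a Pillow filter that takes an odd size. It is not necessarily the filter used for the images on this page.

```python
from PIL import Image, ImageFilter

def odd_kernel_sizes(start=3, step=2, count=5):
    """Return `count` odd kernel sizes: start, start + step, start + 2*step, ..."""
    if start % 2 == 0:
        raise ValueError("start must be odd")
    if step % 2 != 0:
        raise ValueError("step must be even so every size stays odd")
    return [start + i * step for i in range(count)]

sizes = odd_kernel_sizes()  # [3, 5, 7, 9, 11]

# Synthetic grayscale image: a white square on a black background.
image = Image.new("L", (32, 32), 0)
image.paste(255, (8, 8, 24, 24))

# One filtered result per kernel size.
results = [image.filter(ImageFilter.MedianFilter(size=k)) for k in sizes]
```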

We could create filter pipelines, where the output of each filter serves as the input for the next. The order in which the filters are applied matters, and swapping the position of two filters will often give different results. Neural networks may use max pooling after a convolution layer, which serves to stabilize the output before it reaches the next layer. PIL offers a variety of filters, some of which could possibly be used for the same purpose. We could repeat a sequence of layers multiple times, or we could split the pipeline into independent sequences of filters and merge their results at the end (on a single layer).
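A pipeline and the effect of filter order can be illustrated with a short sketch. The `run_pipeline` helper is hypothetical. `MaxFilter` and `MinFilter` are chosen only because their order dependence is easy to see. `MaxFilter` loosely resembles max pooling in that it keeps the largest value in each window, though it does not reduce the image size.

```python
from PIL import Image, ImageFilter

def run_pipeline(image, filters):
    """Apply each filter in turn, feeding every output into the next filter."""
    for f in filters:
        image = image.filter(f)
    return image

# Synthetic grayscale image: a single white pixel on a black background.
image = Image.new("L", (9, 9), 0)
image.putpixel((4, 4), 255)

grow = ImageFilter.MaxFilter(3)    # keeps the brightest value in each 3x3 window
shrink = ImageFilter.MinFilter(3)  # keeps the darkest value in each 3x3 window

grow_then_shrink = run_pipeline(image, [grow, shrink])
shrink_then_grow = run_pipeline(image, [shrink, grow])

# Two independent branches can be merged onto a single layer at the end.
merged = Image.blend(grow_then_shrink, shrink_then_grow, alpha=0.5)

print(grow_then_shrink.getpixel((4, 4)))  # 255: the white pixel survives
print(shrink_then_grow.getpixel((4, 4)))  # 0: the white pixel is removed first
```

The same two filters give opposite results at the center pixel depending only on their order.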

There are many ways in which we can make the computation more complex, but sometimes the gains in doing so are small. Below you can see some results from using various filters and kernel sizes on a sample image. In all cases ten layers have been used. Using more produced washed-out results and needlessly increased the processing time.

As these examples show, through the combination of many filters we can achieve some very interesting effects. The only limitation is how many filters we can choose from. But in case we need more, we can add our own kernels to the predefined ones.
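A custom kernel can be defined with Pillow's `ImageFilter.Kernel`, which accepts 3x3 or 5x5 kernels. The sharpening weights below are illustrative only and are not taken from the examples on this page.

```python
from PIL import Image, ImageFilter

# A 3x3 sharpening kernel. The weights are illustrative and sum to 1,
# so flat regions keep their brightness.
sharpen = ImageFilter.Kernel(
    size=(3, 3),
    kernel=[ 0, -1,  0,
            -1,  5, -1,
             0, -1,  0],
    scale=1,
)

# On a uniformly gray image the kernel should change nothing.
flat = Image.new("L", (8, 8), 100)
result = flat.filter(sharpen)
```

A kernel built this way is passed to `Image.filter` like any predefined filter, so it can take any position in a pipeline.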