A list of interesting papers

Whether we realize it or not, we are all researchers. Any time we have a problem that needs fixing, we check whether someone else has already fixed it before us. This allows us to avoid spending too much time on the small things and the things that aren't innovative, which would drain our energy if we couldn't find a more constructive way of working.

A researcher is never happy with the current state of results. There is always more to be discovered, fixed, improved and tested. Trying to quantify the unmeasurable is always important, since we know that what can be measured can be improved. Anything that has a number attached to it becomes a reference system for feature comparisons. Better is rarely better in absolute terms; in many cases we are not even aware of the tradeoffs we are making. This is why we often try to work in relative terms.

No one is error-free, and researchers are no exception. Just because we are truth-seekers doesn't mean that we always find the truth. In many cases (sometimes years later), new bright minds discover previously overlooked errors and in doing so advance the field for everyone.

As web designers (and researchers), we collect various bits of scattered information from many websites. It is often impossible to make sense of it immediately. Fragmented data can lead to fragmented thoughts, which isn't helpful while working. The breadth and depth of the information are two criteria to consider when we choose the things we would like to think about. Reading books is said to expand our worldview, so this can be considered the breadth dimension, although many books go into depth too; scientific publications are usually concentrated on a single topic that is then explored in depth. Balancing the two doesn't seem like a bad idea.

The papers we read give us new ideas on how to approach our existing problems. They are usually a combined effort, written by teams of people collaborating across time and space. This has the potential to introduce us to many different viewpoints and ways of thinking, to many ideas that are context-specific and sometimes even conflicting. Reading papers could help us develop an intuition for how something could work. But reading alone isn't enough to increase our understanding.

It is not always easy to see how to proceed after reading a book or a paper. Should we immediately read another one, or leave some time to apply what we read and test our understanding of the topic? How can we decide which route to take next, given the countless references competing for our attention? Some people say that making the wrong decisions here may cancel out our previous efforts.

A publication that has a lot of references may hide a lot of value at the end, especially when we are unlikely to pay attention to each line. Even when topics seem interconnected (as in a graph), they usually aren't the very same thing. This means that in order to explore more, we may need to jump from one thought to another, which is inefficient. Relevance, or how closely the content relates to a given keyword, isn't always obvious, especially in the offline world. And the more resources are listed, the harder it becomes to choose one.

We sometimes forget that instead of studying the whole topic, we can also pick a keyword and find everything about it. The latter is also more manageable. This means that we intentionally choose to study the contexts in which something is happening (in order to understand our problems and which of them are related) and not the solutions and best practices that were given to us from the start. After all, we may not understand a solution until we have a firm grasp of the problem.

The same keyword found in the titles of many different publications may say something about them, even when the connection seems distant at first. If we are studying a given method or technique, using its name as a keyword in our search allows us to understand when it can be applied, not whether it is better or worse than the existing alternatives. Nothing is truly better or worse, but only made so in a context. By exploring various contexts through the keywords that surround them, we enhance our understanding of the world.

That said, it is not hard to create something simple, like a mechanism for searching in the titles of scientific publications. Potentially interesting papers are often highly cited and therefore easy to find. Showing titles close to each other could help us find new links between them. While it is true that no such database can be complete, we can try to store a subset of the content that has been well received.
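A minimal sketch of such a title search could look like the following. It assumes the titles are already held in a plain Python list; the titles shown and the function name `search_titles` are hypothetical placeholders for illustration, not entries from the actual database.

```python
import re

# Hypothetical placeholder titles; a real collection would be loaded from storage.
TITLES = [
    "A fast algorithm for sorting large data sets",
    "Learning linear models using sparse data",
    "Algorithms for graph search",
]


def search_titles(titles, keyword):
    """Return the titles containing `keyword` as a whole word, ignoring case."""
    # re.escape keeps characters such as "+" or "." in the keyword literal;
    # the \b anchors prevent "algorithm" from matching "algorithms".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [title for title in titles if pattern.search(title)]


if __name__ == "__main__":
    # Matching titles are printed one below the other, so they can be compared.
    for title in search_titles(TITLES, "data"):
        print(title)
```

Matching on whole words is a design choice here: it keeps the singular and plural forms of a keyword apart, so each can be explored as its own context.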

It is often interesting to see which keywords are more frequent, as they can tell us something about the nature of the material. Here is a short list of the most common ones, ordered by the number of their occurrences.

Overview of the most common keywords obtained from the database of research papers

What we see here is that the word "algorithm" seems to appear more often than "algorithms". Working on a single algorithm seems to be hard enough, yet having an overview of the many related ones allows a good exploration of a particular domain. If we just took the combined number of occurrences of the two words and plotted the result in the first bar, we could miss a small detail like this. The word "using" is the second most common one, which is a bit surprising. It probably comes from the desire to do practical things that are valuable to other people, things that they can use, re-use and reproduce. If any "product" derives its value from the extent to which it is used, this only makes sense.
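As a rough sketch, word frequencies like these could be computed from a list of titles as shown below. The titles are hypothetical placeholders, the function name `keyword_counts` and the optional `stop_words` parameter are assumptions made for illustration, and singular and plural forms are deliberately counted separately so that a difference like "algorithm" versus "algorithms" stays visible.

```python
import re
from collections import Counter

# Hypothetical placeholder titles, not taken from the actual database.
TITLES = [
    "A fast algorithm for sorting large data sets",
    "Learning linear models using sparse data",
    "Algorithms for graph search",
]


def keyword_counts(titles, stop_words=frozenset()):
    """Count how often each lowercase word occurs across all titles."""
    counts = Counter()
    for title in titles:
        # Split on anything that is not a letter; no stemming is applied,
        # so "algorithm" and "algorithms" remain distinct keywords.
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts


if __name__ == "__main__":
    counts = keyword_counts(TITLES, stop_words={"a", "for"})
    # List the keywords from most to least frequent.
    for word, count in counts.most_common():
        print(word, count)
```

Merging the two forms would only require mapping each word to a common stem before counting, at the cost of hiding the detail discussed above.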

The keyword "data" also appears disproportionately often, probably because data allows us to make sense of the things around us, leading to a better understanding of our work. Data commonly fuels algorithms, and knowing more about its properties and patterns allows us to adjust them and write more efficient ones, since they would be tailored to a concrete case rather than used as a general-purpose tool.

The keyword "learning" is also related to data (or data mining), but here the focus is probably more on approaches than on concrete artefacts. Everyone (both individuals and business entities) needs to be constantly learning and evolving. The speed at which this happens can be seen as a source of competitive advantage. Learning constantly affects our decisions; stock markets are a good illustration of this.

Starting from the word "linear", we see that others appear much less frequently, yet they give a nice overview of which things seem to be of interest to researchers. I hope that this information will help you in your own research and experiments.

bit.ly/15hsvQ1