Dissecting URLs

Every resource on our websites has a corresponding URL through which it can be accessed. This allows us to guide users from one place to another, based on content relevance, interests or intent. The path they take through our website can itself shape their experience, which means that programmatically updating URLs while users interact with the site can add real value. Embedding the functionality that supports this is simple and immediately useful for the end user. When writing code, I always ask myself whether it has such immediate usefulness. We can very easily fall into the trap of writing thousands of lines of code that aren't directly related to improving the usability of our site.

But it's often through our interactions with a site that we can see how its underlying code works. It's not enough for the code to become apparent; it must be transparent. In other words, the site must suggest a robustness that reassures users that nothing harmful will be loaded in the background while they browse. Even a slight delay in a site's reaction time can make users suspect that it hides something or collects data without permission. Of course, not everything can be made transparent, because revealing how important algorithms work can threaten the integrity of the site. But in the majority of cases, when the code isn't directly related to the core of a business model, it's probably better to show how it works. CSS, for instance, rarely hides anything: selectors either match or they don't, and the result is immediately visible.

URLs also contribute to transparency. Hovering over a link reveals where it will lead when clicked, and its anchor text should describe in clear, natural language (preferably the user's own) what the content behind it is about. The high interconnectedness of resources means that websites must actively manage these transitions and make them as seamless as possible.

One simple way to navigate between pages is to change a value in the query string of a URL (the part that starts with the '?' character) and then reload the URL with the new value. Below are three simple variants of how this might be realized in JavaScript; you can choose one and attach it to an event of your choice. To explore which variant could be most effective, I wrapped them in functions that execute only once, but iterate as many times as needed to see a noticeable difference in execution times in the Firebug profiler. Generally, when choosing among options, we should always strive to pick the one that scales best, i.e. the one for which a much larger input has the smallest effect on execution time. Sites like jsperf.com allow us to easily evaluate different alternatives and see how different browsers respond to them.

Let's look at the code:

var iterations = 10000;
var url = 'http://www.dummeraugust.com/main/content/blog/posts.php?pid=96';

function updateURL1(url) {
    for ( var i = 0; i < iterations; i++ ) {
        var split_href = url.split('?'),
            base = split_href[0],
            query_string = split_href[1],
            split_query_string = query_string.split('='),
            query_string_base = split_query_string[0],
            pid = split_query_string[1];
        
        // Do something with PID
        // pid--;
        
        // Change the URL with the new PID when needed
        // 2 property lookups, 4 concatenations and 1 assignment -> slow
        // window.location.href = base + '?' + query_string_base + '=' + pid;
    }
}

function updateURL2(url) {
    for ( var i = 0; i < iterations; i++ ) {
        var base = url.slice(0,url.indexOf('=')+1),
            pid = url.split('=')[1];
        
        // pid--;
        // location = base + pid;
    }
}

function updateURL3(url) {
    for ( var i = 0; i < iterations; i++ ) {
        var idx = url.lastIndexOf('=') + 1,
            base = url.substring(0,idx),
            pid = url.substring(idx);
            
        // pid--;
        // location = base + pid;
    }
}

// Call functions
updateURL1(url);
updateURL2(url);
updateURL3(url);

In all three cases we want to take the URL of this page, get the pid value 96 from the query string, decrement it and reload the page. If we had a URL with more key/value parameters (separated by the ampersand '&'), we would need more code to distinguish between them (see the sketch after the results below). As you can see, the actual redirects aren't performed, to keep the browser from crashing during the tests; but we must keep in mind that the decrements and redirects would normally add to the execution time. Variants 2 and 3 avoid unnecessary property lookups and concatenations and thus run even faster. In effect, we have just measured how fast we can access the pid. Here are the results from the computation:

[Figure: updateURL function execution times]

Variant 3 performs almost twice as fast as variant 2, possibly because variant 2 also calls split, which creates an intermediate array, whereas variant 3 works only with substrings. The third variant has one more variable, but it no longer needs to access any array indices. Variant 1 is the slowest, but it can sometimes be more flexible, depending on the complexity of the URL.
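
To give an idea of the extra work needed for the multi-parameter case mentioned above, here is a minimal sketch. The function names parseQueryString and buildQueryString, and the extra lang parameter, are illustrative assumptions, not part of the measured variants:

var sampleURL = 'http://www.dummeraugust.com/main/content/blog/posts.php?pid=96&lang=en';

function parseQueryString(url) {
    // Everything after '?' (empty string if there is no query string)
    var query = url.split('?')[1] || '',
        pairs = query.split('&'),
        params = {},
        i, pair;
    for (i = 0; i < pairs.length; i++) {
        pair = pairs[i].split('=');
        params[decodeURIComponent(pair[0])] = decodeURIComponent(pair[1] || '');
    }
    return params;
}

function buildQueryString(params) {
    var parts = [], key;
    for (key in params) {
        if (params.hasOwnProperty(key)) {
            parts.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
        }
    }
    return parts.join('&');
}

// Decrement pid while leaving the other parameters untouched
var params = parseQueryString(sampleURL);
params.pid--;
// location = sampleURL.split('?')[0] + '?' + buildQueryString(params);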

PHP also gives us the right instruments for working with URLs. We can use functions like dirname, basename and pathinfo, and variables like $_SERVER['PHP_SELF'], $_SERVER['QUERY_STRING'] and others, to extract and manipulate only the parts of the URL we need. When we are ready, we use the header function to change the location to point to the new URL. For this to work, the redirect must be done in a way that isn't cyclic. And because the query string can be manipulated by the user, we need to escape input (the htmlentities function can help here) and make sure that it is of the correct type; the functions is_bool, is_int etc., as well as the ctype family of functions, can sometimes be handy.
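
As a minimal sketch of how these pieces fit together, assuming a posts.php page like the one above (the 'prev' trigger parameter is an illustrative assumption; redirecting with a decremented pid on every request would itself be cyclic):

<?php
// Read the pid from the query string
$pid = isset($_GET['pid']) ? $_GET['pid'] : '';

// Make sure the input is of the expected type before using it
if (!ctype_digit($pid)) {
    die('Invalid pid');
}

// Redirect only in response to an explicit request for the previous post,
// so the redirect cannot become cyclic
if (isset($_GET['prev'])) {
    $self = basename($_SERVER['PHP_SELF']); // e.g. 'posts.php'
    header('Location: ' . $self . '?pid=' . ((int)$pid - 1));
    exit;
}

// Escape the value before echoing it back into the page
echo 'Post ' . htmlentities($pid, ENT_QUOTES, 'UTF-8');
?>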

Even the simplest things can be realized in a variety of different ways, so it always pays to keep looking for better ones. More importantly, this very variety almost guarantees that we've chosen a suboptimal way, especially when the quality of the entire solution is a product of the quality of its individual components. Improvements to the algorithms we use, especially finding ways to reduce their complexity and to make them understandable for everyone around us (sometimes the best optimization), can make the difference between slow and fast apps and websites. It's a mistake to assume that because a famous framework uses a particular implementation, it must be the best possible one. Every decision was made in a specific context and is relative only to that context; without fully understanding that context and living in it, we can't know whether the decision is "right" or "wrong".

bit.ly/10YNcKd