Tuesday, June 22, 2010

A Simple Testing Library for C

To prepare for a recent postgraduate computer science class, I wrote a small library in C which aids in the creation of lightweight, unit-test-like programs. The code can be found here, and using it looks a bit like this:
#include "asserts.h"

int main(void)
{
    c7e3_assert(1 == 1, "1 should equal 1");
    c7e3_assert(2 == 2, "2 should equal 2");
    c7e3_report();
    return 0;
}
The design follows the KISS principle, and I think it is a nice fit for the simplicity of C. While there is not much to it, I wrote numerous tests using it over the past couple of months, and all of that testing certainly paid off.

Friday, June 18, 2010

JavaScript Tricks to Speed up Your Site

One of the techniques which makes the web so powerful is the ability to load code, images, and other resources from all over the Internet. So often though, the process of loading these resources and ensuring that all of the required pieces are in place leads to a slow experience for visitors. With the ability to include so much code from across the web, visiting a site could potentially be like installing a new program when it comes to the amount of stuff that needs to be downloaded.

With this in mind, there are a couple of nifty tricks that can help make your app more responsive. I've written up an example site and testing server that show some ideas for speeding up the user experience when you need to wait for the DOM to load or for additional JavaScript to be fetched and run. We'll begin with document operations.

Often the JavaScript running on a page manipulates the DOM, using document.getElementById here and document.createElement there. In order to ensure that all the pieces of the page are in place, web programmers often take advantage of the onload callback. It might be used like this:
<body onload="runMyCodeNow()">
Using this technique ensures that everything your code might want to read or write on the page is in place: all images have been downloaded, CSS rules have been applied, and the layout is complete. However, this comes at a cost: your code doesn't run until every last resource has been fetched and rendered, even the little footer at the bottom of the page that your code doesn't care about.

There is another way: we could request that resources be loaded in parallel and start executing our code before the page is fully loaded. Chances are, your code doesn't need the complete page to be loaded before it starts running, and running before onload will reduce the delay for your users. Before I dive into how this can be accomplished, let's look at an example which uses the old-fashioned way.

Let's say you have a web page: a little HTML which includes five JavaScript files. One may be a library used to do animation, another may load the user's data. In any case, all of these files need to be loaded, and some of them depend on others.

The biggest bottleneck for your users is almost certainly waiting for all of these resources to load. Network latency is a killer, and something that is often overlooked during development. To create a simulated network environment which can give a more realistic (or even pessimistic) view of the cost of loading these resources, I wrote a "slow server" which can delay its response to each file request. Here is the code for my testing server (designed to run on App Engine):
import os
import time

from google.appengine.ext import webapp
from google.appengine.ext.webapp import util


def FilePath(path):
  """Converts the requested path into a local file path."""
  # Strip the leading slash so the path is relative to the 'files' directory.
  return os.path.join(os.path.dirname(__file__), 'files', path[1:])


class SleepyRenderer(webapp.RequestHandler):
  """Serves the requested page with a client-configured delay.

  The delay is given as a URL parameter in hundredths of a second.
  For example, 200 means wait 2 seconds before responding.

  Example request:
    http://localhost:8080/hi.html?delay=300&contenttype=text/html
  """

  def get(self):
    path = self.request.path
    delay = self.request.get('delay')
    content_type = self.request.get('contenttype') or 'text/html'
    if delay:
      # Divide by a float so that delays under one second are not
      # truncated to zero by integer division.
      time.sleep(int(delay) / 100.0)
    http_status = 200

    try:
      requested_file = open(FilePath(path))
      try:
        self.response.out.write(requested_file.read())
      finally:
        # Close the file even if reading or writing fails.
        requested_file.close()
    except IOError:
      http_status = 404

    self.response.set_status(http_status)
    self.response.headers['Content-Type'] = content_type


def main():
  application = webapp.WSGIApplication([('/.*', SleepyRenderer)],
                                       debug=True)
  util.run_wsgi_app(application)


if __name__ == '__main__':
  main()
With the above code we can introduce a delay on each individual file. To see this in action with our example, here is some HTML which shows the traditional approach: script includes plus an onload callback that runs once everything has loaded.
<html>
  <head>
    <script src="/testa.js?delay=500&contenttype=text/javascript"></script>
    <script src="/testb.js?delay=400&contenttype=text/javascript"></script>
    <script>
      function init() {
        var output = document.getElementById('output');
        output.innerHTML = [
          'a is ' + a,
          'b is ' + b,
          'c is ' + c,
          'd is ' + d,
          'e is ' + e
        ].join('<br>');
      }
    </script>
    <script src="/testc.js?delay=300&contenttype=text/javascript"></script>
  </head>
  <body onload="init()">
    <script src="/testd.js?delay=200&contenttype=text/javascript"></script>
    <div id="output"></div>
    <script src="/teste.js?delay=100&contenttype=text/javascript"></script>
    <script src="/testa.js?delay=500&contenttype=text/javascript"></script>
  </body>
</html>
With the above, the page takes several seconds to load and when the very last script has loaded, the 'output' div gets its contents. In many cases, the code really doesn't need to wait for all resources to load, only the ones that are necessary for the code to run. In this case, since the information is added to the output div, we need the output div to exist in the DOM, but we may not need the entire page to load.
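
To get a feel for the numbers, the delays in the example page can be added up. The sketch below is not part of the example site; delaySeconds and loadTimeBounds are hypothetical helper names. It assumes the delay parameter is in hundredths of a second, as described in the server's docstring, and that the browser fetches the second copy of testa.js again rather than serving it from cache. It reports two rough bounds: every script fetched one after another, and every script fetched fully in parallel. A real browser will likely land somewhere in between.

```javascript
// Extract the delay (hundredths of a second) from a test URL and return
// it in seconds. Returns 0 when the URL has no delay parameter.
function delaySeconds(url) {
  var match = /[?&]delay=(\d+)/.exec(url);
  return match ? parseInt(match[1], 10) / 100 : 0;
}

// Rough bounds on script loading time, ignoring real network latency.
function loadTimeBounds(urls) {
  var total = 0;
  var longest = 0;
  for (var i = 0; i < urls.length; i++) {
    var seconds = delaySeconds(urls[i]);
    total += seconds;
    longest = Math.max(longest, seconds);
  }
  return {serial: total, parallel: longest};
}

// The six script includes from the example page, in document order.
var scriptUrls = [
  '/testa.js?delay=500&contenttype=text/javascript',
  '/testb.js?delay=400&contenttype=text/javascript',
  '/testc.js?delay=300&contenttype=text/javascript',
  '/testd.js?delay=200&contenttype=text/javascript',
  '/teste.js?delay=100&contenttype=text/javascript',
  '/testa.js?delay=500&contenttype=text/javascript'
];

var bounds = loadTimeBounds(scriptUrls);
console.log('one at a time: ' + bounds.serial + 's');
console.log('fully parallel: ' + bounds.parallel + 's');
```

Under these assumptions the onload callback cannot fire sooner than the slowest single script, and could take as long as the sum of all the delays.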

If you look at this loading process in a profiler you might see something like this:

Now for our first nifty trick. One way to check whether the necessary prerequisites are present is to poll the DOM or the JavaScript environment to see if conditions are right for the code to run. Here is an example of how this code might be rewritten when using some polling helper functions:
<script>
  loader.whenNodePresent('output',
      function() {
        var output = document.getElementById('output');
        loader.whenReady(function() {return window['a'];},
            function() {
              output.innerHTML += 'a is ' + a + '<br>';
            });
        loader.whenReady(function() {return window['b'];},
            function() {
              output.innerHTML += 'b is ' + b + '<br>';
            });
        loader.whenReady(function() {return window['c'];},
            function() {
              output.innerHTML += 'c is ' + c + '<br>';
            });
        loader.whenReady(function() {return window['d'];},
            function() {
              output.innerHTML += 'd is ' + d + '<br>';
            });
        loader.whenReady(function() {return window['e'];},
            function() {
              output.innerHTML += 'e is ' + e + '<br>';
            });
      });
</script>
The code to track the prerequisites and poll is quite simple:
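// Namespace object for the helper functions.
var loader = {};

// Pairs of [testFunction, callback] whose test has not passed yet.
loader.waiting = [];


// Runs callback right away if testFunction passes, otherwise queues the
// pair and schedules a check in 200 milliseconds.
loader.whenReady = function(testFunction, callback) {
  if (testFunction()) {
    callback();
  } else {
    loader.waiting.push([testFunction, callback]);
    window.setTimeout(loader.checkWaiting, 200);
  }
};


// Re-tests every queued pair, runs the callbacks whose tests now pass,
// and schedules another check if anything is still waiting.
loader.checkWaiting = function() {
  var oldWaiting = loader.waiting;
  var numWaiting = oldWaiting.length;
  loader.waiting = [];
  for (var i = 0; i < numWaiting; i++) {
    if (oldWaiting[i][0]()) {
      oldWaiting[i][1]();
    } else {
      loader.waiting.push(oldWaiting[i]);
    }
  }

  if (loader.waiting.length > 0) {
    window.setTimeout(loader.checkWaiting, 200);
  }
};


// Runs callback once an element with the given id exists in the DOM.
loader.whenNodePresent = function(nodeId, callback) {
  loader.whenReady(function () {
    return document.getElementById(nodeId);
  }, callback);
};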
In the above, the whenReady function takes two functions: a test function that returns a truthy or falsy value, and a callback to run once the test function returns a truthy value. If the test doesn't pass on the first call, we check again every 200 milliseconds until it does.
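
Since some scripts depend on others, a callback may need several globals at once. The following is a minimal sketch of one way to express that with whenReady. The allDefined helper and the fakeWindow object are hypothetical names introduced here, not part of the library, and the polling functions are copied in (using the bare setTimeout) only so the sketch runs on its own. The helper tests with typeof rather than truthiness, because a test such as window['a'] would keep waiting forever if a script defined a as 0 or an empty string.

```javascript
// Minimal copy of the polling helpers so this sketch is self-contained.
var loader = {waiting: []};

loader.whenReady = function(testFunction, callback) {
  if (testFunction()) {
    callback();
  } else {
    loader.waiting.push([testFunction, callback]);
    setTimeout(loader.checkWaiting, 200);
  }
};

loader.checkWaiting = function() {
  var oldWaiting = loader.waiting;
  loader.waiting = [];
  for (var i = 0; i < oldWaiting.length; i++) {
    if (oldWaiting[i][0]()) {
      oldWaiting[i][1]();
    } else {
      loader.waiting.push(oldWaiting[i]);
    }
  }
  if (loader.waiting.length > 0) {
    setTimeout(loader.checkWaiting, 200);
  }
};

// Hypothetical helper: builds a test function that passes once every
// named property is defined on scope. In a browser, scope would be window.
loader.allDefined = function(scope, names) {
  return function() {
    for (var i = 0; i < names.length; i++) {
      if (typeof scope[names[i]] === 'undefined') {
        return false;
      }
    }
    return true;
  };
};

// Usage: a plain object stands in for window.
var fakeWindow = {};
var messages = [];
loader.whenReady(loader.allDefined(fakeWindow, ['a', 'b']), function() {
  messages.push('a + b is ' + (fakeWindow.a + fakeWindow.b));
});

// Simulate two scripts finishing. Note that a is 0, which is falsy.
fakeWindow.a = 0;
fakeWindow.b = 2;
loader.checkWaiting();  // Poll by hand instead of waiting for the timer.
console.log(messages);
```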

With these changes, we shave several seconds off of the user-perceived loading time. Specifically, we no longer need to wait for the duplicate load (of the testa script) at the end of the body. The page also appears more responsive because each script's message appears just after that script loads, before the page is complete.

Now that we've seen a way to work around the need for an onload callback, let's look at another place we can tweak the browser's behavior to make a web page more responsive: dynamic script loading.

The most straightforward way to include new code in your page is to use a script tag, something like:
<html>
  <head>
    <script src="some_great_sites_javascript"></script>
    ...
When the browser encounters this script tag, it stops whatever it's doing and fetches that resource. It doesn't do any more rendering or executing of code until the script has been fetched and run. This behavior varies a bit between browsers and is likely an artifact of an old design in which this kind of single-threaded behavior was the only option. Since some sites might depend on this linear behavior to get a script's dependencies in order, this quirk might be with us for a long time. Most of the time, waiting like this is a really silly idea. How often do the scripts that you include depend on one another?

There are a few parts to this trick. The first is to not put all of the script includes in the HTML. Instead, you could have JavaScript add new script elements to the page, which will cause new code to be loaded as needed. In this way, you could load only the resources that are needed at the moment, and some resources might never be requested at all. Including a new script could be done in two ways:
document.write('<script src="somefile.js"></script>');
or
var newScript = document.createElement('script');
newScript.src = 'somefile.js';
document.body.appendChild(newScript);
Each of the above is appropriate in different situations. document.write adds HTML directly into the page at the point where the page is being parsed, so it should only be used for script tag inclusion if the page has not yet finished loading. If the page has already loaded, using document.write to add the script tag will wipe out the existing page contents entirely. I've seen this issue in the wild: if you assume document.write is always safe, you'll be bitten when using it after the page has loaded.

Instead, you can check whether document.body exists. If it does, use document.body.appendChild; if it does not yet exist, use document.write. The code for this loader logic might look something like this:
loader.loadScript = function(url) {
  if (document.body) {
    var newScript = document.createElement('script');
    newScript.type = 'text/javascript';
    newScript.src = url;
    document.body.appendChild(newScript);
  } else {
    document.write('<scr' + 'ipt type="text/javascript" src="' +
        url + '"><\/scr' + 'ipt>');
  }
};
Now we can request that new JavaScript code be loaded on the fly, and it works both before the page has finished loading and after.
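
One way to exercise both branches without a browser is to pass the document in as a parameter. This is only a sketch under that assumption: the doc parameter, the returned branch name, and makeFakeDocument are illustrative additions, not part of the loader above.

```javascript
// Variant of loadScript that takes the document as a parameter (an
// assumption made for this sketch) and reports which branch it used.
function loadScript(doc, url) {
  if (doc.body) {
    // The body exists, so it is safe to append a new script element.
    var newScript = doc.createElement('script');
    newScript.type = 'text/javascript';
    newScript.src = url;
    doc.body.appendChild(newScript);
    return 'appendChild';
  }
  // The body has not been parsed yet, so write the tag into the stream.
  doc.write('<scr' + 'ipt type="text/javascript" src="' +
      url + '"><\/scr' + 'ipt>');
  return 'document.write';
}

// Hypothetical stand-in for a browser document that records what was done.
function makeFakeDocument(hasBody) {
  var doc = {
    appended: [],
    written: [],
    body: null,
    createElement: function(tagName) {
      return {tagName: tagName};
    },
    write: function(html) {
      doc.written.push(html);
    }
  };
  if (hasBody) {
    doc.body = {
      appendChild: function(node) {
        doc.appended.push(node);
      }
    };
  }
  return doc;
}

var stillParsing = makeFakeDocument(false);
var fullyLoaded = makeFakeDocument(true);
var earlyBranch = loadScript(stillParsing, 'somefile.js');
var lateBranch = loadScript(fullyLoaded, 'somefile.js');
console.log(earlyBranch, lateBranch);
```

In a real page the same function could be called with the global document object as its first argument.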

There is one more trick we can add to this loader. Some browsers will interpret the JavaScript in the order in which the scripts were requested, not the order in which they finished loading. That means that a fast-loading script further down the list won't be run until a slower script which appears above it has loaded. One way to defeat this delay is to break the script includes out of linear execution in the JavaScript. If you use setTimeout to introduce a delay in adding the script include to the page, then the code which sets up the script requests can finish quickly and the browser will get back to the script requests later without the same linear constraints. In our code, we wrap the body of loader.loadScript in a short timeout as follows:
loader.loadScript = function(url) {
  window.setTimeout(function() {
    if (document.body) {
      var newScript = document.createElement('script');
      newScript.type = 'text/javascript';
      newScript.src = url;
      document.body.appendChild(newScript);
    } else {
      document.write('<scr' + 'ipt type="text/javascript" src="' +
          url + '"><\/scr' + 'ipt>');
    }
  }, 1);
};
With the above changes in place, our example page from before now loads like this when profiled (note that the messages appear in the order that the scripts were loaded; we don't have to wait for everything before we edit the page):

Through the course of this post, I've written a small library for using these tricks when loading JavaScript dynamically in the page, as well as a server for trying it out. These are available here as open source code. There are some improvements that could be made. Off the top of my head, the checkWaiting function could eventually time out if a condition continues to not be met. Also, the loader could do more to check whether a requested script has already been loaded. Any more ideas?
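
As a rough sketch of those two improvements, and not the library's actual code: createPoller gives up on a condition after a fixed number of checks, and createScriptRegistry skips URLs that were already requested. All of the names here (createPoller, createScriptRegistry, maxAttempts, onTimeout, fakeSchedule) are hypothetical. The scheduler is passed in as a parameter so the example can be stepped through by hand; in a page it could be a small wrapper around window.setTimeout.

```javascript
// Poller that stops re-testing a condition after maxAttempts checks.
// schedule(fn, ms) is expected to behave like setTimeout.
function createPoller(schedule, intervalMs, maxAttempts) {
  var waiting = [];
  var timerPending = false;

  // Schedule a check only if something is waiting and none is pending.
  function arm() {
    if (waiting.length > 0 && !timerPending) {
      timerPending = true;
      schedule(check, intervalMs);
    }
  }

  function check() {
    timerPending = false;
    var oldWaiting = waiting;
    waiting = [];
    for (var i = 0; i < oldWaiting.length; i++) {
      var entry = oldWaiting[i];
      if (entry.test()) {
        entry.callback();
      } else if (++entry.attempts >= maxAttempts) {
        // Give up on this condition and report it if a handler was given.
        if (entry.onTimeout) {
          entry.onTimeout();
        }
      } else {
        waiting.push(entry);
      }
    }
    arm();
  }

  return {
    whenReady: function(test, callback, onTimeout) {
      if (test()) {
        callback();
        return;
      }
      waiting.push({test: test, callback: callback,
                    onTimeout: onTimeout, attempts: 0});
      arm();
    }
  };
}

// Wraps a loadScript-style function so each URL is requested only once.
function createScriptRegistry(loadScript) {
  var requested = {};
  return function(url) {
    if (Object.prototype.hasOwnProperty.call(requested, url)) {
      return false;  // Already requested, so skip it.
    }
    requested[url] = true;
    loadScript(url);
    return true;
  };
}

// Usage with a fake scheduler, so each polling step is run by hand.
var pendingTimers = [];
function fakeSchedule(fn) {
  pendingTimers.push(fn);
}
function runTimers() {
  var timers = pendingTimers;
  pendingTimers = [];
  for (var i = 0; i < timers.length; i++) {
    timers[i]();
  }
}

var log = [];
var poller = createPoller(fakeSchedule, 200, 3);
poller.whenReady(function() { return false; },
    function() { log.push('ready'); },
    function() { log.push('gave up'); });
runTimers();
runTimers();
runTimers();

var fetched = [];
var loadOnce = createScriptRegistry(function(url) { fetched.push(url); });
loadOnce('/testa.js');
loadOnce('/testb.js');
loadOnce('/testa.js');
console.log(log, fetched);
```

Counting attempts rather than elapsed time keeps the sketch simple; a deadline based on timestamps would work as well.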