Wednesday, April 30, 2008

Creating and Using .so Files With gcc

And what is an .so file you might ask? Those from a Windows background might know them as .dll files, but that still doesn't answer the question.

Normally, when you compile a program, the system takes all of the pieces of code that you've written, converts them into something the computer can understand (machine code) and glues them together. You might compare the process to building a pyramid. All of the stones that you've sculpted are joined together and stacked up. Just like with a pyramid, if you want to change one of the stones, everything that sits on top of it needs to be adjusted. With a C program, making changes in a piece of code means lots of pieces need to be recompiled and reglued together.

If you use dynamic linking, shared libraries, whatever you'd like to call them, things work a bit differently. With dynamic linking, the pieces are not all glued together or stacked up, instead they are plugged in when they are needed. As a result, changes to a piece of code don't require that lots of other pieces get reshaped (recompiled) to fit. With dynamic linking, it is possible to change a piece of code while the program is running. Funny, that sounds a bit like a scripting language.

Here's an example. Let's say I wrote a library with a function called SpecialPrint. (It's like printf but different!) Here is the code for speciallib.c:
#include<stdio.h>

void SpecialPrint(char* payload) {
printf("Special Print:%s\n", payload);
}
Now I write a program that uses this library. Here's the normal, non-dynamic, static linking, built like a pyramid way of using the library.
#include<stdio.h>
#include"speciallib.h"

int main() {
printf("normal printf argument\n");
SpecialPrint("special print's argument");
}
With the above code, I compile the SpecialPrint library with my main code and link them together, producing a single executable file. However, I can use the same speciallib file with code that loads the library dynamically while the program is running. The code might look something like this:
#include<stdio.h>
#include<dlfcn.h>

void ShowError() {
char *dlError = dlerror();
if(dlError) printf("Error with dl function: %s\n", dlError);
}

int main() {
void *SharedObjectFile;
void (*SpecialPrintFunction)(char*);

// Load the shared libary;
SharedObjectFile = dlopen("./speciallib.so", RTLD_LAZY);
ShowError();
// Obtain the address of a function in the shared library.
SpecialPrintFunction = dlsym(SharedObjectFile, "SpecialPrint");
ShowError();

printf("normal printf argument\n");
// Use the dynamically loaded function.
(*SpecialPrintFunction)("special print's argument");

dlclose(SharedObjectFile);
ShowError();
}
In the above, the show error function is optional, I added it so that any errors that occur would be displayed. When you compile the above code, you'll need to make the speciallib into an .so file and make sure that the speciallib's directory (the current directory in my example) is listed as one of the places that we should look for shared object files. Here's what the steps to compile look like:
export LD_LIBRARY_PATH=`pwd`
gcc -c -fpic speciallib.c
gcc -shared -lc -o speciallib.so speciallib.o
gcc main.c -ldl
For an explanation of what the above compiler options mean, and further explanation on .so files, see these two IBM articles on using shared object files on Solaris and linux.

Learning about .so files has been very interesting to me, because they bring a new level of flexibility to a traditionally rigid environment like C.

Tuesday, April 22, 2008

Early vs Late Binding

I've been thinking recently about programming languages (surprised?), specifically about the things that make them different. One of the really nice things about C, is that it compiles into machine code which tends to run lean and mean. By that I mean it is blazing fast and doesn't take up much memory. On the other hand, programming in Python and JavaScript has really been growing on me. There is so much flexibility to create elegant solutions quickly and without rewriting lots of existing code. In fact, I'd say greater ability to reuse existing code is a natural outgrowth of programming language flexibility.

So where does this flexibility come from? One place I tend to notice it most, is in the ability to give an existing function a new body, in other words, you can plug in different behavior in place of the default.

Here's a simple example to illustrate the idea. Let's say that we created a simple checkout register which takes a receipt, adds the sales tax, and spits out the grand total. Here's our code foundation in both Python and JavaScript (these two examples do essentially the same thing):

Python:
def CalculateTax(amount):
return amount * 0.18

class Receipt(object):

def __init__(self, items=None):
self.items = items or []

def CalculateTotal(self):
return sum([item + CalculateTax(item) for item in self.items])
JavaScript:
function calculateTax(amount) {
return amount * 0.18;
}

function Receipt(items) {
if (items) {
this.items = items;
} else {
this.items = new Array();
}
}

Receipt.prototype.calculateTotal = function() {
var total = 0;
for (var i = 0; i < this.items.length; i++) {
total += this.items[i] + calculateTax(this.items[i]);
}
return total;
}
To use the above code, you might write something like this:

Python:
my_order = Receipt([5.50, 10, 7.89])
print my_order.CalculateTotal()
JavaScript:
var myOrder = new Receipt([5.50, 10, 7.89]);
alert(myOrder.calculateTotal());
Now let's say someone asks you to change the tax rate which is used when calculating the total. Here's the catch, you're not allowed to change the existing code. It turns out this is actually really easy. You can define a new function, then make an existing function name point to the new function. Here's an example of how to inject our new code:

Python:
def CalculateHigherTax(amount):
return amount * 0.25

CalculateTax = CalculateHigherTax

print my_order.CalculateTotal()
JavaScript:
function calculateHigherTax(amount) {
return amount * 0.25;
}

calculateTax = calculateHigherTax;

alert(myOrder.calculateTotal());
After adding the above code to the foundation we started with, you will notice that the calculate total method now uses calculate-higher-tax instead of the original function, even though you are calling the same method on the same object as before. Congratulations, you have just witnessed late binding in action.

So what is late binding? The idea is that the computer decides which code should be executed while the program is running. This seems normal in scripting languages, but compiled languages often use this too (I'm looking at you Java and C++). For example, overloaded methods and polymorphism take advantage of late binding. With late binding you can change the meaning of an identifier (for example, change the behavior when you call a specific function) at just about any time.

Now lets take a look at a language which uses early binding. C is a great example. With early binding, the meaning of things like function names are locked in when the code is compiled. There is no dynamic lookup while the program is running to see which code should be executed, instead the address of the desired code is embedded directly into the binary machine code.

Here is how the same calculate-total example might look in C:
#include<stdio.h>

float CalculateTax(float amount) {
return amount * 0.18;
}

typedef struct {
float* items;
int num_items;
} Receipt;

float CalculateTotal(Receipt this_order) {
int i;
float total = 0;
for(i = 0; i < this_order.num_items; i++) {
total += this_order.items[i] + CalculateTax(this_order.items[i]);
}
return total;
}

int main(void) {
Receipt my_order;
float my_items[3] = {5.50, 10, 7.89};
my_order.items = my_items;
my_order.num_items = 3;
printf("%f\n", CalculateTotal(my_order));
}
If you try to set CalculateTax to a new function definition, you will get an error at compile time because a function cannot be changed once it is bound. Early binding tends to produce more efficient programs. However, if you want to, you can still use the flexiblity available in late binding in C.

Using function pointers, you can store the address of the code that you want to be executed, and change the address while the program is running. We can achieve the same late binding effects that I've illustrated in Python and JavaScript by making some small changes to the C code (marked in bold below). Declare a function pointer named TaxCalculator which will store the address of the desired calculate-tax function, then change CalculateTotal so that it uses the TaxCalculator instead of directly calling a calculate-tax function.
#include<stdio.h>

float CalculateTax(float amount) {
return amount * 0.18;
}

float CalculateHigherTax(float amount) {
return amount * 0.25;
}


typedef struct {
float* items;
int num_items;
} Receipt;

float (*TaxCalculator)(float) = &CalculateTax;

float CalculateTotal(Receipt this_order) {
int i;
float total = 0;
for(i = 0; i < this_order.num_items; i++) {
total += this_order.items[i] + (*TaxCalculator)(this_order.items[i]);
}
return total;
}

int main(void) {
Receipt my_order;
float my_items[3] = {5.50, 10, 7.89};
my_order.items = my_items;
my_order.num_items = 3;
printf("%f\n", CalculateTotal(my_order));
TaxCalculator = &CalculateHigherTax;
printf("%f\n", CalculateTotal(my_order));

}
There you have it!

Here's another way to think about this comparison. In high level languages which don't expose pointers, functions, variables, and other identifiers actually act like pointers.

Monday, April 21, 2008

Ubuntu Hardy Heron

Last week, I downloaded the beta release of Ubuntu 8.04 (Hardy Heron) to give it a try. I've been meaning to migrate our last Windows XP machine over to Linux for some time now (the other three computers I use regularly are Linux machines), but I've been reluctant to backup, repartition, and take the plunge. It seems like this is a common feeling among computer owners, but I think the Ubuntu community may have found an effective solution.

And the name of this new innovation: Wubi. Pop in the Ubuntu CD while running in Windows, and an auto-run installer opens which allows you to install Linux alongside Windows. When you reboot your computer, you see a menu of which operating system to boot into: Windows or Ubuntu. This means you can try out Ubuntu on your computer with zero risk to your existing files. In fact, you can access the files on your Windows installation from within Ubuntu. And if you decide Ubuntu is not your you, you can uninstall it as you would any other Windows program. Pretty slick.

Managing in the installation is just the beginning of the improvements the team has made. After I installed the latest version and booted into Ubuntu, it told me that there were propriety drivers for my NVIDIA graphics card and asked if I wanted to install them. I clicked the button, the download started, and I was up and running with 3D accelerated graphical desktop effects. I think I could sit there opening and closing windows all day. I'm looking forward to the upcoming official release.

Monday, April 14, 2008

A Musical Interlude... and back to Programming

I've been listening to Daft Punk and Justice quite a bit recently. Apparently I'm on a techno kick again. I've never found any electronic music that I've enjoyed as much as Joy Electric's The White Songbook. The purity of the tones and style has made it one of my all time favorite albums.

This got me thinking about a project which I thought of years ago, started, then abandoned. It was a music synthesizer/sequencer which you would program, by well, programming. I mean that the music would be controlled exclusively through a programming language. This would alter the creative process in several ways. Most music sequencers are graphical and allow you to lay out musical patterns in sequence. Writing a program is extremely non-linear, with classes, functions, and variables being defined in the code long before they are used. In this hypothetical sythesizer language. a composition might look something like this:

sequence "intro":
playSample("beat", start=0:32.1, end=0:33.5, beats=[1,3,9,11,15])
playSample("moog", beats=[5,7,11])
shiftPitch(start=A4, end=C4, duration=bars(8))

sequence "solo":
playSample("guitarRiff", start=1:15.3, end=2:09.0)

tempo 150 BPM
play("intro", now)
play("solo", end("intro"))
play("solo" now()+bars(5))

In the above example, the first play statement will be executed, then the third play statement (5 bars into the 8 bar intro), then the second solo will play again, probably before the first solo finishes. There are other interesting features in the pseudo-code above, but the fact that these sequences are played out of order was what I really want to highlight.

I've done live coding at several events over the years and I tend to have fun with it. It doesn't always go exactly as planned, but that is the whole idea. Live coding turns programming into a performance piece. I imagine there is a niche group of people who could really get into the live coding music scene.