Friday, August 01, 2008

Speedy Dynamic Member Lookup in C (part 1)

If my senior class had voted for a "Most likely to try to invent a new programming language", I probably would have at least gotten a nomination for the category. From time to time I think about how I could design language constructs that maintain efficiency while allowing for more flexibility than statically linked, early bound languages like C. Some of my previous posts have been along these lines. And I've recently been thinking about performing member lookups on an object instance and how I could write a more flexible object system in C.

C has a concept of structs and types, and allows you to define a data structure, which is compiled. Here's an example:
typedef struct {
uint16_t age;
uint16_t weight;
wchar* name;
} Person;
Now when you say Person fred, your program will grab a chunk of memory (likely from the stack) that is 8 bytes, or 12 bytes, or something similar depending on the compiler, size of pointers, ect. Now if I write fred.age in my program, this is translated to "the value of the two bytes at the start of fred's address" and if I ask for fred.weight, that turns into "the value of two bytes which are at exactly fred's address + 2". Since C is early bound (AKA static), the compiler knows that weight is always found in the same location relative to the start of fred, and as a result, the machine code generated by the compiler can be very fast and efficient.

The place where this optimization breaks down, is when you have a different structure but you want to be able to use a function written for one type of object with a different object. Say for example, we had a different structure that also used age and weight:
typedef struct {
wchar* breed;
uint16_t weight;
uint16_t age;
wchar* name;
} Dog;
Now if I want to reuse the code I wrote earlier to find out the age of a Dog, I have a small problem. You see, the age is in a different location for a Dog than a Person. If it had been in the same position, I could have cast the Dog to a Person, and then my compiled code would be looking for age at the same location. But for one reason or another, the layout of Dog is different.

In many programming languages, the difference in layout would present no problem, because the offsets aren't set in stone at compile time. Instead there is a dynamic lookup which says "where can I find 'age' for the type of data this is?" and we can add the same functionality here in C. Often this lookup is done using some kind of associative array, but I have a few slightly different ideas in mind.

C gets around this potential limitation by requiring you to specify the type of the each chunk of data. So for example, if you have a Person fred and a Dog nelly, then age is converted into a different offset from the start of the data. However this means that you would need to write the same function multiple times, once for Person and once for Dog.

Instead of rewriting a function for each different type, I'd like to use the same function for all types which have members which match up. In other words, as long as a type has a member named age of type uint16 and member string called name. It should be possible to create a C function which behaves a bit more like dynamic languages like JavaScript and Python.

In order to accomplish a dynamic member lookup, I'll add one thing to the types I've defined above. The first member will always be a pointer to something that allows the location of a named object to be determined. This could be a function that takes a name and produces an offset, it could be a hash table that maps names to offsets, it could point to a binary tree, or something else.
typedef struct {
void* type;
uint16_t age;
uint16_t weight;
wchar* name;
} Person;

typedef struct {
void* type;
wchar* breed;
uint16_t weight;
uint16_t age;
wchar* name;
} Dog;
Now that the type information is stored along with the data, I could write a PrintAge function that would lookup the correct location of the age member. Here's some pseudocode for how this more dynamic code would look:
function PrintAge()
// Note that this age offset is stored between function calls so
// that if the function is called with the same type as in the most recent call,
// the offset will not have to be looked up again. This should improve
// performance by avoiding the need to perform a lookup in some cases.
static long age_offset = -1
static void* most_recent_type
if most_recent_type != current_type || age_offset == -1
age_offset = lookup(age)
printf("age is *(person+age_offset)")
I haven't decided yet on the best way to lookup the offset based on the member name. I think it might be a close call between pointing to a lookup function with if statements, and a lookup in a presorted binary tree. Depending on the number of members, I could see it going either way. What do you think, what is the best way to implement a dynamic member lookup?
Post a Comment