Monday, August 04, 2008

Speedy Dynamic Member Lookup in C (part 2)

The comments on my previous post were highly enlightening. Joe pointed me to an article on the Polymorphic Inline Cache, and it's use in the SELF programming language. And Joel mentioned some different techniques used in other languages (Objective C, LISP, C++, and others) to implement dynamic lookups. With this new information, I can see that my simple solution might fall far short of the optimal solution, which isn't all that surprising as I'm certain more time has been spent on these real programming languages. But what I do think is interesting, is the idea that a more complex solution can give better performance than a simpler design. It is as if there is a valley, with the simple solution of static, precompiled behavior on a peak of performance, standing opposite a complex design which introduces overhead of collecting statistics on the most common paths through the code and constantly recompiles to machine code which optimizes the most used idioms. In between lies the poorly performing trough of the half solution. Solutions which provide dynamic behavior, but don't go far enough to mitigate the added cost of runtime lookups. The simple idea that I've coded up sits in this trough of slowness, but I'd wager it is faster than some solutions I've seen out there.

This isn't really a great solution, but I had already started, and wanted to finish because I think it was a good learning experience. We begin with a definition of a data structure that points to an offset lookup method.
typedef struct {
size_t (*memberLookup)(char*);
} TypedStruct;

typedef struct {
size_t (*memberLookup)(char*);
unsigned int age;
unsigned int weight;
char* name;
} Person;

size_t PersonLookup(char*);

typedef struct {
size_t (*memberLookup)(char*);
char* breed;
unsigned int weight;
unsigned int age;
char* name;
} Dog;

size_t DogLookup(char*);
You can think of Person and Dog as subclasses of a common parent. They share a few member attributes, but the attributes are in different places within the structure. I want to write a function which can find the age member in whatever structure is passed in to it. How do we find out where a member lies within a structure? With this design, we ask the structure to tell us. The memberLookup function takes a name, and returns the offset for the desired member. Here are the lookup functions that describe the above structures:
#include"typed_struct.h"
#include<stdio.h>

size_t PersonLookup(char* memberName) {
if(strcmp(memberName, "age") == 0) {
return sizeof(size_t);
} else if(strcmp(memberName, "weight") == 0) {
return sizeof(int) + sizeof(size_t);
} else if(strcmp(memberName, "name") == 0) {
return 2*sizeof(int) + sizeof(size_t);
} else {
return -1;
}
}

size_t DogLookup(char* memberName) {
if(strcmp(memberName, "breed") == 0) {
return sizeof(size_t);
} else if(strcmp(memberName, "weight") == 0) {
return sizeof(size_t) + sizeof(char*);
} else if(strcmp(memberName, "age") == 0) {
return sizeof(size_t) + sizeof(char*) + sizeof(int);
} else if(strcmp(memberName, "name") == 0) {
return sizeof(size_t) + sizeof(char*) + 2*sizeof(int);
} else {
return -1;
}
}
With the data structures in place and the functions which map member names to locations, we can now write functions which dynamically lookup the location of a member. This simple function displays the age of the entity which is passed in. The static variables are designed to speed up repeated calls with the same type. For example, if a Person is passed in as entity several times in a row, the location of the age member will only need to be looked up on the first call.
void GiveIntroduction(void* entity) {
static size_t age_offset = 0;
static size_t (*last_lookup)(char*) = NULL;

if(last_lookup != ((TypedStruct*)entity)->memberLookup) {
// In words, convert entity to a TypedStruct pointer
// (TypedStruct*)entity
// Find the function pointer called memberLookup
// ((TypedStruct*)entity)->memberLookup
// Call the function at memberLookup with "age" as the parameter
// (*(((TypedStruct*)entity)->memberLookup))("age")
age_offset = (*(((TypedStruct*)entity)->memberLookup))("age");
last_lookup = ((TypedStruct*)entity)->memberLookup;
}

printf("I am %i years old\n", *((int*)(entity + age_offset)));
}
Here is a program which uses the above structures and functions.
int main() {
Dog nelly;
nelly.memberLookup = &DogLookup;
nelly.age = 4;
nelly.name = "Nelly";

GiveIntroduction(&nelly);
}
When the above is run, it should display "I am 4 years old".

Now, what is the point of all of this? In statically typed, compiled languages, the compiler usually has to know in advance where the data is and what it's type is. For example, with the Dog structure, the compiler knows that age is 12 bytes from the start of the structure (on most 32-bit processors), and that it's type is int. All of the offsets are calculated at compile time. By moving the offset lookup into a function which is executed at runtime, we free up the compiler from having to know where to look for a particular member. Now we can pass in any data type into GiveIntroduction and as long as the object's memberLookup function tells us where to find the age member, everything will work.

We've gained flexibility at the cost of some overhead. Setting the memberLookup on every instance of a TypedStruct uses a bit of memory, and calling the function to find the necessary offset adds a runtime cost. This example has been a bit simple, even too simple, as there are ways to provide more aggressive caching of the lookups and faster ways to perform the lookup within the XLookup functions.

No comments: