Imposterrific - a blog by Jeff Scudder: August 2008

Wednesday, August 20, 2008

Dirt Simple CMS

I recently created an App Engine app to run www.jeffscudder.com. At the moment the code is extremely simple, and I get so few visitors to that web page that I doubt I will need anything complicated.

When I write blog posts and web pages, I have always preferred to just edit the HTML, and I have always wanted a simple content management system that just let me edit the HTML, JavaScript, CSS, ect. in the browser. Blogger comes awfully close to the perfect tool in my opinion, but it is geared towards displaying a series of posts. I wanted a landing page with links to all of the other content I put out there in the blagoweb. And I wanted to be able to host the simple web app's that I write (like the recently mentioned password generator).

With those design goals in mind, I set out to create my super simple content management system. It runs on App Engine, and the admin (me) is able to sign in to a special secret /content_manager page which lets me assign a specific blob of text to the desired URL under my domain. I can also set some basic metadata, like the content type (so that your browser knows how the content should be rendered) and cache control information, since HTTP caching is excellent and saves puppies from drowning in lakes (ok seriously it will alleviate congestion and unnecessary traffic when you want to give the same content to thousands or millions of people).

Editing pages through the /content_manager looks like this:

I've also decided to open source the code and I called the project scud-cms. Since App Engine is free for you to sign up, you can just upload this code and start setting your own content from right there in the browser.

(P.S. The idea for this simple content manager is very similar to one of my earlier projects: Scorpion Server, with which an authorized user could set the content at just about any URL they wanted.)

Tuesday, August 12, 2008

Opera on the XO Laptop

No I'm not talking about singing.

I was looking forward to the release of Firefox 3 with great anticipation. I had tried out the beta versions and noticed the improvements in JavaScript execution speed and memory footprint. Since the XO laptop is relatively slow and memory constrained compared to many of the other computers that I work with, I was really hoping that Firefox 3 would be a big improvement over Firefox 2. It turns out, for the XO, it wasn't. The place where the limitations were most apparent, was scrolling on a large web page. The page would hang for a second or so and then slowly creep down. It probably had to do with the short supply of available RAM on my happy little green machine.

As momentum was building for Firefox 3, I started to hear good things about Opera 9.51. When I ran into performance issues on Firefox and the XO, I decided to give Opera a try. Some thought I was crazy since, as Arne mentioned, the XO can be a bit slow. Yes, the XO laptop is a different environment than most other computers out there, and Opera has turned out to be an improvement.

To set up Opera on the XO, I began by downloading from the Opera web site. I selected Opera 9.51 for Linux i386, selected RedHat as the distribution in the drop down menu, then I selected RedHat 6.2, 7.1, 7.2, 7.3, 8.0, 9 and checked the box to Download this package in TAR.GZ format. Whew.

Next, unpack the archive using

tar zxvf opera-9.51.gcc4-static-qt3.i386.tar.gz

and then run

install.sh -s

From there, you can just type opera on the command line in the terminal to start it up.

Monday, August 04, 2008

Speedy Dynamic Member Lookup in C (part 2)

The comments on my previous post were highly enlightening. Joe pointed me to an article on the Polymorphic Inline Cache, and it's use in the SELF programming language. And Joel mentioned some different techniques used in other languages (Objective C, LISP, C++, and others) to implement dynamic lookups. With this new information, I can see that my simple solution might fall far short of the optimal solution, which isn't all that surprising as I'm certain more time has been spent on these real programming languages. But what I do think is interesting, is the idea that a more complex solution can give better performance than a simpler design. It is as if there is a valley, with the simple solution of static, precompiled behavior on a peak of performance, standing opposite a complex design which introduces overhead of collecting statistics on the most common paths through the code and constantly recompiles to machine code which optimizes the most used idioms. In between lies the poorly performing trough of the half solution. Solutions which provide dynamic behavior, but don't go far enough to mitigate the added cost of runtime lookups. The simple idea that I've coded up sits in this trough of slowness, but I'd wager it is faster than some solutions I've seen out there.

This isn't really a great solution, but I had already started, and wanted to finish because I think it was a good learning experience. We begin with a definition of a data structure that points to an offset lookup method.

typedef struct {
  size_t (*memberLookup)(char*);
} TypedStruct;

typedef struct {
  size_t (*memberLookup)(char*);
  unsigned int age;
  unsigned int weight;
  char* name;
} Person;

size_t PersonLookup(char*);

typedef struct {
  size_t (*memberLookup)(char*);
  char* breed;
  unsigned int weight;
  unsigned int age;
  char* name;
} Dog;

size_t DogLookup(char*);

You can think of Person and Dog as subclasses of a common parent. They share a few member attributes, but the attributes are in different places within the structure. I want to write a function which can find the age member in whatever structure is passed in to it. How do we find out where a member lies within a structure? With this design, we ask the structure to tell us. The memberLookup function takes a name, and returns the offset for the desired member. Here are the lookup functions that describe the above structures:

#include"typed_struct.h"
#include<stdio.h>

size_t PersonLookup(char* memberName) {
  if(strcmp(memberName, "age") == 0) {
    return sizeof(size_t);
  } else if(strcmp(memberName, "weight") == 0) {
    return sizeof(int) + sizeof(size_t);
  } else if(strcmp(memberName, "name") == 0) {
    return 2*sizeof(int) + sizeof(size_t);
  } else {
    return -1; 
  }
}

size_t DogLookup(char* memberName) {
  if(strcmp(memberName, "breed") == 0) {
    return sizeof(size_t);
  } else if(strcmp(memberName, "weight") == 0) {
    return sizeof(size_t) + sizeof(char*);
  } else if(strcmp(memberName, "age") == 0) {
    return sizeof(size_t) + sizeof(char*) + sizeof(int);
  } else if(strcmp(memberName, "name") == 0) {
    return sizeof(size_t) + sizeof(char*) + 2*sizeof(int);
  } else {
    return -1; 
  }
}

With the data structures in place and the functions which map member names to locations, we can now write functions which dynamically lookup the location of a member. This simple function displays the age of the entity which is passed in. The static variables are designed to speed up repeated calls with the same type. For example, if a Person is passed in as entity several times in a row, the location of the age member will only need to be looked up on the first call.

void GiveIntroduction(void* entity) {
  static size_t age_offset = 0;
  static size_t (*last_lookup)(char*) = NULL;
  
  if(last_lookup != ((TypedStruct*)entity)->memberLookup) {
    // In words, convert entity to a TypedStruct pointer
    // (TypedStruct*)entity
    // Find the function pointer called memberLookup
    // ((TypedStruct*)entity)->memberLookup
    // Call the function at memberLookup with "age" as the parameter
    // (*(((TypedStruct*)entity)->memberLookup))("age")
    age_offset = (*(((TypedStruct*)entity)->memberLookup))("age");
    last_lookup = ((TypedStruct*)entity)->memberLookup;
  }

  printf("I am %i years old\n", *((int*)(entity + age_offset)));
}

Here is a program which uses the above structures and functions.

int main() {
  Dog nelly;
  nelly.memberLookup = &DogLookup;
  nelly.age = 4;
  nelly.name = "Nelly";
  
  GiveIntroduction(&nelly);
}

When the above is run, it should display "I am 4 years old".

Now, what is the point of all of this? In statically typed, compiled languages, the compiler usually has to know in advance where the data is and what it's type is. For example, with the Dog structure, the compiler knows that age is 12 bytes from the start of the structure (on most 32-bit processors), and that it's type is int. All of the offsets are calculated at compile time. By moving the offset lookup into a function which is executed at runtime, we free up the compiler from having to know where to look for a particular member. Now we can pass in any data type into GiveIntroduction and as long as the object's memberLookup function tells us where to find the age member, everything will work.

We've gained flexibility at the cost of some overhead. Setting the memberLookup on every instance of a TypedStruct uses a bit of memory, and calling the function to find the necessary offset adds a runtime cost. This example has been a bit simple, even too simple, as there are ways to provide more aggressive caching of the lookups and faster ways to perform the lookup within the XLookup functions.

Friday, August 01, 2008

Speedy Dynamic Member Lookup in C (part 1)

If my senior class had voted for a "Most likely to try to invent a new programming language", I probably would have at least gotten a nomination for the category. From time to time I think about how I could design language constructs that maintain efficiency while allowing for more flexibility than statically linked, early bound languages like C. Some of my previous posts have been along these lines. And I've recently been thinking about performing member lookups on an object instance and how I could write a more flexible object system in C.

C has a concept of structs and types, and allows you to define a data structure, which is compiled. Here's an example:

typedef struct {
  uint16_t age;
  uint16_t weight;
  wchar* name;
} Person;

Now when you say Person fred, your program will grab a chunk of memory (likely from the stack) that is 8 bytes, or 12 bytes, or something similar depending on the compiler, size of pointers, ect. Now if I write fred.age in my program, this is translated to "the value of the two bytes at the start of fred's address" and if I ask for fred.weight, that turns into "the value of two bytes which are at exactly fred's address + 2". Since C is early bound (AKA static), the compiler knows that weight is always found in the same location relative to the start of fred, and as a result, the machine code generated by the compiler can be very fast and efficient.

The place where this optimization breaks down, is when you have a different structure but you want to be able to use a function written for one type of object with a different object. Say for example, we had a different structure that also used age and weight:

typedef struct {
  wchar* breed;
  uint16_t weight;
  uint16_t age;
  wchar* name;
} Dog;

Now if I want to reuse the code I wrote earlier to find out the age of a Dog, I have a small problem. You see, the age is in a different location for a Dog than a Person. If it had been in the same position, I could have cast the Dog to a Person, and then my compiled code would be looking for age at the same location. But for one reason or another, the layout of Dog is different.

In many programming languages, the difference in layout would present no problem, because the offsets aren't set in stone at compile time. Instead there is a dynamic lookup which says "where can I find 'age' for the type of data this is?" and we can add the same functionality here in C. Often this lookup is done using some kind of associative array, but I have a few slightly different ideas in mind.

C gets around this potential limitation by requiring you to specify the type of the each chunk of data. So for example, if you have a Person fred and a Dog nelly, then age is converted into a different offset from the start of the data. However this means that you would need to write the same function multiple times, once for Person and once for Dog.

Instead of rewriting a function for each different type, I'd like to use the same function for all types which have members which match up. In other words, as long as a type has a member named age of type uint16 and member string called name. It should be possible to create a C function which behaves a bit more like dynamic languages like JavaScript and Python.

In order to accomplish a dynamic member lookup, I'll add one thing to the types I've defined above. The first member will always be a pointer to something that allows the location of a named object to be determined. This could be a function that takes a name and produces an offset, it could be a hash table that maps names to offsets, it could point to a binary tree, or something else.

typedef struct {
  void* type;
  uint16_t age;
  uint16_t weight;
  wchar* name;
} Person;

typedef struct {
  void* type;
  wchar* breed;
  uint16_t weight;
  uint16_t age;
  wchar* name;
} Dog;

Now that the type information is stored along with the data, I could write a PrintAge function that would lookup the correct location of the age member. Here's some pseudocode for how this more dynamic code would look:

function PrintAge()
  // Note that this age offset is stored between function calls so 
  // that if the function is called with the same type as in the most recent call, 
  // the offset will not have to be looked up again. This should improve 
  // performance by avoiding the need to perform a lookup in some cases.
  static long age_offset = -1
  static void* most_recent_type
  if most_recent_type != current_type || age_offset == -1
    age_offset = lookup(age)
  printf("age is *(person+age_offset)")

I haven't decided yet on the best way to lookup the offset based on the member name. I think it might be a close call between pointing to a lookup function with if statements, and a lookup in a presorted binary tree. Depending on the number of members, I could see it going either way. What do you think, what is the best way to implement a dynamic member lookup?