Imposterrific - a blog by Jeff Scudder: c

Showing posts with label c. Show all posts

Tuesday, June 22, 2010

A Simple Testing Library for C

To prepare for a recent post graduate computer science class, I wrote a small library in C which aids in the creation of lightweight, unit-test-like programs. The code can be found here, and using it looks a bit like this:

#include"asserts.h"

int main(void)
{
  c7e3_assert(1 == 1, "1 should equal 1");
  c7e3_assert(2 == 2, "2 should equal 2");
  c7e3_report();
  return 0;
}

The design follows the KISS principle and I think it is a nice fit to the simplicity of C. While there is not much to it, I wrote numerous tests using it over the past couple of months and all of that testing certainly paid off.

Monday, August 04, 2008

Speedy Dynamic Member Lookup in C (part 2)

The comments on my previous post were highly enlightening. Joe pointed me to an article on the Polymorphic Inline Cache, and it's use in the SELF programming language. And Joel mentioned some different techniques used in other languages (Objective C, LISP, C++, and others) to implement dynamic lookups. With this new information, I can see that my simple solution might fall far short of the optimal solution, which isn't all that surprising as I'm certain more time has been spent on these real programming languages. But what I do think is interesting, is the idea that a more complex solution can give better performance than a simpler design. It is as if there is a valley, with the simple solution of static, precompiled behavior on a peak of performance, standing opposite a complex design which introduces overhead of collecting statistics on the most common paths through the code and constantly recompiles to machine code which optimizes the most used idioms. In between lies the poorly performing trough of the half solution. Solutions which provide dynamic behavior, but don't go far enough to mitigate the added cost of runtime lookups. The simple idea that I've coded up sits in this trough of slowness, but I'd wager it is faster than some solutions I've seen out there.

This isn't really a great solution, but I had already started, and wanted to finish because I think it was a good learning experience. We begin with a definition of a data structure that points to an offset lookup method.

typedef struct {
  size_t (*memberLookup)(char*);
} TypedStruct;

typedef struct {
  size_t (*memberLookup)(char*);
  unsigned int age;
  unsigned int weight;
  char* name;
} Person;

size_t PersonLookup(char*);

typedef struct {
  size_t (*memberLookup)(char*);
  char* breed;
  unsigned int weight;
  unsigned int age;
  char* name;
} Dog;

size_t DogLookup(char*);

You can think of Person and Dog as subclasses of a common parent. They share a few member attributes, but the attributes are in different places within the structure. I want to write a function which can find the age member in whatever structure is passed in to it. How do we find out where a member lies within a structure? With this design, we ask the structure to tell us. The memberLookup function takes a name, and returns the offset for the desired member. Here are the lookup functions that describe the above structures:

#include"typed_struct.h"
#include<stdio.h>

size_t PersonLookup(char* memberName) {
  if(strcmp(memberName, "age") == 0) {
    return sizeof(size_t);
  } else if(strcmp(memberName, "weight") == 0) {
    return sizeof(int) + sizeof(size_t);
  } else if(strcmp(memberName, "name") == 0) {
    return 2*sizeof(int) + sizeof(size_t);
  } else {
    return -1; 
  }
}

size_t DogLookup(char* memberName) {
  if(strcmp(memberName, "breed") == 0) {
    return sizeof(size_t);
  } else if(strcmp(memberName, "weight") == 0) {
    return sizeof(size_t) + sizeof(char*);
  } else if(strcmp(memberName, "age") == 0) {
    return sizeof(size_t) + sizeof(char*) + sizeof(int);
  } else if(strcmp(memberName, "name") == 0) {
    return sizeof(size_t) + sizeof(char*) + 2*sizeof(int);
  } else {
    return -1; 
  }
}

With the data structures in place and the functions which map member names to locations, we can now write functions which dynamically lookup the location of a member. This simple function displays the age of the entity which is passed in. The static variables are designed to speed up repeated calls with the same type. For example, if a Person is passed in as entity several times in a row, the location of the age member will only need to be looked up on the first call.

void GiveIntroduction(void* entity) {
  static size_t age_offset = 0;
  static size_t (*last_lookup)(char*) = NULL;
  
  if(last_lookup != ((TypedStruct*)entity)->memberLookup) {
    // In words, convert entity to a TypedStruct pointer
    // (TypedStruct*)entity
    // Find the function pointer called memberLookup
    // ((TypedStruct*)entity)->memberLookup
    // Call the function at memberLookup with "age" as the parameter
    // (*(((TypedStruct*)entity)->memberLookup))("age")
    age_offset = (*(((TypedStruct*)entity)->memberLookup))("age");
    last_lookup = ((TypedStruct*)entity)->memberLookup;
  }

  printf("I am %i years old\n", *((int*)(entity + age_offset)));
}

Here is a program which uses the above structures and functions.

int main() {
  Dog nelly;
  nelly.memberLookup = &DogLookup;
  nelly.age = 4;
  nelly.name = "Nelly";
  
  GiveIntroduction(&nelly);
}

When the above is run, it should display "I am 4 years old".

Now, what is the point of all of this? In statically typed, compiled languages, the compiler usually has to know in advance where the data is and what it's type is. For example, with the Dog structure, the compiler knows that age is 12 bytes from the start of the structure (on most 32-bit processors), and that it's type is int. All of the offsets are calculated at compile time. By moving the offset lookup into a function which is executed at runtime, we free up the compiler from having to know where to look for a particular member. Now we can pass in any data type into GiveIntroduction and as long as the object's memberLookup function tells us where to find the age member, everything will work.

We've gained flexibility at the cost of some overhead. Setting the memberLookup on every instance of a TypedStruct uses a bit of memory, and calling the function to find the necessary offset adds a runtime cost. This example has been a bit simple, even too simple, as there are ways to provide more aggressive caching of the lookups and faster ways to perform the lookup within the XLookup functions.

Friday, August 01, 2008

Speedy Dynamic Member Lookup in C (part 1)

If my senior class had voted for a "Most likely to try to invent a new programming language", I probably would have at least gotten a nomination for the category. From time to time I think about how I could design language constructs that maintain efficiency while allowing for more flexibility than statically linked, early bound languages like C. Some of my previous posts have been along these lines. And I've recently been thinking about performing member lookups on an object instance and how I could write a more flexible object system in C.

C has a concept of structs and types, and allows you to define a data structure, which is compiled. Here's an example:

typedef struct {
  uint16_t age;
  uint16_t weight;
  wchar* name;
} Person;

Now when you say Person fred, your program will grab a chunk of memory (likely from the stack) that is 8 bytes, or 12 bytes, or something similar depending on the compiler, size of pointers, ect. Now if I write fred.age in my program, this is translated to "the value of the two bytes at the start of fred's address" and if I ask for fred.weight, that turns into "the value of two bytes which are at exactly fred's address + 2". Since C is early bound (AKA static), the compiler knows that weight is always found in the same location relative to the start of fred, and as a result, the machine code generated by the compiler can be very fast and efficient.

The place where this optimization breaks down, is when you have a different structure but you want to be able to use a function written for one type of object with a different object. Say for example, we had a different structure that also used age and weight:

typedef struct {
  wchar* breed;
  uint16_t weight;
  uint16_t age;
  wchar* name;
} Dog;

Now if I want to reuse the code I wrote earlier to find out the age of a Dog, I have a small problem. You see, the age is in a different location for a Dog than a Person. If it had been in the same position, I could have cast the Dog to a Person, and then my compiled code would be looking for age at the same location. But for one reason or another, the layout of Dog is different.

In many programming languages, the difference in layout would present no problem, because the offsets aren't set in stone at compile time. Instead there is a dynamic lookup which says "where can I find 'age' for the type of data this is?" and we can add the same functionality here in C. Often this lookup is done using some kind of associative array, but I have a few slightly different ideas in mind.

C gets around this potential limitation by requiring you to specify the type of the each chunk of data. So for example, if you have a Person fred and a Dog nelly, then age is converted into a different offset from the start of the data. However this means that you would need to write the same function multiple times, once for Person and once for Dog.

Instead of rewriting a function for each different type, I'd like to use the same function for all types which have members which match up. In other words, as long as a type has a member named age of type uint16 and member string called name. It should be possible to create a C function which behaves a bit more like dynamic languages like JavaScript and Python.

In order to accomplish a dynamic member lookup, I'll add one thing to the types I've defined above. The first member will always be a pointer to something that allows the location of a named object to be determined. This could be a function that takes a name and produces an offset, it could be a hash table that maps names to offsets, it could point to a binary tree, or something else.

typedef struct {
  void* type;
  uint16_t age;
  uint16_t weight;
  wchar* name;
} Person;

typedef struct {
  void* type;
  wchar* breed;
  uint16_t weight;
  uint16_t age;
  wchar* name;
} Dog;

Now that the type information is stored along with the data, I could write a PrintAge function that would lookup the correct location of the age member. Here's some pseudocode for how this more dynamic code would look:

function PrintAge()
  // Note that this age offset is stored between function calls so 
  // that if the function is called with the same type as in the most recent call, 
  // the offset will not have to be looked up again. This should improve 
  // performance by avoiding the need to perform a lookup in some cases.
  static long age_offset = -1
  static void* most_recent_type
  if most_recent_type != current_type || age_offset == -1
    age_offset = lookup(age)
  printf("age is *(person+age_offset)")

I haven't decided yet on the best way to lookup the offset based on the member name. I think it might be a close call between pointing to a lookup function with if statements, and a lookup in a presorted binary tree. Depending on the number of members, I could see it going either way. What do you think, what is the best way to implement a dynamic member lookup?

Wednesday, April 30, 2008

Creating and Using .so Files With gcc

And what is an .so file you might ask? Those from a Windows background might know them as .dll files, but that still doesn't answer the question.

Normally, when you compile a program, the system takes all of the pieces of code that you've written, converts them into something the computer can understand (machine code) and glues them together. You might compare the process to building a pyramid. All of the stones that you've sculpted are joined together and stacked up. Just like with a pyramid, if you want to change one of the stones, everything that sits on top of it needs to be adjusted. With a C program, making changes in a piece of code means lots of pieces need to be recompiled and reglued together.

If you use dynamic linking, shared libraries, whatever you'd like to call them, things work a bit differently. With dynamic linking, the pieces are not all glued together or stacked up, instead they are plugged in when they are needed. As a result, changes to a piece of code don't require that lots of other pieces get reshaped (recompiled) to fit. With dynamic linking, it is possible to change a piece of code while the program is running. Funny, that sounds a bit like a scripting language.

Here's an example. Let's say I wrote a library with a function called SpecialPrint. (It's like printf but different!) Here is the code for speciallib.c:

#include<stdio.h>

void SpecialPrint(char* payload) {
  printf("Special Print:%s\n", payload);
}

Now I write a program that uses this library. Here's the normal, non-dynamic, static linking, built like a pyramid way of using the library.

#include<stdio.h>
#include"speciallib.h"

int main() {
  printf("normal printf argument\n");
  SpecialPrint("special print's argument");
}

With the above code, I compile the SpecialPrint library with my main code and link them together, producing a single executable file. However, I can use the same speciallib file with code that loads the library dynamically while the program is running. The code might look something like this:

#include<stdio.h>
#include<dlfcn.h>

void ShowError() {
  char *dlError = dlerror();
  if(dlError) printf("Error with dl function: %s\n", dlError);
}

int main() {
  void *SharedObjectFile;
  void (*SpecialPrintFunction)(char*);

  // Load the shared libary;
  SharedObjectFile = dlopen("./speciallib.so", RTLD_LAZY);
  ShowError();
  // Obtain the address of a function in the shared library.
  SpecialPrintFunction = dlsym(SharedObjectFile, "SpecialPrint");
  ShowError();

  printf("normal printf argument\n");
  // Use the dynamically loaded function.
  (*SpecialPrintFunction)("special print's argument");

  dlclose(SharedObjectFile);
  ShowError();
}

In the above, the show error function is optional, I added it so that any errors that occur would be displayed. When you compile the above code, you'll need to make the speciallib into an .so file and make sure that the speciallib's directory (the current directory in my example) is listed as one of the places that we should look for shared object files. Here's what the steps to compile look like:

export LD_LIBRARY_PATH=`pwd`
gcc -c -fpic speciallib.c
gcc -shared -lc -o speciallib.so speciallib.o
gcc main.c -ldl

For an explanation of what the above compiler options mean, and further explanation on .so files, see these two IBM articles on using shared object files on Solaris and linux.

Learning about .so files has been very interesting to me, because they bring a new level of flexibility to a traditionally rigid environment like C.

Tuesday, April 22, 2008

Early vs Late Binding

I've been thinking recently about programming languages (surprised?), specifically about the things that make them different. One of the really nice things about C, is that it compiles into machine code which tends to run lean and mean. By that I mean it is blazing fast and doesn't take up much memory. On the other hand, programming in Python and JavaScript has really been growing on me. There is so much flexibility to create elegant solutions quickly and without rewriting lots of existing code. In fact, I'd say greater ability to reuse existing code is a natural outgrowth of programming language flexibility.

So where does this flexibility come from? One place I tend to notice it most, is in the ability to give an existing function a new body, in other words, you can plug in different behavior in place of the default.

Here's a simple example to illustrate the idea. Let's say that we created a simple checkout register which takes a receipt, adds the sales tax, and spits out the grand total. Here's our code foundation in both Python and JavaScript (these two examples do essentially the same thing):

Python:

def CalculateTax(amount):
  return amount * 0.18

class Receipt(object):
  
  def __init__(self, items=None):
    self.items = items or []
  
  def CalculateTotal(self):
    return sum([item + CalculateTax(item) for item in self.items])

JavaScript:

function calculateTax(amount) {
  return amount * 0.18;
}

function Receipt(items) {
  if (items) {
    this.items = items;
  } else {
    this.items = new Array();
  }
}

Receipt.prototype.calculateTotal = function() {
  var total = 0;
  for (var i = 0; i < this.items.length; i++) {
    total += this.items[i] + calculateTax(this.items[i]);
  }
  return total;
}

To use the above code, you might write something like this:

Python:

my_order = Receipt([5.50, 10, 7.89])
print my_order.CalculateTotal()

JavaScript:

var myOrder = new Receipt([5.50, 10, 7.89]);
alert(myOrder.calculateTotal());

Now let's say someone asks you to change the tax rate which is used when calculating the total. Here's the catch, you're not allowed to change the existing code. It turns out this is actually really easy. You can define a new function, then make an existing function name point to the new function. Here's an example of how to inject our new code:

Python:

def CalculateHigherTax(amount):
  return amount * 0.25

CalculateTax = CalculateHigherTax

print my_order.CalculateTotal()

JavaScript:

function calculateHigherTax(amount) {
  return amount * 0.25;
}

calculateTax = calculateHigherTax;

alert(myOrder.calculateTotal());

After adding the above code to the foundation we started with, you will notice that the calculate total method now uses calculate-higher-tax instead of the original function, even though you are calling the same method on the same object as before. Congratulations, you have just witnessed late binding in action.

So what is late binding? The idea is that the computer decides which code should be executed while the program is running. This seems normal in scripting languages, but compiled languages often use this too (I'm looking at you Java and C++). For example, overloaded methods and polymorphism take advantage of late binding. With late binding you can change the meaning of an identifier (for example, change the behavior when you call a specific function) at just about any time.

Now lets take a look at a language which uses early binding. C is a great example. With early binding, the meaning of things like function names are locked in when the code is compiled. There is no dynamic lookup while the program is running to see which code should be executed, instead the address of the desired code is embedded directly into the binary machine code.

Here is how the same calculate-total example might look in C:

#include<stdio.h>

float CalculateTax(float amount) {
  return amount * 0.18;
}

typedef struct {
  float* items;
  int num_items;
} Receipt;

float CalculateTotal(Receipt this_order) {
  int i;
  float total = 0;
  for(i = 0; i < this_order.num_items; i++) {
    total += this_order.items[i] + CalculateTax(this_order.items[i]);
  }
  return total;
}

int main(void) {
  Receipt my_order;
  float my_items[3] = {5.50, 10, 7.89};
  my_order.items = my_items;
  my_order.num_items = 3;
  printf("%f\n", CalculateTotal(my_order));
}

If you try to set CalculateTax to a new function definition, you will get an error at compile time because a function cannot be changed once it is bound. Early binding tends to produce more efficient programs. However, if you want to, you can still use the flexiblity available in late binding in C.

Using function pointers, you can store the address of the code that you want to be executed, and change the address while the program is running. We can achieve the same late binding effects that I've illustrated in Python and JavaScript by making some small changes to the C code (marked in bold below). Declare a function pointer named TaxCalculator which will store the address of the desired calculate-tax function, then change CalculateTotal so that it uses the TaxCalculator instead of directly calling a calculate-tax function.

#include<stdio.h>

float CalculateTax(float amount) {
  return amount * 0.18;
}

float CalculateHigherTax(float amount) {
  return amount * 0.25;
}

typedef struct {
  float* items;
  int num_items;
} Receipt;

float (*TaxCalculator)(float) = &CalculateTax;

float CalculateTotal(Receipt this_order) {
  int i;
  float total = 0;
  for(i = 0; i < this_order.num_items; i++) {
    total += this_order.items[i] + (*TaxCalculator)(this_order.items[i]);
  }
  return total;
}

int main(void) {
  Receipt my_order;
  float my_items[3] = {5.50, 10, 7.89};
  my_order.items = my_items;
  my_order.num_items = 3;
  printf("%f\n", CalculateTotal(my_order));
  TaxCalculator = &CalculateHigherTax;
  printf("%f\n", CalculateTotal(my_order));
}

There you have it!

Here's another way to think about this comparison. In high level languages which don't expose pointers, functions, variables, and other identifiers actually act like pointers.

Wednesday, January 23, 2008

A spoiled programmer

I've been writing quite a bit of Python code recently and I've become a bit spoiled. It's easy in Python to define new classes on the fly, create new functions, pass them here and there, and return arbitrary collections from a method. C will always have a special place in my heart (I think everyone's first language does), but I often think of ways I could make it a bit easier to do certain things like have functions that return functions or have a function return multiple values.

To explain by way of example, it would be fun to do something like this:

/* A function that returns multiple values */
int, char, int myFunction(int a, int b, int c, char d) {...}

...
  /* Invoke the function and store the results */
  int x, y;
  char c;
  {x, y, c} = myFunction(5, 6, 7, 'Z');

The above is a fairly Pythonic way of doing things, and it seems like it should be possible in C. The first way I thought of is using structs. I like to think of a struct as the precursor to a class. It allows the arbitrary grouping of variables into a single collection where they can be referred to by name. (In a couple of earlier posts, I showed how you could use structs to simulate classes in C.)

If I define a struct for each one of my multi-variable-returning functions, I can create functions which effectively return multiple values instead of just one. Yes, technically I am returning one value, the struct, but you know what I meant :-) Namely, if you look at the program's stack, there is probably no perceptible difference between returning a struct and returning multiple variables.

/* Create a 2 member struct to hold the return value */
struct myFuncReturn {
  int first;
  char second;
};

/* A function that returns an int and a char */
struct myFuncReturn myFunc(int a, int b, int c, char d) {
  struct myFuncReturn to_return;
  to_return.first = (a+b)*c;
  to_return.second = d;
  return to_return;
}

int main() {
  struct myFuncReturn pattern;
  pattern = myFunc(2, 3, 4, 'Z');
  printf("Pattern: %i, %c\n", pattern.first, pattern.second);
}

This works ok, but I would like to avoid having to create a new struct for each one of my functions. It might be easier if I didn't have to worry about types at all, so the natural choice is to have the function return a type-less void pointer (void*). The calling code would then be responsible for interpreting the function's return struct correctly. If I want to return a new anonymous struct from a function, it might look something like this:

void* myFunc(int a, int b, int c, char d);

If I use the above, I'll need to allocate memory for the struct and return it's address. This is a bit of a bother as well, because now I need to worry about cleaning up that memory later. Instead of having the function allocate a new structure to return, why not pass in a structure and have the function modify it? The code I would need to write would be more aesthetically pleasing (in my opinion) for both the function definition and the calling code which invokes it, and it might even be more efficient.

If I pass in a pointer to the result struct as the first parameter to the function, my program could look like this.

/* Function definition, the out parameter is the return value */
void myFunc(void* out, int a, int b, int c, char d) {
  ((struct{int first; char second;}*)out)->first = (a+b)*c;
  ((struct{int first; char second;}*)out)->second = d;
}

int main() {
  struct{int first; char second;} pattern;
  myFunc(&pattern, 2, 3, 4, 'Z');
  printf("Pattern: %i, %c\n", pattern.first, pattern.second);
}

Look ma, no type declarations! Now you might say that writing out the entire struct definition each time is a bit unpleasant, but you could always define a struct and use it instead. I wanted to show that you don't really need to declare a type for each function, which could create a bit of a mess if you start using multi-return functions everywhere. With the above pattern, you could also start to play some interesting games by having functions that actually return different structs in different situations (provided the out pointer's reserved space is large enough for the data you want to send back). If I've lost you by now, I do apologize.

For added effect, note that the anonymous structs don't need to match, you just need to make sure that the shape of the structure is the same so that you don't overwrite data. I could have written myFunc like this:

void myFunc(void* out, int a, int b, int c, char d) {
  ((struct{int first;}*)out)->first = (a+b)*c;
  ((struct{int x; char second;}*)out)->second = d;
}

Or if you want to go even further, like this:

void myFunc(void* out, int a, int b, int c, char d) {
  *((int*)out) = (a+b)*c;
  ((struct{int x; char second;}*)out)->second = d;
}

Ah, the joys of programming. It's little games like this that make programming lots of fun. It's like working on a big wide open puzzle that you get to build yourself. No wonder I'm spoiled.

Friday, February 02, 2007

Simulating Classes in C - Part 2

Well the $1 C contest has come to an end after generating lots of interest but no full blown solutions. I'm okay with that, thanks to all of you who expressed interest and started on it. From conversations with several of you over chat, I know that there were some great ideas out there. If you don't mind taking the time, please describe your idea in the comments section. As my friend Matt can attest, I love discussing software, ideas, designs, and just about anything related to creative problem solving. This is probably why I really enjoyed math in high school and college. I looked at each problem as a logic puzzle which could be solved multiple ways. Finding the most elegant solution made me feel like I had just written a poem of supreme beauty.


/* JSObject functions */
void InvokeObjectMethod(JSObject* object, char* methodName,
                       JSListOfObjects* inputParams,
                       JSListOfObjects* outputParams) {
 int i;
 JSClass* class_of_object;
 int method_index = -1;

 /* Impossible to invoke a method on a NULL pointer  */
 if(object == NULL) {
   /* ToDo: send a "NULL object" error code in the outputParams. */
   return;
 }

 class_of_object = (JSClass*)(object->type);

 if(class_of_object == NULL) {
   /* ToDo: send a "NULL class" error code in the outputParams. */
   return;
 }

 /* Find the index of the methodName in the object's class.  */
 /* This part of the code is O(n) efficient, but this could be
    improved if the function names were sorted or if a hash
    algorithm was used. Hashing could reduce to O(1). */
 /* I could also allow the calling code to specify the desired
    method directly by index to avoid requiring a string
    lookup for each invocation. */
 for(i = 0; i <>method_count; i++) {
   if(strcmp(class_of_object->method_names[i], methodName) == 0) {
     method_index = i;
     break;
   }
 }

 /* If the method name was not found in the list of class methods,
    return an error code. */
 if(method_index == -1) {
   /* ToDo: send a "method not found" error code in the
      outputParams. */
   return;
 }

 /* Invoke the method at the index corresponding to the method
    name. */
 (*(class_of_object->methods[method_index]))(object, inputParams,
                                             outputParams);
}

I wrote my solution a few weeks ago, then started researching how other object oriented schemes are implemented. I found out that my code has several disadvantages and some significant advantages as well. For example, I started to comapre the likelyhood of missing the RAM cache to solutions in C++ and Java. There were a few other intersting comparisons but I won't go into all of the nitty gritty details. Overall, I'm very pleased because I learned a lot. Lets discuss!

Wednesday, January 10, 2007

Simulating classes in C

I have to admit, I like to program in C. In some ways, it is simpler than some of the newer, high level languages, and I really enjoy being closer to the machine code. Perhaps I'm a bit obsessive about efficiency. Still, I sometimes long for classes and objects in my C programs, (This is where you tell me that I should use C++ or Objective-C.) so I decided to figure out how I could simulate classes in plain old ANSI C.

It turned out to be quite easy. After I read up on function pointers, I created a struct which contains references to functions (kind of like class methods). Here's a simple example of what I'm talking about:


#include<stdio.h>

typedef struct {
  int (*test)(int); /* This is the function pointer */
} functionholder;

/* At runtime, I will point to this function */
int thefunction(int x) {
  printf("thefunction says %i\n", x);
  return 0;
}

int main() {
  functionholder fholder;
  fholder.test = &thefunction;
  (*fholder.test)(7);
  return 0;
}

When I run the above, the program prints "thefunction says 7".
From there, I decided to create a family of structures which would simulate classes, objects, and allow polymorphic function calls.

One way to divide up data and methods in an OO way is to say that the class dictates the methods which can be used with the data in an object instance. So we could say that a class contains a list of functions. Using function pointers, the method called can be changed at runtime, as long as all of the functions have the same stack profile. Have I lost you yet? So what I needed to do was create a generic function prototype which abstracts the parameter list into a common format. While I was at it, I decided to lose the restriction that methods return only one value, so the results of a method call will be stored in a structure which is passed in as a parameter. (The method should probably be called a procedure instead of a function.) Perhaps it would be simpler to show you the code.


typedef struct {
  void* type; /* This will point to a JSClass. */
  void* data;
} JSObject;

typedef struct {
  int object_count; /* number of objects in the array */
  JSObject** objects; /* An array of pointers to JSObjects */
} JSListOfObjects;

typedef struct {
  char* name;
  /* Each JSClass contains a list of pointers to the methods which belong
   * to that class. */
  int method_count;
  /* strings naming the functions (allow method lookup by string) */
  char **method_names;
  /* An array of pointers to functions */
  void (**methods)(JSObject* x, JSListOfObjects* y, JSListOfObjects* z);
} JSClass;

I was going to discuss the functions I created to streamline "class" method invocation, but this post is getting a bit long. So I've decided to try something fun. I'm going to ask you, gentle reader, to suggest a design for a function which simplifies invocation of an object method. I will mail $1 (USD) to the person who posts the "best" working solution in the comments below. (I'll be compiling using gcc -ansi. Oh, and US residents only, I don't want to run into any strange rules.). In addition to a dollar, you'll have won bragging rights for the first of my blog challenges. After a week or two, I'll post my solution and we can all compare notes. Happy coding!

Wednesday, November 01, 2006

A Steganography Scheme - Part 2

My text based steganography program is here! I completed the program yesterday, and Vanessa came up with a great name: Steganosaurus. Get in touch with me if you would like me to send you the program and the source code. I'm still considering publishing it on an open source hosting site.

You may recall my previous post about this steganography scheme, and I said I would tell you how it all works. So here goes:

My steganography program needs 4 pieces of information to embed or extract a secret message, it needs a file which will be converted, the base, the shift value, and a filename to which the converted message will be stored. The base and shift need a bit of explanation.

Base: All data on a computer is a number, and a number can be expressed multiple ways. I wrote about this in "A Steganography Scheme - Part 1". So the base tells Steganosaurus how to express the data. Should each number in the source file be converted into a series of values between 0-10, 0-50, 0-200? The choice is yours.

Shift: My program outputs a range of values from the source file, and each of them is between 0 and Base. How does this become readable text? That is the purpose of the shift, it is a value added to each piece of the converted file to make it into a character. So the result of the whole process is the contents of the original file expressed as numbers in the range Shift to Base + Shift. These numbers are converted into Unicode characters (UTF-8) so the end result is a readable file. You can see an example in my wiki entry about Steganosaurus. (It's probably easiest to see an example.) By using different shift values, you can hide the data from your original file in text from any language in the world.

So check it out, give it a shot, and ask me to send Steganosaurus to you.