Imposterrific - a blog by Jeff Scudder: 2008

Wednesday, November 19, 2008

Mashup Camp 8

I participated in MashupCamp8 and created a simple mashup in a few hours which I have now open-sourced. It is a blog post editor, or article editor if you prefer, which finds key words in your text as you are writing and displays web search results for those key words next to the editor. Here's my elevator pitch:

When writing an article or blog post, I'll often note places where I should add a link and go back later to add them in. Sometimes searching for good resources to link to uncovers new information which I wish I had known while I was writing the article.

As I was walking around here at Mashup Camp, I saw a few web services which would allow me to do some data mining on raw text (like this blog post I'm writing) and search the web for relevant information.

I created a simple Ajax editor, which is hosted on Google App Engine. It takes the article's text and sends it securely (over HTTPS) to Open Calais, which extracts key words, then disambiguates, dedupes, and scores them for relevancy. Using this semantic web data from Calais, the editor then performs a web search on each of the keywords and requests results from the Yahoo Search BOSS API in the JSON format. The text analysis and web search are done automatically when the editor detects that you have stopped typing. To avoid excessive and distracting search refreshes, the editor will wait between three and eight seconds before performing the search.
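The "wait until typing stops" behavior can be sketched in a few lines. This is only an illustration, not the editor's actual code (which runs as JavaScript in the browser). It assumes one reading of the three-to-eight second rule: search after three quiet seconds, but never defer a search more than eight seconds after the first unsearched keystroke. The IdleTrigger name and its methods are made up for this sketch, and timestamps are passed in so the logic can be checked without a real clock.

```python
class IdleTrigger:
    """Decides when to run a search, given keystroke times in seconds."""

    def __init__(self, min_idle=3.0, max_wait=8.0):
        self.min_idle = min_idle    # quiet time that counts as "stopped typing"
        self.max_wait = max_wait    # longest a pending search may be deferred
        self.first_pending = None   # time of the oldest keystroke not yet searched
        self.last_key = None        # time of the most recent keystroke

    def key_pressed(self, now):
        if self.first_pending is None:
            self.first_pending = now
        self.last_key = now

    def should_search(self, now):
        """Return True (and reset) if a search should run at time `now`."""
        if self.first_pending is None:
            return False            # nothing typed since the last search
        idle_long_enough = now - self.last_key >= self.min_idle
        waited_too_long = now - self.first_pending >= self.max_wait
        if idle_long_enough or waited_too_long:
            self.first_pending = None
            return True
        return False
```

A page would call key_pressed from its keystroke handler and poll should_search from a timer.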

You can try it out here:
https://sippycode.appspot.com/article_editor.

Download the source code here.

This simple little mashup was well received, and I think part of the reason is the potential for growth. The fundamental idea is to have an environment in which relevant data is brought to you as you work. You don't even need to go info hunting; our computers can find relevant information and bring it to you automagically.

For this particular mashup app, I had envisioned a few additional features, the addition of which is left as an exercise to the reader if you should so choose.

A WYSIWYG editor to create rich HTML instead of just plain text. (I looked at TinyMCE but ran out of time.)
Search on combinations of keywords instead of just individual items identified by Calais.
Search other information sources, like news results, images, videos.
Search on multiple search engines. (Google has an easy to use Ajax search API, but I wanted to try something that was totally new to me.)

Tuesday, November 11, 2008

XML Library with Versioning

I recently created a simple Python library for converting objects to and from XML. Code samples up front: here's how you would define some classes to represent some hierarchical XML:

# Leaf elements come first, because a class body can only refer to
# classes which have already been defined.
class Title(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}title'
  title_type = 'type'

class Content(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}content'

class Url(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}url'

class Link(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}link'
  rel = 'rel'
  address = 'href'

class Entry(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}entry'
  links = [Link]
  title = Title
  content = Content

class AtomFeed(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}feed'
  title = Title
  entries = [Entry]

Now for the whys and hows.

For the past few years I've been working with Web Services and most of them use XML to represent the data (though I hope JSON catches on more widely). There are some great XML libraries out there, and my library is based on one of them (ElementTree). XML parsing is certainly nothing new, so why create a new one?

The Why

There are a few limitations with the XML parsing approaches I've used in Python:

XML structure isn't documented or available using help()
No autocomplete for finding elements in the XML
If the XML changes in a new version of the web service, my code needs to be rewritten
My code interacting with the XML is verbose

Source code can provide a wealth of information, but parsed XML doesn't have the same level of information richness as source code. Between tool tips in IDEs, auto-generated documentation, and autocomplete, having classes loaded for your XML models can bring the tree traversal logic closer to your fingertips. Many software development tools are optimized for working with predefined classes rather than generic XML objects.

However, one of the biggest drawbacks to representing each type of XML element with its own class is that you end up needing to write lots of class definitions. For this reason I've tried to make the XML class definitions as compact as possible. Specifying a simple XML class only takes two lines of code. For each type of sub-element and each XML attribute, you can add one line of code. You don't need to declare all of the elements or attributes either. The XmlElement will preserve all of the XML which it parses. If there are class members which correspond to a specified sub-element, the element will be placed in that member. Any unspecified elements will be converted to XmlElement instances. You can search over all XML elements (both anticipated members and unanticipated generic objects) using the get_elements method. XML attributes are handled in a similar fashion and can be searched using get_attributes.

I've saved the most distinctive feature of this library for last. Sometimes web services change the XML definition, thereby breaking your code. If it is something small like a change in XML namespace or changing a tag, it seems like such a waste to have to edit lines upon lines of code. To address this kind of problem, this XML library supports versioning. When you parse or generate XML, you can specify the version of the available rules that you'd like to use. You can use the same objects with any version of the web service.

To use versioning, write a class definition with tuples containing the version specific information:

class Control(XmlElement):
  _qname = ('{http://purl.org/atom/app#}control', #v1
            '{http://www.w3.org/2007/app}control') #v2
  draft = Draft
  uri = 'atomURI'
  lang = 'atomLanguageTag'
  tag = ('control_tag', 'tag') # v1, v2

class Draft(XmlElement):
  _qname = ('{http://purl.org/atom/app#}draft', 
            '{http://www.w3.org/2007/app}draft')

If you create an instance of the Control element like this:

c = Control(draft=Draft('yes'), tag='test')

Then you can generate XML for each version like this:

c.to_string(1)

returns

<control xmlns="http://purl.org/atom/app#" 
    control_tag="test">
  <draft>yes</draft>
</control>

while

c.to_string(2)

returns

<control xmlns="http://www.w3.org/2007/app" 
    tag="test">
  <draft>yes</draft>
</control>

Note the difference in XML namespaces in the above. I also added an example of an attribute name which changed between versions, though "tag" doesn't actually belong in AtomPub control (so don't go trying to use it m'kay).

Since this library is open source, you're free to examine how it works and use it however you like. Allow me to highlight a few key points.

The How

Earlier I showed how to define XML element classes which look for specific sub elements and attributes and convert them into member objects. I also mentioned that this XML library handles versioning, meaning that the same object can parse and produce different XML depending on a version parameter. Both of these are accomplished by creating class level rule sets which are built up using introspection the first time an XML conversion is attempted.

In pseudo-code it works like this.

XML --> object
  - find out the desired version
  - is there an entry for this version in _rule_set?
  - if not, look at all XML members of this class 
      in _members
  - create XML matching rules based on each member's type
      (and store in _rule_set so we don't need to generate 
       the rules again)
  - iterate over all sub-elements in the XML tree
  - sub-elements and attributes which are in the rule set 
      are converted into the declared type
  - sub-elements and attributes which don't fit a rule are
      stored in _other_elements or _other_attributes

When generating XML the process is similar but slightly different.

object --> XML
  - create an XML tree with the tag and namespace for this 
      object given the desired version
  - look at all members of this class in _members
  - tell each member to attach itself to the tree using 
      its rules for the desired version
  - iterate through _other_elements and _other_attributes 
      and tell each to attach to the XML tree

Armed with the above explanation, understanding the source code should be a bit easier.
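The flow above can be sketched roughly as follows. This is not the library's source: MiniElement, pick, and _rule_cache are names invented for the illustration, only XML attributes are handled (sub-elements are left out to keep it short), and the standard library's ElementTree does the parsing. The part worth noticing is that the rules for a version are built on first use and cached on the class.

```python
import xml.etree.ElementTree as ET


def pick(spec, version):
    """Return the name for `version` (1-based) from a string or a tuple."""
    if isinstance(spec, tuple):
        return spec[min(version, len(spec)) - 1]
    return spec


class MiniElement:
    _qname = None   # tag name, or a tuple holding one name per version
    _attrs = {}     # member name -> XML attribute name (or tuple of names)

    def __init__(self, text=None, **members):
        self.text = text
        self.other_attributes = {}   # attributes which matched no rule
        for name, value in members.items():
            setattr(self, name, value)

    @classmethod
    def _rules(cls, version):
        # The cache lives on each class and is filled on first use.
        cache = cls.__dict__.get('_rule_cache')
        if cache is None:
            cache = {}
            cls._rule_cache = cache
        if version not in cache:
            cache[version] = {pick(spec, version): member
                              for member, spec in cls._attrs.items()}
        return cache[version]

    @classmethod
    def from_string(cls, xml, version=1):
        tree = ET.fromstring(xml)
        obj = cls(text=tree.text)
        rules = cls._rules(version)
        for xml_name, value in tree.attrib.items():
            if xml_name in rules:
                setattr(obj, rules[xml_name], value)
            else:
                obj.other_attributes[xml_name] = value   # kept, not dropped
        return obj

    def to_string(self, version=1):
        tree = ET.Element(pick(self._qname, version))
        tree.text = self.text
        for xml_name, member in self._rules(version).items():
            value = getattr(self, member, None)
            if value is not None:
                tree.set(xml_name, value)
        for xml_name, value in self.other_attributes.items():
            tree.set(xml_name, value)
        return ET.tostring(tree, encoding='unicode')


class Control(MiniElement):
    _qname = ('{http://purl.org/atom/app#}control',   # v1
              '{http://www.w3.org/2007/app}control')  # v2
    _attrs = {'tag': ('control_tag', 'tag')}          # v1, v2
```

Parsing a version 2 document with Control.from_string(xml, 2) and writing it back with to_string(1) renames the tag attribute and switches the namespace, while attributes the class never declared are carried along untouched.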

Wednesday, November 05, 2008

Vim Line Length

I happen to be a fan of the vi text editor. I also happen to be a fan of code style guidelines. And sometimes it is very handy to know if a line in your source code has gone over eighty characters long. Some style guides still recommend a max line length of eighty characters, a tradition dating all the way back to IBM punch cards, circa 1928. For an even more impressive feat of long term engineering influence look up how the size of Roman war chariots determined the size of NASA space shuttle booster rockets. But I digress.

To mark the characters which are over the eighty character limit, enter the following either within vim or in your .vimrc file.

:match ErrorMsg '\%>80v.\+'

I've noticed that some Blogger templates tend to display fifty-five characters in a fixed width format (as in a <pre> block or <code>). To edit code which I plan to post on Blogger, I sometimes turn on

:match ErrorMsg '\%>55v.\+'
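If you want the same check outside the editor (say, before posting a snippet), a small script can report the offending lines. This is a rough sketch of a different technique than the vim match above: the overlong_lines function is a name made up here, tabs are expanded on the assumption of eight-column tab stops, and double-width characters are not accounted for.

```python
def overlong_lines(text, limit=80, tabsize=8):
    """Return (line_number, width) for every line wider than `limit` columns."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        width = len(line.expandtabs(tabsize))   # count tabs as screen columns
        if width > limit:
            found.append((number, width))
    return found
```

Calling overlong_lines(source, limit=55) gives the Blogger-friendly version of the check.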

Saturday, October 25, 2008

Atom-Powered Robots Run Amok

I imagine a grand total of two of my readers will recognize the title of this post. (Hint: it's from RFC 5023.) But it seemed an appropriate title for a recent addition to a local office building where some Android developers tend to congregate.

(Note the tiny t-shirt logo.)

Ai luff mai Android.

Wednesday, October 15, 2008

Twitter Client

As a proof of concept for using the sippycode HTTP library which I wrote about in my last post, I decided to create a simple text console client for Twitter. Download the Twitter terminal application here.

Twitter's RESTful API is quite simple, and I wrote an open source library for Twitter based on the sippycode HTTP library in a few minutes. Here's an example of posting a new update (tweeting):

import sippycode.http.core as http_core
import sippycode.auth.core as auth_core

class TwitterClient(object):

  def __init__(self, username, password):
    self._credentials = auth_core.BasicAuth(username, 
                                            password)

  def update(self, message):
    request = http_core.HttpRequest(method='POST')
    http_core.parse_uri(
            'http://twitter.com/statuses/update.xml'
        ).modify_request(request)
    request.add_form_inputs({'status': message})
    self._credentials.modify_request(request)
    client = http_core.HttpClient()
    response = client.request(request)
    return response

In the above, the client sends an authenticated POST to the updates URL. Using the TwitterClient in your code looks like this:

client = TwitterClient('my-username', 'my-password')
client.update('Try out this Twitter client: http://oji.me/wP')

To try out this Twitter console app, unpack the download and run sippy_twitter.py. With it, you can update your status on Twitter or read the updates from your friends. When reading, the client displays five updates at a time, since showing more at once would likely cause some to scroll off the top of the screen (assuming the terminal displays twenty-five lines).

This simple application was designed to be a proof of concept, but it's really grown on me. Cycling through all of my friends' updates doesn't require any scrolling, and it feels snappier than the web interface. It seems like others are enjoying this terminal client too.

There are quite a few ways that this client could be improved, so there's plenty of opportunity to pitch in if you are interested. I have received feature requests from friends who previewed this app, such as: support command line arguments which will allow the client to perform updates when being run from another program, show a running countdown from 140 characters as you are typing your update (could probably be done using ncurses), ability to follow users, and read updates from just one user. If you'd like to participate in any of these, let me know in the comments.

Fire up your terminal and give this client a try. Why not post an update to @jscud right now?

Monday, October 13, 2008

An Open Source Python HTTP Client

At Super Happy Dev House 27, I made significant progress on an open source library for making HTTP requests in Python. For the past few years I've been working with web services and APIs (SOAP and REST, specifically AtomPub, among others), and I wanted to create an HTTP library which is simple, clean, and precise. Python has a couple of great HTTP libraries already, but one of them is a bit too low level (httplib) and the other is too high level (urllib2).

For example, in httplib you call a method to send data as if you are writing to a file (httplib uses sockets, after all). Required HTTP headers like Content-Length are not calculated for you. You'll need to handle cookies and redirects on your own. On the plus side, you get full control of what is being sent. The higher level library, urllib2, is built on top of httplib. It adds some handy abstractions, like calculating the Content-Length, but it also has some limitations. I haven't yet been able to figure out how to perform a PUT or DELETE with urllib2.
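For readers on Python 3, where urllib2 became urllib.request, here is a minimal sketch of building PUT and DELETE requests. The Request constructor there accepts a method argument (available since Python 3.3). The build_request helper and the example.com URL are placeholders invented for this illustration, and nothing is sent over the network.

```python
import urllib.request


def build_request(url, method, body=None, content_type=None):
    """Build (but do not send) a request which uses the given HTTP method."""
    headers = {}
    if content_type:
        headers['Content-Type'] = content_type
    return urllib.request.Request(url, data=body, headers=headers,
                                  method=method)


put = build_request('http://example.com/entry/1', 'PUT',
                    b'<entry/>', 'application/atom+xml')
delete = build_request('http://example.com/entry/1', 'DELETE')
# Passing either object to urllib.request.urlopen() would send it.
```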

When making HTTP calls to web services, there are often a large number of HTTP headers, URL parameters, and components to the request. Making a request feels like making a function call in most HTTP libraries. In the past, I've wrapped these functions with successive layers containing more and more function parameters. For example, in a request to send a photo and metadata to PicasaWeb, you need to include an Authorization token, Content-Type specifying a MIME-multipart request and the multipart boundary, and a multipart payload consisting of the Atom XML describing the photo and the photo's binary data. If you add in the ability to specify other headers and URL parameters, your function call might look like this:

def post_photo(url, url_parameters, escape_parameters, 
               photo_mime_type, photo_file_handle, 
               photo_file_size, metadata_xml, 
               metadata_mime_type, auth_token, 
               additional_http_headers):
  ...

# Sets the request's Host, port, and uri. 
# Makes the request into a MIME multipart request, 
# adjusts the Content-Type and recalculates 
# Content-Length.
# Sets the Authorization header
post_photo('http://picasaweb.google.com/data/'
           'feed/api/user/userID/albumid/albumID', None, 
           False, 'image/jpeg', photo_file, photo_size, 
           atom_xml, 'application/atom+xml', 
           client_login_token, None)

To use the above, you have to gather all of the information in one place, and make the function call. There are cases where you want a design like the above.

However, more and more I think of ways the program could be more cleanly structured if this information could be compartmentalized. This new library relies on an HttpRequest object which various parts of the program modify. Once all of the modifications have been applied, the fully constructed request is passed to an HttpClient which communicates with the server using httplib or urlfetch if you happen to be on Google App Engine. (Support for more HTTP libraries is certainly possible.)

The photo posting example from above could look something like this. Keep in mind that these steps could be carried out in a different order in different segments of code.

photo_post = HttpRequest(method='POST')
# Sets the Authorization header
client_login_token.modify_request(photo_post)
# Adds to the body and calculates Content-Length, 
# sets the Content-Type.
photo_post.add_body_part(atom_xml, 
    'application/atom+xml')
# Makes the request into a MIME multipart request, 
# adjusts the Content-Type and recalculates 
# Content-Length.
photo_post.add_body_part(photo_file, 'image/jpeg', 
    photo_size)
# Sets the request's Host, port, and uri. 
parse_uri('http://picasaweb.google.com/data/'
          'feed/api/user/userID/albumid/albumID'
          ).modify_request(photo_post)

In fact, the above code could make up the body of the post_photo function described in the first code snippet.
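To make the "many small modifiers, one request object" idea concrete, here is a self-contained sketch of the pattern. It is not the sippycode implementation: Request, Target, BasicCredentials, FormBody, and build are all names invented for this example, and the request is only assembled, never sent.

```python
import base64
from urllib.parse import urlencode, urlsplit


class Request:
    """A plain container which different parts of a program fill in."""

    def __init__(self, method='GET'):
        self.method = method
        self.host = None
        self.path = None
        self.headers = {}
        self.body = b''


class Target:
    """Sets the host and path from a URL."""

    def __init__(self, url):
        parts = urlsplit(url)
        self.host = parts.netloc
        self.path = parts.path or '/'

    def modify_request(self, request):
        request.host = self.host
        request.path = self.path


class BasicCredentials:
    """Sets an HTTP Basic Authorization header."""

    def __init__(self, username, password):
        self.username = username
        self.password = password

    def modify_request(self, request):
        raw = f'{self.username}:{self.password}'.encode('utf-8')
        token = base64.b64encode(raw).decode('ascii')
        request.headers['Authorization'] = 'Basic ' + token


class FormBody:
    """Sets a form-encoded body and the headers which describe it."""

    def __init__(self, fields):
        self.fields = fields

    def modify_request(self, request):
        request.body = urlencode(self.fields).encode('ascii')
        request.headers['Content-Type'] = 'application/x-www-form-urlencoded'
        request.headers['Content-Length'] = str(len(request.body))


def build(method, *modifiers):
    """Apply each modifier in turn and return the finished request."""
    request = Request(method)
    for modifier in modifiers:
        modifier.modify_request(request)
    return request
```

Because each modifier touches only its own fields, the pieces can be applied in any order and from different parts of a program.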

I created an open source project for this and other small projects called sippycode (a play on sippy cup). This is a place where code can grow up.

Sunday, September 28, 2008

In Praise of screen

I've used screen for a few years now, but I only recently learned about one of its highly helpful features. As I was using my XO laptop in text only mode (ctrl-alt-fn-2), I was using screen to simulate multiple terminals, and I needed to copy and paste text between them. In the past, I've usually used screen when ssh-ing into a machine, and PuTTY (my ssh client of choice) provided copy and paste, so I had never needed screen's system.

It turns out, screen's built-in, cross-window copy-paste system is a breeze to use. Press ctrl-a [ to enter copy mode, press enter to mark the start point and enter again to mark the end point. You have now copied the text. Switch to the desired window, and paste in the text using ctrl-a ].

This feature is especially handy in the XO laptop, where I've never been able to figure out how to copy and paste in the Terminal Activity.

For me, screen's most useful feature has been the ability to detach and reattach to a session which continues to run on the server. If I lose my ssh connection, all of my processes continue to run, and I can reattach to my screen session as if nothing ever happened.

Screen can be difficult to understand if you've never seen it in action, so I recommend watching this video. You can also learn more in this tutorial.

Thursday, September 18, 2008

XO Give One Get One - Round Two

As I was looking for some information on running the XO laptop in text mode, I stumbled across this page in the OLPC wiki:

One Laptop per Child is launching its second 'Give One, Get One' (G1,G1) program starting in November, 2008, delivering its XO Laptop globally via Amazon.com. Although the first iteration of the 'G1G1' program was extremely successful and sold more than 185,000 laptops, the delivery of the laptops in the USA did not run as smoothly as we anticipated. Selling the laptops on Amazon.com will provide us with the resources to process and ship the laptops globally in a timely fashion.

The laptop's operating system will be Linux-based (it will not dual-boot Windows and Linux, contrary to some reports).

If you've been thinking about getting one of these little green machines, it looks like the window will open once again. I wrote about the program last year. (Here is a list of all posts I've made on the XO Laptop.)

As for running in text mode, I've settled for using the terminal by pressing ctrl-fn-alt-1. To switch back to the normal view, press ctrl-fn-alt-3. For some reason, I can only start screen as root.

Sunday, September 14, 2008

The Ascension of JavaScript

I added a couple of new things to my little JavaScript library (q12). It turns out there was a flaw in the library when making HTTP requests which included custom HTTP headers in Firefox 3 (and maybe others). If you ever run into (NS_ERROR_FAILURE) [nsIXMLHttpRequest.setRequestHeader], try opening the HTTP request first (as I recently learned). I've always said that writing this library was for education more than any anticipated serious use.

I also added a function for ARC4 encryption, which doesn't provide strong security, but it is still used in some places.

This has been quite a big month for JavaScript. In just a couple of weeks V8, TraceMonkey, and SquirrelFish Extreme have leapfrogged one another and improved the state of JavaScript execution speed. It's gotten me thinking.

The more I use JavaScript, the more I like it. The language itself is extremely simple, I would argue even simpler than C - it's the APIs in browsers which are sometimes complex and inconsistent ;-) With all of these optimizations, JavaScript is looking better and better as a language in which to implement a new dynamic language.

Between the Just In Time compiler (JIT) and the polymorphic inline cache, some state of the art tools are coming to bear on one of the most widely available programming languages today. JavaScript is making strides to overcome a very difficult problem: maintaining complete runtime flexibility while providing speedy execution. There are significant investments being made in JavaScript as a language, and it seems that some of these open source components should be reusable. I'm not sure how much of these libraries would stand alone, but it would be a very interesting experiment to see if a new language could be implemented in JavaScript.

Sunday, September 07, 2008

A Revived Project

q12 is back!

I'm slowly starting back up again on my note taking wiki application. I created my own version of TiddlyWiki several months ago, but then started working on other projects. I'm planning to rewrite my note taking Ajax application to run on Google App Engine, and as I was getting started I realized that there were a few things missing from the Ajax library that I had written as part of this project.

I had created my own simple unit test framework in JavaScript, and I finally got around to uploading the unit tests for the library to the open source project. I've also been learning about manipulating browser cookies from within JavaScript. Aside: Cookies in JavaScript are weird! When you say document.cookie = something, reading document.cookie doesn't give you the same thing back (the expiration, domain, and path information are squirreled away somewhere else).

I've also added a minified version of the q12 library, which weighs in at a mere 10k. Download the library today!

Wednesday, August 20, 2008

Dirt Simple CMS

I recently created an App Engine app to run www.jeffscudder.com. At the moment the code is extremely simple, and I get so few visitors to that web page that I doubt I will need anything complicated.

When I write blog posts and web pages, I have always preferred to just edit the HTML, and I have always wanted a simple content management system that just let me edit the HTML, JavaScript, CSS, etc. in the browser. Blogger comes awfully close to the perfect tool in my opinion, but it is geared towards displaying a series of posts. I wanted a landing page with links to all of the other content I put out there in the blagoweb. And I wanted to be able to host the simple web apps that I write (like the recently mentioned password generator).

With those design goals in mind, I set out to create my super simple content management system. It runs on App Engine, and the admin (me) is able to sign in to a special secret /content_manager page which lets me assign a specific blob of text to the desired URL under my domain. I can also set some basic metadata, like the content type (so that your browser knows how the content should be rendered) and cache control information, since HTTP caching is excellent and saves puppies from drowning in lakes (ok seriously it will alleviate congestion and unnecessary traffic when you want to give the same content to thousands or millions of people).
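The core of such a system is small enough to sketch. The following is not the scud-cms code: it is a plain WSGI application, with PAGES, set_page, and app as made-up names, an in-memory dictionary standing in for the datastore, and no admin sign-in at all. It only shows the idea of mapping a URL path to a blob of text plus its content type and cache control metadata.

```python
PAGES = {}   # path -> (body bytes, content type, Cache-Control value)


def set_page(path, body, content_type='text/html; charset=utf-8',
             cache_control='public, max-age=3600'):
    """Assign a blob of text and its metadata to a URL path."""
    PAGES[path] = (body.encode('utf-8'), content_type, cache_control)


def app(environ, start_response):
    """WSGI entry point: serve whatever was stored for the requested path."""
    page = PAGES.get(environ.get('PATH_INFO', '/'))
    if page is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not found']
    body, content_type, cache_control = page
    start_response('200 OK', [
        ('Content-Type', content_type),
        ('Cache-Control', cache_control),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

To try it locally, something like wsgiref.simple_server.make_server('', 8000, app).serve_forever() from the standard library would serve the stored pages.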

Editing pages through the /content_manager looks like this:

I've also decided to open source the code, and I called the project scud-cms. Since signing up for App Engine is free, you can just upload this code and start setting your own content from right there in the browser.

(P.S. The idea for this simple content manager is very similar to one of my earlier projects: Scorpion Server, with which an authorized user could set the content at just about any URL they wanted.)

Tuesday, August 12, 2008

Opera on the XO Laptop

No I'm not talking about singing.

I was looking forward to the release of Firefox 3 with great anticipation. I had tried out the beta versions and noticed the improvements in JavaScript execution speed and memory footprint. Since the XO laptop is relatively slow and memory constrained compared to many of the other computers that I work with, I was really hoping that Firefox 3 would be a big improvement over Firefox 2. It turns out, for the XO, it wasn't. The place where the limitations were most apparent was scrolling on a large web page. The page would hang for a second or so and then slowly creep down. It probably had to do with the short supply of available RAM on my happy little green machine.

As momentum was building for Firefox 3, I started to hear good things about Opera 9.51. When I ran into performance issues on Firefox and the XO, I decided to give Opera a try. Some thought I was crazy since, as Arne mentioned, the XO can be a bit slow. Yes, the XO laptop is a different environment than most other computers out there, and Opera has turned out to be an improvement.

To set up Opera on the XO, I began by downloading from the Opera web site. I selected Opera 9.51 for Linux i386, selected RedHat as the distribution in the drop down menu, then I selected RedHat 6.2, 7.1, 7.2, 7.3, 8.0, 9 and checked the box to Download this package in TAR.GZ format. Whew.

Next, unpack the archive using

tar zxvf opera-9.51.gcc4-static-qt3.i386.tar.gz

and then run

install.sh -s

From there, you can just type opera on the command line in the terminal to start it up.

Monday, August 04, 2008

Speedy Dynamic Member Lookup in C (part 2)

The comments on my previous post were highly enlightening. Joe pointed me to an article on the Polymorphic Inline Cache and its use in the SELF programming language. And Joel mentioned some different techniques used in other languages (Objective-C, LISP, C++, and others) to implement dynamic lookups. With this new information, I can see that my simple solution might fall far short of the optimal solution, which isn't all that surprising, as I'm certain more time has been spent on these real programming languages. But what I do think is interesting is the idea that a more complex solution can give better performance than a simpler design. It is as if there is a valley, with the simple solution of static, precompiled behavior on a peak of performance, standing opposite a complex design which introduces the overhead of collecting statistics on the most common paths through the code and constantly recompiles to machine code which optimizes the most used idioms. In between lies the poorly performing trough of the half solution: solutions which provide dynamic behavior, but don't go far enough to mitigate the added cost of runtime lookups. The simple idea that I've coded up sits in this trough of slowness, but I'd wager it is faster than some solutions I've seen out there.

This isn't really a great solution, but I had already started, and wanted to finish because I think it was a good learning experience. We begin with a definition of a data structure that points to an offset lookup method.

typedef struct {
  size_t (*memberLookup)(char*);
} TypedStruct;

typedef struct {
  size_t (*memberLookup)(char*);
  unsigned int age;
  unsigned int weight;
  char* name;
} Person;

size_t PersonLookup(char*);

typedef struct {
  size_t (*memberLookup)(char*);
  char* breed;
  unsigned int weight;
  unsigned int age;
  char* name;
} Dog;

size_t DogLookup(char*);

You can think of Person and Dog as subclasses of a common parent. They share a few member attributes, but the attributes are in different places within the structure. I want to write a function which can find the age member in whatever structure is passed in to it. How do we find out where a member lies within a structure? With this design, we ask the structure to tell us. The memberLookup function takes a name, and returns the offset for the desired member. Here are the lookup functions that describe the above structures:

#include "typed_struct.h"
#include <stddef.h>  /* offsetof */
#include <stdio.h>
#include <string.h>  /* strcmp */

/* Returned when a member name is not recognized. */
#define MEMBER_NOT_FOUND ((size_t)-1)

/* offsetof accounts for any padding the compiler inserts, which
   adding up sizeof values by hand does not. */
size_t PersonLookup(char* memberName) {
  if(strcmp(memberName, "age") == 0) {
    return offsetof(Person, age);
  } else if(strcmp(memberName, "weight") == 0) {
    return offsetof(Person, weight);
  } else if(strcmp(memberName, "name") == 0) {
    return offsetof(Person, name);
  } else {
    return MEMBER_NOT_FOUND;
  }
}

size_t DogLookup(char* memberName) {
  if(strcmp(memberName, "breed") == 0) {
    return offsetof(Dog, breed);
  } else if(strcmp(memberName, "weight") == 0) {
    return offsetof(Dog, weight);
  } else if(strcmp(memberName, "age") == 0) {
    return offsetof(Dog, age);
  } else if(strcmp(memberName, "name") == 0) {
    return offsetof(Dog, name);
  } else {
    return MEMBER_NOT_FOUND;
  }
}

With the data structures in place and the functions which map member names to locations, we can now write functions which dynamically look up the location of a member. This simple function displays the age of the entity which is passed in. The static variables are designed to speed up repeated calls with the same type. For example, if a Person is passed in as entity several times in a row, the location of the age member will only need to be looked up on the first call.

void GiveIntroduction(void* entity) {
  static size_t age_offset = 0;
  static size_t (*last_lookup)(char*) = NULL;
  
  if(last_lookup != ((TypedStruct*)entity)->memberLookup) {
    // In words, convert entity to a TypedStruct pointer
    // (TypedStruct*)entity
    // Find the function pointer called memberLookup
    // ((TypedStruct*)entity)->memberLookup
    // Call the function at memberLookup with "age" as the parameter
    // (*(((TypedStruct*)entity)->memberLookup))("age")
    age_offset = (*(((TypedStruct*)entity)->memberLookup))("age");
    last_lookup = ((TypedStruct*)entity)->memberLookup;
  }

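  // Step through the structure byte by byte (arithmetic on a void* is
  // not standard C), then read the unsigned int stored at that offset.
  printf("I am %u years old\n",
         *((unsigned int*)((char*)entity + age_offset)));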
}

Here is a program which uses the above structures and functions.

int main() {
  Dog nelly;
  nelly.memberLookup = &DogLookup;
  nelly.age = 4;
  nelly.name = "Nelly";
  
  GiveIntroduction(&nelly);
}

When the above is run, it should display "I am 4 years old".

Now, what is the point of all of this? In statically typed, compiled languages, the compiler usually has to know in advance where the data is and what its type is. For example, with the Dog structure, the compiler knows that age is 12 bytes from the start of the structure (on most 32-bit processors), and that its type is unsigned int. All of the offsets are calculated at compile time. By moving the offset lookup into a function which is executed at runtime, we free up the compiler from having to know where to look for a particular member. Now we can pass in any data type into GiveIntroduction and as long as the object's memberLookup function tells us where to find the age member, everything will work.

We've gained flexibility at the cost of some overhead. Setting the memberLookup on every instance of a TypedStruct uses a bit of memory, and calling the function to find the necessary offset adds a runtime cost. This example has been a bit simple, even too simple, as there are ways to provide more aggressive caching of the lookups and faster ways to perform the lookup within the XLookup functions.
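The caching trick in GiveIntroduction can be mirrored in Python using ctypes, which lays structures out the way a C compiler would. Treat this as a loose sketch rather than a port: AgeReader and member_offset are names invented here, the memberLookup function pointer is dropped because a Python object already knows its own type, and the lookups counter exists only to show how often the slow path runs.

```python
import ctypes


class Person(ctypes.Structure):
    _fields_ = [('age', ctypes.c_uint),
                ('weight', ctypes.c_uint),
                ('name', ctypes.c_char_p)]


class Dog(ctypes.Structure):
    _fields_ = [('breed', ctypes.c_char_p),
                ('weight', ctypes.c_uint),
                ('age', ctypes.c_uint),
                ('name', ctypes.c_char_p)]


def member_offset(struct_type, member_name):
    """Plays the role of PersonLookup and DogLookup: name -> byte offset."""
    return getattr(struct_type, member_name).offset


class AgeReader:
    """Reads `age` from any structure, remembering the last type seen."""

    def __init__(self):
        self.last_type = None
        self.age_offset = 0
        self.lookups = 0   # how many times the slow path was taken

    def __call__(self, entity):
        if type(entity) is not self.last_type:
            # Slow path: ask where `age` lives for this type and remember it.
            self.age_offset = member_offset(type(entity), 'age')
            self.last_type = type(entity)
            self.lookups += 1
        # Fast path: read the unsigned int at (start of struct + offset).
        address = ctypes.addressof(entity) + self.age_offset
        return ctypes.c_uint.from_address(address).value
```

Passing several Dog instances in a row costs one lookup; alternating between Dog and Person costs one lookup per call, which is exactly the weakness a polymorphic inline cache is meant to address.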

Friday, August 01, 2008

Speedy Dynamic Member Lookup in C (part 1)

If my senior class had voted for a "Most likely to try to invent a new programming language", I probably would have at least gotten a nomination for the category. From time to time I think about how I could design language constructs that maintain efficiency while allowing for more flexibility than statically linked, early bound languages like C. Some of my previous posts have been along these lines. And I've recently been thinking about performing member lookups on an object instance and how I could write a more flexible object system in C.

C has a concept of structs and types, and allows you to define a data structure, which is compiled. Here's an example:

#include <stdint.h>
#include <wchar.h>

typedef struct {
  uint16_t age;
  uint16_t weight;
  wchar_t* name;
} Person;

Now when you say Person fred, your program will grab a chunk of memory (likely from the stack) that is 8 bytes, or 12 bytes, or something similar depending on the compiler, size of pointers, etc. Now if I write fred.age in my program, this is translated to "the value of the two bytes at the start of fred's address" and if I ask for fred.weight, that turns into "the value of two bytes which are at exactly fred's address + 2". Since C is early bound (AKA static), the compiler knows that weight is always found in the same location relative to the start of fred, and as a result, the machine code generated by the compiler can be very fast and efficient.
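To check the layout on your own machine, the standard offsetof macro reports where each member sits. Here is a minimal sketch, assuming a C99 compiler. The helper name WeightByOffset is mine, and I use a plain char* for the name to keep the sketch short:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  uint16_t age;
  uint16_t weight;
  char* name;  /* char* stands in for the wide string used in the post */
} Person;

/* Reads weight the way compiled code does: start address plus a fixed
   offset that was decided at compile time. */
uint16_t WeightByOffset(const Person* p) {
  uint16_t value;
  memcpy(&value, (const unsigned char*)p + offsetof(Person, weight),
         sizeof value);
  return value;
}
```

Printing offsetof(Person, weight) and sizeof(Person) shows the numbers your particular compiler picked.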

The place where this optimization breaks down is when you have a different structure but you want to be able to use a function written for one type of object with a different object. Say, for example, we had a different structure that also used age and weight:

typedef struct {
  wchar_t* breed;
  uint16_t weight;
  uint16_t age;
  wchar_t* name;
} Dog;

Now if I want to reuse the code I wrote earlier to find out the age of a Dog, I have a small problem. You see, the age is in a different location for a Dog than a Person. If it had been in the same position, I could have cast the Dog to a Person, and then my compiled code would be looking for age at the same location. But for one reason or another, the layout of Dog is different.

In many programming languages, the difference in layout would present no problem, because the offsets aren't set in stone at compile time. Instead there is a dynamic lookup which says "where can I find 'age' for the type of data this is?" and we can add the same functionality here in C. Often this lookup is done using some kind of associative array, but I have a few slightly different ideas in mind.

C gets around this potential limitation by requiring you to specify the type of each chunk of data. So for example, if you have a Person fred and a Dog nelly, then age is converted into a different offset from the start of the data. However this means that you would need to write the same function multiple times, once for Person and once for Dog.
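To make the duplication concrete, here is a small sketch. The function names PersonAge and DogAge are mine, and char* again stands in for the string members:

```c
#include <stdint.h>

typedef struct {
  uint16_t age;
  uint16_t weight;
  char* name;
} Person;

typedef struct {
  char* breed;
  uint16_t weight;
  uint16_t age;
  char* name;
} Dog;

/* The two bodies read the same in source form, but each one is compiled
   against a different fixed offset for age. */
uint16_t PersonAge(const Person* p) { return p->age; }
uint16_t DogAge(const Dog* d) { return d->age; }
```

Neither function can be handed the other type, which is exactly the rigidity I want to get around.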

Instead of rewriting a function for each different type, I'd like to use the same function for all types which have members which match up. In other words, as long as a type has a member named age of type uint16_t and a string member called name, it should be possible to create a C function which behaves a bit more like functions in dynamic languages like JavaScript and Python.

In order to accomplish a dynamic member lookup, I'll add one thing to the types I've defined above. The first member will always be a pointer to something that allows the location of a named object to be determined. This could be a function that takes a name and produces an offset, it could be a hash table that maps names to offsets, it could point to a binary tree, or something else.

typedef struct {
  void* type;
  uint16_t age;
  uint16_t weight;
  wchar_t* name;
} Person;

typedef struct {
  void* type;
  wchar_t* breed;
  uint16_t weight;
  uint16_t age;
  wchar_t* name;
} Dog;

Now that the type information is stored along with the data, I could write a PrintAge function that would look up the correct location of the age member. Here's some pseudocode for how this more dynamic code would look:

function PrintAge(entity)
  // Note that this age offset is stored between function calls so 
  // that if the function is called with the same type as in the most recent call, 
  // the offset will not have to be looked up again. This should improve 
  // performance by avoiding the need to perform a lookup in some cases.
  static long age_offset = -1
  static void* most_recent_type
  if most_recent_type != entity.type || age_offset == -1
    age_offset = lookup(entity.type, "age")
    // Remember which type the cached offset belongs to.
    most_recent_type = entity.type
  printf("age is %d", *(entity + age_offset))
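Here is one way the idea could look in real C. This is only a sketch under my own assumptions: I picked the "function that takes a name and produces an offset" option, so the type member is a typed function pointer (LookupFn, a name I made up) rather than a void*, the lookup bodies are simple if statements, and GetAge returns the value instead of printing it so that it is easy to check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup signature: member name in, byte offset out
   (-1 if the type has no such member). */
typedef long (*LookupFn)(const char* member);

typedef struct {
  LookupFn type;  /* always the first member */
  uint16_t age;
  uint16_t weight;
  char* name;
} Person;

typedef struct {
  LookupFn type;
  char* breed;
  uint16_t weight;
  uint16_t age;
  char* name;
} Dog;

long PersonLookup(const char* member) {
  if (strcmp(member, "age") == 0) return (long)offsetof(Person, age);
  return -1;
}

long DogLookup(const char* member) {
  if (strcmp(member, "age") == 0) return (long)offsetof(Dog, age);
  return -1;
}

/* Works on any struct whose first member is a LookupFn.
   Returns the age, or -1 if the type has no age member. */
int GetAge(const void* entity) {
  static LookupFn most_recent_type = NULL;
  static long age_offset = -1;
  LookupFn current_type;
  uint16_t age;

  /* The lookup function sits at offset 0 of every entity. */
  memcpy(&current_type, entity, sizeof current_type);
  if (current_type != most_recent_type || age_offset == -1) {
    age_offset = current_type("age");
    most_recent_type = current_type;
  }
  if (age_offset < 0) return -1;
  memcpy(&age, (const unsigned char*)entity + age_offset, sizeof age);
  return age;
}
```

Calling GetAge with a Person and then a Dog triggers a fresh lookup each time the type changes, while repeated calls with the same type reuse the cached offset.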

I haven't decided yet on the best way to look up the offset based on the member name. I think it might be a close call between pointing to a lookup function with if statements, and a lookup in a presorted binary tree. Depending on the number of members, I could see it going either way. What do you think, what is the best way to implement a dynamic member lookup?

Wednesday, July 23, 2008

Password Generator part 2

In a previous post, I mentioned that I had designed a simple password generator for use with the myriad of websites on which I have accounts. Rather than store passwords and carry them around with me, I've decided to carry around the code for a password generator in my head, so that I can generate difficult-to-guess passwords using a few easy-to-remember pieces of information. Namely these are a reusable master password, my username on the site, and the domain name of the site (like twitter.com for example).

The generator algorithm takes these three pieces of information as strings of text (ASCII characters to be exact), and uses them to populate three pseudo-random data streams (I used ARC4 for the pseudo-random algorithm because it is easy to memorize). These three streams are combined to create the characters in the password. For more details, see the list of steps in the first post about the password generator, or better yet, take a look at the source code.

I have uploaded the password generator here, so if you'd like to use it, feel free.

In order to account for some websites which do not allow special characters or passwords that are thirty characters long, I created a settings file which has special rules for some websites. If there are any websites that you would like me to add to the settings, please let me know in the comments.

Now for a disclaimer: I'm not entirely sure that these passwords would stand up to cryptanalysis. It might be possible to figure out the inputs (the three secrets). So I recommend just using it for websites which are not too sensitive. I'm just using it on social news websites at the moment.

Someday this might all be made unnecessary through the use of OpenID or some other authentication solution. I'm looking forward to it.

Saturday, June 28, 2008

Claire is here

Our daughter was born June 23rd at 2:29 AM and weighed in at a whopping 9 pounds, 3 ounces. She was 20.25 inches long, so it seems like Trevor was closest with his guess. She's absolutely wonderful. For now, we're quite busy keeping up with her, so I don't anticipate much writing in the near future.

The first picture (Claire sleeping) was taken by my good friend Matt.

Tuesday, June 10, 2008

Simple Password Generator

My most recent little side project is a web page for generating passwords. It's a very simple page, which takes a few inputs, generates a pseudo-random stream from them, and produces a password using printable characters. Now now, hold it down. I can hear you from way over here. "But Scudder, aren't there like 40,000 password generators out there? Won't a web page be much slower at complex calculations than an installed desktop application? Wouldn't a truly random password stored in an encrypted keystore be a more secure way to handle passwords?" Well yes, that all tends to be true. The idea here, though, is to be able to regenerate passwords that I use for non-critical accounts (you know, for sites like digg.com and reddit.com) on whichever computer I happen to be using. In fact, even if this password generator web app is for some reason unavailable, I made sure that it is simple enough that it all fits in my head. In a pinch, I could rewrite it from memory.

So here's how this little app generates passwords:

Ask for a username, password, and domain (like example.com).
Generate an RC4 stream for each of the three inputs above.
Throw away the first 1,000 bytes from the three streams.
Get three sets of the desired number of pseudo random bytes, one for each key stream. And xor each group of three bytes together to get single pseudo-random values. For example, if you want a 15 character password, get 15 bytes from each of the three keystreams.
Translate the bytes into printable characters, or alphanumeric characters, by taking each pseudo-random byte modulo the length of a character table and using the result as the index to look up a character in that table.
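The steps above can be sketched in C roughly like this. Treat it as an illustration rather than the generator's actual source: the function names are mine, I am assuming each input string is used directly as the RC4 key for its stream, and the character table is passed in by the caller.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
  unsigned char s[256];
  unsigned char i, j;
} Rc4;

/* Standard RC4 key scheduling. key_len must be at least 1. */
void Rc4Init(Rc4* rc, const char* key, size_t key_len) {
  int n;
  unsigned char j = 0, tmp;
  for (n = 0; n < 256; n++) rc->s[n] = (unsigned char)n;
  for (n = 0; n < 256; n++) {
    j = (unsigned char)(j + rc->s[n] + (unsigned char)key[n % key_len]);
    tmp = rc->s[n]; rc->s[n] = rc->s[j]; rc->s[j] = tmp;
  }
  rc->i = 0;
  rc->j = 0;
}

/* Produces the next byte of the pseudo-random stream. */
unsigned char Rc4Next(Rc4* rc) {
  unsigned char tmp;
  rc->i = (unsigned char)(rc->i + 1);
  rc->j = (unsigned char)(rc->j + rc->s[rc->i]);
  tmp = rc->s[rc->i]; rc->s[rc->i] = rc->s[rc->j]; rc->s[rc->j] = tmp;
  return rc->s[(unsigned char)(rc->s[rc->i] + rc->s[rc->j])];
}

/* Writes length characters plus a terminating NUL into out.
   All three inputs and the table must be non-empty strings. */
void GeneratePassword(const char* username, const char* master,
                      const char* domain, const char* table,
                      size_t length, char* out) {
  const char* inputs[3];
  Rc4 streams[3];
  size_t table_len = strlen(table);
  size_t k, n;

  inputs[0] = username;
  inputs[1] = master;
  inputs[2] = domain;
  for (k = 0; k < 3; k++) {
    Rc4Init(&streams[k], inputs[k], strlen(inputs[k]));
    /* Throw away the first 1,000 bytes of each stream. */
    for (n = 0; n < 1000; n++) Rc4Next(&streams[k]);
  }
  for (n = 0; n < length; n++) {
    /* XOR one byte from each stream, then map it into the table. */
    unsigned char b = (unsigned char)(Rc4Next(&streams[0]) ^
                                      Rc4Next(&streams[1]) ^
                                      Rc4Next(&streams[2]));
    out[n] = table[b % table_len];
  }
  out[length] = '\0';
}
```

One thing to keep in mind: unless the table length divides 256 evenly, taking the byte modulo the table length makes the characters near the start of the table slightly more likely than the rest.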

A couple of the items above need some clarification. First is RC4. RC4 is a well known algorithm for generating a pseudo-random stream of bytes. It's not really up to snuff in terms of high security these days, but it is really simple. So simple in fact, that it is easy to remember the steps, and therefore, I chose it for this all-in-my-head password generator.

The second detail you would need to know in order to reproduce this generator is the set of tables that I use to translate pseudo-random bytes into characters. I created the first table by typing in characters beginning on the first row of a US keyboard, then I repeated the first row while holding the shift key. For example, the table begins with: `1234567890-=~!@#$%^&*()_+qwertyuiop[]QWERTYUIOP{}...

It turns out that some websites don't allow special characters like % and & in passwords. Why they don't is beyond me. So I created a table using the same method as above, but it only contains alphanumeric characters. Like this: 1234567890qwertyuiopQWERTYUIOP... For websites which don't allow special characters, you can choose to use the alphanumeric table to translate the pseudo-random bytes into a password which is usable on the website.

I must sign off for now, so stay tuned for part two.

Saturday, May 31, 2008

Upcoming Arrival

I imagine most of my regular readers have already found this out through other channels, but we're expecting. (Really it is Vanessa who is expecting, I have never really figured out why people say "we".) The bouncing baby bundle should be arriving in the very near future, so we've decided to guess the delivery date, weight, and length. The due date is June 22 and the doctor will usually try to induce labor if the baby hasn't arrived a week after the due date. Please write your own guess in the comments.

Here are the guesses we have so far:

Jeff
- date: June 17
- weight: 7 pounds 8 ounces
- length: 21 inches

Vanessa
- date: June 15
- weight: 7 pounds 7 ounces
- length: 21 inches

My mom
- date: June 18
- weight: 7 pounds 6 ounces
- length: 21 inches

Vanessa's mom
- date: June 15
- weight: 6 pounds 15 ounces
- length: 19.5 inches

Vanessa's dad
- date: June 18
- weight: 8 pounds 6 ounces
- length: 20 inches

Sister
- date: June 15
- weight: 7 pounds 4 ounces
- length: 22 inches

The person who guesses the closest may receive a major award.

Tuesday, May 20, 2008

The Big Remodel

Over the past year, we've been remodeling our first house. It has been a very big job encompassing every room, and we did most of the work ourselves with help from family and friends. We hired out for some of the more complicated tasks, and we could not have done this without the help of many people (thank you!). I have a few hundred pictures of the entire remodeling process, so I thought it would make sense to show just a few pictures from the living room.

We began by removing all of the carpet in the house. In this first picture, you can see the front door as Vanessa is cleaning the floor. The rest of the pictures are taken from the front door (on the right) looking towards the back door (to the left).

When you first walked in the front door, you used to see this big structure which contained a couple of closets and was made of wood paneling. It created a short hallway which you would walk through to get to the living room from the front entry. We decided to remove this completely to create one great room.

After a significant amount of demolition, the closets are gone along with all of the wood paneling. Now you can see the old sliding glass doors in the back of the house from the front door. We replaced these with French doors. The water bottles are marking large nails embedded in the cement which we later cut out with a circular saw. (fun!)

Fast-forward a few months, and we have hardwood floors, new drywall, paint, and we're moved in.

The master bedroom was actually the first room we finished, followed by the kitchen, then this, the living and dining room. I'll probably post more pictures at some point in the future. There are still a few details left to do, but the big remodel is basically finished and it feels very very good to be so close to completely done.

Sunday, May 04, 2008

Not programming

So many posts related to programming recently. Time to change it up.

A Haiku

Tough to be funny,
because sometimes you fall flat.
Still it's worth trying.

A Fibo

Plain.
Bland.
Simple.
Serious.
Stay away from jokes.
Impress others with knowledge,
but sometimes it's dull.
Spice it up!
Let loose!
Try
puns.

A Joke

What do you call a joke that stinks?
Pungent.

A Better Joke

What's another name for a beach bum?
A tangent.

An Even Better Joke

What is equivalent to sine divided by cosine?
A beach bum.

See what I mean about falling flat ;-)

Wednesday, April 30, 2008

Creating and Using .so Files With gcc

And what is an .so file you might ask? Those from a Windows background might know them as .dll files, but that still doesn't answer the question.

Normally, when you compile a program, the system takes all of the pieces of code that you've written, converts them into something the computer can understand (machine code) and glues them together. You might compare the process to building a pyramid. All of the stones that you've sculpted are joined together and stacked up. Just like with a pyramid, if you want to change one of the stones, everything that sits on top of it needs to be adjusted. With a C program, making changes in a piece of code means lots of pieces need to be recompiled and reglued together.

If you use dynamic linking, shared libraries, whatever you'd like to call them, things work a bit differently. With dynamic linking, the pieces are not all glued together or stacked up, instead they are plugged in when they are needed. As a result, changes to a piece of code don't require that lots of other pieces get reshaped (recompiled) to fit. With dynamic linking, it is possible to change a piece of code while the program is running. Funny, that sounds a bit like a scripting language.

Here's an example. Let's say I wrote a library with a function called SpecialPrint. (It's like printf but different!) Here is the code for speciallib.c:

#include<stdio.h>

void SpecialPrint(char* payload) {
  printf("Special Print:%s\n", payload);
}

Now I write a program that uses this library. Here's the normal, non-dynamic, static linking, built like a pyramid way of using the library.

#include<stdio.h>
#include"speciallib.h"

int main() {
  printf("normal printf argument\n");
  SpecialPrint("special print's argument");
}

With the above code, I compile the SpecialPrint library with my main code and link them together, producing a single executable file. However, I can use the same speciallib file with code that loads the library dynamically while the program is running. The code might look something like this:

#include<stdio.h>
#include<dlfcn.h>

void ShowError() {
  char *dlError = dlerror();
  if(dlError) printf("Error with dl function: %s\n", dlError);
}

int main() {
  void *SharedObjectFile;
  void (*SpecialPrintFunction)(char*);

  // Load the shared library.
  SharedObjectFile = dlopen("./speciallib.so", RTLD_LAZY);
  if (!SharedObjectFile) {
    // Nothing was loaded, so report the error and stop here.
    ShowError();
    return 1;
  }
  // Obtain the address of a function in the shared library.
  SpecialPrintFunction = dlsym(SharedObjectFile, "SpecialPrint");
  if (!SpecialPrintFunction) {
    ShowError();
    dlclose(SharedObjectFile);
    return 1;
  }

  printf("normal printf argument\n");
  // Use the dynamically loaded function.
  (*SpecialPrintFunction)("special print's argument");

  if (dlclose(SharedObjectFile) != 0) ShowError();
  return 0;
}

In the above, the show error function is optional; I added it so that any errors that occur would be displayed. When you compile the above code, you'll need to make the speciallib into an .so file. Because the path passed to dlopen contains a slash (./speciallib.so), it is loaded directly from the current directory; if you pass a bare file name instead, make sure that the speciallib's directory (the current directory in my example) is listed as one of the places that we should look for shared object files. Here's what the steps to compile look like:

export LD_LIBRARY_PATH=`pwd`
gcc -c -fpic speciallib.c
gcc -shared -lc -o speciallib.so speciallib.o
gcc main.c -ldl

For an explanation of what the above compiler options mean, and further explanation on .so files, see these two IBM articles on using shared object files on Solaris and Linux.

Learning about .so files has been very interesting to me, because they bring a new level of flexibility to a traditionally rigid environment like C.

Tuesday, April 22, 2008

Early vs Late Binding

I've been thinking recently about programming languages (surprised?), specifically about the things that make them different. One of the really nice things about C is that it compiles into machine code which tends to run lean and mean. By that I mean it is blazing fast and doesn't take up much memory. On the other hand, programming in Python and JavaScript has really been growing on me. There is so much flexibility to create elegant solutions quickly and without rewriting lots of existing code. In fact, I'd say greater ability to reuse existing code is a natural outgrowth of programming language flexibility.

So where does this flexibility come from? The place I tend to notice it most is in the ability to give an existing function a new body. In other words, you can plug in different behavior in place of the default.

Here's a simple example to illustrate the idea. Let's say that we created a simple checkout register which takes a receipt, adds the sales tax, and spits out the grand total. Here's our code foundation in both Python and JavaScript (these two examples do essentially the same thing):

Python:

def CalculateTax(amount):
  return amount * 0.18

class Receipt(object):
  
  def __init__(self, items=None):
    self.items = items or []
  
  def CalculateTotal(self):
    return sum([item + CalculateTax(item) for item in self.items])

JavaScript:

function calculateTax(amount) {
  return amount * 0.18;
}

function Receipt(items) {
  if (items) {
    this.items = items;
  } else {
    this.items = new Array();
  }
}

Receipt.prototype.calculateTotal = function() {
  var total = 0;
  for (var i = 0; i < this.items.length; i++) {
    total += this.items[i] + calculateTax(this.items[i]);
  }
  return total;
}

To use the above code, you might write something like this:

Python:

my_order = Receipt([5.50, 10, 7.89])
print(my_order.CalculateTotal())

JavaScript:

var myOrder = new Receipt([5.50, 10, 7.89]);
alert(myOrder.calculateTotal());

Now let's say someone asks you to change the tax rate which is used when calculating the total. Here's the catch: you're not allowed to change the existing code. It turns out this is actually really easy. You can define a new function, then make an existing function name point to the new function. Here's an example of how to inject our new code:

Python:

def CalculateHigherTax(amount):
  return amount * 0.25

CalculateTax = CalculateHigherTax

print(my_order.CalculateTotal())

JavaScript:

function calculateHigherTax(amount) {
  return amount * 0.25;
}

calculateTax = calculateHigherTax;

alert(myOrder.calculateTotal());

After adding the above code to the foundation we started with, you will notice that the calculate total method now uses calculate-higher-tax instead of the original function, even though you are calling the same method on the same object as before. Congratulations, you have just witnessed late binding in action.

So what is late binding? The idea is that the computer decides which code should be executed while the program is running. This seems normal in scripting languages, but compiled languages often use this too (I'm looking at you Java and C++). For example, overridden (virtual) methods and polymorphism take advantage of late binding. With late binding you can change the meaning of an identifier (for example, change the behavior when you call a specific function) at just about any time.

Now let's take a look at a language which uses early binding. C is a great example. With early binding, the meaning of things like function names is locked in when the code is compiled. There is no dynamic lookup while the program is running to see which code should be executed; instead the address of the desired code is embedded directly into the binary machine code.

Here is how the same calculate-total example might look in C:

#include<stdio.h>

float CalculateTax(float amount) {
  return amount * 0.18;
}

typedef struct {
  float* items;
  int num_items;
} Receipt;

float CalculateTotal(Receipt this_order) {
  int i;
  float total = 0;
  for(i = 0; i < this_order.num_items; i++) {
    total += this_order.items[i] + CalculateTax(this_order.items[i]);
  }
  return total;
}

int main(void) {
  Receipt my_order;
  float my_items[3] = {5.50, 10, 7.89};
  my_order.items = my_items;
  my_order.num_items = 3;
  printf("%f\n", CalculateTotal(my_order));
}

If you try to set CalculateTax to a new function definition, you will get an error at compile time because a function cannot be changed once it is bound. Early binding tends to produce more efficient programs. However, if you want to, you can still use the flexibility available in late binding in C.

Using function pointers, you can store the address of the code that you want to be executed, and change the address while the program is running. We can achieve the same late binding effects that I've illustrated in Python and JavaScript by making some small changes to the C code. Declare a function pointer named TaxCalculator which will store the address of the desired calculate-tax function, then change CalculateTotal so that it uses the TaxCalculator instead of directly calling a calculate-tax function.

#include<stdio.h>

float CalculateTax(float amount) {
  return amount * 0.18;
}

float CalculateHigherTax(float amount) {
  return amount * 0.25;
}

typedef struct {
  float* items;
  int num_items;
} Receipt;

float (*TaxCalculator)(float) = &CalculateTax;

float CalculateTotal(Receipt this_order) {
  int i;
  float total = 0;
  for(i = 0; i < this_order.num_items; i++) {
    total += this_order.items[i] + (*TaxCalculator)(this_order.items[i]);
  }
  return total;
}

int main(void) {
  Receipt my_order;
  float my_items[3] = {5.50, 10, 7.89};
  my_order.items = my_items;
  my_order.num_items = 3;
  printf("%f\n", CalculateTotal(my_order));
  TaxCalculator = &CalculateHigherTax;
  printf("%f\n", CalculateTotal(my_order));
}

There you have it!

Here's another way to think about this comparison. In high-level languages which don't expose pointers, the names of functions, variables, and other identifiers actually act like pointers.

Monday, April 21, 2008

Ubuntu Hardy Heron

Last week, I downloaded the beta release of Ubuntu 8.04 (Hardy Heron) to give it a try. I've been meaning to migrate our last Windows XP machine over to Linux for some time now (the other three computers I use regularly are Linux machines), but I've been reluctant to back up, repartition, and take the plunge. It seems like this is a common feeling among computer owners, but I think the Ubuntu community may have found an effective solution.

And the name of this new innovation: Wubi. Pop in the Ubuntu CD while running in Windows, and an auto-run installer opens which allows you to install Linux alongside Windows. When you reboot your computer, you see a menu of which operating system to boot into: Windows or Ubuntu. This means you can try out Ubuntu on your computer with zero risk to your existing files. In fact, you can access the files on your Windows installation from within Ubuntu. And if you decide Ubuntu is not for you, you can uninstall it as you would any other Windows program. Pretty slick.

Managing the installation is just the beginning of the improvements the team has made. After I installed the latest version and booted into Ubuntu, it told me that there were proprietary drivers for my NVIDIA graphics card and asked if I wanted to install them. I clicked the button, the download started, and I was up and running with 3D accelerated graphical desktop effects. I think I could sit there opening and closing windows all day. I'm looking forward to the upcoming official release.

Monday, April 14, 2008

A Musical Interlude... and back to Programming

I've been listening to Daft Punk and Justice quite a bit recently. Apparently I'm on a techno kick again. I've never found any electronic music that I've enjoyed as much as Joy Electric's The White Songbook. The purity of the tones and style has made it one of my all time favorite albums.

This got me thinking about a project which I thought of years ago, started, then abandoned. It was a music synthesizer/sequencer which you would program by, well, programming. I mean that the music would be controlled exclusively through a programming language. This would alter the creative process in several ways. Most music sequencers are graphical and allow you to lay out musical patterns in sequence. Writing a program is extremely non-linear, with classes, functions, and variables being defined in the code long before they are used. In this hypothetical synthesizer language, a composition might look something like this:


sequence "intro":
  playSample("beat", start=0:32.1, end=0:33.5, beats=[1,3,9,11,15])
  playSample("moog", beats=[5,7,11])
  shiftPitch(start=A4, end=C4, duration=bars(8))

sequence "solo":
  playSample("guitarRiff", start=1:15.3, end=2:09.0)

tempo 150 BPM
play("intro", now())
play("solo", end("intro"))
play("solo", now()+bars(5))

In the above example, the first play statement will be executed, then the third play statement (5 bars into the 8 bar intro), then the second play statement will start the solo again when the intro ends, probably before the first solo finishes. There are other interesting features in the pseudo-code above, but the fact that these sequences are played out of order is what I really want to highlight.

I've done live coding at several events over the years and I tend to have fun with it. It doesn't always go exactly as planned, but that is the whole idea. Live coding turns programming into a performance piece. I imagine there is a niche group of people who could really get into the live coding music scene.

Monday, March 24, 2008

XO Laptop

My XO laptop arrived in the mail recently and it is quite an amazing little machine. Conclusion up front: I'm extremely satisfied with it and in some ways this laptop computer is better than ones that sell for ten times the price.

You might recall from a previous post that I had downloaded the XO's operating system and taken it for a test drive in an emulator. Now I have the real thing in front of me, and it's safe to say that it is even better. After all, some of the most innovative features of this computer are in the hardware. My favorite feature is the screen. It is viewable in direct sunlight which makes it usable outdoors. Second up would be the wireless networking. The graphical network selector is fun to use and the connection tends to be more reliable than any of the other computers I've used with my home wireless router. The battery life is also impressive, easily five hours on a charge of its small battery. It even has a built-in camera and microphone.

It runs all of the software I need too. I used the instructions I wrote up when I installed Firefox on the emulated operating system. Everything went smoothly and I was browsing the web using Firefox in a few minutes (the XO laptop comes with a perfectly good web browser, but I wanted to use my favorite plugins and have more control over downloads).

I'm quite taken with the little machine. I've been using it as my primary computer at home, using it for all of the tasks I normally do (mostly browsing the web and programming). There are a couple of things that I would change if I had the chance. The first is the keyboard. It is an interesting design, made of a flexible rubber-like substance, and it works much better than other flexible keyboards that I've tried, but it took a while to get used to the shift key (I have to press in the corner of the key). The other difficulty is presented by the slower processor, but it doesn't get in my way most of the time. The only time I notice any slowness is when playing Flash videos (like on YouTube). Perhaps part of the problem is Flash for Linux, but I'm not sure. In any case, I don't really mind as I don't watch that much video, and if I want to, I have other computers that I can use.

It will probably come as no surprise that I wrote this post using the little green computer. I'm saddened by the end of the "give one get one" program, as I think there is still the opportunity for more people to donate and receive their own XO. If anyone is interested, it might be possible to order a batch of one hundred or more through the "give many" program.

Tuesday, March 11, 2008

BusyList

Andy and I started work on a simple little open source project for tracking tasks; it's called busylist. We wanted to experiment with Ajax, Python, and web service APIs, so we whipped up a basic system in a few hours. There is still quite a bit of work to be done, but it has been a great learning experience so far. An extremely alpha test version is available in subversion along with some instructions on the project's wiki pages. If you're interested, feel free to check it out (pun intended) and contribute if you like. It is an open source project after all.

Tuesday, March 04, 2008

(Portable) Ubuntu for Programmers

I've been trying over the past several weeks to find the best fit for Linux on a USB pen drive so that I can boot into my own operating system and get to my files no matter which computer I'm using. As you might notice from my other posts, I tend to spend quite a bit of my computer time in programming and browsing the Web, so the things I'm most interested in are a web browser (Firefox), support for wireless cards in several computers, and a variety of command line programming tools (gcc, python, vim, etc.). It should be possible to take one of the standard Linux distributions and install it on a USB drive (provided the drive is large enough), but I wanted to use a one gigabyte drive that I had, and with my simple needs I should really be able to get all of the necessities in under one gig. Along the way I've tried Puppy Linux, Slax, Feather Linux, DSL, and others, but I decided in the end to roll my own solution based on Ubuntu.

I'm a big fan of Ubuntu, but the standard desktop install is far too large for installation on a one gig drive. For a while I was using the live CD booting from a pen drive with a partition for my files. I used the instructions I found on Pen Drive Linux to set up the pen drive with the image from the live CD (only 750 megabytes). The only problem with this set-up was that all of my files were in a separate partition and my home directory was wiped out each time. Since many Linux programs store settings in your home directory, this turned out to be a bit inconvenient. I tried a few different options, but finally decided to go with a stripped down Ubuntu foundation and add the things I wanted.

I began with Ubuntu Server 7.10 and installed it on my USB drive using some of the recommendations in the installation instructions for low memory systems. During the installation process I selected guided partitioning and I did not choose to install any of the software configurations in the "software to install" menu. After installing, I rebooted and added the following packages using sudo apt-get install:

lynx (optional)
screen (optional)
gcc (optional)
xorg
x-window-system-core
firefox

If you are using a laptop, you will likely want to install the following modules:

acpi
acpid

With the above installed you can check the battery's charge, remaining time, etc. by running acpi on the command line. For the graphical desktop window manager, I chose iceWM. I installed it by adding:

icewm
iceconf
icewm-themes

In the past I've worked quite a bit with Fluxbox as a window manager, but it seems like iceWM is easier to configure, especially under Ubuntu. The liQuid theme looks quite nice.

This setup boots into a text-only command line mode because it is based on Ubuntu Server. To enter graphics mode, you simply run startx. I connected to my wireless network using wpa_supplicant and iwconfig.

One of the benefits of working on a lightweight system on a flash drive is the bootup speed. In twenty seconds the computer boots from a cold start, connects to my wireless network, and enters the graphical desktop. I'm quite happy with my little portable operating system, and you probably won't be surprised to hear that I wrote this post using it.

Tuesday, February 26, 2008

In praise of Haikus

Programmers are no strangers to strict requirements on form and syntax, so working in the poetic medium of the Haiku comes almost naturally.

Programming is fun.
Little virtual widgets.
Poems that do work.

The brevity and compactness of the haiku lend themselves well to writing something tightly focused. I find them quite enjoyable to write.

Of course, there are quite a few other poetic structures of note which can offer a fun challenge. The limerick and the sonnet are two of my favorites. Here's a limerick I wrote (beware, obscure programming reference ahead).

There once was a coder named Chuck.
And through all the source code he snuck.
He changed not a line,
it all worked just fine:
he programmed by punching a duck!

A sonnet would be a bit ambitious for this late hour. So unleash your creativity, let's see what you've got.

Monday, February 18, 2008

Happy Valentine's Day

Vanessa prepared a wonderful meal for me this Valentine's Day. Four courses, the first one pictured here, all delicious. This was the roasted red pepper tomato soup with an artistic heart made of sour cream. And this was only the first course. Have I mentioned that the meal was delicious? I'm very thankful; I married quite a cook. Not only that, but she decorated too.

It would seem I borrowed a page from one of Ben's blogs and wrote about food. What can I say, a meal like this makes an impression.

Wednesday, February 13, 2008

My Programming Journey so Far

One of the great things about working in computer science is that you never stop learning. It seems that many programmers follow a progression from one popular language to the next, and I thought I'd dedicate a post to reminiscing about my journey so far. I first learned to program in C, somewhere between the ages of eleven and thirteen. I was instantly hooked, and since then, I've kept right on learning. I think the path I've taken has been fairly typical. From C, I learned C++ (starting in high school). I learned Java during a summer internship after my junior year of high school. In college, it was more C++, Java, and C (I really learned the ins and outs of C in my networking class) along with some other programming languages.

My favorite two classes in my college computer science curriculum were the ones in which I learned assembly language and designed an arithmetic logic unit and then a simple processor. I finally felt like I understood exactly how computers worked. With assembly language I learned a bit about machine code, how many clock cycles specific operations take, and it felt so good to optimize a bit of code to run blazingly fast. In circuit design I learned where those clock cycles come from, why operations take the time they do, and how those machine language op codes are determined. In all things software, at some point it all comes back to electronics.

College was also a time to get a taste of other, less widely used, but nonetheless important languages. Scheme and Prolog were particularly interesting to me, but I haven't had much occasion to use them recently.

Through my career, I've focused primarily on C++, then Java. After that I've had the opportunity to use a large number of languages. I learned Ajax programming using JavaScript, and I wrote some PHP, some C#, and a rather large amount of Python. Outside of work, I like to explore new concepts and languages, and I've taken a look at some other languages too. Some notable examples include Ruby and Common Lisp, but I haven't built anything serious with them yet. Python is a language which I've really grabbed hold of recently and I've been learning quite a bit about it. At least half of the side projects I'm working on in my spare time these days are in Python. There seems to be quite a bit of momentum behind Python, and I'm very interested to see where this all goes.

So there you have it, a small glimpse into my journey thus far. How does it jibe with or differ from your own?

Friday, February 08, 2008

Firefox 3 Beta 2

I recently downloaded the second beta of Firefox 3 from Portable Apps. I didn't want to replace the version of Firefox I already had installed, so I used the version from Portable Apps which runs as a standalone binary. Sometimes it's really nice to unpack a program without touching the registry or worrying about installing.

Overall, I'm very happy with the changes I've seen in the latest version. I had heard that there have been some improvements to the JavaScript engine in this version, and they are noticeable. When I logged in to Gmail it seemed a bit more responsive. I have to say, though, that my favorite changes are in the address bar. When typing the address, the address bar shows addresses, titles, and logos for pages that you've already visited. Firefox 2 did this too, but I think 3 gives more detail and a larger number of results. I found myself using it much more often than in 2. Part of the reason is that it shows the most recently visited page at the top instead of the shortest match. I also liked that you could star a URL to add it to your bookmarks, and you can even select a folder and tag the URL from within the address bar.

Tuesday, January 29, 2008

Programming Languages are Languages too

One of the reasons that people keep inventing new programming languages is that humans are good at using language and computers are not. So as we improve computers and add more complexity, programmers endeavor to make using these new features simpler and less painful. As a result, computer languages are moving in a general direction towards more natural human language. There will likely always be differences, but programming languages and human languages are strikingly similar if you understand some of the widely used syntax.

Here's an example. When you see something like this

z = x * y;
print(z);

it means that you want the computer to "store the value of x times y in the variable z, then display z on the screen." As you can see, some programming syntax is borrowed from math. This example includes arithmetic and a function. Functions can also be thought of as verbs, with variables as the nouns. In object oriented programming, variables can be nouns which are capable of performing actions. If you had a digital carrier pigeon, and you wanted to tell it to carry a letter to your grandmother's house, you might say something like:

myPigeon.payload = myLetter;
myPigeon.flyTo(grandma.house);

In human language, there are always multiple ways to say the same thing, and the same applies in programming. The programmer might just as easily design the program to give grandma the letter like this:

myLetter.setRecipient(grandma);
myPigeon.deliver(myLetter);

Now for some fun. What do the following code snippets mean?

if (jack.getWorkPercent() == 100.0 &&
    jack.getPlayPercent() == 0.0) {
  jack.dullBoyFlag = true;
}

Pie aPie = new Pie();
Song aSong = new Song(sixpence);
aSong.sing();
fill(pocket, rye);
aPie.add(new Blackbird()[24]);

Mouse mice[3];
for (i in range(3)) {
  mice[i] = new BlindMouse();
}
observe(run(mice));

party = new Party(jack, jill);
party.setTarget(water);
party.equip(pail);
party.ascend(hill);

Wednesday, January 23, 2008

A spoiled programmer

I've been writing quite a bit of Python code recently and I've become a bit spoiled. It's easy in Python to define new classes on the fly, create new functions, pass them here and there, and return arbitrary collections from a method. C will always have a special place in my heart (I think everyone's first language does), but I often think of ways I could make it a bit easier to do certain things like have functions that return functions or have a function return multiple values.

To explain by way of example, it would be fun to do something like this:

/* A function that returns multiple values */
int, char, int myFunction(int a, int b, int c, char d) {...}

...
  /* Invoke the function and store the results */
  int x, y;
  char c;
  {x, y, c} = myFunction(5, 6, 7, 'Z');

The above is a fairly Pythonic way of doing things, and it seems like it should be possible in C. The first way I thought of is using structs. I like to think of a struct as the precursor to a class. It allows the arbitrary grouping of variables into a single collection where they can be referred to by name. (In a couple of earlier posts, I showed how you could use structs to simulate classes in C.)

If I define a struct for each one of my multi-variable-returning functions, I can create functions which effectively return multiple values instead of just one. Yes, technically I am returning one value, the struct, but you know what I meant :-) Namely, if you look at the program's stack, there is probably no perceptible difference between returning a struct and returning multiple variables.

#include <stdio.h>

/* Create a 2 member struct to hold the return value */
struct myFuncReturn {
  int first;
  char second;
};

/* A function that returns an int and a char */
struct myFuncReturn myFunc(int a, int b, int c, char d) {
  struct myFuncReturn to_return;
  to_return.first = (a+b)*c;
  to_return.second = d;
  return to_return;
}

int main(void) {
  struct myFuncReturn pattern;
  /* The struct is returned by value and copied into pattern */
  pattern = myFunc(2, 3, 4, 'Z');
  printf("Pattern: %i, %c\n", pattern.first, pattern.second);
  return 0;
}

This works ok, but I would like to avoid having to create a new struct for each one of my functions. It might be easier if I didn't have to worry about types at all, so the natural choice is to have the function return a type-less void pointer (void*). The calling code would then be responsible for interpreting the function's return struct correctly. If I want to return a new anonymous struct from a function, it might look something like this:

void* myFunc(int a, int b, int c, char d);

If I use the above, I'll need to allocate memory for the struct and return its address. This is a bit of a bother as well, because now I need to worry about cleaning up that memory later. Instead of having the function allocate a new structure to return, why not pass in a structure and have the function modify it? The code I would need to write would be more aesthetically pleasing (in my opinion) for both the function definition and the calling code which invokes it, and it might even be more efficient.
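For comparison, here is a rough sketch of what the allocate-and-return approach might look like. The struct name myFuncResult is my own invention for illustration (an anonymous struct would work similarly), and I'm assuming the caller knows the layout well enough to cast the pointer back.

```c
#include <stdlib.h>

/* Hypothetical result layout; the name is only for illustration. */
struct myFuncResult {
  int first;
  char second;
};

/* Allocates the result on the heap and returns its address as a
   type-less pointer. Returns NULL if the allocation fails. */
void* myFunc(int a, int b, int c, char d) {
  struct myFuncResult* result = malloc(sizeof *result);
  if (result == NULL) {
    return NULL;
  }
  result->first = (a + b) * c;
  result->second = d;
  return result;
}
```

The calling code would assign the returned pointer to a struct myFuncResult*, check it for NULL, read the members, and then remember to call free on it, which is exactly the bookkeeping I would rather avoid.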

If I pass in a pointer to the result struct as the first parameter to the function, my program could look like this.

#include <stdio.h>

/* Function definition, the out parameter is the return value */
void myFunc(void* out, int a, int b, int c, char d) {
  /* Cast the type-less pointer to the layout the caller reserved */
  ((struct{int first; char second;}*)out)->first = (a+b)*c;
  ((struct{int first; char second;}*)out)->second = d;
}

int main(void) {
  struct{int first; char second;} pattern;
  myFunc(&pattern, 2, 3, 4, 'Z');
  printf("Pattern: %i, %c\n", pattern.first, pattern.second);
  return 0;
}

Look ma, no type declarations! Now you might say that writing out the entire struct definition each time is a bit unpleasant, but you could always define a struct and use it instead. I wanted to show that you don't really need to declare a type for each function, which could create a bit of a mess if you start using multi-return functions everywhere. With the above pattern, you could also start to play some interesting games by having functions that actually return different structs in different situations (provided the out pointer's reserved space is large enough for the data you want to send back). If I've lost you by now, I do apologize.
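If you would rather define a struct once and reuse it, a minimal sketch might look like the following. The name pair is hypothetical. The trade-off is one extra declaration in exchange for letting the compiler check that the caller passes the right kind of pointer.

```c
/* Hypothetical named result type shared by the function and its callers. */
struct pair {
  int first;
  char second;
};

/* Writes both results through the out parameter; nothing is allocated. */
void myFunc(struct pair* out, int a, int b, int c, char d) {
  out->first = (a + b) * c;
  out->second = d;
}
```

The calling code declares a struct pair on the stack and passes its address, just as in the anonymous version, so there is still nothing to clean up afterwards.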

For added effect, note that the anonymous struct definitions don't need to match exactly. You just need to make sure that the layout of the members is the same so that you don't overwrite data. I could have written myFunc like this:

void myFunc(void* out, int a, int b, int c, char d) {
  ((struct{int first;}*)out)->first = (a+b)*c;
  ((struct{int x; char second;}*)out)->second = d;
}

Or if you want to go even further, like this:

void myFunc(void* out, int a, int b, int c, char d) {
  *((int*)out) = (a+b)*c;
  ((struct{int x; char second;}*)out)->second = d;
}
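As for the game of sending back different structs in different situations, one way to guarantee that the reserved space is large enough is to have the caller pass a union. This is only a sketch under my own assumptions: the type names, the enum tag, and the rule that an empty character selects the second layout are all made up for illustration.

```c
/* Hypothetical tag telling the caller which layout was written. */
enum resultKind { RESULT_INT_CHAR, RESULT_TWO_INTS };

struct intChar { int first; char second; };
struct twoInts { int first; int second; };

/* A union is as large as its largest member, so the reserved space is
   always big enough for either layout. */
union anyResult {
  struct intChar ic;
  struct twoInts ii;
};

/* Fills in one layout or the other and reports which one was used.
   The choice based on d is an arbitrary rule for this example. */
enum resultKind myFunc(union anyResult* out, int a, int b, int c, char d) {
  if (d == '\0') {
    out->ii.first = a + b;
    out->ii.second = c;
    return RESULT_TWO_INTS;
  }
  out->ic.first = (a + b) * c;
  out->ic.second = d;
  return RESULT_INT_CHAR;
}
```

The caller checks the returned tag and then reads either the ic or the ii member. Giving up the void pointer costs a few declarations, but the compiler does the size bookkeeping for you.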

Ah, the joys of programming. It's little games like this that make programming lots of fun. It's like working on a big wide open puzzle that you get to build yourself. No wonder I'm spoiled.