Imposterrific - a blog by Jeff Scudder

Tuesday, March 17, 2009

Our New Pi Day Laptop

To commemorate this year's pi day, we've christened our newly purchased laptop PiPuter. It's an Acer Aspire. We were looking for a super cheap full sized laptop with decent spects and found a good deal at the MicroCenter. Vanessa says that our computer has graduated magniu cum laude from its factory and it is currently watching us - watching, watching, always watching.

Thursday, March 05, 2009

Septoplasty

I never dreamed that I too would be undergoing septoplasty just a few short months after Joe Greogorio had his operation. It turns out I had a pretty severely deviated septum which likely contributed to the annual sinus infections which I have been experiencing. My allergist recommended I have this operation to straighten the inside of my nose, as my left nasal passage was almost completely blocked near the bridge of my nose. In looking for a possible sinus infection I had a CT scan of my sinuses which revealed just how severe the deviation was. By the way, looking at a slice by slice cross section of your own head is fascinating.

This past Wednesday I had the surgery and I will have the stitches and splints removed next Tuesday. It's been a bit different than I expected. I anticipated some pain but, surprisingly, the part that hurts most is my upper front teeth. I also didn't expect quite so much bleeding. In any case, I'm not feeling too bad and I am looking forward to the long term benefits. Also, this has been my first experience BWOV (blogging while on vicodin).

Sunday, March 01, 2009

Two Guys Arguing - Five Questions

I noticed that my friend Ben is participating in a new blog named Two Guys Arguing (in addition to blogs one two and three :-) and I thought I'd piggy back on a recent ice-breaker post.

What are you currently hacking on?
What are you currently getting better at?
What do you do when your computer is asleep?
Describe that ‘big fish’ project that’s been stewing in your brain.
What are you gonna post about this coming week?

On to the answers:

As a small part of my day job I'm currently working on a rewrite of significant portions of the gdata-python-client to support version two of the Google Data API protocol (check out the v2 branch).
I've been trying to focus on writing speedier unit tests. Unit testing is great, but sometimes it is necessary to write end to end tests which can take a rather long time to run. Slow tests are run less often, so I've been thinking of some ways to cache some of the more expensive pieces of these end to end tests while still preserving their utility.
Playing with Claire is easily one of my favorite non-computer related activities. I have a wonderful wife who keeps me company and there's nothing like relaxing with the family. Aside from the yard work, remodeling, and other house related items, I do very much enjoy playing guitar now and again. I even have a somewhat regular weekly gig.
I have a few side projects I like to hack on here and there in my spare time. One of them I've been thinking more about recently is a computer language I've decided to call "Headspace". I wanted to try an experiment and see what the effects would be of applying the rule of seven to programming.
I've been planning to write a bit about the surgery I've recently undergone so stay tuned ;-)

Two Guys Arguing seems like it's off to a great start and I look forward to reading along.

Wednesday, February 25, 2009

Google Tweets

Earlier today, Google posted the following on Twitter

I'm 01100110 01100101 01100101 01101100 01101001 01101110 01100111 00100000 01101100 01110101 01100011 01101011 01111001 00001010

This looks suspiciously like ASCII, so I set out to render it in a more human readable form:

Bin       Hex ASCII
0110 0110 66  f
0110 0101 65  e
0110 0101 65  e
0110 1100 6C  l
0110 1001 69  i
0110 1110 6E  n
0110 0111 67  g
0010 0000 20    (space)
0110 1100 6C  l
0111 0101 75  u
0110 0011 63  c
0110 1011 6B  k
0111 1001 79  y
0000 1010 0A  \n

I doubt I'm the first to post this, but there you have it :-)

Sunday, February 01, 2009

Brief Haitus

I'm taking a short break from posting here to study for the GRE. I'm planning to apply to grad school in the not too distant future, and figured I should study. Apologies for the pusillanimously prosaic post.

Tuesday, January 06, 2009

Guitar Hacking

In highschool, I began learning to play guitar. I took private lessons at a local music store, and although I had read sheet music for over seven years by this point (mostly while playing trombone), my teacher started me out by drawing guitar tabs. Unlike sheet music with staffs, clefs, meters, and measures, guitar tabs read like a simple map of where your fingers belong. In some cases, tabs are a picture of the neck of the guitar. For example, most books of guitar chords contain diagrams that look something like this:


 E A D G B e
0___________
 | | | | | |
 | | | | | |
 | | | | | |
1-----------
 | | | | | |
 | 2 | | | |
 | | | | | |
2-----------
 | | | | | |
 1 | | | | 4
 | | | | | |
3-----------
 | | | | | |
 | | | | | |
 | | | | | |
4-----------

The long vertical lines are the strings, while the horizontal lines are the frets. The lower pitched strings are on the left, while the higher pitched are on the right. Finger placement is indicated by the numbers written in the strings. The index finger is 1 and the pinky is 4. This diagram is of a G major chord and it uses three fingers, two on the third fret, and one on the second.

However, you could also draw a guitar tab like this:


 0       1       2       3       4
e|-------|-------|---4---|-------|
B|-------|-------|-------|-------|
G|-------|-------|-------|-------|
D|-------|-------|-------|-------|
A|-------|---2---|-------|-------|
E|-------|-------|---1---|-------|

The above looks a bit more like the neck of the guitar from the point of view of the person holding it.

One great way to learn how a musician plays an unusual chord, is to look at a video or picture. In that case, you'd see the neck of the guitar reversed, like this:

 
4       3       2       1       0
|-------|---1---|-------|-------|E
|-------|-------|---2---|-------|A
|-------|-------|-------|-------|D
|-------|-------|-------|-------|G
|-------|-------|-------|-------|B
|-------|---4---|-------|-------|e

Unless, of course, you are watching one of the many left-handed guitar players (Hendrix, McCartney, etc.) in which case, you neck would look like this:


 0       1       2       3       4
E|-------|-------|---1---|-------|
A|-------|---2---|-------|-------|
D|-------|-------|-------|-------|
G|-------|-------|-------|-------|
B|-------|-------|-------|-------|
e|-------|-------|---4---|-------|

The interesting thing about learning to play guitar using tabs, is that you actually have less information to go on than with sheet music. Tabs tend to lack information about the rhythm being played, tempo, and volume which are all present in sheet music. The one thing which sheet music lacks, however, is an indication of where your fingers should go when playing a particular chord or riff. Unpacking finger placement information from a cluster of notes on a staff can be difficult enough that guitar tabs make an attractive tradeoff. It is the musical equivalent of a domain specific language.

With the prevalence of recorded music which can be rewound and replayed over and over, a guitar player can often reconstruct the rythms and other necessary information by listening to the song. No longer need all information live on the page, the quick and dirty guitar hacker plays with tabs on the stand and the music in her head.

Wednesday, November 19, 2008

Mashup Camp 8

I participated in MashupCamp8 and created a simple mashup in a few hours which I have now open-sourced. It is a blog post editor, or article editor if you prefer, which finds key words in your text as you are writing and displays web search results for those key words next to the editor. Here's my elevator pitch:

When writing an article or blog post, I'll often note places where I should add a link and go back later to add them in. Sometimes searching for good resources to link to uncovers new information which I wish I had known while I was writing the article.

As I was walking around here at Mashup Camp, I saw a few web services which would allow me to do some data mining on raw text (like this blog post I'm writing) and search the web for relevant information.

I created a simple Ajax editor, which is hosted on Google App Engine. It takes the article's text, sends it securely (over HTTPS) to Open Calais which extracts key words then disambiguates, dedupes, and scores them for relevancy. Using this semantic web data from Calais, the editor then performs a web search on each of the keywords and requests results from the Yahoo Search BOSS API in the JSON format. The text analysis and web search is done automatically when the editor detects that you have stopped typing. To avoid excessive and distracting search refreshes, the editor will wait between three and eight seconds before performing the search.

You can try it out here:
https://sippycode.appspot.com/article_editor.

Download the source code here.

This simple little mashup was well received, and I think part of the reason is the potential for growth. The fundamental idea, is to have an environment in which relevant data is brought to you as you work. You don't even need to go info hunting, our computers can find relevant information and bring it to you automagically.

For this particular mashup app, I had envisioned a few additional features, the addition of which is left as an exercise to the reader if you should so choose.

A WSIWYG editor to create rich HTML instead of just plain text. (I looked at tinyMCE but ran out of time.)
Search on combinations of keywords instead of just individual items identified by Calais.
Search other information sources, like news results, images, videos.
Search on multiple search engines. (Google has an easy to use Ajax search API, but I wanted to try something that was totally new to me.)

Tuesday, November 11, 2008

XML Library with Versioning

I recently created a simple Python library for converting objects to and from XML. Code samples up front, here's how you would define some class to represent some hierarchical XML:

class AtomFeed(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}feed'
  title = Title
  entries = [Entry]

class Entry(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}entry'
  links = [Link]
  title = Title
  content = Content

class Link(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}link'
  rel = 'rel'
  address = 'href'

class Url(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}url'

class Title(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}title'
  title_type = 'type'

Now for the whys and hows.

For the past few years I've been working with Web Services and most of them use XML to represent the data (though I hope JSON catches on more widely). There are some great XML libraries out there, and my library is based on one of them (ElementTree). XML parsing is certainly nothing new, so why create a new one?

The Why

There are a few limitations with the XML parsing approaches I've used in Python:

XML structure isn't documented or available using help()
No autocompete for finding elements in the XML
If the XML changes in a new version of the web service, my code needs to be rewritten
My code interacting with the XML is verbose

Source code can provide a wealth of information, but parsed XML doesn't have the same level of information richness as source code. Between tool tips in IDEs, auto-generated documentation, and autocomplete, having classes loaded for your XML models can bring the tree traversal logic closer to your fingertips. Many software development tools are optimized for working with predefined classes rather than generic XML objects.

However, one of the biggest drawbacks to representing each type of XML element with it's own class is that you end up needing to write lots of class definitions. For this reason I've tried to make the XML class definitions as compact as possible. Specifying a simple XML class only takes two lines of code. For each type of sub-element and each XML attribute, you can add one line of code. You don't need to declare all of the elements or attributes either. The XmlElement will preserve all of the XML which it parses. If there are class members which correspond to a specified sub-element, the element will be placed in that member. Any unspecified elements will be converted to XmlElement instances. You can search over all XML elements (both anticipated members and unanticipated generic objects) using the get_elements method. XML attributes are handled in a similar fashion and can be searched using get_attributes.

I've saved the most unique feature of this library for last: Sometimes web services change the XML definition thereby breaking your code. If it is something small like a change in XML namespace or changing a tag, it seems like such a waste to have to edit lines upon lines of code. To address this kind of problem, this XML library supports versioning. When you parse or generate XML, you can specify the version of the available rules that you'd like to use. You can use the same objects with any version of the web service.

To use versioning, write a class definition with tuples containing the version specific information:

class Control(XmlElement):
  _qname = ('{http://purl.org/atom/app#}control', #v1
            '{http://www.w3.org/2007/app}control') #v2
  draft = Draft
  uri = 'atomURI'
  lang = 'atomLanguageTag'
  tag = ('control_tag', 'tag') # v1, v2

class Draft(XmlElement):
  _qname = ('{http://purl.org/atom/app#}draft', 
            '{http://www.w3.org/2007/app}draft')

If you create an instance of the Control element like this:

c = Control(draft=Draft('yes'), tag='test')

Then you can generate XML for each version like this:

c.to_string(1)

returns

<control xmlns="http://purl.org/atom/app#" 
    control_tag="test">
  <draft>yes</draft>
</control>

while

c.to_string(2)

returns

<control xmlns="http://www.w3.org/2007/app" 
    tag="test">
  <draft>yes</draft>
</control>

Note the difference in XML namespaces in the above. I also added an example of an attribute name which changed between versions, though "tag" doesn't actually belong in AtomPub control (so don't go trying to use it m'kay).

Since this library is open source, you're free to examine how it works and use it however you like. Allow me to highlight a few key points.

The How

Earlier I showed how to define XML element classes which look for specific sub elements and attributes and convert them into member objects. I also mentioned that this XML library handles versioning, meaning that the same object can parse and produce different XML depending on a version parameter. Both of these are accomplished by creating class level rule sets which are built up using introspection the first time an XML conversion is attempted.

In pseudo-code it works like this.

XML --> object
  - find out the desired version
  - is there an entry for this version in _rule_set?
  - if not, look at all XML members of this class 
      in _members
  - create XML matching rules based on each member's type
      (and store in _rule_set so we don't need to generate 
       the rules again)
  - iterate over all sub-elements in the XML tree
  - sub-elements and attributes which are in the rule set 
      are converted into the declared type
  - sub-elements and attributes which don't fit a rule are
      stored in _other_elements or _other_attributes

When generating XML the process is similar but slightly different.

object --> XML
  - create an XML tree with the tag and namespace for this 
      object given the desired version
  - look at all members of this class in _members
  - tell each member to attach itself to the tree using 
      it's rules for the desired version
  - iterate through _other_elements and _other_attributes 
      and tell each to attach to the XML tree

Armed with the above explanation, understanding the source code should be a bit easier.

Wednesday, November 05, 2008

Vim Line Length

I happen to be a fan of the vi text editor. I also happen to be a fan of code style guidelines. And sometimes it is very handy to know if a line in your source code has gone over eighty characters long. Some style guides still recommend a max line length of eighty characters, a tradition dating all the way back to IBM punch cards, circa 1928. For an even more impressive feat of long term engineering influence look up how the size of Roman war chariots determined the size of NASA space shuttle booster rockets. But I digress.

To mark the characters which are over the eighty character limit, enter the following either within vim or in your .vimrc file.

:match ErrorMsg '\%>80v.\+'

I've noticed that some blogger templates tend to display fifty-five characters in a fixed width format (as in a <pre> block or <code>). To edit code which I plan to post on Blogger, I sometimes turn on

:match ErrorMsg '\%>55v.\+'

Saturday, October 25, 2008

Atom-Powered Robots Run Amok

I imagine a grand total of two of my readers will recognize the title of this post. (Hint: it's from RFC 5023.) But it seemed an appropriate title for a recent addition to a local office building where some Android developers tend to congregate.

(Note the tiny t-shirt logo.)

Ai luff mai Android.

Wednesday, October 15, 2008

Twitter Client

As a proof of concept for using the sippycode HTTP library which I wrote about in my last post, I decided to create a simple text console client for Twitter. Download the Twitter terminal application here.

Twitter's RESTful API is quite simple, and I wrote an open source library for Twitter based on the sippycode HTTP library in a few minutes. Here's an example of posting a new update (tweeting):

import sippycode.http.core as http_core
import sippycode.auth.core as auth_core

class TwitterClient(object):

  def __init__(self, username, password):
    self._credentials = auth_core.BasicAuth(username, 
                                            password)

  def update(self, message):
    request = http_core.HttpRequest(method='POST')
    http_core.parse_uri(
            'http://twitter.com/statuses/update.xml'
        ).modify_request(request)
    request.add_form_inputs({'status': message})
    self._credentials.modify_request(request)
    client = http_core.HttpClient()
    response = client.request(request)
    return response

In the above, the client sends an authenticated POST to the updates URL. Using the TwitterClient in your code looks like this:

client = TwitterClient('my-username', 'my-password')
client.update('Try out this Twitter client: http://oji.me/wP')

To try out this Twitter console app, unpack the download and run sippy_twitter.py. With it, you can update your status on Twitter or read the updates from your friends. When reading, the client displays five updates at a time, since showing more at once would likely cause some to scroll off the top of the screen (assuming the terminal displays twenty-five lines).

This simple application was designed to be a proof of concept, but it's really grown on me. Cycling through all of my friend's updates doesn't require any scrolling, and it feels snappier than the web interface. It seems like others are enjoying this terminal client too.

There are quite a few ways that this client could be improved, so there's plenty of opportunity to pitch in if you are interested. I have received feature requests from friends who previewed this app, such as: support command line arguments which will allow the client to perform updates when being run from another program, show a running countdown from 140 characters as you are typing your update (could probably be done using ncurses), ability to follow users, and read updates from just one user. If you'd like to participate in any of these, let me know in the comments.

Fire up your terminal and give this client a try. Why not post an update to @jscud right now?

Monday, October 13, 2008

An Open Source Python HTTP Client

At Super Happy Dev House 27, I made significant progress on an open source library for making HTTP requests in Python. For the past few years I've been working with web services and APIs (SOAP, REST (wikipedia) - specifically AtomPub, etc.) and I wanted to create an HTTP library which is simple, clean, and precise. Python has a couple of great HTTP libraries already, but one of them is a bit too low level (httplib) and the other is too high level (urllib2).

For example, in httplib you call a method to send data as if you are writing to a file (httplib uses sockets, after all). Required HTTP headers like Content-Length are not calculated for you. You'll need to handle cookies and redirects on your own. On the plus side, you get full control of what is being sent. The higher level library, urllib2, is built on top of httplib. It adds some handy abstractions, like calculating the Content-Length, but it also has some limitations. I haven't yet been able to figure out how to perform a PUT or DELETE with urllib2.

When making HTTP calls to web services, there are often a large number of HTTP headers, URL parameters, and components to the request. Making a request feels like making a function call in most HTTP libraries. In the past, I've wrapped these functions with successive layers containing more and more function parameters. For example, in a request to send a photo and metadata to PicasaWeb, you need to include an Authorization token, Content-Type specifying a MIME-multipart request and the multipart boundary, and a multipart payload consisting of the Atom XML describing the photo and the photo's binary data. If you add in the the ability to specify other headers and URL parameters, your function call might look like this:

def post_photo(url, url_parameters, escape_parameters, 
               photo_mime_type, photo_file_handle, 
               photo_file_size, metadata_xml, 
               metadata_mime_type, auth_token, 
               additional_http_headers)
...

# Sets the request's Host, port, and uri. 
# Makes the request into a MIME multipart request, 
# adjusts the Content-Type and recalculates 
# Content-Length.
# Sets the Authorization header
post_photo('http://picasaweb.google.com/data/'
           'feed/api/user/userID/albumid/albumID', None, 
           False, 'image/jpeg', photo_file, photo_size, 
           atom_xml, 'application/atom-xml', 
           client_login_token, None)

To use the above, you have to gather all of the information in one place, and make the function call. There are cases where you want a design like the above.

However, more and more I think of ways the program could be more cleanly structured if this information could be compartmentalized. This new library relies on an HttpRequest object which various parts of the program modify. Once all of the modifications have been applied, the fully constructed request is passed to an HttpClient which communicates with the server using httplib or urlfetch if you happen to be on Google App Engine. (Support for more HTTP libraries is certainly possible.)

The photo posting example from above could look something like this. Keep in mind that these steps could be carried out in a different order in different segments of code.

photo_post = HttpRequest(method='POST')
# Sets the Authorization header
client_login_token.modify_request(photo_post)
# Adds to the body and calculated Content-Length, 
# sets the Content-Type.
photo_post.add_body_part(atom_xml, 
    'application/atom+xml')
# Makes the request into a MIME multipart request, 
# adjusts the Content-Type and recalculates 
# Content-Length.
photo_post.add_body_part(photo_file, 'image/jpeg', 
    photo_size)
# Sets the request's Host, port, and uri. 
parse_uri('http://picasaweb.google.com/data/'
          'feed/api/user/userID/albumid/albumID'
          ).modify_request(photo_post)

In fact, the above code could make up the body of the post_photo function described in the first code snippet.

I created an open source project for this and other small projects called sippycode (a play on sippy cup). This is a place where code can grow up.

Sunday, September 28, 2008

In Praise of screen

I've used screen for a few years now, but I only recently learned about one of its highly helpful features. As I was using my XO laptop in text only mode (ctrl-alt-fn-2), I was using screen to simulate multiple terminals, and I needed to copy and paste text between them. In the past, I've usually used screen when ssh-ing into a machine, and putty (my ssh client of choice) provided copy and paste, so I had never needed screen's system.

It turns out, screen's built-in, cross-window copy-paste system is a breeze to use. Press ctrl-a [ to enter copy mode, press enter to mark the start point and enter again to mark the end point. You have now copied the text. Switch to the desired window, and paste in the text using ctrl-a ].

This feature is especially handy in the XO laptop, where I've never been able to figure out how to copy and paste in the Terminal Activity.

For me, screen's most useful feature has been the ability to detach and reattach to a session which continues to run on the server. If I lose my ssh connection, all of my processes continue to run, and I can reattach to my screen session as if nothing ever happened.

Screen can be difficult to understand if you've never seen it in action, so I recommend watching this video. You can also learn more in this tutorial.

Thursday, September 18, 2008

XO Give One Get One - Round Two

As I was looking for some information on running the XO laptop in text mode, I stumbled across this page in the OLPC wiki:

One Laptop per Child is launching its second 'Give One, Get One' (G1,G1) program starting in November, 2008, delivering its XO Laptop globally via Amazon.com. Although the first iteration of the 'G1G1' program was extremely successful and sold more than 185,000 laptops, the delivery of the laptops in the USA did not run as smoothly as we anticipated. Selling the laptops on Amazon.com will provide us with the resources to process and ship the laptops globally in a timely fashion.

The laptop's operating system will be Linux-based (it will not dual-boot Windows and Linux, contrary to some reports).

If you've been thinking about getting one of these little green machines, it looks like the window will open once again. I wrote about the program last year. (Here is a list of all posts I've made on the XO Laptop.)

As for running in text mode, I've settled for using the terminal by pressing ctrl-fn-alt-1. To switch back to the normal view. press ctrl-fn-alt-3. For some reason, I can only start screen as root.

Sunday, September 14, 2008

The Ascention of JavaScript

I added a couple of new things to my little JavaScript library (q12). It turns out there was a flaw in the library when making HTTP requests which included custom HTTP headers in Firefox3 (and maybe others). If you ever run into (NS_ERROR_FAILURE) [nsIXMLHttpRequest.setRequestHeader], try opening the HTTP request first (as I recently learned). I've always said that writing this library was for education more than any anticipated serious use.

I also added a function for ARC4 encryption, which doesn't provide strong security, but it is still used in some places.

This has been quite a big month for JavaScript. In just a couple of weeks V8, Tracemonkey, and SquirrelFish Extreme have leapforgged one another and improved the state of JavaScript execution speed. It's gotten me thinking.

The more I use JavaScript, the more I like it. The language itself is extremely simple, I would argue even simpler than C - it's the APIs in browsers which are sometimes complex and inconsistent ;-) With all of these optimizations, JavaScript is looking better and better as a language in which to implement a new dynamic language.

Between the Just In Time compiler (JIT) and the polymorphic inline cache, some state of the art tools are coming to bear on one of the most widely available programming languages today. JavaScript is making strides to overcome a very difficult problem: maintaining complete runtime flexibility while providing speedy execution. There are significant investments being made in JavaScript as a language, and it seems that some of these open source components should be reusable. I'm not sure how much of these libraries would stand alone, but it would be a very interesting experiment to see if a new language could be implemented in JavaScript.

Sunday, September 07, 2008

A Revived Project

q12 is back!

I'm slowly starting back up again on my note taking wiki application. I created my own version of TiddlyWiki several months ago, but then started working on other projects. I'm planning to rewrite my note taking Ajax application to run on Google App Engine, and as I was getting started I realized that there were a few things missing from the Ajax library that I had written as part of this project.

I had created my own simple unit test framework in JavaScript, and I finally got around to uploading the unit tests for the library to the open source project. I've also been learning about manipulating browser cookies from within JavaScript. Aside: Cookie's in JavaScript are weird! When you say document.cookie = something, reading document.cookie doesn't give you the same thing back (the expiration, domain, and path information are squirreled away somewhere else).

I've also added a minified version of the q12 library, it weighs in at a mere 10k. Download the library today!

Wednesday, August 20, 2008

Dirt Simple CMS

I recently created an App Engine app to run www.jeffscudder.com. At the moment the code is extremely simple, and I get so few visitors to that web page that I doubt I will need anything complicated.

When I write blog posts and web pages, I have always preferred to just edit the HTML, and I have always wanted a simple content management system that just let me edit the HTML, JavaScript, CSS, ect. in the browser. Blogger comes awfully close to the perfect tool in my opinion, but it is geared towards displaying a series of posts. I wanted a landing page with links to all of the other content I put out there in the blagoweb. And I wanted to be able to host the simple web app's that I write (like the recently mentioned password generator).

With those design goals in mind, I set out to create my super simple content management system. It runs on App Engine, and the admin (me) is able to sign in to a special secret /content_manager page which lets me assign a specific blob of text to the desired URL under my domain. I can also set some basic metadata, like the content type (so that your browser knows how the content should be rendered) and cache control information, since HTTP caching is excellent and saves puppies from drowning in lakes (ok seriously it will alleviate congestion and unnecessary traffic when you want to give the same content to thousands or millions of people).

Editing pages through the /content_manager looks like this:

I've also decided to open source the code and I called the project scud-cms. Since App Engine is free for you to sign up, you can just upload this code and start setting your own content from right there in the browser.

(P.S. The idea for this simple content manager is very similar to one of my earlier projects: Scorpion Server, with which an authorized user could set the content at just about any URL they wanted.)

Tuesday, August 12, 2008

Opera on the XO Laptop

No I'm not talking about singing.

I was looking forward to the release of Firefox 3 with great anticipation. I had tried out the beta versions and noticed the improvements in JavaScript execution speed and memory footprint. Since the XO laptop is relatively slow and memory constrained compared to many of the other computers that I work with, I was really hoping that Firefox 3 would be a big improvement over Firefox 2. It turns out, for the XO, it wasn't. The place where the limitations were most apparent, was scrolling on a large web page. The page would hang for a second or so and then slowly creep down. It probably had to do with the short supply of available RAM on my happy little green machine.

As momentum was building for Firefox 3, I started to hear good things about Opera 9.51. When I ran into performance issues on Firefox and the XO, I decided to give Opera a try. Some thought I was crazy since, as Arne mentioned, the XO can be a bit slow. Yes, the XO laptop is a different environment than most other computers out there, and Opera has turned out to be an improvement.

To set up Opera on the XO, I began by downloading from the Opera web site. I selected Opera 9.51 for Linux i386, selected RedHat as the distribution in the drop down menu, then I selected RedHat 6.2, 7.1, 7.2, 7.3, 8.0, 9 and checked the box to Download this package in TAR.GZ format. Whew.

Next, unpack the archive using

tar zxvf opera-9.51.gcc4-static-qt3.i386.tar.gz

and then run

install.sh -s

From there, you can just type opera on the command line in the terminal to start it up.

Monday, August 04, 2008

Speedy Dynamic Member Lookup in C (part 2)

The comments on my previous post were highly enlightening. Joe pointed me to an article on the Polymorphic Inline Cache, and it's use in the SELF programming language. And Joel mentioned some different techniques used in other languages (Objective C, LISP, C++, and others) to implement dynamic lookups. With this new information, I can see that my simple solution might fall far short of the optimal solution, which isn't all that surprising as I'm certain more time has been spent on these real programming languages. But what I do think is interesting, is the idea that a more complex solution can give better performance than a simpler design. It is as if there is a valley, with the simple solution of static, precompiled behavior on a peak of performance, standing opposite a complex design which introduces overhead of collecting statistics on the most common paths through the code and constantly recompiles to machine code which optimizes the most used idioms. In between lies the poorly performing trough of the half solution. Solutions which provide dynamic behavior, but don't go far enough to mitigate the added cost of runtime lookups. The simple idea that I've coded up sits in this trough of slowness, but I'd wager it is faster than some solutions I've seen out there.

This isn't really a great solution, but I had already started, and wanted to finish because I think it was a good learning experience. We begin with a definition of a data structure that points to an offset lookup method.

typedef struct {
  size_t (*memberLookup)(char*);
} TypedStruct;

typedef struct {
  size_t (*memberLookup)(char*);
  unsigned int age;
  unsigned int weight;
  char* name;
} Person;

size_t PersonLookup(char*);

typedef struct {
  size_t (*memberLookup)(char*);
  char* breed;
  unsigned int weight;
  unsigned int age;
  char* name;
} Dog;

size_t DogLookup(char*);

You can think of Person and Dog as subclasses of a common parent. They share a few member attributes, but the attributes are in different places within the structure. I want to write a function which can find the age member in whatever structure is passed in to it. How do we find out where a member lies within a structure? With this design, we ask the structure to tell us. The memberLookup function takes a name, and returns the offset for the desired member. Here are the lookup functions that describe the above structures:

#include"typed_struct.h"
#include<stdio.h>

size_t PersonLookup(char* memberName) {
  if(strcmp(memberName, "age") == 0) {
    return sizeof(size_t);
  } else if(strcmp(memberName, "weight") == 0) {
    return sizeof(int) + sizeof(size_t);
  } else if(strcmp(memberName, "name") == 0) {
    return 2*sizeof(int) + sizeof(size_t);
  } else {
    return -1; 
  }
}

size_t DogLookup(char* memberName) {
  if(strcmp(memberName, "breed") == 0) {
    return sizeof(size_t);
  } else if(strcmp(memberName, "weight") == 0) {
    return sizeof(size_t) + sizeof(char*);
  } else if(strcmp(memberName, "age") == 0) {
    return sizeof(size_t) + sizeof(char*) + sizeof(int);
  } else if(strcmp(memberName, "name") == 0) {
    return sizeof(size_t) + sizeof(char*) + 2*sizeof(int);
  } else {
    return -1; 
  }
}

With the data structures in place and the functions which map member names to locations, we can now write functions which dynamically lookup the location of a member. This simple function displays the age of the entity which is passed in. The static variables are designed to speed up repeated calls with the same type. For example, if a Person is passed in as entity several times in a row, the location of the age member will only need to be looked up on the first call.

void GiveIntroduction(void* entity) {
  static size_t age_offset = 0;
  static size_t (*last_lookup)(char*) = NULL;
  
  if(last_lookup != ((TypedStruct*)entity)->memberLookup) {
    // In words, convert entity to a TypedStruct pointer
    // (TypedStruct*)entity
    // Find the function pointer called memberLookup
    // ((TypedStruct*)entity)->memberLookup
    // Call the function at memberLookup with "age" as the parameter
    // (*(((TypedStruct*)entity)->memberLookup))("age")
    age_offset = (*(((TypedStruct*)entity)->memberLookup))("age");
    last_lookup = ((TypedStruct*)entity)->memberLookup;
  }

  printf("I am %i years old\n", *((int*)(entity + age_offset)));
}

Here is a program which uses the above structures and functions.

int main() {
  Dog nelly;
  nelly.memberLookup = &DogLookup;
  nelly.age = 4;
  nelly.name = "Nelly";
  
  GiveIntroduction(&nelly);
}

When the above is run, it should display "I am 4 years old".

Now, what is the point of all of this? In statically typed, compiled languages, the compiler usually has to know in advance where the data is and what it's type is. For example, with the Dog structure, the compiler knows that age is 12 bytes from the start of the structure (on most 32-bit processors), and that it's type is int. All of the offsets are calculated at compile time. By moving the offset lookup into a function which is executed at runtime, we free up the compiler from having to know where to look for a particular member. Now we can pass in any data type into GiveIntroduction and as long as the object's memberLookup function tells us where to find the age member, everything will work.

We've gained flexibility at the cost of some overhead. Setting the memberLookup on every instance of a TypedStruct uses a bit of memory, and calling the function to find the necessary offset adds a runtime cost. This example has been a bit simple, even too simple, as there are ways to provide more aggressive caching of the lookups and faster ways to perform the lookup within the XLookup functions.

Friday, August 01, 2008

Speedy Dynamic Member Lookup in C (part 1)

If my senior class had voted for a "Most likely to try to invent a new programming language", I probably would have at least gotten a nomination for the category. From time to time I think about how I could design language constructs that maintain efficiency while allowing for more flexibility than statically linked, early bound languages like C. Some of my previous posts have been along these lines. And I've recently been thinking about performing member lookups on an object instance and how I could write a more flexible object system in C.

C has a concept of structs and types, and allows you to define a data structure, which is compiled. Here's an example:

typedef struct {
  uint16_t age;
  uint16_t weight;
  wchar* name;
} Person;

Now when you say Person fred, your program will grab a chunk of memory (likely from the stack) that is 8 bytes, or 12 bytes, or something similar depending on the compiler, size of pointers, ect. Now if I write fred.age in my program, this is translated to "the value of the two bytes at the start of fred's address" and if I ask for fred.weight, that turns into "the value of two bytes which are at exactly fred's address + 2". Since C is early bound (AKA static), the compiler knows that weight is always found in the same location relative to the start of fred, and as a result, the machine code generated by the compiler can be very fast and efficient.

The place where this optimization breaks down, is when you have a different structure but you want to be able to use a function written for one type of object with a different object. Say for example, we had a different structure that also used age and weight:

typedef struct {
  wchar* breed;
  uint16_t weight;
  uint16_t age;
  wchar* name;
} Dog;

Now if I want to reuse the code I wrote earlier to find out the age of a Dog, I have a small problem. You see, the age is in a different location for a Dog than a Person. If it had been in the same position, I could have cast the Dog to a Person, and then my compiled code would be looking for age at the same location. But for one reason or another, the layout of Dog is different.

In many programming languages, the difference in layout would present no problem, because the offsets aren't set in stone at compile time. Instead there is a dynamic lookup which says "where can I find 'age' for the type of data this is?" and we can add the same functionality here in C. Often this lookup is done using some kind of associative array, but I have a few slightly different ideas in mind.

C gets around this potential limitation by requiring you to specify the type of the each chunk of data. So for example, if you have a Person fred and a Dog nelly, then age is converted into a different offset from the start of the data. However this means that you would need to write the same function multiple times, once for Person and once for Dog.

Instead of rewriting a function for each different type, I'd like to use the same function for all types which have members which match up. In other words, as long as a type has a member named age of type uint16 and member string called name. It should be possible to create a C function which behaves a bit more like dynamic languages like JavaScript and Python.

In order to accomplish a dynamic member lookup, I'll add one thing to the types I've defined above. The first member will always be a pointer to something that allows the location of a named object to be determined. This could be a function that takes a name and produces an offset, it could be a hash table that maps names to offsets, it could point to a binary tree, or something else.

typedef struct {
  void* type;
  uint16_t age;
  uint16_t weight;
  wchar* name;
} Person;

typedef struct {
  void* type;
  wchar* breed;
  uint16_t weight;
  uint16_t age;
  wchar* name;
} Dog;

Now that the type information is stored along with the data, I could write a PrintAge function that would lookup the correct location of the age member. Here's some pseudocode for how this more dynamic code would look:

function PrintAge()
  // Note that this age offset is stored between function calls so 
  // that if the function is called with the same type as in the most recent call, 
  // the offset will not have to be looked up again. This should improve 
  // performance by avoiding the need to perform a lookup in some cases.
  static long age_offset = -1
  static void* most_recent_type
  if most_recent_type != current_type || age_offset == -1
    age_offset = lookup(age)
  printf("age is *(person+age_offset)")

I haven't decided yet on the best way to lookup the offset based on the member name. I think it might be a close call between pointing to a lookup function with if statements, and a lookup in a presorted binary tree. Depending on the number of members, I could see it going either way. What do you think, what is the best way to implement a dynamic member lookup?