Imposterrific - a blog by Jeff Scudder: python


Saturday, August 01, 2009

A Test Client for App Engine

I created a simple utility library in Python to help anyone debug their App Engine application. Many of the App Engine apps that I've seen use HTTP, HTML form posts, and the Users API, and the easiest way to test these features is to fire up the web browser and click through the web pages generated by the app. However, this can be a bit slow and repetitive, and it is difficult to determine exactly what is being sent over the wire (though this is greatly helped by using wireshark, fiddler, tcpdump, or another network packet sniffing tool).

Enter my little App Engine HTTP module. It provides a simple interface for making arbitrary HTTP requests and will print the full request and response to the terminal (though you can turn the noisy printing off if you want). Download it, copy the http.py file to your working directory, and try it out in your Python interpreter.

For our first demonstration, let's try to visit the Google search page.

import http
client = http.Client()
resp = client.request('GET', 'http://www.google.com')

You should see your request and the server's response (with the HTML for the Google Search page) in your terminal window. This should work with just about any website out there.

Other HTTP debugging tools can show you the request and response like this, but I find that this kind of simple Python client can be useful in writing end-to-end or integration tests which contact your App Engine app remotely.

Along those lines, one of the things which standard HTTP debugging tools do not provide is a way to sign in to an App Engine app with a Google Account so that the App Engine Users API can identify the current user. I wrote an extremely simple app which illustrates the Users API. Try it out here:

http://jscudtest.appspot.com/user

After signing in, the page should simply say, "Hello, yourusername (yourusername@yourdomain.com)". You'll notice that during the sign-in process, you signed in on www.google.com/accounts and were asked to approve access to the app. This kind of interaction works great in a browser, but can be tricky when you are using a browserless, command-line client.

It is possible, however, to sign in to an App Engine app without using a browser. You can use the same technique used in appcfg: sign in with ClientLogin, then use the resulting authorization token to obtain an app-specific cookie which indicates the current user. This simple HTTP library can do this for you, and all subsequent requests will use this cookie to tell the App Engine app who the current user is. Try it out by making a request to the simple user app that you visited earlier:

import http
client = http.Client()
client.appengine_login('jscudtest')
resp = client.request('GET',
                      'http://jscudtest.appspot.com/user')
print resp.body

You should see the following text displayed in the terminal:

Hello, yourusername (yourusername@yourdomain.com)

You can use the appengine_login method with your own app, just change the argument to the App ID of the app you want to access.
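
The two halves of that exchange can be sketched roughly as follows. This is not the code inside http.py. It assumes the ClientLogin response body is a series of key=value lines containing an Auth token, and that the token is traded for a cookie at the app's /_ah/login URL, as appcfg did at the time. The helper names and sample values are made up, and nothing is sent over the network:

```python
from urllib.parse import urlencode


def parse_client_login_body(body):
    """Parse the key=value lines of a ClientLogin-style response body."""
    fields = {}
    for line in body.splitlines():
        if '=' in line:
            key, value = line.split('=', 1)
            fields[key] = value
    return fields


def build_login_url(app_id, auth_token, continue_url):
    """Build the URL that trades the token for an app-specific cookie.

    The /_ah/login path and its parameters are assumptions based on how
    appcfg behaved; they are not taken from http.py.
    """
    query = urlencode([('continue', continue_url), ('auth', auth_token)])
    return 'http://%s.appspot.com/_ah/login?%s' % (app_id, query)


# Made-up response body standing in for what the login server returns.
sample_body = 'SID=abc\nLSID=def\nAuth=token123\n'
token = parse_client_login_body(sample_body)['Auth']
login_url = build_login_url('jscudtest', token,
                            'http://jscudtest.appspot.com/user')
print(login_url)
```

A client would then request that URL and keep whatever cookie the app sets, sending it along with every later request.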

Along with simplifying access to apps which use Google Accounts, I wanted this library to simplify the process of using another feature used by many web apps: HTML form posts. Now I'm certain you've used HTML forms before, here's a simple example:

http://shoutout.appspot.com/

The above app uses both the Users API and a simple form. As an alternative to visiting this page in the web browser, you can post your shout-out using the following:

import http
client = http.Client()
client.appengine_login('shoutout')
client.request('POST', 'http://shoutout.appspot.com/',
    form_data={'who': raw_input('From: '),
               'message': raw_input('Message: ')})

If you've ever wondered what gets sent across the wire to post a form like this, look back in your terminal to see the request from your computer and the response from the server (this is of course just the HTTP layer; wireshark will show you traffic on the IP and Ethernet layers as well).
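
For a rough idea of what the body of such a form post looks like, here is a minimal sketch using only the standard library (Python 3's urllib.parse). It is not the code inside http.py, the encode_form helper is hypothetical, and the field values are made up. Form posts are normally sent as application/x-www-form-urlencoded, with the fields joined as key=value pairs:

```python
from urllib.parse import urlencode


def encode_form(form_data):
    """Return (headers, body) for a form post; a sketch, not http.py's code."""
    # Sort the fields so the output is the same on every run.
    body = urlencode(sorted(form_data.items()))
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': str(len(body.encode('utf-8'))),
    }
    return headers, body


headers, body = encode_form({'who': 'Jeff', 'message': 'Hello world!'})
print(headers)
print(body)
```

Spaces become plus signs and other reserved characters are percent-escaped, which is what you will see in the request body printed in your terminal.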

That's really all there is to it. I designed this as just a simple script to use on the command line and I wrote it in less time than it's taken me to write this blog post about it (I borrowed atom.http_core from the gdata-python-client as a foundation). With some tweaks to remove the interactive (getpass and raw_input) calls and replace them with parameters, I could see this module as a utility layer in a larger, more complex App Engine client application. If you're creating one, I'd love to hear about it ;-)

For more information on how the appengine_login method works behind the scenes, see this presentation I gave a few months ago:

Many thanks to Trevor Johns and Nick Johnson for helping me to understand how this ClientLogin-to-cookie exchange works.

I'm sure that App Engine's Java runtime users would appreciate a port of this simple library to Java, if you feel so inclined.

Tuesday, July 14, 2009

New version of my Dirt Simple CMS

I have just uploaded "version 2" of the dirt simple content management system which I developed almost a year ago. For those who don't recall, scud-cms is an extremely thin layer on top of App Engine with a plain text box based editor for creating web pages. The only new feature in this release is the ability to page through all of the content in your app by visiting the /content_lister page. I implemented the pagination system using "key only" queries and order-by-key which are fairly recent features in App Engine. For more potential designs to page through datastore entities, see this article on pagination.
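
The idea behind key-ordered paging can be sketched without the datastore. In this minimal sketch a sorted Python list stands in for a keys-only query ordered by key, and the fetch_page helper and its bookmark are hypothetical names, not part of scud-cms. Each page remembers the last key it showed, and the next page starts just after it:

```python
import bisect


def fetch_page(sorted_keys, start_after=None, page_size=10):
    """Return (page, bookmark): up to page_size keys after start_after."""
    if start_after is None:
        start = 0
    else:
        # Equivalent to asking for keys strictly greater than the bookmark.
        start = bisect.bisect_right(sorted_keys, start_after)
    page = sorted_keys[start:start + page_size]
    # Only hand back a bookmark when more keys remain after this page.
    has_more = start + page_size < len(sorted_keys)
    bookmark = page[-1] if has_more else None
    return page, bookmark


keys = sorted(['/about', '/blog', '/contact', '/index', '/projects'])
first, bookmark = fetch_page(keys, page_size=2)
second, bookmark2 = fetch_page(keys, start_after=bookmark, page_size=2)
third, bookmark3 = fetch_page(keys, start_after=bookmark2, page_size=2)
print(first, second, third)
```

Because each page is found by key rather than by counting rows, the cost of fetching a page does not grow as you move deeper into the list.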

I was inspired to work on this in part by a comment from Jason Worley (swirleydude on twitter) who has been using it and appears to like it quite a bit. At some point I plan to do a version 3 release which will include file uploads (great for images) just as Jason has done in his own modifications. Having someone use your software, enjoy it, tweak it, and build on it is rewarding in a way which is quite unique.

Thursday, June 25, 2009

Partial Function Invocation

My wife tells me that I often jump into an explanation by starting at the beginning of my train of thought without giving any indication of where I'm going. It would be better if I began with the point I'm trying to make, then explained how I reached my conclusion. How am I doing so far? Oh wait, right... Here is my conclusion:

Allowing a function to be partially invoked, so that some of its arguments can be specified at different times, can make code more flexible than using just objects or pure functions.

I've been thinking about this lately as I refactored the gdata-python-client, a library which can be used with AtomPub services. I'll spare you all the gory details and offer a simple example of how partial function invocation might come in handy.

When making a request to a remote server, you might need the following information, just for example: username, password, URL, message body, and content type. So we start out by writing a stateless function to take this information and open a connection to the server, format our inputs, transmit our request, and parse the response. We'll call it post, and using it looks like this:

serverResponse = post(url, data, contentType, username, 
                      password)

This is all well and good, but suppose the final request, as shown above, is preceded by a whole series of function calls. Each function would need to dutifully pass along parts of the request. Say for example that the user types in their username and password long before the request is made, so these get passed as parameters to lots of functions which only receive them so they can pass them on. In addition, the username and password are almost always the same from request to request, so the same values are being passed to the post function over and over. In cases like this, we will often use an object to hold common values.

class Requestor {
  username
  password

  method post(url, data, contentType) {...}
}

Now our request will look like:

client = new Requestor(username, password)
serverResponse = client.post(url, data, contentType)

What we've effectively done here is specify some of the information in advance and leave other pieces of information to be specified at the last minute. I would argue that this makes the code better (cleaner, less chance of human error in listing lots of parameters, perhaps less data on the call stack, etc.).

Now the question becomes: Did we extract the right pieces of information from the function call into the object? Suppose the code you are writing needs to use a different password for each service you are making a request to, but the content type of the data is always the same. Then it would have made more sense to design our class like this:

class Requestor {
  username
  contentType

  method post(url, data, password) {...}
}

Since we've established that not everyone who is using our code has the same usage patterns, let's design for ultimate flexibility. Every parameter can be specified either in the object, or in the function call. Also, if the object has a parameter already, we could override it by passing in that parameter when we call the method. This is not too difficult in Python, so here is a non-pseudocode example:

class Requestor(object):
  
  def __init__(self, url=None, data=None, 
               content_type=None, username=None, 
               password=None):
    self.url = url
    self.data = data
    self.content_type =  content_type
    self.username = username
    self.password = password
  
  def post(self, url=None, data=None, 
           content_type=None, username=None, 
           password=None):
    url = url or self.url
    data = data or self.data
    content_type = content_type or self.content_type
    username = username or self.username
    password = password or self.password
    # Now we have our inputs, code to make 
    # the request starts here
    ...

If you think this seems a bit excessive, I would agree. I didn't go nearly this far when designing the library that started me thinking about this. There was one request parameter in particular though that does use this pattern. (Five points to the first person to post it in the comments. ;-)

To use the above class, you would do:

requestor = Requestor(username='...')
requestor.password = '...'
...
server_response = requestor.post(url, data, content_type)

It will also handle our alternate usage where we want to give the password to the post method and set the content type at the object level:

requestor = Requestor(username='...', content_type='...')
...
server_response = requestor.post(url, data, password='...')

We can even override parameters which are set in the object:

requestor = Requestor(password='...', content_type='...')
requestor.username = '...'
...
# Override the content_type, just on this request.
server_response = requestor.post(url, data, content_type='...')

With the above example we end up with a lot of code just to let us specify each parameter in either the object or as a function argument. In fact, this can introduce some cases where the user forgets to specify a value in either place, which is possible because all function arguments are now optional. Wouldn't it be better if we could instead specify some of the function parameters, pass the half-specified function call around, and fill in the remaining values when we finally invoke it? For this illustration, I'm using the following syntax to show a partial invocation: < > around arguments instead of ( ).

function post(url, data, contentType, username, password) {...}

started = post<username, password>
...
serverResponse = started(url, data, contentType)

Recall our case from earlier: what if the contentType is constant but the password is instead more variable?

started = post<username, contentType>
...
serverResponse = started(url, data, password)

It turns out I'm not the first person to think of this pattern, not by a long shot. Functional programming often makes use of it, under the names function currying and partial application (strictly speaking, fixing some arguments ahead of time is partial application, while currying turns a multi-argument function into a chain of single-argument ones). I found the following example for Scheme which also shows how easy this is in Haskell. The prototype library for JavaScript includes a bind function which can accomplish the same thing. Here's a paper on the topic in C++: (pdf, Google cache HTML). I also found PEP 309, which was a proposal for this in Python (it was accepted and became functools.partial in Python 2.5). Perhaps I should have called my Python example above: Function Currying using Classes. If you can think of other examples, I'd love to see them.
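
The post<username, password> pseudocode maps quite directly onto functools.partial from the standard library. In this minimal sketch the post function is only a stand-in that returns its arguments, and the URL and credentials are placeholders:

```python
from functools import partial


def post(url, data, content_type, username, password):
    """Stand-in for the real request; just returns its arguments."""
    return (url, data, content_type, username, password)


# Fix the username and password now; supply the rest later.
started = partial(post, username='jeff', password='secret')
response = started('http://example.com/feed', '<entry/>',
                   'application/atom+xml')

# The alternate usage: content type fixed early, password given at call time.
started_alt = partial(post, username='jeff',
                      content_type='application/atom+xml')
response_alt = started_alt('http://example.com/feed', '<entry/>',
                           password='other')
print(response)
print(response_alt)
```

Unlike the all-optional Requestor class, a forgotten argument here still fails loudly: calling started without a url raises a TypeError.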

Tuesday, November 11, 2008

XML Library with Versioning

I recently created a simple Python library for converting objects to and from XML. Code samples up front, here's how you would define some class to represent some hierarchical XML:

class AtomFeed(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}feed'
  title = Title
  entries = [Entry]

class Entry(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}entry'
  links = [Link]
  title = Title
  content = Content

class Link(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}link'
  rel = 'rel'
  address = 'href'

class Url(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}url'

class Title(XmlElement):
  _qname = '{http://www.w3.org/2005/Atom}title'
  title_type = 'type'

Now for the whys and hows.

For the past few years I've been working with Web Services and most of them use XML to represent the data (though I hope JSON catches on more widely). There are some great XML libraries out there, and my library is based on one of them (ElementTree). XML parsing is certainly nothing new, so why create a new one?

The Why

There are a few limitations with the XML parsing approaches I've used in Python:

XML structure isn't documented or available using help()
No autocomplete for finding elements in the XML
If the XML changes in a new version of the web service, my code needs to be rewritten
My code interacting with the XML is verbose

Source code can provide a wealth of information, but parsed XML doesn't have the same level of information richness as source code. Between tool tips in IDEs, auto-generated documentation, and autocomplete, having classes loaded for your XML models can bring the tree traversal logic closer to your fingertips. Many software development tools are optimized for working with predefined classes rather than generic XML objects.

However, one of the biggest drawbacks to representing each type of XML element with its own class is that you end up needing to write lots of class definitions. For this reason I've tried to make the XML class definitions as compact as possible. Specifying a simple XML class only takes two lines of code. For each type of sub-element and each XML attribute, you can add one line of code. You don't need to declare all of the elements or attributes either. The XmlElement will preserve all of the XML which it parses. If there are class members which correspond to a specified sub-element, the element will be placed in that member. Any unspecified elements will be converted to XmlElement instances. You can search over all XML elements (both anticipated members and unanticipated generic objects) using the get_elements method. XML attributes are handled in a similar fashion and can be searched using get_attributes.

I've saved the most unique feature of this library for last: Sometimes web services change the XML definition thereby breaking your code. If it is something small like a change in XML namespace or changing a tag, it seems like such a waste to have to edit lines upon lines of code. To address this kind of problem, this XML library supports versioning. When you parse or generate XML, you can specify the version of the available rules that you'd like to use. You can use the same objects with any version of the web service.

To use versioning, write a class definition with tuples containing the version specific information:

class Control(XmlElement):
  _qname = ('{http://purl.org/atom/app#}control', #v1
            '{http://www.w3.org/2007/app}control') #v2
  draft = Draft
  uri = 'atomURI'
  lang = 'atomLanguageTag'
  tag = ('control_tag', 'tag') # v1, v2

class Draft(XmlElement):
  _qname = ('{http://purl.org/atom/app#}draft', 
            '{http://www.w3.org/2007/app}draft')

If you create an instance of the Control element like this:

c = Control(draft=Draft('yes'), tag='test')

Then you can generate XML for each version like this:

c.to_string(1)

returns

<control xmlns="http://purl.org/atom/app#" 
    control_tag="test">
  <draft>yes</draft>
</control>

while

c.to_string(2)

returns

<control xmlns="http://www.w3.org/2007/app" 
    tag="test">
  <draft>yes</draft>
</control>

Note the difference in XML namespaces in the above. I also added an example of an attribute name which changed between versions, though "tag" doesn't actually belong in AtomPub control (so don't go trying to use it m'kay).
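
The core of the versioning trick, choosing a name from a tuple by version number, can be sketched with plain ElementTree. This is a minimal illustration and not the library's source: the pick and control_to_string helpers are hypothetical, and ElementTree will write the namespace with a generated prefix rather than as the default namespace shown above.

```python
import xml.etree.ElementTree as ElementTree

CONTROL_QNAME = ('{http://purl.org/atom/app#}control',   # v1
                 '{http://www.w3.org/2007/app}control')  # v2
TAG_ATTRIBUTE = ('control_tag', 'tag')                   # v1, v2


def pick(versioned, version):
    """Return the entry for a 1-based version; plain strings apply to all."""
    if isinstance(versioned, tuple):
        return versioned[version - 1]
    return versioned


def control_to_string(tag_value, version):
    """Serialize a control element using the names for the given version."""
    element = ElementTree.Element(pick(CONTROL_QNAME, version))
    element.set(pick(TAG_ATTRIBUTE, version), tag_value)
    return ElementTree.tostring(element, encoding='unicode')


v1_xml = control_to_string('test', 1)
v2_xml = control_to_string('test', 2)
print(v1_xml)
print(v2_xml)
```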

Since this library is open source, you're free to examine how it works and use it however you like. Allow me to highlight a few key points.

The How

Earlier I showed how to define XML element classes which look for specific sub elements and attributes and convert them into member objects. I also mentioned that this XML library handles versioning, meaning that the same object can parse and produce different XML depending on a version parameter. Both of these are accomplished by creating class level rule sets which are built up using introspection the first time an XML conversion is attempted.

In pseudo-code it works like this.

XML --> object
  - find out the desired version
  - is there an entry for this version in _rule_set?
  - if not, look at all XML members of this class 
      in _members
  - create XML matching rules based on each member's type
      (and store in _rule_set so we don't need to generate 
       the rules again)
  - iterate over all sub-elements in the XML tree
  - sub-elements and attributes which are in the rule set 
      are converted into the declared type
  - sub-elements and attributes which don't fit a rule are
      stored in _other_elements or _other_attributes

When generating XML the process is similar but slightly different.

object --> XML
  - create an XML tree with the tag and namespace for this 
      object given the desired version
  - look at all members of this class in _members
  - tell each member to attach itself to the tree using 
      its rules for the desired version
  - iterate through _other_elements and _other_attributes 
      and tell each to attach to the XML tree

Armed with the above explanation, understanding the source code should be a bit easier.
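
As a companion to the pseudo-code, here is a minimal sketch of the XML --> object direction. It is not the library's source: SketchElement, pick, and from_tree are hypothetical names, inherited members are ignored, and only single sub-elements and attributes are handled. It does show the two key ideas, rules built by introspection and cached per version, and leftovers kept in _other_elements and _other_attributes:

```python
import xml.etree.ElementTree as ElementTree


def pick(versioned, version):
    """Tuples hold one entry per version (1-based); other values apply to all."""
    return versioned[version - 1] if isinstance(versioned, tuple) else versioned


class SketchElement(object):
    _qname = None

    def __init__(self):
        self.text = None
        self._other_elements = []
        self._other_attributes = {}

    @classmethod
    def _rules(cls, version):
        # Each class keeps its own cache, created on first use.
        if '_rule_set' not in cls.__dict__:
            cls._rule_set = {}
        if version not in cls._rule_set:
            elements, attributes = {}, {}
            for name, value in vars(cls).items():
                if name.startswith('_'):
                    continue
                if isinstance(value, type) and issubclass(value, SketchElement):
                    elements[pick(value._qname, version)] = (name, value)
                elif isinstance(value, (str, tuple)):
                    attributes[pick(value, version)] = name
            cls._rule_set[version] = (elements, attributes)
        return cls._rule_set[version]

    @classmethod
    def from_tree(cls, tree, version=1):
        elements, attributes = cls._rules(version)
        obj = cls()
        obj.text = tree.text
        for key, value in tree.attrib.items():
            if key in attributes:
                setattr(obj, attributes[key], value)
            else:
                obj._other_attributes[key] = value
        for child in tree:
            if child.tag in elements:
                name, member_class = elements[child.tag]
                setattr(obj, name, member_class.from_tree(child, version))
            else:
                obj._other_elements.append(child)
        return obj


class Draft(SketchElement):
    _qname = ('{http://purl.org/atom/app#}draft',
              '{http://www.w3.org/2007/app}draft')


class Control(SketchElement):
    _qname = ('{http://purl.org/atom/app#}control',
              '{http://www.w3.org/2007/app}control')
    draft = Draft
    tag = ('control_tag', 'tag')  # v1, v2


xml_v2 = ('<control xmlns="http://www.w3.org/2007/app" tag="test" lang="en">'
          '<draft>yes</draft></control>')
control = Control.from_tree(ElementTree.fromstring(xml_v2), version=2)
print(control.tag, control.draft.text, control._other_attributes)
```

Note that Draft is defined before Control here, since a class body can only refer to names that already exist when it runs.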

Wednesday, October 15, 2008

Twitter Client

As a proof of concept for using the sippycode HTTP library which I wrote about in my last post, I decided to create a simple text console client for Twitter. Download the Twitter terminal application here.

Twitter's RESTful API is quite simple, and I wrote an open source library for Twitter based on the sippycode HTTP library in a few minutes. Here's an example of posting a new update (tweeting):

import sippycode.http.core as http_core
import sippycode.auth.core as auth_core

class TwitterClient(object):

  def __init__(self, username, password):
    self._credentials = auth_core.BasicAuth(username, 
                                            password)

  def update(self, message):
    request = http_core.HttpRequest(method='POST')
    http_core.parse_uri(
            'http://twitter.com/statuses/update.xml'
        ).modify_request(request)
    request.add_form_inputs({'status': message})
    self._credentials.modify_request(request)
    client = http_core.HttpClient()
    response = client.request(request)
    return response

In the above, the client sends an authenticated POST to the updates URL. Using the TwitterClient in your code looks like this:

client = TwitterClient('my-username', 'my-password')
client.update('Try out this Twitter client: http://oji.me/wP')
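
For reference, the header that HTTP Basic authentication adds to a request is easy to build by hand. This minimal sketch uses only the standard library; basic_auth_header is a hypothetical helper, not part of sippycode, whose BasicAuth class presumably does something equivalent when it modifies the request:

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic authentication."""
    credentials = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


header_value = basic_auth_header('my-username', 'my-password')
print('Authorization: ' + header_value)
```

Base64 is an encoding, not encryption, so anyone who can see the request can recover the password.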

To try out this Twitter console app, unpack the download and run sippy_twitter.py. With it, you can update your status on Twitter or read the updates from your friends. When reading, the client displays five updates at a time, since showing more at once would likely cause some to scroll off the top of the screen (assuming the terminal displays twenty-five lines).
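
Showing five updates at a time comes down to slicing the list into fixed-size screens. A minimal sketch, with a hypothetical paginate helper and made-up updates (this is not the code in sippy_twitter.py):

```python
def paginate(updates, per_page=5):
    """Yield successive slices of at most per_page updates."""
    for start in range(0, len(updates), per_page):
        yield updates[start:start + per_page]


updates = ['update %d' % n for n in range(1, 13)]
screens = list(paginate(updates))
for screen in screens:
    print(screen)
```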

This simple application was designed to be a proof of concept, but it's really grown on me. Cycling through all of my friends' updates doesn't require any scrolling, and it feels snappier than the web interface. It seems like others are enjoying this terminal client too.

There are quite a few ways that this client could be improved, so there's plenty of opportunity to pitch in if you are interested. I have received feature requests from friends who previewed this app, such as: support command line arguments which will allow the client to perform updates when being run from another program, show a running countdown from 140 characters as you are typing your update (could probably be done using ncurses), ability to follow users, and read updates from just one user. If you'd like to participate in any of these, let me know in the comments.

Fire up your terminal and give this client a try. Why not post an update to @jscud right now?

Monday, October 13, 2008

An Open Source Python HTTP Client

At Super Happy Dev House 27, I made significant progress on an open source library for making HTTP requests in Python. For the past few years I've been working with web services and APIs (SOAP, REST - specifically AtomPub, etc.) and I wanted to create an HTTP library which is simple, clean, and precise. Python has a couple of great HTTP libraries already, but one of them is a bit too low level (httplib) and the other is too high level (urllib2).

For example, in httplib you call a method to send data as if you are writing to a file (httplib uses sockets, after all). Required HTTP headers like Content-Length are not calculated for you. You'll need to handle cookies and redirects on your own. On the plus side, you get full control of what is being sent. The higher level library, urllib2, is built on top of httplib. It adds some handy abstractions, like calculating the Content-Length, but it also has some limitations. I haven't yet been able to figure out how to perform a PUT or DELETE with urllib2.
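
In Python 3, where urllib2 became urllib.request, the Request class accepts a method argument, which covers the PUT and DELETE case. A minimal sketch with a placeholder URL and body; the request is only constructed here, not sent:

```python
import urllib.request

request = urllib.request.Request(
    'http://example.com/resource',  # placeholder URL
    data=b'<entry/>',
    headers={'Content-Type': 'application/atom+xml'},
    method='PUT')

print(request.get_method())
# Passing the request to urllib.request.urlopen would send it; omitted here.
```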

When making HTTP calls to web services, there are often a large number of HTTP headers, URL parameters, and components to the request. Making a request feels like making a function call in most HTTP libraries. In the past, I've wrapped these functions with successive layers containing more and more function parameters. For example, in a request to send a photo and metadata to PicasaWeb, you need to include an Authorization token, Content-Type specifying a MIME-multipart request and the multipart boundary, and a multipart payload consisting of the Atom XML describing the photo and the photo's binary data. If you add in the ability to specify other headers and URL parameters, your function call might look like this:

def post_photo(url, url_parameters, escape_parameters, 
               photo_mime_type, photo_file_handle, 
               photo_file_size, metadata_xml, 
               metadata_mime_type, auth_token, 
               additional_http_headers):
  ...

# Sets the request's Host, port, and uri. 
# Makes the request into a MIME multipart request, 
# adjusts the Content-Type and recalculates 
# Content-Length.
# Sets the Authorization header
post_photo('http://picasaweb.google.com/data/'
           'feed/api/user/userID/albumid/albumID', None, 
           False, 'image/jpeg', photo_file, photo_size, 
           atom_xml, 'application/atom+xml', 
           client_login_token, None)

To use the above, you have to gather all of the information in one place, and make the function call. There are cases where you want a design like the above.

However, more and more I think of ways the program could be more cleanly structured if this information could be compartmentalized. This new library relies on an HttpRequest object which various parts of the program modify. Once all of the modifications have been applied, the fully constructed request is passed to an HttpClient which communicates with the server using httplib or urlfetch if you happen to be on Google App Engine. (Support for more HTTP libraries is certainly possible.)

The photo posting example from above could look something like this. Keep in mind that these steps could be carried out in a different order in different segments of code.

photo_post = HttpRequest(method='POST')
# Sets the Authorization header
client_login_token.modify_request(photo_post)
# Adds to the body and calculates Content-Length, 
# sets the Content-Type.
photo_post.add_body_part(atom_xml, 
    'application/atom+xml')
# Makes the request into a MIME multipart request, 
# adjusts the Content-Type and recalculates 
# Content-Length.
photo_post.add_body_part(photo_file, 'image/jpeg', 
    photo_size)
# Sets the request's Host, port, and uri. 
parse_uri('http://picasaweb.google.com/data/'
          'feed/api/user/userID/albumid/albumID'
          ).modify_request(photo_post)

In fact, the above code could make up the body of the post_photo function described in the first code snippet.
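
The pattern itself, small objects which each know how to modify a shared request, can be sketched in a few lines. The classes below are hypothetical stand-ins and not sippycode's actual HttpRequest or its helpers. Body handling is left out, and the token is made up:

```python
from urllib.parse import urlsplit


class SketchRequest(object):
    """A mutable bag of request parts (hypothetical, not sippycode's class)."""

    def __init__(self, method=None):
        self.method = method
        self.host = None
        self.port = None
        self.path = None
        self.headers = {}


class TokenAuth(object):
    """Knows only how to add an Authorization header."""

    def __init__(self, token):
        self.token = token

    def modify_request(self, request):
        request.headers['Authorization'] = 'GoogleLogin auth=%s' % self.token


class Uri(object):
    """Knows only where the request should go."""

    def __init__(self, url):
        parts = urlsplit(url)
        self.host = parts.hostname
        self.port = parts.port  # None when the URL does not name a port
        self.path = parts.path

    def modify_request(self, request):
        request.host = self.host
        request.port = self.port
        request.path = self.path


photo_post = SketchRequest(method='POST')
# These two steps could happen in either order, in different parts of a program.
Uri('http://picasaweb.google.com/data/'
    'feed/api/user/userID/albumid/albumID').modify_request(photo_post)
TokenAuth('made-up-token').modify_request(photo_post)
print(photo_post.host, photo_post.path, photo_post.headers)
```

Each modifier touches only its own part of the request, so the code that holds the token never needs to see the URL, and vice versa.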

I created an open source project for this and other small projects called sippycode (a play on sippy cup). This is a place where code can grow up.

Wednesday, August 20, 2008

Dirt Simple CMS

I recently created an App Engine app to run www.jeffscudder.com. At the moment the code is extremely simple, and I get so few visitors to that web page that I doubt I will need anything complicated.

When I write blog posts and web pages, I have always preferred to just edit the HTML, and I have always wanted a simple content management system that just let me edit the HTML, JavaScript, CSS, etc. in the browser. Blogger comes awfully close to the perfect tool in my opinion, but it is geared towards displaying a series of posts. I wanted a landing page with links to all of the other content I put out there in the blagoweb. And I wanted to be able to host the simple web apps that I write (like the recently mentioned password generator).

With those design goals in mind, I set out to create my super simple content management system. It runs on App Engine, and the admin (me) is able to sign in to a special secret /content_manager page which lets me assign a specific blob of text to the desired URL under my domain. I can also set some basic metadata, like the content type (so that your browser knows how the content should be rendered) and cache control information, since HTTP caching is excellent and saves puppies from drowning in lakes (ok seriously it will alleviate congestion and unnecessary traffic when you want to give the same content to thousands or millions of people).
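
The heart of such a system is a mapping from URL path to a blob of text plus a little metadata. A minimal sketch, with a dictionary standing in for the App Engine datastore and hypothetical helper names and default values (this is not the scud-cms code):

```python
# In-memory stand-in for the datastore; maps a URL path to a stored page.
PAGES = {}


def set_content(path, body, content_type='text/html',
                cache_control='public, max-age=3600'):
    """Store a blob of text and its metadata under a path."""
    PAGES[path] = {'body': body, 'content_type': content_type,
                   'cache_control': cache_control}


def serve(path):
    """Return (status, headers, body) for a stored path."""
    page = PAGES.get(path)
    if page is None:
        return 404, {}, ''
    headers = {'Content-Type': page['content_type'],
               'Cache-Control': page['cache_control']}
    return 200, headers, page['body']


set_content('/style.css', 'body { margin: 0; }', content_type='text/css')
status, headers, body = serve('/style.css')
print(status, headers, body)
```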

Editing pages through the /content_manager looks like this:

I've also decided to open source the code and I called the project scud-cms. Since App Engine is free for you to sign up, you can just upload this code and start setting your own content from right there in the browser.

(P.S. The idea for this simple content manager is very similar to one of my earlier projects: Scorpion Server, with which an authorized user could set the content at just about any URL they wanted.)

Tuesday, April 22, 2008

Early vs Late Binding

I've been thinking recently about programming languages (surprised?), specifically about the things that make them different. One of the really nice things about C is that it compiles into machine code which tends to run lean and mean. By that I mean it is blazing fast and doesn't take up much memory. On the other hand, programming in Python and JavaScript has really been growing on me. There is so much flexibility to create elegant solutions quickly and without rewriting lots of existing code. In fact, I'd say greater ability to reuse existing code is a natural outgrowth of programming language flexibility.

So where does this flexibility come from? One place I tend to notice it most is in the ability to give an existing function a new body. In other words, you can plug in different behavior in place of the default.

Here's a simple example to illustrate the idea. Let's say that we created a simple checkout register which takes a receipt, adds the sales tax, and spits out the grand total. Here's our code foundation in both Python and JavaScript (these two examples do essentially the same thing):

Python:

def CalculateTax(amount):
  return amount * 0.18

class Receipt(object):
  
  def __init__(self, items=None):
    self.items = items or []
  
  def CalculateTotal(self):
    return sum([item + CalculateTax(item) for item in self.items])

JavaScript:

function calculateTax(amount) {
  return amount * 0.18;
}

function Receipt(items) {
  if (items) {
    this.items = items;
  } else {
    this.items = new Array();
  }
}

Receipt.prototype.calculateTotal = function() {
  var total = 0;
  for (var i = 0; i < this.items.length; i++) {
    total += this.items[i] + calculateTax(this.items[i]);
  }
  return total;
}

To use the above code, you might write something like this:

Python:

my_order = Receipt([5.50, 10, 7.89])
print my_order.CalculateTotal()

JavaScript:

var myOrder = new Receipt([5.50, 10, 7.89]);
alert(myOrder.calculateTotal());

Now let's say someone asks you to change the tax rate which is used when calculating the total. Here's the catch, you're not allowed to change the existing code. It turns out this is actually really easy. You can define a new function, then make an existing function name point to the new function. Here's an example of how to inject our new code:

Python:

def CalculateHigherTax(amount):
  return amount * 0.25

CalculateTax = CalculateHigherTax

print my_order.CalculateTotal()

JavaScript:

function calculateHigherTax(amount) {
  return amount * 0.25;
}

calculateTax = calculateHigherTax;

alert(myOrder.calculateTotal());

After adding the above code to the foundation we started with, you will notice that the calculate total method now uses calculate-higher-tax instead of the original function, even though you are calling the same method on the same object as before. Congratulations, you have just witnessed late binding in action.

So what is late binding? The idea is that the computer decides which code should be executed while the program is running. This seems normal in scripting languages, but compiled languages often use this too (I'm looking at you, Java and C++). For example, overridden (virtual) methods and polymorphism take advantage of late binding; overloaded methods, by contrast, are generally resolved at compile time. With late binding you can change the meaning of an identifier (for example, change the behavior when you call a specific function) at just about any time.

Now let's take a look at a language which uses early binding. C is a great example. With early binding, the meaning of things like function names is locked in when the code is compiled. There is no dynamic lookup while the program is running to see which code should be executed; instead, the address of the desired code is embedded directly into the binary machine code.

Here is how the same calculate-total example might look in C:

#include<stdio.h>

float CalculateTax(float amount) {
  return amount * 0.18;
}

typedef struct {
  float* items;
  int num_items;
} Receipt;

float CalculateTotal(Receipt this_order) {
  int i;
  float total = 0;
  for(i = 0; i < this_order.num_items; i++) {
    total += this_order.items[i] + CalculateTax(this_order.items[i]);
  }
  return total;
}

int main(void) {
  Receipt my_order;
  float my_items[3] = {5.50, 10, 7.89};
  my_order.items = my_items;
  my_order.num_items = 3;
  printf("%f\n", CalculateTotal(my_order));
}

If you try to set CalculateTax to a new function definition, you will get an error at compile time because a function cannot be changed once it is bound. Early binding tends to produce more efficient programs. However, if you want to, you can still get the flexibility of late binding in C.

Using function pointers, you can store the address of the code that you want to be executed, and change the address while the program is running. We can achieve the same late binding effects that I've illustrated in Python and JavaScript by making some small changes to the C code. Declare a function pointer named TaxCalculator which will store the address of the desired calculate-tax function, then change CalculateTotal so that it uses the TaxCalculator instead of directly calling a calculate-tax function.

#include<stdio.h>

float CalculateTax(float amount) {
  return amount * 0.18;
}

float CalculateHigherTax(float amount) {
  return amount * 0.25;
}

typedef struct {
  float* items;
  int num_items;
} Receipt;

float (*TaxCalculator)(float) = &CalculateTax;

float CalculateTotal(Receipt this_order) {
  int i;
  float total = 0;
  for(i = 0; i < this_order.num_items; i++) {
    total += this_order.items[i] + (*TaxCalculator)(this_order.items[i]);
  }
  return total;
}

int main(void) {
  Receipt my_order;
  float my_items[3] = {5.50, 10, 7.89};
  my_order.items = my_items;
  my_order.num_items = 3;
  printf("%f\n", CalculateTotal(my_order));
  TaxCalculator = &CalculateHigherTax;
  printf("%f\n", CalculateTotal(my_order));
}

There you have it!

Here's another way to think about this comparison. In high-level languages which don't expose pointers, identifiers such as function and variable names actually act like pointers.
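A small Python sketch can make the pointer analogy concrete. The names below (calculate_tax, calculate_higher_tax, tax_calculator) are made up for this illustration and mirror the C function pointer example only loosely: a name is bound to a function object, and rebinding the name is roughly like storing a new address in the pointer.

```python
def calculate_tax(amount):
    return amount * 0.18


def calculate_higher_tax(amount):
    return amount * 0.25


# Bind a second name to the same function object. No copy is made;
# both names now refer to one object, much like two pointers
# holding the same address.
tax_calculator = calculate_tax
same_object = tax_calculator is calculate_tax

before = tax_calculator(100)

# Rebinding the name is the analogue of assigning a new address
# to the TaxCalculator function pointer in the C version.
tax_calculator = calculate_higher_tax
after = tax_calculator(100)

print(same_object, before, after)
```

The functions themselves never change; only the name-to-object binding does, which is why the swap can happen at any point while the program runs.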

Tuesday, March 11, 2008

BusyList

Andy and I started work on a simple little open source project for tracking tasks; it's called busylist. We wanted to experiment with Ajax, Python, and web service APIs, so we whipped up a basic system in a few hours. There is still quite a bit of work to be done, but it has been a great learning experience so far. An extremely alpha test version is available in subversion along with some instructions on the project's wiki pages. If you're interested, feel free to check it out (pun intended) and contribute if you like. It is an open source project after all.

Wednesday, January 23, 2008

A spoiled programmer

I've been writing quite a bit of Python code recently and I've become a bit spoiled. It's easy in Python to define new classes on the fly, create new functions, pass them here and there, and return arbitrary collections from a method. C will always have a special place in my heart (I think everyone's first language does), but I often think of ways I could make it a bit easier to do certain things like have functions that return functions or have a function return multiple values.

To explain by way of example, it would be fun to do something like this:

/* A function that returns multiple values */
int, int, char myFunction(int a, int b, int c, char d) {...}

...
  /* Invoke the function and store the results */
  int x, y;
  char c;
  {x, y, c} = myFunction(5, 6, 7, 'Z');

The above is a fairly Pythonic way of doing things, and it seems like it should be possible in C. The first way I thought of is using structs. I like to think of a struct as the precursor to a class. It allows the arbitrary grouping of variables into a single collection where they can be referred to by name. (In a couple of earlier posts, I showed how you could use structs to simulate classes in C.)
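For comparison, here is roughly what the Python idiom being imitated looks like. This is only a sketch: my_function and its arithmetic are borrowed from the C examples in this post, not from any real library. Returning comma-separated values builds a tuple, and the caller unpacks it into separate names.

```python
def my_function(a, b, c, d):
    # The comma builds a tuple, so this returns two values at once.
    return (a + b) * c, d


# Tuple unpacking stores each returned value in its own name.
first, second = my_function(2, 3, 4, 'Z')
print("Pattern: %i, %c" % (first, second))
```

Strictly speaking Python is also returning one value (the tuple), which is the same trick the struct approach plays in C.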

If I define a struct for each one of my multi-variable-returning functions, I can create functions which effectively return multiple values instead of just one. Yes, technically I am returning one value, the struct, but you know what I meant :-) After all, if you look at the program's stack, there is probably no perceptible difference between returning a struct and returning multiple variables.

#include <stdio.h>

/* Create a 2 member struct to hold the return value */
struct myFuncReturn {
  int first;
  char second;
};

/* A function that returns an int and a char */
struct myFuncReturn myFunc(int a, int b, int c, char d) {
  struct myFuncReturn to_return;
  to_return.first = (a+b)*c;
  to_return.second = d;
  return to_return;
}

int main() {
  struct myFuncReturn pattern;
  pattern = myFunc(2, 3, 4, 'Z');
  printf("Pattern: %i, %c\n", pattern.first, pattern.second);
}

This works OK, but I would like to avoid having to create a new struct for each one of my functions. It might be easier if I didn't have to worry about types at all, so the natural choice is to have the function return a type-less void pointer (void*). The calling code would then be responsible for interpreting the function's return struct correctly. If I want to return a new anonymous struct from a function, it might look something like this:

void* myFunc(int a, int b, int c, char d);

If I use the above, I'll need to allocate memory for the struct and return its address. This is a bit of a bother as well, because now I need to worry about cleaning up that memory later. Instead of having the function allocate a new structure to return, why not pass in a structure and have the function modify it? The code I would need to write would be more aesthetically pleasing (in my opinion) for both the function definition and the calling code which invokes it, and it might even be more efficient.

If I pass in a pointer to the result struct as the first parameter to the function, my program could look like this.

#include <stdio.h>

/* Function definition, the out parameter is the return value */
void myFunc(void* out, int a, int b, int c, char d) {
  ((struct{int first; char second;}*)out)->first = (a+b)*c;
  ((struct{int first; char second;}*)out)->second = d;
}

int main() {
  struct{int first; char second;} pattern;
  myFunc(&pattern, 2, 3, 4, 'Z');
  printf("Pattern: %i, %c\n", pattern.first, pattern.second);
}

Look ma, no type declarations! Now you might say that writing out the entire struct definition each time is a bit unpleasant, but you could always define a struct and use it instead. I wanted to show that you don't really need to declare a type for each function, which could create a bit of a mess if you start using multi-return functions everywhere. With the above pattern, you could also start to play some interesting games by having functions that actually return different structs in different situations (provided the out pointer's reserved space is large enough for the data you want to send back). If I've lost you by now, I do apologize.

For added effect, note that the anonymous structs don't need to match; you just need to make sure that the shape of the structure is the same so that you don't overwrite data. I could have written myFunc like this:

void myFunc(void* out, int a, int b, int c, char d) {
  ((struct{int first;}*)out)->first = (a+b)*c;
  ((struct{int x; char second;}*)out)->second = d;
}

Or if you want to go even further, like this:

void myFunc(void* out, int a, int b, int c, char d) {
  *((int*)out) = (a+b)*c;
  ((struct{int x; char second;}*)out)->second = d;
}
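One way to poke at the "same shape" idea without a C compiler is Python's ctypes module, which lays out structures the way the platform's C compiler would. The sketch below is an illustration under assumptions, not a translation of the code above: the class names Pattern and Lookalike are invented, and my_func receives a raw integer address standing in for the void* out parameter.

```python
import ctypes


class Pattern(ctypes.Structure):
    # The caller's view of the result.
    _fields_ = [("first", ctypes.c_int), ("second", ctypes.c_char)]


class Lookalike(ctypes.Structure):
    # A differently named struct with the same member layout.
    _fields_ = [("x", ctypes.c_int), ("second", ctypes.c_char)]


def my_func(out, a, b, c, d):
    # out is just an address; reinterpret that memory as a Lookalike.
    view = Lookalike.from_address(out)
    view.x = (a + b) * c
    view.second = d


pattern = Pattern()
my_func(ctypes.addressof(pattern), 2, 3, 4, b'Z')
print(pattern.first, pattern.second)

# The members line up because both layouts place them at the same offsets.
print(Pattern.second.offset == Lookalike.second.offset)
```

Because the two layouts agree on where each member lives, writing through one view shows up in the other, which is the property the C version relies on.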

Ah, the joys of programming. It's little games like this that make programming lots of fun. It's like working on a big wide open puzzle that you get to build yourself. No wonder I'm spoiled.

Saturday, October 27, 2007

Hey look, a simple web server

I think I'm done writing my web server. I have gotten it to do what I want, namely this. This server will:

Run just about anywhere. I sometimes run it off of a USB pen drive.
Send all traffic over an HTTPS connection.
Handle user authentication and permissions. You can only read and write where you have permissions.
Allow you to store data in a remote location. Just POST to a URL to store something at that location, GET to retrieve it.

There really wasn't much to it. I was able to write this quickly and there wasn't that much code. The main thing it is lacking (in my opinion) is speed. This could be fixed by making it multithreaded and adding some caching, since right now it always reads from the disk. Without further ado, here's the code. I also wrote a Python client, and a JavaScript client to go with it will be coming soon. If you find this useful, please let me know.

I've also been looking at CherryPy as a framework to create the same type of portable, simple, and secure web server. As usual, stay tuned for details.
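To give a feel for the POST-to-store, GET-to-retrieve behavior listed above, here is a minimal sketch using only the Python 3 standard library. It is not the server's actual code: StoreHandler, STORE, and demo are hypothetical names, data lives in a dictionary instead of on disk, and HTTPS, authentication, and permissions are left out entirely.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}  # URL path -> bytes; an in-memory stand-in for files on disk


class StoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Save the request body under the requested path.
        length = int(self.headers.get('Content-Length', 0))
        STORE[self.path] = self.rfile.read(length)
        self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        # Return whatever was stored at this path, or 404 if nothing was.
        if self.path not in STORE:
            self.send_error(404)
            return
        body = STORE[self.path]
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the terminal quiet


def send(port, method, path, body=None):
    # One fresh connection per request keeps the client code simple.
    conn = http.client.HTTPConnection('127.0.0.1', port)
    conn.request(method, path, body=body)
    response = conn.getresponse()
    data = response.read()
    conn.close()
    return response.status, data


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(('127.0.0.1', 0), StoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_port
    send(port, 'POST', '/notes/todo', b'buy milk')
    result = send(port, 'GET', '/notes/todo')
    server.shutdown()
    server.server_close()
    return result


if __name__ == '__main__':
    print(demo())
```

Like the real server, this sketch handles one request at a time, so it shares the same speed limitation.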

Thursday, October 18, 2007

Design for the simple secure storage server

In my last post, I mentioned my motivation for writing this server and pointed to the foundation I'm building on. Now it's time for more detail.

This server may not be super fast (single thread execution) and it may not be super secure (user data stored in plaintext on the server) but it will be super easy to set up.

Allow me to clarify the security point in the above summary. In the initial version of this server, all traffic will be sent over an SSL connection (HTTPS) and users will authenticate with the server using Basic Auth. In Basic Auth, the user's password is sent to the server essentially in plaintext (it is only Base64-encoded, not encrypted). There are better authentication schemes out there, but for this version of the server, I'm going for quick and simple. Basic Auth is just barely acceptable for this project because the connection is secure, but the server will likely store these passwords in plaintext as well (for now), so server disk security may be the weak link. With that said, the idea for this server is to provide a simple and portable back-end for my AJAX applications.
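To see why Basic Auth offers no secrecy on its own, here is a short sketch of how the Authorization header is built and decoded. The helper names basic_auth_header and parse_basic_auth are invented for this example, and the credentials are the sample pair from the HTTP authentication RFC rather than anything this server uses.

```python
import base64


def basic_auth_header(username, password):
    # Join the credentials with a colon and Base64-encode the result.
    raw = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


def parse_basic_auth(header):
    # Reverse the encoding; anyone who sees the header can do the same.
    scheme, _, token = header.partition(' ')
    if scheme != 'Basic':
        return None
    decoded = base64.b64decode(token).decode('utf-8')
    username, _, password = decoded.partition(':')
    return username, password


header = basic_auth_header('Aladdin', 'open sesame')
print(header)
print(parse_basic_auth(header))
```

Decoding needs no key at all, which is why the HTTPS connection is doing all of the real protection here.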

Thursday, October 11, 2007

A simple HTTPS server

Recently, I've been working on an AJAX application in my spare time and there's something I could really use: a simple network data store.

A JavaScript application isn't very useful without some persistent data. However, this usually requires running a web server. My original idea was to distribute the application as a file which is loaded from the local disk. At this point you may be saying, "Wait a minute, JavaScript running in a browser can't access the local disk." But scripts can read and write to the local disk if they are loaded from disk instead of the Internet. See TiddlyWiki for a great example of a useful application that uses this design. The problem, though, is what happens when you want to sync the data from the AJAX application across multiple computers. Well, once again, it looks like I need a web server after all.

So I set out to build a simple server. All it really needs to do is allow applications to store and retrieve data. To make sure that the data remains a secret, the traffic will be sent over an HTTPS connection. Access to certain directories and files on the server will be granted only to select users, so usernames and passwords are required too. Since I've been working with Python recently, I tried to see if it was possible to create a simple HTTPS server which could handle GET and POST requests and perform dynamic behavior. I found a great example on activestate.com which uses an OpenSSL .pem file. The instructions in the article made setting up this server a breeze. I've been working on a customized version of the above example, but it isn't quite ready. As usual, stay tuned :)
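As a rough idea of how a .pem file turns a plain Python HTTP server into an HTTPS one, here is a sketch using the standard library's ssl module in Python 3. This is not the activestate.com example's code. The function name make_https_server, the file name server.pem, and port 4443 are all assumptions made for illustration, and the .pem file is assumed to hold both the certificate and the private key.

```python
import ssl
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_https_server(pem_path, host='localhost', port=4443):
    # Load the certificate and key first, so a missing or invalid file
    # fails before any socket is opened.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(pem_path)

    # Create an ordinary HTTP server, then wrap its listening socket
    # so every accepted connection speaks TLS.
    server = HTTPServer((host, port), SimpleHTTPRequestHandler)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    return server


if __name__ == '__main__':
    # 'server.pem' is a placeholder path; generate your own with OpenSSL.
    make_https_server('server.pem').serve_forever()
```

SimpleHTTPRequestHandler only serves files for GET requests, so a handler with its own do_GET and do_POST methods would replace it to add storage and dynamic behavior.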