Monday, October 13, 2008

An Open Source Python HTTP Client

At Super Happy Dev House 27, I made significant progress on an open source library for making HTTP requests in Python. For the past few years I've been working with web services and APIs (SOAP, REST (wikipedia) - specifically AtomPub, etc.) and I wanted to create an HTTP library which is simple, clean, and precise. Python has a couple of great HTTP libraries already, but one of them is a bit too low level (httplib) and the other is too high level (urllib2).

For example, in httplib you call a method to send data as if you are writing to a file (httplib uses sockets, after all). Required HTTP headers like Content-Length are not calculated for you. You'll need to handle cookies and redirects on your own. On the plus side, you get full control of what is being sent. The higher level library, urllib2, is built on top of httplib. It adds some handy abstractions, like calculating the Content-Length, but it also has some limitations. I haven't yet been able to figure out how to perform a PUT or DELETE with urllib2.

When making HTTP calls to web services, there are often a large number of HTTP headers, URL parameters, and components to the request. Making a request feels like making a function call in most HTTP libraries. In the past, I've wrapped these functions with successive layers containing more and more function parameters. For example, in a request to send a photo and metadata to PicasaWeb, you need to include an Authorization token, Content-Type specifying a MIME-multipart request and the multipart boundary, and a multipart payload consisting of the Atom XML describing the photo and the photo's binary data. If you add in the the ability to specify other headers and URL parameters, your function call might look like this:
def post_photo(url, url_parameters, escape_parameters, 
photo_mime_type, photo_file_handle,
photo_file_size, metadata_xml,
metadata_mime_type, auth_token,
additional_http_headers)
...

# Sets the request's Host, port, and uri.
# Makes the request into a MIME multipart request,
# adjusts the Content-Type and recalculates
# Content-Length.
# Sets the Authorization header
post_photo('http://picasaweb.google.com/data/'
'feed/api/user/userID/albumid/albumID', None,
False, 'image/jpeg', photo_file, photo_size,
atom_xml, 'application/atom-xml',
client_login_token, None)
To use the above, you have to gather all of the information in one place, and make the function call. There are cases where you want a design like the above.

However, more and more I think of ways the program could be more cleanly structured if this information could be compartmentalized. This new library relies on an HttpRequest object which various parts of the program modify. Once all of the modifications have been applied, the fully constructed request is passed to an HttpClient which communicates with the server using httplib or urlfetch if you happen to be on Google App Engine. (Support for more HTTP libraries is certainly possible.)

The photo posting example from above could look something like this. Keep in mind that these steps could be carried out in a different order in different segments of code.
photo_post = HttpRequest(method='POST')
# Sets the Authorization header
client_login_token.modify_request(photo_post)
# Adds to the body and calculated Content-Length,
# sets the Content-Type.
photo_post.add_body_part(atom_xml,
'application/atom+xml')
# Makes the request into a MIME multipart request,
# adjusts the Content-Type and recalculates
# Content-Length.
photo_post.add_body_part(photo_file, 'image/jpeg',
photo_size)
# Sets the request's Host, port, and uri.
parse_uri('http://picasaweb.google.com/data/'
'feed/api/user/userID/albumid/albumID'
).modify_request(photo_post)
In fact, the above code could make up the body of the post_photo function described in the first code snippet.

I created an open source project for this and other small projects called sippycode (a play on sippy cup). This is a place where code can grow up.

No comments: