Saturday, May 28, 2011

Go on App Engine Example - Part 1

The App Engine team recently announced support for Go as a runtime for use in apps. Summary up front: the App Engine SDK for the Go runtime is the easiest way I've found yet to get started with Go. As I change my code, it is recompiled in the background when I make a request to my app, so it feels very much like developing in a scripting language.

I've been excited about the Go language for some time now (specifics on why will have to wait for another post) so I was eager to try it out on one of my favorite platforms: App Engine. I wanted to start with something small, so I wrote a simplified version of a web app that I've been itching to write lately, a site for hosting plain text content. Specifically, I want something that preserves whitespace, allows me to line up columns of text, and supports non-English characters (Unicode). Those are the kinds of things I need to share and talk about code. Also, there is a great deal more you can do with plain old monospaced text, so maybe you'll find this useful as well.

With that objective in mind I give you the Plain Text Machine. This little app lets you enter a small amount of text, somewhere around 2,000 characters, and gives you a link that others can visit to see an HTML reproduction of your writing. I mentioned I wanted to keep this simple, so here's the odd little bit: this app doesn't store your text anywhere. The URL that is generated contains the text, hence the somewhat low limit on message length. It certainly keeps the app simple; the most complex logic is the code that converts the text from the URL into HTML.
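To make the idea of a URL that carries its own content concrete, here is a minimal JavaScript sketch. The base URL and both function names are hypothetical (this post does not give the app's address); only the /show path and the msg parameter come from the handler code in this post.

```javascript
// Hypothetical base URL; the real app's address is not given in this post.
var BASE_URL = 'http://example.com/show';

// Returns a link whose "msg" query parameter carries the whole message.
// encodeURIComponent percent-encodes spaces, newlines, and non-ASCII text.
function buildLink(text) {
  return BASE_URL + '?msg=' + encodeURIComponent(text);
}

// Recovers the message from such a link (the reverse of buildLink).
function readMessage(link) {
  var query = link.split('?msg=')[1] || '';
  return decodeURIComponent(query);
}
```

Because every character of the message ends up in the link, often as a multi-character escape sequence, the length limit on URLs is what caps the message size.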

Request routing is set up in the init function, which registers a handler for each path:
func init() {
    http.HandleFunc("/", handle)
    http.HandleFunc("/show", show)
}
The main page, at /, is just static content, so the interesting part is the /show handler. It looks like this:
func show(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    // Get the message from the URL.
    PrintHtml(utf8.NewString(r.FormValue("msg")), w)
}
The above does two things: it sets the content type of our response so that the browser will know it is HTML, and it reads the msg URL parameter from the request and converts it to HTML.

The PrintHtml function prints out some boilerplate HTML, then reads the message one character at a time and converts each character to its HTML-safe equivalent. There's a tiny bit of complexity to make sure that the whitespace is preserved instead of being collapsed, as would normally happen with repeated spaces in HTML. Here's the code:
func PrintHtml(text *utf8.String, out http.ResponseWriter) {
    // True when the previous character was written as a plain space.
    spaces := false

    fmt.Fprint(out, textHeader, middle)
    for i := 0; i < text.RuneCount(); i++ {
        currentChar := text.At(i)

        if currentChar == 32 && !spaces {
            // A first space.
            fmt.Fprint(out, " ")
            spaces = true
        } else {
            if currentChar == 32 {
                // Space following another space.
                fmt.Fprint(out, "&nbsp;")
            } else if currentChar == 10 {
                // Newline
                fmt.Fprint(out, "<br>")
            } else if currentChar == 9 {
                // Tab
                fmt.Fprint(out, "&nbsp;&nbsp;&nbsp; ")
            } else if currentChar == 38 {
                // &
                fmt.Fprint(out, "&amp;")
            } else if currentChar == 60 {
                // <
                fmt.Fprint(out, "&lt;")
            } else if currentChar == 62 {
                // >
                fmt.Fprint(out, "&gt;")
            } else if currentChar < 32 || currentChar == 127 {
                // Skip ASCII control characters (0-31 and DEL).
            } else if currentChar < 127 {
                fmt.Fprintf(out, "%c", currentChar)
            } else {
                // Everything else becomes a numeric character reference.
                fmt.Fprintf(out, "&#%d;", currentChar)
            }
            spaces = false
        }
    }
    fmt.Fprint(out, footer)
}
The textHeader, middle, and footer variables are string constants containing the wrapper HTML which gives style information.

If you're interested in the full source code for this tiny little app, you can find it in the Plain Text Machine open source project. Hopefully this example provides an easy to understand picture of what Go code for App Engine looks like.

I had quite a bit of fun putting together this app. By keeping it simple I was able to go from idea to done in less time than it took me to write this blog post. As an added bonus, having an app with no persistent storage brings up some interesting philosophical questions. For example, if a message is created but no one stores the link to it, does it still exist?

Monday, May 09, 2011

Setup OAuth2 for Google APIs

Today I'm at I/O Bootcamp, helping out with a walkthrough on how to get started using Google APIs in a variety of languages.

One of the things I appreciate about the Google APIs is the authorization mechanism, which lets me see which applications I've granted access to my data and lets me revoke that access. As an application developer, I need to identify my application so that Google knows which app is requesting access and can show the user more information about it. The first step, then, in writing an application that uses OAuth2 is registering your app.

You can begin the registration process on the Google API console by creating a project:





Now that you have an application, you'll need to configure it for use with OAuth2 and get the secret tokens that your application will use in its requests. To do that, create an OAuth2 client ID.



The most vital decision to make during the sign up flow is whether your application is a "web application" or an "installed application". If you're a site accessed in a browser and you're able to send users to a Google web page for authorization and then have the browser redirect back to your app, then you want web application. For an installed application, the user will still need to authorize your app by visiting a web page, but once authorization is complete, the secret token will be sent to the app either using a redirect to a locally running web server or by having the user copy and paste the secret into your application.

For the command line samples I've been playing with, I chose installed application.

After creating the client ID you should see information something like this:

Client ID:      #######.apps.googleusercontent.com
Client secret: Amzz5Yip2SJPqqq5Jx
Redirect URIs: urn:ietf:wg:oauth:2.0:oob
http://localhost


You'll need to put this information into your application so that it can use the client ID and secret when making requests to get an authorization token from the user. This can be as simple as copying and pasting these strings into your code.

The one other thing that needs to be done before you begin using OAuth2 with one of the Google APIs is to turn on the API for your application. This can be done in the "Services" section of the API console.

Let's say that I wanted to access the URL Shortener API. First I would need to enable it for my application.



Then I would need to specify the URL Shortener's API scope when I request authorization from a specific user. The scopes that are requested by my app are turned into a list of APIs that the user must grant access to when they authorize my application.



The scope for an API can be found in the API documentation under authorization.
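As an illustration of how the client ID, redirect URI, and scopes come together, here is a minimal JavaScript sketch that builds the URL a user would visit to authorize an app. The function name and the example client ID are hypothetical, and the endpoint and parameter names are my understanding of Google's OAuth2 flow at the time of writing, so check them against the current documentation before relying on them.

```javascript
// Assumed authorization endpoint; verify against Google's OAuth2 documentation.
var AUTH_ENDPOINT = 'https://accounts.google.com/o/oauth2/auth';

// Builds the authorization URL. `scopes` is an array of scope strings,
// which are sent as a single space-separated "scope" parameter.
function buildAuthUrl(clientId, redirectUri, scopes) {
  var params = [
    ['response_type', 'code'],
    ['client_id', clientId],
    ['redirect_uri', redirectUri],
    ['scope', scopes.join(' ')]
  ];
  var parts = [];
  for (var i = 0; i < params.length; i++) {
    parts.push(encodeURIComponent(params[i][0]) + '=' +
               encodeURIComponent(params[i][1]));
  }
  return AUTH_ENDPOINT + '?' + parts.join('&');
}

// Example with a made-up client ID and the URL Shortener scope.
var exampleUrl = buildAuthUrl(
    'my-id.apps.googleusercontent.com',
    'urn:ietf:wg:oauth:2.0:oob',
    ['https://www.googleapis.com/auth/urlshortener']);
```

Each scope in the list corresponds to one API that the user is asked to grant access to on the authorization page.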

For an example that brings all of these settings together, see the urlshortener.py example:


FLOW = OAuth2WebServerFlow(
    client_id='433807057907.apps.googleusercontent.com',
    client_secret='jigtZpMApkRxncxikFpR+SFg',
    scope='https://www.googleapis.com/auth/urlshortener',
    user_agent='urlshortener-cmdline-sample/1.0')
...
credentials = run(FLOW, storage)
...
http = httplib2.Http()
http = credentials.authorize(http)


Python may not be your bag, but no worries: there are client libraries for the Google APIs in a variety of languages, and even better, there is an API Explorer that lets you try out the underlying protocol without any language-specific stuff getting in the way.

For example, here is getting details and stats about a short URL:



And here is creating a new short link:





For all the details on using these APIs, take a look at the documentation. For example here are the URL shortener docs. The common first step for almost all of the Google APIs that access user information is the registration step we started with. For more details on OAuth2 with Google APIs, there is some excellent documentation here.

Tuesday, March 08, 2011

Free Verse #397

Sipping a latte,
writing a free verse poem.
Hey look, a haiku!

Sunday, January 23, 2011

Mercurial in Five Commands

Completing any sort of significant programming project can be nearly impossible without version control. For a class project, I recently collaborated with a small group on a large programming assignment. To keep all of our changes in sync while working miles away from each other, we used version control with a shared central repository.

Since I've been working quite a bit with Mercurial lately, we went with a free private project on bitbucket. It was the first time my teammates had used Mercurial so I gave a crash course that I thought would be helpful for others as well. A distributed version control system has a large number of commands and features, but 90% of the time, you're just dealing with the basics. When working with a team you can get by with just these five Mercurial commands: clone, commit, push, pull, and update.

Setting up the Project


As I mentioned, we were using a hosted repository on bitbucket, so getting a working copy of the repository on our machines started with each of us executing hg clone. It looked a little something like this:
hg clone https://your-username@bitbucket.org/your-username/project
This will create a local repository with a copy of each file and you're now free to make changes. You can edit files locally and Mercurial will track the changes for you. If you want to add new files or remove existing files, make sure to use hg add and hg remove. (Bonus commands!) Once you're happy with your changes and you want to gather them up in a logical unit, it's time for:

Local Snapshots


To save a set of changes locally, you use hg commit. Be sure to write an informative description of this change set since you and others will want to remember what this set of changes was all about. There are a few arguments to the commit command that come in handy. Often I do:
hg ci -u your-username -m 'Reduces codebase entropy. All tests pass!'
For more options on the commit command and any other Mercurial command you can use hg help <command>. You can also see a diff of your current files compared to the most recent snapshot using the hg diff command. (Bonus times two!)

Pinning your changes locally is a fantastic feature of distributed version control systems. As I work I tend to take a local snapshot several times an hour. These revisions just exist in your local copy, so you don't need to worry about clashing with other people's code at this point or making sure that all changes are usable. Sometimes when I decide I've gone down the wrong path I'll take a snapshot of the ill conceived changes before I roll them back, just in case I decide later on that some of the ideas weren't so bad after all.

Once you have something that's ready for others to use, it's time to:

Share


To get your changes into the hands of others, you can push them back up to the central repository. You do this with the hg push command. Since we created our local repository using clone, your local copy knows where to send the changes. Also, if you'd like to do something different, like push your changes to a different location than where we cloned from, you can take a look at more options of the push command using hg help push.

The push command will upload all of your local commits, along with those handy descriptions, for the rest of your team to see.

Now others on your team are pushing up their changes too and it would be great to get their changes so you are all working on the same code. What's that saying, "It's better to give than to...?"

Receive


To get the changes others have posted to the repository into your local copy, you use hg pull. Running this command copies the change sets that you haven't received yet to your local repository, but it doesn't edit your files or apply the changes just yet. Since you might want to be selective about what changes you apply, Mercurial splits applying the changes into two steps. First you pull, then you use hg update. When called with no additional arguments like that, hg update applies all of the changes.

I recommend pulling and updating often. In our group, since there were just a few of us, we'd post in a chat room when we pushed changes so others would know to pull. If you push before you pull in other people's changes, you might need to use hg merge. Also, you can see all of the changes which are in your repository with the hg log command. (Triple bonus, hey a hat-trick!)

There you have it!

Want more?


I tried to keep this simple and focused, the bare minimum to work on a project with a small team. Mercurial has more to offer. I haven't touched branches yet, or looking at diffs, or creating your own local repository from scratch, or sharing your changes by running your own server locally. All of these are just single commands! For further reading take a look at Mercurial: The Definitive Guide.

There are a few services which offer hosting of Mercurial repositories. For open source, Project Hosting on Google Code is an option as well as bitbucket which I used for the first time this past week. Any other Mercurial hosting providers that you recommend?

Friday, September 10, 2010

Greekish

I've long been fascinated with other alphabets. All of these strange and unusual symbols, it's almost like a code. This love of secrets was one of the reasons that I studied Ancient Greek. After reading and writing quite a bit of Greek, reading the alphabet became second nature. I even began taking notes using the Greek alphabet but using English words (since my Greek vocabulary is sadly inadequate). Performing simple character substitution sounded like a perfect one-hour project so I whipped up a simple web page to convert English text into a Greek alphabet equivalent. I call it Greekish. For example, the phrase
So long and thanks for all the fish.
becomes
Σο λονγ ανδ θανκσ φορ αλλ θε φισh.
which would be quite confusing to a Greek speaker but perfectly natural to an English speaker who knows the Greek alphabet.

Note that some English characters do not have direct equivalents in Greek. A c could become a κ, a σ, or, as part of ch, a χ. The h is one of the more interesting stories. For a leading h before a vowel Greek uses a breathing mark. When combined with a consonant, special characters are used, like θ for th, χ for ch, and φ for ph. I chose to use φ only for the English f, not ph, since ph does not make the f sound in some English contexts. The word uphill is one example. Also, I didn't bother to handle the special case of s at the end of a word, for which ς is used instead of σ.
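The substitution described above can be sketched in a few lines of JavaScript. This is not the Greekish page's actual code: the letter table below is my own assumption, chosen to reproduce the example phrase, and the function names are hypothetical. Letters with no entry in the table (such as a lone h or c) pass through unchanged, and the final ς is not handled, matching the simplifications described above.

```javascript
// Two-letter combinations are checked before single letters.
var DIGRAPHS = {th: 'θ', ch: 'χ'};

// Assumed single-letter table; anything missing is left as-is.
var LETTERS = {
  a: 'α', b: 'β', d: 'δ', e: 'ε', f: 'φ', g: 'γ', i: 'ι', k: 'κ', l: 'λ',
  m: 'μ', n: 'ν', o: 'ο', p: 'π', r: 'ρ', s: 'σ', t: 'τ', u: 'υ', z: 'ζ'
};

// Upper-cases the Greek output when the English source started upper-case.
function matchCase(source, greek) {
  var first = source.charAt(0);
  return first !== first.toLowerCase() ? greek.toUpperCase() : greek;
}

function toGreekish(text) {
  var out = '';
  var i = 0;
  while (i < text.length) {
    var pair = text.slice(i, i + 2);
    var single = text.charAt(i);
    if (DIGRAPHS.hasOwnProperty(pair.toLowerCase())) {
      out += matchCase(pair, DIGRAPHS[pair.toLowerCase()]);
      i += 2;
    } else if (LETTERS.hasOwnProperty(single.toLowerCase())) {
      out += matchCase(single, LETTERS[single.toLowerCase()]);
      i += 1;
    } else {
      out += single;  // Punctuation, spaces, and unmapped letters.
      i += 1;
    }
  }
  return out;
}
```

Since ph is deliberately left out of the digraph table, a word like uphill is converted letter by letter, with the h passing through untouched.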

Now that you know more than you ever wanted to about the Greek alphabet, what is a simple project that you can tackle in an hour? Don't just think of one, go do it!

Tuesday, June 22, 2010

A Simple Testing Library for C

To prepare for a recent postgraduate computer science class, I wrote a small library in C which aids in the creation of lightweight, unit-test-like programs. The code can be found here, and using it looks a bit like this:
#include "asserts.h"

int main(void)
{
    c7e3_assert(1 == 1, "1 should equal 1");
    c7e3_assert(2 == 2, "2 should equal 2");
    c7e3_report();
    return 0;
}
The design follows the KISS principle and I think it is a nice fit to the simplicity of C. While there is not much to it, I wrote numerous tests using it over the past couple of months and all of that testing certainly paid off.

Friday, June 18, 2010

JavaScript Tricks to Speed up Your Site

One of the techniques which makes the web so powerful is the ability to load code, images, and other resources from all over the Internet. So often though, the process of loading these resources and ensuring that all of the required pieces are in place leads to a slow experience for visitors. With the ability to include so much code from across the web, visiting a site could potentially be like installing a new program when it comes to the amount of stuff that needs to be downloaded.

With this in mind, there are a couple of nifty tricks that can help make your app more responsive and I've written up an example site and testing server that shows some ideas for speeding up the user experience when you need to wait for the DOM to load or for additional JavaScript to be fetched and run. We'll begin with document operations.

Often the JavaScript running on a page manipulates the DOM, using document.getElementById here and document.createElement there. In order to ensure that all the pieces of the page are in place, web programmers often take advantage of the onload callback. It might be used like this:
<body onload="runMyCodeNow()">
Using this technique ensures that all of the things your code might want to read and write from the page are in place. All images have been downloaded, CSS rules have been applied, the layout is all there. However, all of this comes at a cost: your code doesn't run until every last resource has been fetched and rendered, even the little footer at the bottom of the page that your code doesn't care about.

There is another way: we could request that resources be loaded in parallel and start executing our code before the page is fully loaded. Chances are, your code doesn't need the complete page to be loaded before it starts running, and running before onload will reduce the delay for your users. Before I dive into how this can be accomplished, let's look at an example which uses the old fashioned way.

Let's say you have a web page, a little HTML which includes five JavaScript files. One may be a library used to do animation, another one for loading the user's data. In any case, all of these files need to be loaded and some of them depend on others.

The biggest bottleneck for your users is almost certainly having all of these resources load. Network latency is a killer, and something that is often overlooked during development. To create a simulated network environment which can give a more realistic (or even pessimistic) view of the cost of loading these resources, I wrote a "slow server" which can introduce a delay to the file requested. Here is the code for my testing server (designed to run on App Engine):
import os
import time

from google.appengine.ext import webapp
from google.appengine.ext.webapp import util


def FilePath(path):
    """Converts the requested path into a local file path."""

    return os.path.join(os.path.dirname(__file__), 'files', path[1:])


class SleepyRenderer(webapp.RequestHandler):
    """Serves the requested page with a client configured delay.

    Delay is given as a URL parameter in hundredths of a second to delay.
    For example, 200 means wait 2 seconds before responding.

    Example request:
    http://localhost:8080/hi.html?delay=300&contenttype=text/html
    """

    def get(self):
        path = self.request.path
        delay = self.request.get('delay')
        content_type = self.request.get('contenttype') or 'text/html'
        if delay:
            # Divide by a float so that delays such as 250 are not truncated.
            time.sleep(int(delay) / 100.0)
        http_status = 200
        requested_file = None

        try:
            requested_file = open(FilePath(path))
            self.response.out.write(requested_file.read())
            requested_file.close()
        except IOError:
            http_status = 404

        self.response.set_status(http_status)
        self.response.headers['Content-Type'] = content_type


def main():
    application = webapp.WSGIApplication([('/.*', SleepyRenderer)],
                                         debug=True)
    util.run_wsgi_app(application)


if __name__ == '__main__':
    main()
With the above code we can introduce a delay on each individual file. To see this in action with our example, here is some HTML which shows a traditional approach: script includes plus an onload callback that fires when everything has loaded.
<html>
  <head>
    <script src="/testa.js?delay=500&contenttype=text/javascript"></script>
    <script src="/testb.js?delay=400&contenttype=text/javascript"></script>
    <script>
      function init() {
        var output = document.getElementById('output');
        output.innerHTML = [
          'a is ' + a,
          'b is ' + b,
          'c is ' + c,
          'd is ' + d,
          'e is ' + e
        ].join('<br>');
      }
    </script>
    <script src="/testc.js?delay=300&contenttype=text/javascript"></script>
  </head>
  <body onload="init()">
    <script src="/testd.js?delay=200&contenttype=text/javascript"></script>
    <div id="output"></div>
    <script src="/teste.js?delay=100&contenttype=text/javascript"></script>
    <script src="/testa.js?delay=500&contenttype=text/javascript"></script>
  </body>
</html>
With the above, the page takes several seconds to load and when the very last script has loaded, the 'output' div gets its contents. In many cases, the code really doesn't need to wait for all resources to load, only the ones that are necessary for the code to run. In this case, since the information is added to the output div, we need the output div to exist in the DOM, but we may not need the entire page to load.

If you look at this loading process in a profiler you might see something like this:

Now for our first nifty trick. One way to check whether the necessary prerequisites are present is to poll the DOM or the JavaScript environment to see if conditions are right for the code to run. Here is an example of how this code might be rewritten when using some polling helper functions:
<script>
  loader.whenNodePresent('output',
      function() {
        var output = document.getElementById('output');
        loader.whenReady(function() {return window['a'];},
            function() {
              output.innerHTML += 'a is ' + a + '<br>';
            });
        loader.whenReady(function() {return window['b'];},
            function() {
              output.innerHTML += 'b is ' + b + '<br>';
            });
        loader.whenReady(function() {return window['c'];},
            function() {
              output.innerHTML += 'c is ' + c + '<br>';
            });
        loader.whenReady(function() {return window['d'];},
            function() {
              output.innerHTML += 'd is ' + d + '<br>';
            });
        loader.whenReady(function() {return window['e'];},
            function() {
              output.innerHTML += 'e is ' + e + '<br>';
            });
      });
</script>
The code to track the prerequisites and poll is quite simple:
loader.waiting = [];


loader.whenReady = function(testFunction, callback) {
  if (testFunction()) {
    callback();
  } else {
    loader.waiting.push([testFunction, callback]);
    window.setTimeout(loader.checkWaiting, 200);
  }
};


loader.checkWaiting = function() {
  var oldWaiting = loader.waiting;
  var numWaiting = oldWaiting.length;
  loader.waiting = [];
  for (var i = 0; i < numWaiting; i++) {
    if (oldWaiting[i][0]()) {
      oldWaiting[i][1]();
    } else {
      loader.waiting.push(oldWaiting[i]);
    }
  }

  if (loader.waiting.length > 0) {
    window.setTimeout(loader.checkWaiting, 200);
  }
};


loader.whenNodePresent = function(nodeId, callback) {
  loader.whenReady(function () {
    return document.getElementById(nodeId);
  }, callback);
};
In the above we use the whenReady function which takes a couple of functions, one to return a truthy or a falsey value, and one to call back when the first function evaluates to true. If the condition function isn't true when this first call is made, we check back every so often to see if it is ready.
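As a variation on the polling helper above, here is a sketch of a version that gives up after a deadline rather than polling forever. The function name, the onTimeout callback, and the maxWaitMs and intervalMs parameters are all hypothetical; they are not part of the loader library from this post.

```javascript
// Hypothetical variant of whenReady: polls testFunction every intervalMs and
// calls onTimeout (if provided) once maxWaitMs has passed without success.
function whenReadyOrTimeout(testFunction, callback, onTimeout,
                            maxWaitMs, intervalMs) {
  var deadline = new Date().getTime() + maxWaitMs;

  function check() {
    if (testFunction()) {
      callback();
    } else if (new Date().getTime() >= deadline) {
      // The condition never became true in time; stop polling.
      if (onTimeout) {
        onTimeout();
      }
    } else {
      setTimeout(check, intervalMs);
    }
  }

  // The first check runs immediately, like whenReady does.
  check();
}
```

Unlike the shared waiting list above, each call here keeps its own timer, which costs a little more but lets every condition have its own deadline.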

With these changes, we shave several seconds off of the user-perceived loading time. Specifically, we no longer need to wait for the duplicate load (of the testa script) at the end of the body. The page also appears to be more responsive because the later scripts' messages appear just after they load but before the page is complete.

Now that we've seen a way to work around the need for an onload callback, let's look at another place we can tweak the browser's behavior to make a web page more responsive: dynamic script loading.

The most straightforward way to include new code in your page is to use a script tag, something like:
<html>
<head>
<script src="some_great_sites_javascript">
...
When the browser's JavaScript interpreter encounters this script src, it stops whatever it's doing and fetches that resource. It doesn't do any more rendering or executing of code until it's finished. This behavior varies a bit in different browsers and is likely an artifact of an old design in which this kind of single threaded behavior was the only option. Since some sites might depend on this linear behavior to get a script's dependencies all in order, this quirk might be with us for a long time. Most of the time, waiting like this is a really silly idea. How often do the scripts that you include depend on one another?

There are a few parts to this trick. The first is to not put all of the script includes in the HTML. Instead, you could have JavaScript add new script elements to the page, which will cause new code to be loaded as needed. In this way, you could load only the resources that are needed at the moment; perhaps some resources would not end up being requested at all. Including a new script could be done in two ways:
document.write('<script src="somefile.js"></script>');
or
var newScript = document.createElement('script');
newScript.src = 'somefile.js';
document.body.appendChild(newScript);
Each of the above is appropriate in different situations. document.write adds HTML directly into the page at the point where the page is being loaded, so it should only be used for script tag inclusion if the page has not yet loaded. If the page has already loaded, using document.write to add the script tag will wipe out the existing body entirely. I've seen this issue in the wild: if you assume document.write is always safe, you'll be bitten when using it after the page has loaded.

Instead you can check whether document.body exists. If it does, use document.body.appendChild. If it does not yet exist, use document.write. The code for this loader logic might look something like this:
loader.loadScript = function(url) {
  if (document.body) {
    var newScript = document.createElement('script');
    newScript.type = 'text/javascript';
    newScript.src = url;
    document.body.appendChild(newScript);
  } else {
    document.write('<scr' + 'ipt type="text\/javascript" src="' +
        url + '"><\/scr' + 'ipt>');
  }
};
Now we can request that new JavaScript code be loaded on the fly, and it works when the page has not yet finished loading as well as after it has.

There is one more trick we can add to this loader. Some browsers will interpret the JavaScript in the order in which the scripts were requested, not the order in which they finished loading. That means that a fast loading script further down the list won't be run until a slower script, which appears above it, is loaded. One way we could defeat this delay is to break the script includes out of linear execution in the JavaScript. If you use setTimeout to introduce a delay in adding the script include to the page, then the code which sets up the script requests can finish quickly and the browser will get back to the script requests later without the same linear constraints. In our code, we wrap the body of loader.loadScript in a short timeout as follows:
loader.loadScript = function(url) {
  window.setTimeout(function() {
    if (document.body) {
      var newScript = document.createElement('script');
      newScript.type = 'text/javascript';
      newScript.src = url;
      document.body.appendChild(newScript);
    } else {
      document.write('<scr' + 'ipt type="text/javascript" src="' +
          url + '"><\/scr' + 'ipt>');
    }
  }, 1);
};
With the above changes in place, our example page from before now loads like this when profiled (note that the messages appear in the order that the scripts were loaded; we don't have to wait for everything before we edit the page):

Through the course of this post, I've written a small library for using these tricks when loading JavaScript dynamically in the page, as well as a server for trying it out. These are available here as open source code. There are some improvements that could be made here. Off the top of my head, the checkWaiting function could eventually time out if a condition continues to not be met. Also, the loader could do more to check whether a requested script has already been loaded. Any more ideas?
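One of the improvements mentioned above, skipping scripts that have already been requested, might be sketched as follows. The makeOnceLoader name is hypothetical and not part of the library; it wraps any function that loads a script by URL, such as loader.loadScript.

```javascript
// Hypothetical wrapper: remembers which URLs have been requested so that a
// second request for the same URL is ignored.
function makeOnceLoader(loadScript) {
  var requested = {};
  return function(url) {
    if (requested.hasOwnProperty(url)) {
      return false;  // Already requested; skip the duplicate fetch.
    }
    requested[url] = true;
    loadScript(url);
    return true;
  };
}
```

With this in place, something like var loadOnce = makeOnceLoader(loader.loadScript); would fetch a given URL only the first time it is requested. Note that the check is on the exact URL string, so the same file requested with a different query string counts as a different script.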