Understanding Tornado fundamentals

Tornado's documentation is thin, or at least I found it so. It doesn't explain certain things in depth, and being new to the async programming model, I found many of them quite difficult to understand. The documentation also lacks a good tutorial, the way Django's documentation has an excellent Polls tutorial that explains almost everything one needs to get started with Django.

I feel I should share what I learned in the hope that it might help somebody. These are the things that, had they been in the documentation, would have made my life much easier.

Can Tornado make blocking code non-blocking?

No, it can't. Tornado isn't magic. For example, you can't use time.sleep in your Tornado app and expect Tornado to pause execution for one particular client while it serves the others. Tornado is a single-threaded server, and time.sleep blocks that thread. So, if you use time.sleep, for example, to simulate a slow network connection (or a database query), the whole server will block.

Consider this code:

import time

from tornado import web

class BlockingHandler(web.RequestHandler):
    def get(self):
        time.sleep(10) 
        self.write("Hello 1")

class NonBlockingHandler(web.RequestHandler):
    def get(self):
        self.write("Hello 2")

The code is pretty obvious: BlockingHandler will wait 10 seconds before returning a response. NonBlockingHandler will return the response as soon as it gets a request.

So, you might think that if you make a request to BlockingHandler and then make another request to NonBlockingHandler, you'll get a response from NonBlockingHandler first and then, 10 seconds later, a response from BlockingHandler. But that's not the case.

Try it yourself: make a request to BlockingHandler from your browser. Then open a new tab and make a request to NonBlockingHandler. You'll notice that even though the NonBlockingHandler doesn't block anywhere, you're not getting a response for it.

But as soon as the 10 seconds pass, you'll get the response for both BlockingHandler and NonBlockingHandler. This is happening because the BlockingHandler blocks the event loop. Therefore, Tornado can't accept other requests as long as this handler is running. That is why you won't get a response for NonBlockingHandler if BlockingHandler is already running.

This is what will happen if you try to query a database synchronously. It will block the server for all the clients and all the handlers until the query finishes.
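The same effect can be reproduced without a browser. Since Tornado 5 runs on top of asyncio's event loop, the sketch below uses only the standard library; slow and fast are hypothetical stand-ins for the two handlers, and the 0.2 second delay is an arbitrary choice to keep the run short.

```python
import asyncio
import time

async def slow(log, blocking):
    # Stand-in for BlockingHandler: "work" for 0.2 seconds.
    if blocking:
        time.sleep(0.2)           # blocks the whole event loop thread
    else:
        await asyncio.sleep(0.2)  # hands control back to the event loop
    log.append("slow")

async def fast(log):
    # Stand-in for NonBlockingHandler: responds immediately.
    log.append("fast")

async def serve_both(blocking):
    log = []
    # Start the slow "request" first, then the fast one, as in the browser test.
    await asyncio.gather(slow(log, blocking), fast(log))
    return log

blocking_order = asyncio.run(serve_both(blocking=True))
cooperative_order = asyncio.run(serve_both(blocking=False))
print(blocking_order)     # ['slow', 'fast']
print(cooperative_order)  # ['fast', 'slow']
```

With time.sleep the fast task cannot even start until the slow one is done. With a sleep that cooperates with the event loop, the fast task finishes first.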

How do coroutines make writing async code easier than callbacks?

With callbacks, you need to write extra code that you won't even require if you use coroutines. Thus, code becomes shorter and more readable. Other than that, coroutines won't automatically make your code async. Again, they just make the code shorter and easier to maintain and debug. In smaller projects, this might not seem a big advantage, but in bigger projects, it's a godsend.

Let's say we want to do the following things:

  1. Download a page asynchronously from a given url
  2. Do some processing on the downloaded page

First, let's see the code using callbacks:

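from tornado.httpclient import AsyncHTTPClient

def fetch_url():
    url = 'http://example.com'
    http_client = AsyncHTTPClient()
    # process_result is called once the page has downloaded
    # (the callback argument was removed in Tornado 6.0)
    http_client.fetch(url, callback=process_result)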

def process_result(result):
    # do something with result
    pass

You can see we can't process the result in the fetch_url function (sure, you can move the process_result function inside fetch_url, but it's still a different function). A separate function is required, and Tornado calls it when the page has downloaded.

Now let's do the same thing using coroutines:

from tornado.httpclient import AsyncHTTPClient
from tornado import gen

@gen.coroutine
def fetch_url_and_process_result():
    url = 'http://example.com'
    http_client = AsyncHTTPClient()
    result = yield http_client.fetch(url)
    # do something with the result
    pass

The code is shorter and much more straightforward. The way it works is that the coroutine pauses when it reaches the yield statement and hands the yielded Future to Tornado. When the page has downloaded, Tornado resumes fetch_url_and_process_result by sending the response back into the generator (its send() method), so the yield expression evaluates to the response and the code after it runs. So, all the processing can be done below the yield statement. No need for a separate function. More is explained below in the gen.coroutine section.

This example is a very simple use case. Consider this: let's say you need to do many different operations on the result. If you're using callbacks, you'll need one function for every time you process the result.

But with coroutines, you can do all the operations in one single function (coroutine).
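Here is a minimal sketch of that idea. To keep it runnable offline, fake_fetch is a hypothetical stand-in for AsyncHTTPClient.fetch, and the processing steps are made up for illustration.

```python
from tornado import gen, ioloop

@gen.coroutine
def fake_fetch(url):
    # Hypothetical stand-in for AsyncHTTPClient.fetch so the sketch needs no network.
    yield gen.sleep(0.01)
    raise gen.Return("<html>%s</html>" % url)

@gen.coroutine
def fetch_and_process(url):
    page = yield fake_fetch(url)                                   # wait for the download
    stripped = page.replace("<html>", "").replace("</html>", "")   # first processing step
    yield gen.sleep(0.01)                                          # another async operation
    raise gen.Return(stripped.upper())                             # final processing step

result = ioloop.IOLoop.current().run_sync(lambda: fetch_and_process("example.com"))
print(result)  # EXAMPLE.COM
```

Every step lives in one function and reads top to bottom. With callbacks, each yield above would mark the start of a new function.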

What are Futures?

A Future is a promise to return the result when it's available. For example, if you request a web page using Tornado's httpclient.AsyncHTTPClient, it will return a Future - an instance of Tornado's Future class - while it waits for the web page to load. As soon as the web page is loaded, Tornado will set the result on the Future instance. Then, if you yield the Future object, you will get the result.

For the most part, you don't need to worry about Futures. But sometimes, you might need to return a Future manually from functions/coroutines. But that's a somewhat advanced usage. This example code from Tornado's GitHub repo might be helpful for that.

Summary:
  1. Yielding a Future returns its result
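The life cycle of a Future can be sketched with the standard library alone, because in Tornado 5 and later tornado.concurrent.Future is an alias of asyncio.Future. The 0.05 second delay and the fake page body are assumptions made for this sketch, and await plays the role that yield plays inside a gen.coroutine.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()   # the "promise": no result yet
    before = future.done()          # False, nothing has been set

    # Pretend the download finishes 0.05 s from now and sets the result.
    loop.call_later(0.05, future.set_result, "<html>...</html>")

    body = await future             # pauses here until set_result() is called
    return before, future.done(), body

outcome = asyncio.run(main())
print(outcome)  # (False, True, '<html>...</html>')
```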

How does gen.coroutine work?

gen.coroutine is a decorator that you can use with your functions or coroutines to avoid callback-based programming. Its main use case: if you're calling a function/coroutine that returns a Future, decorate the caller with gen.coroutine so that it can yield that Future and get the result.

See the code below:

from tornado import gen
from tornado.concurrent import Future

def return_future():
    """This returns a Future object"""
    f = Future()
    return f

@gen.coroutine
def process_result():
    future = return_future()
    result = yield future
    # do something with the result

Code explanation:

  1. process_result is a generator and we've decorated it with gen.coroutine.
  2. gen.coroutine automatically calls process_result's next() method to start the generator, which runs it up to the first yield.
  3. Then it puts this process_result coroutine on a "waiting list" until future has a result. Some other code is responsible for setting the result on the future; I've not shown it, so there's no need to delve into that.
  4. When the result is set on the future, it resumes process_result by calling its send() method with the future's result, so the result = yield future statement sets the result variable to the future's result.
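The steps above can be imitated with a toy decorator. This is not Tornado's implementation: TinyFuture and tiny_coroutine are hypothetical names invented for this sketch, and they skip error handling, return values and the event loop entirely.

```python
class TinyFuture:
    """Hypothetical, stripped-down stand-in for a Future."""
    def __init__(self):
        self._result = None
        self._callbacks = []

    def add_done_callback(self, fn):
        self._callbacks.append(fn)

    def set_result(self, value):
        self._result = value
        for fn in self._callbacks:
            fn(self)

    def result(self):
        return self._result


def tiny_coroutine(func):
    """Toy decorator that drives a generator which yields TinyFutures."""
    def wrapper(*args, **kwargs):
        generator = func(*args, **kwargs)

        def step(value=None):
            try:
                # Resume the generator; value becomes the result of its yield.
                yielded = generator.send(value)
            except StopIteration:
                return  # the generator ran to completion
            # Park the generator until the yielded future gets a result.
            yielded.add_done_callback(lambda fut: step(fut.result()))

        step()  # the first send(None) runs the body up to the first yield
    return wrapper


pending = TinyFuture()
seen = []

@tiny_coroutine
def process_result():
    result = yield pending  # pauses here until pending.set_result() is called
    seen.append(result)

process_result()
waiting = list(seen)        # still empty: the generator is parked at the yield
pending.set_result("page body")
print(waiting, seen)        # [] ['page body']
```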

If you don't decorate the process_result with gen.coroutine you will need to use callbacks and the code becomes a mess. Example:

def return_future(callback):
    f = Future()
    f.add_done_callback(callback)
    return f

def get_future():
    future = return_future(callback=process_result)

def process_result(future):
    result = future.result()
    # do something with the result

gen.coroutine decorated functions will return a Future automatically

If you decorate a function/coroutine using gen.coroutine, a Future object will be returned automatically.

@gen.coroutine
def my_func():
    pass # return nothing

x = my_func()

print(type(x))
# OUTPUTS:  tornado.concurrent.Future (on Tornado 5+ this is asyncio.Future)

See? Even if you don't return anything, gen.coroutine will return a Future.

So, if you return a Future from a decorated function, gen.coroutine will wrap that Future object inside another Future object, which is not what you want. Example:

@gen.coroutine
def my_func():
    f = Future()
    return f # never return a future from a gen.coroutine decorated function

In the above code, you're returning a Future object, but that is not what you'll get. You'll get this instead - Future(f) - a Future within a Future. Because of this, you'll have to yield twice to get the result.

Important:

If you're returning a Future manually, you don't need to decorate the function.
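A small sketch of the difference, with hypothetical function names (wrapped, plain, caller) and 42 as a placeholder result:

```python
from tornado import gen, ioloop
from tornado.concurrent import Future

@gen.coroutine
def wrapped():
    f = Future()
    f.set_result(42)
    return f  # the decorator puts this Future inside its own Future

def plain():
    f = Future()
    f.set_result(42)
    return f  # undecorated: the caller gets this Future directly

@gen.coroutine
def caller():
    inner = yield wrapped()      # the first yield only unwraps the outer Future
    from_wrapped = yield inner   # the second yield finally produces 42
    from_plain = yield plain()   # one yield is enough here
    raise gen.Return((isinstance(inner, Future), from_wrapped, from_plain))

outcome = ioloop.IOLoop.current().run_sync(caller)
print(outcome)  # (True, 42, 42)
```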

Calling a gen.coroutine decorated function

If you're calling a gen.coroutine decorated function/coroutine, you must decorate the caller with gen.coroutine too. And you must use the yield keyword in the caller.

Example:

@gen.coroutine
def do_nothing():
    pass

@gen.coroutine
def do_something():
    result = yield do_nothing()
    if result is not None:
        print(result)

It makes perfect sense to use the yield keyword in the caller (above, do_something) because gen.coroutine will return a Future. And to get its result, you need to yield it.

Summary:
  1. gen.coroutine decorated functions will automatically return a Future
  2. You don't need to use callbacks if you're using coroutines
  3. If a function/coroutine is calling another coroutine decorated with gen.coroutine, it must also be decorated with gen.coroutine and must use the yield keyword
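To see why the yield matters, here is a sketch in which the called coroutine returns a value. The add coroutine and its short sleep are made up for illustration. raise gen.Return(...) is how a gen.coroutine hands back a value; on Python 3 a plain return works as well.

```python
from tornado import gen, ioloop
from tornado.concurrent import Future

@gen.coroutine
def add(a, b):
    yield gen.sleep(0.01)     # pretend to wait on some I/O
    raise gen.Return(a + b)   # the value the caller receives from yield

@gen.coroutine
def caller():
    forgot_yield = add(1, 2)  # a Future, not 3
    total = yield add(1, 2)   # 3
    yield forgot_yield        # let the first call finish as well
    raise gen.Return((isinstance(forgot_yield, Future), total))

outcome = ioloop.IOLoop.current().run_sync(caller)
print(outcome)  # (True, 3)
```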

What is web.asynchronous and when to use it?

This decorator is meant to be used on methods of Tornado's web.RequestHandler, namely: get, post etc.

Since handler methods are synchronous by default, you can't wait for an async operation inside them. What that means is as soon as the method ends or returns, the request is considered finished.

Look at the following code example:

from tornado import web
from tornado.httpclient import AsyncHTTPClient

class MainHandler(web.RequestHandler):
    def get(self):
        http = AsyncHTTPClient()
        http.fetch("http://example.com", callback=self.on_response)

    def on_response(self, response):
        # do something with response ...
        self.write(response.body)
        self.finish()

The above code aims to do the following things:

  1. Asynchronously fetch "example.com" from the get method
  2. Keep the client waiting until the url is fetched
  3. When the url is fetched, call the on_response method
  4. Perform some operations on the response (fetched url)
  5. Return the response to the client

But this won't work. Because get is synchronous by default, as soon as this method ends, the request is considered finished. Tornado won't keep the client in the waiting list while the url is fetched. It will just return an empty response.

This is where the web.asynchronous decorator comes in. You decorate the get method with this decorator and Tornado won't terminate the request when the get method exits.

class MainHandler(web.RequestHandler):
    @web.asynchronous  # available in Tornado < 6.0 only
    def get(self):
        http = AsyncHTTPClient()
        http.fetch("http://example.com", callback=self.on_response)

    def on_response(self, response):
        # do something with response ...
        self.write(response.body)
        self.finish()

The updated code will work as expected. One thing to note here is the last line - self.finish(). It tells Tornado to terminate the request. If you don't write this line, Tornado will keep the request open. You only need to use self.finish() when you're using the web.asynchronous decorator.

Summary:
  1. web.asynchronous is used with handler methods (e.g. get, post, etc.)
  2. It's used to make handler methods wait and not terminate the request on exit
  3. You need to manually terminate the request using self.finish() when using this decorator
  4. This is useful when using callbacks
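For comparison, here is a sketch of the same handler written as a coroutine instead of with callbacks. With gen.coroutine on the handler method, Tornado keeps the request open while the coroutine is paused and finishes it when the coroutine returns, so neither web.asynchronous nor self.finish() is needed. That also makes this the form to use on Tornado 6.0 and later, where web.asynchronous and the callback argument were removed. The "/" route and make_app are assumptions of this sketch.

```python
from tornado import gen, web
from tornado.httpclient import AsyncHTTPClient

class MainHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        http = AsyncHTTPClient()
        # The request stays open while the coroutine is paused here.
        response = yield http.fetch("http://example.com")
        # do something with response ...
        self.write(response.body)
        # No self.finish(): the request is finished when the coroutine returns.

def make_app():
    # "/" is an arbitrary route chosen for this sketch.
    return web.Application([(r"/", MainHandler)])
```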

That's it

The things I've written about in this post are the ones I found most confusing. The other features of Tornado are straightforward, and the documentation covers them nicely.

Until next time.