Understanding Tornado fundamentals
Tornado's documentation is rather thin, or at least I found it so. It doesn't explain certain things in depth. Being new to the async programming model, I found many things quite difficult to understand. The documentation also lacks a good tutorial; Django's documentation, by contrast, has an excellent Polls tutorial that explains almost everything one needs to get started with Django.
I feel I should share what I learned in the hope that it might help somebody. These are the things that, had they been in the documentation, would have made my life much easier.
Can Tornado make blocking code non-blocking?
No, it can't. Tornado isn't magic. For example, you can't use time.sleep in your Tornado app and expect Tornado to pause execution for one particular client while serving other clients. Tornado is a single-threaded server, and time.sleep blocks the thread. So if you use time.sleep, for example, to simulate a slow network connection (or a database query), the whole server will block.
Consider this code:
import time

from tornado import web

class BlockingHandler(web.RequestHandler):
    def get(self):
        time.sleep(10)  # blocks the whole event loop for 10 seconds
        self.write("Hello 1")

class NonBlockingHandler(web.RequestHandler):
    def get(self):
        self.write("Hello 2")
The code is pretty obvious: BlockingHandler will wait 10 seconds before returning a response. NonBlockingHandler will return the response as soon as it gets a request.
So, you might think that if you make a request to BlockingHandler and then make another request to NonBlockingHandler, you'll get a response from NonBlockingHandler first and then, 10 seconds later, a response from BlockingHandler.
But that's not the case.
Try it yourself: make a request to BlockingHandler from your browser. Then open a new tab and make a request to NonBlockingHandler. You'll notice that even though NonBlockingHandler doesn't block anywhere, you're not getting a response for it.
But as soon as the 10 seconds pass, you'll get the responses for both BlockingHandler and NonBlockingHandler. This happens because BlockingHandler blocks the event loop, so Tornado can't process other requests while this handler is running. That is why you won't get a response from NonBlockingHandler if BlockingHandler is already running.
This is what will happen if you try to query a database synchronously. It will block the server for all the clients and all the handlers until the query finishes.
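The same effect can be reproduced without starting a server. The sketch below is not Tornado code: it uses the standard library's asyncio event loop (which Tornado 5 and later run on top of) as a stand-in, and the names blocking_task, cooperative_task and measure are made up for illustration. It measures how long a quick task has to wait when the slow task sleeps with time.sleep versus a loop-friendly sleep.

```python
import asyncio
import time


async def blocking_task(delay):
    # time.sleep never hands control back, so the whole loop stalls.
    time.sleep(delay)
    return "Hello 1"


async def cooperative_task(delay):
    # asyncio.sleep suspends only this task; the loop keeps running others.
    await asyncio.sleep(delay)
    return "Hello 2"


async def quick_task(started):
    # Report how long this task waited before it got a chance to run.
    return time.monotonic() - started


async def measure(slow_task, delay=0.2):
    started = time.monotonic()
    # Schedule the slow task first, then the quick one, like two requests.
    slow = asyncio.ensure_future(slow_task(delay))
    quick = asyncio.ensure_future(quick_task(started))
    await slow
    return await quick


if __name__ == "__main__":
    print("waited behind time.sleep:    %.2fs" % asyncio.run(measure(blocking_task)))
    print("waited behind asyncio.sleep: %.2fs" % asyncio.run(measure(cooperative_task)))
```

With the blocking version the quick task waits for roughly the full delay; with the cooperative version it runs almost immediately. Inside a Tornado coroutine, the loop-friendly counterpart of time.sleep(10) is yield gen.sleep(10).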
How do coroutines make writing async code easier than callbacks?
With callbacks, you need to write extra code that you won't even require if you use coroutines. Thus, code becomes shorter and more readable. Other than that, coroutines won't automatically make your code async. Again, they just make the code shorter and easier to maintain and debug. In smaller projects, this might not seem a big advantage, but in bigger projects, it's a godsend.
Let's say we want to do the following things:
- Download a page asynchronously from a given URL
- Do some processing on the downloaded page
First, let's see the code using callbacks:
from tornado.httpclient import AsyncHTTPClient

def fetch_url():
    url = 'http://example.com'
    http_client = AsyncHTTPClient()
    # Callback style; the callback argument was removed in Tornado 6.0.
    http_client.fetch(url, callback=process_result)

def process_result(result):
    # do something with result
    pass
You can see that we can't process the result in the fetch_url function (sure, you can move process_result inside fetch_url, but it's still a different function). You need a separate function that will be called when the page downloads.
Now let's do the same thing using coroutines:
from tornado.httpclient import AsyncHTTPClient
from tornado import gen

@gen.coroutine
def fetch_url_and_process_result():
    url = 'http://example.com'
    http_client = AsyncHTTPClient()
    # fetch() returns a Future; yield pauses here until the page downloads
    result = yield http_client.fetch(url)
    # do something with the result
The code is shorter and much more straightforward. The way it works is that the coroutine pauses every time it reaches a yield expression. To resume it, something has to call the underlying generator's next() (or send()) method. So, when the page downloads, Tornado resumes fetch_url_and_process_result, the yield expression evaluates to the downloaded response, and the code after it runs. All the processing can therefore be done below the yield. No need for a separate function.
More is explained below in the gen.coroutine section.
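The pause-and-resume mechanics can be seen with a plain Python generator, no Tornado involved. In this sketch the drive function plays the role Tornado normally plays; the function names and the fake page contents are made up for illustration.

```python
def fetch_and_process(log):
    log.append("requesting page")
    # Execution pauses here until the caller resumes the generator.
    page = yield "http://example.com"
    # Everything below runs only after the caller sends a value in.
    log.append("processing %d bytes" % len(page))


def drive(log):
    gen = fetch_and_process(log)
    url = next(gen)  # run the generator up to its first yield
    log.append("paused on " + url)
    try:
        # Resume the generator; the yield expression evaluates to this value.
        gen.send("<html></html>")
    except StopIteration:
        pass  # the generator ran to completion
    return log


if __name__ == "__main__":
    for line in drive([]):
        print(line)
```

The value passed to send() is what ends up in page, which is how a downloaded response can land in a local variable right after the yield.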
This example is a very simple use case. Consider this: let's say you need to do many different operations on the result. If you're using callbacks, you'll need to create a separate function for each step that processes the result.
But with coroutines, you can do all the operations in one single function (coroutine).
What are Futures?
A Future is a promise to return the result when it's available. For example, if you request a web page using Tornado's httpclient.AsyncHTTPClient, it will return a Future, an instance of Tornado's Future class, while it waits for the web page to load. As soon as the web page is loaded, Tornado will set the result on the Future instance. Then, if you yield the Future object, you will get the result.
For the most part, you don't need to worry about Futures. But sometimes, you might need to return a Future manually from functions/coroutines. But that's a somewhat advanced usage. This example code from Tornado's GitHub repo might be helpful for that.
Summary:
- Yielding a Future returns its result
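As a rough illustration, here is the life cycle of a Future using the standard library's asyncio (since Tornado 5.0, tornado.concurrent.Future is an alias of asyncio.Future on Python 3). The delayed set_result call stands in for a download finishing, and await plays the role that yield plays inside a gen.coroutine. The name future_demo and the fake page contents are made up for this example.

```python
import asyncio


async def future_demo():
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    # Pretend the page finishes downloading 50 ms from now.
    loop.call_later(0.05, future.set_result, "<html>page</html>")

    done_before = future.done()  # no result has been set yet
    result = await future        # suspends until set_result is called
    return done_before, future.done(), result


if __name__ == "__main__":
    print(asyncio.run(future_demo()))
```

The Future starts out without a result, some other piece of code sets the result later, and whoever was waiting on it is resumed with that value.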
How does gen.coroutine work?
gen.coroutine is a decorator that you can use with your functions or coroutines to avoid callback-based programming. Its real use case is this: if you're calling a function/coroutine that returns a Future, you should decorate the caller with gen.coroutine to make things work as expected.
See the code below:
from tornado import gen
from tornado.concurrent import Future

def return_future():
    """This returns a Future object"""
    f = Future()
    return f

@gen.coroutine
def process_result():
    future = return_future()
    result = yield future
    # do something with the result
Code explanation:
- process_result is a generator and we've decorated it with gen.coroutine.
- gen.coroutine automatically calls process_result's next() method to start the generator.
- Then it puts this process_result coroutine on a "waiting list" until future has a result. Some other code is responsible for setting the result on the future; I've not shown that code, so there's no need to delve into it.
- When the result is set on the future, it resumes process_result (by calling the generator's send() method with the result), which completes the result = yield future statement and sets the result variable to future's result.
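The steps above can be sketched as a toy version of the decorator. This is not Tornado's implementation: ToyFuture and toy_coroutine are simplified stand-ins invented for this example, with no error handling, and they only support generator functions that yield one ToyFuture at a time.

```python
class ToyFuture:
    """A bare-bones stand-in for a Future (hypothetical, not Tornado's class)."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def done(self):
        return self._done

    def result(self):
        return self._result

    def add_done_callback(self, callback):
        if self._done:
            callback(self)
        else:
            self._callbacks.append(callback)

    def set_result(self, value):
        self._done = True
        self._result = value
        for callback in self._callbacks:
            callback(self)


def toy_coroutine(func):
    """Drive a generator function, resuming it whenever a yielded future resolves."""

    def wrapper(*args, **kwargs):
        outer = ToyFuture()  # what the caller gets back immediately
        gen = func(*args, **kwargs)

        def step(value=None):
            try:
                # The first call sends None, which is the same as next(gen).
                yielded = gen.send(value)
            except StopIteration as stop:
                outer.set_result(stop.value)  # the generator returned
                return
            # "Waiting list": resume the generator once the future has a result.
            yielded.add_done_callback(lambda f: step(f.result()))

        step()
        return outer

    return wrapper


def demo():
    pending = ToyFuture()

    @toy_coroutine
    def process_result():
        result = yield pending
        return result * 2

    outer = process_result()
    states = [outer.done()]      # still waiting on pending
    pending.set_result(21)       # some other code sets the result
    states.append(outer.done())  # the coroutine has now finished
    return states, outer.result()
```

Calling the decorated function returns a future right away, and that future only resolves once the generator has been resumed and has run to the end.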
If you don't decorate process_result with gen.coroutine, you will need to use callbacks and the code becomes a mess. Example:
from tornado.concurrent import Future

def return_future(callback):
    f = Future()
    f.add_done_callback(callback)
    return f

def get_future():
    future = return_future(callback=process_result)

def process_result(future):
    result = future.result()
    # do something with the result
gen.coroutine decorated functions will return a Future automatically
If you decorate a function/coroutine using gen.coroutine, a Future object will be returned automatically.
from tornado import gen

@gen.coroutine
def my_func():
    pass  # return nothing

x = my_func()
print(type(x))
# The type is tornado.concurrent.Future
# (an alias of asyncio.Future since Tornado 5.0)
See? Even if you don't return anything, gen.coroutine will return a Future.
So, if you return a Future from a decorated function, gen.coroutine will wrap that Future object inside another Future object, which is not what you want.
Example:
@gen.coroutine
def my_func():
    f = Future()
    return f  # never return a future from a gen.coroutine decorated function
In the above code, you're returning a Future object, but that is not what you'll get. You'll get this instead: Future(f), a Future within a Future. Because of this, you'll have to yield twice to get the result.
Important:
If you're returning a Future manually, you don't need to decorate the function.
Calling a gen.coroutine decorated function
If you're calling a gen.coroutine decorated function/coroutine, you must decorate the caller with gen.coroutine too. And you must use the yield keyword in the caller.
Example:
from tornado import gen

@gen.coroutine
def do_nothing():
    pass

@gen.coroutine
def do_something():
    result = yield do_nothing()
    if result is not None:
        print(result)
It makes perfect sense to use the yield keyword in the caller (above, do_something) because gen.coroutine will return a Future. And to get its result, you need to yield it.
Summary:
- gen.coroutine decorated functions will automatically return a Future
- You don't need to use callbacks if you're using coroutines
- If a function/coroutine is calling another coroutine decorated with gen.coroutine, it must also be decorated with gen.coroutine and must use the yield keyword
What is web.asynchronous and when to use it?
This decorator is meant to be used on methods of Tornado's web.RequestHandler, namely get, post, etc.
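By default, Tornado treats handler methods as synchronous: as soon as the method ends or returns, the request is considered finished, so an async operation started with a callback inside it has no chance to complete before the response goes out. (web.asynchronous was deprecated in Tornado 5.1 and removed in 6.0, so what follows applies to older versions.)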
Look at the following code example:
from tornado import web
from tornado.httpclient import AsyncHTTPClient

class MainHandler(web.RequestHandler):
    def get(self):
        http = AsyncHTTPClient()
        http.fetch("http://example.com", callback=self.on_response)

    def on_response(self, response):
        # do something with response ...
        self.write(response.body)
        self.finish()
The above code aims to do the following things:
- Asynchronously fetch "example.com" from the get method
- Keep the client waiting until the URL is fetched
- When the URL is fetched, call the on_response method
- Perform some operations on the response (the fetched URL)
- Return the response to the client
But this won't work. Because get is synchronous by default, as soon as this method ends, the request is considered finished. Tornado won't keep the client waiting while the URL is fetched. It will just return an empty response.
This is where the web.asynchronous decorator comes in. You decorate the get method with this decorator and Tornado won't terminate the request when the get method exits.
from tornado import web
from tornado.httpclient import AsyncHTTPClient

class MainHandler(web.RequestHandler):
    @web.asynchronous
    def get(self):
        http = AsyncHTTPClient()
        http.fetch("http://example.com", callback=self.on_response)

    def on_response(self, response):
        # do something with response ...
        self.write(response.body)
        self.finish()
The updated code will work as expected. One thing to note here is the last line: self.finish(). It tells Tornado to terminate the request. If you don't write this line, Tornado will keep the request open. You only need to call self.finish() when you're using the web.asynchronous decorator.
Summary:
- web.asynchronous is used with handler methods (e.g. get, post, etc.)
- It's used to make handler methods wait and not terminate the request on exit
- You need to manually terminate the request using self.finish() when using this decorator
- This is useful when using callbacks
That's it
The things I've written about in this post are the ones I found most confusing. The other features of Tornado are straightforward and the documentation covers them nicely.
Until next time.