Making multiple async HTTP requests using Tornado

Although I don't like using callbacks for writing async code, I do use them to make multiple HTTP requests and process them asynchronously.

The advantage of using callbacks over coroutines in this case is that Tornado calls the callback function as soon as it gets a response. So, for example, if you're making 5 HTTP requests and Tornado gets a response for the 3rd request before the others, it calls the callback function so that the response can be handled right away. You don't have to wait for all the requests to finish.

But with coroutines, this can't be done. Since you have to use a yield statement in a coroutine to get the result of a request future, the coroutine won't move forward until that request has finished (i.e. until the future gets a result).
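For example, in a sketch like the following (the function name and url list are placeholders of my own), the requests end up being fetched one after another, even though they could run concurrently:

from tornado import gen
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_one_by_one(urls):
    """Fetches urls sequentially inside a coroutine"""

    http_client = AsyncHTTPClient()

    for url in urls:
        # the coroutine is suspended here until this particular
        # request finishes; the next fetch doesn't even start
        # until then
        response = yield http_client.fetch(url)
        print response.body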

You can yield a list of futures instead, but then again, the coroutine won't move forward until all the futures in the list are resolved.
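Here's what that looks like (a minimal sketch, using the same imports as above) - all the requests start at once, but the yield resumes only after the slowest one has finished:

@gen.coroutine
def fetch_all(urls):
    """Fetches all urls concurrently, waits for all of them"""

    http_client = AsyncHTTPClient()

    # the requests run concurrently, but the coroutine resumes
    # only when every future in the list has a result
    responses = yield [http_client.fetch(url) for url in urls]

    for response in responses:
        print response.body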

Since Tornado 4.1, there's a class called gen.WaitIterator which can be used to circumvent this issue. You give it a number of futures, and it lets you yield each one as soon as it resolves. This way, you don't have to wait for all the futures to resolve.

Example using callbacks

from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient

def fetch_urls():
    """Fetches urls"""

    urls = [
        'url 1',
        'url 2',
        'url 3',
        'url 4',
    ]

    http_client = AsyncHTTPClient()

    for url in urls:
        print "Fetching %s" % url
        http_client.fetch(url, callback=handle_response)


def handle_response(response):
    """Handles response"""

    # do something with the response
    if response.error:
        print response.error
    else:
        print response


if __name__ == '__main__':
    fetch_urls()
    ioloop.IOLoop.current().start()

But how do you stop the loop?

The above solution works pretty well, but there's a catch - you have to stop the loop manually. But how would you know when you've fetched and processed all the urls?

One solution I can think of is to have a global variable storing the number of urls to fetch, and another storing the number of responses handled. You can then check whether you've gotten the responses for all the urls by comparing the two: if the number of responses handled equals the number of urls to fetch, all the requests have finished and been processed, and you can stop the loop by calling ioloop.IOLoop.current().stop().
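Here's a rough sketch of that idea (the counter variable and its name are mine, not part of Tornado):

from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient

urls = [
    'url 1',
    'url 2',
    'url 3',
    'url 4',
]

responses_handled = 0

def fetch_urls():
    """Fetches urls"""

    http_client = AsyncHTTPClient()

    for url in urls:
        http_client.fetch(url, callback=handle_response)


def handle_response(response):
    """Handles response and stops the loop when done"""

    global responses_handled

    # do something with the response
    if response.error:
        print response.error
    else:
        print response

    responses_handled += 1

    # all the responses have been handled, stop the loop
    if responses_handled == len(urls):
        ioloop.IOLoop.current().stop()


if __name__ == '__main__':
    fetch_urls()
    ioloop.IOLoop.current().start()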

Example using coroutines

Since version 4.1, Tornado provides a gen.WaitIterator class which allows you to make all the requests at once and process each response as it comes, without having to wait for the others.

To use gen.WaitIterator, you need to create an instance of it. It takes an arbitrary number of futures as arguments. It doesn't take a list as an argument, but you can use * to unpack a list. See the example:

from tornado import gen, ioloop
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_and_handle():
    """Fetches the urls and handles/processes the response"""

    urls = [
        'url 1',
        'url 2',
        'url 3',
        'url 4',
    ]

    http_client = AsyncHTTPClient()

    waiter = gen.WaitIterator(*[http_client.fetch(url) for url in urls])

    while not waiter.done():
        try:
            response = yield waiter.next()
        except Exception as e:
            print e
            continue

        print response.body

if __name__ == '__main__':
    loop = ioloop.IOLoop.current()
    loop.run_sync(fetch_and_handle)

The limitation of gen.WaitIterator is that you can't add more futures to it once it has been created.

No need to worry about stopping the loop

We can't run a coroutine like we run a normal function, so to run ours we have used IOLoop's run_sync method. This also gives us the advantage of not having to worry about stopping the loop - Tornado stops it automatically once all the requests are finished and processed.