Making multiple async HTTP requests using Tornado
Although I don't like using callbacks for writing async code, I do use them for making multiple HTTP requests and processing them asynchronously.
The advantage of using callbacks in this case over coroutines is that as soon as Tornado gets a response, it calls the callback function to handle it. So, for example, if you're making 5 HTTP requests and Tornado gets a response for the 3rd request before the others, it will call the callback function so that the response can be handled right away. You don't have to wait for all the requests to finish.
But with coroutines, this can't be done. Since you have to use the yield statement in a coroutine to get the result of a request future, the coroutine won't move forward until that request has been fetched (i.e. until the future gets a result). You can yield a list of futures, but then again, the coroutine won't move forward until all the futures in the list are resolved.
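For instance, here's a minimal sketch of yielding a list of futures (the urls are just placeholders, like elsewhere in this post). The coroutine stays suspended at the yield until every single fetch has completed:
from tornado import gen, ioloop
from tornado.httpclient import AsyncHTTPClient


@gen.coroutine
def fetch_all():
    http_client = AsyncHTTPClient()
    # yielding a list of futures: execution resumes only after
    # *all* of the fetches have resolved
    responses = yield [http_client.fetch(url) for url in ['url 1', 'url 2']]
    for response in responses:
        print response


if __name__ == '__main__':
    ioloop.IOLoop.current().run_sync(fetch_all)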
Since Tornado 4.1, there's a class called gen.WaitIterator which can be used to circumvent this issue. You give it a set of futures to hold, and it lets you yield each one as it gets resolved. This way, you don't have to wait for all the futures to resolve.
Example using callbacks
from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient


def fetch_urls():
    """Fetches urls"""
    urls = [
        'url 1',
        'url 2',
        'url 3',
        'url 4',
    ]
    http_client = AsyncHTTPClient()
    for url in urls:
        print "Fetching %s" % url
        http_client.fetch(url, callback=handle_response)


def handle_response(response):
    """Handles response"""
    # do something with the response
    if response.error:
        print response.error
    else:
        print response


if __name__ == '__main__':
    fetch_urls()
    ioloop.IOLoop.current().start()
But how to stop the loop?
The above solution works well, but there's a catch: you have to stop the loop manually. But how do you know when you've fetched and processed all the urls?
One solution that I can think of is to have a global variable that stores the number of urls to fetch, and another that stores the number of responses handled. Then you can check whether you've gotten responses for all the urls by checking if the number of responses handled is equal to the number of urls to fetch. If yes, that means all the requests have finished and been processed, and you can stop the loop by calling ioloop.IOLoop.current().stop().
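For illustration, here's a rough sketch of that idea (the responses_handled counter is a name I've made up, not part of Tornado):
from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient

urls = ['url 1', 'url 2', 'url 3', 'url 4']
responses_handled = 0  # hypothetical counter for handled responses


def fetch_urls():
    http_client = AsyncHTTPClient()
    for url in urls:
        http_client.fetch(url, callback=handle_response)


def handle_response(response):
    global responses_handled
    # do something with the response, then count it as handled
    responses_handled += 1
    # all the responses are in, so stop the loop
    if responses_handled == len(urls):
        ioloop.IOLoop.current().stop()


if __name__ == '__main__':
    fetch_urls()
    ioloop.IOLoop.current().start()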
Example using coroutines
Since version 4.1, Tornado provides the gen.WaitIterator class, which allows you to make all the requests at once and process the responses as they come, without having to wait for the rest.
To use gen.WaitIterator, you need to create an instance of it. It takes an arbitrary number of futures as arguments. It doesn't take a list as an argument, but you can use * to unpack a list of futures. See the example:
from tornado import gen, ioloop
from tornado.httpclient import AsyncHTTPClient


@gen.coroutine
def fetch_and_handle():
    """Fetches the urls and handles/processes the response"""
    urls = [
        'url 1',
        'url 2',
        'url 3',
        'url 4',
    ]
    http_client = AsyncHTTPClient()
    # pass the fetch futures to WaitIterator, unpacked from the list
    waiter = gen.WaitIterator(*[http_client.fetch(url) for url in urls])
    while not waiter.done():
        try:
            response = yield waiter.next()
        except Exception as e:
            print e
            continue
        print response.body


if __name__ == '__main__':
    loop = ioloop.IOLoop.current()
    loop.run_sync(fetch_and_handle)
The limitation of gen.WaitIterator is that you can't add more futures to it once it has been created.
No need to worry about stopping the loop
We can't run a coroutine the way we run a normal function. To run our coroutine, we used IOLoop's run_sync method. This also gives us the advantage of not having to worry about stopping the loop: run_sync starts the loop and automatically stops it when the coroutine returns, i.e. when all the requests have been fetched and processed.