I’ve been investigating long-polling solutions. This blog entry describes the technique I used on the client side. (I will probably change my mind a few more times before settling on a server-side implementation, and I may end up not using the code below on the client at all; but it may be useful to others who, for their own obscure reasons, want to iterate over chunks as the server produces them.)
The server produces response snippets as they become available, and sends them down the HTTP connection as chunks. In my case, the content being XML, it works well to concatenate a series of XML blobs into a large XML stream (somewhat similar to XMPP’s streams).
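To make the wire format concrete, here is a minimal sketch of how a body is framed under chunked transfer encoding: each chunk is its size in hexadecimal, a CRLF, the payload, and another CRLF, and a zero-size chunk ends the body. The function names and the XML snippets are made up for illustration; this is not the code of any particular server.

```python
def encode_chunk(payload):
    """Frame one payload as a chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(payload), payload)


def encode_chunked_body(payloads):
    """Frame every non-empty payload, then append the zero-size terminator.

    Empty payloads are skipped, because a zero-size chunk would end the body.
    """
    framed = [encode_chunk(p) for p in payloads if p]
    return b"".join(framed) + b"0\r\n\r\n"


# Hypothetical XML snippets, each sent as its own chunk.
body = encode_chunked_body([b"<event id='1'/>", b"<event id='2'/>"])
print(body)
```

Each snippet stays a self-contained blob inside its own chunk, which is what lets the client treat chunk boundaries as message boundaries.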
On the client side, I wanted a way to consume chunks as they become available. As it turns out, Python’s httplib is not very generator-friendly: if the server specifies chunked encoding, the library will correctly decode the chunks, but it won’t let me stop at chunk boundaries.
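The limitation can be reproduced without a network. The sketch below feeds a canned chunked response to the standard library's response class and reads it back. It is written for Python 3, where httplib is named http.client; the FakeSocket class and the canned bytes are illustrative assumptions, not part of either library.

```python
import io
import http.client  # the Python 3 name for httplib


class FakeSocket(object):
    """Hypothetical stand-in for a socket that serves canned bytes."""

    def __init__(self, data):
        self._data = data

    def makefile(self, *args, **kwargs):
        # HTTPResponse only needs a file-like object to read from.
        return io.BytesIO(self._data)


RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Transfer-Encoding: chunked\r\n"
       b"\r\n"
       b"4\r\n<a/>\r\n"
       b"4\r\n<b/>\r\n"
       b"0\r\n\r\n")

response = http.client.HTTPResponse(FakeSocket(RAW))
response.begin()          # parse the status line and the headers
merged = response.read()  # the chunks are decoded, then joined together
print(response.chunked, merged)
```

The two chunks come back as a single byte string, so the caller can no longer tell where one ended and the next began.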
So here is a working example that handles both chunked and non-chunked responses, and exposes the data as it gets produced.
import httplib
import requests
import sys


def main():
    if len(sys.argv) != 2:
        print "Usage: %s <url>" % sys.argv[0]
        return 1

    headers = {'Accept-Encoding': 'identity'}
    sess = requests.sessions.Session()
    sess.headers.update(headers)
    sess.verify = False
    # Do not read the body up front; we want to consume it ourselves
    sess.prefetch = False
    sess.hooks.update(response=response_hook)
    resp = sess.get(sys.argv[1])
    cb = lambda x: sys.stdout.write("Read: %s\n" % x)
    for chunk in resp.iter_chunks():
        cb(chunk)


def response_hook(response, *args, **kwargs):
    # Attach a chunk iterator that works on the underlying httplib response
    response.iter_chunks = lambda amt=None: iter_chunks(response.raw._fp, amt=amt)
    return response


def iter_chunks(response, amt=None):
    """
    A copy-paste version of httplib.HTTPResponse._read_chunked() that
    yields chunks served by the server.
    """
    if response.chunked:
        while True:
            line = response.fp.readline().strip()
            # The chunk size may be followed by ';' and chunk extensions
            arr = line.split(';', 1)
            try:
                chunk_size = int(arr[0], 16)
            except ValueError:
                response.close()
                # Report the malformed size line; chunk_size is not set here
                raise httplib.IncompleteRead(line)
            if chunk_size == 0:
                break
            value = response._safe_read(chunk_size)
            yield value
            # we read the whole chunk, get another
            response._safe_read(2)  # toss the CRLF at the end of the chunk

        # read and discard trailer up to the CRLF terminator
        ### note: we shouldn't have any trailers!
        while True:
            line = response.fp.readline()
            if not line:
                # a vanishingly small number of sites EOF without
                # sending the trailer
                break
            if line == '\r\n':
                break

        # we read everything; close the "file"
        response.close()
    else:
        # Non-chunked response. If amt is None, then just drop back to
        # response.read()
        if amt is None:
            yield response.read()
        else:
            # Yield chunks as read from the HTTP connection
            while True:
                ret = response.read(amt)
                if not ret:
                    break
                yield ret


if __name__ == '__main__':
    sys.exit(main())
Save it as test-request.py and run it against a server that produces chunks.
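If no such server is at hand, a throwaway one is easy to write. What follows is a minimal sketch using Python 3's standard http.server module; the handler name, the payloads, the delay and the port are all assumptions chosen for illustration, and this is not the server-side implementation discussed above.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payloads; any list of non-empty byte strings would do.
SNIPPETS = [b"<event id='1'/>", b"<event id='2'/>", b"<event id='3'/>"]
DELAY = 0.2  # seconds between chunks, so the streaming is visible


class ChunkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Transfer-Encoding", "chunked")
        self.send_header("Connection", "close")
        self.end_headers()
        for snippet in SNIPPETS:
            # One chunk: hex size, CRLF, payload, CRLF.
            self.wfile.write(b"%x\r\n%s\r\n" % (len(snippet), snippet))
            self.wfile.flush()
            time.sleep(DELAY)
        self.wfile.write(b"0\r\n\r\n")  # the zero-size chunk ends the body


def make_server(port=8000):
    """Build the server; call serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), ChunkHandler)
```

Calling make_server().serve_forever() in one terminal and pointing test-request.py at http://127.0.0.1:8000/ in another should print one "Read:" line per snippet, a fraction of a second apart.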
The Requests library does not directly allow one to do this, but it has a hook mechanism that gives access to objects as they are created (in this case, the response, before its body gets read).
I hope this will be useful to others too.