Chunked encoding and Python’s requests library

I’ve been investigating long polling solutions. This blog entry describes the technique I used on the client side. I will probably change my mind a few more times before settling on a server-side implementation, and I may end up not using the code below on the client at all, but it may be useful to others who, for their own obscure reasons, want to iterate over chunks as the server produces them.

The server produces response snippets as they become available, and sends them down the HTTP connection as chunks. In my case, the content being XML, it works well to concatenate a series of XML blobs into a large XML stream (somewhat similar to XMPP’s streams).
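To make the framing concrete, here is a minimal sketch of how a series of blobs ends up on the wire as chunks. The helper name encode_chunked and the sample XML blobs are made up for illustration and are not taken from my server, and the sketch assumes Python 3.5 or later (the listing further down is Python 2).

```python
def encode_chunked(pieces):
    """Frame each non-empty bytes blob as one HTTP/1.1 chunk."""
    framed = []
    for piece in pieces:
        if not piece:
            # A zero-length chunk marks the end of the body, so skip empties.
            continue
        # One chunk = size in hex, CRLF, payload, CRLF.
        framed.append(b"%x\r\n" % len(piece) + piece + b"\r\n")
    # Terminating zero-size chunk followed by an empty trailer section.
    framed.append(b"0\r\n\r\n")
    return b"".join(framed)


# Two hypothetical XML blobs, each sent as its own chunk.
wire = encode_chunked([b"<event id='1'/>", b"<event id='2'/>"])
```

Each blob keeps its own size prefix, which is what lets a client tell where one blob ends and the next begins.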

On the client side, I wanted a way to consume chunks as they become available. As it turns out, Python’s httplib is not very generator-friendly: if the server specifies chunked encoding, the library will correctly decode chunks, but it won’t give me control to stop at chunk boundaries.
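The behaviour is easy to see without a network connection. The sketch below uses Python 3's http.client (the successor of httplib) and feeds it a canned response through a hypothetical FakeSocket class; the class and the sample payload are assumptions made purely for the demonstration.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket that replays canned bytes."""

    def __init__(self, data):
        self._data = data

    def makefile(self, *args, **kwargs):
        # HTTPResponse only needs a readable binary file object.
        return io.BytesIO(self._data)


RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"4\r\n<a/>\r\n"
    b"5\r\n<bb/>\r\n"
    b"0\r\n\r\n"
)

resp = http.client.HTTPResponse(FakeSocket(RAW))
resp.begin()        # parse the status line and headers
body = resp.read()  # decodes every chunk and joins the payloads
print(body)         # b'<a/><bb/>'
```

The two chunks come back as a single string, so the information about where the server split them is gone by the time read() returns.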

So here is a working example that handles both chunked and non-chunked responses, and exposes the data as it gets produced.

import httplib
import requests
import sys

def main():
    if len(sys.argv) != 2:
        print "Usage: %s <url>" % sys.argv[0]
        return 1

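    # Ask for an uncompressed body so chunk payloads arrive unmodified
    headers = { 'Accept-Encoding' : 'identity' }
    sess = requests.sessions.Session()
    sess.headers.update(headers)
    sess.verify = False    # skip TLS certificate verification
    sess.prefetch = False  # do not read the body before we iterate over it
    # Attach iter_chunks() to every response this session produces
    sess.hooks.update(response=response_hook)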
    resp = sess.get(sys.argv[1])
    cb = lambda x: sys.stdout.write("Read: %s\n" % x)
    for chunk in resp.iter_chunks():
        cb(chunk)

def response_hook(response, *args, **kwargs):
    response.iter_chunks = lambda amt=None: iter_chunks(response.raw._fp, amt=amt)
    return response

def iter_chunks(response, amt=None):
    """
    A copy-paste version of httplib.HTTPConnection._read_chunked() that
    yields chunks served by the server.
    """
    if response.chunked:
        while True:
            line = response.fp.readline().strip()
            arr = line.split(';', 1)
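            try:
                chunk_size = int(arr[0], 16)
            except ValueError:
                # Malformed chunk-size line. chunk_size is not bound here,
                # and earlier chunks were already yielded, so there is no
                # partial data to report.
                response.close()
                raise httplib.IncompleteRead('')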
            if chunk_size == 0:
                break
            value = response._safe_read(chunk_size)
            yield value
            # we read the whole chunk, get another
            response._safe_read(2)      # toss the CRLF at the end of the chunk

        # read and discard trailer up to the CRLF terminator
        ### note: we shouldn't have any trailers!
        while True:
            line = response.fp.readline()
            if not line:
                # a vanishingly small number of sites EOF without
                # sending the trailer
                break
            if line == '\r\n':
                break

        # we read everything; close the "file"
        response.close()
    else:
        # Non-chunked response. If amt is None, then just drop back to
        # response.read()
        if amt is None:
            yield response.read()
        else:
            # Yield chunks as read from the HTTP connection
            while True:
                ret = response.read(amt)
                if not ret:
                    break
                yield ret

if __name__ == '__main__':
    sys.exit(main())

Save it as test-request.py and run it against a server that produces chunks.
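If you don't have such a server at hand, a throwaway one can be sketched with the standard library. This is only an assumption-laden test fixture, not my actual server: the ChunkingHandler and make_server names, the port, the delay and the XML blobs are all invented here, and the code targets Python 3.5 or later.

```python
import http.server
import time


class ChunkingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that sends three XML blobs, one chunk each."""

    protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1
    delay = 1.0                    # seconds to wait between chunks

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for n in range(3):
            blob = b"<event n='%d'/>" % n
            # One chunk = size in hex, CRLF, payload, CRLF.
            self.wfile.write(b"%x\r\n" % len(blob) + blob + b"\r\n")
            self.wfile.flush()
            time.sleep(self.delay)
        # A zero-size chunk and an empty trailer end the body.
        self.wfile.write(b"0\r\n\r\n")

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8000):
    """Build (but do not start) a server bound to localhost."""
    return http.server.HTTPServer(("127.0.0.1", port), ChunkingHandler)
```

Calling make_server().serve_forever() and pointing the script at http://127.0.0.1:8000/ should print one "Read:" line per blob, roughly a second apart.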

The Requests library does not expose chunk boundaries directly, but its hook mechanism gives access to various objects as they get produced (in this case, the response, before its body is read). The response hook above uses that to attach an iter_chunks() method that reads from the underlying httplib response.

I hope this will be useful to others too.

3 thoughts on “Chunked encoding and Python’s requests library”

  1. Andrew

    FYI, this errors out:

    # python test_request.py 'http://www.google.com'
    Traceback (most recent call last):
      File "test_request.py", line 79, in <module>
        sys.exit(main())
      File "test_request.py", line 17, in main
        hooks=dict(response=response_hook))
    TypeError: __init__() got an unexpected keyword argument 'headers'
    
  2. misa Post author

    What version of the Requests library do you have? It appears that in your case the Session object does not accept additional headers.
    I had tested this with 0.14.0.

  3. misa Post author

    Oh, version 1.1.0 has that problem. Boo for not keeping a backwards-compatible interface. I’ve updated the example, thank you for pointing this out.
