Getting started: Writing your own HTTP/1.1 client

h11 can be used to implement both HTTP/1.1 clients and servers. To give a flavor of how the API works, we’ll demonstrate a small client.

HTTP basics

An HTTP interaction always starts with a client sending a request, optionally followed by some data (e.g. a POST body); the server then replies with a response, optionally followed by some data (e.g. the requested document). Requests and responses each have some metadata associated with them: for requests, this is a method (e.g. GET), a target (e.g. /index.html), and a collection of headers (e.g. User-agent: demo-client). For responses, it’s a status code (e.g. 404 Not Found) and a collection of headers.

Of course, as far as the network is concerned, there’s no such thing as “requests” and “responses” – there’s just bytes being sent from one computer to another. Let’s see what this looks like, by fetching https://httpbin.org/xml:

In [1]: import ssl, socket

In [2]: ctx = ssl.create_default_context()

In [3]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
   ...:                        server_hostname="httpbin.org")
   ...: 

# Send request
In [4]: sock.sendall(b"GET /xml HTTP/1.1\r\nhost: httpbin.org\r\n\r\n")
Out[4]: 40

# Read response
In [5]: response_data = sock.recv(1024)

# Let's see what we got!
In [6]: print(response_data)
b'HTTP/1.1 200 OK\r\nServer: nginx\r\nDate: Sat, 14 May 2016 01:32:04 GMT\r\nContent-Type: application/xml\r\nContent-Length: 522\r\nConnection: keep-alive\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Credentials: true\r\n\r\n<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!--  A SAMPLE set of slides  -->\n\n<slideshow \n    title="Sample Slide Show"\n    date="Date of publication"\n    author="Yours Truly"\n    >\n\n    <!-- TITLE SLIDE -->\n    <slide type="all">\n      <title>Wake up to WonderWidgets!</title>\n    </slide>\n\n    <!-- OVERVIEW -->\n    <slide type="all">\n        <title>Overview</title>\n        <item>Why <em>WonderWidgets</em> are great</item>\n        <item/>\n        <item>Who <em>buys</em> WonderWidgets</item>\n    </slide>\n\n</slideshow>'

So that’s, uh, very convenient and readable. It’s a little more understandable if we print the bytes as text:

In [7]: print(response_data.decode("ascii"))
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 14 May 2016 01:32:04 GMT
Content-Type: application/xml
Content-Length: 522
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

<?xml version='1.0' encoding='us-ascii'?>

<!--  A SAMPLE set of slides  -->

<slideshow 
    title="Sample Slide Show"
    date="Date of publication"
    author="Yours Truly"
    >

    <!-- TITLE SLIDE -->
    <slide type="all">
      <title>Wake up to WonderWidgets!</title>
    </slide>

    <!-- OVERVIEW -->
    <slide type="all">
        <title>Overview</title>
        <item>Why <em>WonderWidgets</em> are great</item>
        <item/>
        <item>Who <em>buys</em> WonderWidgets</item>
    </slide>

</slideshow>

Here we can see the status code at the top (200, which is the code for “OK”), followed by the headers, followed by the data (a silly little XML document). But we can already see that working with bytes by hand like this is really cumbersome. What we need to do is to move up to a higher level of abstraction.
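
To make “cumbersome” concrete, here’s a rough sketch of what pulling the status code and headers out of those bytes by hand might look like. The helper name parse_response_by_hand is made up for this illustration, and it assumes the whole response head has already arrived in one piece, which a real network won’t promise:

```python
def parse_response_by_hand(raw: bytes):
    """Split a complete HTTP/1.1 response into (status, reason, headers, body).

    Hypothetical helper for illustration only. It assumes `raw` contains the
    entire head, and it ignores chunked bodies and partial reads.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response: no blank line after the headers")
    status_line, *header_lines = head.split(b"\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    version, status_code, reason = status_line.split(b" ", 2)
    headers = []
    for line in header_lines:
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers.append((name.strip().lower(), value.strip()))
    return int(status_code), reason, headers, body


sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: application/xml\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"<a/>\n")
status, reason, headers, body = parse_response_by_hand(sample)
print(status, reason, headers, body)
```

Even this toy version has to worry about where the headers stop and how to compare header names, and it still ignores partial reads and body framing entirely. Hiding those details is the job of a higher level of abstraction.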

This is what h11 does. Instead of talking in bytes, it lets you talk in high-level HTTP “events”. To see what this means, let’s repeat the above exercise, but using h11. We start by making a TLS connection like before, but now we’ll also import h11, and create a h11.Connection object:

In [8]: import ssl, socket

In [9]: import h11

In [10]: ctx = ssl.create_default_context()

In [11]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
   ....:                        server_hostname="httpbin.org")
   ....: 

In [12]: conn = h11.Connection(our_role=h11.CLIENT)

Next, to send an event to the server, there are three steps we have to take. First, we create an object representing the event we want to send – in this case, a h11.Request:

In [13]: request = h11.Request(method="GET",
   ....:                       target="/xml",
   ....:                       headers=[("Host", "httpbin.org")])
   ....: 

Next, we pass this to our connection’s send() method, which gives us back the bytes corresponding to this message:

In [14]: bytes_to_send = conn.send(request)

And then we send these bytes across the network:

In [15]: sock.sendall(bytes_to_send)
Out[15]: 40

There’s nothing magical here – these are the same bytes that we sent up above:

In [16]: bytes_to_send
Out[16]: b'GET /xml HTTP/1.1\r\nhost: httpbin.org\r\n\r\n'
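
If you’re curious how an event could turn into those bytes, the mapping is simple enough to sketch by hand. This is only an illustration, not h11’s actual code: request_to_bytes is a hypothetical helper, it skips all the validation a real library performs, and it lowercases header names purely to match the output shown above:

```python
def request_to_bytes(method, target, headers):
    """Serialise a body-less HTTP/1.1 request head (illustrative sketch)."""
    lines = [f"{method} {target} HTTP/1.1"]
    for name, value in headers:
        # Header names are case-insensitive; lowercase mirrors the output above.
        lines.append(f"{name.lower()}: {value}")
    # Each line ends with CRLF, and an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


wire = request_to_bytes("GET", "/xml", [("Host", "httpbin.org")])
print(wire)
print(len(wire))
```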

Why doesn’t h11 go ahead and send the bytes for you? Because it’s designed to be usable no matter what socket API you’re using – doesn’t matter if it’s synchronous like this, asynchronous, callback-based, whatever; if you can read and write bytes from the network, then you can use h11.

In this case, we’re not quite done yet – we have to send another event to tell the other side that we’re finished, which we do by sending an EndOfMessage event:

In [17]: end_of_message_bytes_to_send = conn.send(h11.EndOfMessage())

In [18]: sock.sendall(end_of_message_bytes_to_send)
Out[18]: 0

Of course, it turns out that in this case, the HTTP/1.1 specification tells us that any request that doesn’t contain either a Content-Length or Transfer-Encoding header automatically has a zero-length body, and h11 knows that, and h11 knows that the server knows that, so it actually encoded the EndOfMessage event as the empty byte string:

In [19]: end_of_message_bytes_to_send
Out[19]: b''

But there are other cases where it might not, depending on what headers are set, what message is being responded to, the HTTP version of the remote peer, etc. etc. So for consistency, h11 requires that you always finish your messages by sending an explicit EndOfMessage event; then it keeps track of the details of what that actually means in any given situation, so that you don’t have to.
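
As a simplified illustration of the kind of bookkeeping involved, here’s a sketch of how the bytes that end a message can depend on its headers. The function end_of_message_bytes is hypothetical and covers only two framing styles (a known length, and the chunked transfer coding without trailers); it isn’t h11’s implementation:

```python
def end_of_message_bytes(headers):
    """Return the bytes that terminate a message body, given its headers.

    Illustrative sketch only: real framing rules also depend on the peer's
    HTTP version and on which request a response is answering.
    """
    normalized = {name.lower(): value.lower() for name, value in headers}
    if "chunked" in normalized.get("transfer-encoding", ""):
        # A chunked body is terminated by a zero-length chunk.
        return b"0\r\n\r\n"
    # With Content-Length (or no body at all) the length is known up front,
    # so there is nothing extra to send.
    return b""


print(end_of_message_bytes([("Host", "httpbin.org")]))
print(end_of_message_bytes([("Transfer-Encoding", "chunked")]))
```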

Finally, we have to read the server’s reply. By now you can probably guess how this is done: we read some bytes from the network, then we hand them to Connection.receive_data() and it gives us back high-level events from the server.

In [20]: bytes_received = sock.recv(1024)

In [21]: events_received = conn.receive_data(bytes_received)

In [22]: events_received
Out[22]: 
[Response(status_code=200, headers=[(b'server', b'nginx'), (b'date', b'Sat, 14 May 2016 01:32:05 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')], http_version=b'1.1'),
 Data(data=bytearray(b'<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!--  A SAMPLE set of slides  -->\n\n<slideshow \n    title="Sample Slide Show"\n    date="Date of publication"\n    author="Yours Truly"\n    >\n\n    <!-- TITLE SLIDE -->\n    <slide type="all">\n      <title>Wake up to WonderWidgets!</title>\n    </slide>\n\n    <!-- OVERVIEW -->\n    <slide type="all">\n        <title>Overview</title>\n        <item>Why <em>WonderWidgets</em> are great</item>\n        <item/>\n        <item>Who <em>buys</em> WonderWidgets</item>\n    </slide>\n\n</slideshow>')),
 EndOfMessage(headers=[])]

Here the server sent us three events: a Response object, which is similar to the Request object that we created earlier and has the response’s status code (200 OK) and headers; a Data object containing the response data; and another EndOfMessage object. This similarity between what we send and what we receive isn’t accidental: if we were using h11 to write an HTTP server, then these are the objects we would have created and passed to send() – h11 in client and server mode has an API that’s almost exactly symmetric.

A basic client object

To make this a little more convenient to play with, we can wrap up our socket and Connection into a single object with some convenience methods:

import socket, ssl
import h11

class MyHttpClient:
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
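        if port == 443:
            # ssl.wrap_socket() is deprecated and was removed in Python 3.12;
            # a default context also verifies the certificate and hostname.
            ctx = ssl.create_default_context()
            self.sock = ctx.wrap_socket(self.sock, server_hostname=host)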
        self.conn = h11.Connection(our_role=h11.CLIENT)

    def send(self, *events):
        for event in events:
            data = self.conn.send(event)
            if data is None:
                # event was a ConnectionClosed(), meaning that we won't be
                # sending any more data:
                self.sock.shutdown(socket.SHUT_WR)
            else:
                self.sock.sendall(data)

    # max_bytes set intentionally small for pedagogical purposes
    def receive(self, max_bytes=200):
        return self.conn.receive_data(self.sock.recv(max_bytes))

And then we can send requests:

In [23]: client = MyHttpClient("httpbin.org", 443)

In [24]: client.send(h11.Request(method="GET", target="/xml",
   ....:                         headers=[("Host", "httpbin.org")]),
   ....:             h11.EndOfMessage())
   ....: 

And read back the events:

In [25]: client.receive()
Out[25]: []

What happened here? We only read a max of 200 bytes from the socket (see max_bytes= above), and it turns out that this wasn’t enough to form a complete event. This happens all the time in real life, due to slow networks or whatever – data trickles in at its own pace. When this happens, h11 buffers the unprocessed data internally, and if you keep reading then eventually you’ll get a complete event:

In [26]: client.receive()
Out[26]: 
[Response(status_code=200, headers=[(b'server', b'nginx'), (b'date', b'Sat, 14 May 2016 01:32:05 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')], http_version=b'1.1'),
 Data(data=bytearray(b'<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!--  A SAMPLE set of slides  -->\n\n<slideshow \n    title="Sample Slide Show"\n    date="Date of publication"\n    author="Yours Truly"\n    >'))]

Note here that we received a Data event that only has part of the response body – h11 streams out data as it arrives, which might mean that you receive multiple Data events. (Of course, if you’re the one sending data, you can do the same thing: instead of buffering all your data in one giant Data event, you can send multiple Data events yourself to stream the data out incrementally; just make sure that you set the appropriate Content-Length / Transfer-Encoding headers.) If we keep reading, we’ll see more Data events, and then eventually the EndOfMessage:

In [27]: client.receive()
Out[27]: [Data(data=bytearray(b'\n\n    <!-- TITLE SLIDE -->\n    <slide type="all">\n      <title>Wake up to WonderWidgets!</title>\n    </slide>\n\n    <!-- OVERVIEW -->\n    <slide type="all">\n        <title>Overview</title>\n        <ite'))]

In [28]: client.receive()
Out[28]: 
[Data(data=bytearray(b'm>Why <em>WonderWidgets</em> are great</item>\n        <item/>\n        <item>Who <em>buys</em> WonderWidgets</item>\n    </slide>\n\n</slideshow>')),
 EndOfMessage(headers=[])]

Now we can see why EndOfMessage is so important – otherwise, we can’t tell when we’ve received the end of the data. And since that’s the end of this response, the server won’t send us anything more until we make another request – if we try, then the socket read will just hang forever, unless we set a timeout or interrupt it:

In [29]: client.sock.settimeout(2)

In [30]: client.receive()

timeout                                   Traceback (most recent call last)
<ipython-input-30-4394c01d9ea0> in <module>()
----> 1 client.receive()

<string> in receive(self, max_bytes)

/usr/lib/python3.4/ssl.py in recv(self, buflen, flags)
    752                     "non-zero flags not allowed in calls to recv() on %s" %
    753                     self.__class__)
--> 754             return self.read(buflen)
    755         else:
    756             return socket.recv(self, buflen, flags)

/usr/lib/python3.4/ssl.py in read(self, len, buffer)
    641                 v = self._sslobj.read(len, buffer)
    642             else:
--> 643                 v = self._sslobj.read(len or 1024)
    644             return v
    645         except SSLError as x:

timeout: The read operation timed out
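
The buffering and end-of-message behavior from this section can be imitated in a few lines. The ToyReceiver class below is a hypothetical stand-in, not h11’s implementation: it assumes the response body is framed by a Content-Length header, and it reports events as plain tuples:

```python
class ToyReceiver:
    """Buffer incoming bytes and hand back simple event tuples (sketch)."""

    def __init__(self):
        self._buffer = bytearray()
        self._remaining = None  # body bytes still expected; None until head parsed
        self.done = False

    def receive_data(self, data):
        events = []
        if self.done:
            return events
        self._buffer += data
        if self._remaining is None:
            head, sep, rest = bytes(self._buffer).partition(b"\r\n\r\n")
            if not sep:
                return events  # headers incomplete: keep buffering
            status_line, *header_lines = head.split(b"\r\n")
            length = 0
            for line in header_lines:
                name, _, value = line.partition(b":")
                if name.strip().lower() == b"content-length":
                    length = int(value.strip())
            self._remaining = length
            self._buffer = bytearray(rest)
            events.append(("response", status_line))
        # Stream out whatever part of the body has arrived so far.
        chunk = bytes(self._buffer[:self._remaining])
        del self._buffer[:len(chunk)]
        self._remaining -= len(chunk)
        if chunk:
            events.append(("data", chunk))
        if self._remaining == 0:
            # Every promised body byte has been seen: the message is complete.
            self.done = True
            events.append(("end_of_message",))
        return events


response = (b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 11\r\n"
            b"\r\n"
            b"hello world")
receiver = ToyReceiver()
all_events = [receiver.receive_data(response[i:i + 16])
              for i in range(0, len(response), 16)]
print(all_events)
```

Feeding the response in 16-byte pieces, the first two calls return empty lists because the headers aren’t complete yet; later calls return the status line, the body in two pieces, and finally the end-of-message marker.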

Keep-alive

For some servers, we’d have to stop here, because they require a new connection for every request/response. But this server is smarter than that – it supports keep-alive, so we can re-use this connection to send another request. There are a few ways we can tell. First, if it didn’t, then it would have closed the connection already, and we would have gotten a ConnectionClosed event on our last call to receive(). We can also tell by checking h11’s internal idea of what state the two sides of the conversation are in:

In [31]: client.conn.our_state, client.conn.their_state
Out[31]: (DONE, DONE)

If the server didn’t support keep-alive, then these would be MUST_CLOSE and either MUST_CLOSE or CLOSED, respectively (depending on whether we’d seen the socket actually close yet). DONE / DONE, on the other hand, means that this request/response cycle has totally finished, but the connection itself is still viable, and we can start over and send a new request on this same connection.

To do this, we tell h11 to get ready (this is needed as a safety measure to make sure different requests/responses on the same connection don’t get accidentally mixed up):

In [32]: client.conn.prepare_to_reuse()

This resets both sides back to their initial IDLE state, allowing us to send another Request:

In [33]: client.conn.our_state, client.conn.their_state
Out[33]: (IDLE, IDLE)

In [34]: client.send(h11.Request(method="GET", target="/get",
   ....:                         headers=[("Host", "httpbin.org")]),
   ....:             h11.EndOfMessage())
   ....: 

In [35]: client.receive(max_bytes=4096)
Out[35]: 
[Response(status_code=200, headers=[(b'server', b'nginx'), (b'date', b'Sat, 14 May 2016 01:32:07 GMT'), (b'content-type', b'application/json'), (b'content-length', b'132'), (b'connection', b'keep-alive'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')], http_version=b'1.1'),
 Data(data=bytearray(b'{\n  "args": {}, \n  "headers": {\n    "Host": "httpbin.org"\n  }, \n  "origin": "104.239.163.78", \n  "url": "https://httpbin.org/get"\n}\n')),
 EndOfMessage(headers=[])]
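
The response headers tell a similar story to h11’s state machine. As a hedged sketch (connection_is_reusable is a hypothetical helper, and it looks only at the server’s side, whereas h11 tracks both peers), the usual HTTP rule is that HTTP/1.1 connections persist unless someone says Connection: close, while HTTP/1.0 peers have to opt in with Connection: keep-alive:

```python
def connection_is_reusable(http_version, headers):
    """Guess from a response head whether the connection can be reused.

    `headers` is a list of (name, value) byte pairs, like the ones in the
    Response events shown above. Simplified illustration only.
    """
    tokens = set()
    for name, value in headers:
        if name.lower() == b"connection":
            tokens.update(t.strip().lower() for t in value.split(b","))
    if b"close" in tokens:
        return False
    if http_version == b"1.1":
        return True  # HTTP/1.1 connections are persistent by default
    return b"keep-alive" in tokens  # HTTP/1.0 needs an explicit opt-in


print(connection_is_reusable(b"1.1", [(b"connection", b"keep-alive")]))
print(connection_is_reusable(b"1.1", [(b"connection", b"close")]))
```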

What’s next?

Here are some ideas for things you might try:

  • Adapt the above examples to make a POST request. (Don’t forget to set the Content-Length header – but don’t worry, if you do forget, then h11 will give you an error when you try to send data):

    client.send(h11.Request(method="POST", target="/post",
                            headers=[("Host", "httpbin.org"),
                                     ("Content-Length", "10")]),
                h11.Data(data=b"1234567890"),
                h11.EndOfMessage())
    client.receive(max_bytes=4096)
    
  • Experiment with what happens if you try to violate the HTTP protocol by sending a Response as a client, or sending two Requests in a row.

  • Write your own basic http_get function that takes a URL, parses out the host/port/path, then connects to the server, does a GET request, and then collects up all the resulting Data objects, concatenates their payloads, and returns it.

  • Adapt the above code to use your favorite non-blocking API.

  • Use h11 to write a simple HTTP server. (If you get stuck, there’s an example in the test suite.)
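
For the http_get exercise, the URL-parsing step might be sketched with the standard library’s urllib.parse like this. The helper name split_url and its return shape are assumptions made for this example; connecting, sending the request, and collecting the Data events are left as the exercise:

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the (host, port, target) pieces a client needs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    # Fall back to the scheme's default port when the URL doesn't name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # An empty path means the root; keep any query string in the target.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return host, port, target


print(split_url("https://httpbin.org/xml"))
```

The host goes into both the socket connection and the Host header, while the target is what gets passed to h11.Request.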

And of course, you’ll want to read the API documentation for all the details.