h11: A pure-Python HTTP/1.1 protocol library
h11 is an HTTP/1.1 protocol library written in Python, heavily inspired by hyper-h2.
h11’s goal is to be a simple, robust, complete, and non-hacky implementation of the first “chapter” of the HTTP/1.1 spec: RFC 7230: HTTP/1.1 Message Syntax and Routing. That is, it mostly focuses on implementing HTTP at the level of taking bytes on and off the wire, and the headers related to that, and tries to be picky about spec conformance when possible. It doesn’t know about higher-level concerns like URL routing, conditional GETs, cross-origin cookie policies, or content negotiation. But it does know how to take care of framing, cross-version differences in keep-alive handling, and the “obsolete line folding” rule, and to use bounded time and space to process even pathological / malicious input, so that you can focus your energies on the hard / interesting parts for your application. And it tries to support the full specification in the sense that any useful HTTP/1.1 conformant application should be able to use h11.
This is a “bring-your-own-I/O” protocol library; like h2, it contains no I/O code whatsoever. This means you can hook h11 up to your favorite network API, and that could be anything you want: synchronous, threaded, asynchronous, or your own implementation of RFC 6214 – h11 won’t judge you. This is h11’s main feature compared to the current state of the art, where every HTTP library is tightly bound to a particular network framework, so every time a new network API comes along, someone has to reimplement the entire HTTP stack from scratch. We highly recommend Cory Benfield’s excellent blog post about the advantages of this approach.
This also means that h11 is not immediately useful out of the box: it’s a toolkit for building programs that speak HTTP, not something that could directly replace requests or twisted.web or whatever. But h11 makes it much easier to implement something like requests or twisted.web.
Vital statistics
- Requirements: Python 2.7 or Python 3.3+, including PyPy
- Install: pip install h11
- Sources and bug tracker: https://github.com/njsmith/h11
- Docs: https://h11.readthedocs.io
- License: MIT
- Code of conduct: Contributors are requested to follow our code of conduct in all project spaces.
Contents
Getting started: Writing your own HTTP/1.1 client
h11 can be used to implement both HTTP/1.1 clients and servers. To give a flavor for how the API works, we’ll demonstrate a small client.
HTTP basics
An HTTP interaction always starts with a client sending a request, optionally followed by some data (e.g., a POST body); the server then answers with a response, optionally followed by some data (e.g. the requested document). Requests and responses have some data associated with them: for requests, this is a method (e.g. GET), a target (e.g. /index.html), and a collection of headers (e.g. User-agent: demo-client). For responses, it’s a status code (e.g. 404 Not Found) and a collection of headers.
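As a rough illustration of how those three request pieces are laid out, here is a small standard-library sketch that assembles a body-less request by hand. The helper name build_request is hypothetical, and the function does no validation at all, so treat it as a picture of the format rather than something to deploy.

```python
def build_request(method, target, headers):
    """Assemble a body-less HTTP/1.1 request by hand (illustration only)."""
    # Request line: method, target, and protocol version, space separated.
    lines = ["{} {} HTTP/1.1".format(method, target)]
    # One "name: value" line per header.
    lines += ["{}: {}".format(name, value) for name, value in headers]
    # Lines end with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request_bytes = build_request("GET", "/xml", [("host", "httpbin.org")])
```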
Of course, as far as the network is concerned, there’s no such thing as “requests” and “responses” – there’s just bytes being sent from one computer to another. Let’s see what this looks like, by fetching https://httpbin.org/xml:
In [1]: import ssl, socket
In [2]: ctx = ssl.create_default_context()
In [3]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
...: server_hostname="httpbin.org")
...:
# Send request
In [4]: sock.sendall(b"GET /xml HTTP/1.1\r\nhost: httpbin.org\r\n\r\n")
Out[4]: 40
# Read response
In [5]: response_data = sock.recv(1024)
# Let's see what we got!
In [6]: print(response_data)
b'HTTP/1.1 200 OK\r\nServer: nginx\r\nDate: Sat, 26 Nov 2016 05:20:23 GMT\r\nContent-Type: application/xml\r\nContent-Length: 522\r\nConnection: keep-alive\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Credentials: true\r\n\r\n<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!-- A SAMPLE set of slides -->\n\n<slideshow \n title="Sample Slide Show"\n date="Date of publication"\n author="Yours Truly"\n >\n\n <!-- TITLE SLIDE -->\n <slide type="all">\n <title>Wake up to WonderWidgets!</title>\n </slide>\n\n <!-- OVERVIEW -->\n <slide type="all">\n <title>Overview</title>\n <item>Why <em>WonderWidgets</em> are great</item>\n <item/>\n <item>Who <em>buys</em> WonderWidgets</item>\n </slide>\n\n</slideshow>'
So that’s, uh, very convenient and readable. It’s a little more understandable if we print the bytes as text:
In [7]: print(response_data.decode("ascii"))
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 26 Nov 2016 05:20:23 GMT
Content-Type: application/xml
Content-Length: 522
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

<?xml version='1.0' encoding='us-ascii'?>
<!-- A SAMPLE set of slides -->
<slideshow
title="Sample Slide Show"
date="Date of publication"
author="Yours Truly"
>
<!-- TITLE SLIDE -->
<slide type="all">
<title>Wake up to WonderWidgets!</title>
</slide>
<!-- OVERVIEW -->
<slide type="all">
<title>Overview</title>
<item>Why <em>WonderWidgets</em> are great</item>
<item/>
<item>Who <em>buys</em> WonderWidgets</item>
</slide>
</slideshow>
Here we can see the status code at the top (200, which is the code for “OK”), followed by the headers, followed by the data (a silly little XML document). But we can already see that working with bytes by hand like this is really cumbersome. What we need to do is to move up to a higher level of abstraction.
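To see just how cumbersome, here is a sketch of pulling the same pieces out of the raw bytes by hand. The function name parse_response_by_hand is hypothetical and the sample bytes are a shortened, made-up response. The sketch assumes the whole response arrived in a single read and that the body is not chunked, neither of which real code can rely on.

```python
def parse_response_by_hand(raw):
    """Naively split a complete HTTP/1.1 response into its parts."""
    # Headers and body are separated by an empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    _version, status_code, reason = status_line.split(" ", 2)
    headers = []
    for line in header_lines:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return int(status_code), reason, headers, body


sample = b"HTTP/1.1 200 OK\r\nServer: nginx\r\nContent-Length: 5\r\n\r\nhello"
status_code, reason, headers, body = parse_response_by_hand(sample)
```

Even this toy version ignores partial reads, chunked bodies, obsolete line folding, and malformed input, which is exactly the bookkeeping a protocol library is meant to take off your hands.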
This is what h11 does. Instead of talking in bytes, it lets you talk in high-level HTTP “events”. To see what this means, let’s repeat the above exercise, but using h11. We start by making a TLS connection like before, but now we’ll also import h11, and create an h11.Connection object:
In [8]: import ssl, socket
In [9]: import h11
In [10]: ctx = ssl.create_default_context()
In [11]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
....: server_hostname="httpbin.org")
....:
In [12]: conn = h11.Connection(our_role=h11.CLIENT)
Next, to send an event to the server, there are three steps we have to take. First, we create an object representing the event we want to send – in this case, an h11.Request:
In [13]: request = h11.Request(method="GET",
....: target="/xml",
....: headers=[("Host", "httpbin.org")])
....:
Next, we pass this to our connection’s send() method, which gives us back the bytes corresponding to this message:
In [14]: bytes_to_send = conn.send(request)
And then we send these bytes across the network:
In [15]: sock.sendall(bytes_to_send)
Out[15]: 40
There’s nothing magical here – these are the same bytes that we sent up above:
In [16]: bytes_to_send
Out[16]: b'GET /xml HTTP/1.1\r\nhost: httpbin.org\r\n\r\n'
Why doesn’t h11 go ahead and send the bytes for you? Because it’s designed to be usable no matter what socket API you’re using – doesn’t matter if it’s synchronous like this, asynchronous, callback-based, whatever; if you can read and write bytes from the network, then you can use h11.
In this case, we’re not quite done yet – we have to send another event to tell the other side that we’re finished, which we do by sending an EndOfMessage event:
In [17]: end_of_message_bytes_to_send = conn.send(h11.EndOfMessage())
In [18]: sock.sendall(end_of_message_bytes_to_send)
Out[18]: 0
Of course, it turns out that in this case, the HTTP/1.1 specification tells us that any request that doesn’t contain either a Content-Length or Transfer-Encoding header automatically has a zero-length body, and h11 knows that, and h11 knows that the server knows that, so it actually encoded the EndOfMessage event as the empty string:
In [19]: end_of_message_bytes_to_send
Out[19]: b''
But there are other cases where it might not, depending on what headers are set, what message is being responded to, the HTTP version of the remote peer, etc. etc. So for consistency, h11 requires that you always finish your messages by sending an explicit EndOfMessage event; then it keeps track of the details of what that actually means in any given situation, so that you don’t have to.
Finally, we have to read the server’s reply. By now you can probably guess how this is done, at least in the general outline: we read some bytes from the network, then we hand them to the connection (using Connection.receive_data()) and it converts them into events (using Connection.next_event()).
In [20]: bytes_received = sock.recv(1024)
In [21]: conn.receive_data(bytes_received)
In [22]: conn.next_event()
Out[22]: Response(status_code=200, headers=[(b'server', b'nginx'), (b'date', b'Sat, 26 Nov 2016 05:20:23 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')], http_version=b'1.1', reason=b'OK')
In [23]: conn.next_event()