Getting started: Writing your own HTTP/1.1 client
h11 can be used to implement both HTTP/1.1 clients and servers. To give a flavor for how the API works, we’ll demonstrate a small client.
HTTP basics
An HTTP interaction always starts with a client sending a request and,
optionally, some data (e.g. a POST body); the server then answers with
a response and, optionally, some data (e.g. the requested document).
Requests and responses each carry some metadata: for requests, this is
a method (e.g. GET), a target (e.g. /index.html), and a collection of
headers (e.g. User-agent: demo-client). For responses, it’s a status code
(e.g. 404 Not Found) and a collection of headers.
Of course, as far as the network is concerned, there’s no such thing as “requests” and “responses” – there’s just bytes being sent from one computer to another. Let’s see what this looks like, by fetching https://httpbin.org/xml:
In [1]: import ssl, socket
In [2]: ctx = ssl.create_default_context()
In [3]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
...: server_hostname="httpbin.org")
...:
# Send request
In [4]: sock.sendall(b"GET /xml HTTP/1.1\r\nhost: httpbin.org\r\n\r\n")
# Read response
In [5]: response_data = sock.recv(1024)
# Let's see what we got!
In [6]: print(response_data)
b'HTTP/1.1 200 OK\r\nDate: Sun, 25 Sep 2022 15:43:41 GMT\r\nContent-Type: application/xml\r\nContent-Length: 522\r\nConnection: keep-alive\r\nServer: gunicorn/19.9.0\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Credentials: true\r\n\r\n'
Warning
If you try to reproduce these examples interactively, then you’ll
have the most luck if you paste them in all at once. Remember we’re
talking to a remote server here – if you type them in one at a
time, and you’re too slow, then the server might give up on waiting
for you and close the connection. One way to recognize that this
has happened is if response_data comes back as an empty bytes object (b''),
or, later on when we’re working with h11, if you see errors that mention
ConnectionClosed.
So that’s, uh, very convenient and readable. It’s a little more understandable if we print the bytes as text:
In [7]: print(response_data.decode("ascii"))
HTTP/1.1 200 OK
Date: Sun, 25 Sep 2022 15:43:41 GMT
Content-Type: application/xml
Content-Length: 522
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Here we can see the status code at the top (200, which is the code for “OK”), followed by the headers. The data (a silly little XML document, 522 bytes according to Content-Length) comes after the blank line that ends the headers; in the capture above it had not yet arrived when recv() returned. Either way, we can already see that working with bytes by hand like this is really cumbersome. What we need to do is to move up to a higher level of abstraction.
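To make “cumbersome” concrete, here is a minimal sketch of pulling the status code and headers out of such bytes by hand. parse_response_head is a hypothetical helper written only for this illustration (it is not part of h11), and it assumes the whole head has already arrived; it ignores partial reads, malformed input, and most of the other details a real parser has to handle.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP/1.1 response head into (status_code, reason, headers).

    Hypothetical helper for illustration only. Assumes `raw` already
    contains the complete head, up to and including the blank line.
    """
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response head")
    status_line, *header_lines = head.split(b"\r\n")
    # maxsplit=2 keeps multi-word reasons such as b"Not Found" intact.
    _version, status_code, reason = status_line.split(b" ", 2)
    headers = []
    for line in header_lines:
        name, _, value = line.partition(b":")
        headers.append((name.strip().lower(), value.strip()))
    return int(status_code), reason, headers


# A shortened version of the response shown above.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/xml\r\n"
    b"Content-Length: 522\r\n"
    b"\r\n"
)
status_code, reason, headers = parse_response_head(raw)
print(status_code, reason, headers)
```

Even this toy version needs to care about line endings, splitting rules, and case normalization, and it still does nothing about the body.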
This is what h11 does. Instead of talking in bytes, it lets you talk
in high-level HTTP “events”. To see what this means, let’s repeat the
above exercise, but using h11. We start by making a TLS connection
like before, but now we’ll also import h11 and create an
h11.Connection object:
In [8]: import ssl, socket
In [9]: import h11
In [10]: ctx = ssl.create_default_context()
In [11]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
....: server_hostname="httpbin.org")
....:
In [12]: conn = h11.Connection(our_role=h11.CLIENT)
Next, to send an event to the server, there are three steps we have to
take. First, we create an object representing the event we want to
send, in this case an h11.Request:
In [13]: request = h11.Request(method="GET",
....: target="/xml",
....: headers=[("Host", "httpbin.org")])
....:
Next, we pass this to our connection’s send() method, which gives us back the bytes corresponding to this message:
In [14]: bytes_to_send = conn.send(request)
And then we send these bytes across the network:
In [15]: sock.sendall(bytes_to_send)
There’s nothing magical here – these are the same bytes that we sent up above:
In [16]: bytes_to_send
Out[16]: b'GET /xml HTTP/1.1\r\nHost: httpbin.org\r\n\r\n'
Why doesn’t h11 go ahead and send the bytes for you? Because it’s designed to be usable no matter what socket API you’re using – doesn’t matter if it’s synchronous like this, asynchronous, callback-based, whatever; if you can read and write bytes from the network, then you can use h11.
In this case, we’re not quite done yet: we have to send another
event to tell the other side that we’re finished, which we do by
sending an EndOfMessage event:
In [17]: end_of_message_bytes_to_send = conn.send(h11.EndOfMessage())
In [18]: sock.sendall(end_of_message_bytes_to_send)
Of course, it turns out that in this case, the HTTP/1.1 specification
tells us that any request that doesn’t contain either a
Content-Length or Transfer-Encoding header automatically has a
zero-length body. h11 knows that, and h11 knows that the server knows
that, so it actually encoded the EndOfMessage event as the
empty byte string:
In [19]: end_of_message_bytes_to_send
Out[19]: b''
But there are other cases where it might not, depending on what
headers are set, what message is being responded to, the HTTP version
of the remote peer, and so on. So for consistency, h11 requires that
you always finish your messages by sending an explicit
EndOfMessage event; then it keeps track of the details of
what that actually means in any given situation, so that you don’t
have to.
Finally, we have to read the server’s reply. By now you can probably
guess how this is done, at least in the general outline: we read some
bytes from the network, then we hand them to the connection (using
Connection.receive_data()) and it converts them into events
(using Connection.next_event()).
In [20]: bytes_received = sock.recv(1024)
In [21]: conn.receive_data(bytes_received)
In [22]: conn.next_event()
Out[22]: Response(headers=<Headers([(b'date', b'Sun, 25 Sep 2022 15:43:41 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'server', b'gunicorn/19.9.0'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')])>, http_version=b'1.1', reason=b'OK', status_code=200)
In [23]: conn.next_event()
Out[23]: NEED_DATA
In [24]: conn.next_event()
Out[24]: NEED_DATA
(Remember, if you’re following along and get an error here mentioning
ConnectionClosed, then try again, but going through the steps
faster!)
A complete response arrives as three events: a Response object,
which is similar to the Request object that we created
earlier and has the response’s status code (200 OK) and headers; a
Data object containing the response data; and another
EndOfMessage object. (In the session above, only the Response had
arrived in the first recv(), so the later calls returned NEED_DATA, a
value we’ll come back to in a moment.) This similarity between what we
send and what we receive isn’t accidental: if we were using h11 to write
an HTTP server, then these are the objects we would have created and
passed to send(). h11 in client and server mode has an API
that’s almost exactly symmetric.
One thing we have to deal with, though, is that an entire response
doesn’t always arrive in a single call to socket.recv();
sometimes the network will decide to trickle it in at its own pace, in
multiple pieces. Let’s try that again:
In [25]: import ssl, socket
In [26]: import h11
In [27]: ctx = ssl.create_default_context()
In [28]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
....: server_hostname="httpbin.org")
....:
In [29]: conn = h11.Connection(our_role=h11.CLIENT)
In [30]: request = h11.Request(method="GET",
....: target="/xml",
....: headers=[("Host", "httpbin.org")])
....:
In [31]: sock.sendall(conn.send(request))
and this time, we’ll read in chunks of 200 bytes, to see how h11 handles it:
In [32]: bytes_received = sock.recv(200)
In [33]: conn.receive_data(bytes_received)
In [34]: conn.next_event()
Out[34]: NEED_DATA
NEED_DATA is a special value that indicates that we, well,
need more data. h11 has buffered the first chunk of data; let’s read
some more:
In [35]: bytes_received = sock.recv(200)
In [36]: conn.receive_data(bytes_received)
In [37]: conn.next_event()
Out[37]: Response(headers=<Headers([(b'date', b'Sun, 25 Sep 2022 15:43:41 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'server', b'gunicorn/19.9.0'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')])>, http_version=b'1.1', reason=b'OK', status_code=200)
Now it’s managed to read a complete Response.
A basic client object
Now let’s use what we’ve learned to wrap up our socket and
Connection into a single object with some convenience
methods:
import socket, ssl
import h11

class MyHttpClient:
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
        if port == 443:
            ctx = ssl.create_default_context()
            self.sock = ctx.wrap_socket(self.sock, server_hostname=host)
        self.conn = h11.Connection(our_role=h11.CLIENT)

    def send(self, *events):
        for event in events:
            data = self.conn.send(event)
            if data is None:
                # event was a ConnectionClosed(), meaning that we won't be
                # sending any more data:
                self.sock.shutdown(socket.SHUT_WR)
            else:
                self.sock.sendall(data)

    # max_bytes_per_recv intentionally set low for pedagogical purposes
    def next_event(self, max_bytes_per_recv=200):
        while True:
            # If we already have a complete event buffered internally, just
            # return that. Otherwise, read some data, add it to the internal
            # buffer, and then try again.
            event = self.conn.next_event()
            if event is h11.NEED_DATA:
                self.conn.receive_data(self.sock.recv(max_bytes_per_recv))
                continue
            return event
And then we can send requests:
In [38]: client = MyHttpClient("httpbin.org", 443)
In [39]: client.send(h11.Request(method="GET", target="/xml",
....: headers=[("Host", "httpbin.org")]))
....:
In [40]: client.send(h11.EndOfMessage())
And read back the events:
In [41]: client.next_event()
Out[41]: Response(headers=<Headers([(b'date', b'Sun, 25 Sep 2022 15:43:41 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'server', b'gunicorn/19.9.0'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')])>, http_version=b'1.1', reason=b'OK', status_code=200)
In [42]: client.next_event()
Out[42]: Data(data=bytearray(b'<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!-- A SAMPLE set of slides -->\n\n<slideshow \n title="Sample Slide Show"\n date="Date of publication"\n author="Yours Tr'), chunk_start=False, chunk_end=False)
Note here that we received a Data event that only has part
of the response body, which is another consequence of our reading in
small chunks. h11 tries to buffer as little as it can, so it streams
out data as it arrives, which means that a message body might be
split up into multiple Data events. (Of course, if you’re the
one sending data, you can do the same thing: instead of buffering all
your data in one giant Data event, you can send multiple
Data events yourself to stream the data out incrementally;
just make sure that you set the appropriate Content-Length /
Transfer-Encoding headers.) If we keep reading, we’ll see more
Data events, and then eventually the EndOfMessage:
In [43]: client.next_event()
Out[43]: Data(data=bytearray(b'uly"\n >\n\n <!-- TITLE SLIDE -->\n <slide type="all">\n <title>Wake up to WonderWidgets!</title>\n </slide>\n\n <!-- OVERVIEW -->\n <slide type="all">\n <title>Overview</title>\n '), chunk_start=False, chunk_end=False)
In [44]: client.next_event()
Out[44]: Data(data=bytearray(b' <item>Why <em>WonderWidgets</em> are great</item>\n <item/>\n <item>Who <em>buys</em> WonderWidgets</item>\n </slide>\n\n</slideshow>'), chunk_start=False, chunk_end=False)
In [45]: client.next_event()
Out[45]: EndOfMessage(headers=<Headers([])>)
Now we can see why EndOfMessage is so important: without it,
we couldn’t tell when we’ve received the end of the data. And since
that’s the end of this response, the server won’t send us anything
more until we make another request. If we try to read anyway, then the
socket read will just hang forever, unless we set a timeout or interrupt it:
In [46]: client.sock.settimeout(2)
In [47]: client.next_event()
---------------------------------------------------------------------------
timeout Traceback (most recent call last)
<ipython-input-47-c552b3004411> in <module>
----> 1 client.next_event()
<string> in next_event(self, max_bytes_per_recv)
~/.pyenv/versions/3.7.9/lib/python3.7/ssl.py in recv(self, buflen, flags)
1054 "non-zero flags not allowed in calls to recv() on %s" %
1055 self.__class__)
-> 1056 return self.read(buflen)
1057 else:
1058 return super().recv(buflen, flags)
~/.pyenv/versions/3.7.9/lib/python3.7/ssl.py in read(self, len, buffer)
929 return self._sslobj.read(len, buffer)
930 else:
--> 931 return self._sslobj.read(len)
932 except SSLError as x:
933 if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs:
timeout: The read operation timed out
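Code that sets a timeout like this will usually want to catch the resulting exception rather than let it propagate. The sketch below shows the pattern with the standard library only; it uses socket.socketpair() as a stand-in for a server that has nothing more to say, which is an assumption made so the example runs without a network.

```python
import socket

# Two connected sockets living in this process. Nothing is ever written
# to `theirs`, so a read on `ours` can only block or time out.
ours, theirs = socket.socketpair()
ours.settimeout(0.2)

try:
    ours.recv(200)
    timed_out = False
except socket.timeout:
    # Same exception type as in the traceback above; here we treat it as
    # "no further data" instead of crashing.
    timed_out = True
finally:
    ours.close()
    theirs.close()

print(timed_out)
```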
Keep-alive
For some servers, we’d have to stop here, because they require a new
connection for every request/response. But this server is smarter
than that: it supports keep-alive, so we
can re-use this connection to send another request. There are a few ways
we can tell. First, if it didn’t, then it would have closed the
connection already, and we would have gotten a
ConnectionClosed event on our last call to
next_event(). We can also tell by checking h11’s
internal idea of what state the two sides of the conversation are in:
In [48]: client.conn.our_state, client.conn.their_state
Out[48]: (DONE, DONE)
If the server didn’t support keep-alive, then these would be
MUST_CLOSE and either MUST_CLOSE or CLOSED,
respectively (depending on whether we’d seen the socket actually close
yet). DONE / DONE, on the other hand, means that this
request/response cycle has totally finished, but the connection itself
is still viable, and we can start over and send a new request on this
same connection.
To do this, we tell h11 to get ready (this is needed as a safety measure to make sure different requests/responses on the same connection don’t get accidentally mixed up):
In [49]: client.conn.start_next_cycle()
This resets both sides back to their initial IDLE state,
allowing us to send another Request:
In [50]: client.conn.our_state, client.conn.their_state
Out[50]: (IDLE, IDLE)
In [51]: client.send(h11.Request(method="GET", target="/get",
....: headers=[("Host", "httpbin.org")]))
....:
In [52]: client.send(h11.EndOfMessage())
In [53]: client.next_event()
Out[53]: Response(headers=<Headers([(b'date', b'Sun, 25 Sep 2022 15:43:43 GMT'), (b'content-type', b'application/json'), (b'content-length', b'198'), (b'connection', b'keep-alive'), (b'server', b'gunicorn/19.9.0'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')])>, http_version=b'1.1', reason=b'OK', status_code=200)
What’s next?
Here are some ideas of things you might try:
Adapt the above examples to make a POST request. (Don’t forget to set the Content-Length header. But don’t worry: if you do forget, then h11 will give you an error when you try to send data.)
    client.send(h11.Request(method="POST", target="/post",
                            headers=[("Host", "httpbin.org"),
                                     ("Content-Length", "10")]))
    client.send(h11.Data(data=b"1234567890"))
    client.send(h11.EndOfMessage())
Experiment with what happens if you try to violate the HTTP protocol by sending a Response as a client, or sending two Requests in a row.
Write your own basic http_get function that takes a URL, parses out the host/port/path, then connects to the server, does a GET request, and then collects up all the resulting Data objects, concatenates their payloads, and returns the result.
Adapt the above code to use your favorite non-blocking API.
Use h11 to write a simple HTTP server. (If you get stuck, here’s an example.)
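As a possible starting point for the http_get exercise, here is one way the URL-parsing step might look using the standard library. split_url is a hypothetical helper name; the default ports (80 for http, 443 for https) are chosen to line up with the port == 443 check in MyHttpClient above. The networking and event-collecting parts are left as the exercise.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, port, target) for an http:// or https:// URL.

    Hypothetical helper for the http_get exercise; not part of h11.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if parts.hostname is None:
        raise ValueError("URL has no host")
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # The request target is the path plus any query string; an empty
    # path becomes "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, port, target


print(split_url("https://httpbin.org/xml"))
print(split_url("http://example.com:8080/search?q=h11"))
```

The host would go into both MyHttpClient(host, port) and the Host header, and the target into h11.Request(target=...).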
And of course, you’ll want to read the API documentation for all the details.