Getting started: Writing your own HTTP/1.1 client
h11 can be used to implement both HTTP/1.1 clients and servers. To give a flavor of how the API works, we’ll demonstrate a small client.
HTTP basics
An HTTP interaction always starts with the client sending a request,
optionally followed by some data (e.g. a POST body). The server then
answers with a response, optionally followed by some data (e.g. the
requested document). Requests and responses each have some metadata
associated with them: for requests, this is a method (e.g. GET), a
target (e.g. /index.html), and a collection of headers
(e.g. User-agent: demo-client). For responses, it’s a status code
(e.g. 404 Not Found) and a collection of headers.
Of course, as far as the network is concerned, there’s no such thing as “requests” and “responses” – there’s just bytes being sent from one computer to another. Let’s see what this looks like, by fetching https://httpbin.org/xml:
In [1]: import ssl, socket
In [2]: ctx = ssl.create_default_context()
In [3]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
...: server_hostname="httpbin.org")
...:
# Send request
In [4]: sock.sendall(b"GET /xml HTTP/1.1\r\nhost: httpbin.org\r\n\r\n")
Out[4]: 40
# Read response
In [5]: response_data = sock.recv(1024)
# Let's see what we got!
In [6]: print(response_data)
b'HTTP/1.1 200 OK\r\nServer: nginx\r\nDate: Sat, 14 May 2016 01:32:04 GMT\r\nContent-Type: application/xml\r\nContent-Length: 522\r\nConnection: keep-alive\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Credentials: true\r\n\r\n<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!-- A SAMPLE set of slides -->\n\n<slideshow \n title="Sample Slide Show"\n date="Date of publication"\n author="Yours Truly"\n >\n\n <!-- TITLE SLIDE -->\n <slide type="all">\n <title>Wake up to WonderWidgets!</title>\n </slide>\n\n <!-- OVERVIEW -->\n <slide type="all">\n <title>Overview</title>\n <item>Why <em>WonderWidgets</em> are great</item>\n <item/>\n <item>Who <em>buys</em> WonderWidgets</item>\n </slide>\n\n</slideshow>'
So that’s, uh, very convenient and readable. It’s a little more understandable if we print the bytes as text:
In [7]: print(response_data.decode("ascii"))
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 14 May 2016 01:32:04 GMT
Content-Type: application/xml
Content-Length: 522
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
<?xml version='1.0' encoding='us-ascii'?>
<!-- A SAMPLE set of slides -->
<slideshow
title="Sample Slide Show"
date="Date of publication"
author="Yours Truly"
>
<!-- TITLE SLIDE -->
<slide type="all">
<title>Wake up to WonderWidgets!</title>
</slide>
<!-- OVERVIEW -->
<slide type="all">
<title>Overview</title>
<item>Why <em>WonderWidgets</em> are great</item>
<item/>
<item>Who <em>buys</em> WonderWidgets</item>
</slide>
</slideshow>
Here we can see the status code at the top (200, which is the code for “OK”), followed by the headers, followed by the data (a silly little XML document). But we can already see that working with bytes by hand like this is really cumbersome. What we need to do is to move up to a higher level of abstraction.
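To make “cumbersome” concrete, here is a minimal sketch of pulling the status code, headers, and body out of such a byte string by hand, using only the standard library. The helper name split_response and the sample bytes are made up for illustration. The parsing is deliberately naive: it assumes the whole header block has already arrived and ignores chunked bodies, missing reason phrases, and malformed input.

```python
def split_response(raw: bytes):
    """Naively split a complete HTTP/1.1 response into its parts."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("header block is not complete yet")
    lines = head.decode("ascii").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    _version, status, reason = lines[0].split(" ", 2)
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return int(status), reason, headers, body


# Hypothetical sample, shortened from the kind of reply shown above.
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/xml\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"<a/>\n"
)

status, reason, headers, body = split_response(SAMPLE)
print(status, reason, headers, body)
```

Even this toy version has to care about separators, encodings, and whitespace, and it still says nothing about where the body ends. That bookkeeping is what a protocol library takes off your hands.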
This is what h11 does. Instead of talking in bytes, it lets you talk
in high-level HTTP “events”. To see what this means, let’s repeat the
above exercise, but using h11. We start by making a TLS connection
like before, but now we’ll also import h11 and create an
h11.Connection object:
In [8]: import ssl, socket
In [9]: import h11
In [10]: ctx = ssl.create_default_context()
In [11]: sock = ctx.wrap_socket(socket.create_connection(("httpbin.org", 443)),
....: server_hostname="httpbin.org")
....:
In [12]: conn = h11.Connection(our_role=h11.CLIENT)
Next, to send an event to the server, there are three steps we have to
take. First, we create an object representing the event we want to
send, in this case an h11.Request:
In [13]: request = h11.Request(method="GET",
....: target="/xml",
....: headers=[("Host", "httpbin.org")])
....:
Next, we pass this to our connection’s send() method, which gives us
back the bytes corresponding to this message:
In [14]: bytes_to_send = conn.send(request)
And then we send these bytes across the network:
In [15]: sock.sendall(bytes_to_send)
Out[15]: 40
There’s nothing magical here – these are the same bytes that we sent up above:
In [16]: bytes_to_send
Out[16]: b'GET /xml HTTP/1.1\r\nhost: httpbin.org\r\n\r\n'
Why doesn’t h11 go ahead and send the bytes for you? Because it’s designed to be usable no matter what socket API you’re using – doesn’t matter if it’s synchronous like this, asynchronous, callback-based, whatever; if you can read and write bytes from the network, then you can use h11.
In this case, we’re not quite done yet: we have to send another
event to tell the other side that we’re finished, which we do by
sending an EndOfMessage event:
In [17]: end_of_message_bytes_to_send = conn.send(h11.EndOfMessage())
In [18]: sock.sendall(end_of_message_bytes_to_send)
Out[18]: 0
Of course, it turns out that in this case, the HTTP/1.1 specification
tells us that any request that doesn’t contain either a Content-Length
or Transfer-Encoding header automatically has a zero-length body. h11
knows that, and h11 knows that the server knows that, so it actually
encoded the EndOfMessage event as the empty byte string:
In [19]: end_of_message_bytes_to_send
Out[19]: b''
But there are other cases where it might not, depending on what
headers are set, what message is being responded to, the HTTP version
of the remote peer, and so on. So for consistency, h11 requires that
you always finish your messages by sending an explicit EndOfMessage
event; it then keeps track of the details of what that actually means
in any given situation, so that you don’t have to.
Finally, we have to read the server’s reply. By now you can probably
guess how this is done: we read some bytes from the network, then we
hand them to Connection.receive_data() and it gives us back
high-level events from the server:
In [20]: bytes_received = sock.recv(1024)
In [21]: events_received = conn.receive_data(bytes_received)
In [22]: events_received
Out[22]:
[Response(status_code=200, headers=[(b'server', b'nginx'), (b'date', b'Sat, 14 May 2016 01:32:05 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')], http_version=b'1.1'),
Data(data=bytearray(b'<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!-- A SAMPLE set of slides -->\n\n<slideshow \n title="Sample Slide Show"\n date="Date of publication"\n author="Yours Truly"\n >\n\n <!-- TITLE SLIDE -->\n <slide type="all">\n <title>Wake up to WonderWidgets!</title>\n </slide>\n\n <!-- OVERVIEW -->\n <slide type="all">\n <title>Overview</title>\n <item>Why <em>WonderWidgets</em> are great</item>\n <item/>\n <item>Who <em>buys</em> WonderWidgets</item>\n </slide>\n\n</slideshow>')),
EndOfMessage(headers=[])]
Here the server sent us three events: a Response object, which is
similar to the Request object that we created earlier and has the
response’s status code (200 OK) and headers; a Data object containing
the response data; and another EndOfMessage object. This similarity
between what we send and what we receive isn’t accidental: if we were
using h11 to write an HTTP server, then these are the objects we would
have created and passed to send(). h11 in client and server mode has
an API that’s almost exactly symmetric.
A basic client object
To make this a little more convenient to play with, we can wrap up our
socket and Connection into a single object with some convenience
methods:
import socket, ssl
import h11

class MyHttpClient:
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
        if port == 443:
            # ssl.wrap_socket() was removed in Python 3.12; going through
            # an SSLContext also verifies the certificate and hostname.
            ctx = ssl.create_default_context()
            self.sock = ctx.wrap_socket(self.sock, server_hostname=host)
        self.conn = h11.Connection(our_role=h11.CLIENT)

    def send(self, *events):
        for event in events:
            data = self.conn.send(event)
            if data is None:
                # event was a ConnectionClosed(), meaning that we won't be
                # sending any more data:
                self.sock.shutdown(socket.SHUT_WR)
            else:
                self.sock.sendall(data)

    # max_bytes set intentionally small for pedagogical purposes
    def receive(self, max_bytes=200):
        return self.conn.receive_data(self.sock.recv(max_bytes))
And then we can send requests:
In [23]: client = MyHttpClient("httpbin.org", 443)
In [24]: client.send(h11.Request(method="GET", target="/xml",
....: headers=[("Host", "httpbin.org")]),
....: h11.EndOfMessage())
....:
And read back the events:
In [25]: client.receive()
Out[25]: []
What happened here? We only read a maximum of 200 bytes from the socket
(see max_bytes= above), and it turns out that this wasn’t enough
to form a complete event. This happens all the time in real life, due
to slow networks or whatever: data trickles in at its own pace. When
this happens, h11 buffers the unprocessed data internally, and if you
keep reading then eventually you’ll get a complete event:
In [26]: client.receive()
Out[26]:
[Response(status_code=200, headers=[(b'server', b'nginx'), (b'date', b'Sat, 14 May 2016 01:32:05 GMT'), (b'content-type', b'application/xml'), (b'content-length', b'522'), (b'connection', b'keep-alive'), (b'access-control-allow-origin', b'*'), (b'access-control-allow-credentials', b'true')], http_version=b'1.1'),
Data(data=bytearray(b'<?xml version=\'1.0\' encoding=\'us-ascii\'?>\n\n<!-- A SAMPLE set of slides -->\n\n<slideshow \n title="Sample Slide Show"\n date="Date of publication"\n author="Yours Truly"\n >'))]
Note here that we received a Data event that only has part of the
response body. h11 streams out data as it arrives, which might mean
that you receive multiple Data events. (Of course, if you’re the one
sending data, you can do the same thing: instead of buffering all your
data in one giant Data event, you can send multiple Data events
yourself to stream the data out incrementally; just make sure that you
set the appropriate Content-Length / Transfer-Encoding headers.) If we
keep reading, we’ll see more Data events, and then eventually the
EndOfMessage:
In [27]: client.receive()
Out[27]: [Data(data=bytearray(b'\n\n <!-- TITLE SLIDE -->\n <slide type="all">\n <title>Wake up to WonderWidgets!</title>\n </slide>\n\n <!-- OVERVIEW -->\n <slide type="all">\n <title>Overview</title>\n <ite'))]
In [28]: client.receive()