API documentation
h11 has a fairly small public API, with all public symbols available directly at the top level:
In [1]: import h11
In [2]: h11.<TAB>
h11.CLIENT h11.MUST_CLOSE
h11.CLOSED h11.NEED_DATA
h11.Connection h11.PAUSED
h11.ConnectionClosed h11.PRODUCT_ID
h11.Data h11.ProtocolError
h11.DONE h11.RemoteProtocolError
h11.EndOfMessage h11.Request
h11.ERROR h11.Response
h11.IDLE h11.SEND_BODY
h11.InformationalResponse h11.SEND_RESPONSE
h11.LocalProtocolError h11.SERVER
h11.MIGHT_SWITCH_PROTOCOL h11.SWITCHED_PROTOCOL
These symbols fall into three main categories: event classes, special constants used to track different connection states, and the Connection class itself. We’ll describe them in that order.
Events
Events are the core of h11: the whole point of h11 is to let you think about HTTP transactions as being a series of events sent back and forth between a client and a server, instead of thinking in terms of bytes.
All events behave in essentially similar ways. Let’s take Request as an example. Like all events, this is a “final” class – you cannot subclass it. And like all events, it has several fields. For Request, there are four of them: method, target, headers, and http_version. http_version defaults to b"1.1"; the rest have no default, so to create a Request you have to specify their values:
In [3]: req = h11.Request(method="GET",
...: target="/",
...: headers=[("Host", "example.com")])
...:
Event constructors accept only keyword arguments, not positional arguments.
Events have a useful repr:
In [4]: req
Out[4]: Request(method=b'GET', headers=<Headers([(b'host', b'example.com')])>, target=b'/', http_version=b'1.1')
And their fields are available as regular attributes:
In [5]: req.method
Out[5]: b'GET'
In [6]: req.target
Out[6]: b'/'
In [7]: req.headers
Out[7]: <Headers([(b'host', b'example.com')])>
In [8]: req.http_version
Out[8]: b'1.1'
Notice that these attributes have been normalized to byte-strings. In general, events normalize and validate their fields when they’re constructed. Some of these normalizations and checks are specific to a particular event – for example, Request enforces RFC 7230’s requirement that HTTP/1.1 requests must always contain a "Host" header:
# HTTP/1.0 requests don't require a Host: header
In [9]: h11.Request(method="GET", target="/", headers=[], http_version="1.0")
Out[9]: Request(method=b'GET', headers=<Headers([])>, target=b'/', http_version=b'1.0')
# But HTTP/1.1 requests do
In [10]: h11.Request(method="GET", target="/", headers=[])
---------------------------------------------------------------------------
LocalProtocolError Traceback (most recent call last)
Cell In[10], line 1
----> 1 h11.Request(method="GET", target="/", headers=[])
File ~/checkouts/readthedocs.org/user_builds/h11/envs/latest/lib/python3.8/site-packages/h11/_events.py:117, in Request.__init__(self, method, headers, target, http_version, _parsed)
115 host_count += 1
116 if self.http_version == b"1.1" and host_count == 0:
--> 117 raise LocalProtocolError("Missing mandatory Host: header")
118 if host_count > 1:
119 raise LocalProtocolError("Found multiple Host: headers")
LocalProtocolError: Missing mandatory Host: header
This helps protect you from accidentally violating the protocol, and also helps protect you from remote peers who attempt to violate the protocol.
A few of these normalization rules are standard across multiple events, so we document them here:
headers: In h11, headers are represented internally as a list of (name, value) pairs, where name and value are both byte-strings, name is always lowercase, and name and value are both guaranteed not to have any leading or trailing whitespace. When constructing an event, we accept any iterable of pairs like this, and will automatically convert native strings containing ASCII or bytes-like objects to byte-strings and convert names to lowercase:
In [11]: original_headers = [("HOST", bytearray(b"Example.Com"))]
In [12]: req = h11.Request(method="GET", target="/", headers=original_headers)
In [13]: original_headers
Out[13]: [('HOST', bytearray(b'Example.Com'))]
In [14]: req.headers
Out[14]: <Headers([(b'host', b'Example.Com')])>
If any names are detected with leading or trailing whitespace, then this is an error (“in the past, differences in the handling of such whitespace have led to security vulnerabilities” – RFC 7230). We also check for certain other protocol violations, e.g. it’s always illegal to have a newline inside a header value, and Content-Length: hello is an error because Content-Length should always be an integer. We may add additional checks in the future.
While we make sure to expose header names as lowercased bytes, we also preserve the original header casing. Compliant HTTP agents should always treat header names case-insensitively, but not every agent does, so when sending bytes over the wire we preserve whatever header casing was originally used.
It is possible to access the headers in their raw original casing, which may be useful for some user output or debugging purposes.
In [15]: original_headers = [("Host", "example.com")]
In [16]: req = h11.Request(method="GET", target="/", headers=original_headers)
In [17]: req.headers.raw_items()
Out[17]: [(b'Host', b'example.com')]
It’s not just headers we normalize to being byte-strings: the same type-conversion logic is also applied to the Request.method and Request.target fields, and – for consistency – all http_version fields. In particular, we always represent HTTP version numbers as byte-strings like b"1.1". Bytes-like objects and native strings will be automatically converted to byte strings. Note that the HTTP standard specifically guarantees that all HTTP version numbers will consist of exactly two digits separated by a dot, so comparisons like req.http_version < b"1.1" are safe and valid.
When manually constructing an event, you generally shouldn’t specify http_version, because it defaults to b"1.1", and if you attempt to override this to some other value then Connection.send() will reject your event – h11 only speaks HTTP/1.1. But it does understand other versions of HTTP, so you might receive events with other http_version values from remote peers.
Here’s the complete set of events supported by h11:
class h11.Request(*, method: Union[bytes, str], headers: Union[h11._headers.Headers, List[Tuple[bytes, bytes]], List[Tuple[str, str]]], target: Union[bytes, str], http_version: Union[bytes, str] = b'1.1', _parsed: bool = False)
The beginning of an HTTP request.
Fields:
- method: An HTTP method, e.g. b"GET" or b"POST". Always a byte string. Bytes-like objects and native strings containing only ASCII characters will be automatically converted to byte strings.
- target: The target of an HTTP request, e.g. b"/index.html", or one of the more exotic formats described in RFC 7230, section 5.3. Always a byte string. Bytes-like objects and native strings containing only ASCII characters will be automatically converted to byte strings.
- headers: Request headers, represented as a list of (name, value) pairs. See the header normalization rules for details.
- http_version: The HTTP protocol version, represented as a byte string like b"1.1". See the HTTP version normalization rules for details.
class h11.InformationalResponse(*, headers: Union[h11._headers.Headers, List[Tuple[bytes, bytes]], List[Tuple[str, str]]], status_code: int, http_version: Union[bytes, str] = b'1.1', reason: Union[bytes, str] = b'', _parsed: bool = False)
An HTTP informational response.
Fields:
- status_code: The status code of this response, as an integer. For an InformationalResponse, this is always in the range [100, 200).
- headers: Response headers, represented as a list of (name, value) pairs. See the header normalization rules for details.
- http_version: The HTTP protocol version, represented as a byte string like b"1.1". See the HTTP version normalization rules for details.
- reason: The reason phrase of this response, as a byte string. For example: b"OK", or b"Not Found".
class h11.Response(*, headers: Union[h11._headers.Headers, List[Tuple[bytes, bytes]], List[Tuple[str, str]]], status_code: int, http_version: Union[bytes, str] = b'1.1', reason: Union[bytes, str] = b'', _parsed: bool = False)
The beginning of an HTTP response.
Fields:
- status_code: The status code of this response, as an integer. For a Response, this is always in the range [200, 1000).
- headers: Response headers, represented as a list of (name, value) pairs. See the header normalization rules for details.
- http_version: The HTTP protocol version, represented as a byte string like b"1.1". See the HTTP version normalization rules for details.
- reason: The reason phrase of this response, as a byte string. For example: b"OK", or b"Not Found".
class h11.Data(data: bytes, chunk_start: bool = False, chunk_end: bool = False)
Part of an HTTP message body.
Fields:
- data: A bytes-like object containing part of a message body. Or, if using the combine=False argument to Connection.send(), then any object that your socket writing code knows what to do with, and for which calling len() returns the number of bytes that will be written – see Support for sendfile() for details.
- chunk_start: A marker that indicates whether this data object is from the start of a chunked transfer encoding chunk. This field is ignored when a Data event is provided to Connection.send(): it is only valid on events emitted from Connection.next_event(). You probably shouldn’t use this attribute at all; see Chunked Transfer Encoding Delimiters for details.
- chunk_end: A marker that indicates whether this data object is the last for a given chunked transfer encoding chunk. This field is ignored when a Data event is provided to Connection.send(): it is only valid on events emitted from Connection.next_event(). You probably shouldn’t use this attribute at all; see Chunked Transfer Encoding Delimiters for details.
class h11.EndOfMessage(*, headers: Optional[Union[h11._headers.Headers, List[Tuple[bytes, bytes]], List[Tuple[str, str]]]] = None, _parsed: bool = False)
The end of an HTTP message.
Fields:
- headers: Default value: []. Any trailing headers attached to this message, represented as a list of (name, value) pairs. See the header normalization rules for details. Must be empty unless Transfer-Encoding: chunked is in use.
class h11.ConnectionClosed
This event indicates that the sender has closed their outgoing connection.
Note that this does not necessarily mean that they can’t receive further data, because TCP connections are composed of two one-way channels which can be closed independently. See Closing connections for details.
No fields.
The state machine
Now that you know what the different events are, the next question is: what can you do with them?
A basic HTTP request/response cycle looks like this:
The client sends:
- one Request event with request metadata and headers,
- zero or more Data events with the request body (if any),
- and an EndOfMessage event.
And then the server replies with:
- zero or more InformationalResponse events,
- one Response event,
- zero or more Data events with the response body (if any),
- and an EndOfMessage event.
And once that’s finished, both sides either close the connection, or they go back to the top and re-use it for another request/response cycle.
To coordinate this interaction, the h11 Connection object maintains several state machines: one that tracks what the client is doing, one that tracks what the server is doing, and a few more tiny ones to track whether keep-alive is enabled and whether the client has proposed to switch protocols. h11 always keeps track of all of these state machines, regardless of whether it’s currently playing the client or server role.
The state machines look like this:
If you squint at the first two diagrams, you can see the client’s IDLE -> SEND_BODY -> DONE path and the server’s IDLE -> SEND_RESPONSE -> SEND_BODY -> DONE path, which encode the basic sequence of events we described above. But there’s a fair amount of other stuff going on here as well.
The first thing you should notice is the different colors. These correspond to the different ways that our state machines can change state.
- Dark blue arcs are event-triggered transitions: if we’re in state A, and this event happens, then we switch to state B. For the client machine, these transitions always happen when the client sends an event. For the server machine, most of them involve the server sending an event, except that the server also goes from IDLE -> SEND_RESPONSE when the client sends a Request.
- Green arcs are state-triggered transitions: these are somewhat unusual, and are used to couple together the different state machines – if, at any moment, one machine is in state A and another machine is in state B, then the first machine immediately transitions to state C. For example, if the CLIENT machine is in state DONE, and the SERVER machine is in the CLOSED state, then the CLIENT machine transitions to MUST_CLOSE. And the same thing happens if the CLIENT machine is in the state DONE and the keep-alive machine is in the state disabled.
- There are also two purple arcs labeled start_next_cycle(): these correspond to an explicit method call documented below.
Here’s why we have all the stuff in those diagrams above, beyond what’s needed to handle the basic request/response cycle:
- Server sending a Response directly from IDLE: This is used for error responses, when the client’s request never arrived (e.g. 408 Request Timeout) or was unparseable gibberish (400 Bad Request) and thus didn’t register with our state machine as a real Request.
- The transitions involving MUST_CLOSE and CLOSED: keep-alive and shutdown handling; see Re-using a connection: keep-alive and pipelining and Closing connections.
- The transitions involving MIGHT_SWITCH_PROTOCOL and SWITCHED_PROTOCOL: See Switching protocols.
- That weird ERROR state hanging out all lonely on the bottom: to avoid cluttering the diagram, we don’t draw any arcs coming into this node, but that doesn’t mean it can’t be entered. In fact, it can be entered from any state: if any exception occurs while trying to send/receive data, then the corresponding machine will transition directly to this state. Once there, though, it can never leave – that part of the diagram is accurate. See Error handling.
And finally, note that in these diagrams, all the labels that are in italics are informal English descriptions of things that happen in the code, while the labels in upright text correspond to actual objects in the public API. You’ve already seen the event objects like Request and Response; there is also a set of opaque sentinel values that you can use to track and query the client and server’s states.
Special constants
h11 exposes some special constants corresponding to the different states in the client and server state machines described above. The complete list is:
- h11.IDLE
- h11.SEND_RESPONSE
- h11.SEND_BODY
- h11.DONE
- h11.MUST_CLOSE
- h11.CLOSED
- h11.MIGHT_SWITCH_PROTOCOL
- h11.SWITCHED_PROTOCOL
- h11.ERROR
For example, we can see that initially the client and server start in state IDLE / IDLE:
In [18]: conn = h11.Connection(our_role=h11.CLIENT)
In [19]: conn.states
Out[19]: {CLIENT: IDLE, SERVER: IDLE}
And then if the client sends a Request, then the client switches to state SEND_BODY, while the server switches to state SEND_RESPONSE:
In [20]: conn.send(h11.Request(method="GET", target="/", headers=[("Host", "example.com")]));
In [21]: conn.states
Out[21]: {CLIENT: SEND_BODY, SERVER: SEND_RESPONSE}
And we can test these values directly using constants like SEND_BODY:
In [22]: conn.states[h11.CLIENT] is h11.SEND_BODY
Out[22]: True
This shows how the Connection type tracks these state machines and lets you query their current state.
The above also showed the special constants that can be used to indicate the two different roles that a peer can play in an HTTP connection: h11.CLIENT and h11.SERVER.
And finally, there are also two special constants that can be returned from Connection.next_event(): h11.NEED_DATA and h11.PAUSED.
All of these behave the same, and their behavior is modeled after None: they’re opaque singletons, their __repr__() is their name, and you compare them with is.
Finally, h11’s constants have a quirky feature that can sometimes be useful: they are instances of themselves.
In [23]: type(h11.NEED_DATA) is h11.NEED_DATA
Out[23]: True
In [24]: type(h11.PAUSED) is h11.PAUSED
Out[24]: True
The main application of this is that when handling the return value from Connection.next_event(), which is sometimes an instance of an event class and sometimes NEED_DATA or PAUSED, you can always call type(event) to get something useful to dispatch on, using e.g. a handler table, functools.singledispatch(), or calling getattr(some_object, "handle_" + type(event).__name__). Not that this kind of dispatch-based strategy is always the best approach – but the option is there if you want it.
The Connection object
class h11.Connection(our_role: Type[h11._util.Sentinel], max_incomplete_event_size: int = 16384)
An object encapsulating the state of an HTTP connection.
Parameters:
- our_role – If you’re implementing a client, pass h11.CLIENT. If you’re implementing a server, pass h11.SERVER.
- max_incomplete_event_size (int) – The maximum number of bytes we’re willing to buffer of an incomplete event. In practice this mostly sets a limit on the maximum size of the request/response line + headers. If this is exceeded, then next_event() will raise RemoteProtocolError.
receive_data(data: bytes) → None
Add data to our internal receive buffer.
This does not actually do any processing on the data, just stores it. To trigger processing, you have to call next_event().
Parameters:
- data (bytes-like object) – The new data that was just received.
Special case: If data is an empty byte-string like b"", then this indicates that the remote side has closed the connection (end of file). Normally this is convenient, because standard Python APIs like file.read() or socket.recv() use b"" to indicate end-of-file, while other failures to read are indicated using other mechanisms like raising TimeoutError. When using such an API you can just blindly pass through whatever you get from read to receive_data(), and everything will work.
But, if you have an API where reading an empty string is a valid non-EOF condition, then you need to be aware of this and make sure to check for such strings and avoid passing them to receive_data().
Returns: Nothing, but after calling this you should call next_event() to parse the newly received data.
Raises: RuntimeError – Raised if you pass an empty data, indicating EOF, and then pass a non-empty data, indicating more data that somehow arrived after the EOF. (Calling receive_data(b"") multiple times is fine, and equivalent to calling it once.)
next_event() → Union[h11._events.Event, Type[h11._connection.NEED_DATA], Type[h11._connection.PAUSED]]
Parse the next event out of our receive buffer, update our internal state, and return it.
This is a mutating operation – think of it like calling next() on an iterator.
Returns: One of three things:
- An event object – see Events.
- The special constant NEED_DATA, which indicates that you need to read more data from your socket and pass it to receive_data() before this method will be able to return any more events.
- The special constant PAUSED, which indicates that we are not in a state where we can process incoming data (usually because the peer has finished their part of the current request/response cycle, and you have not yet called start_next_cycle()). See Flow control for details.
Raises: RemoteProtocolError – The peer has misbehaved. You should close the connection (possibly after sending some kind of 4xx response).
Once this method returns ConnectionClosed once, then all subsequent calls will also return ConnectionClosed.
If this method raises any exception besides RemoteProtocolError then that’s a bug – if it happens please file a bug report!
If this method raises any exception then it also sets Connection.their_state to ERROR – see Error handling for discussion.
send(event: h11._events.ConnectionClosed) → None
send(event: Union[h11._events.Request, h11._events.InformationalResponse, h11._events.Response, h11._events.Data, h11._events.EndOfMessage]) → bytes
send(event: h11._events.Event) → Optional[bytes]
Convert a high-level event into bytes that can be sent to the peer, while updating our internal state machine.
Parameters:
- event – The event to send.
Returns: If type(event) is ConnectionClosed, then returns None. Otherwise, returns a bytes-like object.
Raises: LocalProtocolError – Sending this event at this time would violate our understanding of the HTTP/1.1 protocol.
If this method raises any exception then it also sets Connection.our_state to ERROR – see Error handling for discussion.
send_with_data_passthrough(event: h11._events.Event) → Optional[List[bytes]]
Identical to send(), except that in situations where send() returns a single bytes-like object, this instead returns a list of them – and when sending a Data event, this list is guaranteed to contain the exact object you passed in as Data.data. See Support for sendfile() for discussion.
send_failed() → None
Notify the state machine that we failed to send the data it gave us.
This causes Connection.our_state to immediately become ERROR – see Error handling for discussion.
start_next_cycle() → None
Attempt to reset our connection state for a new request/response cycle.
If both client and server are in DONE state, then resets them both to IDLE state in preparation for a new request/response cycle on this same connection. Otherwise, raises a LocalProtocolError.
states
A dictionary like: {CLIENT: <client state>, SERVER: <server state>}
See The state machine for details.
our_state
The current state of whichever role we are playing. See The state machine for details.
their_state
The current state of whichever role we are NOT playing. See The state machine for details.
their_http_version
The version of HTTP that our peer claims to support. None if we haven’t yet received a request/response.
This is preserved by start_next_cycle(), so it can be handy for a client making multiple requests on the same connection: normally you don’t know what version of HTTP the server supports until after you do a request and get a response – so on an initial request you might have to assume the worst. But on later requests on the same connection, the information will be available here.
client_is_waiting_for_100_continue
True if the client sent a request with the Expect: 100-continue header, and is still waiting for a response (i.e., the server has not sent a 100 Continue or any other kind of response, and the client has not gone ahead and started sending the body anyway).
See RFC 7231 section 5.1.1 for details.
they_are_waiting_for_100_continue
True if their_role is CLIENT and client_is_waiting_for_100_continue.
trailing_data
Data that has been received, but not yet processed, represented as a tuple with two elements, where the first is a byte-string containing the unprocessed data itself, and the second is a bool that is True if the receive connection was closed.
See Switching protocols for discussion of why you’d want this.
Error handling
Given the vagaries of networks and the folks on the other side of them, it’s extremely important to be prepared for errors.
Most errors in h11 are signaled by raising one of ProtocolError’s two concrete subclasses, LocalProtocolError and RemoteProtocolError:
exception h11.ProtocolError(msg: str, error_status_hint: int = 400)
Exception indicating a violation of the HTTP/1.1 protocol.
This is an abstract base class, with two concrete subclasses: LocalProtocolError, which indicates that you tried to do something that HTTP/1.1 says is illegal, and RemoteProtocolError, which indicates that the remote peer tried to do something that HTTP/1.1 says is illegal. See Error handling for details.
In addition to the normal Exception features, it has one attribute:
error_status_hint
This gives a suggestion as to what status code a server might use if this error occurred as part of a request.
For a RemoteProtocolError, this is useful as a suggestion for how you might want to respond to a misbehaving peer, if you’re implementing a server.
For a LocalProtocolError, this can be taken as a suggestion for how your peer might have responded to you if h11 had allowed you to continue.
The default is 400 Bad Request, a generic catch-all for protocol violations.
There are four cases where these exceptions might be raised:
- When trying to instantiate an event object (LocalProtocolError): This indicates that something about your event is invalid. Your event wasn’t constructed, but there are no other consequences – feel free to try again.
- When calling Connection.start_next_cycle() (LocalProtocolError): This indicates that the connection is not ready to be re-used, because one or both of the peers are not in the DONE state. The Connection object remains usable, and you can try again later.
- When calling Connection.next_event() (RemoteProtocolError): This indicates that the remote peer has violated our protocol assumptions. This is unrecoverable – we don’t know what they’re doing and we cannot safely proceed. Connection.their_state immediately becomes ERROR, and all further calls to next_event() will also raise RemoteProtocolError. Connection.send() still works as normal, so if you’re implementing a server and this happens then you have an opportunity to send back a 400 Bad Request response. But aside from that, your only real option is to close your socket and make a new connection.
- When calling Connection.send() or Connection.send_with_data_passthrough() (LocalProtocolError): This indicates that you violated our protocol assumptions. This is also unrecoverable – h11 doesn’t know what you’re doing, its internal state may be inconsistent, and we cannot safely proceed. Connection.our_state immediately becomes ERROR, and all further calls to send() will also raise LocalProtocolError. The only thing you can reasonably do at this point is to close your socket and make a new connection.
So that’s how h11 tells you about errors that it detects. In some cases, it’s also useful to be able to tell h11 about an error that you detected. In particular, the Connection object assumes that after you call Connection.send(), you actually send that data to the remote peer. But sometimes, for one reason or another, this doesn’t actually happen.
Here’s a concrete example. Suppose you’re using h11 to implement an HTTP client that keeps a pool of connections so it can re-use them when possible (see Re-using a connection: keep-alive and pipelining). You take a connection from the pool, and start to do a large upload… but then for some reason this gets cancelled (maybe you have a GUI and a user clicked “cancel”). This can cause h11’s model of this connection to diverge from reality: for example, h11 might think that you successfully sent the full request, because you passed an EndOfMessage object to Connection.send(), but in fact you didn’t, because you never sent the resulting bytes. And then – here’s the really tricky part! – if you’re not careful, you might think that it’s OK to put this connection back into the connection pool and re-use it, because h11 is telling you that a full request/response cycle was completed. But this is wrong; in fact you have to close this connection and open a new one.
The solution is simple: call Connection.send_failed(), and now h11 knows that your send failed. In this case, Connection.our_state immediately becomes ERROR, just as if you had tried to do something that violated the protocol.
Message body framing: Content-Length and all that
There are two different headers that HTTP/1.1 uses to indicate a framing mechanism for request/response bodies: Content-Length and Transfer-Encoding. Our general philosophy is that the way you tell h11 what configuration you want to use is by setting the appropriate headers in your request / response, and then h11 will both pass those headers on to the peer and encode the body appropriately.
Currently, the only supported Transfer-Encoding is chunked.
On requests, this means:
- No Content-Length or Transfer-Encoding: no body, equivalent to Content-Length: 0.
- Content-Length: ...: You’re going to send exactly the specified number of bytes. h11 will keep track and signal an error if your EndOfMessage doesn’t happen at the right place.
- Transfer-Encoding: chunked: You’re going to send a variable / not yet known number of bytes.
Note 1: only HTTP/1.1 servers are required to support Transfer-Encoding: chunked, and as a client you have to decide whether to send this header before you get to see what protocol version the server is using.
Note 2: even though HTTP/1.1 servers are required to support Transfer-Encoding: chunked, this doesn’t necessarily mean that they actually do – e.g., applications using Python’s standard WSGI API cannot accept chunked requests.
Nonetheless, this is the only way to send a request where you don’t know the size of the body ahead of time, so if that’s the situation you find yourself in then you might as well try it and hope.
On responses, things are a bit more subtle. There are effectively two cases:
- Content-Length: ...: You’re going to send exactly the specified number of bytes. h11 will keep track and signal an error if your EndOfMessage doesn’t happen at the right place.
- Transfer-Encoding: chunked, or, neither framing header is provided: These two cases are handled differently at the wire level, but as far as the application is concerned they provide (almost) exactly the same semantics: in either case, you’ll send a variable / not yet known number of bytes. The difference between them is that Transfer-Encoding: chunked works better (compatible with keep-alive, allows trailing headers, clearly distinguishes between successful completion and network errors), but requires an HTTP/1.1 client; for HTTP/1.0 clients the only option is the no-headers approach where you have to close the socket to indicate completion.
Since this is (almost) entirely a wire-level-encoding concern, h11 abstracts it: when sending a response you can set either Transfer-Encoding: chunked or leave off both framing headers, and h11 will treat both cases identically: it will automatically pick the best option given the client’s advertised HTTP protocol level.
You need to watch out for this if you’re using trailing headers (i.e., a non-empty headers attribute on EndOfMessage), since trailing headers are only legal if we actually ended up using Transfer-Encoding: chunked. Trying to send a non-empty set of trailing headers to an HTTP/1.0 client will raise a LocalProtocolError. If this use case is important to you, check Connection.their_http_version to confirm that the client speaks HTTP/1.1 before you attempt to send any trailing headers.
Re-using a connection: keep-alive and pipelining
HTTP/1.1 allows a connection to be re-used for multiple request/response cycles (also known as “keep-alive”). This can make things faster by letting us skip the costly connection setup, but it does create some complexities: we have to keep track of whether a connection is reusable, and when there are multiple requests and responses flowing through the same connection we need to be careful not to get confused about which request goes with which response.
h11 considers a connection to be reusable if, and only if, both sides (a) speak HTTP/1.1 (HTTP/1.0 did have some complex and fragile support for keep-alive bolted on, but h11 currently doesn’t support that – possibly this will be added in the future), and (b) neither side has explicitly disabled keep-alive by sending a Connection: close header.
If you plan to make only a single request or response and then close the connection, you should manually set the Connection: close header in your request/response. h11 will notice and update its state appropriately.
There are also some situations where you are required to send a Connection: close header, e.g. if you are a server talking to a client that doesn’t support keep-alive. You don’t need to worry about these cases – h11 will automatically add this header when necessary. Just worry about setting it when it’s actually something that you’re actively choosing.
If you want to re-use a connection, you have to wait until both the request and the response have been completed, bringing both the client and server to the DONE state. Once this has happened, you can explicitly call Connection.start_next_cycle() to reset both sides back to the IDLE state. This makes sure that the client and server remain synched up.
If keep-alive is disabled for whatever reason – someone set Connection: close, lack of protocol support, one of the sides just unilaterally closed the connection – then the state machines will skip past the DONE state directly to the MUST_CLOSE or CLOSED states. In this case, trying to call start_next_cycle() will raise an error, and the only thing you can legally do is to close this connection and make a new one.
HTTP/1.1 also allows for a more aggressive form of connection re-use, in which a client sends multiple requests in quick succession, and then waits for the responses to stream back in order (“pipelining”). This is generally considered to have been a bad idea, because it makes things like error recovery very complicated.
As a client, h11 does not support pipelining. This is enforced by the
structure of the state machine: after sending one Request
,
you can’t send another until after calling
start_next_cycle()
, and you can’t call
start_next_cycle()
until the server has entered the
DONE
state, which requires reading the server’s full
response.
As a server, h11 provides the minimal support for pipelining required
to comply with the HTTP/1.1 standard: if the client sends multiple
pipelined requests, then we handle the first request until we reach the
DONE
state, and then next_event()
will
pause and refuse to parse any more events until the response is
completed and start_next_cycle()
is called. See the
next section for more details.
Flow control¶
Presumably you know when you want to send things, and the
send()
interface is very simple: it just immediately
returns all the data you need to send for the given event, so you can
apply whatever send buffer strategy you want. But reading from the
remote peer is a bit trickier: you don’t want to read data from the
remote peer if it can’t be processed (i.e., you want to apply
backpressure and avoid building arbitrarily large in-memory buffers),
and you definitely don’t want to block waiting on data from the remote
peer at the same time that it’s blocked waiting for you, because that
will cause a deadlock.
One complication here is that if you’re implementing a server, you
have to be prepared to handle Request
s that have an
Expect: 100-continue
header. You can read the spec for the full
details, but basically what this header means is that after sending
the Request
, the client plans to pause and wait until they
see some response from the server before they send that request’s
Data
. The server’s response would normally be an
InformationalResponse
with status 100 Continue
, but it
could be anything really (e.g. a full Response
with a 4xx
status code). The crucial thing as a server, though, is that you
should never block trying to read a request body if the client is
blocked waiting for you to tell them to send the request body.
Fortunately, h11 makes this easy, because it tracks whether the client
is in the waiting-for-100-continue state, and exposes this as
Connection.they_are_waiting_for_100_continue
. So you don’t
have to pay attention to the Expect
header yourself; you just have
to make sure that before you block waiting to read a request body, you
execute some code like:
if conn.they_are_waiting_for_100_continue:
    do_send(conn, h11.InformationalResponse(status_code=100, headers=[...]))
do_read(...)
In fact, if you’re lazy (and what programmer isn’t?) then you can just do this check before all reads – it’s mandatory before blocking to read a request body, but it’s safe at any time.
And the other thing you want to pay attention to is the special values
that next_event()
might return: NEED_DATA
and PAUSED
.
NEED_DATA
is what it sounds like: it means that
next_event()
is guaranteed not to return any more
real events until you’ve called receive_data()
at
least once.
PAUSED
is a little more subtle: it means that
next_event()
is guaranteed not to return any more
real events until something else has happened to clear up the paused
state. There are three cases where this can happen:
- We received a full request/response from the remote peer, and then we received some more data after that. (The main situation where this might happen is a server responding to a pipelining client.) The PAUSED state will go away after you call start_next_cycle().
- A successful CONNECT or Upgrade: request has caused the connection to switch to some other protocol (see Switching protocols). This PAUSED state is permanent; you should abandon this Connection and go do whatever it is you’re going to do with your new protocol.
- We’re a server, and the client we’re talking to proposed to switch protocols (see Switching protocols), and now is waiting to find out whether their request was successful or not. Once we either accept or deny their request then this will turn into one of the above two states, so you probably don’t need to worry about handling it specially.
Putting all this together –
If your I/O is organized around a “pull” strategy, where your code
requests events as it’s ready to handle them (e.g. classic synchronous
code, or asyncio’s await loop.sock_recv(...)
, or Trio’s streams),
then you’ll probably want logic that looks something like:
# Replace do_sendall and do_recv with your I/O code
def get_next_event():
    while True:
        event = conn.next_event()
        if event is h11.NEED_DATA:
            if conn.they_are_waiting_for_100_continue:
                do_sendall(conn, h11.InformationalResponse(status_code=100, headers=[...]))
            conn.receive_data(do_recv())
            continue
        return event
And then your code that calls this will need to make sure to call it
only at appropriate times (e.g., not immediately after receiving
EndOfMessage
or PAUSED
).
If your I/O is organized around a “push” strategy, where the network
drives processing (e.g. you’re using Twisted, or implementing an
asyncio.Protocol
), then you’ll want to internally apply
back-pressure whenever you see PAUSED
, remove back-pressure
when you call start_next_cycle()
, and otherwise just
deliver events as they arrive. Something like:
class HTTPProtocol(asyncio.Protocol):
    # Save the transport for later -- needed to access the
    # backpressure API.
    # (self.conn is assumed to be an h11.Connection created elsewhere.)
    def connection_made(self, transport):
        self._transport = transport

    # Internal helper function -- deliver all pending events
    def _deliver_events(self):
        while True:
            event = self.conn.next_event()
            if event is h11.NEED_DATA:
                break
            elif event is h11.PAUSED:
                # Apply back-pressure
                self._transport.pause_reading()
                break
            else:
                self.event_received(event)

    # Called by "someone" whenever new data appears on our socket
    def data_received(self, data):
        self.conn.receive_data(data)
        self._deliver_events()

    # Called by "someone" whenever the peer closes their socket
    def eof_received(self):
        self.conn.receive_data(b"")
        self._deliver_events()
        # asyncio will close our socket unless we return True here.
        return True

    # Called by your code when it's ready to start a new
    # request/response cycle
    def start_next_cycle(self):
        self.conn.start_next_cycle()
        # New events might have been buffered internally, and only
        # become deliverable after calling start_next_cycle
        self._deliver_events()
        # Remove back-pressure
        self._transport.resume_reading()

    # Fill in your code here
    def event_received(self, event):
        ...
And your code that uses this will have to remember to check for
they_are_waiting_for_100_continue
at the
appropriate time.
Closing connections¶
h11 represents a connection shutdown with the special event type
ConnectionClosed
. You can send this event, in which case
send()
will simply update the state machine and
then return None
. You can receive this event, if you call
conn.receive_data(b"")
. (The actual receipt might be delayed if
the connection is paused.) It’s safe and legal
to call conn.receive_data(b"")
multiple times, and once you’ve
done so, all future calls to
next_event()
will also return
ConnectionClosed()
:
In [25]: conn = h11.Connection(our_role=h11.CLIENT)
In [26]: conn.receive_data(b"")
In [27]: conn.receive_data(b"")
In [28]: conn.next_event()
Out[28]: ConnectionClosed()
(Or if you try to actually pass new data in after calling
conn.receive_data(b"")
, that will raise an exception.)
h11 is careful about interpreting connection closure in a half-duplex
fashion. TCP sockets pretend to be a two-way connection, but really
they’re two one-way connections. In particular, it’s possible for one
party to shut down their sending connection – which causes the other
side to be notified that the connection has closed via the usual
socket.recv(...) -> b""
mechanism – while still being able to
read from their receiving connection. (On Unix, this is generally
accomplished via the shutdown(2)
system call.) So, for example, a
client could send a request, and then close their socket for writing
to indicate that they won’t be sending any more requests, and then
read the response. It’s this kind of closure that is indicated by
h11’s ConnectionClosed
: it means that this party will not be
sending any more data – nothing more, nothing less. You can see this
reflected in the state machine, in which one
party transitioning to CLOSED
doesn’t immediately halt the
connection, but merely prevents it from continuing for another
request/response cycle.
The state machine also indicates that ConnectionClosed
events
can only happen in certain states. This isn’t true, of course – any
party can close their connection at any time, and h11 can’t stop
them. But what h11 can do is distinguish between clean and unclean
closes. For example, if both sides complete a request/response cycle
and then close the connection, that’s a clean closure and everyone
will transition to the CLOSED
state in an orderly fashion. On
the other hand, if one party suddenly closes the connection while
they’re in the middle of sending a chunked response body, or when they
promised a Content-Length:
of 1000 bytes but have only sent 500,
then h11 knows that this is a violation of the HTTP protocol, and will
raise a RemoteProtocolError
(a subclass of ProtocolError). Basically h11 treats an unexpected
close the same way it would treat unexpected, uninterpretable data
arriving – it lets you know that something has gone wrong.
As a client, the proper way to perform a single request and then close the connection is:
1. Send a Request with Connection: close
2. Send the rest of the request body
3. Read the server’s Response and body
4. conn.our_state is h11.MUST_CLOSE will now be true. Call conn.send(ConnectionClosed()) and then close the socket. Or really you could just close the socket: the only thing calling send will do is raise an error if you’re not in MUST_CLOSE as expected. So it’s between you and your conscience and your code reviewers.
(Technically it would also be legal to shutdown your socket for writing as step 2.5, but this doesn’t serve any purpose and some buggy servers might get annoyed, so it’s not recommended.)
As a server, the proper way to perform a response is:
1. Send your Response and body
2. Check if conn.our_state is h11.MUST_CLOSE. This might happen for a variety of reasons; for example, if the response had unknown length and the client speaks only HTTP/1.0, then the client will not consider the connection complete until we issue a close.
You should be particularly careful to take into consideration the following note from RFC 7230 section 6.6:
If a server performs an immediate close of a TCP connection, there is a significant risk that the client will not be able to read the last HTTP response. If the server receives additional data from the client on a fully closed connection, such as another request that was sent by the client before receiving the server’s response, the server’s TCP stack will send a reset packet to the client; unfortunately, the reset packet might erase the client’s unacknowledged input buffers before they can be read and interpreted by the client’s HTTP parser.
To avoid the TCP reset problem, servers typically close a connection in stages. First, the server performs a half-close by closing only the write side of the read/write connection. The server then continues to read from the connection until it receives a corresponding close by the client, or until the server is reasonably certain that its own TCP stack has received the client’s acknowledgement of the packet(s) containing the server’s last response. Finally, the server fully closes the connection.
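One way to follow that advice with plain blocking sockets is sketched below. It is independent of h11, and the function name, the timeout, and the drain limit are arbitrary choices made for this example rather than recommendations from the RFC.

```python
import socket

def close_in_stages(sock, timeout=5.0, max_drain=65536):
    """Half-close `sock`, drain what the peer still sends, then fully close.

    Returns the number of bytes that were read and discarded.
    """
    # Stage 1: tell the peer that we are done sending.
    sock.shutdown(socket.SHUT_WR)
    # Stage 2: keep reading until the peer closes too, or we give up.
    sock.settimeout(timeout)
    drained = 0
    try:
        while drained < max_drain:
            chunk = sock.recv(4096)
            if not chunk:
                break  # the peer closed its side as well
            drained += len(chunk)
    except OSError:
        pass  # timeout or reset: nothing more we can usefully do
    finally:
        # Stage 3: release the socket.
        sock.close()
    return drained
```

The timeout and the byte limit are there so that a peer that never closes, or never stops sending, cannot keep the server-side socket alive forever.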
Switching protocols¶
h11 supports two kinds of “protocol switches”: requests with method
CONNECT
, and the newer Upgrade:
header, most commonly used for
negotiating WebSocket connections. Both follow the same pattern: the
client proposes that they switch from regular HTTP to some other kind
of interaction, and then the server either rejects the suggestion –
in which case we return to regular HTTP rules – or else accepts
it. (For CONNECT
, acceptance means a response with 2xx status
code; for Upgrade:
, acceptance means an
InformationalResponse
with status 101 Switching
Protocols
.) If the proposal is accepted, then both sides switch to
doing something else with their socket, and h11’s job is done.
As a developer using h11, it’s your responsibility to send and
interpret the actual CONNECT
or Upgrade:
request and response,
and to figure out what to do after the handover; it’s h11’s job to
understand what’s going on, and help you make the handover
smoothly.
Specifically, what h11 does is pause parsing
incoming data at the boundary between the two protocols, and then you
can retrieve any unprocessed data from the
Connection.trailing_data
attribute.
Support for sendfile()¶
Many networking APIs provide some efficient way to send particular data, e.g. asking the operating system to stream files directly off of the disk and into a socket without passing through userspace.
It’s possible to use these APIs together with h11. The basic strategy is:
1. Create some placeholder object representing the special data, that your networking code knows how to “send” by invoking whatever the appropriate underlying APIs are.
2. Make sure your placeholder object implements a __len__ method returning its size in bytes.
3. Call conn.send_with_data_passthrough(Data(data=<your placeholder object>))
4. This returns a list whose contents are a mixture of (a) bytes-like objects, and (b) your placeholder object. You should send them to the network in order.
Here’s a sketch of what this might look like:
class FilePlaceholder:
    def __init__(self, file, offset, count):
        self.file = file
        self.offset = offset
        self.count = count

    def __len__(self):
        return self.count

def send_data(sock, data):
    if isinstance(data, FilePlaceholder):
        # socket.sendfile added in Python 3.5
        sock.sendfile(data.file, data.offset, data.count)
    else:
        # data is a bytes-like object to be sent directly
        sock.sendall(data)

placeholder = FilePlaceholder(open("...", "rb"), 0, 200)
for data in conn.send_with_data_passthrough(Data(data=placeholder)):
    send_data(sock, data)
This works with all the different framing modes (Content-Length
,
Transfer-Encoding: chunked
, etc.) – h11 will add any necessary
framing data, update its internal state, and away you go.
Identifying h11 in requests and responses¶
According to RFC 7231, client requests are supposed to include a
User-Agent:
header identifying what software they’re using, and
servers are supposed to respond with a Server:
header doing the
same. h11 doesn’t construct these headers for you, but to make it
easier for you to construct this header, it provides:
h11.PRODUCT_ID¶
A string suitable for identifying the current version of h11 in a User-Agent: or Server: header.
The version of h11 that was used to build these docs identified itself as:
In [29]: h11.PRODUCT_ID
Out[29]: 'python-h11/0.14.0+dev'
Chunked Transfer Encoding Delimiters¶
New in version 0.7.0.
HTTP/1.1 allows for the use of Chunked Transfer Encoding to frame request and response bodies. This form of transfer encoding allows the implementation to provide its body data in the form of length-prefixed “chunks” of data.
RFC 7230 is extremely clear that the breaking points between chunks of data are non-semantic: that is, users should not rely on them or assign any meaning to them. This is particularly important given that RFC 7230 also allows intermediaries such as proxies and caches to change the chunk boundaries as they see fit, or even to remove the chunked transfer encoding entirely.
However, for some applications it is valuable or essential to see the chunk
boundaries because the peer implementation has assigned meaning to them. While
this is against the specification, if you do really need access to this
information h11 makes it available to you in the form of the
Data.chunk_start
and Data.chunk_end
properties of the
Data
event.
Data.chunk_start
is set to True
for the first Data
event
for a given chunk of data. Data.chunk_end
is set to True
for the
last Data
event that is emitted for a given chunk of data. h11
guarantees that it will always emit at least one Data
event for each
chunk of data received from the remote peer, but due to its internal buffering
logic it may return more than one. It is possible for a single Data
event to have both Data.chunk_start
and Data.chunk_end
set to
True
, in which case it will be the only Data
event for that chunk
of data.
Again, it is strongly encouraged that you avoid relying on this information if at all possible. This functionality should be considered an escape hatch for when there is no alternative but to rely on the information, rather than a general source of data that is worth relying on.