Chunking
Not all bytes are the same. If you are working with TCP or QUIC, you get to work with byte "streams". Byte streams have no hard boundaries in the bytes you receive: you could ask for 1 byte or you could ask for 10000 bytes.
WebSockets and HTTP/2, however, are "frame" based. You send and receive discrete chunks of bytes, and these chunks can be used to delimit individual messages.
Because of these differences, there are often different APIs for working with them.
For instance, the tokio-tungstenite crate for websockets exposes a WebSocketStream, which is a Stream<Item = Result<Message, Error>> + Sink<Message, Error = Error>. Sink is something we haven't looked at yet, but it effectively abstracts a channel sender, where Stream is like the channel receiver.
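To make that concrete, here is a minimal sketch of using both halves of a WebSocketStream: StreamExt::next to receive and SinkExt::send to send. The echo behaviour and the generic bounds are my own assumptions for illustration.

```rust
use futures_util::{SinkExt, StreamExt};
use tokio_tungstenite::{tungstenite, WebSocketStream};

// Receive whole messages via the Stream half and echo them back via the
// Sink half. Note that each item is a complete Message, never a partial one.
async fn echo<S>(mut ws: WebSocketStream<S>) -> tungstenite::Result<()>
where
    S: tokio::io::AsyncRead + tokio::io::AsyncWrite + Unpin,
{
    while let Some(msg) = ws.next().await {
        let msg = msg?;
        if msg.is_text() || msg.is_binary() {
            ws.send(msg).await?;
        }
    }
    Ok(())
}
```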
hyper, and by extension axum, use a custom http_body::Body trait in requests and responses. This is similar to a Stream<Item = Result<Frame<Buf>, Error>>, but is designed specifically with HTTP in mind. There are helpers in the http-body-util crate that allow freely converting between Body and Stream, though.
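As a sketch of those conversions (assuming the http-body-util 0.1 API), BodyStream wraps a Body up as a Stream of frames, and StreamBody wraps a Stream of frames back up as a Body:

```rust
use http_body_util::{BodyStream, StreamBody};

// Body -> Stream: yields Result<Frame<B::Data>, B::Error> items.
fn body_into_stream<B: http_body::Body>(body: B) -> BodyStream<B> {
    BodyStream::new(body)
}

// Stream -> Body: wraps a stream of frames back up as a Body.
fn stream_into_body<S>(stream: S) -> StreamBody<S> {
    StreamBody::new(stream)
}
```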
Each of these only offers a way to receive entire messages, never partial ones. Body::next_frame() always returns a single whole frame, and WebSocketStream::next() only returns a single whole message.
However, tokio::net::TcpStream and quinn::RecvStream/quinn::SendStream all work over the tokio::io::AsyncRead/tokio::io::AsyncWrite traits, which allow partial reading and writing.
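For example, here is a minimal sketch of a partial read over AsyncRead (the buffer size is an arbitrary choice):

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

// read() fills the buffer with however many bytes are currently
// available: anywhere from 1 to buf.len() (0 signals EOF). There are
// no message boundaries; the caller decides how to frame the bytes.
async fn read_some(stream: &mut TcpStream) -> std::io::Result<usize> {
    let mut buf = [0u8; 4096];
    let n = stream.read(&mut buf).await?;
    Ok(n)
}
```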
Excess Buffering
One problem that arises with these different abstractions is that layering them can cause excess buffering.
TCP is based on IP packets. IP packets are framed and have a maximum size depending on the maximum transmission unit (MTU) of your network. When you write to a TCP stream, the data is split into segments that fit into individual IP packets to be sent. These are then buffered by your OS for re-delivery in case those IP packets get lost.
Let's say you use TLS on your connection. TLS sends data as frames, since each encrypted block of bytes needs some extra metadata, like the authentication tag. Because of this, the TLS wrapper needs to buffer data from the TCP stream until it can read a full TLS frame. tokio_rustls::TlsStream exposes this data back as a tokio::io::AsyncRead interface.
On top of this you might have a WebSocketStream. Since websocket messages are framed, if a large websocket message is sent, the websocket stream will need to buffer data from the TLS stream until it can assemble the entire websocket message into one Vec<u8> to give you.
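Putting the layers together, a client-side stack might look like the following sketch (the addresses, the domain, and a pre-built rustls ClientConfig are assumptions for illustration). Each layer adds its own buffering: the OS buffers TCP segments, rustls buffers until it has a whole TLS record, and tungstenite buffers until it has a whole websocket message.

```rust
use std::sync::Arc;
use tokio::net::TcpStream;
use tokio_rustls::{rustls, TlsConnector};

async fn connect(config: Arc<rustls::ClientConfig>) -> anyhow::Result<()> {
    // layer 1: raw TCP byte stream
    let tcp = TcpStream::connect("example.com:443").await?;
    // layer 2: TLS frames, exposed back as an AsyncRead/AsyncWrite byte stream
    let tls = TlsConnector::from(config)
        .connect("example.com".try_into()?, tcp)
        .await?;
    // layer 3: whole websocket messages over the TLS byte stream
    let (_ws, _response) =
        tokio_tungstenite::client_async("wss://example.com/ws", tls).await?;
    Ok(())
}
```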
Problems this causes
This manifested as a problem at Neon for the service I work on. Postgres uses its own TCP/TLS based protocol, but we found that not all environments (edge/serverless) support raw TCP. They did, however, support websockets. We decided to implement a simple middleware client that turns the postgres byte stream into websocket messages; my service then takes those websocket messages and interprets them as bytes.
Postgres is itself a frame-based protocol, so we had an extra layer of buffering. This caused a measurable amount of lag, as messages would need to fill each buffer before being processed. Notably, if a client sent a large query in a single websocket message, then we would need to buffer the entire message before we could start sending it to postgres.
I fixed this by forking and rewriting the websocket server implementation we were using. Fortunately, websockets have a concept of "partial messages", so in the server layer I modified it to split a large message into smaller chunks, which significantly reduced the amount of buffering needed. The sketch below shows the idea.
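This is not the forked crate's actual code, just a minimal sketch of the technique: hand-rolling unmasked (server-to-client) RFC 6455 frame headers so a large payload goes out as a binary frame followed by continuation frames, with the FIN bit set only on the last one. The chunk size is an arbitrary assumption.

```rust
use tokio::io::{AsyncWrite, AsyncWriteExt};

// Relay a large payload as a sequence of websocket partial frames instead
// of buffering it into one message. The first frame uses the binary opcode
// (0x2), the rest use continuation (0x0), and only the final frame sets
// the FIN bit (0x80). Server-to-client frames are unmasked per RFC 6455.
async fn relay_chunked<W: AsyncWrite + Unpin>(
    out: &mut W,
    payload: &[u8],
    chunk_size: usize,
) -> std::io::Result<()> {
    let mut chunks = payload.chunks(chunk_size).peekable();
    let mut opcode = 0x2u8; // binary
    while let Some(chunk) = chunks.next() {
        let fin: u8 = if chunks.peek().is_none() { 0x80 } else { 0x00 };
        out.write_u8(fin | opcode).await?;
        // 7-bit / 16-bit / 64-bit payload length encoding (big-endian)
        match chunk.len() {
            n if n < 126 => out.write_u8(n as u8).await?,
            n if n <= u16::MAX as usize => {
                out.write_u8(126).await?;
                out.write_u16(n as u16).await?;
            }
            n => {
                out.write_u8(127).await?;
                out.write_u64(n as u64).await?;
            }
        }
        out.write_all(chunk).await?;
        opcode = 0x0; // continuation
    }
    out.flush().await?;
    Ok(())
}
```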
An additional optimisation could be sharing the buffer with rustls's upcoming "unbuffered" API. This API no longer has a buffer of its own, and asks the user to provide their own buffer. You could then use this same buffer as the one used for websocket processing.