Zig Readers and Writers
Zig 0.15 introduces a new Reader and Writer interface where "the buffer is in the interface". While there are a lot of examples of what this means, none of them quite struck me as clear. To learn more, I wanted to write an implementation of PackBits, which is a simple run-length compression scheme.
packbits
PackBits uses a simple ruleset for encoding:
| Header Byte | Data |
|---|---|
| 0..127 | 1 + n literal bytes of data |
| 129..255 | one byte repeated 257 - n times (1 - n when the header is read as a signed byte) |
| 128 | skip (no-op) |
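To make the table concrete, here is a minimal decoder sketch over plain slices. The name `unpackBits` and its error set are hypothetical, made up for illustration, and the repeat count is written as `257 - header`, which is the unsigned equivalent of `1 - n` for a signed header byte.

```zig
const std = @import("std");

/// Hypothetical helper: decodes PackBits data from `src` into `dst` and
/// returns the number of bytes written to `dst`.
fn unpackBits(src: []const u8, dst: []u8) error{ Truncated, NoSpace }!usize {
    var i: usize = 0; // read position in src
    var out: usize = 0; // write position in dst
    while (i < src.len) {
        const header = src[i];
        i += 1;
        if (header == 128) continue; // skip
        if (header < 128) {
            // 0..127: copy 1 + n literal bytes.
            const n: usize = @as(usize, header) + 1;
            if (src.len - i < n) return error.Truncated;
            if (dst.len - out < n) return error.NoSpace;
            @memcpy(dst[out..][0..n], src[i..][0..n]);
            i += n;
            out += n;
        } else {
            // 129..255: repeat the next byte 257 - header times.
            const n: usize = 257 - @as(usize, header);
            if (i >= src.len) return error.Truncated;
            if (dst.len - out < n) return error.NoSpace;
            @memset(dst[out..][0..n], src[i]);
            i += 1;
            out += n;
        }
    }
    return out;
}
```

For example, the input `FE AA 02 80 00 2A` decodes to `AA AA AA 80 00 2A`: a run of three `AA` bytes followed by three literals.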
One thing to note is that a single header byte covers at most 128 bytes of data (header 127 means 1 + 127 literal bytes), so we should ensure that whatever buffer we're using holds at least 128 bytes. The buffer is what we will scan to compute our run lengths. In this simple version, we just want to get something working; we can figure out how to make it more flexible later.
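As a rough sketch of how run lengths could be computed over such a buffer, the encoder below works on plain slices. The names `packBits` and `max_run` are hypothetical, and the cap is derived from the table's header ranges; this is one simple greedy strategy, not necessarily the one used later in this post.

```zig
const std = @import("std");

/// Largest count one header can express per the table:
/// header 127 -> 1 + 127 literals, header 129 -> 257 - 129 repeats.
const max_run: usize = 128;

/// Hypothetical helper: PackBits-encodes `src` into `dst` and returns the
/// number of bytes written to `dst`.
fn packBits(src: []const u8, dst: []u8) error{NoSpace}!usize {
    var i: usize = 0;
    var out: usize = 0;
    while (i < src.len) {
        // Measure the run of identical bytes starting at i, capped at max_run.
        var run: usize = 1;
        while (i + run < src.len and run < max_run and src[i + run] == src[i]) run += 1;
        if (run >= 2) {
            if (dst.len - out < 2) return error.NoSpace;
            dst[out] = @intCast(257 - run); // headers 129..255
            dst[out + 1] = src[i];
            out += 2;
            i += run;
            continue;
        }
        // Otherwise gather literals until a repeat begins or the cap is hit.
        var lit: usize = 1;
        while (i + lit < src.len and lit < max_run) : (lit += 1) {
            if (i + lit + 1 < src.len and src[i + lit] == src[i + lit + 1]) break;
        }
        if (dst.len - out < lit + 1) return error.NoSpace;
        dst[out] = @intCast(lit - 1); // headers 0..127
        @memcpy(dst[out + 1 ..][0..lit], src[i..][0..lit]);
        out += lit + 1;
        i += lit;
    }
    return out;
}
```

Because a run is cut off at `max_run`, a longer stretch of identical bytes is simply emitted as several repeat packets.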
Writer
A `std.Io.Writer` is a structure in Zig that consists of a `buffer`, an `end`
index into said buffer which marks where the buffered data stops, and a `vtable`
which contains functions providing the implementation details. Only one
function is required, which is `drain`:
```zig
// drain function
pub const VTable = struct {
    /// Sends bytes to the logical sink. A write will only be sent here if it
    /// could not fit into `buffer`, or during a `flush` operation.
    ///
    /// `buffer[0..end]` is consumed first, followed by each slice of `data` in
    /// order. Elements of `data` may alias each other but may not alias
    /// `buffer`.
    ///
    /// This function modifies `Writer.end` and `Writer.buffer` in an
    /// implementation-defined manner.
    ///
    /// `data.len` must be nonzero.
    ///
    /// The last element of `data` is repeated as necessary so that it is
    /// written `splat` number of times, which may be zero.
    ///
    /// This function may not be called if the data to be written could have
    /// been stored in `buffer` instead, including when the amount of data to
    /// be written is zero and the buffer capacity is zero.
    ///
    /// Number of bytes consumed from `data` is returned, excluding bytes from
    /// `buffer`.
    ///
    /// Number of bytes returned may be zero, which does not indicate stream
    /// end. A subsequent call may return nonzero, or signal end of stream via
    /// `error.WriteFailed`.
    drain: *const fn (w: *Writer, data: []const []const u8, splat: usize) Error!usize,
    // ...other functions that we don't need
};
```
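To see the `buffer` and `end` fields in action before touching `drain`, here is a small sketch using the standard library's fixed-buffer writer. The function name `demo` is hypothetical; `std.Io.Writer.fixed`, `writeAll`, and `buffered` are used as I understand them from Zig 0.15.

```zig
const std = @import("std");

/// Hypothetical demo: writes two short strings through a fixed-buffer
/// Writer and returns how many bytes ended up buffered.
fn demo(storage: []u8) !usize {
    var w: std.Io.Writer = .fixed(storage);
    try w.writeAll("pack");
    try w.writeAll("bits");
    // Both writes fit, so they only advanced `end`; `buffered()` is
    // the slice `buffer[0..end]`.
    std.debug.assert(std.mem.eql(u8, w.buffered(), "packbits"));
    return w.end;
}
```

Nothing here reaches a sink: as long as a write fits in the remaining buffer space, the interface handles it without calling into the vtable.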
`drain` is a heavily loaded function. We are expected to handle the buffer, the slices
of `data`, and the repetition requested by `splat`. I think this documentation is unclear about
what's required versus what is available for optimization. The last two paragraphs mention
the most important part in my view: the return value counts only bytes consumed from `data`,
and it may be zero. So a writer may (at its discretion) only
flush the contents of the buffer. This means we can forget about `data` and `splat`,
and the higher-level methods of `Writer` will respond accordingly.
So in practice, we should:
- Consume all of the data in `buffer[0..end]`
- Move `end` to `0` to indicate that we cleared the buffer
- Return 0 to indicate that we didn't consume `data`.
It's not required to consume the entirety of buffer, but it makes it simpler.
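The three steps above can be sketched as a tiny writer that copies whatever is buffered into a fixed array. `SinkWriter`, `sink`, and `sink_len` are hypothetical names for illustration; the `interface` field and the `@fieldParentPtr` pattern follow what I understand to be the standard library's convention. The sketch assumes no single write is larger than the buffer, since this `drain` never consumes `data` directly.

```zig
const std = @import("std");
const Writer = std.Io.Writer;

/// Hypothetical writer whose sink is a fixed array.
const SinkWriter = struct {
    sink: [256]u8 = undefined,
    sink_len: usize = 0,
    interface: Writer,

    fn init(buffer: []u8) SinkWriter {
        return .{ .interface = .{
            .vtable = &.{ .drain = drain },
            .buffer = buffer,
        } };
    }

    fn drain(w: *Writer, data: []const []const u8, splat: usize) Writer.Error!usize {
        // We only flush the buffer, so `data` and `splat` are ignored.
        _ = data;
        _ = splat;
        const self: *SinkWriter = @alignCast(@fieldParentPtr("interface", w));
        // 1. Consume everything in buffer[0..end].
        const pending = w.buffer[0..w.end];
        if (self.sink.len - self.sink_len < pending.len) return error.WriteFailed;
        @memcpy(self.sink[self.sink_len..][0..pending.len], pending);
        self.sink_len += pending.len;
        // 2. Mark the buffer as empty.
        w.end = 0;
        // 3. Report that nothing from `data` was consumed.
        return 0;
    }
};
```

With an 8-byte buffer, writing `"hello"` stays in the buffer, a following `" world"` does not fit and triggers `drain`, and a final `flush` pushes the remainder to `sink`.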