2019-02-22 21:58:56

by Anna Schumaker

Subject: [PATCH 0/2] NFSD: Add support for the v4.2 READ_PLUS operation

These patches add server support for the READ_PLUS operation. This
operation is meant to improve file reading performance when working with
sparse files, but there are some issues around the use of vfs_llseek() to
identify hole and data segments when encoding the reply. I've done a
bunch of testing on virtual machines, and I found that READ_PLUS
performs best if:

1) The file being read is not yet in the server's page cache.
2) The read request begins with a hole segment; and
3) The server only performs one llseek() call during encoding (see the
lseek() sketch below).
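
For reference, the hole/data detection the server does with vfs_llseek()
follows the same SEEK_DATA/SEEK_HOLE semantics as userspace lseek(). Here
is a minimal sketch of those semantics (an illustration only, not code
from this series):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

/* Walk a file and print its data segments the way SEEK_DATA/SEEK_HOLE
 * report them; everything in between is a hole.  Illustrative only. */
int main(int argc, char **argv)
{
	struct stat st;
	off_t pos = 0, data, hole;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;
	while (pos < st.st_size) {
		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)	/* ENXIO: only a hole remains before EOF */
			break;
		hole = lseek(fd, data, SEEK_HOLE);
		printf("data: %lld-%lld\n", (long long)data, (long long)hole);
		pos = hole;
	}
	close(fd);
	return 0;
}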

I've added a "noreadplus" mount option on the client side to allow users
to disable the new operation if it becomes a problem, similar to the
"nordirplus" mount option that we already have.

Here are the results of my performance tests, separated by the underlying
filesystem and by whether the file is already in the server's cache. The
NFS v4.2 column is for the standard READ operation, and v4.2+ is with
READ_PLUS. In addition to the 100% data and 100% hole cases, I also
tested files that alternate between data and hole chunks: two files for
each chunk size, one beginning with a data segment and one beginning with
a hole. I used the `vmtouch` utility to load and clear the file from the
server's cache, and I used the following `dd` command on the client for
reading back the file:

$ dd if=$src of=/dev/null bs=$rsize_from_mount 2>&1
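
For anyone who wants to reproduce the file layout, an alternating
data/hole file can be generated along these lines (an illustrative sketch
only; the file name, chunk size, and chunk count are made-up parameters
rather than my exact test setup):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a sparse file that alternates chunk-sized data segments with
 * chunk-sized holes, beginning with data.  Invert the parity test for
 * the hole-first variant.  Illustrative only. */
int main(void)
{
	const off_t chunk = 4096;	/* e.g. the "Sparse 4K" case */
	const int nchunks = 1024;
	char *buf = malloc(chunk);
	off_t off = 0;
	int fd, i;

	fd = open("sparse-4k", O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0 || !buf)
		return 1;
	memset(buf, 0xaa, chunk);
	for (i = 0; i < nchunks; i++, off += chunk) {
		if (i % 2 == 0)		/* even chunks hold data... */
			pwrite(fd, buf, chunk, off);
		/* ...odd chunks are never written, so they stay holes */
	}
	ftruncate(fd, off);	/* extend i_size to cover the final hole */
	close(fd);
	return 0;
}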


xfs (uncached)          |    NFS v3  NFS v4.0  NFS v4.1  NFS v4.2 NFS v4.2+
------------------------+--------------------------------------------------
Whole File (data)       |    3.228s    3.361s    3.679s    3.382s    3.483s
Whole File (hole)       |    1.276s    1.086s    1.143s    1.066s    0.805s
Sparse 4K (data)        |    3.473s    3.953s    3.740s    3.535s    3.515s
Sparse 4K (hole)        |    3.373s    3.192s    3.120s    3.113s    2.709s
Sparse 8K (data)        |    3.782s    3.527s    3.589s    3.476s    3.494s
Sparse 8K (hole)        |    3.161s    3.328s    2.974s    2.889s    2.863s
Sparse 16K (data)       |    3.804s    3.945s    3.885s    3.507s    3.569s
Sparse 16K (hole)       |    2.961s    3.124s    3.413s    3.136s    2.712s
Sparse 32K (data)       |    2.891s    3.632s    3.833s    3.643s    3.485s
Sparse 32K (hole)       |    2.592s    2.216s    2.545s    2.665s    2.829s

xfs (cached)            |    NFS v3  NFS v4.0  NFS v4.1  NFS v4.2 NFS v4.2+
------------------------+--------------------------------------------------
Whole File (data)       |    0.939s    0.943s    0.939s    0.942s    1.153s
Whole File (hole)       |    0.982s    1.007s    0.991s    0.946s    0.826s
Sparse 4K (data)        |    0.980s    0.999s    0.961s    0.996s    1.166s
Sparse 4K (hole)        |    1.001s    0.972s    0.997s    1.001s    1.201s
Sparse 8K (data)        |    1.272s    1.053s    0.999s    0.974s    1.200s
Sparse 8K (hole)        |    0.965s    1.004s    1.036s    1.006s    1.248s
Sparse 16K (data)       |    0.995s    0.993s    1.035s    1.054s    1.210s
Sparse 16K (hole)       |    0.966s    0.982s    1.091s    1.038s    1.214s
Sparse 32K (data)       |    1.054s    0.968s    1.045s    0.990s    1.203s
Sparse 32K (hole)       |    1.019s    0.960s    1.001s    0.983s    1.254s

ext4 (uncached)         |    NFS v3  NFS v4.0  NFS v4.1  NFS v4.2 NFS v4.2+
------------------------+--------------------------------------------------
Whole File (data)       |    6.089s    6.104s    6.489s    6.342s    6.137s
Whole File (hole)       |    2.603s    2.258s    2.226s    2.315s    1.715s
Sparse 4K (data)        |    7.063s    7.372s    7.064s    7.149s    7.459s
Sparse 4K (hole)        |    7.231s    6.709s    6.495s    6.880s    6.138s
Sparse 8K (data)        |    6.576s    6.938s    6.386s    6.086s    6.154s
Sparse 8K (hole)        |    5.903s    6.089s    5.555s    5.578s    5.442s
Sparse 16K (data)       |    6.556s    6.257s    6.135s    5.588s    5.856s
Sparse 16K (hole)       |    5.504s    5.290s    5.545s    5.195s    4.983s
Sparse 32K (data)       |    5.047s    5.490s    5.734s    5.578s    5.378s
Sparse 32K (hole)       |    4.232s    3.860s    4.299s    4.466s    4.633s

ext4 (cached)           |    NFS v3  NFS v4.0  NFS v4.1  NFS v4.2 NFS v4.2+
------------------------+--------------------------------------------------
Whole File (data)       |    1.873s    1.881s    1.869s    1.890s    2.344s
Whole File (hole)       |    1.929s    2.009s    1.963s    1.917s    1.554s
Sparse 4K (data)        |    1.961s    1.974s    1.957s    1.986s    2.408s
Sparse 4K (hole)        |    2.056s    2.025s    1.977s    1.988s    2.458s
Sparse 8K (data)        |    2.297s    2.038s    2.008s    1.954s    2.437s
Sparse 8K (hole)        |    1.939s    2.011s    2.024s    2.015s    2.509s
Sparse 16K (data)       |    1.907s    1.973s    2.053s    2.070s    2.411s
Sparse 16K (hole)       |    1.940s    1.964s    2.075s    1.996s    2.422s
Sparse 32K (data)       |    2.045s    1.921s    2.021s    2.013s    2.388s
Sparse 32K (hole)       |    1.984s    1.944s    1.997s    1.974s    2.398s

btrfs (uncached)        |    NFS v3  NFS v4.0  NFS v4.1  NFS v4.2 NFS v4.2+
------------------------+--------------------------------------------------
Whole File (data)       |    9.369s    9.438s    9.837s    9.840s   11.790s
Whole File (hole)       |    4.052s    3.390s    3.380s    3.619s    2.519s
Sparse 4K (data)        |    9.738s   10.110s    9.774s    9.819s   12.471s
Sparse 4K (hole)        |    9.907s    9.504s    9.241s    9.610s    9.054s
Sparse 8K (data)        |    9.132s    9.453s    8.954s    8.660s   10.555s
Sparse 8K (hole)        |    8.290s    8.489s    8.305s    8.332s    7.850s
Sparse 16K (data)       |    8.742s    8.507s    8.667s    8.002s    9.940s
Sparse 16K (hole)       |    7.635s    7.604s    7.967s    7.558s    7.062s
Sparse 32K (data)       |    7.279s    7.670s    8.006s    7.705s    9.219s
Sparse 32K (hole)       |    6.200s    5.713s    6.268s    6.464s    6.486s

btrfs (cached)          |    NFS v3  NFS v4.0  NFS v4.1  NFS v4.2 NFS v4.2+
------------------------+--------------------------------------------------
Whole File (data)       |    2.770s    2.814s    2.841s    2.854s    3.492s
Whole File (hole)       |    2.871s    2.970s    3.001s    2.929s    2.372s
Sparse 4K (data)        |    2.945s    2.905s    2.930s    2.951s    3.663s
Sparse 4K (hole)        |    3.032s    3.057s    2.962s    3.050s    3.705s
Sparse 8K (data)        |    3.277s    3.069s    3.127s    3.034s    3.652s
Sparse 8K (hole)        |    2.866s    2.959s    3.078s    2.989s    3.762s
Sparse 16K (data)       |    2.916s    2.923s    3.060s    3.081s    3.631s
Sparse 16K (hole)       |    2.948s    2.969s    3.108s    2.990s    3.623s
Sparse 32K (data)       |    3.044s    2.881s    3.052s    2.962s    3.585s
Sparse 32K (hole)       |    2.954s    2.957s    3.018s    2.951s    3.639s


I also have performance numbers for the case where we encode every hole
and data segment, but I figured this email was long enough already. I'm
happy to share them if requested!

Thoughts?
Anna

-------------------------------------------------------------------------

Anna Schumaker (2):
NFSD: nfsd4_encode_read{v}() should encode eof and maxcount
NFSD: Add basic READ_PLUS support

fs/nfsd/nfs4proc.c | 16 ++++
fs/nfsd/nfs4xdr.c | 180 ++++++++++++++++++++++++++++++++++-----------
2 files changed, 153 insertions(+), 43 deletions(-)

--
2.20.1



2019-02-22 21:58:57

by Anna Schumaker

Subject: [PATCH 1/2] NFSD: nfsd4_encode_read{v}() should encode eof and maxcount

I intend to reuse nfsd4_encode_readv() for READ_PLUS, so I need an
alternate way to encode the eof flag and maxcount. I think it makes sense
for nfsd4_encode_read() to handle this in a single place for both the
splice and readv paths.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4xdr.c | 67 ++++++++++++++++++-----------------------------
1 file changed, 26 insertions(+), 41 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 3de42a729093..bb487e5c022c 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3453,24 +3453,19 @@ nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struc
static __be32 nfsd4_encode_splice_read(
struct nfsd4_compoundres *resp,
struct nfsd4_read *read,
- struct file *file, unsigned long maxcount)
+ struct file *file, unsigned long *maxcount)
{
struct xdr_stream *xdr = &resp->xdr;
struct xdr_buf *buf = xdr->buf;
- u32 eof;
- long len;
int space_left;
__be32 nfserr;
- __be32 *p = xdr->p - 2;

/* Make sure there will be room for padding if needed */
if (xdr->end - xdr->p < 1)
return nfserr_resource;

- len = maxcount;
nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp,
- file, read->rd_offset, &maxcount);
- read->rd_length = maxcount;
+ file, read->rd_offset, maxcount);
if (nfserr) {
/*
* nfsd_splice_actor may have already messed with the
@@ -3481,27 +3476,21 @@ static __be32 nfsd4_encode_splice_read(
return nfserr;
}

- eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
- d_inode(read->rd_fhp->fh_dentry)->i_size);
-
- *(p++) = htonl(eof);
- *(p++) = htonl(maxcount);
-
- buf->page_len = maxcount;
- buf->len += maxcount;
- xdr->page_ptr += (buf->page_base + maxcount + PAGE_SIZE - 1)
+ buf->page_len = *maxcount;
+ buf->len += *maxcount;
+ xdr->page_ptr += (buf->page_base + *maxcount + PAGE_SIZE - 1)
/ PAGE_SIZE;

/* Use rest of head for padding and remaining ops: */
buf->tail[0].iov_base = xdr->p;
buf->tail[0].iov_len = 0;
xdr->iov = buf->tail;
- if (maxcount&3) {
- int pad = 4 - (maxcount&3);
+ if (*maxcount&3) {
+ int pad = 4 - (*maxcount&3);

*(xdr->p++) = 0;

- buf->tail[0].iov_base += maxcount&3;
+ buf->tail[0].iov_base += *maxcount&3;
buf->tail[0].iov_len = pad;
buf->len += pad;
}
@@ -3516,21 +3505,19 @@ static __be32 nfsd4_encode_splice_read(

static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
struct nfsd4_read *read,
- struct file *file, unsigned long maxcount)
+ struct file *file, unsigned long *maxcount)
{
struct xdr_stream *xdr = &resp->xdr;
- u32 eof;
int v;
- int starting_len = xdr->buf->len - 8;
+ int starting_len = xdr->buf->len;
long len;
int thislen;
__be32 nfserr;
- __be32 tmp;
__be32 *p;
u32 zzz = 0;
int pad;

- len = maxcount;
+ len = *maxcount;
v = 0;

thislen = min_t(long, len, ((void *)xdr->end - (void *)xdr->p));
@@ -3552,25 +3539,14 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
}
read->rd_vlen = v;

- len = maxcount;
nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
- resp->rqstp->rq_vec, read->rd_vlen, &maxcount);
- read->rd_length = maxcount;
+ resp->rqstp->rq_vec, read->rd_vlen, maxcount);
if (nfserr)
return nfserr;
- xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
-
- eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
- d_inode(read->rd_fhp->fh_dentry)->i_size);
+ xdr_truncate_encode(xdr, starting_len + ((*maxcount+3)&~3));

- tmp = htonl(eof);
- write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
- tmp = htonl(maxcount);
- write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
-
- pad = (maxcount&3) ? 4 - (maxcount&3) : 0;
- write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount,
- &zzz, pad);
+ pad = (*maxcount&3) ? 4 - (*maxcount&3) : 0;
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + *maxcount, &zzz, pad);
return 0;

}
@@ -3585,6 +3561,8 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
int starting_len = xdr->buf->len;
struct raparms *ra = NULL;
__be32 *p;
+ long len;
+ u32 eof;

p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
if (!p) {
@@ -3602,15 +3580,22 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
maxcount = min_t(unsigned long, maxcount,
(xdr->buf->buflen - xdr->buf->len));
maxcount = min_t(unsigned long, maxcount, read->rd_length);
+ len = maxcount;

if (read->rd_tmp_file)
ra = nfsd_init_raparms(file);

if (file->f_op->splice_read &&
test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
- nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
+ nfserr = nfsd4_encode_splice_read(resp, read, file, &maxcount);
else
- nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
+ nfserr = nfsd4_encode_readv(resp, read, file, &maxcount);
+
+ read->rd_length = maxcount;
+ eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
+ d_inode(read->rd_fhp->fh_dentry)->i_size);
+ *p++ = cpu_to_be32(eof);
+ *p++ = cpu_to_be32(maxcount);

if (ra)
nfsd_put_raparams(file, ra);
--
2.20.1


2019-02-22 21:58:58

by Anna Schumaker

Subject: [PATCH 2/2] NFSD: Add basic READ_PLUS support

This patch adds READ_PLUS support for both NFS4_CONTENT_DATA and
NFS4_CONTENT_HOLE segments. I keep things simple for now by returning a
hole segment only if it is the first segment found at the given offset.
Everything else, including any later hole segments, will be encoded as
data.
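
Condensed, the new encoder below has this shape (error handling, the
raparms setup, and the final fput() elided; see the diff for the real
thing):

	/* Reserve space for the eof flag and segment count up front. */
	p = xdr_reserve_space(xdr, 4 + 4);

	data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
	if (data_pos == -ENXIO)		/* nothing but hole through EOF */
		data_pos = i_size_read(file_inode(file));

	if (data_pos > read->rd_offset) {	/* range starts in a hole */
		nfserr = nfsd4_encode_read_plus_hole(resp, read,
						data_pos - read->rd_offset);
		segments++;
	}
	if (read->rd_length > 0) {	/* the remainder is one data segment */
		nfserr = nfsd4_encode_read_plus_data(resp, read, file);
		segments++;
	}

	eof = (read->rd_offset >= i_size_read(file_inode(file)));
	*p++ = cpu_to_be32(eof);
	*p++ = cpu_to_be32(segments);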

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4proc.c | 16 +++++++
fs/nfsd/nfs4xdr.c | 113 ++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0cfd257ffdaf..1c5f2c3da55f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2180,6 +2180,16 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
}

+static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
+{
+ u32 maxcount = svc_max_payload(rqstp);
+ u32 rlen = min(op->u.read.rd_length, maxcount);
+ /* enough extra xdr space for encoding either a hole or data segment. */
+ u32 xdr = 5;
+
+ return (op_encode_hdr_size + 2 + xdr + XDR_QUADLEN(rlen)) * sizeof(__be32);
+}
+
static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
u32 maxcount = 0, rlen = 0;
@@ -2701,6 +2711,12 @@ static const struct nfsd4_operation nfsd4_ops[] = {
.op_name = "OP_COPY",
.op_rsize_bop = nfsd4_copy_rsize,
},
+ [OP_READ_PLUS] = {
+ .op_func = nfsd4_read,
+ .op_name = "OP_READ_PLUS",
+ .op_rsize_bop = nfsd4_read_plus_rsize,
+ .op_get_currentstateid = nfsd4_get_readstateid,
+ },
[OP_SEEK] = {
.op_func = nfsd4_seek,
.op_name = "OP_SEEK",
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index bb487e5c022c..ec953efd24c2 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1882,7 +1882,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
[OP_LAYOUTSTATS] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_OFFLOAD_CANCEL] = (nfsd4_dec)nfsd4_decode_offload_status,
[OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_offload_status,
- [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_read,
[OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek,
[OP_WRITE_SAME] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_CLONE] = (nfsd4_dec)nfsd4_decode_clone,
@@ -4273,7 +4273,116 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;
p = xdr_encode_hyper(p, os->count);
*p++ = cpu_to_be32(0);
+ return nfserr;
+}
+
+static __be32
+nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
+ struct file *file)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ unsigned long maxcount;
+ __be32 *p, nfserr;
+
+ p = xdr_reserve_space(xdr, 4 + 8 + 4);
+ if (!p)
+ return nfserr_resource;
+ xdr_commit_encode(xdr);
+
+ maxcount = svc_max_payload(resp->rqstp);
+ maxcount = min_t(unsigned long, maxcount,
+ (xdr->buf->buflen - xdr->buf->len));
+ maxcount = min_t(unsigned long, maxcount, read->rd_length);
+
+ if (file->f_op->splice_read &&
+ test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
+ nfserr = nfsd4_encode_splice_read(resp, read, file, &maxcount);
+ else
+ nfserr = nfsd4_encode_readv(resp, read, file, &maxcount);
+
+ *p++ = cpu_to_be32(NFS4_CONTENT_DATA);
+ p = xdr_encode_hyper(p, read->rd_offset);
+ *p++ = cpu_to_be32(maxcount);
+
+ read->rd_offset += maxcount;
+ return nfserr;
+}
+
+static __be32
+nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
+ unsigned long length)
+{
+ __be32 *p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
+ if (!p)
+ return nfserr_resource;
+
+ length = min_t(unsigned long, read->rd_length, length);
+
+ *p++ = cpu_to_be32(NFS4_CONTENT_HOLE);
+ p = xdr_encode_hyper(p, read->rd_offset);
+ p = xdr_encode_hyper(p, length);
+
+ read->rd_offset += length;
+ read->rd_length -= length;
+ return nfs_ok;
+}

+static __be32
+nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_read *read)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ struct file *file = read->rd_filp;
+ int starting_len = xdr->buf->len;
+ struct raparms *ra = NULL;
+ loff_t data_pos;
+ __be32 *p;
+ u32 eof, segments = 0;
+
+ if (nfserr)
+ goto out;
+
+ /* eof flag, segment count */
+ p = xdr_reserve_space(xdr, 4 + 4);
+ if (!p) {
+ nfserr = nfserr_resource;
+ goto out;
+ }
+ xdr_commit_encode(xdr);
+
+ if (read->rd_tmp_file)
+ ra = nfsd_init_raparms(file);
+
+ data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
+ if (data_pos == -ENXIO)
+ data_pos = i_size_read(file_inode(file));
+
+ if (data_pos > read->rd_offset) {
+ nfserr = nfsd4_encode_read_plus_hole(resp, read,
+ data_pos - read->rd_offset);
+ if (nfserr)
+ goto out;
+ segments++;
+ }
+
+ if (read->rd_length > 0) {
+ nfserr = nfsd4_encode_read_plus_data(resp, read, file);
+ segments++;
+ }
+
+ eof = (read->rd_offset >= i_size_read(file_inode(file)));
+ *p++ = cpu_to_be32(eof);
+ *p++ = cpu_to_be32(segments);
+
+ if (ra)
+ nfsd_put_raparams(file, ra);
+
+ if (nfserr)
+ xdr_truncate_encode(xdr, starting_len);
+
+out:
+ if (file)
+ fput(file);
return nfserr;
}

@@ -4381,7 +4490,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
[OP_LAYOUTSTATS] = (nfsd4_enc)nfsd4_encode_noop,
[OP_OFFLOAD_CANCEL] = (nfsd4_enc)nfsd4_encode_noop,
[OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_offload_status,
- [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_read_plus,
[OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek,
[OP_WRITE_SAME] = (nfsd4_enc)nfsd4_encode_noop,
[OP_CLONE] = (nfsd4_enc)nfsd4_encode_noop,
--
2.20.1


2019-03-01 17:03:42

by J. Bruce Fields

Subject: Re: [PATCH 0/2] NFSD: Add support for the v4.2 READ_PLUS operation

On Fri, Feb 22, 2019 at 04:58:48PM -0500, Anna Schumaker wrote:
> These patches add server support for the READ_PLUS operation. This
> operation is meant to improve file reading performance when working with
> sparse files, but there are some issues around the use of vfs_llseek() to
> identify hole and data segments when encoding the reply. I've done a
> bunch of testing on virtual machines,

Maybe the VM<->VM case is important too, but it'd be a little easier to
understand a case with real hardware, I think. What kind of disk are
you using, out of curiosity?

Also, what are the file sizes, and the rsize? (Apologies if I
overlooked that.)

> and I found that READ_PLUS performs best if:

It's interesting to think about why these are:

> 1) The file being read is not yet in the server's page cache.

I don't understand this one.

> 2) The read request begins with a hole segment; and

If I understand your current implementation, it's basically:

- seek with SEEK_DATA.
- if that finds a hole, read the rest of the requested range and
return two segments (a hole plus ordinary read results)
- otherwise just return ordinary read data for the whole range.

So, right, in the case where the read range starts with data, the seek
was just a waste of time; makes sense.

> 3) The server only performs one llseek() call during encoding

OK, so for that we'd need to compare to a different implementation,
which is what you did elsewhere:

> I also have performance numbers for the case where we encode every hole
> and data segment, but I figured this email was long enough already. I'm
> happy to share them if requested!

Got it. (And, sure, that might be interesting.)

--b.