2023-08-31 09:34:28

by Ard Biesheuvel

Subject: Re: [PATCH] pstore: Base compression input buffer size on estimated compressed size

On Thu, 31 Aug 2023 at 07:20, Eric Biggers <[email protected]> wrote:
>
> On Wed, Aug 30, 2023 at 04:43:37PM -0700, Kees Cook wrote:
> > On Wed, Aug 30, 2023 at 11:22:38PM +0200, Ard Biesheuvel wrote:
> > > Commit 1756ddea6916 ("pstore: Remove worst-case compression size logic")
> > > removed some clunky per-algorithm worst case size estimation routines on
> > > the basis that we can always store pstore records uncompressed, and
> > > these worst case estimations are about how much the size might
> > > inadvertently *increase* due to encapsulation overhead when the input
> > > cannot be compressed at all. So if compression results in a size
> > > increase, we just store the original data instead.
> >
> > Does the Z_FINISH vs Z_SYNC_FLUSH thing need to be fixed as well, or
> > does that become a non-issue with this change?
>
> I haven't seen any real evidence that that issue actually exists.
>

Indeed. The workaround in crypto/deflate.c was added in 2003 by James
Morris, and the zlib fork was rebased onto a newer upstream at least
once since then.

> > > void pstore_record_init(struct pstore_record *record,
> > > @@ -305,7 +314,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
> > > record.buf = psinfo->buf;
> > >
> > > dst = big_oops_buf ?: psinfo->buf;
> > > - dst_size = psinfo->bufsize;
> > > + dst_size = max_uncompressed_size ?: psinfo->bufsize;
> > >
> > > /* Write dump header. */
> > > header_size = snprintf(dst, dst_size, "%s#%d Part%u\n", why,
> > > @@ -326,8 +335,15 @@ static void pstore_dump(struct kmsg_dumper *dumper,
> > > record.compressed = true;
> > > record.size = zipped_len;
> > > } else {
> > > - record.size = header_size + dump_size;
> > > - memcpy(psinfo->buf, dst, record.size);
> > > + /*
> > > + * Compression failed, so the buffer is most
> > > + * likely filled with binary data that does not
> > > + * compress as well as ASCII text. Copy as much
> > > + * of the uncompressed data as possible into
> > > + * the pstore record, and discard the rest.
> > > + */
> > > + record.size = psinfo->bufsize;
> > > + memcpy(psinfo->buf, dst, psinfo->bufsize);
> >
> > I don't think this is "friendly" enough. :P
> >
> > In the compression failure case, we've got a larger dst_size (and
> > dump_size, but technically it might not be true if something else went
> > wrong) than psinfo->bufsize, so we want to take the trailing bytes
> > (i.e. panic details are more likely at the end). And we should keep
> > the header, which is already present in "dst". I think we need to do
> > something like this:
> >
> > size_t buf_size_available = psinfo->bufsize - header_size;
> > size_t dump_size_wanted = min(dump_size, buf_size_available);
> >
> > record.size = header_size + dump_size_wanted;
> > memcpy(psinfo->buf, dst, header_size);
> > memcpy(psinfo->buf + header_size,
> > dst + header_size + (dump_size - dump_size_wanted),
> > dump_size_wanted);
> >
> > My eyes, my eyes.
> >
>
> How hard would it be to write two uncompressed records when compression fails to
> achieve the targeted 50% ratio?
>

It wouldn't be that hard, and this crossed my mind as well.

However, we should ask ourselves when this is ever going to happen,
and what the implications are for those records. The reason we want to
capture the dmesg output is because it contains human readable ASCII
text that explains what happened right before the system crashed.

The zlib library code uses pre-allocated buffers and so the most
likely reason for failure is that the 2x estimation does not hold for
the buffer in question. It is highly unlikely that that buffer
contains ASCII text or anything resembling it in that case, and so it
is most probably garbage that happens to compress really poorly. Do we
really need to capture all of that? And if we don't, does it really
matter whether the we capture the first bit or the last bit?