Received: by 2002:a05:7412:3210:b0:e2:908c:2ebd with SMTP id eu16csp47536rdb; Thu, 31 Aug 2023 02:34:28 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHecJ5+dURum/zIj2HLppn86CIlRVf5NSi8qwxJ2Rr1nhPMw1nb4a6D4vyiN8GRjt+yGZr+ X-Received: by 2002:a17:90a:8043:b0:268:2523:652c with SMTP id e3-20020a17090a804300b002682523652cmr4330233pjw.31.1693474468107; Thu, 31 Aug 2023 02:34:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1693474468; cv=none; d=google.com; s=arc-20160816; b=oyWWOh+q4sIeGr5xN/ZR2mdlYdo8phgtqKzJSoCz+dw40ypj37prNlGKUcgDfpWTIQ nKfLiLv9Ad49OF4GHDtX6/2rrbTronbe3pcG4yg44Um2sNjDUozraEYKIychGC9LGatv Fi1sD0OEuhzjmfKj2Ixi8cXMbqNqDQMB/oqRySrmGVj697ErYuPc+wKyS4HJ5ESv9uHY ujYz4u4u/KV6Gs0iCVLcUUwz9RaCfLznIraNZossNt23e374tcUuaMv0sRb15XZdtCdQ Zt7tdShU/eWipP7trjgZR48WlF4odWTKMkKkXUbf3xbDGr+CBZ19b7bKRpK3I6ybGt5g rCXg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=FkY+RcQX851tZfoM1MMfCnc1g2l7Hr8Gb+qFVJRsNq8=; fh=WGvx4C50k3zOWDF/Tln1vvyk4Ddwtt5Fmxf2SWew/4Y=; b=ZnX1P3jTDjbXdNAI+y9cbCsE0S3s/VP7/Rg/mI6jkHfdbNV1liMUiaQXnGpZkTFh2X T5UsCNwjbPOIQXSkZkiq/iMo63w6iYFgEhLBKUahGYxyOkhnas3qczarbPd8V0Pak/yZ //3MSlpE68mv0E4sRDo5VMW5zrxTN5t81MflLr3clnBaYDQas56QVPXyQ1Qs+RZhEKh1 hHRKU4OevQmh6yYiP2GiL2CUUXQUXXV/q9rQK5I3rwo5hd+X06MsgLpfPco9rIii3/yX 81XwOjBEoB3EMlnyTUDzqx7ezZKRRQi1BLRtvml86wlJLq2XMeqxBoP5EK6LkuXRhp1I ChCg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=np3yit3d; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c20-20020a17090ad91400b0026cefd1bb13si3655259pjv.0.2023.08.31.02.34.05; Thu, 31 Aug 2023 02:34:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=np3yit3d; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233035AbjHaH3D (ORCPT + 99 others); Thu, 31 Aug 2023 03:29:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229680AbjHaH3B (ORCPT ); Thu, 31 Aug 2023 03:29:01 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE29E1A4 for ; Thu, 31 Aug 2023 00:28:58 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 77317B8214E for ; Thu, 31 Aug 2023 07:28:57 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3B408C433C9 for ; Thu, 31 Aug 2023 07:28:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1693466936; bh=aAsHT2HzQhTcrO+4j0ndVmaexFGGJKnxvEIZY4yryN8=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=np3yit3dWU+KvMRcRuve1HYQ8szdLTgQUAV++aqlxieYYPxKTymuiRHI2MA3luxtH tiQvYapxG9sYlUq8uOoD0Hs+5RPOWUqDVy/+r7NexaVxPYoj2hVw5UNQ+U6mjXEop7 buGPp9RfTcT+8N5s5eD1kyEk22YcBRYqHlwgO8Chcj3MuoR3yf+f16jTlszlmXMqIi 1OjE0ctx7MopzU7SM1NjN2biFMQhxANFHsdWmzrm5Rc4N1irugBz3yI+EXbdcuF9wq jjdfMfCOFy7AqWccusNQbc+B5HbrJwfE3hLDCOVAZ2bPkaNpIHmlsdgOVrS/ze/F7G gXP+wTxfM1tSQ== Received: by mail-lj1-f178.google.com with SMTP id 38308e7fff4ca-2b9d07a8d84so9114351fa.3 for ; Thu, 31 Aug 2023 00:28:56 -0700 (PDT) X-Gm-Message-State: AOJu0Yy8abhLfD8Zr4bmDuKctHsYGZKEsocL7i2qm2tw4phF5/TDM6wW P8ITmVeg7IdZXj/l+QDPQWPgtlHC0MGx42n8OZI= X-Received: by 2002:a2e:a178:0:b0:2b9:f0b4:eaa1 with SMTP id u24-20020a2ea178000000b002b9f0b4eaa1mr3008469ljl.16.1693466934183; Thu, 31 Aug 2023 00:28:54 -0700 (PDT) MIME-Version: 1.0 References: <20230830212238.135900-1-ardb@kernel.org> <202308301608.739BFA8@keescook> <20230831052009.GA1349@sol.localdomain> In-Reply-To: <20230831052009.GA1349@sol.localdomain> From: Ard Biesheuvel Date: Thu, 31 Aug 2023 09:28:42 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] pstore: Base compression input buffer size on estimated compressed size To: Eric Biggers Cc: Kees Cook , linux-kernel@vger.kernel.org, Linus Torvalds , Herbert Xu Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 31 Aug 2023 at 07:20, Eric Biggers wrote: > > On Wed, Aug 30, 2023 at 04:43:37PM -0700, Kees Cook wrote: > > On Wed, Aug 30, 2023 at 11:22:38PM +0200, Ard Biesheuvel wrote: > > > Commit 1756ddea6916 ("pstore: Remove worst-case compression size logic") > > > removed some clunky per-algorithm worst case size estimation routines on > > > the basis that we can always store pstore records uncompressed, and > > > these worst case estimations are about how much the size might > > > inadvertently *increase* due to encapsulation overhead when the input > > > cannot be compressed at all. So if compression results in a size > > > increase, we just store the original data instead. > > > > Does the Z_FINISH vs Z_SYNC_FLUSH thing need to be fixed as well, or > > does that become a non-issue with this change? > > I haven't seen any real evidence that that issue actually exists. > Indeed. The workaround in crypto/deflate.c was added in 2003 by James Morris, and the zlib fork was rebased onto a newer upstream at least once since then. > > > void pstore_record_init(struct pstore_record *record, > > > @@ -305,7 +314,7 @@ static void pstore_dump(struct kmsg_dumper *dumper, > > > record.buf = psinfo->buf; > > > > > > dst = big_oops_buf ?: psinfo->buf; > > > - dst_size = psinfo->bufsize; > > > + dst_size = max_uncompressed_size ?: psinfo->bufsize; > > > > > > /* Write dump header. */ > > > header_size = snprintf(dst, dst_size, "%s#%d Part%u\n", why, > > > @@ -326,8 +335,15 @@ static void pstore_dump(struct kmsg_dumper *dumper, > > > record.compressed = true; > > > record.size = zipped_len; > > > } else { > > > - record.size = header_size + dump_size; > > > - memcpy(psinfo->buf, dst, record.size); > > > + /* > > > + * Compression failed, so the buffer is most > > > + * likely filled with binary data that does not > > > + * compress as well as ASCII text. Copy as much > > > + * of the uncompressed data as possible into > > > + * the pstore record, and discard the rest. > > > + */ > > > + record.size = psinfo->bufsize; > > > + memcpy(psinfo->buf, dst, psinfo->bufsize); > > > > I don't think this is "friendly" enough. :P > > > > In the compression failure case, we've got a larger dst_size (and > > dump_size, but technically it might not be true if something else went > > wrong) than psinfo->bufsize, so we want to take the trailing bytes > > (i.e. panic details are more likely at the end). And we should keep > > the header, which is already present in "dst". I think we need to do > > something like this: > > > > size_t buf_size_available = psinfo->bufsize - header_size; > > size_t dump_size_wanted = min(dump_size, buf_size_available); > > > > record.size = header_size + dump_size_wanted; > > memcpy(psinfo->buf, dst, header_size); > > memcpy(psinfo->buf + header_size, > > dst + header_size + (dump_size - dump_size_wanted), > > dump_size_wanted); > > > > My eyes, my eyes. > > > > How hard would it be to write two uncompressed records when compression fails to > achieve the targeted 50% ratio? > It wouldn't be that hard, and this crossed my mind as well. However, we should ask ourselves when this is ever going to happen, and what the implications are for those records. The reason we want to capture the dmesg output is because it contains human readable ASCII text that explains what happened right before the system crashed. The zlib library code uses pre-allocated buffers and so the most likely reason for failure is that the 2x estimation does not hold for the buffer in question. It is highly unlikely that that buffer contains ASCII text or anything resembling it in that case, and so it is most probably garbage that happens to compress really poorly. Do we really need to capture all of that? And if we don't, does it really matter whether the we capture the first bit or the last bit?