Date: Thu, 19 Oct 2023 09:01:53 +0200
From: Willy Tarreau
To: Kees Cook
Cc: Christoph Hellwig, Justin Stitt, Keith Busch, Jens Axboe, Sagi Grimberg,
    linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-hardening@vger.kernel.org, ksummit@lists.linux.dev
Subject: Re: the nul-terminated string helper desk chair rearrangement
References: <20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com>
    <20231019054642.GF14346@lst.de> <202310182248.9E197FFD5@keescook>
In-Reply-To: <202310182248.9E197FFD5@keescook>

On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote:
> On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote:
> > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote:
> > > strncpy() is deprecated for use on NUL-terminated destination strings
> > > [1] and as such we should prefer more robust and less ambiguous string
> > > interfaces.
> >
> > If we want that we need to stop pretending direct manipulation of
> > nul-terminated strings is a good idea.  I suspect the churn of replacing
> > one helper with another, maybe slightly better, one probably
> > introduces more bugs than it fixes.
> >
> > If we want to attack the issue for real we need to use something
> > better.
> >
> > lib/seq_buf.c is a good start for a lot of simple cases that just
> > append to strings including creating complex ones.  Kent had a bunch
> > of good ideas on how to improve it, but couldn't be convinced to
> > contribute to it instead of duplicating the functionality which
> > is a bit sad, but I think we need to switch to something like
> > seq_buf that actually has a counted string instead of all this messing
> > around with the null-terminated strings.
>
> When doing more complex string creation, I agree. I spent some time
> doing this while I was looking at removing strcat() and strlcat(); this
> is where seq_buf shines.
> (And seq_buf is actually both: it maintains its
> %NUL termination _and_ does the length counting.) The only thing clunky
> about it was initialization, but all the conversions I experimented with
> were way cleaner using seq_buf.
(...)

I also agree. I'm using several other schemes based on pointer+length in
other projects, and although their APIs are not complete (due to the slow
migration of old working code), over time they prove much easier to use
and require far fewer checks. With NUL-terminated strings you need to
perform checks for each and every operation. When the length is known and
controlled, most often you can get rid of many tests on intermediate
operations and perform a single check at the end. You thus end up with
fewer "if" and "goto fail" in the code, because the checks are no longer
for "not crashing nor introducing vulnerabilities", but just for
"returning a correct result", which can often be detected more easily.

Another benefit I found by accident is that when you need to compare a
token against multiple ones (say some keywords for example), it becomes
much faster than a strcmp()-based if/else series, because in this case
you start by comparing lengths instead of comparing contents. And when
your macros allow you to constify string constants, the compiler will
replace long "if" series with checks against constant values, and may
even arrange them as a tree since all are constants, sometimes using
the first char as an additional discriminator.

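
The length-first comparison described above can be sketched as follows.
This is only a minimal illustration under stated assumptions: the names
struct lstr, LSTR_LIT(), lstr_from_cstr(), lstr_eq() and is_hop_header()
are hypothetical and are not the ist()/isteq() helpers mentioned in this
message, whose real definitions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: a pointer plus a length (two words). */
struct lstr {
	const char *ptr;
	size_t len;
};

/* Wrap a string literal; sizeof() makes the length a compile-time constant. */
#define LSTR_LIT(s) ((struct lstr){ .ptr = (s), .len = sizeof(s) - 1 })

/* Wrap a NUL-terminated string; this is the only place strlen() is paid. */
static inline struct lstr lstr_from_cstr(const char *s)
{
	assert(s != NULL);	/* sketch-level precondition */
	return (struct lstr){ .ptr = s, .len = strlen(s) };
}

/* Equality: compare lengths first, look at the bytes only if they match. */
static inline int lstr_eq(struct lstr a, struct lstr b)
{
	return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}

/* Returns 1 if name matches one of a few illustrative keywords, else 0. */
static int is_hop_header(struct lstr name)
{
	return lstr_eq(name, LSTR_LIT("connection")) ||
	       lstr_eq(name, LSTR_LIT("keep-alive")) ||
	       lstr_eq(name, LSTR_LIT("upgrade")) ||
	       lstr_eq(name, LSTR_LIT("te"));
}
```

Because every right-hand operand has a constant length, a mismatching
length rejects a keyword without reading any of its bytes. Whether the
compiler then merges the tests into a tree depends on the compiler and
the optimization level, so no particular speedup is implied by this
sketch.
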
Typically on the test below I observe a 10x speedup at -O3 and ~5x at -O2
when I convert this:

    if (!strcmp(name, "host") ||
        !strcmp(name, "content-length") ||
        !strcmp(name, "connection") ||
        !strcmp(name, "proxy-connection") ||
        !strcmp(name, "keep-alive") ||
        !strcmp(name, "upgrade") ||
        !strcmp(name, "te") ||
        !strcmp(name, "transfer-encoding"))
            return 1;

to this:

    if (isteq(name, ist("host")) ||
        isteq(name, ist("content-length")) ||
        isteq(name, ist("connection")) ||
        isteq(name, ist("proxy-connection")) ||
        isteq(name, ist("keep-alive")) ||
        isteq(name, ist("upgrade")) ||
        isteq(name, ist("te")) ||
        isteq(name, ist("transfer-encoding")))
            return 1;

The code is larger, but when compiled at -Os it instead becomes smaller.

Another interesting property I'm using in the API above, which might or
might not apply there, is that for most archs we care about, functions
can take a struct of two words passed in registers, and can return such
a struct as a pair of registers as well. This makes it possible to chain
functions by passing one function's return value as the argument to
another one, which is what users often want to do to avoid intermediate
variables.

All this to say that length-based strings do offer quite a lot of
benefits over the long term.

Willy