Date: Tue, 11 Oct 2022 08:20:55 +0200
From: Willy Tarreau
To: David Laight
Cc: Alexey Dobriyan, "lkp@intel.com", "linux-kselftest@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "Paul E. McKenney"
Subject: Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
Message-ID: <20221011062055.GC5107@1wt.eu>
References: <20221009175920.GA28685@1wt.eu> <20221009183604.GA29069@1wt.eu> <9e16965f1d494084981eaa90d73ca80e@AcuMS.aculab.com>
In-Reply-To: <9e16965f1d494084981eaa90d73ca80e@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 10, 2022 at 10:03:53AM +0000, David Laight wrote:
> From: Willy Tarreau
> > Sent: 09 October 2022 19:36
> ...
> > By the way, just for the sake of completeness, the one that consistently
> > gives me a better output is this one:
> >
> >   size_t strlen(const char *str)
> >   {
> >           const char *s0 = str--;
> >
> >           while (*++str)
> >                   ;
> >           return str - s0;
> >   }
> >
> > Which gives me this:
> >
> >   0000000000000000 <strlen>:
> >      0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
> >      4:   48 ff c0                inc    %rax
> >      7:   80 38 00                cmpb   $0x0,(%rax)
> >      a:   75 f8                   jne    4 <strlen+0x4>
> >      c:   48 29 f8                sub    %rdi,%rax
> >      f:   c3                      ret
> >
> > But this is totally ruined by the addition of asm() in the loop. However
> > I suspect that the construct is difficult to match against a real strlen()
> > since it starts on an extra character, thus placing the asm() statement
> > before the loop could durably preserve it. It does work here (the code
> > remains the exact same one), but for how long, that's the question. Maybe
> > we can revisit the various loop-based functions in the future with this in
> > mind.
>
> clang wilfully and persistently generates:
>
> strlen:                                 # @strlen
>         movq    $-1, %rax
> .LBB0_1:                                # =>This Inner Loop Header: Depth=1
>         cmpb    $0, 1(%rdi,%rax)
>         leaq    1(%rax), %rax
>         jne     .LBB0_1
>         retq
>
> But feed the C for that into gcc and it generates a 'jmp strlen'
> at everything above -O1.

Interesting, that's not the case for me here with gcc 12.2 from kernel.org
on x86_64, which gives this at -O1, -O2, -O3 and -Ofast:

  0000000000000000 <strlen>:
     0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
     4:   0f 1f 40 00             nopl   0x0(%rax)
     8:   48 83 c0 01             add    $0x1,%rax
     c:   80 38 00                cmpb   $0x0,(%rax)
     f:   75 f7                   jne    8 <strlen+0x8>
    11:   48 29 f8                sub    %rdi,%rax
    14:   c3                      ret

Out of curiosity, what version were you using?

> I suspect that might run with fewer clocks/byte than the code above.

Certainly for large strings, but not for short ones.

> Somewhere I hate some compiler pessimisations.
> Substituting a call to strlen() is typical.
> strlen() is almost certainly optimised for long strings.
> If the string is short the coded loop will be faster.

Yes, and more importantly, if a developer takes the time to explicitly
write a loop to do something that matches such a function, it's very
likely that they already considered the function and did *not* want to
use it for whatever reason. And the problem we're currently having with
compilers is that they are not willing to respect the developer's intent
because they always know better.

> The same is true (and probably more so) for memcpy.

Yes, I think that we'll eventually have to stuff a few asm() here and
there in a few such loops as compilers become less and less trustworthy.

Willy
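
For illustration, the "asm() in the loop" idea discussed above can be sketched as follows. This is a minimal sketch, assuming GCC or clang on a target that accepts an empty asm statement. The names sketch_strlen() and sketch_memcpy() are hypothetical and were chosen so as not to clash with libc; this is not presented as the code that went into tools/nolibc. The empty volatile asm in each loop body is there so that the compiler does not match the loop as a strlen() or memcpy() idiom and replace it with a call, at the cost of the slightly worse loop code mentioned in the thread.

```c
#include <stddef.h>

/*
 * Byte-wise strlen(). The empty asm statement generates no instruction,
 * but because it is volatile the compiler has to keep it in every
 * iteration, which in practice prevents the loop from being turned
 * into a call to strlen().
 */
static size_t sketch_strlen(const char *str)
{
	size_t len;

	for (len = 0; str[len]; len++)
		__asm__ volatile("");
	return len;
}

/*
 * Byte-wise forward copy using the same trick, so that the loop is
 * not replaced with a call to memcpy().
 */
static void *sketch_memcpy(void *dst, const void *src, size_t len)
{
	char *d = dst;
	const char *s = src;

	while (len--) {
		__asm__ volatile("");
		*d++ = *s++;
	}
	return dst;
}
```

As noted in the thread, nothing guarantees that a future compiler will keep honouring this, so the generated code is worth re-checking with objdump after each toolchain upgrade.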