From: David Laight
To: 'Willy Tarreau'
CC: Alexey Dobriyan, "lkp@intel.com", "linux-kselftest@vger.kernel.org", "linux-kernel@vger.kernel.org", "Paul E. McKenney"
Subject: RE: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
Date: Tue, 11 Oct 2022 10:18:03 +0000
References: <20221009175920.GA28685@1wt.eu> <20221009183604.GA29069@1wt.eu> <9e16965f1d494084981eaa90d73ca80e@AcuMS.aculab.com> <20221011062055.GC5107@1wt.eu>
In-Reply-To: <20221011062055.GC5107@1wt.eu>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Willy Tarreau
> Sent: 11 October 2022 07:21
>
> On Mon, Oct 10, 2022 at 10:03:53AM +0000, David Laight wrote:
> > From: Willy Tarreau
> > > Sent: 09 October 2022 19:36
> > ...
> > > By the way, just for the sake of completeness, the one that consistently
> > > gives me a better output is this one:
> > >
> > >   size_t strlen(const char *str)
> > >   {
> > >           const char *s0 = str--;
> > >
> > >           while (*++str)
> > >                   ;
> > >           return str - s0;
> > >   }
> > >
> > > Which gives me this:
> > >
> > >   0000000000000000 <strlen>:
> > >      0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
> > >      4:   48 ff c0                inc    %rax
> > >      7:   80 38 00                cmpb   $0x0,(%rax)
> > >      a:   75 f8                   jne    4
> > >      c:   48 29 f8                sub    %rdi,%rax
> > >      f:   c3                      ret
> > >
> > > But this is totally ruined by the addition of asm() in the loop.
> > > However I suspect that the construct is difficult to match against a
> > > real strlen() since it starts on an extra character, thus placing the
> > > asm() statement before the loop could durably preserve it. It does work
> > > here (the code remains the exact same one), but for how long, that's the
> > > question. Maybe we can revisit the various loop-based functions in the
> > > future with this in mind.
> >
> > clang wilfully and persistently generates:
> >
> > strlen:                                 # @strlen
> >         movq    $-1, %rax
> > .LBB0_1:                                # =>This Inner Loop Header: Depth=1
> >         cmpb    $0, 1(%rdi,%rax)
> >         leaq    1(%rax), %rax
> >         jne     .LBB0_1
> >         retq
> >
> > But feed the C for that into gcc and it generates a 'jmp strlen'
> > at everything above -O1.
>
> Interesting, that's not the case for me here with 12.2 from kernel.org
> on x86_64, which gives this at -O1, -O2, -O3 and -Ofast:
>
>   0000000000000000 <strlen>:
>      0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
>      4:   0f 1f 40 00             nopl   0x0(%rax)
>      8:   48 83 c0 01             add    $0x1,%rax
>      c:   80 38 00                cmpb   $0x0,(%rax)
>      f:   75 f7                   jne    8
>     11:   48 29 f8                sub    %rdi,%rax
>     14:   c3                      ret
>
> Out of curiosity what version were you using ?

Clang 12.0.0 onwards, see https://godbolt.org/z/67Gnzs8js

> > I suspect that might run with less clocks/byte than the code above.
>
> Certainly for large strings, but not for short ones.

For short strings not needing the final sub and not having
the read depend on the increment should make the leal one faster.
(The nop to align the loop label is monumentally pointless.)

For long strings what matters is how many clocks it takes to
schedule the 4 uops in the loop.
It might be possible to get down to 2 clocks - but I think both
the loops are 3 clocks (assuming the adjacent cmp/jne fuse).
I'm not going to try to instrument the loops though!

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
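
The two loop shapes compared above can be written as standalone C for
experimentation. What follows is only a sketch under assumptions, not code
taken from nolibc: the names strlen_ptr and strlen_idx are hypothetical
(chosen so they do not clash with libc), the empty asm statement ahead of the
loop follows the placement suggested in the quoted text, and the index form
simply mirrors the shape of the clang listing. It makes no claim about what
any particular compiler version emits for either function.

```c
#include <stddef.h>

/*
 * Pointer-walking form quoted in the thread, renamed (hypothetical name).
 * Caveat: `str--` forms a pointer one before the buffer, which ISO C does
 * not strictly permit; the quoted code relies on gcc/clang tolerating it.
 */
static size_t strlen_ptr(const char *str)
{
    const char *s0 = str--;

    /*
     * Assumption: an empty asm statement placed before the loop, as the
     * thread suggests, to discourage the compiler from recognising the
     * loop as strlen(). This is a GNU C extension.
     */
    __asm__ volatile ("");

    while (*++str)
        ;
    /* Final subtraction turns the end pointer back into a length. */
    return (size_t)(str - s0);
}

/*
 * Index-based form shaped like the clang listing (hypothetical name).
 * The counter starts at (size_t)-1 and wraps to 0 on the first increment,
 * which is well defined for unsigned types. No final subtraction is needed
 * because the counter itself is the length.
 */
static size_t strlen_idx(const char *str)
{
    size_t i = (size_t)-1;

    while (str[++i])
        ;
    return i;
}
```

Building both with something like `gcc -O2 -c` and looking at the result with
`objdump -d` is one way to compare the generated loops on a given toolchain.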