References: <20220510142550.1686866-1-mailhol.vincent@wanadoo.fr>
From: Vincent MAILHOL
Date: Wed, 11 May 2022 08:24:34 +0900
Subject: Re: [PATCH 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions
To: Nick Desaulniers
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, "H. Peter Anvin", Nathan Chancellor, Tom Rix,
    linux-kernel@vger.kernel.org, llvm@lists.linux.dev

On Wed. 11 May 2022 at 07:14, Nick Desaulniers wrote:
> On Tue, May 10, 2022 at 7:26 AM Vincent Mailhol wrote:
> >
> > The compilers provide some builtin expressions equivalent to the
> > ffs(), __ffs() and ffz() functions of the kernel. The kernel uses
> > optimized assembly which produces better code than the builtin
> > functions. However, such assembly code cannot be optimized when used
> > on constant expressions.
> >
> > This series relies on __builtin_constant_p to select the optimal solution:
> >
> >   * use kernel assembly for non-constant expressions
> >
> >   * use the compiler's __builtin functions for constant expressions.
> >
> > I also think that fls() and fls64() can be optimized in a similar
> > way, using __builtin_clz() and __builtin_clzll(), but it is a bit less
> > trivial so I want to focus on this series first. If it gets accepted, I
> > will then work on those two additional functions.
> >
> >
> > ** Statistics **
> >
> > On an allyesconfig, before applying this series, I get:
> >
> > | $ objdump -d vmlinux.o | grep bsf | wc -l
> > | 1081
> >
> > After applying this series:
> >
> > | $ objdump -d vmlinux.o | grep bsf | wc -l
> > | 792
> >
> > So, roughly 26.7% of the calls to either ffs() or __ffs() were using
> > constant expressions and can be optimized (I did not produce the
> > figures for ffz()).
>
> These stats are interesting; consider putting them on patch 1/2 commit
> message though (in addition to the cover letter). (Sending thoughts on
> 1/2 next).

The fact is that patch 1/2 changes ffs() and patch 2/2 changes
__ffs(). For v2, I will run the stats on each patch separately in
order not to mix the results.

> >
> > (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)
>
> Here's the same measure of x86_64 allyesconfig (./scripts/config -d
> CONFIG_HINIC) at 9be9ed2612b5aedb52a2c240edb1630b6b743cb6 with ToT
> LLVM (~clang-15):
>
> Before:
> $ objdump -d vmlinux.o | grep bsf | wc -l
> 1454
>
> After:
> $ objdump -d vmlinux.o | grep bsf | wc -l
> 1070
>
> -26.4% :)

Roughly the same ratio. I am just surprised that the absolute numbers
are different:

  * GCC before: 1081, after: 792
  * clang before: 1454, after: 1070

I wonder why clang produces more bsf instructions than GCC?

Also, on a side note, I am not the first one to realize that
__builtin_ffs() is able to optimize constant expressions. Some
people already use it locally:

| $ git grep __builtin_ffs | wc -l
| 80

> >
> >
> > Vincent Mailhol (2):
> >   x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant
> >     expressions
> >   x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate constant
> >     expressions
> >
> >  arch/x86/include/asm/bitops.h | 65 +++++++++++++++++++++--------------
> >  1 file changed, 40 insertions(+), 25 deletions(-)
> >
> > --
> > 2.35.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
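
For readers of the thread, the dispatch described in the cover letter
can be sketched roughly as below. This is only an illustration under
stated assumptions, not the code from the patches: sketch_ffs() and
runtime_ffs() are hypothetical names, and the portable loop in
runtime_ffs() merely stands in for the kernel's bsf-based assembly so
that the snippet builds with GCC or clang on any target.

```c
/*
 * Hypothetical stand-in for the kernel's hand-written assembly path.
 * Returns the 1-based index of the least significant set bit, or 0
 * when no bit is set (same convention as ffs()).
 */
static inline int runtime_ffs(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 1;

	if (!v)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Dispatch on __builtin_constant_p(): a constant argument goes to the
 * compiler builtin, which can be folded at compile time; anything else
 * goes to the runtime path. Only one branch is evaluated.
 */
#define sketch_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : runtime_ffs(x))
```

With this sketch, sketch_ffs(0x10) is expected to fold to the constant
5 with no bit-scan instruction emitted, while a call on a value only
known at run time still goes through runtime_ffs().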