From: Vincent MAILHOL
Date: Wed, 11 May 2022 23:46:31 +0900
Subject: Re: [PATCH 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions
To: Nick Desaulniers
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H . Peter Anvin", Nathan Chancellor, Tom Rix, linux-kernel@vger.kernel.org, llvm@lists.linux.dev
References: <20220510142550.1686866-1-mailhol.vincent@wanadoo.fr>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed. 11 May 2022 at 08:24, Vincent MAILHOL wrote:
> On Wed. 11 May 2022 at 07:14, Nick Desaulniers wrote:
> > On Tue, May 10, 2022 at 7:26 AM Vincent Mailhol wrote:
> > >
> > > Compilers provide builtin functions equivalent to the kernel's
> > > ffs(), __ffs() and ffz() functions. The kernel uses optimized
> > > assembly which produces better code than the builtin functions.
> > > However, such assembly code cannot be optimized when used on
> > > constant expressions.
> > >
> > > This series relies on __builtin_constant_p to select the optimal solution:
> > >
> > >   * use kernel assembly for non constant expressions
> > >
> > >   * use compiler's __builtin function for constant expressions.
> > >
> > > I also think that fls() and fls64() can be optimized in a similar
> > > way, using __builtin_clz() and __builtin_clzll(), but it is a bit
> > > less trivial so I want to focus on this series first.
> > > If it gets accepted, I will then work on those two additional
> > > functions.
> > >
> > >
> > > ** Statistics **
> > >
> > > On an allyesconfig, before applying this series, I get:
> > >
> > > | $ objdump -d vmlinux.o | grep bsf | wc -l
> > > | 1081
> > >
> > > After applying this series:
> > >
> > > | $ objdump -d vmlinux.o | grep bsf | wc -l
> > > | 792
> > >
> > > So, roughly 26.7% of the calls to either ffs() or __ffs() were using
> > > constant expressions and can be optimized (I did not produce the
> > > figures for ffz()).
> >
> > These stats are interesting; consider putting them on patch 1/2 commit
> > message though (in addition to the cover letter). (Sending thoughts on
> > 1/2 next).
>
> The fact is that patch 1/2 changes ffs() and patch 2/2 changes
> __ffs(). For v2, I will run the stats on each patch separately in
> order not to mix the results.
>
> > >
> > > (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)
> >
> > Here's the same measure of x86_64 allyesconfig (./scripts/config -d
> > CONFIG_HINIC) at 9be9ed2612b5aedb52a2c240edb1630b6b743cb6 with ToT
> > LLVM (~clang-15):
> >
> > Before:
> > $ objdump -d vmlinux.o | grep bsf | wc -l
> > 1454
> >
> > After:
> > $ objdump -d vmlinux.o | grep bsf | wc -l
> > 1070
> >
> > -26.4% :)
>
> Roughly the same ratio. I am just surprised that the absolute numbers
> are different:
>
>   * GCC before: 1081, after 792
>   * clang before 1454, after 1070
>
> I wonder why clang produces more bsf instructions than GCC?

I did not find the answer yet, but while looking at this, I found
another interesting thing: on x86_64, the bsf instruction is encoded
as tzcnt when used with the rep prefix. So ffs() produces bsf assembly
instructions but __ffs() and ffz() produce tzcnt, which means that the
"grep bsf" counts above do not include __ffs() and ffz().

c.f. http://lkml.kernel.org/r/5058741E020000780009C014@nat28.tlf.novell.com

I will update the figures in v2 and benchmark both bsf and tzcnt.

> Also, on a side note, I am not the first one to realize that
> __builtin_ffs() is able to optimize constant expressions.
> Some people already use it locally:
>
> | $ git grep __builtin_ffs | wc -l
> | 80
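
For readers who want to see the selection described in the cover letter, here is a minimal sketch of the idea. It is not the code from the patches: sketch_ffs() and sketch_variable_ffs() are hypothetical names, the explicit zero check is a simplification chosen for this example, and the non-x86 fallback is only there so the snippet builds elsewhere.

```c
/*
 * Illustrative sketch only; the names below are hypothetical and this
 * is not the implementation from the patch series.
 */

/* Runtime path: hand-written bsf on x86, compiler builtin elsewhere. */
static inline int sketch_variable_ffs(int x)
{
#if defined(__x86_64__) || defined(__i386__)
	int r;

	/* bsf's result is undefined for 0, so handle that case first. */
	if (!x)
		return 0;

	/* bsfl stores the index of the least significant set bit in r. */
	__asm__("bsfl %1, %0" : "=r" (r) : "rm" (x) : "cc");

	/* ffs() numbers bits from 1, bsf from 0. */
	return r + 1;
#else
	return __builtin_ffs(x);
#endif
}

/*
 * Constant arguments go to the builtin, which the compiler can fold at
 * build time; everything else takes the assembly path.
 */
#define sketch_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : sketch_variable_ffs(x))
```

With a literal argument such as sketch_ffs(0x10), the compiler is expected to fold the whole expression to a constant and emit no bsf at all, which is the effect the objdump counts above are measuring. A value only known at run time still goes through the inline assembly.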