Date: Tue, 26 Jul 2022 20:44:34 +0100
From: "Russell King (Oracle)"
To: Linus Torvalds
Cc: Yury Norov, Dennis Zhou, Guenter Roeck, Catalin Marinas, Linux Kernel Mailing List, Geert Uytterhoeven, linux-m68k@lists.linux-m68k.org
Subject: Re: Linux 5.19-rc8

On Tue, Jul 26, 2022 at 11:36:21AM -0700, Linus Torvalds wrote:
> On Tue, Jul 26, 2022 at 11:18 AM Yury Norov wrote:
> >
> > We have find_bit_benchmark to check how it works in practice. Would
> > be great if someone with access to the hardware can share numbers.
>
> Honestly, I doubt benchmarking find_bit in a loop is all that sensible.

Yes, that's what I was thinking; I've never seen it crop up in any of the perf traces I've looked at.
Nevertheless, here are some numbers from a single run of the find_bit_benchmark module, with the kernel built using:

arm-linux-gnueabihf-gcc (Debian 10.2.1-6) 10.2.1 20210110

Current native implementation:

[ 46.184565] Start testing find_bit() with random-filled bitmap
[ 46.195127] find_next_bit: 2440833 ns, 163112 iterations
[ 46.204226] find_next_zero_bit: 2372128 ns, 164569 iterations
[ 46.213152] find_last_bit: 2199779 ns, 163112 iterations
[ 46.299398] find_first_bit: 79526013 ns, 16234 iterations
[ 46.684026] find_first_and_bit: 377912990 ns, 32617 iterations
[ 46.692020] find_next_and_bit: 1269071 ns, 73562 iterations
[ 46.698745] Start testing find_bit() with sparse bitmap
[ 46.705711] find_next_bit: 118652 ns, 656 iterations
[ 46.716621] find_next_zero_bit: 4183472 ns, 327025 iterations
[ 46.723395] find_last_bit: 50448 ns, 656 iterations
[ 46.762308] find_first_bit: 32190802 ns, 656 iterations
[ 46.769093] find_first_and_bit: 52129 ns, 1 iterations
[ 46.775882] find_next_and_bit: 62522 ns, 1 iterations

Generic implementation:

[ 25.149238] Start testing find_bit() with random-filled bitmap
[ 25.160002] find_next_bit: 2640943 ns, 163537 iterations
[ 25.169567] find_next_zero_bit: 2838485 ns, 164144 iterations
[ 25.178595] find_last_bit: 2302372 ns, 163538 iterations
[ 25.204016] find_first_bit: 18697630 ns, 16373 iterations
[ 25.602571] find_first_and_bit: 391841480 ns, 32555 iterations
[ 25.610563] find_next_and_bit: 1260306 ns, 73587 iterations
[ 25.617295] Start testing find_bit() with sparse bitmap
[ 25.624222] find_next_bit: 70289 ns, 656 iterations
[ 25.636478] find_next_zero_bit: 5527050 ns, 327025 iterations
[ 25.643253] find_last_bit: 52147 ns, 656 iterations
[ 25.657304] find_first_bit: 7328573 ns, 656 iterations
[ 25.664087] find_first_and_bit: 48518 ns, 1 iterations
[ 25.670871] find_next_and_bit: 59750 ns, 1 iterations

Overall, I would say it's pretty similar (some functions perform marginally better with the generic implementation, some with the native one) with the exception
of find_first_bit(), which is over four times faster with the generic implementation (18697630 ns vs 79526013 ns on the random-filled bitmap, 7328573 ns vs 32190802 ns on the sparse one), and find_next_zero_bit(), which is noticeably slower with it (roughly 20% slower on the random-filled bitmap and 32% slower on the sparse one).

So there is pretty much nothing of any relevance between them. That may come as a surprise given the byte vs word access differences between the two implementations. I suspect the reason is that the native implementation's code is smaller than the generic implementation's, which outweighs the cost of accessing the bitmap by byte rather than by word.

I would also suspect that, because its code is smaller, the native version performs better than the generic one when the code is not already hot in the instruction cache (I$).

Lastly, I suspect that if we fixed the bug in the native version and converted it to use word loads, it would probably beat the generic version. I have nothing to base that on other than gut feeling at the moment, but I can make the changes to the native implementation and see what effect that has, possibly tomorrow.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
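The byte vs word access difference discussed above can be made concrete with a minimal userspace C sketch of find_next_zero_bit() written both ways. This is an illustration under stated assumptions and is not either of the kernel implementations benchmarked above. The function names, the local BITS_PER_LONG definition and the words_to_bytes()/check_scans_agree() helpers are all hypothetical. __builtin_ctzl() assumes GCC or Clang. The byte-wise variant follows the by-byte access pattern attributed to the native code. The word-wise variant loads one unsigned long at a time and finds the first zero with a single count-trailing-zeros.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Byte-at-a-time scan: bit n is bit (n % 8) of byte n / 8.
 * Returns the first zero bit at or after offset, or size if there is none.
 */
static size_t find_next_zero_bit_bytewise(const unsigned char *map,
                                          size_t size, size_t offset)
{
    while (offset < size) {
        unsigned char b = map[offset / CHAR_BIT];

        /* On a byte boundary, skip a byte whose bits are all set. */
        if (offset % CHAR_BIT == 0 && b == UCHAR_MAX) {
            offset += CHAR_BIT;
            continue;
        }
        if (!(b & (1u << (offset % CHAR_BIT))))
            return offset;
        offset++;
    }
    return size;
}

/*
 * Word-at-a-time scan over an unsigned long bitmap: each load covers
 * BITS_PER_LONG bits, and __builtin_ctzl() (GCC/Clang) finds the lowest
 * set bit of the inverted word, i.e. the first zero bit.
 */
static size_t find_next_zero_bit_wordwise(const unsigned long *map,
                                          size_t size, size_t offset)
{
    if (offset >= size)
        return size;

    size_t idx = offset / BITS_PER_LONG;
    /* Invert so zero bits become ones, then clear the bits below offset. */
    unsigned long w = ~map[idx] & (~0UL << (offset % BITS_PER_LONG));

    for (;;) {
        if (w) {
            size_t bit = idx * BITS_PER_LONG + (size_t)__builtin_ctzl(w);
            return bit < size ? bit : size; /* ignore bits past size */
        }
        idx++;
        if (idx * BITS_PER_LONG >= size)
            return size;
        w = ~map[idx];
    }
}

/* Repack a word bitmap into bytes (bit n -> byte n / 8), independent of endianness. */
static void words_to_bytes(const unsigned long *words, unsigned char *bytes,
                           size_t size)
{
    size_t nbytes = (size + CHAR_BIT - 1) / CHAR_BIT;

    for (size_t i = 0; i < nbytes; i++)
        bytes[i] = 0;
    for (size_t n = 0; n < size; n++)
        if (words[n / BITS_PER_LONG] & (1UL << (n % BITS_PER_LONG)))
            bytes[n / CHAR_BIT] |= (unsigned char)(1u << (n % CHAR_BIT));
}

/* Self-check: both scans must return the same answer for every offset. */
static void check_scans_agree(const unsigned long *words, size_t size)
{
    unsigned char *bytes = malloc((size + CHAR_BIT - 1) / CHAR_BIT + 1);

    assert(bytes != NULL);
    words_to_bytes(words, bytes, size);
    for (size_t off = 0; off <= size; off++)
        assert(find_next_zero_bit_bytewise(bytes, size, off) ==
               find_next_zero_bit_wordwise(words, size, off));
    free(bytes);
}
```

The word-wise loop needs one load per BITS_PER_LONG bits. The byte-wise loop needs one load per 8 bits, plus per-bit tests in the byte that holds the answer. As the numbers above suggest, though, code size and instruction-cache behaviour can matter as much as the load count.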