Received: by 2002:ac0:e34a:0:0:0:0:0 with SMTP id g10csp1056685imn; Tue, 26 Jul 2022 17:19:12 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vfFi4bHG9RG6GJTnCaQxgAyrKDIFQ5hZK+OaLexW4TGP00E9EVlWluxjYKhvJi2wfQCktG X-Received: by 2002:a17:907:e90:b0:72b:d0a7:9dd5 with SMTP id ho16-20020a1709070e9000b0072bd0a79dd5mr16525707ejc.18.1658881151966; Tue, 26 Jul 2022 17:19:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658881151; cv=none; d=google.com; s=arc-20160816; b=yi52NQbIk4EyHhFJL+ML/I5p4h+d7O+U2JGRLh1dl7zztTuotHZ0OMBWhUO256hD0i 3Ck8175wcT4y7uDEhqEwuNR0H4CbONx7CXWvmC2dLey6+nf3CNxU/zqlapZbmRYLeY6Z WKb3ov+onacNP5jN3zlV4cDEWPM63zF6njgNaJwj58osTZUpct/eQ4qIQt6dcFf1v5jK B82+RoFlbJijhMfLK03fmOzG0VuUgfk3wuALho8My+rLV43AzejmL+GBiL/V++bX2lR2 btaR8M40RgiJUJ+PJQ+Xbzw/P6JkCDxLPe+0/Wnrf1c5HiK27TMWNNmOD4tH5Nw4s6Xv N0eg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=2T3hUKFNgtBW8OumyrTKwgozBtxLREQ5SuTUmSurTyM=; b=MRT/Vpj3HxvZ1HwCA60SdKPVpJER0F941hGZWqz5gGe0t+NAlylIQ0j/ff4LG33mzi CP3mzXfxxGWWr4TvxKp4dKyBiLVggQm7sY/ZEQ0RWClGXbWNNzZG9HX8jsea9bhEtJVb oBw3b/u9yfk1KgReTR5CKvY3Wl0Y7as6ijH6bFt4stoDcs7/pmT9n2VKEWq1REq5MEVa 7la4KyxV2N182yUx7fcdQPveJ/loLM54EdTM5XVm4N/+/dWeERcZLSkPDxRUp6bq+OCW 2619hHp6427hAfRPz1npPNpd6DkEYEBvO6twqe9JUuiz2laDbsW6E4qf+Q6ChAZ41UTZ t5+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail (test mode) header.i=@armlinux.org.uk header.s=pandora-2019 header.b=Qvbv1ZKq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=armlinux.org.uk Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qw2-20020a1709066a0200b0072efa4fb55asi20609171ejc.661.2022.07.26.17.18.46; Tue, 26 Jul 2022 17:19:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail (test mode) header.i=@armlinux.org.uk header.s=pandora-2019 header.b=Qvbv1ZKq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=armlinux.org.uk Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240047AbiG0AP6 (ORCPT + 99 others); Tue, 26 Jul 2022 20:15:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34448 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229458AbiG0AP4 (ORCPT ); Tue, 26 Jul 2022 20:15:56 -0400 Received: from pandora.armlinux.org.uk (pandora.armlinux.org.uk [IPv6:2001:4d48:ad52:32c8:5054:ff:fe00:142]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5326819C2E for ; Tue, 26 Jul 2022 17:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=armlinux.org.uk; s=pandora-2019; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=2T3hUKFNgtBW8OumyrTKwgozBtxLREQ5SuTUmSurTyM=; b=Qvbv1ZKqlbGOonvYL7mDWLSOHx 2LeMGAHEXGQJtPNZMvoA23P7UJXDwFyqWax0qYqy9SYdaKjVihDm7p4HWTDjzNDFxXW88rpl8dxI9 8Hd0h9ygt82QIRfJXlbTS0MrDszmrDrHwZGM9m8g1ndAD9V15M3nKCiZ+xLgn3hkfcoKxRni9XAFW P9p2zWNfr8gwQp1oNtVSm6opL9RIaGE9s9zSShCGfn9ChBZaH3uddav9SvwBa798HUsSghJ2ztto9 EcWeaC31Bx+P7yrXQ/tw6qDwILk7yogVPm3v3CeW0Z+QMUPz+Gq473hmJl+NE1qaZHj4zJGQwrZVp 9MkhKmkQ==; Received: from shell.armlinux.org.uk ([fd8f:7570:feb6:1:5054:ff:fe00:4ec]:33582) by pandora.armlinux.org.uk with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oGUiL-0004hF-9q; Wed, 27 Jul 2022 01:15:49 +0100 Received: from linux by shell.armlinux.org.uk with local (Exim 4.94.2) (envelope-from ) id 1oGUiH-0001iU-DX; Wed, 27 Jul 2022 01:15:45 +0100 Date: Wed, 27 Jul 2022 01:15:45 +0100 From: "Russell King (Oracle)" To: Linus Torvalds Cc: Yury Norov , Dennis Zhou , Guenter Roeck , Catalin Marinas , Linux Kernel Mailing List , Geert Uytterhoeven , linux-m68k@lists.linux-m68k.org Subject: Re: Linux 5.19-rc8 Message-ID: References: <20220725161141.GA1306881@roeck-us.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Russell King (Oracle) X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 26, 2022 at 01:20:23PM -0700, Linus Torvalds wrote: > On Tue, Jul 26, 2022 at 12:44 PM Russell King (Oracle) > wrote: > > > > Overall, I would say it's pretty similar (some generic perform > > marginally better, some native perform marginally better) with the > > exception of find_first_bit() being much better with the generic > > implementation, but find_next_zero_bit() being noticably worse. > > The generic _find_first_bit() code is actually sane and simple. It > loops over words until it finds a non-zero one, and then does trivial > calculations on that last word. > > That explains why the generic code does so much better than your byte-wise asm. > > In contrast, the generic _find_next_bit() I find almost offensively > silly - which in turn explains why your byte-wide asm does better. > > I think the generic _find_next_bit() should actually do what the m68k > find_next_bit code does: handle the first special word itself, and > then just call find_first_bit() on the rest of it. > > And it should *not* try to handle the dynamic "bswap and/or bit sense > invert" thing at all. That should be just four different (trivial) > cases for the first word. Here's the results for the native version converted to use word loads: [ 37.319937] Start testing find_bit() with random-filled bitmap [ 37.330289] find_next_bit: 2222703 ns, 163781 iterations [ 37.339186] find_next_zero_bit: 2154375 ns, 163900 iterations [ 37.348118] find_last_bit: 2208104 ns, 163780 iterations [ 37.372564] find_first_bit: 17722203 ns, 16370 iterations [ 37.737415] find_first_and_bit: 358135191 ns, 32453 iterations [ 37.745420] find_next_and_bit: 1280537 ns, 73644 iterations [ 37.752143] Start testing find_bit() with sparse bitmap [ 37.759032] find_next_bit: 41256 ns, 655 iterations [ 37.769905] find_next_zero_bit: 4148410 ns, 327026 iterations [ 37.776675] find_last_bit: 48742 ns, 655 iterations [ 37.790961] find_first_bit: 7562371 ns, 655 iterations [ 37.797743] find_first_and_bit: 47366 ns, 1 iterations [ 37.804527] find_next_and_bit: 59924 ns, 1 iterations which is generally faster than the generic version, with the exception of the sparse find_first_bit (generic was: [ 25.657304] find_first_bit: 7328573 ns, 656 iterations) find_next_{,zero_}bit() in the sparse case are quite a bit faster than the generic code. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!