From: Yury Norov
Date: Tue, 26 Jul 2022 18:33:55 -0700
Subject: Re: Linux 5.19-rc8
To: "Russell King (Oracle)"
Cc: Linus Torvalds, Dennis Zhou, Guenter Roeck, Catalin Marinas, Linux Kernel Mailing List, Geert Uytterhoeven, linux-m68k@lists.linux-m68k.org
References: <20220725161141.GA1306881@roeck-us.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 26, 2022 at 5:15 PM Russell King (Oracle) wrote:
>
> On Tue, Jul 26, 2022 at 01:20:23PM -0700, Linus Torvalds wrote:
> > On Tue, Jul 26, 2022 at 12:44 PM Russell King (Oracle)
> > wrote:
> > >
> > > Overall, I would say it's pretty similar (some generic perform
> > > marginally better, some native perform marginally better) with the
> > > exception of find_first_bit() being much better with the generic
> > > implementation, but
> > > find_next_zero_bit() being noticeably worse.
> >
> > The generic _find_first_bit() code is actually sane and simple. It
> > loops over words until it finds a non-zero one, and then does trivial
> > calculations on that last word.
> >
> > That explains why the generic code does so much better than your
> > byte-wise asm.
> >
> > In contrast, the generic _find_next_bit() I find almost offensively
> > silly - which in turn explains why your byte-wide asm does better.
> >
> > I think the generic _find_next_bit() should actually do what the m68k
> > find_next_bit code does: handle the first special word itself, and
> > then just call find_first_bit() on the rest of it.
> >
> > And it should *not* try to handle the dynamic "bswap and/or bit sense
> > invert" thing at all. That should be just four different (trivial)
> > cases for the first word.
>
> Here's the results for the native version converted to use word loads:
>
> [ 37.319937]
> Start testing find_bit() with random-filled bitmap
> [ 37.330289] find_next_bit: 2222703 ns, 163781 iterations
> [ 37.339186] find_next_zero_bit: 2154375 ns, 163900 iterations
> [ 37.348118] find_last_bit: 2208104 ns, 163780 iterations
> [ 37.372564] find_first_bit: 17722203 ns, 16370 iterations
> [ 37.737415] find_first_and_bit: 358135191 ns, 32453 iterations
> [ 37.745420] find_next_and_bit: 1280537 ns, 73644 iterations
> [ 37.752143]
> Start testing find_bit() with sparse bitmap
> [ 37.759032] find_next_bit: 41256 ns, 655 iterations
> [ 37.769905] find_next_zero_bit: 4148410 ns, 327026 iterations
> [ 37.776675] find_last_bit: 48742 ns, 655 iterations
> [ 37.790961] find_first_bit: 7562371 ns, 655 iterations
> [ 37.797743] find_first_and_bit: 47366 ns, 1 iterations
> [ 37.804527] find_next_and_bit: 59924 ns, 1 iterations
>
> which is generally faster than the generic version, with the exception
> of the sparse find_first_bit (generic was:
> [ 25.657304] find_first_bit: 7328573 ns, 656 iterations)
>
> find_next_{,zero_}bit() in the sparse
> case are quite a bit faster than
> the generic code.

Look at the find_{first,next}_and_bit results. Those two have no arch
version and in both cases use generic code. In theory they should be
equally fast before and after, but your testing says that the generic
case is slower even for them, and the difference is comparable to the
numbers for the real arch functions.

It makes me feel like:
 - there's something unrelated, like governor/throttling, that affects
   the results;
 - the numbers are identical, taking the dispersion into account.

If the difference really concerns you, I'd suggest running the test
several times to measure confidence intervals.

Thanks,
Yury
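
For reference, the shape Linus describes in the quoted text (handle the first, partial word, then call find_first_bit() on the rest) could look roughly like the userspace sketch below. This is only an illustration, not the kernel's lib/find_bit.c: the sketch_* names are made up, __builtin_ctzl() (a GCC/Clang builtin) stands in for the kernel's bit-scan helper, and the zero-bit and byte-swapped variants are left out entirely.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit in a non-zero word (GCC/Clang builtin). */
static unsigned long lowest_set_bit(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

/*
 * Hypothetical stand-in for find_first_bit(): loop over whole words until
 * a non-zero one turns up, then do the trivial calculation on that word.
 * Returns nbits when no bit is set.
 */
static unsigned long sketch_find_first_bit(const unsigned long *addr,
					   unsigned long nbits)
{
	for (unsigned long idx = 0; idx * BITS_PER_LONG < nbits; idx++) {
		if (addr[idx]) {
			unsigned long bit = idx * BITS_PER_LONG +
					    lowest_set_bit(addr[idx]);
			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}

/*
 * Hypothetical stand-in for find_next_bit(): mask off the bits below
 * 'start' in the first word, check it, and hand everything after that
 * word to sketch_find_first_bit().
 */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
					  unsigned long nbits,
					  unsigned long start)
{
	if (start >= nbits)
		return nbits;

	unsigned long idx = start / BITS_PER_LONG;
	unsigned long word = addr[idx] & (~0UL << (start % BITS_PER_LONG));

	if (word) {
		unsigned long bit = idx * BITS_PER_LONG + lowest_set_bit(word);
		return bit < nbits ? bit : nbits;
	}

	unsigned long base = (idx + 1) * BITS_PER_LONG;

	if (base >= nbits)
		return nbits;
	/* A "not found" result of (nbits - base) maps back to nbits here. */
	return base + sketch_find_first_bit(addr + idx + 1, nbits - base);
}
```

The point of the split is that only the first word needs the start-offset mask; everything after it is the plain word loop that already benchmarks well.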