From: Nick Desaulniers
Date: Thu, 4 Mar 2021 09:34:29 -0800
Subject: Re: [PATCH v2] sched: Optimize __calc_delta.
To: Josh Don
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Nathan Chancellor, LKML,
    clang-built-linux, Clement Courbet, Oleg Rombakh, Bill Wendling
References: <20210303224653.2579656-1-joshdon@google.com>
In-Reply-To: <20210303224653.2579656-1-joshdon@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 3, 2021 at 2:48 PM Josh Don wrote:
>
> From: Clement Courbet
>
> A significant portion of __calc_delta time is spent in the loop
> shifting a u64 by 32 bits. Use `fls` instead of iterating.
>
> This is ~7x faster on benchmarks.
>
> The generic `fls` implementation (`generic_fls`) is still ~4x faster
> than the loop.
> Architectures that have a better implementation will make use of it. For
> example, on X86 we get an additional factor 2 in speed without dedicated
> implementation.
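
For readers unfamiliar with `fls`: it returns the 1-based index of the
most significant set bit, or 0 for a zero input. The sketch below is a
userspace illustration of a branch-based generic variant next to a naive
bit-at-a-time loop. It is modelled on the usual binary-search approach
and is not copied from the tree; the names fls_loop, fls_generic and
fls_mismatches are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Naive reference: count how many single-bit right shifts empty x. */
static int fls_loop(uint32_t x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Binary search over the bit positions: at most five tests per call. */
static int fls_generic(uint32_t x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
	if (!(x & 0xff000000u)) { x <<= 8;  r -= 8; }
	if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4; }
	if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2; }
	if (!(x & 0x80000000u)) { r -= 1; }
	return r;
}

/* Hypothetical self-check: number of sample inputs where the two differ. */
static int fls_mismatches(void)
{
	int bad = 0;

	for (int b = 0; b < 32; b++) {
		uint32_t v = UINT32_C(1) << b;

		bad += fls_generic(v) != fls_loop(v);
		bad += fls_generic(v | 1u) != fls_loop(v | 1u);
		bad += fls_generic(v - 1u) != fls_loop(v - 1u);
	}
	return bad;
}
```

Calling fls_mismatches() from a small main() should return 0 if the two
variants agree on every sampled input.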
>
> On gcc, the asm versions of `fls` are about the same speed as the
> builtin. On clang, the versions that use fls are more than twice as
> slow as the builtin. This is because the way the `fls` function is
> written, clang puts the value in memory:
> https://godbolt.org/z/EfMbYe. This bug is filed at
> https://bugs.llvm.org/show_bug.cgi?id=49406.

Hi Josh,
Thanks for helping get this patch across the finish line. Would you
mind updating the commit message to point to
https://bugs.llvm.org/show_bug.cgi?id=20197?

>
> ```
> name                                   cpu/op
> BM_Calc<__calc_delta_loop>             9.57ms ±12%
> BM_Calc<__calc_delta_generic_fls>      2.36ms ±13%
> BM_Calc<__calc_delta_asm_fls>          2.45ms ±13%
> BM_Calc<__calc_delta_asm_fls_nomem>    1.66ms ±12%
> BM_Calc<__calc_delta_asm_fls64>        2.46ms ±13%
> BM_Calc<__calc_delta_asm_fls64_nomem>  1.34ms ±15%
> BM_Calc<__calc_delta_builtin>          1.32ms ±11%
> ```
>
> Signed-off-by: Clement Courbet
> Signed-off-by: Josh Don
> ---
>  kernel/sched/fair.c  | 19 +++++++++++--------
>  kernel/sched/sched.h |  1 +
>  2 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a8bd7b13634..a691371960ae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -229,22 +229,25 @@ static void __update_inv_weight(struct load_weight *lw)
>  static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)
>  {
>  	u64 fact = scale_load_down(weight);
> +	u32 fact_hi = (u32)(fact >> 32);
>  	int shift = WMULT_SHIFT;
> +	int fs;
>
>  	__update_inv_weight(lw);
>
> -	if (unlikely(fact >> 32)) {
> -		while (fact >> 32) {
> -			fact >>= 1;
> -			shift--;
> -		}
> +	if (unlikely(fact_hi)) {
> +		fs = fls(fact_hi);
> +		shift -= fs;
> +		fact >>= fs;
>  	}
>
>  	fact = mul_u32_u32(fact, lw->inv_weight);
>
> -	while (fact >> 32) {
> -		fact >>= 1;
> -		shift--;
> +	fact_hi = (u32)(fact >> 32);
> +	if (fact_hi) {
> +		fs = fls(fact_hi);
> +		shift -= fs;
> +		fact >>= fs;
>  	}
>
>  	return mul_u64_u32_shr(delta_exec, fact, shift);
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 10a1522b1e30..714af71cf983 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -36,6 +36,7 @@
>  #include
>
>  #include
> +#include

This hunk of the patch is curious. I assume that bitops.h is needed
for fls(); if so, why not #include it in kernel/sched/fair.c?
Otherwise this potentially hurts compile time for all TUs that include
kernel/sched/sched.h.

>  #include
>  #include
>  #include
> --
> 2.30.1.766.gb4fecdf3b7-goog
>

-- 
Thanks,
~Nick Desaulniers
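
As a sanity check on the quoted fair.c hunk, here is a minimal userspace
sketch comparing the old one-bit-at-a-time normalization with the
fls-based one. It is not the kernel code: fls32 is a stand-in built on
the GCC/Clang __builtin_clz, the helper names are made up, and the
starting shift of 32 is an assumption standing in for WMULT_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for fls(): 1-based index of the top set bit, 0 for x == 0. */
static int fls32(uint32_t x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Old behaviour: shift right one bit at a time until fact fits in 32 bits. */
static uint64_t normalize_loop(uint64_t fact, int *shift)
{
	while (fact >> 32) {
		fact >>= 1;
		(*shift)--;
	}
	return fact;
}

/* Patched behaviour: a single shift by fls() of the upper 32 bits. */
static uint64_t normalize_fls(uint64_t fact, int *shift)
{
	uint32_t fact_hi = (uint32_t)(fact >> 32);

	if (fact_hi) {
		int fs = fls32(fact_hi);

		*shift -= fs;
		fact >>= fs;
	}
	return fact;
}

/* Compare both variants on one input; returns 1 if they disagree. */
static int normalize_differs(uint64_t fact)
{
	int s_loop = 32, s_fls = 32;	/* assumed value of WMULT_SHIFT */
	uint64_t a = normalize_loop(fact, &s_loop);
	uint64_t b = normalize_fls(fact, &s_fls);

	return a != b || s_loop != s_fls;
}

/* Hypothetical self-check over values around every bit boundary. */
static int normalize_mismatches(void)
{
	int bad = normalize_differs(UINT64_MAX);

	for (int b = 0; b < 64; b++) {
		uint64_t v = UINT64_C(1) << b;

		bad += normalize_differs(v);
		bad += normalize_differs(v | 1u);
		bad += normalize_differs(v - 1u);
	}
	return bad;
}
```

The two variants agree because the loop runs exactly once for each bit
of fact above bit 31, which is the value fls() reports for the upper
half. Calling normalize_mismatches() from a small main() should return 0.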