From: Sedat Dilek
Reply-To: sedat.dilek@gmail.com
Date: Thu, 4 Mar 2021 20:21:03 +0100
Subject: Re: [PATCH v2] sched: Optimize __calc_delta.
To: Nick Desaulniers
Cc: Josh Don, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Nathan Chancellor, LKML, clang-built-linux,
    Clement Courbet, Oleg Rombakh, Bill Wendling
References: <20210303224653.2579656-1-joshdon@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 4, 2021 at 7:24 PM Sedat Dilek wrote:
>
> On Thu, Mar 4, 2021 at 6:34 PM 'Nick Desaulniers' via Clang Built
> Linux wrote:
> >
> > On Wed, Mar 3, 2021 at 2:48 PM Josh Don wrote:
> > >
> > > From: Clement Courbet
> > >
> > > A significant portion of __calc_delta time is spent in the loop
> > > shifting a u64 by 32 bits. Use `fls` instead of iterating.
> > >
> > > This is ~7x faster on benchmarks.
> > >
> > > The generic `fls` implementation (`generic_fls`) is still ~4x faster
> > > than the loop.
> > > Architectures that have a better implementation will make use of it. For
> > > example, on X86 we get an additional factor 2 in speed without a dedicated
> > > implementation.
> > >
> > > On gcc, the asm versions of `fls` are about the same speed as the
> > > builtin. On clang, the versions that use fls are more than twice as
> > > slow as the builtin. This is because of the way the `fls` function is
> > > written: clang puts the value in memory:
> > > https://godbolt.org/z/EfMbYe. This bug is filed at
> > > https://bugs.llvm.org/show_bug.cgi?id=49406.
> >
> > Hi Josh, Thanks for helping get this patch across the finish line.
> > Would you mind updating the commit message to point to
> > https://bugs.llvm.org/show_bug.cgi?id=20197?
> >
> > >
> > > ```
> > > name                                   cpu/op
> > > BM_Calc<__calc_delta_loop>             9.57ms ±12%
> > > BM_Calc<__calc_delta_generic_fls>      2.36ms ±13%
> > > BM_Calc<__calc_delta_asm_fls>          2.45ms ±13%
> > > BM_Calc<__calc_delta_asm_fls_nomem>    1.66ms ±12%
> > > BM_Calc<__calc_delta_asm_fls64>        2.46ms ±13%
> > > BM_Calc<__calc_delta_asm_fls64_nomem>  1.34ms ±15%
> > > BM_Calc<__calc_delta_builtin>          1.32ms ±11%
> > > ```
> > >
> > > Signed-off-by: Clement Courbet
> > > Signed-off-by: Josh Don
> > > ---
> > >  kernel/sched/fair.c  | 19 +++++++++++--------
> > >  kernel/sched/sched.h |  1 +
> > >  2 files changed, 12 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 8a8bd7b13634..a691371960ae 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -229,22 +229,25 @@ static void __update_inv_weight(struct load_weight *lw)
> > >  static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)
> > >  {
> > >         u64 fact = scale_load_down(weight);
> > > +       u32 fact_hi = (u32)(fact >> 32);
> > >         int shift = WMULT_SHIFT;
> > > +       int fs;
> > >
> > >         __update_inv_weight(lw);
> > >
> > > -       if (unlikely(fact >> 32)) {
> > > -               while (fact >> 32) {
> > > -                       fact >>= 1;
> > > -                       shift--;
> > > -               }
> > > +       if (unlikely(fact_hi)) {
> > > +               fs = fls(fact_hi);
> > > +               shift -= fs;
> > > +               fact >>= fs;
> > >         }
> > >
> > >         fact = mul_u32_u32(fact, lw->inv_weight);
> > >
> > > -       while (fact >> 32) {
> > > -               fact >>= 1;
> > > -               shift--;
> > > +       fact_hi = (u32)(fact >> 32);
> > > +       if (fact_hi) {
> > > +               fs = fls(fact_hi);
> > > +               shift -= fs;
> > > +               fact >>= fs;
> > >         }
> > >
> > >         return mul_u64_u32_shr(delta_exec, fact, shift);
> > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > index 10a1522b1e30..714af71cf983 100644
> > > --- a/kernel/sched/sched.h
> > > +++ b/kernel/sched/sched.h
> > > @@ -36,6 +36,7 @@
> > >  #include <uapi/linux/sched/types.h>
> > >
> > >  #include <linux/binfmts.h>
> > > +#include <linux/bitops.h>
> >
> > This hunk of the patch is curious. I assume that bitops.h is needed
> > for fls(); if so, why not #include it in kernel/sched/fair.c?
> > Otherwise this potentially hurts compile time for all TUs that include
> > kernel/sched/sched.h.
> >
>
> I have v2 as-is in my custom patchset and booted right now on bare metal.
>
> As Nick points out, moving the include makes sense to me.
> We have a lot of includes in the wrong places, increasing build time.
>

I tried with the attached patch.

$ LC_ALL=C ll kernel/sched/fair.o
-rw-r--r-- 1 dileks dileks 1.2M Mar  4 20:11 kernel/sched/fair.o

- Sedat -

Attachment: 0001-sched-fair-Move-include-after-__calc_delta-optimizat.patch

From afd45cd78c21960c6e937021f095e5f8f51fef7a Mon Sep 17 00:00:00 2001
From: Sedat Dilek <sedat.dilek@gmail.com>
Date: Thu, 4 Mar 2021 20:05:30 +0100
Subject: [PATCH] sched/fair: Move include after __calc_delta optimization
 change

Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
---
 kernel/sched/fair.c  | 2 ++
 kernel/sched/sched.h | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5fda1751fbd1..b9f10ae92e3f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -20,6 +20,8 @@
  *  Adaptive scheduling granularity, math enhancements by Peter Zijlstra
  *  Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
  */
+#include <linux/bitops.h>
+
 #include "sched.h"
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 714af71cf983..10a1522b1e30 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -36,7 +36,6 @@
 #include <uapi/linux/sched/types.h>
 
 #include <linux/binfmts.h>
-#include <linux/bitops.h>
 #include <linux/blkdev.h>
 #include <linux/compat.h>
 #include <linux/context_tracking.h>
-- 
2.30.1
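
For anyone who wants to convince themselves that the fls() variant in the quoted v2 patch computes the same thing as the old loop, here is a small userspace sketch. It is not kernel code: fls32(), normalize_loop(), normalize_fls(), variants_agree() and self_check() are hypothetical names made up for this illustration, fls32() stands in for the kernel's fls() using the GCC/Clang builtin __builtin_clz(), and the starting shift of 32 is only an assumed example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, 0 when x == 0. Assumes a GCC/Clang compiler.
 */
static int fls32(uint32_t x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Old approach: shift one bit at a time until fact fits in 32 bits. */
static uint64_t normalize_loop(uint64_t fact, int *shift)
{
	while (fact >> 32) {
		fact >>= 1;
		(*shift)--;
	}
	return fact;
}

/* Patched approach: get the whole shift distance from a single fls(). */
static uint64_t normalize_fls(uint64_t fact, int *shift)
{
	uint32_t fact_hi = (uint32_t)(fact >> 32);

	if (fact_hi) {
		int fs = fls32(fact_hi);

		*shift -= fs;
		fact >>= fs;
	}
	return fact;
}

/* Returns 1 when both variants agree on the value and on the shift. */
static int variants_agree(uint64_t fact)
{
	int s1 = 32, s2 = 32; /* assumed starting shift, for illustration */
	uint64_t r1 = normalize_loop(fact, &s1);
	uint64_t r2 = normalize_fls(fact, &s2);

	return r1 == r2 && s1 == s2;
}

/* Checks a few boundary values around the 32-bit limit. */
static void self_check(void)
{
	static const uint64_t samples[] = {
		0, 1, 0xFFFFFFFFULL, 0x100000000ULL,
		0x123456789ABCDEFULL, 0xFFFFFFFFFFFFFFFFULL,
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(variants_agree(samples[i]));
}
```

Calling self_check() from a test main() should pass silently. The reasoning is that if the highest set bit of the upper 32 bits sits at 1-based position k, the loop runs exactly k times, which is what fls() returns for that upper half.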