From: Valentin Schneider
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Patrick Bellasi, Chris Redpath, Lukasz Luba,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
Date: Fri, 19 Jun 2020 11:36:46 +0100
User-agent: mu4e 0.9.17; emacs 26.3
In-reply-to: <20200618195525.7889-3-qais.yousef@arm.com>
References: <20200618195525.7889-1-qais.yousef@arm.com>
 <20200618195525.7889-3-qais.yousef@arm.com>

On 18/06/20 20:55, Qais Yousef wrote:
> There is a report that when uclamp is enabled, a netperf UDP test
> regresses compared to a kernel compiled without uclamp.
>
> https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
>

ISTR the perennial form for those is:

  https://lkml.kernel.org/r/<message-id>

> While investigating the root cause, there were no sign that the uclamp
> code is doing anything particularly expensive but could suffer from bad
> cache behavior under certain circumstances that are yet to be
> understood.
>
> https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

Ditto on the URL

>
> To reduce the pressure on the fast path anyway, add a static key that is
                                                                        ^^
s/is//

> by default will skip executing uclamp logic in the
> enqueue/dequeue_task() fast path in the until it's needed.
                                   ^^^^^^
s/in the//

> As soon as the user start using util clamp by:
>
>     1. Changing uclamp value of a task with sched_setattr()
>     2. Modifying the default sysctl_sched_util_clamp_{min, max}
>     3. Modifying the default cpu.uclamp.{min, max} value in cgroup
>
> We flip the static key now that the user has opted to use util clamp.
> Effectively re-introducing uclamp logic in the enqueue/dequeue_task()
> fast path. It stays on from that point forward until the next reboot.
>
> This should help minimize the effect of util clamp on workloads that
> don't need it but still allow distros to ship their kernels with uclamp
> compiled in by default.
>
> SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since now we can end
> up with unbalanced call to uclamp_rq_dec_id() if we flip the key while
> a task is running in the rq. Since we know it is harmless we just
> quietly return if we attempt a uclamp_rq_dec_id() when
> rq->uclamp[].bucket[].tasks is 0.
>

I have an extra comment about that down in the diff.

Also, I think it would be worth mentioning in the changelog why we use
the same static key with different likelihoods - unlikely in infrequent
paths, and likely in the eq/dq hotpath (see the sketch below, after the
table).

> The following results demonstrates how this helps on 2 Sockets Xeon E5
> 2x10-Cores system.
>
>                        nouclamp                uclamp     uclamp-static-key
> Hmean  send-64      162.43 (   0.00%)    157.84 *  -2.82%*    163.39 *   0.59%*
> Hmean  send-128     324.71 (   0.00%)    314.78 *  -3.06%*    326.18 *   0.45%*
> Hmean  send-256     641.55 (   0.00%)    628.67 *  -2.01%*    648.12 *   1.02%*
> Hmean  send-1024   2525.28 (   0.00%)   2448.26 *  -3.05%*   2543.73 *   0.73%*
> Hmean  send-2048   4836.14 (   0.00%)   4712.08 *  -2.57%*   4867.69 *   0.65%*
> Hmean  send-3312   7540.83 (   0.00%)   7425.45 *  -1.53%*   7621.06 *   1.06%*
> Hmean  send-4096   9124.53 (   0.00%)   8948.82 *  -1.93%*   9276.25 *   1.66%*
> Hmean  send-8192  15589.67 (   0.00%)  15486.35 *  -0.66%*  15819.98 *   1.48%*
> Hmean  send-16384 26386.47 (   0.00%)  25752.25 *  -2.40%*  26773.74 *   1.47%*
>

Am I reading this correctly, in that compiling in uclamp but having the
static key enabled gives a slight improvement compared to not compiling
in uclamp? I suppose the important bit is that we're not seeing
regressions anymore, but still.
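To spell out the likelihood remark above, the pattern I have in mind
looks roughly like the below - just a sketch against this patch's
sched_uclamp_unused key; uclamp_opt_in() is an invented name, not
something from the patch:

#include <linux/jump_label.h>

/* Defaults to true: uclamp is unused until somebody opts in. */
DEFINE_STATIC_KEY_TRUE(sched_uclamp_unused);

static inline void uclamp_rq_inc_sketch(void)
{
	/*
	 * Hot path: static_branch_likely() makes the early bail the
	 * straight-line code, so while the key is still set this is a
	 * single patched jump over all the uclamp accounting.
	 */
	if (static_branch_likely(&sched_uclamp_unused))
		return;

	/* ... clamp bucket refcounting would go here ... */
}

/* Invented helper for the slow paths (sched_setattr(), sysctl, cgroup). */
static void uclamp_opt_in(void)
{
	/*
	 * Once the first user shows up the key stays flipped for good,
	 * so the *unlikely* annotation here matches the expectation
	 * that this branch is almost never taken again.
	 */
	if (static_branch_unlikely(&sched_uclamp_unused))
		static_branch_disable(&sched_uclamp_unused);
}

i.e. it's the same key everywhere, but each site annotates which way it
expects the branch to go, so the rarely-executed side lands out of line.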
> Reported-by: Mel Gorman
> Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> Signed-off-by: Qais Yousef
> Cc: Juri Lelli
> Cc: Vincent Guittot
> Cc: Dietmar Eggemann
> Cc: Steven Rostedt
> Cc: Ben Segall
> Cc: Mel Gorman
> CC: Patrick Bellasi
> Cc: Chris Redpath
> Cc: Lukasz Luba
> Cc: linux-kernel@vger.kernel.org
>
>  kernel/sched/core.c | 56 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 49 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e19d2b915406..0824e1bfb484 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -993,9 +1012,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
>  	lockdep_assert_held(&rq->lock);
>
>  	bucket = &uc_rq->bucket[uc_se->bucket_id];
> -	SCHED_WARN_ON(!bucket->tasks);
> -	if (likely(bucket->tasks))
> -		bucket->tasks--;
> +
> +	/*
> +	 * This could happen if sched_uclamp_unused was disabled while the
> +	 * current task was running, hence we could end up with unbalanced call
> +	 * to uclamp_rq_dec_id().
> +	 */
> +	if (unlikely(!bucket->tasks))
> +		return;

I'm slightly worried about silent returns for cases like these; can we
try to cook something up to preserve the previous SCHED_WARN_ON()? Say,
something like the horrendous below - alternatively it might be feasible
with some clever p->on_rq flag.

---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b62e6aaf28f0..09a7891eb481 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -695,6 +695,9 @@ struct task_struct {
 	struct uclamp_se		uclamp_req[UCLAMP_CNT];
 	/* Effective clamp values used for a scheduling entity */
 	struct uclamp_se		uclamp[UCLAMP_CNT];
+#ifdef CONFIG_SCHED_DEBUG
+	int				uclamp_unused_enqueue;
+#endif
 #endif

 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a712dcb682b..2a723e9d5219 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1018,8 +1018,10 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	 * current task was running, hence we could end up with unbalanced call
 	 * to uclamp_rq_dec_id().
 	 */
-	if (unlikely(!bucket->tasks))
+	if (unlikely(!bucket->tasks)) {
+		SCHED_WARN_ON(!p->uclamp_unused_enqueue);
 		return;
+	}

 	bucket->tasks--;
 	uc_se->active = false;
@@ -1049,8 +1051,16 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
 {
 	enum uclamp_id clamp_id;

-	if (static_branch_likely(&sched_uclamp_unused))
+#ifdef CONFIG_SCHED_DEBUG
+	p->uclamp_unused_enqueue = 0;
+#endif
+
+	if (static_branch_likely(&sched_uclamp_unused)) {
+#ifdef CONFIG_SCHED_DEBUG
+		p->uclamp_unused_enqueue = 1;
+#endif
 		return;
+	}

 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
@@ -1075,6 +1085,10 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)

 	for_each_clamp_id(clamp_id)
 		uclamp_rq_dec_id(rq, p, clamp_id);
+
+#ifdef CONFIG_SCHED_DEBUG
+	p->uclamp_unused_enqueue = 0;
+#endif
 }

 static inline void
---
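FWIW, if the #ifdef sprinkling above is too horrendous even for a debug
aid, the same bookkeeping could hide behind a pair of trivial helpers -
again just a sketch, reusing the uclamp_unused_enqueue field from the
diff above (the helper names are invented):

#include <linux/sched.h>

#ifdef CONFIG_SCHED_DEBUG
/* Record at enqueue time whether uclamp accounting was skipped. */
static inline void uclamp_record_skipped(struct task_struct *p, int skipped)
{
	p->uclamp_unused_enqueue = skipped;
}

static inline bool uclamp_enqueue_was_skipped(struct task_struct *p)
{
	return p->uclamp_unused_enqueue;
}
#else
static inline void uclamp_record_skipped(struct task_struct *p, int skipped) { }
/* SCHED_WARN_ON() is a no-op here anyway; return true so it stays quiet. */
static inline bool uclamp_enqueue_was_skipped(struct task_struct *p)
{
	return true;
}
#endif

Then uclamp_rq_inc()/uclamp_rq_dec() shrink to unconditional
uclamp_record_skipped(p, 0/1) calls, and the check in uclamp_rq_dec_id()
stays a one-liner:

	SCHED_WARN_ON(!uclamp_enqueue_was_skipped(p));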