Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp202938iob; Mon, 2 May 2022 17:06:31 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzF5Qrhx0rhY8CmtGVnxTc7DOjwSkxmIauerMa/WWddlnIbtQbmElnVwfoFWQk8SiRYN6Pi X-Received: by 2002:a17:902:bc46:b0:15c:f32f:39bf with SMTP id t6-20020a170902bc4600b0015cf32f39bfmr15036194plz.32.1651536390682; Mon, 02 May 2022 17:06:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651536390; cv=none; d=google.com; s=arc-20160816; b=i4CzM2fABmYWHN7PQsh2ZRBrpq0jctowCmNRW5bXz/yhV501ehazhQWB1TZtHOC4rp hWMVRTnUbU+J7RIXVfVE079FO/BzYboCbJ1Lu0bcAwFZbDJH+KTHdUb+J8knVX4hjidI RvnW+e5GO0CoKqSgsuhqlqDcqByJgeLNv61CLnEMjxLDoYWzX1+9QkpStU5M6EcWn339 G7l3B69lZJOeSX7Ow13g8BpMODwsvC0BssaUrtguI/s8CekASyPOi/s8gQ4LTv7BrNEo g56qufEBM2Q6NcggKBoAGelhNl2iZOri6aDnPHEUugZDhT/5IXihyPeKW20+6R00SWRA glxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:dkim-signature:date; bh=K4pFDZe+cxfRhQZdm2ea58zzQK7Kjz3GOkvvWFPRfx0=; b=bmicf144KULGTBVa1iIdrol/LZ+cR03fPYsLS62iipq0jzXabiQz07ZKaI1CUY+/Rt rbUDAZL9xq60YeqOurMwKU6wKkfRrsSTwCFf/4bXipbkI/ywmeFIDsGeaRisFU61fOWC 3u47Km8RN3gTapMPRllh3wOvii38WDbAcy7tDv6cEaez1QF9Hz15caE9wIGyIKQ5F9Dv TYlSmo/0ru1sWeluop2l19rS/IXxh8I/n6lW15UUukTewhzxzTVhrvRI6HwPq3M/OPHN 2BWJWYhToGc3VHfVVTjlxgzKyQinrgWy06z9ycPOQRWpZaLw2vTCXMesFEVK9LevMVpk AbEw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux.dev header.s=key1 header.b=QMGVzTRE; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.dev Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id x20-20020a056a00189400b0050cfb4ff72dsi17350470pfh.52.2022.05.02.17.06.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 May 2022 17:06:30 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.dev header.s=key1 header.b=QMGVzTRE; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.dev Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A291C35AB2; Mon, 2 May 2022 17:05:53 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240205AbiEBPuT (ORCPT + 99 others); Mon, 2 May 2022 11:50:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58138 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239930AbiEBPuS (ORCPT ); Mon, 2 May 2022 11:50:18 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 491A711C12 for ; Mon, 2 May 2022 08:46:48 -0700 (PDT) Date: Mon, 2 May 2022 23:47:32 +0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1651506406; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=K4pFDZe+cxfRhQZdm2ea58zzQK7Kjz3GOkvvWFPRfx0=; b=QMGVzTREQa7v1eu0jAyaaQqi3z5UfGFiRCJI8gZz2NhDHFp4HJ7WjLibPN41TPukiQm4PK oiTTUXmRxryjU6FL284UvEtPCSfY6eLeLhxXegB7xebygfCMi26G1tJ3Bv+gFAMS1NBzxe hw3wKcF1npcZ7O8glJ2K/ON16kHQnPw= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Tao Zhou To: Vincent Guittot , Tao Zhou Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, linux-kernel@vger.kernel.org, parth@linux.ibm.com, qais.yousef@arm.com, chris.hyser@oracle.com, vschneid@redhat.com, patrick.bellasi@matbug.net, David.Laight@aculab.com, pjt@google.com, pavel@ucw.cz, tj@kernel.org, qperret@google.com, tim.c.chen@linux.intel.com Subject: Re: [RFC 5/6] sched/fair: Take into account latency nice at wakeup Message-ID: References: <20220311161406.23497-1-vincent.guittot@linaro.org> <20220311161406.23497-6-vincent.guittot@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,PDS_BTC_ID,RDNS_NONE,SPF_HELO_NONE, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 02, 2022 at 11:26:14PM +0800, Tao Zhou wrote: > On Mon, May 02, 2022 at 11:08:15PM +0800, Tao Zhou wrote: > > On Mon, May 02, 2022 at 02:30:55PM +0200, Vincent Guittot wrote: > > > > > On Mon, 2 May 2022 at 11:54, Vincent Guittot wrote: > > > > > > > > Hi Tao, > > > > > > > > On Sun, 1 May 2022 at 17:58, Tao Zhou wrote: > > > > > > > > > > Hi Vincent, > > > > > > > > > > Change to Valentin Schneider's now using mail address. > > > > > > > > Thanks > > > > > > > > > > > > > > On Fri, Mar 11, 2022 at 05:14:05PM +0100, Vincent Guittot wrote: > > > > > > > > > > > Take into account the nice latency priority of a thread when deciding to > > > > > > preempt the current running thread. We don't want to provide more CPU > > > > > > bandwidth to a thread but reorder the scheduling to run latency sensitive > > > > > > task first whenever possible. > > > > > > > > > > > > As long as a thread didn't use its bandwidth, it will be able to preempt > > > > > > the current thread. > > > > > > > > > > > > At the opposite, a thread with a low latency priority will preempt current > > > > > > thread at wakeup only to keep fair CPU bandwidth sharing. Otherwise it will > > > > > > wait for the tick to get its sched slice. > > > > > > > > > > > > curr vruntime > > > > > > | > > > > > > sysctl_sched_wakeup_granularity > > > > > > <--> > > > > > > ----------------------------------|----|-----------------------|--------------- > > > > > > | |<---------------------> > > > > > > | . sysctl_sched_latency > > > > > > | . > > > > > > default/current latency entity | . > > > > > > | . > > > > > > 1111111111111111111111111111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1- > > > > > > se preempts curr at wakeup ------>|<- se doesn't preempt curr ----------------- > > > > > > | . > > > > > > | . > > > > > > | . > > > > > > low latency entity | . > > > > > > ---------------------->| > > > > > > % of sysctl_sched_latency | > > > > > > 1111111111111111111111111111111111111111111111111111111111|0000|-1-1-1-1-1-1-1- > > > > > > preempt ------------------------------------------------->|<- do not preempt -- > > > > > > | . > > > > > > | . > > > > > > | . > > > > > > high latency entity | . > > > > > > |<-----------------------| . > > > > > > | % of sysctl_sched_latency . > > > > > > 111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 > > > > > > preempt->|<- se doesn't preempt curr ------------------------------------------ > > > > > > > > > > > > Tests results of nice latency impact on heavy load like hackbench: > > > > > > > > > > > > hackbench -l (2560 / group) -g group > > > > > > group latency 0 latency 19 > > > > > > 1 1.450(+/- 0.60%) 1.376(+/- 0.54%) + 5% > > > > > > 4 1.537(+/- 1.75%) 1.335(+/- 1.81%) +13% > > > > > > 8 1.414(+/- 1.91%) 1.348(+/- 1.88%) + 5% > > > > > > 16 1.423(+/- 1.65%) 1.374(+/- 0.58%) + 3% > > > > > > > > > > > > hackbench -p -l (2560 / group) -g group > > > > > > group > > > > > > 1 1.416(+/- 3.45%) 0.886(+/- 0.54%) +37% > > > > > > 4 1.634(+/- 7.14%) 0.888(+/- 5.40%) +45% > > > > > > 8 1.449(+/- 2.14%) 0.804(+/- 4.55%) +44% > > > > > > 16 0.917(+/- 4.12%) 0.777(+/- 1.41%) +15% > > > > > > > > > > > > By deacreasing the latency prio, we reduce the number of preemption at > > > > > > > > > > s/deacreasing/decreasing/ > > > > > > > > yes > > > > > > > > > s/reduce/increase/ > > > > > > > > not in the case of hackbench tests above. By decreasing the latency > > > > prio of all hackbench threads, we make sure that they will not preempt > > > > the current thread and let it move forward so we reduce the number of > > > > preemption. > > > > > > > > > > > > > > > wakeup and help hackbench making progress. > > > > > > > > > > > > Test results of nice latency impact on short live load like cyclictest > > > > > > while competing with heavy load like hackbench: > > > > > > > > > > > > hackbench -l 10000 -g group & > > > > > > cyclictest --policy other -D 5 -q -n > > > > > > latency 0 latency -20 > > > > > > group min avg max min avg max > > > > > > 0 16 17 28 15 17 27 > > > > > > 1 61 382 10603 63 89 4628 > > > > > > 4 52 437 15455 54 98 16238 > > > > > > 8 56 728 38499 61 125 28983 > > > > > > 16 53 1215 52207 61 183 80751 > > > > > > > > > > > > group = 0 means that hackbench is not running. > > > > > > > > > > > > The avg is significantly improved with nice latency -20 especially with > > > > > > large number of groups but min and max remain quite similar. If we add the > > > > > > histogram parameters to get details of latency, we have : > > > > > > > > > > > > hackbench -l 10000 -g 16 & > > > > > > cyclictest --policy other -D 5 -q -n -H 20000 --histfile data.txt > > > > > > latency 0 latency -20 > > > > > > Min Latencies: 63 62 > > > > > > Avg Latencies: 1397 218 > > > > > > Max Latencies: 44926 42291 > > > > > > 50% latencies: 100 98 > > > > > > 75% latencies: 762 116 > > > > > > 85% latencies: 1118 126 > > > > > > 90% latencies: 1540 130 > > > > > > 95% latencies: 5610 138 > > > > > > 99% latencies: 13738 266 > > > > > > > > > > > > With percentile details, we see the benefit of nice latency -20 as > > > > > > 1% of the latencies stays above 266us whereas the default latency has > > > > > > got 25% are above 762us and 10% over the 1ms. > > > > > > > > > > > > > > [..] > > > > > > > > > > +static long wakeup_latency_gran(int latency_weight) > > > > > > +{ > > > > > > + long thresh = sysctl_sched_latency; > > > > > > + > > > > > > + if (!latency_weight) > > > > > > + return 0; > > > > > > + > > > > > > + if (sched_feat(GENTLE_FAIR_SLEEPERS)) > > > > > > + thresh >>= 1; > > > > > > + > > > > > > + /* > > > > > > + * Clamp the delta to stay in the scheduler period range > > > > > > + * [-sysctl_sched_latency:sysctl_sched_latency] > > > > > > + */ > > > > > > + latency_weight = clamp_t(long, latency_weight, > > > > > > + -1 * NICE_LATENCY_WEIGHT_MAX, > > > > > > + NICE_LATENCY_WEIGHT_MAX); > > > > > > + > > > > > > + return (thresh * latency_weight) >> NICE_LATENCY_SHIFT; > > > > > > +} > > > > > > + > > > > > > static unsigned long wakeup_gran(struct sched_entity *se) > > > > > > { > > > > > > unsigned long gran = sysctl_sched_wakeup_granularity; > > > > > > @@ -7008,6 +7059,10 @@ static int > > > > > > wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se) > > > > > > { > > > > > > s64 gran, vdiff = curr->vruntime - se->vruntime; > > > > > > + int latency_weight = se->latency_weight - curr->latency_weight; > > > > > > + > > > > > > + latency_weight = min(latency_weight, se->latency_weight); > > > > > > > > > > Let lable A=se->latency_weight, B=curr->latency_weight, C=A-B. > > > > > > > > > > 1 A>0 B>0 > > > > > ::C=A-B<0, min(C,A)=C, latency decrease, C is the real diff value no matter > > > > > what A is. That means it can be very 'long'(-sched_latency) and vdiff is > > > > > more possible to be in <= 0 case and return -1. > > > > > ::C=A-B>0, min(C,A)=A, latency increase, but it is conservative. Limit to > > > > > > > > A>0 and B>0 then min(C=A-B, A) = C > > > > It's my mistake. And in this case the calculating of value added to vdiff > > for latency increase and deleted to vdiff for latency decrease is the same. > > > > > > > > > > > A/1024*sched_latency. > > > > > When latecy is decreased, the negtive value added to vdiff is larger than the > > > > > positive value added to vdiff when latency is increased. > > > > > > > > When the weight > 0, the tasks have some latency requirements so we > > > > use their relative priority to decide if we can preempt current which > > > > also has some latency requirement > > > > > > > > > > > > > > 2 A>0 B<0 > > > > > ::C=A-B>0 and C>A, min(C,A)=A, latency increase and it is conservative. > > > > > > For this one we want to use delta like for UC 1 above > > > > > > > > > > > > > 3 A<0 B<0 > > > > > ::C=A-B>0, min(C,A)=A, latency increase but min(C,A)<0, I think if latency > > > > > increase means the value added to vdiff will be a positive value to increase > > > > > the chance to return 1. I would say it is max(C,A)=C > > > > > ::C=A-B<0, min(C,A)=A, latency decrease and the real negtive value. > > > > > > > > When the weight < 0, the tasks haven't latency requirement and even > > > > don't care of being scheduled "quickly". In this case, we don't care > > > > about the relative priority but we want to minimize the preemption of > > > > current so we use the weight > > > > > > > > > > > > > > 4 A<0,B>0 > > > > > ::C=A-B<0, min(C,A)=C, latency decrease and the real negtive value. > > > > > > And for this one we probably want to use A like for other UC with A < 0 > > > > > > I'm going to update the way the weight is computed to match this > > > > According to your explanations, the min(C,A) is used for A and B both in > > the negtive region or in the postive region, the max(C,A) is use for A and > > B both not in the same region. > > > > if ((A>>31)^(B>>31)) > > max(C,A) > > else > > min(C,A) > > Not right. > > if ((A&(1<<31))^(B(1<<31))) > max(C,A) > else > min(C,A) > > > > > wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se) > > { > > s64 gran, vdiff = curr->vruntime - se->vruntime; > > int latency_weight = se->latency_weight - curr->latency_weight; > > > > if ((se->latency_weight>>(WMULT_SHIFT-1)) ^ > > curr->latency_weight>>(WMULT_SHIFT-1)) > > latency_weight = max(latency_weight, se->latency_weight); > > else > > latency_weight = min(latency_weight, se->latency_weight); > > > > vdiff += wakeup_latency_gran(latency_weight); > > ... > > } > > Not right. > > wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se) > { > s64 gran, vdiff = curr->vruntime - se->vruntime; > int latency_weight = se->latency_weight - curr->latency_weight; > > if ((se->latency_weight & (1 << WMULT_SHIFT-1)) ^ > curr->latency_weight & (1 << WMULT_SHIFT-1)) > latency_weight = max(latency_weight, se->latency_weight); > else > latency_weight = min(latency_weight, se->latency_weight); > > vdiff += wakeup_latency_gran(latency_weight); > ... > } I already on bed, but I think they are the same.. heh > > > > > > > > > > Is there a reason that the value when latency increase and latency decrease > > > > > be treated differently. Latency increase value is limited to se's latency_weight > > > > > > > > I have tried to explain why I treat differently if weight is > 0 or < 0. > > > > > > > > > but latency decrease value can extend to -sched_latency or treat them the same. > > > > > Penalty latency decrease and conserve latency increase. > > > > > > > > > > > > > > > There is any value that this latency_weight can be considered to be a average. > > > > > > > > > > The delta value choose is ~%5 to 1024. %5*sched_latency=0.05*6ms=0.3ms.(no scale) > > > > > I do not think over that vdiff equation and others though. > > > > > > > > > > Thanks, > > > > > Tao > > > > > > + vdiff += wakeup_latency_gran(latency_weight); > > > > > > > > > > > > if (vdiff <= 0) > > > > > > return -1; > > > > > > @@ -7117,6 +7172,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_ > > > > > > return; > > > > > > > > > > > > update_curr(cfs_rq_of(se)); > > > > > > + > > > > > > if (wakeup_preempt_entity(se, pse) == 1) { > > > > > > /* > > > > > > * Bias pick_next to pick the sched entity that is > > > > > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > > > > > > index 456ad2159eb1..dd92aa9c36f9 100644 > > > > > > --- a/kernel/sched/sched.h > > > > > > +++ b/kernel/sched/sched.h > > > > > > @@ -122,6 +122,17 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count); > > > > > > * Default tasks should be treated as a task with latency_nice = 0. > > > > > > */ > > > > > > #define DEFAULT_LATENCY_NICE 0 > > > > > > +#define DEFAULT_LATENCY_PRIO (DEFAULT_LATENCY_NICE + LATENCY_NICE_WIDTH/2) > > > > > > + > > > > > > +/* > > > > > > + * Convert user-nice values [ -20 ... 0 ... 19 ] > > > > > > + * to static latency [ 0..39 ], > > > > > > + * and back. > > > > > > + */ > > > > > > +#define NICE_TO_LATENCY(nice) ((nice) + DEFAULT_LATENCY_PRIO) > > > > > > +#define LATENCY_TO_NICE(prio) ((prio) - DEFAULT_LATENCY_PRIO) > > > > > > +#define NICE_LATENCY_SHIFT (SCHED_FIXEDPOINT_SHIFT) > > > > > > +#define NICE_LATENCY_WEIGHT_MAX (1L << NICE_LATENCY_SHIFT) > > > > > > > > > > > > /* > > > > > > * Increase resolution of nice-level calculations for 64-bit architectures. > > > > > > @@ -2098,6 +2109,7 @@ static_assert(WF_TTWU == SD_BALANCE_WAKE); > > > > > > > > > > > > extern const int sched_prio_to_weight[40]; > > > > > > extern const u32 sched_prio_to_wmult[40]; > > > > > > +extern const int sched_latency_to_weight[40]; > > > > > > > > > > > > /* > > > > > > * {de,en}queue flags: > > > > > > -- > > > > > > 2.17.1 > > > > > >