From: Joel Fernandes
Date: Mon, 6 Nov 2023 16:37:32 -0500
Subject: Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server
To: Daniel Bristot de Oliveira
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel@vger.kernel.org, Luca Abeni, Tommaso Cucinotta, Thomas Gleixner, Vineeth Pillai, Shuah Khan, Phil Auld

On Mon, Nov 6, 2023 at 4:32 PM Joel Fernandes wrote:
>
> On Mon, Nov 6, 2023 at 2:32 PM Joel Fernandes wrote:
> >
> > Hi Daniel,
> >
> > On Sat, Nov 4, 2023 at 6:59 AM Daniel Bristot de Oliveira
> > wrote:
> > >
> > > Among the motivations for the DL servers is the real-time throttling
> > > mechanism. This mechanism works by throttling the rt_rq after
> > > running for a long period without leaving space for fair tasks.
> > >
> > > The base dl server avoids this problem by boosting fair tasks instead
> > > of throttling the rt_rq. The point is that it boosts without waiting
> > > for potential starvation, causing some non-intuitive cases.
> > >
> > > For example, an IRQ dispatches two tasks on an idle system, a fair
> > > and an RT. The DL server will be activated, running the fair task
> > > before the RT one. This problem can be avoided by deferring the
> > > dl server activation.
> > >
> > > By setting the zerolax option, the dl_server will dispatch an
> > > SCHED_DEADLINE reservation with replenished runtime, but throttled.
> > >
> > > The dl_timer will be set for (period - runtime) ns from start time.
> > > Thus boosting the fair rq on its 0-laxity time with respect to
> > > rt_rq.
> > >
> > > If the fair scheduler has the opportunity to run while waiting
> > > for zerolax time, the dl server runtime will be consumed. If
> > > the runtime is completely consumed before the zerolax time, the
> > > server will be replenished while still in a throttled state.
> > > Then, the dl_timer will be reset to the new zerolax time
> > >
> > > If the fair server reaches the zerolax time without consuming
> > > its runtime, the server will be boosted, following CBS rules
> > > (thus without breaking SCHED_DEADLINE).
> > >
> > > Signed-off-by: Daniel Bristot de Oliveira
> > > ---
> > >  include/linux/sched.h   |   2 +
> > >  kernel/sched/deadline.c | 100 +++++++++++++++++++++++++++++++++++++++-
> > >  kernel/sched/fair.c     |   3 ++
> > >  3 files changed, 103 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index 5ac1f252e136..56e53e6fd5a0 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -660,6 +660,8 @@ struct sched_dl_entity {
> > >         unsigned int dl_non_contending : 1;
> > >         unsigned int dl_overrun : 1;
> > >         unsigned int dl_server : 1;
> > > +       unsigned int dl_zerolax : 1;
> > > +       unsigned int dl_zerolax_armed : 1;
> > >
> > >         /*
> > >          * Bandwidth enforcement timer. Each -deadline task has its
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index 1d7b96ca9011..69ee1fbd60e4 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -772,6 +772,14 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
> > >         /* for non-boosted task, pi_of(dl_se) == dl_se */
> > >         dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
> > >         dl_se->runtime = pi_of(dl_se)->dl_runtime;
> > > +
> > > +       /*
> > > +        * If it is a zerolax reservation, throttle it.
> > > +        */
> > > +       if (dl_se->dl_zerolax) {
> > > +               dl_se->dl_throttled = 1;
> > > +               dl_se->dl_zerolax_armed = 1;
> > > +       }
> > >  }
> > >
> > >  /*
> > > @@ -828,6 +836,7 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
> > >   * could happen are, typically, a entity voluntarily trying to overcome its
> > >   * runtime, or it just underestimated it during sched_setattr().
> > >   */
> > > +static int start_dl_timer(struct sched_dl_entity *dl_se);
> > >  static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> > >  {
> > >         struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > > @@ -874,6 +883,28 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> > >                 dl_se->dl_yielded = 0;
> > >         if (dl_se->dl_throttled)
> > >                 dl_se->dl_throttled = 0;
> > > +
> > > +       /*
> > > +        * If this is the replenishment of a zerolax reservation,
> > > +        * clear the flag and return.
> > > +        */
> > > +       if (dl_se->dl_zerolax_armed) {
> > > +               dl_se->dl_zerolax_armed = 0;
> > > +               return;
> > > +       }
> > > +
> > > +       /*
> > > +        * A this point, if the zerolax server is not armed, and the deadline
> > > +        * is in the future, throttle the server and arm the zerolax timer.
> > > +        */
> > > +       if (dl_se->dl_zerolax &&
> > > +           dl_time_before(dl_se->deadline - dl_se->runtime, rq_clock(rq))) {
> > > +               if (!is_dl_boosted(dl_se)) {
> > > +                       dl_se->dl_zerolax_armed = 1;
> > > +                       dl_se->dl_throttled = 1;
> > > +                       start_dl_timer(dl_se);
> > > +               }
> > > +       }
> > >  }
> > >
> > >  /*
> > > @@ -1024,6 +1055,13 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
> > >                 }
> > >
> > >                 replenish_dl_new_period(dl_se, rq);
> > > +       } else if (dl_server(dl_se) && dl_se->dl_zerolax) {
> > > +               /*
> > > +                * The server can still use its previous deadline, so throttle
> > > +                * and arm the zero-laxity timer.
> > > +                */
> > > +               dl_se->dl_zerolax_armed = 1;
> > > +               dl_se->dl_throttled = 1;
> > >         }
> > >  }
> > >
> > > @@ -1056,8 +1094,20 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
> > >          * We want the timer to fire at the deadline, but considering
> > >          * that it is actually coming from rq->clock and not from
> > >          * hrtimer's time base reading.
> > > +        *
> > > +        * The zerolax reservation will have its timer set to the
> > > +        * deadline - runtime. At that point, the CBS rule will decide
> > > +        * if the current deadline can be used, or if a replenishment
> > > +        * is required to avoid add too much pressure on the system
> > > +        * (current u > U).
> > >          */
> > > -       act = ns_to_ktime(dl_next_period(dl_se));
> > > +       if (dl_se->dl_zerolax_armed) {
> > > +               WARN_ON_ONCE(!dl_se->dl_throttled);
> > > +               act = ns_to_ktime(dl_se->deadline - dl_se->runtime);
> >
> > Just a question, here if dl_se->deadline - dl_se->runtime is large,
> > then does that mean that server activation will be much more into the
> > future? So say I want to give CFS 30%, then it will take 70% of the
> > period before CFS preempts RT thus "starving" CFS for this duration. I
> > think that's Ok for smaller periods and runtimes, though.
> >
> > I think it does reserve the amount of required CFS bandwidth so it is
> > probably OK, though it is perhaps letting RT run more initially (say
> > if CFS tasks are not CPU bound and occasionally wake up, they will
> > always be hit by the 70% latency AFAICS which may be large for large
> > periods and small runtimes).
> >
>
> One more consideration I guess is, because the server is throttled
> till 0-laxity time, it is possible that if CFS sleeps even a bit
> (after the DL-server is unthrottled), then it will be pushed out to a
> full current deadline + period due to CBS. In such a situation, if
> CFS-server is the only DL task running, it might starve RT for a bit
> more time.
>
> Example, say CFS runtime is 0.3s and period is 1s. At 0.7s, 0-laxity
> timer fires. CFS runs for 0.29s, then sleeps for 0.005s and wakes up
> at 0.295s (its remaining runtime is 0.01s at this point which is < the
> "time till deadline" of 0.005s). Now the runtime of the CFS-server
> will be replenished to the full 3s (due to CBS) and the deadline
> pushed out.
> The end result is the total runtime that the CFS-server
> actually gets is 0.0595s (though yes it did sleep for 5ms in between,
> still that's tiny -- say if it briefly blocked on a kernel mutex).

Blah, I got lost in decimal points. Here's the example again:

Say CFS-server runtime is 0.3s and period is 1s.

At 0.7s, 0-laxity timer fires. CFS runs for 0.29s, then sleeps for
0.005s and wakes up at 0.995s (its remaining runtime is 0.01s at this
point, which is > the "time till deadline" of 0.005s).

Now the runtime of the CFS-server will be replenished to the full 0.3s
(due to CBS) and the deadline pushed out.

The end result is, the total runtime that the CFS-server actually gets
is 0.595s (though yes it did sleep for 5ms in between, still that's
tiny -- say if it briefly blocked on a kernel mutex). That's almost
double the allocated runtime.

This is just theoretical and I have yet to see if it is actually an
issue in practice.

Thanks.
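
The arithmetic in the corrected example can be checked with a minimal
userspace sketch of the CBS wakeup rule. This is a simplified model, not
the kernel code: the names cbs_model, zerolax_time(), cbs_overflow() and
cbs_wakeup() are made up for illustration, the relative deadline is
assumed equal to the period, and the sketch ignores both the DL_SCALE
shifts the kernel uses to avoid overflow and any re-throttling the
zerolax patch may apply after a replenishment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical, simplified model of a CBS reservation (all times in ns). */
struct cbs_model {
	uint64_t dl_runtime;	/* budget granted per period */
	uint64_t dl_period;	/* period, assumed equal to the relative deadline */
	uint64_t deadline;	/* current absolute deadline */
	uint64_t runtime;	/* remaining budget */
};

/* Zero-laxity instant: the latest start that still fits the remaining budget. */
static uint64_t zerolax_time(const struct cbs_model *s)
{
	assert(s->deadline >= s->runtime);
	return s->deadline - s->runtime;
}

/*
 * CBS wakeup check: the current (deadline, runtime) pair cannot be reused if
 *     runtime / (deadline - now) > dl_runtime / dl_period
 * The comparison is cross-multiplied to avoid division.
 */
static bool cbs_overflow(const struct cbs_model *s, uint64_t now)
{
	if (now >= s->deadline)
		return true;
	return s->runtime * s->dl_period > (s->deadline - now) * s->dl_runtime;
}

/* On wakeup, start a new period only when the check fails. */
static bool cbs_wakeup(struct cbs_model *s, uint64_t now)
{
	if (!cbs_overflow(s, now))
		return false;
	s->deadline = now + s->dl_period;
	s->runtime = s->dl_runtime;
	return true;
}
```

With dl_runtime = 300ms, dl_period = 1000ms, 10ms of budget left and a
wakeup at 995ms, the check compares 10 * 1000 against 5 * 300, so
cbs_wakeup() refills the budget to 300ms and moves the deadline to
1995ms.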