From: Joel Fernandes
Date: Mon, 6 Nov 2023 16:32:22 -0500
Subject: Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server
To: Daniel Bristot de Oliveira
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider,
    linux-kernel@vger.kernel.org, Luca Abeni, Tommaso Cucinotta,
    Thomas Gleixner,
    Vineeth Pillai, Shuah Khan, Phil Auld

On Mon, Nov 6, 2023 at 2:32 PM Joel Fernandes wrote:
>
> Hi Daniel,
>
> On Sat, Nov 4, 2023 at 6:59 AM Daniel Bristot de Oliveira wrote:
> >
> > Among the motivations for the DL servers is the real-time throttling
> > mechanism. This mechanism works by throttling the rt_rq after
> > running for a long period without leaving space for fair tasks.
> >
> > The base dl server avoids this problem by boosting fair tasks instead
> > of throttling the rt_rq. The point is that it boosts without waiting
> > for potential starvation, causing some non-intuitive cases.
> >
> > For example, an IRQ dispatches two tasks on an idle system, a fair
> > and an RT. The DL server will be activated, running the fair task
> > before the RT one. This problem can be avoided by deferring the
> > dl server activation.
> >
> > By setting the zerolax option, the dl_server will dispatch an
> > SCHED_DEADLINE reservation with replenished runtime, but throttled.
> >
> > The dl_timer will be set for (period - runtime) ns from start time.
> > Thus boosting the fair rq on its 0-laxity time with respect to
> > rt_rq.
> >
> > If the fair scheduler has the opportunity to run while waiting
> > for zerolax time, the dl server runtime will be consumed. If
> > the runtime is completely consumed before the zerolax time, the
> > server will be replenished while still in a throttled state. Then,
> > the dl_timer will be reset to the new zerolax time.
> >
> > If the fair server reaches the zerolax time without consuming
> > its runtime, the server will be boosted, following CBS rules
> > (thus without breaking SCHED_DEADLINE).
> >
> > Signed-off-by: Daniel Bristot de Oliveira
> > ---
> >  include/linux/sched.h   |   2 +
> >  kernel/sched/deadline.c | 100 +++++++++++++++++++++++++++++++++++++++-
> >  kernel/sched/fair.c     |   3 ++
> >  3 files changed, 103 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 5ac1f252e136..56e53e6fd5a0 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -660,6 +660,8 @@ struct sched_dl_entity {
> >         unsigned int                    dl_non_contending : 1;
> >         unsigned int                    dl_overrun        : 1;
> >         unsigned int                    dl_server         : 1;
> > +       unsigned int                    dl_zerolax        : 1;
> > +       unsigned int                    dl_zerolax_armed  : 1;
> >
> >         /*
> >          * Bandwidth enforcement timer. Each -deadline task has its
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 1d7b96ca9011..69ee1fbd60e4 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -772,6 +772,14 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
> >         /* for non-boosted task, pi_of(dl_se) == dl_se */
> >         dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
> >         dl_se->runtime = pi_of(dl_se)->dl_runtime;
> > +
> > +       /*
> > +        * If it is a zerolax reservation, throttle it.
> > +        */
> > +       if (dl_se->dl_zerolax) {
> > +               dl_se->dl_throttled = 1;
> > +               dl_se->dl_zerolax_armed = 1;
> > +       }
> >  }
> >
> >  /*
> > @@ -828,6 +836,7 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
> >   * could happen are, typically, a entity voluntarily trying to overcome its
> >   * runtime, or it just underestimated it during sched_setattr().
> >   */
> > +static int start_dl_timer(struct sched_dl_entity *dl_se);
> >  static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> >  {
> >         struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > @@ -874,6 +883,28 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> >         dl_se->dl_yielded = 0;
> >         if (dl_se->dl_throttled)
> >                 dl_se->dl_throttled = 0;
> > +
> > +       /*
> > +        * If this is the replenishment of a zerolax reservation,
> > +        * clear the flag and return.
> > +        */
> > +       if (dl_se->dl_zerolax_armed) {
> > +               dl_se->dl_zerolax_armed = 0;
> > +               return;
> > +       }
> > +
> > +       /*
> > +        * At this point, if the zerolax server is not armed, and the deadline
> > +        * is in the future, throttle the server and arm the zerolax timer.
> > +        */
> > +       if (dl_se->dl_zerolax &&
> > +           dl_time_before(dl_se->deadline - dl_se->runtime, rq_clock(rq))) {
> > +               if (!is_dl_boosted(dl_se)) {
> > +                       dl_se->dl_zerolax_armed = 1;
> > +                       dl_se->dl_throttled = 1;
> > +                       start_dl_timer(dl_se);
> > +               }
> > +       }
> >  }
> >
> >  /*
> > @@ -1024,6 +1055,13 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
> >                 }
> >
> >                 replenish_dl_new_period(dl_se, rq);
> > +       } else if (dl_server(dl_se) && dl_se->dl_zerolax) {
> > +               /*
> > +                * The server can still use its previous deadline, so throttle
> > +                * and arm the zero-laxity timer.
> > +                */
> > +               dl_se->dl_zerolax_armed = 1;
> > +               dl_se->dl_throttled = 1;
> >         }
> >  }
> >
> > @@ -1056,8 +1094,20 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
> >          * We want the timer to fire at the deadline, but considering
> >          * that it is actually coming from rq->clock and not from
> >          * hrtimer's time base reading.
> > +        *
> > +        * The zerolax reservation will have its timer set to the
> > +        * deadline - runtime. At that point, the CBS rule will decide
> > +        * if the current deadline can be used, or if a replenishment
> > +        * is required to avoid adding too much pressure on the system
> > +        * (current u > U).
> >          */
> > -       act = ns_to_ktime(dl_next_period(dl_se));
> > +       if (dl_se->dl_zerolax_armed) {
> > +               WARN_ON_ONCE(!dl_se->dl_throttled);
> > +               act = ns_to_ktime(dl_se->deadline - dl_se->runtime);
>
> Just a question, here if dl_se->deadline - dl_se->runtime is large,
> then does that mean that server activation will be much more into the
> future? So say I want to give CFS 30%, then it will take 70% of the
> period before CFS preempts RT, thus "starving" CFS for this duration. I
> think that's Ok for smaller periods and runtimes, though.
>
> I think it does reserve the amount of required CFS bandwidth so it is
> probably OK, though it is perhaps letting RT run more initially (say
> if CFS tasks are not CPU bound and occasionally wake up, they will
> always be hit by the 70% latency AFAICS, which may be large for large
> periods and small runtimes).
>
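To put rough numbers on the 70% point above, this is the
back-of-the-envelope arithmetic I have in mind (a throwaway userspace
sketch of the activation point with made-up names, not the kernel
code):

/*
 * Sketch only: the deferred server timer is armed at (deadline - runtime),
 * so with an implicit deadline (deadline == period) a 30% reservation is
 * only force-activated 70% of the way into the period, assuming the fair
 * tasks did not get a chance to run earlier on their own.
 */
#include <stdio.h>

int main(void)
{
	double runtime  = 0.3;          /* CFS server runtime          */
	double period   = 1.0;          /* CFS server period           */
	double deadline = period;       /* implicit deadline           */

	double zerolax = deadline - runtime; /* 0-laxity activation point */

	printf("0-laxity activation %.2fs into the period (%.0f%% worst-case wait)\n",
	       zerolax, 100.0 * zerolax / period);
	return 0;
}

With a 0.3s/1s reservation that gives an activation point 0.7s into
the period, which is where the 70% worst-case latency above comes
from.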
One more consideration I guess is: because the server is throttled
till the 0-laxity time, it is possible that if CFS sleeps even a bit
(after the DL-server is unthrottled), then it will be pushed out to a
full current deadline + period due to CBS.

In such a situation, if the CFS-server is the only DL task running, it
might starve RT for a bit more time. Example: say the CFS runtime is
0.3s and the period is 1s. At 0.7s, the 0-laxity timer fires. CFS runs
for 0.29s, then sleeps for 0.005s and wakes up 0.295s after the
0-laxity point (i.e., at 0.995s). Its remaining runtime is 0.01s at
this point, which is more than the 0.005s left until the deadline. Now
the runtime of the CFS-server will be replenished to the full 0.3s
(due to CBS) and the deadline pushed out. The end result is that the
CFS-server actually gets about 0.59s of runtime (0.29s + 0.3s) nearly
back to back (though yes, it did sleep for 5ms in between, still
that's tiny -- say if it briefly blocked on a kernel mutex). On the
other hand, if the CFS server had started a bit earlier than the
0-laxity time, it would probably not have had CBS pushing it out.

This is likely also not an issue for shorter runtime/period values,
but the throttling till later does have a small trade-off (not saying
we should not do this, this whole series is likely a huge improvement
over the current RT throttling). I put a small sketch of the CBS
arithmetic I used for this example at the end of the mail.

There is a chance I am uttering nonsense as I am not a DL expert, so
apologies if so.

Thanks.
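For reference, here is the CBS arithmetic mentioned above, as I
understand the wakeup rule (a dl_entity_overflow()-style bandwidth
check). It is a simplified, standalone userspace sketch with my
example numbers plugged in; the struct and helper are made up for
illustration and are not the kernel code:

/*
 * Back-of-the-envelope version of the CBS wakeup rule as I understand it:
 * if the leftover runtime cannot be consumed before the current deadline
 * without exceeding the reserved bandwidth, the deadline is pushed out and
 * the runtime refilled.
 */
#include <stdio.h>
#include <stdbool.h>

struct fake_dl {
	double dl_runtime;	/* reserved runtime per period: 0.3s  */
	double dl_deadline;	/* relative deadline == period:  1.0s */
	double runtime;		/* leftover runtime at wakeup         */
	double deadline;	/* current absolute deadline          */
};

/* true if keeping the old (deadline, runtime) pair would exceed the reserved bandwidth */
static bool overflows(const struct fake_dl *d, double now)
{
	return d->runtime * d->dl_deadline > (d->deadline - now) * d->dl_runtime;
}

int main(void)
{
	struct fake_dl d = {
		.dl_runtime  = 0.3,
		.dl_deadline = 1.0,
		.runtime     = 0.01,	/* 0.3s minus the 0.29s already consumed */
		.deadline    = 1.0,	/* end of the first period               */
	};
	double now = 0.995;		/* wakeup after the 5ms sleep            */

	if (overflows(&d, now)) {
		/* CBS: push the deadline out and refill the runtime */
		d.deadline = now + d.dl_deadline;
		d.runtime  = d.dl_runtime;
	}
	printf("after wakeup: runtime=%.2fs deadline=%.3fs\n", d.runtime, d.deadline);
	return 0;
}

That prints runtime=0.30s and deadline=1.995s for the wakeup at
0.995s, which is the push-out described above.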