From: Lucas Stach
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    linux-kernel@vger.kernel.org, kernel@pengutronix.de,
    patchwork-lst@pengutronix.de
Date: Mon, 31 Aug 2020 13:07:19 +0200
Message-Id: <20200831110719.2126930-1-l.stach@pengutronix.de>
X-Mailer: git-send-email 2.20.1
Subject: [PATCH] sched/deadline: Fix stale throttling on de-/boosted tasks

When a boosted task gets throttled, what normally happens is that it's
immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
runtime and clears the dl_throttled flag. There is a special case,
however: if the throttling happened on sched-out and the task has been
deboosted in the meantime, the replenish is skipped, as the task will
return to its normal scheduling class. This leaves the task with the
dl_throttled flag set.

Now if the task gets boosted up to the deadline scheduling class again
while it is sleeping, it's still in the throttled state. The normal
wakeup, however, will enqueue the task with ENQUEUE_REPLENISH not set, so
we don't actually place it on the rq. Thus we end up with a task that is
runnable but not actually on the rq: neither does an immediate
replenishment happen, nor is the replenishment timer set up, so the task
is stuck in forever-throttled limbo.

Clear the dl_throttled flag before dropping back to the normal scheduling
class to fix this issue.

Signed-off-by: Lucas Stach
---
This is the root cause and fix of the issue described at [1]. After
working on other stuff for the last few months, I was finally able to
circle back to this issue and gather the required data to pinpoint the
failure mode.

[1] https://lkml.org/lkml/2020/3/20/765
---
 kernel/sched/deadline.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 3862a28cd05d..c19c1883d695 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1527,12 +1527,15 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 		pi_se = &pi_task->dl;
 	} else if (!dl_prio(p->normal_prio)) {
 		/*
-		 * Special case in which we have a !SCHED_DEADLINE task
-		 * that is going to be deboosted, but exceeds its
-		 * runtime while doing so. No point in replenishing
-		 * it, as it's going to return back to its original
-		 * scheduling class after this.
+		 * Special case in which we have a !SCHED_DEADLINE task that is going
+		 * to be deboosted, but exceeds its runtime while doing so. No point in
+		 * replenishing it, as it's going to return back to its original
+		 * scheduling class after this. If it has been throttled, we need to
+		 * clear the flag, otherwise the task may wake up as throttled after
+		 * being boosted again with no means to replenish the runtime and clear
+		 * the throttle.
 		 */
+		p->dl.dl_throttled = 0;
 		BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
 		return;
 	}
-- 
2.20.1
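
For readers who want to step through the failure sequence from the commit message, here is a toy user-space model of it. It is only a sketch and not the kernel code: struct toy_task, toy_enqueue(), simulate(), the TOY_ENQUEUE_* values and the with_fix switch are all hypothetical names made up for illustration, and the model assumes a task whose own scheduling class is not deadline and tracks nothing beyond a throttled flag and an on-rq flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag values: hypothetical stand-ins for the kernel's ENQUEUE_* flags. */
enum {
	TOY_ENQUEUE_WAKEUP    = 1 << 0,
	TOY_ENQUEUE_REPLENISH = 1 << 1,
};

/* Heavily reduced, hypothetical task state; the task itself is !SCHED_DEADLINE. */
struct toy_task {
	bool has_dl_donor;	/* currently boosted by a deadline task */
	bool dl_throttled;	/* ran out of runtime */
	bool on_rq;		/* placed on the (toy) runqueue */
};

/* Toy enqueue path; with_fix selects whether the deboost case clears the flag. */
static void toy_enqueue(struct toy_task *p, int flags, bool with_fix)
{
	if (!p->has_dl_donor) {
		/* Deboost special case: skip the replenish, the task leaves the class. */
		assert(flags == TOY_ENQUEUE_REPLENISH);
		if (with_fix)
			p->dl_throttled = false;
		return;
	}

	/* A throttled task is only put back by a replenishing enqueue. */
	if (p->dl_throttled && !(flags & TOY_ENQUEUE_REPLENISH))
		return;

	if (flags & TOY_ENQUEUE_REPLENISH)
		p->dl_throttled = false;
	p->on_rq = true;
}

/* Replays the sequence from the commit message; returns whether the task ends up on the rq. */
bool simulate(bool with_fix)
{
	struct toy_task p = { .has_dl_donor = true };

	p.dl_throttled = true;		/* throttled on sched-out */
	p.has_dl_donor = false;		/* deboosted in the meantime */
	toy_enqueue(&p, TOY_ENQUEUE_REPLENISH, with_fix);

	p.has_dl_donor = true;		/* boosted again while sleeping */
	toy_enqueue(&p, TOY_ENQUEUE_WAKEUP, with_fix);

	return p.on_rq;
}
```

In this model, simulate(false) leaves the task off the rq with the stale throttled flag still set, while simulate(true) places it on the rq at wakeup. The model has no replenishment timer at all, which matches the stuck case described above, where nothing is left that could clear the flag later.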