Date: Tue, 6 Nov 2018 12:44:52 +0100
From: Juri Lelli
To: Peter Zijlstra
Cc: luca abeni, Thomas Gleixner, Juri Lelli, syzbot, Borislav Petkov,
 "H. Peter Anvin", LKML, mingo@redhat.com, nstange@suse.de,
 syzkaller-bugs@googlegroups.com, henrik@austad.us, Tommaso Cucinotta,
 Claudio Scordino, Daniel Bristot de Oliveira
Subject: Re: INFO: rcu detected stall in do_idle
Message-ID: <20181106114452.GV18091@localhost.localdomain>
In-Reply-To: <20181030111221.GA18091@localhost.localdomain>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On 30/10/18 12:12, Juri Lelli wrote:
> On 30/10/18 11:45, Peter Zijlstra wrote:
> 
> [...]
> 
> > Hurm.. right. We knew of this issue back when we did it. I suppose
> > now it hurts and we need to figure something out.
> > 
> > By virtue of being a real-time class, we do indeed need to have
> > deadline on the wall-clock. But if we then don't account runtime on
> > that same clock, but on a potentially slower clock, we get the
> > problem that we can run longer than our period/deadline, which is
> > what we're running into here I suppose.
> > 
> > And yes, at some point RT workloads need to be aware of the jitter
> > injected by things like IRQs and such.
> > But I believe the rationale was that for soft real-time workloads
> > this current semantic was 'easier', because we get to ignore IRQ
> > overhead for workload estimation etc.
> 
> Right. In this case the task is self-injecting IRQ load, but that
> maybe doesn't make a big difference to how we need to treat it
> (supposing we can actually distinguish the two cases).
> 
> > What we could maybe do is track runtime in both rq_clock_task() and
> > rq_clock() and detect where the rq_clock based one exceeds the
> > period, and then push out the deadline (and add runtime).
> > 
> > Maybe something along such lines; does that make sense?
> 
> Yeah, I think I've got the gist of the idea. I'll play with it.

So, even though I consider Luca and Daniel's points/concerns more than
valid, I spent a bit of time playing with this idea and ended up with
what follows.

I'm adding a deadline adjustment when we realize that there is a
difference between delta_exec and delta_wall (IRQ load) and walltime
is bigger than the configured dl_period. The adjustment to rq_clock
(even though it of course de-synchs user/kernel deadlines) seems
needed: without this kind of thing the task doesn't get throttled for
long enough (a dl_period at max, I guess) after having been running
for quite a while.

Another change is that I believe we want to keep runtime always below
dl_runtime while pushing the deadline away.

This doesn't seem to modify behavior for tasks that are well behaved
and unaffected by IRQ load, and it should fix the stalls and the
starvation of lower priority activities.

Does it make any sense?
:-)

--->8---
 include/linux/sched.h   |  3 ++
 kernel/sched/deadline.c | 78 ++++++++++++++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 977cb57d7bc9..e5aaf6deab7e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -521,6 +521,9 @@ struct sched_dl_entity {
 	u64			deadline;	/* Absolute deadline for this instance	*/
 	unsigned int		flags;		/* Specifying the scheduler behaviour	*/
 
+	u64			wallstamp;
+	s64			walltime;
+
 	/*
 	 * Some bool flags:
 	 *
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 93da990ba458..061562589282 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -696,16 +696,8 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
 	if (dl_se->dl_yielded && dl_se->runtime > 0)
 		dl_se->runtime = 0;
 
-	/*
-	 * We keep moving the deadline away until we get some
-	 * available runtime for the entity. This ensures correct
-	 * handling of situations where the runtime overrun is
-	 * arbitrary large.
-	 */
-	while (dl_se->runtime <= 0) {
-		dl_se->deadline += pi_se->dl_period;
-		dl_se->runtime += pi_se->dl_runtime;
-	}
+	/* XXX have to deal with PI */
+	dl_se->walltime = 0;
 
 	/*
@@ -917,7 +909,7 @@ static int start_dl_timer(struct task_struct *p)
 	 * that it is actually coming from rq->clock and not from
 	 * hrtimer's time base reading.
 	 */
-	act = ns_to_ktime(dl_next_period(dl_se));
+	act = ns_to_ktime(dl_se->deadline - dl_se->dl_period);
 	now = hrtimer_cb_get_time(timer);
 	delta = ktime_to_ns(now) - rq_clock(rq);
 	trace_printk("%s: cpu:%d pid:%d dl_runtime:%llu dl_deadline:%llu dl_period:%llu runtime:%lld deadline:%llu rq_clock:%llu rq_clock_task:%llu act:%lld now:%lld delta:%lld",
@@ -1174,9 +1166,10 @@ static void update_curr_dl(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_dl_entity *dl_se = &curr->dl;
-	u64 delta_exec, scaled_delta_exec;
+	u64 delta_exec, scaled_delta_exec, delta_wall;
 	int cpu = cpu_of(rq);
-	u64 now;
+	u64 now, wall;
+	bool adjusted = false;
 
 	if (!dl_task(curr) || !on_dl_rq(dl_se))
 		return;
@@ -1191,12 +1184,28 @@ static void update_curr_dl(struct rq *rq)
 	 */
 	now = rq_clock_task(rq);
 	delta_exec = now - curr->se.exec_start;
+
+	wall = rq_clock(rq);
+	delta_wall = wall - dl_se->wallstamp;
+
 	if (unlikely((s64)delta_exec <= 0)) {
 		if (unlikely(dl_se->dl_yielded))
 			goto throttle;
 		return;
 	}
 
+	if (delta_wall > 0) {
+		dl_se->walltime += delta_wall;
+		dl_se->wallstamp = wall;
+	}
+
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
@@ -1230,22 +1239,47 @@ static void update_curr_dl(struct rq *rq)
 
 	dl_se->runtime -= scaled_delta_exec;
 
+	if (dl_runtime_exceeded(dl_se) ||
+	    dl_se->dl_yielded ||
+	    unlikely(dl_se->walltime > dl_se->dl_period)) {
 throttle:
-	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
 		dl_se->dl_throttled = 1;
 
 		/* If requested, inform the user about runtime overruns. */
 		if (dl_runtime_exceeded(dl_se) &&
 		    (dl_se->flags & SCHED_FLAG_DL_OVERRUN))
 			dl_se->dl_overrun = 1;
 
-		__dequeue_task_dl(rq, curr, 0);
+		/*
+		 * We keep moving the deadline away until we get some available
+		 * runtime for the entity. This ensures correct handling of
+		 * situations where the runtime overrun is arbitrary large.
+		 */
+		while (dl_se->runtime <= 0 || dl_se->walltime > (s64)dl_se->dl_period) {
+
+			if (delta_exec != delta_wall &&
+			    dl_se->walltime > (s64)dl_se->dl_period && !adjusted) {
+				dl_se->deadline = wall;
+				adjusted = true;
+			}
+
+			dl_se->deadline += dl_se->dl_period;
+
+			if (dl_se->runtime <= 0)
+				dl_se->runtime += dl_se->dl_runtime;
+
+			dl_se->walltime -= dl_se->dl_period;
+		}
+
+		WARN_ON_ONCE(dl_se->runtime > dl_se->dl_runtime);
+
 		if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
 			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
@@ -1783,9 +1817,10 @@ pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	p = dl_task_of(dl_se);
 
 	p->se.exec_start = rq_clock_task(rq);
+	dl_se->wallstamp = rq_clock(rq);
 
 	/* Running task will never be pushed. */
-	dequeue_pushable_dl_task(rq, p);
+	dequeue_pushable_dl_task(rq, p);
 
 	if (hrtick_enabled(rq))
 		start_hrtick_dl(rq, p);
@@ -1843,6 +1878,7 @@ static void set_curr_task_dl(struct rq *rq)
 	struct task_struct *p = rq->curr;
 
 	p->se.exec_start = rq_clock_task(rq);
+	p->dl.wallstamp = rq_clock(rq);
 
 	/* You can't push away the running task */
 	dequeue_pushable_dl_task(rq, p);