Date: Wed, 1 Aug 2018 23:19:53 -0400
From: Steven Rostedt
To: Juri Lelli
Cc: peterz@infradead.org, mingo@redhat.com, mark.rutland@arm.com, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, bristot@redhat.com
Subject: Re: [PATCH] sched/deadline: Fix switched_from_dl
Message-ID: <20180801231953.151fdce6@vmware.local.home>
In-Reply-To: <20180711072948.27061-1-juri.lelli@redhat.com>
References: <20180711072948.27061-1-juri.lelli@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 11 Jul 2018 09:29:48 +0200
Juri Lelli wrote:

> Mark noticed that syzkaller is able to reliably trigger
> the following
>
>   dl_rq->running_bw > dl_rq->this_bw
>   WARNING: CPU: 1 PID: 153 at kernel/sched/deadline.c:124 switched_from_dl+0x454/0x608
>   Kernel panic - not syncing: panic_on_warn set ...
>
>   CPU: 1 PID: 153 Comm: syz-executor253 Not tainted 4.18.0-rc3+ #29
>   Hardware name: linux,dummy-virt (DT)
>   Call trace:
>    dump_backtrace+0x0/0x458
>    show_stack+0x20/0x30
>    dump_stack+0x180/0x250
>    panic+0x2dc/0x4ec
>    __warn_printk+0x0/0x150
>    report_bug+0x228/0x2d8
>    bug_handler+0xa0/0x1a0
>    brk_handler+0x2f0/0x568
>    do_debug_exception+0x1bc/0x5d0
>    el1_dbg+0x18/0x78
>    switched_from_dl+0x454/0x608
>    __sched_setscheduler+0x8cc/0x2018
>    sys_sched_setattr+0x340/0x758
>    el0_svc_naked+0x30/0x34
>
> syzkaller reproducer runs a bunch of threads that constantly switch
> between DEADLINE and NORMAL classes while interacting through futexes.
>
> The splat above is caused by the fact that if a DEADLINE task is setattr
> back to NORMAL while in non_contending state (blocked on a futex -
> inactive timer armed), its contribution to running_bw is not removed
> before sub_rq_bw() gets called (!task_on_rq_queued() branch) and the
> latter sees running_bw > this_bw.
>
> Fix it by removing a task contribution from running_bw if the task is
> not queued and in non_contending state while switched to a different
> class.
>
> Reported-by: Mark Rutland
> Signed-off-by: Juri Lelli
> ---
>  kernel/sched/deadline.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index fbfc3f1d368a..10c7b51c0d1f 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2290,8 +2290,17 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
>  	if (task_on_rq_queued(p) && p->dl.dl_runtime)
>  		task_non_contending(p);
>
> -	if (!task_on_rq_queued(p))
> +	if (!task_on_rq_queued(p)) {
> +		/*
> +		 * Inactive timer is armed. However, p is leaving DEADLINE and
> +		 * might migrate away from this rq while continuing to run on
> +		 * some other class. We need to remove its contribution from
> +		 * this rq running_bw now, or sub_rq_bw (below) will complain.
> +		 */
> +		if (p->dl.dl_non_contending)
> +			sub_running_bw(&p->dl, &rq->dl);
>  		sub_rq_bw(&p->dl, &rq->dl);
> +	}
>
>  	/*
>  	 * We cannot use inactive_task_timer() to invoke sub_running_bw()

Looking at this code:

	if (!task_on_rq_queued(p)) {
		/*
		 * Inactive timer is armed. However, p is leaving DEADLINE and
		 * might migrate away from this rq while continuing to run on
		 * some other class. We need to remove its contribution from
		 * this rq running_bw now, or sub_rq_bw (below) will complain.
		 */
		if (p->dl.dl_non_contending)
			sub_running_bw(&p->dl, &rq->dl);
		sub_rq_bw(&p->dl, &rq->dl);
	}

	/*
	 * We cannot use inactive_task_timer() to invoke sub_running_bw()
	 * at the 0-lag time, because the task could have been migrated
	 * while SCHED_OTHER in the meanwhile.
	 */
	if (p->dl.dl_non_contending)
		p->dl.dl_non_contending = 0;

Question. Is the "dl_non_contending" only able to be set if
!task_on_rq_queued(p) is true? In that case, we could just clear it in
the first if block. If it's not true, I would think the subtraction is
needed regardless.

-- Steve
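
To make the accounting in the quoted commit message concrete, here is a rough userspace sketch of the invariant that the WARNING checks (running_bw must not exceed this_bw). It is not kernel code: every toy_* name, the struct layouts, and the bandwidth value of 100 are made up for illustration, and it only models the !task_on_rq_queued() branch with a single blocked task. It does not answer whether dl_non_contending can be set while the task is still queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-rq DEADLINE bandwidth counters. */
struct toy_dl_rq {
	uint64_t running_bw;	/* active and non-contending tasks */
	uint64_t this_bw;	/* all DEADLINE tasks accounted to this rq */
};

/* Hypothetical stand-in for the task fields discussed in this thread. */
struct toy_dl_task {
	uint64_t dl_bw;
	bool on_rq_queued;
	bool dl_non_contending;	/* inactive timer armed */
};

static void toy_sub_running_bw(struct toy_dl_task *p, struct toy_dl_rq *rq)
{
	rq->running_bw -= p->dl_bw;
}

/* Returns false where the real code would warn: running_bw > this_bw. */
static bool toy_sub_rq_bw(struct toy_dl_task *p, struct toy_dl_rq *rq)
{
	rq->this_bw -= p->dl_bw;
	return rq->running_bw <= rq->this_bw;
}

/* Models only the !task_on_rq_queued() branch, with or without the patch. */
static bool toy_switched_from_dl(struct toy_dl_task *p, struct toy_dl_rq *rq,
				 bool with_fix)
{
	bool ok = true;

	if (!p->on_rq_queued) {
		if (with_fix && p->dl_non_contending)
			toy_sub_running_bw(p, rq);
		ok = toy_sub_rq_bw(p, rq);
	}
	p->dl_non_contending = false;
	return ok;
}

/* One task blocked on a futex, still counted in both running_bw and this_bw. */
static bool toy_run(bool with_fix)
{
	struct toy_dl_rq rq = { .running_bw = 100, .this_bw = 100 };
	struct toy_dl_task p = {
		.dl_bw = 100,
		.on_rq_queued = false,
		.dl_non_contending = true,
	};

	return toy_switched_from_dl(&p, &rq, with_fix);
}
```

Calling toy_run(false) from a small main() leaves running_bw at 100 while this_bw drops to 0, which is the condition the splat reports. toy_run(true) subtracts from running_bw first, so both counters reach 0 and the check passes.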