Subject: Re: [PATCH 1/7] sched: Fix pick_next_task() vs change pattern race
From: Valentin Schneider
To: Peter Zijlstra, mingo@kernel.org, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, juri.lelli@redhat.com, rostedt@goodmis.org,
    bsegall@google.com, mgorman@suse.de
Cc: linux-kernel@vger.kernel.org, qperret@google.com, qais.yousef@arm.com,
    ktkhai@virtuozzo.com
Date: Fri, 8 Nov 2019 16:05:25 +0000
References: <20191108131553.027892369@infradead.org>
            <20191108131909.428842459@infradead.org>
In-Reply-To: <20191108131909.428842459@infradead.org>

On 08/11/2019 13:15, Peter Zijlstra wrote:
> Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
> Reported-by: Quentin Perret
> Signed-off-by: Peter Zijlstra (Intel)

I've been running the same reproducer as Quentin for a similar length of
time (~3h) and it's still going strong, so FWIW:

Tested-by: Valentin Schneider

> ---
>  kernel/sched/core.c      | 21 +++++++++++++++------
>  kernel/sched/deadline.c  | 40 ++++++++++++++++++++--------------------
>  kernel/sched/fair.c      | 15 ++++++++++++---
>  kernel/sched/idle.c      |  9 ++++++++-
>  kernel/sched/rt.c        | 37 +++++++++++++++++++------------------
>  kernel/sched/sched.h     | 30 +++++++++++++++++++++++++++---
>  kernel/sched/stop_task.c | 18 +++++++++++-------
>  7 files changed, 112 insertions(+), 58 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3929,13 +3929,22 @@ pick_next_task(struct rq *rq, struct tas
>  	}
>
>  restart:
> +#ifdef CONFIG_SMP

I suppose we could dump that in a core balance_prev_task() to avoid the
inline #ifdeffery, but eh...

>  	/*
> -	 * Ensure that we put DL/RT tasks before the pick loop, such that they
> -	 * can PULL higher prio tasks when we lower the RQ 'priority'.
> +	 * We must do the balancing pass before put_next_task(), such
> +	 * that when we release the rq->lock the task is in the same
> +	 * state as before we took rq->lock.
> +	 *
> +	 * We can terminate the balance pass as soon as we know there is
> +	 * a runnable task of @class priority or higher.
>  	 */
> -	prev->sched_class->put_prev_task(rq, prev, rf);
> -	if (!rq->nr_running)
> -		newidle_balance(rq, rf);
> +	for_class_range(class, prev->sched_class, &idle_sched_class) {
> +		if (class->balance(rq, prev, rf))
> +			break;
> +	}
> +#endif
> +
> +	put_prev_task(rq, prev);
>
>  	for_each_class(class) {
>  		p = class->pick_next_task(rq, NULL, NULL);

> @@ -1469,6 +1469,22 @@ static void check_preempt_equal_prio(str
>  		resched_curr(rq);
>  }
>
> +static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
> +{
> +	if (!on_rt_rq(&p->rt) && need_pull_rt_task(rq, p)) {
> +		/*
> +		 * This is OK, because current is on_cpu, which avoids it being
> +		 * picked for load-balance and preemption/IRQs are still
> +		 * disabled avoiding further scheduler activity on it and we've
> +		 * not yet started the picking loop.
> +		 */
> +		rq_unpin_lock(rq, rf);
> +		pull_rt_task(rq);
> +		rq_repin_lock(rq, rf);
> +	}
> +
> +	return sched_stop_runnable(rq) || sched_dl_runnable(rq) || sched_rt_runnable(rq);

So we already have some dependencies on the class ordering (e.g. fair->idle),
but I'm wondering if it would make sense to have these runnable functions be
defined as sched_class callbacks, e.g.

  rt_sched_class.runnable = rt_runnable

w/ rt_runnable() just being a non-inlined sched_rt_runnable() you define
further down the patch (or a wrapper to it). The balance return pattern
could then become:

  for_class_range(class, sched_class_highest, rt_sched_class.next)
          if (class->runnable(rq))
                  return true;

  return false;

(with rt_sched_class replaced by whichever class the balance callback
belongs to)

It's a bit neater, but I'm pretty sure it's going to run worse :/ The only
unaffected one would be fair, since newidle_balance() already does that
"for free". I've appended a rough sketch of what I mean at the bottom.

> +}
>  #endif /* CONFIG_SMP */
>
>  /*
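
For concreteness, here's a rough sketch of the callback variant. Entirely
untested and uncompiled, and the ->runnable member, rt_runnable() and
runnable_above() are names I'm making up on the spot:

  /* sched.h: new callback on struct sched_class */
  	bool		(*runnable)(struct rq *rq);

  /* rt.c: non-inlined wrapper around this patch's sched_rt_runnable() */
  static bool rt_runnable(struct rq *rq)
  {
  	return sched_rt_runnable(rq);
  }

  /* ... wired up in the class definition: */
  	.runnable	= rt_runnable,

  /*
   * core.c: common tail for the balance_*() callbacks. Walks the classes
   * from the highest one down to (but excluding) @below, and reports
   * whether any of them has a runnable task.
   */
  static bool runnable_above(struct rq *rq, const struct sched_class *below)
  {
  	const struct sched_class *class;

  	for_class_range(class, sched_class_highest, below)
  		if (class->runnable(rq))
  			return true;

  	return false;
  }

balance_rt()'s return would then just be

  	return runnable_above(rq, rt_sched_class.next);

and likewise for dl/stop with their own class. The indirect calls in that
loop are my main performance worry compared to the inlined || chain.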