Subject: Re: [PATCH] sched/fair: fix enqueue_task_fair warning
From: Vincent Guittot
Date: Fri, 6 Mar 2020 13:07:04 +0100
To: Dietmar Eggemann
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall,
    Mel Gorman, linux-kernel, Christian Borntraeger, "# v4.16+"

On Fri, 6 Mar 2020 at 10:12, Vincent Guittot wrote:
>
> On Thu, 5 Mar 2020 at 20:07, Dietmar Eggemann wrote:
> >
> > On 05/03/2020 18:29, Vincent Guittot wrote:
> > > When a cfs rq is throttled, it and its children are removed from the
> > > leaf list, but their nr_running is not changed and can stay higher
> > > than 1.
> > > When a task is enqueued in this throttled branch, the cfs rqs must be
> > > added back in order to ensure correct ordering in the list, but this
> > > can only happen if nr_running == 1.
> > > When cfs bandwidth is used, we call list_add_leaf_cfs_rq()
> > > unconditionally when enqueuing an entity to make sure that the
> > > complete branch will be added.
> > >
> > > Reported-by: Christian Borntraeger
> > > Tested-by: Christian Borntraeger
> > > Cc: stable@vger.kernel.org #v5.1+
> > > Signed-off-by: Vincent Guittot
> > > ---
> > >  kernel/sched/fair.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index fcc968669aea..bdc5bb72ab31 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -4117,6 +4117,7 @@ static inline void check_schedstat_required(void)
> > >  #endif
> > >  }
> > >
> > > +static inline bool cfs_bandwidth_used(void);
> > >
> > >  /*
> > >   * MIGRATION
> > > @@ -4195,10 +4196,16 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> > >  		__enqueue_entity(cfs_rq, se);
> > >  	se->on_rq = 1;
> > >
> > > -	if (cfs_rq->nr_running == 1) {
> > > +	/*
> > > +	 * When bandwidth control is enabled, cfs might have been removed because
> > > +	 * of a parent being throttled but cfs->nr_running > 1. Try to add it
> > > +	 * unconditionally.
> > > +	 */
> > > +	if (cfs_rq->nr_running == 1 || cfs_bandwidth_used())
> > >  		list_add_leaf_cfs_rq(cfs_rq);
> > > +
> > > +	if (cfs_rq->nr_running == 1)
> > >  		check_enqueue_throttle(cfs_rq);
> > > -	}
> > >  }
> > >
> > >  static void __clear_buddies_last(struct sched_entity *se)
> >
> > I experimented with an rt-app based setup on Arm64 Juno (6 CPUs):
> >
> > cgroupv1 hierarchy A/B/C, all CFS bw controlled (30,000/100,000)
> >
> > I create A/B/C outside rt-app so I can have rt-app runs with an already
> > existing taskgroup hierarchy. There is a 4 secs gap between consecutive
> > rt-app runs.
> >
> > The rt-app file contains 6 periodic CFS tasks (25,000/100,000) running
> > in /A/B/C, /A/B, /A (3 rt-app task phases).
> >
> > I get w/ the patch (and the debug patch applied to unthrottle_cfs_rq()):
> >
> > root@juno:~#
> > [ 409.236925] CPU1 path=/A/B on_list=1 nr_running=1 throttled=1
> > [ 409.242682] CPU1 path=/A on_list=0 nr_running=0 throttled=1
> > [ 409.248260] CPU1 path=/ on_list=1 nr_running=0 throttled=0
> > [ 409.253748] ------------[ cut here ]------------
> > [ 409.258365] rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
> > [ 409.258382] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:380 unthrottle_cfs_rq+0x21c/0x2a8
> > ...
> > [ 409.275196] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc3-dirty #62
> > [ 409.281990] Hardware name: ARM Juno development board (r0) (DT)
> > ...
> > [ 409.384644] Call trace:
> > [ 409.387089]  unthrottle_cfs_rq+0x21c/0x2a8
> > [ 409.391188]  distribute_cfs_runtime+0xf4/0x198
> > [ 409.395634]  sched_cfs_period_timer+0x134/0x240
> > [ 409.400168]  __hrtimer_run_queues+0x10c/0x3c0
> > [ 409.404527]  hrtimer_interrupt+0xd4/0x250
> > [ 409.408539]  tick_handle_oneshot_broadcast+0x17c/0x208
> > [ 409.413683]  sp804_timer_interrupt+0x30/0x40
> >
> > If I add the following snippet the issue goes away:

If it's fine for you, I'm going to add this in a new version of the patch.

> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e9fd5379bb7e..5e03be046aba 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4627,11 +4627,17 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> >  			break;
> >  	}
> >
> > -	assert_list_leaf_cfs_rq(rq);
> > -
> >  	if (!se)
> >  		add_nr_running(rq, task_delta);

I will add a comment similar to the one in enqueue_task_fair:

+	/*
+	 * The cfs_rq_throttled() breaks in the above iteration can result in
+	 * incomplete leaf list maintenance, which then triggers the assertion
+	 * below.
+	 */

> > +	for_each_sched_entity(se) {
> > +		cfs_rq = cfs_rq_of(se);
> > +
> > +		list_add_leaf_cfs_rq(cfs_rq);
> > +	}
>
> Yes, makes sense.
>
> > +
> > +	assert_list_leaf_cfs_rq(rq);
> > +
> >  	/* Determine whether we need to wake up potentially idle CPU: */
> >  	if (rq->curr == rq->idle && rq->cfs.nr_running)
> >  		resched_curr(rq);
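
For completeness, with the snippet and the proposed comment folded in, the tail of
unthrottle_cfs_rq() would read roughly as below. This is only a sketch of the shape
under discussion (reconstructed around the context lines of the hunk above), not
necessarily the final form of the next version:

	for_each_sched_entity(se) {
		/* ... enqueue se and update the group nr_running accounting ... */

		if (cfs_rq_throttled(cfs_rq))
			break;
	}

	if (!se)
		add_nr_running(rq, task_delta);

	/*
	 * The cfs_rq_throttled() breaks in the above iteration can leave the
	 * leaf list only partially rebuilt, which then triggers the assertion
	 * below.
	 */
	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		list_add_leaf_cfs_rq(cfs_rq);
	}

	/* Warns when rq->tmp_alone_branch != &rq->leaf_cfs_rq_list */
	assert_list_leaf_cfs_rq(rq);

	/* Determine whether we need to wake up potentially idle CPU: */
	if (rq->curr == rq->idle && rq->cfs.nr_running)
		resched_curr(rq);

When the first loop runs to completion, se ends up NULL and the extra loop is a
no-op; it only walks the remaining part of the hierarchy when a cfs_rq_throttled()
break left the branch partially connected.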