Date: Fri, 28 Feb 2020 16:32:03 +0000
From: Qais Yousef
To: Christian Borntraeger
Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, "linux-kernel@vger.kernel.org"
Subject: Re: 5.6-rc3: WARNING: CPU: 48 PID: 17435 at kernel/sched/fair.c:380 enqueue_task_fair+0x328/0x440
Message-ID: <20200228163202.aebqzo6n363oqdg5@e107158-lin.cambridge.arm.com>
References: <1a607a98-f12a-77bd-2062-c3e599614331@de.ibm.com>

On 02/28/20 16:42, Christian Borntraeger wrote:
>
>
> On 28.02.20 16:37, Vincent Guittot wrote:
> > On Fri, 28 Feb 2020 at 16:08, Christian Borntraeger
> > wrote:
> >>
> >> Also happened with 5.4:
> >> Seems that I just happen to have an interesting test workload/system size interaction
> >> on a newly installed system that triggers this.
> >
> > you will probably go back to 5.1 which is the version where we put
> > back the deletion of unused cfs_rq from the list which can trigger the
> > warning:
> > commit 039ae8bcf7a5 : (Fix O(nr_cgroups) in the load balancing path)
> >
> > AFAICT, we haven't changed this since
>
> So you do know what is the problem? If not is there any debug option or
> patch that I could apply to give you more information?
>

It might be a long shot as I'm not particularly knowledgeable about this code
path, but could we be missing rcu_read_lock/unlock around the call to
unthrottle_cfs_rq() here?

---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fc1dfc007604..56aa5cfbb7f1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7434,6 +7434,7 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 
 	raw_spin_unlock_irq(&cfs_b->lock);
 
+	rcu_read_lock();
 	for_each_online_cpu(i) {
 		struct cfs_rq *cfs_rq = tg->cfs_rq[i];
 		struct rq *rq = cfs_rq->rq;
@@ -7447,6 +7448,7 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 			unthrottle_cfs_rq(cfs_rq);
 		rq_unlock_irq(rq, &rf);
 	}
+	rcu_read_unlock();
 	if (runtime_was_enabled && !runtime_enabled)
 		cfs_bandwidth_usage_dec();
 out_unlock: