Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic
To: Joel Fernandes
Cc: paulmck@kernel.org, linux-kernel@vger.kernel.org, vpillai, Aaron Lu,
    Aubrey Li, peterz@infradead.org, Ben Segall, Dietmar Eggemann,
    Ingo Molnar, Juri Lelli, Mel Gorman, Steven Rostedt, Vincent Guittot
References: <20200313232918.62303-1-joel@joelfernandes.org>
 <20200314003004.GI3199@paulmck-ThinkPad-P72>
 <20200323152126.GA141027@google.com>
From: "Li, Aubrey"
Message-ID: <6d933ce2-75e3-6469-4bb0-08ce9df29139@linux.intel.com>
Date: Tue, 24 Mar 2020 11:01:27 +0800
In-Reply-To: <20200323152126.GA141027@google.com>

On 2020/3/23 23:21, Joel Fernandes wrote:
> On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
>> On 2020/3/14 8:30, Paul E. McKenney wrote:
>>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
>>>> rcu_read_unlock() can incur an infrequent deadlock in
>>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>>>>
>>>> This fixes the following spinlock recursion observed when testing the
>>>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
>>>>
>>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
>>>>
>>>
>>> The original could indeed deadlock, and this would avoid that deadlock.
>>> (The commit to solve this deadlock is sadly not yet in mainline.)
>>>
>>> Acked-by: Paul E. McKenney
>>
>> I saw this in dmesg with this patch, is it expected?
>>
>> [ 117.000905] =============================
>> [ 117.000907] WARNING: suspicious RCU usage
>> [ 117.000911] 5.5.7+ #160 Not tainted
>> [ 117.000913] -----------------------------
>> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
>> [ 117.000918]
>> other info that might help us debug this:
>
> Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
> PoV, the code is correct (the warning doesn't cause any issue).
>
> To silence the warning, we could replace the rcu_read_lock_sched() in my patch with:
>
> preempt_disable();
> rcu_read_lock();
>
> and replace the unlock with:
>
> rcu_read_unlock();
> preempt_enable();
>
> That should take care of both the warning and the scheduler-related
> deadlock. Thoughts?
>

How about this?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a01df3e..7ff694e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
 	int cpu = cpu_of(rq);
 
 	rcu_read_lock();
-	raw_spin_unlock_irq(rq_lockp(rq));
 	for_each_domain(cpu, sd) {
 		if (!(sd->flags & SD_LOAD_BALANCE))
 			break;
@@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
 
 		if (steal_cookie_task(cpu, sd))
 			break;
 	}
-	raw_spin_lock_irq(rq_lockp(rq));
 	rcu_read_unlock();
 }

Thanks,
-Aubrey
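
For comparison, here is a rough sketch (not a patch from this thread) of what
Joel's preempt_disable()/rcu_read_lock() suggestion would look like applied to
his earlier patch, which keeps the raw_spin_unlock_irq()/raw_spin_lock_irq()
pair that Aubrey's diff above removes. sched_core_balance(), rq_lockp() and
steal_cookie_task() come from the out-of-tree core scheduling series quoted
above; the struct sched_domain declaration and the elided middle of the loop
are assumed from context.

static void sched_core_balance(struct rq *rq)
{
	struct sched_domain *sd;	/* assumed declaration, needed by for_each_domain() */
	int cpu = cpu_of(rq);

	/*
	 * preempt_disable() preserves the RCU-sched style protection that
	 * rcu_read_lock_sched() provided, while the plain rcu_read_lock()
	 * satisfies the rcu_dereference_check() annotation inside
	 * for_each_domain(), so the lockdep warning goes away.
	 */
	preempt_disable();
	rcu_read_lock();
	raw_spin_unlock_irq(rq_lockp(rq));
	for_each_domain(cpu, sd) {
		if (!(sd->flags & SD_LOAD_BALANCE))
			break;

		/* ... other bail-out checks elided, as in the hunks above ... */

		if (steal_cookie_task(cpu, sd))
			break;
	}
	raw_spin_lock_irq(rq_lockp(rq));
	rcu_read_unlock();
	preempt_enable();
}

The trade-off differs from Aubrey's proposal: this variant drops the rq lock
across the domain walk and relies on preemption being disabled, whereas the
diff above simply keeps the rq lock held for the whole loop.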