Received: by 2002:a25:e74b:0:0:0:0:0 with SMTP id e72csp1131863ybh; Thu, 16 Jul 2020 04:10:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxyy+Pk5XZrd7ETGA/NvOC3r6tyj7zGndsYEhbppHN68oIMaVjCvf0wzVbPOtbhjzt5xYtI X-Received: by 2002:a17:906:1c0d:: with SMTP id k13mr3323792ejg.477.1594897811964; Thu, 16 Jul 2020 04:10:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594897811; cv=none; d=google.com; s=arc-20160816; b=A+98oSMJ6Fuu0yOqi0I9/1qOdX9DaBZnaW5szAg+LrvL9++oeIMvi/n1CFaL6EZMsW H3cm/vt0I4y1aKedj57WfJhsReE1BejPwq5q+FcAV1t/yXVu6qtVma/Pu4cHImyGJG9m 0q/Jswb0JHwSMgMIqN9wflvvyGY4Ugp/Y491aWJbVRXxKKEA4qHYjwEdmXnFGQfKhmxL 4zxaabeppb6MMNiY3RZJdd0h+h6+DbsGXfcJFA6zfhdRhEsRf4fT5m2gv6rU9qErb5MB MLCum5ljMM3TBTru1GCyLedmhw+1hNDMfi0Lx+Xz/NsUj8oXwf9MC6PcEntV47fkCpGY sIDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from; bh=l6PZcLwzmZU6PY1t8+maGoF5MInW+WYsopNfrNRwtHw=; b=raOu+CXgCveiqhOuA6mCRUnhFF2LehCfKCoDpWn6i0t2auLxdsrWUGwAozkYwgsBQz QWHdXnHzAJdCzx75laMCwf2ua0XNKeP49dlsUhn4gpSlsaJmy3m/Hcr36oPgM2ZmFTu4 NEkcxmLEBsOFhFPiu6zowOL2GG7zJY4jBOatBx9SJ9aIo7QisFSkpr48QxlJdS0Qjs3j gFQ3bEJO2zvLm+GYqqJLKVNTHquOKyeIjY5sD+nwS5dIEJeV54713Qpv8i3ARK6YRQb4 655iilto45jcnWYnGjcsd0q/55znFKKkyC1/nTkvBWywzVeZ+Oy1tL7kDMX/xvnDVKgE xuVQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g21si3014247ejr.275.2020.07.16.04.09.48; Thu, 16 Jul 2020 04:10:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728164AbgGPLHh (ORCPT + 99 others); Thu, 16 Jul 2020 07:07:37 -0400 Received: from foss.arm.com ([217.140.110.172]:58280 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726383AbgGPLHf (ORCPT ); Thu, 16 Jul 2020 07:07:35 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8019F11B3; Thu, 16 Jul 2020 04:07:34 -0700 (PDT) Received: from e107158-lin.cambridge.arm.com (e107158-lin.cambridge.arm.com [10.1.195.21]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B9FB93F68F; Thu, 16 Jul 2020 04:07:32 -0700 (PDT) From: Qais Yousef To: Ingo Molnar , Peter Zijlstra Cc: Doug Anderson , Qais Yousef , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Patrick Bellasi , Chris Redpath , Lukasz Luba , linux-kernel@vger.kernel.org Subject: [PATCH v7 3/3] sched/uclamp: Fix a deadlock when enabling uclamp static key Date: Thu, 16 Jul 2020 12:03:47 +0100 Message-Id: <20200716110347.19553-4-qais.yousef@arm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200716110347.19553-1-qais.yousef@arm.com> References: <20200716110347.19553-1-qais.yousef@arm.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The following splat was caught when setting uclamp value of a task. BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:49 ====================================================== in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 731, name: l_3-1 WARNING: possible circular locking dependency detected 5.8.0-rc4-00040-g6345b3305877-dirty #864 Not tainted INFO: lockdep is turned off. ------------------------------------------------------ l_0-0/730 is trying to acquire lock: irq event stamp: 150 ffff80001343dea0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x1c/0x38 el0_svc_common.constprop.4+0xe4/0x200 but task is already holding lock: ffff00097ef4ca58 (&rq->lock){-.-.}-{2:2}, at: task_rq_lock+0x60/0xf0 _raw_spin_lock_irqsave+0x38/0xa8 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: copy_process+0x620/0x18f0 -> #1 (&rq->lock){-.-.}-{2:2}: 0x0 _raw_spin_lock+0x64/0x80 __schedule+0x108/0x910 CPU: 5 PID: 731 Comm: l_3-1 Not tainted 5.8.0-rc4-00040-g6345b3305877-dirty #864 schedule+0x7c/0x108 schedule_timeout+0x2b0/0x448 Hardware name: ARM Juno development board (r2) (DT) wait_for_completion_killable+0xb8/0x1a8 __kthread_create_on_node+0xe0/0x1c0 Call trace: kthread_create_on_node+0x8c/0xb8 create_worker+0xd0/0x1b8 dump_backtrace+0x0/0x1f0 workqueue_prepare_cpu+0x5c/0xa0 cpuhp_invoke_callback+0xe8/0xe30 show_stack+0x2c/0x38 _cpu_up+0xf4/0x1c0 cpu_up+0xa0/0xc0 dump_stack+0xf0/0x170 bringup_nonboot_cpus+0x88/0xc0 smp_init+0x34/0x90 ___might_sleep+0x144/0x200 kernel_init_freeable+0x1b8/0x338 kernel_init+0x18/0x118 __might_sleep+0x54/0x88 ret_from_fork+0x10/0x18 -> #0 (cpu_hotplug_lock){++++}-{0:0}: cpus_read_lock+0x2c/0x130 __lock_acquire+0x11a0/0x1550 lock_acquire+0xf8/0x460 static_key_enable+0x1c/0x38 cpus_read_lock+0x68/0x130 static_key_enable+0x1c/0x38 __sched_setscheduler+0x900/0xad8 __sched_setscheduler+0x900/0xad8 __arm64_sys_sched_setattr+0x2e0/0x4f8 __arm64_sys_sched_setattr+0x2e0/0x4f8 el0_svc_common.constprop.4+0x84/0x200 do_el0_svc+0x34/0xa0 el0_svc_common.constprop.4+0x84/0x200 el0_sync_handler+0x16c/0x340 el0_sync+0x140/0x180 do_el0_svc+0x34/0xa0 other info that might help us debug this: Possible unsafe locking scenario: el0_sync_handler+0x16c/0x340 CPU0 CPU1 ---- ---- el0_sync+0x140/0x180 lock(&rq->lock); lock(cpu_hotplug_lock); lock(&rq->lock); lock(cpu_hotplug_lock); *** DEADLOCK *** 3 locks held by l_0-0/730: #0: ffff80001345b4d0 (&cpuset_rwsem){++++}-{0:0}, at: __sched_setscheduler+0x4c0/0xad8 #1: ffff00096e83b718 (&p->pi_lock){-.-.}-{2:2}, at: task_rq_lock+0x44/0xf0 #2: ffff00097ef4ca58 (&rq->lock){-.-.}-{2:2}, at: task_rq_lock+0x60/0xf0 stack backtrace: CPU: 1 PID: 730 Comm: l_0-0 Tainted: G W 5.8.0-rc4-00040-g6345b3305877-dirty #864 Hardware name: ARM Juno development board (r2) (DT) Call trace: dump_backtrace+0x0/0x1f0 show_stack+0x2c/0x38 dump_stack+0xf0/0x170 print_circular_bug.isra.40+0x228/0x280 check_noncircular+0x14c/0x1b0 __lock_acquire+0x11a0/0x1550 lock_acquire+0xf8/0x460 cpus_read_lock+0x68/0x130 static_key_enable+0x1c/0x38 __sched_setscheduler+0x900/0xad8 __arm64_sys_sched_setattr+0x2e0/0x4f8 el0_svc_common.constprop.4+0x84/0x200 do_el0_svc+0x34/0xa0 el0_sync_handler+0x16c/0x340 el0_sync+0x140/0x180 Fix by ensuring we enable the key outside of the critical section in __sched_setscheduler() Fixes: 46609ce22703 ("sched/uclamp: Protect uclamp fast path code with static key") Signed-off-by: Qais Yousef Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman CC: Patrick Bellasi Cc: Chris Redpath Cc: Lukasz Luba Cc: linux-kernel@vger.kernel.org --- kernel/sched/core.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e1578c3ad40c..947a1f4fa112 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1348,7 +1348,7 @@ static int uclamp_validate(struct task_struct *p, return 0; } -static void __setscheduler_uclamp(struct task_struct *p, +static bool __setscheduler_uclamp(struct task_struct *p, const struct sched_attr *attr) { enum uclamp_id clamp_id; @@ -1376,9 +1376,7 @@ static void __setscheduler_uclamp(struct task_struct *p, } if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP))) - return; - - static_branch_enable(&sched_uclamp_used); + return false; if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) { uclamp_se_set(&p->uclamp_req[UCLAMP_MIN], @@ -1389,6 +1387,8 @@ static void __setscheduler_uclamp(struct task_struct *p, uclamp_se_set(&p->uclamp_req[UCLAMP_MAX], attr->sched_util_max, true); } + + return true; } static void uclamp_fork(struct task_struct *p) @@ -1465,8 +1465,11 @@ static inline int uclamp_validate(struct task_struct *p, { return -EOPNOTSUPP; } -static void __setscheduler_uclamp(struct task_struct *p, - const struct sched_attr *attr) { } +static bool __setscheduler_uclamp(struct task_struct *p, + const struct sched_attr *attr) +{ + return false; +} static inline void uclamp_fork(struct task_struct *p) { } static inline void uclamp_post_fork(struct task_struct *p) { } static inline void init_uclamp(void) { } @@ -5305,7 +5308,8 @@ static int __sched_setscheduler(struct task_struct *p, prev_class = p->sched_class; __setscheduler(rq, p, attr, pi); - __setscheduler_uclamp(p, attr); + + retval = __setscheduler_uclamp(p, attr); if (queued) { /* @@ -5335,6 +5339,18 @@ static int __sched_setscheduler(struct task_struct *p, balance_callback(rq); preempt_enable(); +#ifdef CONFIG_UCLAMP_TASK + /* + * Enable uclamp static key outside the critical section. + * static_branch_enable() will hold cpu_hotplug_lock; if done from + * critical section above which holds other locks (rq->lock namely), + * it could lead to deadlock scenarios as both are popular locks and + * could be acquired from different paths in different orders. + */ + if (retval) + static_branch_enable(&sched_uclamp_used); +#endif + return 0; unlock: -- 2.17.1