From: Valentin Schneider <valentin.schneider@arm.com>
To: linux-kernel@vger.kernel.org
Cc: Qian Cai, Peter Zijlstra, tglx@linutronix.de, mingo@kernel.org,
    bigeasy@linutronix.de, qais.yousef@arm.com, swood@redhat.com,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, bristot@redhat.com, vincent.donnefort@arm.com,
    tj@kernel.org, ouwen210@hotmail.com
Subject: [PATCH 2/2] workqueue: Fix affinity of kworkers attached during late hotplug
Date: Thu, 10 Dec 2020 16:38:30 +0000
Message-Id: <20201210163830.21514-3-valentin.schneider@arm.com>
In-Reply-To: <20201210163830.21514-1-valentin.schneider@arm.com>
References: <20201210163830.21514-1-valentin.schneider@arm.com>

Per-CPU kworkers forcefully migrated away by hotplug via
workqueue_offline_cpu() can end up spawning more kworkers via

  manage_workers() -> maybe_create_worker()

Workers created at this point will be bound using pool->attrs->cpumask,
which in this case is wrong, as the hotplug state machine has already
migrated all pinned kworkers away from this CPU. This ends up triggering
the BUG_ON condition in sched_cpu_dying() (i.e. there is a kworker
enqueued on the dying rq).

Special-case workers being attached to DISASSOCIATED pools and bind them
to cpu_active_mask, mimicking the affinity they would have been given had
they been present when workqueue_offline_cpu() was invoked.
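For illustration, the attach-time logic after this change boils down to
the following (a condensed sketch of the hunk below, not the literal
kernel function; the wq_pool_attach_mutex handling and the pool-list
bookkeeping are elided):

	static void worker_attach_to_pool(struct worker *worker,
					  struct worker_pool *pool)
	{
		if (pool->flags & POOL_DISASSOCIATED) {
			/*
			 * The pool's CPU already went through
			 * workqueue_offline_cpu(), so pool->attrs->cpumask
			 * points at a CPU the scheduler is tearing down.
			 * Bind to cpu_active_mask instead; rebind_workers()
			 * restores the pool affinity if the CPU comes back.
			 */
			worker->flags |= WORKER_UNBOUND;
			set_cpus_allowed_ptr(worker->task, cpu_active_mask);
		} else {
			/* CPU is online; the pool cpumask is safe to use. */
			set_cpus_allowed_ptr(worker->task,
					     pool->attrs->cpumask);
		}
	}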
Link: https://lore.kernel.org/r/ff62e3ee994efb3620177bf7b19fab16f4866845.camel@redhat.com
Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Reported-by: Qian Cai
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/workqueue.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9880b6c0e272..fb1418edf85c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1848,19 +1848,29 @@ static void worker_attach_to_pool(struct worker *worker,
 {
 	mutex_lock(&wq_pool_attach_mutex);
 
-	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs. It'll be re-applied when any of the CPUs come up.
-	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
-
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
 	 * stable across this function. See the comments above the flag
 	 * definition for details.
+	 *
+	 * Worker might get attached to a pool *after* workqueue_offline_cpu()
+	 * was run - e.g. created by manage_workers() from a kworker which was
+	 * forcefully moved away by hotplug. Kworkers created from this point on
+	 * need to have their affinity changed as if they were present during
+	 * workqueue_offline_cpu().
+	 *
+	 * This will be resolved in rebind_workers().
 	 */
-	if (pool->flags & POOL_DISASSOCIATED)
+	if (pool->flags & POOL_DISASSOCIATED) {
 		worker->flags |= WORKER_UNBOUND;
+		set_cpus_allowed_ptr(worker->task, cpu_active_mask);
+	} else {
+		/*
+		 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
+		 * online CPUs. It'll be re-applied when any of the CPUs come up.
+		 */
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	}
 
 	list_add_tail(&worker->node, &pool->workers);
 	worker->pool = pool;
-- 
2.27.0