From: Luo Gengkun
To: linux-kernel@vger.kernel.org
Cc: mpe@ellerman.id.au, npiggin@gmail.com, christophe.leroy@csgroup.eu,
 naveen.n.rao@linux.ibm.com, akpm@linux-foundation.org, trix@redhat.com,
 dianders@chromium.org, luogengkun@huaweicloud.com, mhocko@suse.com,
 pmladek@suse.com, kernelfans@gmail.com, lecopzer.chen@mediatek.com,
 song@kernel.org, yaoma@linux.alibaba.com, tglx@linutronix.de,
 linuxppc-dev@lists.ozlabs.org, bpf@vger.kernel.org
Subject: [PATCH] watchdog/core: Fix AA deadlock due to watchdog holding cpu_hotplug_lock and wait for wq
Date: Thu, 6 Jun 2024 15:38:28 +0000
Message-Id: <20240606153828.3261006-1-luogengkun@huaweicloud.com>

We found an AA deadlock problem as shown below:

TaskA                       TaskB                       WatchDog                          system_wq

...
css_killed_work_fn:
P(cgroup_mutex)
...
                                                        ...
                                                        __lockup_detector_reconfigure:
                                                        P(cpu_hotplug_lock.read)
                                                        ...
                            ...
                            cpu_up:
                            percpu_down_write:
                            P(cpu_hotplug_lock.write)
                                                                                          ...
                                                                                          cgroup_bpf_release:
                                                                                          P(cgroup_mutex)
                                                        smp_call_on_cpu:
                                                        Wait system_wq
cpuset_css_offline:
P(cpu_hotplug_lock.read)

WatchDog is waiting for system_wq to finish its jobs, system_wq is
waiting for cgroup_mutex, and the owner of cgroup_mutex is waiting for
cpu_hotplug_lock. That read lock cannot be granted while the writer in
TaskB is queued behind the read lock WatchDog already holds, so the
cycle never resolves.

The key point is cpu_hotplug_lock, because system_wq may be waiting for
other locks. It seems unhealthy to hold a lock while waiting for
system_wq, because we never know what jobs system_wq is running.
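
The dependency chain described above can be checked mechanically with a
small userspace model. This is only a sketch and not kernel code: the
actor ids and the helpers in_wait_cycle(), build_before() and
build_after() are hypothetical names made up for illustration. It
assumes each actor blocks on at most one other actor, and that once the
watchdog no longer holds cpu_hotplug_lock neither TaskA nor TaskB ends
up blocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actor ids for this sketch; these are not kernel identifiers. */
enum actor { TASK_A, TASK_B, WATCHDOG, SYSTEM_WQ, NR_ACTORS, NOBODY = -1 };

/*
 * waits_for[x] is the actor that x is blocked on, or NOBODY.
 * Returns true if following the chain from 'start' leads back to 'start'.
 */
bool in_wait_cycle(const int waits_for[NR_ACTORS], int start)
{
	int cur, steps;

	assert(start >= 0 && start < NR_ACTORS);
	cur = waits_for[start];
	for (steps = 0; steps < NR_ACTORS && cur != NOBODY; steps++) {
		if (cur == start)
			return true;
		cur = waits_for[cur];
	}
	return false;
}

/* Wait-for edges as read off the scenario in the changelog. */
void build_before(int waits_for[NR_ACTORS])
{
	waits_for[WATCHDOG]  = SYSTEM_WQ; /* smp_call_on_cpu: Wait system_wq */
	waits_for[SYSTEM_WQ] = TASK_A;    /* cgroup_bpf_release: P(cgroup_mutex) */
	waits_for[TASK_A]    = TASK_B;    /* reader queued behind pending writer */
	waits_for[TASK_B]    = WATCHDOG;  /* writer waits for the current reader */
}

/*
 * Assumed picture once the watchdog stops holding cpu_hotplug_lock:
 * nothing blocks on the watchdog any more, so TaskA and TaskB can run.
 */
void build_after(int waits_for[NR_ACTORS])
{
	build_before(waits_for);
	waits_for[TASK_B] = NOBODY;
	waits_for[TASK_A] = NOBODY;
}
```

Filling an array with build_before() and calling in_wait_cycle() on any
of the four actors reports a cycle. With build_after() the chain
WatchDog -> system_wq -> TaskA ends at an actor that is not blocked, so
no cycle is reported.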

So fix this by replacing cpus_read_lock()/cpus_read_unlock() with
cpu_hotplug_disable()/cpu_hotplug_enable() to prevent CPU
offline/online.

Fixes: e31d6883f21c ("watchdog/core, powerpc: Lock cpus across reconfiguration")
Signed-off-by: Luo Gengkun
---
 kernel/watchdog.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 51915b44ac73..6ac6fb8d3be0 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -867,7 +867,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
 
 static void __lockup_detector_reconfigure(void)
 {
-	cpus_read_lock();
+	cpu_hotplug_disable();
 	watchdog_hardlockup_stop();
 
 	softlockup_stop_all();
@@ -877,7 +877,7 @@ static void __lockup_detector_reconfigure(void)
 		softlockup_start_all();
 
 	watchdog_hardlockup_start();
-	cpus_read_unlock();
+	cpu_hotplug_enable();
 	/*
 	 * Must be called outside the cpus locked section to prevent
 	 * recursive locking in the perf code.
@@ -916,11 +916,11 @@ static __init void lockup_detector_setup(void)
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
 static void __lockup_detector_reconfigure(void)
 {
-	cpus_read_lock();
+	cpu_hotplug_disable();
 	watchdog_hardlockup_stop();
 	lockup_detector_update_enable();
 	watchdog_hardlockup_start();
-	cpus_read_unlock();
+	cpu_hotplug_enable();
 }
 void lockup_detector_reconfigure(void)
 {
-- 
2.34.1