Subject: [PATCH V5] kdb: Fix the deadlock issue in KDB debugging.
Date: Sat, 23 Mar 2024 09:41:41 +0800
Message-ID: <20240323014141.3621738-1-liu.yec@h3c.com>
In-Reply-To: <20240322155818.GD7342@aspen.lan>

From: LiuYe

Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will attempt to
use schedule_work() to provoke a keyboard reset when transitioning out of
the debugger and back to normal operation. This can cause deadlock because
schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this case the
master CPU is not running from NMI, but it has parked the slave CPUs using
an NMI, and the parked CPUs are holding spinlocks needed by schedule_work().

example:
BUG: spinlock lockup suspected on CPU#0, namex/10450
lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
[] ? __schedule+0x16d/0xac0
[] ? schedule+0x3c/0x90
[] ? schedule_hrtimeout_range_clock+0x10a/0x120
[] ? mutex_unlock+0xe/0x10
[] ? ep_scan_ready_list+0x1db/0x1e0
[] ? schedule_hrtimeout_range+0x13/0x20
[] ? ep_poll+0x27a/0x3b0
[] ? wake_up_q+0x70/0x70
[] ? SyS_epoll_wait+0xb8/0xd0
[] ? entry_SYSCALL_64_fastpath+0x12/0x75
CPU: 0 PID: 10450 Comm: namex Tainted: G O 4.4.65 #1
Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
Call Trace:
<#DB> [] dump_stack+0x85/0xc2
[] spin_dump+0x7f/0x100
[] do_raw_spin_lock+0xa0/0x150
[] _raw_spin_lock+0x15/0x20
[] try_to_wake_up+0x176/0x3d0
[] wake_up_process+0x15/0x20
[] insert_work+0x81/0xc0
[] __queue_work+0x135/0x390
[] queue_work_on+0x46/0x90
[] kgdboc_post_exp_handler+0x48/0x70
[] kgdb_cpu_enter+0x598/0x610
[] kgdb_handle_exception+0xf2/0x1f0
[] __kgdb_notify+0x71/0xd0
[] kgdb_notify+0x35/0x70
[] notifier_call_chain+0x4a/0x70
[] notify_die+0x3d/0x50
[] do_int3+0x89/0x120
[] int3+0x44/0x80

Fix this by postponing the schedule_work() call until the slave CPUs have
exited NMI context. A queued irq_work is only handled after the current CPU
has left its interrupt context, so schedule_work() now runs after the master
CPU has left the debug exception. By then the slave CPUs have also exited NMI
context and released the spinlocks they were holding, so no deadlock occurs.

It is the call to input_register_handler(), which may sleep, that forces us
not to do the work from irq_work's hardirq callback. Therefore the irq_work
schedules another work item instead of doing the job directly.

Signed-off-by: LiuYe
Co-authored-by: Daniel Thompson
Signed-off-by: Daniel Thompson
---
V4 -> V5: Explain why the irq_work schedules another work item instead of doing the job directly.
V3 -> V4: Add changelogs
V2 -> V3: Add description information
V1 -> V2: Use irq_work to solve this properly.
---
 drivers/tty/serial/kgdboc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..161b25ecc 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_CONFIG_LEN		40
 
@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
 
 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
-- 
2.25.1