Date: Wed, 28 Feb 2024 12:05:16 +0000
From: Daniel Thompson
To: LiuYe
Cc: jason.wessel@windriver.com, dianders@chromium.org,
 gregkh@linuxfoundation.org, jirislaby@kernel.org,
 kgdb-bugreport@lists.sourceforge.net, linux-kernel@vger.kernel.org,
 linux-serial@vger.kernel.org
Subject: Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
Message-ID: <20240228120516.GA22898@aspen.lan>
References: <20240228025602.3087748-1-liu.yeC@h3c.com>
In-Reply-To: <20240228025602.3087748-1-liu.yeC@h3c.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> master cpu : After executing the go command, a deadlock occurs.
> slave cpu: may be performing thread migration,
> acquiring the running queue lock of master CPU.
> Then it was interrupted by kdb NMI and entered the nmi_handler process.
> (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
> while(1){ touch wathcdog}.)

I think this description is a little short and doesn't clearly explain
the cause. How about:

  Currently, if kgdboc includes 'kdb', then kgdboc will attempt to use
  schedule_work() to provoke a keyboard reset when transitioning out of
  the debugger and back to normal operation. This can cause deadlock
  because schedule_work() is not NMI-safe.

  The stack trace below shows an example of the problem. In this case
  the master CPU is not running from NMI but it has parked the slave
  CPUs using an NMI and one of the parked CPUs is holding spinlocks
  needed by schedule_work().

> example:
> BUG: spinlock lockup suspected on CPU#0, namex/10450
> lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
> ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
> ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
> [] ? __schedule+0x16d/0xac0
> [] ? schedule+0x3c/0x90
> [] ? schedule_hrtimeout_range_clock+0x10a/0x120
> [] ? mutex_unlock+0xe/0x10
> [] ? ep_scan_ready_list+0x1db/0x1e0
> [] ? schedule_hrtimeout_range+0x13/0x20
> [] ? ep_poll+0x27a/0x3b0
> [] ? wake_up_q+0x70/0x70
> [] ? SyS_epoll_wait+0xb8/0xd0
> [] ? entry_SYSCALL_64_fastpath+0x12/0x75
> CPU: 0 PID: 10450 Comm: namex Tainted: G O 4.4.65 #1
> Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
> 0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
> ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
> 000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
> Call Trace:
> <#DB> [] dump_stack+0x85/0xc2
> [] spin_dump+0x7f/0x100
> [] do_raw_spin_lock+0xa0/0x150
> [] _raw_spin_lock+0x15/0x20
> [] try_to_wake_up+0x176/0x3d0
> [] wake_up_process+0x15/0x20
> [] insert_work+0x81/0xc0
> [] __queue_work+0x135/0x390
> [] queue_work_on+0x46/0x90
> [] kgdboc_post_exp_handler+0x48/0x70
> [] kgdb_cpu_enter+0x598/0x610
> [] kgdb_handle_exception+0xf2/0x1f0
> [] __kgdb_notify+0x71/0xd0
> [] kgdb_notify+0x35/0x70
> [] notifier_call_chain+0x4a/0x70
> [] notify_die+0x3d/0x50
> [] do_int3+0x89/0x120
> [] int3+0x44/0x80
>
> Signed-off-by: LiuYe
> ---
>  drivers/tty/serial/kgdboc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 7ce7bb164..945318ef1 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,9 @@
>  #include
>  #include
>  #include
> +#include
> +
> +#include "../kernel/sched/sched.h"
>
>  #define MAX_CONFIG_LEN 40
>
> @@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
>  		dbg_restore_graphics = 0;
>  		con_debug_leave();
>  	}
> -	kgdboc_restore_input();
> +	if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
> +		kgdboc_restore_input();

I don't think solving this by accessing internal scheduler state is the
right approach. The description I wrote above perhaps already suggests
why. The deadlock occurs because it is unsafe to call schedule_work()
from the debug trap handler. The debug trap handler in your stack trace
is not running from an NMI but it certainly has NMI-like properties.
Therefore a better fix is not to call schedule_work() at all from the
debug trap handler.

Instead we need to use an NMI-safe API such as irq_work_queue() and that
irq_work can call schedule_work() and trigger the keyboard reset.


Daniel.
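
For illustration only, here is a minimal userspace sketch of the deferral
pattern suggested above. It is not kernel code and not the actual fix.
Every name in it (fake_rq_lock, restore_pending, restores_done,
schedule_restore, trap_exit_handler, safe_context_poll) is hypothetical.
The assumption is that the trap handler may only touch lock-free state,
roughly the role irq_work_queue() would play, while the lock-taking step,
standing in for schedule_work(), runs later from a context where taking
locks is safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock a parked CPU may hold (e.g. a run queue lock). */
atomic_flag fake_rq_lock = ATOMIC_FLAG_INIT;

/* Lock-free request flag: the only state the trap handler touches. */
atomic_bool restore_pending;

/* Counts how many times the (simulated) keyboard reset was scheduled. */
int restores_done;

/*
 * Stand-in for schedule_work(): it needs fake_rq_lock. Real code would
 * spin forever if the lock were held by a parked CPU; this sketch
 * asserts instead of deadlocking.
 */
void schedule_restore(void)
{
	bool was_locked = atomic_flag_test_and_set(&fake_rq_lock);

	assert(!was_locked);
	restores_done++;
	atomic_flag_clear(&fake_rq_lock);
}

/* Runs in the NMI-like context: records the request, takes no locks. */
void trap_exit_handler(void)
{
	atomic_store(&restore_pending, true);
}

/* Runs later, once locks can be taken: consumes the request at most once. */
void safe_context_poll(void)
{
	if (atomic_exchange(&restore_pending, false))
		schedule_restore();
}
```

Calling trap_exit_handler() while fake_rq_lock is held only sets the flag,
so nothing can spin on the lock. The reset is scheduled by the next
safe_context_poll() after the lock has been released.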