Date: Fri, 1 Mar 2024 10:59:31 +0000
From: Daniel Thompson
To: Liuye
Cc: jason.wessel@windriver.com, dianders@chromium.org,
    gregkh@linuxfoundation.org, jirislaby@kernel.org,
    kgdb-bugreport@lists.sourceforge.net, linux-kernel@vger.kernel.org,
    linux-serial@vger.kernel.org
Subject: Re: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
Message-ID: <20240301105931.GB5795@aspen.lan>
References: <20240228025602.3087748-1-liu.yeC@h3c.com>
    <20240228120516.GA22898@aspen.lan>
    <8b41d34adaef4ddcacde2dd00d4e3541@h3c.com>
In-Reply-To: <8b41d34adaef4ddcacde2dd00d4e3541@h3c.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 01, 2024 at 03:30:25AM +0000, Liuye wrote:
> >On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> >> master cpu : After executing the go command, a deadlock occurs.
> >> slave cpu: may be performing thread migration, acquiring the
> >> running queue lock of master CPU. Then it was interrupted by kdb
> >> NMI and entered the nmi_handler process. (nmi_handle->
> >> kgdb_nmicallback-> kgdb_cpu_enter while(1){ touch watchdog}.)
> >
> >I think this description is a little short and doesn't clearly
> >explain the cause. How about:
> >
> >Currently, if kgdboc includes 'kdb', then kgdboc will attempt to use
> >schedule_work() to provoke a keyboard reset when transitioning out of
> >the debugger and back to normal operation. This can cause deadlock
> >because schedule_work() is not NMI-safe.
> >
> >The stack trace below shows an example of the problem. In this case
> >the master cpu is not running from NMI but it has parked the slave
> >CPUs using an NMI and the parked CPUs are holding spinlocks needed by
> >schedule_work().
>
> Due to the brevity of the description, there may be some
> misunderstanding, so a detailed description is provided as follows:

So, there is a small mistake in the example description I provided.
After double checking the code it should start slightly differently:
"Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will attempt
to use schedule_work() ...". However, other than that I think it is
correct.

The important bit of feedback here is that the patch description should
commence with a description of the bug rather than a description of the
symptom. In this case the bug is that kgdboc calls a function that is
not safe to call from this calling context.

It is really useful to describe the symptom as part of the patch
description. However, if we focus on the symptom without additional code
review then we can end up with the wrong fix. That is what happened
here. It is unsafe to call schedule_work() and checking the runqueue
locks is insufficient to make it safe because we are still calling a
function from an inappropriate calling context.


Daniel.