Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3467555imu; Wed, 7 Nov 2018 10:45:29 -0800 (PST) X-Google-Smtp-Source: AJdET5cH+iByj+uAPtKF8pKuCe2NsoGgcEMZe6lQPWbXYDhukDJF1fx7wEulUZ92PIRz9OtSBaBE X-Received: by 2002:a65:6684:: with SMTP id b4mr1143327pgw.55.1541616328939; Wed, 07 Nov 2018 10:45:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1541616328; cv=none; d=google.com; s=arc-20160816; b=oznysOZ1J8BFQl5WfC/DaWrYyuLZ5kd8CAomkoGGlP+QCjOKi5PFYGdNRfiVtoVeAz osiXBmDz/TR2VCqY/lt9u7VH37pN1RVB4WgRWeFgkJyS1olWgm3UJbRZ8Fa9nbUmaxo5 Pd7h8A891mA0rCeTEwkU9uMOAOv+ub+7PDPier1qHkSczBnTUV07VsdM+M0LbECMmm41 NRFjQDcxqhjUSIE2Hc/DqlcOGxa+4uAZ9kulaTGJryXSeByriVVR7bVTohrhMPmmzpMy PI+bJtXgKeLA6gbEIGPJ2hmS+oNUk7BdEdBXCEnD2FARg1ZEGKsj6H3PFlKonAK0WqWr IteQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=bZBtL8GlluFABM8CDN/PPTJ9bkuOUzL2FDyLXIArNec=; b=vdOJHzIsg8pQwuC0bqcfK+/Erul4OxifQ4GhvKp/Lu7gfpMONIczYA1m9YwytHL1Md pkrOh4IGLXPpXqB7sgqYLThuxshpe1nlbp3ct3tZt5BfcgDSXt955O+uYBm0ggMfgVM7 tL86KcU5+N7m1yi0zyyMTQPXkoTbs2DkuTz7phC/vMCd3EhmLG++h2BC0PpI6+PkOB2Z V6+JSwiLxy4jpzjs7bBTi5Rs9db7e0aweHq46xgJfUEKvjzdQN1i+DzjccaUV2LX22OQ mlpC12uIQRThNGk/+kmHkv68oT/XYxlGxNSIBfr4yZyW9XcroqIJalD/+WL53Rn3z91/ sS0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=WqTOKQNF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id c134-v6si1415819pfb.139.2018.11.07.10.45.14; Wed, 07 Nov 2018 10:45:28 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=WqTOKQNF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728859AbeKHEQY (ORCPT + 99 others); Wed, 7 Nov 2018 23:16:24 -0500 Received: from mail-vs1-f65.google.com ([209.85.217.65]:46306 "EHLO mail-vs1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726938AbeKHEQX (ORCPT ); Wed, 7 Nov 2018 23:16:23 -0500 Received: by mail-vs1-f65.google.com with SMTP id r14so10019665vsc.13 for ; Wed, 07 Nov 2018 10:44:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=bZBtL8GlluFABM8CDN/PPTJ9bkuOUzL2FDyLXIArNec=; b=WqTOKQNFDu3MMYVKmHrHxzCuwhtkxTmx2UvAyV5YIKnRWGs0l609bcPE828l62Sxne Bvn5kghBF/GpQmqt/e9b4tXZq1dMPGeeTrdGpIJ/g97P+lJU8bY+2zgs/bbnpJhYy2Zo l8SgxXkDw16Bxl10RZBIFeIoePZNx7NCw2Uwg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=bZBtL8GlluFABM8CDN/PPTJ9bkuOUzL2FDyLXIArNec=; b=izKyDsmqppUsqhNMgQaDpx0jVA8IEXA+1OlTcF37Ms9c0qZIF5IX9bL6kaXzHeyu7t K6Nu6UadTFYtq87mdj3msr13j/KcXyzCAkJ2C18JJQPTlndXt/tTPvtqUkCyahG8o802 gBVdTZjMJRC9Ff8peBLtjMyVcbtWTevtrHzBEYLdOBQKORLY5h75+eeUW4pWW1VavQ/V F2VjUpoHGurx8+eUwcrUaJmr4T3b7uIHmNLVMOSBvkaoZTVoMJ6LChAWOrp/vWzbWoJN l3pKTHVxR6Alc21hkHyjG2XY0rVL1f2ptLoxOCBftnh2ubqCr12NoRCWaR2HK2jmBOI+ ROpQ== X-Gm-Message-State: AGRZ1gKUiXse3mRMZiAhTXT0OqQGSYhaazRcTRoAxBahyVX5edrAfCu8 a6ft4uTxkhY2EHR4CmmqoosL4dHIwUo= X-Received: by 2002:a67:db06:: with SMTP id z6mr575302vsj.73.1541616284418; Wed, 07 Nov 2018 10:44:44 -0800 (PST) Received: from mail-ua1-f44.google.com (mail-ua1-f44.google.com. [209.85.222.44]) by smtp.gmail.com with ESMTPSA id a2sm337174uaa.15.2018.11.07.10.44.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 07 Nov 2018 10:44:43 -0800 (PST) Received: by mail-ua1-f44.google.com with SMTP id o17so6233004uad.8 for ; Wed, 07 Nov 2018 10:44:43 -0800 (PST) X-Received: by 2002:ab0:650e:: with SMTP id w14mr636398uam.83.1541616283078; Wed, 07 Nov 2018 10:44:43 -0800 (PST) MIME-Version: 1.0 References: <20181107010028.184543-1-dianders@chromium.org> <20181107010028.184543-5-dianders@chromium.org> <20181107123032.ytwzfkkwajuo4tsj@holly.lan> In-Reply-To: <20181107123032.ytwzfkkwajuo4tsj@holly.lan> From: Doug Anderson Date: Wed, 7 Nov 2018 10:44:31 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v3 4/4] kdb: Don't back trace on a cpu that didn't round up To: Daniel Thompson Cc: Jason Wessel , kgdb-bugreport@lists.sourceforge.net, Peter Zijlstra , christophe.leroy@c-s.fr, LKML Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Wed, Nov 7, 2018 at 4:30 AM Daniel Thompson wrote: > > On Tue, Nov 06, 2018 at 05:00:28PM -0800, Douglas Anderson wrote: > > If you have a CPU that fails to round up and then run 'btc' you'll end > > up crashing in kdb becaue we dereferenced NULL. Let's add a check. > > It's wise to also set the task to NULL when leaving the debugger so > > that if we fail to round up on a later entry into the debugger we > > won't backtrace a stale task. > > > > Signed-off-by: Douglas Anderson > > --- > > > > Changes in v3: > > - New for v3. > > > > Changes in v2: None > > > > kernel/debug/debug_core.c | 1 + > > kernel/debug/kdb/kdb_bt.c | 11 ++++++++++- > > 2 files changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c > > index 324cba8917f1..08851077c20a 100644 > > --- a/kernel/debug/debug_core.c > > +++ b/kernel/debug/debug_core.c > > @@ -587,6 +587,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs, > > kgdb_info[cpu].exception_state &= > > ~(DCPU_WANT_MASTER | DCPU_IS_SLAVE); > > kgdb_info[cpu].enter_kgdb--; > > + kgdb_info[cpu].task = NULL; > > This code isn't quite right. > > In particular there are two exit paths from kgdb_cpu_enter() and this > code path is for slave exit only (and master may change the next time we > re-enter kgdb). Ah, good point! Right that the master could be the the one that later fails to round up. > It also looks very odd to have an unconditional clear > next to a decrement that takes account of debugger re-entrancy. Would it look less odd if I write it like this? if (kgdb_info[cpu].enter_kgdb == 0) kgdb_info[cpu].task = NULL; In general, though, "enter_kgdb" looks to be more of a boolean value then a true counter. The only place that modifies "enter_kgdb" is kgdb_cpu_enter(). It increments it upon entry to the function and decrements it upon exit. All the callers to kgdb_cpu_enter() confirm that "enter_kgdb" is 0 before calling it. I'll further note that kgdb_cpu_enter() unconditionally clobbers things like "kgdb_info[cpu].debuggerinfo" upon entry. I could add a patch that converts "enter_kgdb" to a true boolean if that helps, though at this point I'm getting pretty far afield of my original purpose of trying to fix the problem lockdep yelled about. :-P ...and I probably need to get back to my day job. I'd also note that the other bits of code around here look pretty unconditional to me, but my kdb/kgdb knowledge is not very strong (as you've seen) so I could be wrong. We are unconditionally clearing bits from the "exception_state", unconditionally turning tracing on, unconditionally calling correct_hw_break(), etc. ...hmmmm, I suppose it could look less odd if I moved my unconditional bit to be up a little higher. I could put it up above to the unconditional clearing of "exception_state" ;-P > Note also that there is similar code in kdb_debugger.c (search for "zero > out any offline cpu data") which should be tidied up if we decide to do > the same clean up in a different way. > > I'll leave it to you whether to fix the existing code or add new code > here but if you do add it in kgdb_cpu_enter() it must cover both return > paths, include debuggerinfo as well, and kdb_debugger.c needs to be > tidied up. OK, so I _think_ the answer here is to just get rid of the code from kdb_debugger.c and rely on my new code. Specifically I'd rather CPUs clear their own "kgdb_info" so we don't need to worry about races where a CPU rounds up really late at some time when we're not expecting it. ...and once CPUs always clear its own "kgdb_info" when exiting we don't need any special case code to have the master try to clear state of offline CPUs. Thanks for pointing me to this code to get rid of. --- So the tl;dr: 1. Though I'm 99% certain that "enter_kgdb" is truly a bool that is either 0 or 1, I won't write a patch to change this myself in the hopes that I can wrap up this patch series. I'll add a note in my CL description indicating that I believe things are safe because "enter_kgdb" is really a bool. 2. I will add clearing of "debuggerinfo". 3. I will add the same clearing of "task" and "debuggerinfo" to when the master exits. I won't try to unify the code in my patch and leave that for a future person working on this code. 4. To make it look less odd, I'll move my clearing to above the "kgdb_info[cpu].exception_state &=" line. It doesn't really matter and why not make it look less odd? 5. I will remove the clearing of "debuggerinfo" and "task" from kdb_stub() since it will no longer be needed. ...and again, thank you for all your timely and awesome reviews and advice here. It is certainly appreciated. -Doug