Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp2759293ybk; Tue, 12 May 2020 07:30:09 -0700 (PDT) X-Google-Smtp-Source: APiQypKbVuqwLeAHAhndiVaYhYMiI9pQZQRrcQBrV/ct47B0bXixsHIuGOGPjB2r5prOWASSf+33 X-Received: by 2002:a50:9e2a:: with SMTP id z39mr18736691ede.371.1589293809217; Tue, 12 May 2020 07:30:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589293809; cv=none; d=google.com; s=arc-20160816; b=BN73iqzJ8sNl2ZWC+O8uc6Pg3rzKjVaYFdsqHQv6s8YfpQFKn9rbpgXSQkPZKC7vTY dCYiEK7IMMcH9ufwJ2gpMAXb0dA0Z4EL2Nwr+neP75mbGibNXOMw+87uGFWur+8RpdlR 5ykoLhNsCvG4Et0bsA4EcqyUD10X8EqHeMz873IQAuSsZyXJAsmP8LXThHYs0JZ6zZ51 exA9XzAAXGRuWkqh0Z4tE71k0VlFOq86TpMcWFsKAW9+kCjkCP5UlQ7TEP+CqEUOz1VN 9l1LAwgsQcd2FNabJUf1KFFU7KLLVN7AO0ITpp2YdRwAk3fIJG1IBG2OWJjMtm9gTXgP gcGQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=Id76DUUx7BnCr5PksX/RIXDmuZEvMLrGK6iNbMFQn+M=; b=Fmp7pc4pNb2mBLnKnTOoIODs8FnCxzWVR8F/+QeR0Pllb9FsWz3IkPjz22yyoHztCr IhA7b0K0WKMtK1kSnveoZtxe5Rp/slxDk/jMjDCmhd2Nu386AbpuNcXdNPITJ5kBZwhL y47s6lGsObBS9Rf17VuJ2Qqp52kNfLe7d49vEoxoOnwWGvebBGOeEFvmquhr9+r9CET9 fVVwqlmSkaFp04R5kiRTrAGIspY4zG+y8AOxPfZGRoD7Ts78h1NfsBo5JFO8yIPGl4w+ jrNm2Ow/Q4oC8+ow1BAmD46MMHqFbfmQMarU5byOc2ouYk2HXZg6IcVI68RuIVG9zZFv U1Ow== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=eQDZAPmT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id k26si4547807ejv.260.2020.05.12.07.29.46; Tue, 12 May 2020 07:30:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=eQDZAPmT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730288AbgELOZk (ORCPT + 99 others); Tue, 12 May 2020 10:25:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37600 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729637AbgELOZk (ORCPT ); Tue, 12 May 2020 10:25:40 -0400 Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A08E7C061A0C for ; Tue, 12 May 2020 07:25:38 -0700 (PDT) Received: by mail-wm1-x342.google.com with SMTP id m24so12383636wml.2 for ; Tue, 12 May 2020 07:25:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Id76DUUx7BnCr5PksX/RIXDmuZEvMLrGK6iNbMFQn+M=; b=eQDZAPmTTPyiCe56SuT9EzeYjpzxoz1Hky2+xzFEQTSscBQvhvcOxcAiDgZjSfXiqx qAcIp1pinbm5IAAhK1KO2kD/zr+ekAOYCav+lfUOBKbFhvv+Q3yP2mGapHhY6QmdbZ7W or1/3gFQaJZOpUE4jo65P57NPBl1wO3Y5Tpa2r3CSSZDRHP5tyxNXpIifxop5u5GDUca DfMWpC+tBWbhQPsIts+F9FL1ippOzG70eajkhMGpU0Xm7SY/lguQBXTJy0//jcsi0hHH gm0SBaIfBEVZARsYTGDCikCjZLTBPo1NEXa5M+EC+p3DLFugr+PToE+XZZmWn/oI9xzo Xhmw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Id76DUUx7BnCr5PksX/RIXDmuZEvMLrGK6iNbMFQn+M=; b=RVKUIMaQiSTJeCSBi5RsB8UTzS5gryY0Byf+k83RftCzU0sTMVTawEu7159w6WtPAP 1zJotD0yv35B4FNeLaFWODOZKQvXo1RjqPbUQpNPrX26Yixp3u+yLdYeaKuGTzvDNAsN 6VXlhPZnYVLH/OmQfgZpS54jRW+fIwamDhHwaE4ZEHF0W2iNYfHYzCtFGRJCxadceF1A VkQvf58iU5EqBBgfm4Q87XLujl6sQxBUzFBTyz6NwEBP0ZwkmIrJubCaA7NuGWvFsKyz Yduovg2SKyhG/ZTTrEO2JPngJzZH0bMTt73zL8k06dcaphCOn3y+HGhiAnV5hw/Jdxio pFqA== X-Gm-Message-State: AGi0PubSLmZ0oPilpzRqKoMwjDy4NlcOoQS5oZzoSpjnOFXTCTR6F2LI 2ysHEyvLwLyxISSKNyJtax6EspenD+PQ+A== X-Received: by 2002:a7b:c413:: with SMTP id k19mr11635494wmi.124.1589293537286; Tue, 12 May 2020 07:25:37 -0700 (PDT) Received: from holly.lan (cpc141214-aztw34-2-0-cust773.18-1.cable.virginm.net. [86.9.19.6]) by smtp.gmail.com with ESMTPSA id c19sm24088581wrb.89.2020.05.12.07.25.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 07:25:35 -0700 (PDT) Date: Tue, 12 May 2020 15:25:33 +0100 From: Daniel Thompson To: Sumit Garg Cc: jason.wessel@windriver.com, dianders@chromium.org, kgdb-bugreport@lists.sourceforge.net, linux-kernel@vger.kernel.org, arnd@arndb.de, akpm@linux-foundation.org, Petr Mladek , Peter Zijlstra , Steven Rostedt Subject: Re: [PATCH] kgdb: Fix broken handling of printk() in NMI context Message-ID: <20200512142533.ta4uejwmq5gchtlx@holly.lan> References: <1589273314-12060-1-git-send-email-sumit.garg@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1589273314-12060-1-git-send-email-sumit.garg@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 12, 2020 at 02:18:34PM +0530, Sumit Garg wrote: > Since commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk > in NMI"), kgdb entry in NMI context defaults to use safe NMI printk() I didn't see the author on Cc: nor any of the folks whose hands it passed through. It would definitely be good to involve them in this discussion. > which involves CPU specific buffers and deferred printk() until exit from > NMI context. > > But kgdb being a stop-the-world debugger, we don't want to defer printk() > especially backtrace on corresponding CPUs. So instead switch to normal > printk() mode in kgdb_cpu_enter() if entry is in NMI context. So, firstly I should *definitely* take a mea cupla for not shouting about this at the time (I was on Cc:... twice). Only thing I can say confidently is that the test suite didn't yell about this and so I didn't look at this as closely as I should have done (and that it didn't yell is mostly because I'm still building out the test suite coverage). Anyhow... This feels a little like we are smearing the printk() interception logic across the kernel in ways that make things hard to read. If we accepted this patch we then have, the new NMI interception logic, the old kdb interception logic and some hacks in the kgdb trap handler to defang the NMI interception logic and force the kdb logic to kick in. Wouldn't it be better to migrate kdb interception logic up a couple of levels so that it continues to function even when we are in nmi printk mode. That way *all* the printk() interception code would end up in one place. Finally some clue description of how to provoke the problem would be useful... that sort of things helps me to grow the test suite coverage. Daniel. > > Signed-off-by: Sumit Garg > --- > > Similar change was posted earlier specific to arm64 here [1]. But after > discussions it emerged out that this broken handling of printk() in NMI > context should be a common problem that is relevant to other archs as well. > So fix this handling in kgdb_cpu_enter() as there can be multiple entry > points to kgdb in NMI context. > > [1] https://lkml.org/lkml/2020/4/24/328 > > kernel/debug/debug_core.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c > index 2b7c9b6..ab2933f 100644 > --- a/kernel/debug/debug_core.c > +++ b/kernel/debug/debug_core.c > @@ -567,6 +567,15 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs, > kgdb_info[ks->cpu].enter_kgdb++; > kgdb_info[ks->cpu].exception_state |= exception_state; > > + /* > + * kgdb entry in NMI context defaults to use safe NMI printk() which > + * involves CPU specific buffers and deferred printk() until exit from > + * NMI context. But kgdb being a stop-the-world debugger, we don't want > + * to defer printk(). So instead switch to normal printk() mode here. > + */ > + if (in_nmi()) > + printk_nmi_exit(); > + > if (exception_state == DCPU_WANT_MASTER) > atomic_inc(&masters_in_kgdb); > else > @@ -635,6 +644,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs, > atomic_dec(&slaves_in_kgdb); > dbg_touch_watchdogs(); > local_irq_restore(flags); > + if (in_nmi()) > + printk_nmi_enter(); > return 0; > } > cpu_relax(); > @@ -772,6 +783,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs, > raw_spin_unlock(&dbg_master_lock); > dbg_touch_watchdogs(); > local_irq_restore(flags); > + if (in_nmi()) > + printk_nmi_enter(); > > return kgdb_info[cpu].ret_state; > } > -- > 2.7.4 >