Received: by 2002:a05:7412:d8a:b0:e2:908c:2ebd with SMTP id b10csp2517453rdg; Mon, 16 Oct 2023 07:00:32 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEv8SZhVfRTQ88uzulKQbE9+JWsh1hpz31atYRS7JSu0LU7b88opoSSKHBjy73gfRX+xu3H X-Received: by 2002:a05:6a20:32aa:b0:15e:10e:2cc3 with SMTP id g42-20020a056a2032aa00b0015e010e2cc3mr22807819pzd.60.1697464832550; Mon, 16 Oct 2023 07:00:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697464832; cv=none; d=google.com; s=arc-20160816; b=GdBhQU/4edaV6C/6krqzP0Hg889bL4zSFFsvugjPZPK4SNkRIbjI06TiF6O9s40gyx ogXcSMZLFCRYlOheD/uAdNCakVriQprF/bq+pvQiWlR1yUZU4apYX/QveW5TuEvZsrNC AWFAVswVyhVxndG4m7p2jBGvyhQ2ueAsKh+TB3KyMnS+x91Y6zD4+is6tmTTFzdOwV7A /ACNWKbBNFD6cJa52EJn2YstypUKuAtVOqJE0MJbvc8XCmHCOy81kAaopytXwNMqC/1x Z2yqggU8js67k5/sdxXUQ8OmSeS0iEcmZxGpryOHw3o4ve/ihRpDcSrxMKZseluTDM69 KGrQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=eDGQeCo8V2sN0t+O9GNO1jubu9jMmcOIokTbMoAoZDo=; fh=K1ReixlBk5gXan7FtpkIjqru4XjyhdGNgzXB4bzvHOI=; b=xNF184xFjbXWPQCWFYZpVtBqhPIVyW1NqcA4hK6UJBuxDePtn2KX8dZ+v6aEhbj/wG VzCBYTbxlNS9yDd+bditQeQKrfSypFWJN4QRmLExYf/+MSlUU1/qJuoOQbunDYO+BW19 rV9+0HvHgNL1dBTybYI8rJkiJbM7vsc6yHNQwbGLqLHM9nxA6puO98OoHqetDMJNMKzI KA88WB1jCvNzamv0eg3yP/AuyU/jWa1d0oWlZBqvdjhuVO7Z71SgRrys5nTSWNilzmHf uIcO1qrXKzYNzvy5Bn70VaonDVQm2aTK5bCSZ1E8gpPwmTxlMXcgqZKmB1PT0pDwPjhm B7DA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=bFxfjoLQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id g4-20020a636b04000000b00588fdd9504bsi9090788pgc.858.2023.10.16.07.00.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:00:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=bFxfjoLQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 8DCA080C5FA4; Mon, 16 Oct 2023 07:00:31 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233445AbjJPOA0 (ORCPT + 99 others); Mon, 16 Oct 2023 10:00:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232300AbjJPOAY (ORCPT ); Mon, 16 Oct 2023 10:00:24 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FEA09B for ; Mon, 16 Oct 2023 07:00:22 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-32db188e254so894848f8f.0 for ; Mon, 16 Oct 2023 07:00:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1697464820; x=1698069620; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=eDGQeCo8V2sN0t+O9GNO1jubu9jMmcOIokTbMoAoZDo=; b=bFxfjoLQ6UsRQvRt9P3Dr4gZetJrSZ76FEmQeKchxGQm3IMj/orNo7ByMLj+BrNOeR vFju/3KKhe2qkbWLHkQqwiuCHv3tG0kcEL4n8uYdmuij8oetlCLNbOxGcVnqaNiu5QmO hPLmyYSBVGotSKErNyCgJPOXPh7qwuagmEz9/ZvyT4rNcDpYxyF7zKEgkUTue2872zZQ YTTepw0DpttM1KtnMQA/tOnfOCFw/lTPfIya3+PNcPQxSFZDMKOJsO8+m0aGKYdPsHZi faNnBK6u8M8qXtHcpMhTnP7qZvNuGkGC6n2dk1JzOYK3VoNhMb8o5qvQn3dl+SbW6VtK yuBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464820; x=1698069620; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eDGQeCo8V2sN0t+O9GNO1jubu9jMmcOIokTbMoAoZDo=; b=jmU2pmGjz6LIlRx63XlVwK0P+snOYe35KdC9y+gorFpawvSBYCy/Q/u35Y7cXLbeGS pJB6C1SimP/85L/IzdgcaUGfwDj+F38RkNwTTS9A4+Hxy824p5szeJKss3ZdbzwjUA/X P9RwIWsUoXHWSFX39GPBhiGV3BM1BagiCpnlQ9BWCq9tu6lne+th+a5safMbcZQ7Ddn2 XeqxNSYgrRoZ3z+0Bht10wE0z79VIkrl8ZV9OUrpQNxJPLFdb5a/uxHa20sgx5IPPk/W vnXQkzaQpeM2qs7rbaYoxrKdrh53NeJ/w/c9DmxBFAljoJ/lCmdGxMNGecfYl3e1TCSV wOfw== X-Gm-Message-State: AOJu0Yy/dNRPJGgGTBsc6wRyQWqDUFKS5zHB6n9JbSAPIgDivLQnkf6j sVHhW5tf9swTXbwSMkQUTreZ0A== X-Received: by 2002:a5d:5f04:0:b0:32d:8872:aacb with SMTP id cl4-20020a5d5f04000000b0032d8872aacbmr13881089wrb.53.1697464820459; Mon, 16 Oct 2023 07:00:20 -0700 (PDT) Received: from aspen.lan (aztw-34-b2-v4wan-166919-cust780.vm26.cable.virginm.net. [82.37.195.13]) by smtp.gmail.com with ESMTPSA id bw14-20020a0560001f8e00b0032d886039easm13152255wrb.14.2023.10.16.07.00.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:00:20 -0700 (PDT) Date: Mon, 16 Oct 2023 15:00:18 +0100 From: Daniel Thompson To: Petr Mladek Cc: Doug Anderson , Jason Wessel , kgdb-bugreport@lists.sourceforge.net, linux-kernel@vger.kernel.org Subject: Re: [PATCH] kgdb: Flush console before entering kgdb on panic Message-ID: <20231016140018.GA35190@aspen.lan> References: <20230822131945.1.I5b460ae8f954e4c4f628a373d6e74713c06dd26f@changeid> <20230825100854.GC270740@aspen.lan> <20230830095359.GA44243@aspen.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Mon, 16 Oct 2023 07:00:31 -0700 (PDT) On Wed, Sep 20, 2023 at 02:01:30PM +0200, Petr Mladek wrote: > On Wed 2023-08-30 10:53:59, Daniel Thompson wrote: > > On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote: > > > Hi, > > > > > > On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson > > > wrote: > > > > > > > > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote: > > > > > When entering kdb/kgdb on a kernel panic, it was be observed that the > > > > > console isn't flushed before the `kdb` prompt came up. Specifically, > > > > > when using the buddy lockup detector on arm64 and running: > > > > > echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT > > > > > > > > > > I could see: > > > > > [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP > > > > > [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6 > > > > > [ 32.552865] Sending NMI from CPU 5 to CPUs 6: > > > > > [ 32.557359] NMI backtrace for cpu 6 > > > > > ... [backtrace for cpu 6] ... > > > > > [ 32.558353] NMI backtrace for cpu 5 > > > > > ... [backtrace for cpu 5] ... > > > > > [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7: > > > > > [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP > > > > > > > > > > Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry > > > > > [5]kdb> > > > > > > > > > > As you can see, backtraces for the other CPUs start printing and get > > > > > interleaved with the kdb PANIC print. > > > > > > > > > > Let's replicate the commands to flush the console in the kdb panic > > > > > entry point to avoid this. > > > > > > > > > > Signed-off-by: Douglas Anderson > > > > > --- > > > > > > > > > > kernel/debug/debug_core.c | 3 +++ > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c > > > > > index d5e9ccde3ab8..3a904d8697c8 100644 > > > > > --- a/kernel/debug/debug_core.c > > > > > +++ b/kernel/debug/debug_core.c > > > > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg) > > > > > if (panic_timeout) > > > > > return; > > > > > > > > > > + debug_locks_off(); > > > > > + console_flush_on_panic(CONSOLE_FLUSH_PENDING); > > > > > + > > > > > if (dbg_kdb_mode) > > > > > kdb_printf("PANIC: %s\n", msg); > > > > > > > > I'm somewhat included to say *this* (calling kdb_printf() when not > > > > actually in the debugger) is the cause of the problem. kdb_printf() > > > > does some pretty horid things to the console and isn't intended to > > > > run while the system is active. > > > > > > > > I'd therefore be more tempted to defer the print to the b.p. trap > > > > handler itself and make this part of kgdb_panic() look more like: > > > > > > > > kgdb_panic_msg = msg; > > > > kgdb_breakpoint(); > > > > kgdb_panic_msg = NULL; > > > > > > Unfortunately I think that only solves half the problem. As a quick > > > test, I tried simply commenting out the "kdb_printf" line in > > > kgdb_panic(). While that avoids the interleaved panic message and > > > backtrace, it does nothing to actually get the backtraces printed out > > > before you end up in kdb. As an example, this is what happened when I > > > used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and > > > had the "kdb_printf" in kgdb_panic() commented out: > > > > > > [ 72.658424] lkdtm: Performing direct entry HARDLOCKUP > > > [ 82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6 > > > ... > > > [ 82.234801] Sending NMI from CPU 5 to CPUs 6: > > > [ 82.239296] NMI backtrace for cpu 6 > > > ... [ stack trace for CPU 6 ] ... > > > [ 82.240294] NMI backtrace for cpu 5 > > > ... [ stack trace for CPU 5 ] ... > > > [ 82.576443] Sending NMI from CPU 5 to CPUs 0-4,7: > > > [ 82.581291] NMI backtrace > > > Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due > > > to Keyboard Entry > > > [5]kdb> > > > > > > As you can see, I don't see the traces for CPUs 0-4 and 7. Those do > > > show up if I use the "dmesg" command but it's a bit of a hassle to run > > > "dmesg" to look for any extra debug messages every time I drop in kdb. > > > > > > I guess perhaps that part isn't obvious from the commit message? > > > > I figured it was a risk. > > > > In fact it's an area where my instinct to honour console messages and my > > instinct to get into the kernel as soon as possible after the decision > > to invoke it has been made come into conflict. > > > > In other words does it matter that the console buffers are not flushed > > before entering kgdb? However having thought about it for a little while > > (and knowing the console code tends to be written to be decently robust) > > I can come to the view the flushing is best. > > Just for record. I wondered if there was bug in printk() that the > backtraces were not flushed. And I think that it is because they > are printed in NMI context where the conosle handling is deferred. > > From this POV, the explicit flush makes perfect sense. Agreed... and (with apologies for the length of time to get this done): Applied! Thanks Daniel.