Received: by 2002:ab2:6857:0:b0:1ef:ffd0:ce49 with SMTP id l23csp1088400lqp; Fri, 22 Mar 2024 05:29:46 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXPksI8DoJGdNnbK0RfSi8RPSiW/ZkTLzELLe6jIVCLPuvzKvlZLgkJFiAa4NtUDZezbH7tR0uPdOi4VGMIC+KgTuBc39PbGYfahaWQoQ== X-Google-Smtp-Source: AGHT+IHe0wYHAN8N1QNwjgi/QCPXAtBIQHNA9iF1P03yJ4QG3n+JGKFkVb9N7/vIockJseHIPqqs X-Received: by 2002:a05:6870:d188:b0:229:f7fc:54aa with SMTP id a8-20020a056870d18800b00229f7fc54aamr393854oac.5.1711110586450; Fri, 22 Mar 2024 05:29:46 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1711110586; cv=pass; d=google.com; s=arc-20160816; b=yfAwrS7ErZhMDwSXt6A57rUb6po1vVDbtnkL8vo6qCr4YOAeFhyJkY3hCsT70+rqAB 4OjHtixiApYWfBtcYOKvODS78GZtLAWQNEoCG3lIniGxf6vL4etm3QwTyFZWTyro/jhN GEz4d49FBAbEiZddBWi1kD6A1vWrSY4lLqwD8oLNVH9DWLBaVEtJXbmepGXLoi4XgVOC AHrMjC6w2xQljyH9BNdLLnb3EvhDCTy2NXwr9Ga2WdbcyWMuiZA4CB1hoa8y0P66zeyq mHIe6xwqwqLF4pJZNeJRs2GKWUPOdHsmYjBfVuOV2oy+1ior9kbLIQQqM9Lt68pWm/PU mgZg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :references:message-id:subject:cc:to:from:dkim-signature :dkim-signature:date; bh=kw6jKfgYseBviEK7ThWkCBO+I75JNZXgwEbS3NadGUA=; fh=+2TKxjry7GSyKzhljEDpnzD39O1td2QW5ucqkepsHPc=; b=cPMkfZwMv2WWyGtc/VXicVi6HSqUzGX8q3hATPShsmXXf+vD6mi2yLfymsrGcfi+3P XaKQecdH6LkQlUsS+YrrU0WSN7a14gttlVAQ5y/zM1DTa0hZcjN6AyUoJ+JueQQxGP+L 2mdVuIlCWpI4DGN6swaOdOZ0xgGHOS1+sTZXBxZcyuwqfk8Si5tZwhPO9KPLil/6sSc0 OHjihHVioQaG6cDa+Ji9P2v6kM4H8DgVAwSrhP3EJl7cI+rkRh/prg0CnQsbx4npT9Bv QiJjqv8b97H1Ptxcx9ueniAItgqNyRkDhj3gmtekLEZkwsuVXpw0du8sTCVXNi77sc3w D/IQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=XX34NDzS; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; arc=pass (i=1 spf=pass spfdomain=linutronix.de dkim=pass dkdomain=linutronix.de dmarc=pass fromdomain=linutronix.de); spf=pass (google.com: domain of linux-kernel+bounces-111419-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-111419-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id a7-20020a056a001d0700b006e724ccdc4asi1956484pfx.178.2024.03.22.05.29.46 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Mar 2024 05:29:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-111419-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=XX34NDzS; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; arc=pass (i=1 spf=pass spfdomain=linutronix.de dkim=pass dkdomain=linutronix.de dmarc=pass fromdomain=linutronix.de); spf=pass (google.com: domain of linux-kernel+bounces-111419-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-111419-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 1EA0A280F00 for ; Fri, 22 Mar 2024 12:29:46 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 88A4341744; Fri, 22 Mar 2024 12:29:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="XX34NDzS"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="kJorp9nC" Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C4F05405DF; Fri, 22 Mar 2024 12:29:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711110576; cv=none; b=h2grvNn4so4h2cAaP9hOPuDzTaC8iKdN18Tpn2m30Kt1uEIoADcYeIu1cymDZBS3TkmQX7vgL4LMGfj8x5DeVAiSUPv9En8jiW7m54DqsqSTR8TujV1vN/SQhGLGYRg6+X+nzs+YqUHd3zV+hx6SFEe9k8cSXDfmQIlrCeC4Gig= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711110576; c=relaxed/simple; bh=foIK+9f73HNeNks9IMM4f7cYpYcW1Ipegzfx9UD+DpU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=s70ebONEOV8NW9WX9SP0CylbespbuHImTn5yRtVQAA6vdLCQ/25kt4z44c3oQk5wKuvtog3rWGaHPnyZy7nv3O+4+1K/Z3k32MN5skEx6DOCt4cN9O1D1xXmVPsykl2S/OnkfolpPki7/GO5KGmKf6q838j6I42lbPKjDOItgw0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=XX34NDzS; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=kJorp9nC; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Date: Fri, 22 Mar 2024 13:29:21 +0100 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1711110572; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kw6jKfgYseBviEK7ThWkCBO+I75JNZXgwEbS3NadGUA=; b=XX34NDzSpJY5/auKw8m3bAZVQPXLyurLUHyus62eyaS2WihtYaCEGn2x5v6cx8IYwIYl0Z FdhOyNgmsV5KROmqopCVIlwSLC8efd9Ep0K7NBcTIbFQqryrbvB1wBt6Cb4RABIj/XBkU6 t1xM2w63JndgZP/t5oa+W/R3Lfox9lneCo2EMtdtkpzsNXL0y88Ud0OeWMjMc/wuS++vje l/SUfNj8v1kGnH/J8yj+YR3w7gH8qWIgCcL51AnDBo14Auz1X9h7Hae9gWkD+ZCy+4TyH+ PigUl7xDiMLCOtH+s4qUxBXGQ3A/OJ3GYMgzR8cDbavecI1+KX1GxJFJoiulZA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1711110572; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kw6jKfgYseBviEK7ThWkCBO+I75JNZXgwEbS3NadGUA=; b=kJorp9nC1dajl14Wt4UPgHrt5NzDwscK2Snh1wYkxF4PxWwuKAwcAsfgO1NtdUzoOJSqGR CsN45VmUAYGfniAw== From: Sebastian Andrzej Siewior To: "John B. Wyatt IV" Cc: John Ogness , Petr Mladek , Clark Williams , Juri Lelli , Derek Barbosa , Bruno Goncalves , "John B. Wyatt IV" , LKML , linux-rt-users Subject: Re: NMIs reported by console_blast.sh with 6.6.20-rt25 Message-ID: <20240322122921.U3WRsO4X@linutronix.de> References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: On 2024-03-15 15:21:19 [-0400], John B. Wyatt IV wrote: > Hello John, Hi John B., > The real-time team at Red Hat is discussing backporting the rt patchset i= n 6.6 > to RHEL 9/Stream 9. I decided to test v6.6.20-rt25 from stable-rt with > console_blast.sh. It reported similar NMIs from my testing of 6.7.0-rt6 w= ith > that high cpu count server over uart 8250; which is expected since the pa= tchset > is similar. >=20 > One interesting thing is that 6.7.0-rt6 fully preemptive + realtime tuned= profile > did not return any NMIs while 6.6 did with that same configuration. The thing is that console_blast.sh does this "show a backtrace on all CPUs, please" which triggers NMIs on all CPUs for backtrace. I can't imagine how you did obtain the backtraces without an NMI. Unless the tuned profile disables this somehow. > Another aspect I noticed during my testing. I did not set grub to > start with the realtime profile at boot for this machine. When I did set > it the second (and latter) NMI did not show for fully preemptive (the > 3rd set at the bottom of this email).=20 >=20 > Caller info was enabled. No modifications to the source code were made. >=20 > I have not tested previous versions before 6.7.0-rt6 or 6.6.20-rt25; > with the exception of accidently testing 6.6.10-rt19. 6.6.10 also > reported NMIs during this test. If you wish to see these reports please > let me know. This NMI part has nothing todo with printk. If you need this clarified, I would need a reproducer. =E2=80=A6 > ----------------------------- > NMI Backtrace for 6.6.20-rt25 no forced preemption with tuned realtime pr= ofile > ----------------------------- >=20 > [ T2614] Kernel panic - not syncing: sysrq triggered crash > [ C56] NMI backtrace for cpu 56 > [ C56] Hardware name: Intel Corporation D50DNP1SBB/D50DNP1SBB, BIOS SE5= C7411.86B.9409.D04.2212261349 12/26/2022 > [ C56] RIP: 0010:io_serial_out (arch/x86/kernel/early_printk.c:105)=20 > [ C56] Code: 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f = 1f 44 00 00 0f b6 8f c1 00 00 00 89 d0 0f b7 57 08 d3 e6 01 f2 ee cc c= c cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 > All code > =3D=3D=3D=3D=3D=3D=3D =E2=80=A6 > 12: 90 nop > 13: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > 18: 0f b6 8f c1 00 00 00 movzbl 0xc1(%rdi),%ecx > 1f: 89 d0 mov %edx,%eax > 21: 0f b7 57 08 movzwl 0x8(%rdi),%edx > 25: d3 e6 shl %cl,%esi > 27: 01 f2 add %esi,%edx > 29: ee out %al,(%dx) > 2a:* c3 ret <-- trapping instruction where is this output from? The `ret' opcode usually does not cause a trap. My guess is that the machine has been interrupted by an external user at this position. Side note: This is using early_printk, correct? =E2=80=A6 > [ C56] Call Trace: > [ C56] > [ C56] ? nmi_cpu_backtrace (lib/nmi_backtrace.c:115)=20 > [ C56] ? nmi_cpu_backtrace_handler (arch/x86/kernel/apic/hw_nmi.c:47 (dis= criminator 1))=20 > [ C56] ? nmi_handle (arch/x86/kernel/nmi.c:149)=20 > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105)=20 > [ C56] ? default_do_nmi (arch/x86/kernel/nmi.c:347)=20 > [ C56] ? exc_nmi (arch/x86/kernel/nmi.c:538)=20 > [ C56] ? end_repeat_nmi (arch/x86/entry/entry_64.S:1458)=20 > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105)=20 > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105)=20 > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105)=20 > [ C56] This looks okay. The NMI did the backtrace as expected. > [ C56] > [ C56] serial8250_console_putchar (./include/linux/serial_core.h:704 driv= ers/tty/serial/8250/8250_port.c:3347)=20 > [ C56] ? __pfx_serial8250_console_putchar (drivers/tty/serial/8250/8250_p= ort.c:3343)=20 > [ C56] uart_console_write (drivers/tty/serial/serial_core.c:2134)=20 > [ C56] serial8250_console_write_atomic (drivers/tty/serial/8250/8250_port= =2Ec:3628)=20 > [ C56] nbcon_emit_next_record (kernel/printk/nbcon.c:940)=20 > [ C56] __nbcon_atomic_flush_all (kernel/printk/nbcon.c:1192 (discriminato= r 1) kernel/printk/nbcon.c:1326 (discriminator 1))=20 > [ C56] vprintk_emit (kernel/printk/printk.c:2414)=20 > [ C56] _printk (kernel/printk/printk.c:2474)=20 > [ C56] panic (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/= bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ker= nel/panic.c:528 kernel/panic.c:339)=20 > [ C56] ? _printk (kernel/printk/printk.c:2474)=20 > [ C56] sysrq_handle_crash (drivers/tty/sysrq.c:154)=20 > [ C56] __handle_sysrq (drivers/tty/sysrq.c:601)=20 > [ C56] write_sysrq_trigger (drivers/tty/sysrq.c:1165)=20 > [ C56] proc_reg_write (fs/proc/inode.c:340 fs/proc/inode.c:352)=20 > [ C56] ? preempt_count_add (./include/linux/ftrace.h:974 (discriminator 1= ) kernel/sched/core.c:5847 (discriminator 1) kernel/sched/core.c:5844 (disc= riminator 1) kernel/sched/core.c:5872 (discriminator 1))=20 > [ C56] vfs_write (fs/read_write.c:582)=20 > [ C56] ksys_write (fs/read_write.c:637)=20 > [ C56] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:= 81)=20 > [ C56] ? do_dup2 (fs/file.c:1142)=20 > [ C56] ? syscall_exit_to_user_mode (kernel/entry/common.c:299)=20 > [ C56] ? do_syscall_64 (arch/x86/entry/common.c:88)=20 > [ C56] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/= include/asm/irqflags.h:72 arch/x86/mm/fault.c:1513 arch/x86/mm/fault.c:1561= )=20 > [ C56] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)=20 According to this, someone issued a `crash' via sysrq. Why? =E2=80=A6 > [ T2614] ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- yes exactly. =E2=80=A6 > NMI Backtrace for 6.6.20-rt25 no forced preemption with tuned throughput-= performance profile > ----------------------------- This and the following backtrace shows the same picture: The CPU is crashing due to proc/sysrq request and does CPU-backtraces via NMI and polls in early_printk, waiting for the UART to become idle (probably). I don't see an issue here so far. =E2=80=A6 > --=20 > Sincerly, > John Wyatt > Software Engineer, Core Kernel Sebastian