Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932646AbeAJFe4 (ORCPT + 1 other); Wed, 10 Jan 2018 00:34:56 -0500 Received: from mail-pg0-f68.google.com ([74.125.83.68]:42972 "EHLO mail-pg0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932093AbeAJFez (ORCPT ); Wed, 10 Jan 2018 00:34:55 -0500 X-Google-Smtp-Source: ACJfBoseoLlmJCUVgxXneP4/0Qy8w00frg1e8WlbOEV/VBVRCVh8k0E6urg7BjonYpZyFCTGw/hFhw== Date: Wed, 10 Jan 2018 14:34:50 +0900 From: Sergey Senozhatsky To: Tejun Heo Cc: "Paul E. McKenney" , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: Re: RCU: Call touch_nmi_watchdog() while printing stall warnings Message-ID: <20180110053450.GA1063@jagdpanzerIV> References: <20180109184709.GN3668920@devbig577.frc2.facebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180109184709.GN3668920@devbig577.frc2.facebook.com> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: Hello, On (01/09/18 10:47), Tejun Heo wrote: > When RCU stall warning triggers, it can print out a lot of messages > while holding spinlocks. If the console device is slow (e.g. an > actual or IPMI serial console), it may end up triggering NMI hard > lockup watchdog like the following. > > *** CPU printking while holding RCU spinlock > > PID: 4149739 TASK: ffff881a46baa880 CPU: 13 COMMAND: "CPUThreadPool8" > #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0 > #1 [ffff881fff945e58] nmi_handle at ffffffff81020653 > #2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36 > #3 [ffff881fff945ed0] do_nmi at ffffffff81020d32 > #4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e > [exception RIP: io_serial_in+21] > RIP: ffffffff81630e55 RSP: ffff881fff943b88 RFLAGS: 00000002 > RAX: 000000000000ca00 RBX: ffffffff8230e188 RCX: 0000000000000000 > RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffff8230e188 > RBP: ffff881fff943bb0 R8: 0000000000000000 R9: ffffffff820cb3c4 > R10: 0000000000000019 R11: 0000000000002000 R12: 00000000000026e1 > R13: 0000000000000020 R14: ffffffff820cd398 R15: 0000000000000035 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 > --- --- > #5 [ffff881fff943b88] io_serial_in at ffffffff81630e55 > #6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c > #7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc > #8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00 > #9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691 > #10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2 > #11 [ffff881fff943c90] console_unlock at ffffffff810dfc55 > #12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5 > #13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf > #14 [ffff881fff943d60] vprintk_func at ffffffff810e1127 > #15 [ffff881fff943d70] printk at ffffffff8119a8a4 > #16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c > #17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133 > #18 [ffff881fff943ee8] update_process_times at ffffffff810f3497 > #19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037 > #20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38 > #21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b Tejun, while we are at it, shall we also touch NMI in show_workqueue_state()? I suppose that show_pwq() can be quite verbose on some systems, can't it? for instance: sysrq-t [irq] -> show_workqueue_state() --- diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 9960f8842b31..649c00a9af12 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -48,6 +48,7 @@ #include #include #include +#include #include "workqueue_internal.h" @@ -4473,6 +4474,8 @@ void show_workqueue_state(void) if (pwq->nr_active || !list_empty(&pwq->delayed_works)) show_pwq(pwq); spin_unlock_irqrestore(&pwq->pool->lock, flags); + + touch_nmi_watchdog(); } }