Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754825AbbEUBjS (ORCPT );
        Wed, 20 May 2015 21:39:18 -0400
Received: from szxga02-in.huawei.com ([119.145.14.65]:10411 "EHLO
        szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753150AbbEUBjQ (ORCPT );
        Wed, 20 May 2015 21:39:16 -0400
Message-ID: <555D370F.2070501@huawei.com>
Date: Thu, 21 May 2015 09:38:23 +0800
From: "long.wanglong"
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20121010 Thunderbird/16.0.1
MIME-Version: 1.0
To:
CC: Petr Mladek
Subject: Re: [PATCH v2 00/17] [request for stable 3.10 inclusion] x86/nmi: Print all cpu stacks from NMI safely
References: <1432026542-123571-1-git-send-email-long.wanglong@huawei.com>
        <20150519124754.GA12395@pathway.suse.cz>
        <20150520132259.GE2728@pathway.suse.cz>
In-Reply-To: <20150520132259.GE2728@pathway.suse.cz>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.111.88.174]
X-CFilter-Loop: Reflected
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3429
Lines: 101

On 2015/5/20 21:22, Petr Mladek wrote:
> On Tue 2015-05-19 14:57:46, Petr Mladek wrote:
>> On Tue 2015-05-19 09:08:45, Wang Long wrote:
>>> This is my backport patch series to Fix the problem(backport to 3.10):
>>> "
>>> When trigger_all_cpu_backtrace() is called on x86, it will trigger an
>>> NMI on each CPU and call show_regs(). But this can lead to a hard lock
>>> up if the NMI comes in on another printk().
>>> "
>>> The solution is described in commit "a9edc88093287183ac934be44f295f183b2c62dd":
>>> when the NMI triggers, it switches the printk routine for that CPU to call
>>> a NMI safe printk function that records the printk in a per_cpu seq_buf
>>> descriptor. After all NMIs have finished recording its data, the trace_
>>> seqs are printed in a safe context.
>>>
>>> The solution use "switch printk routine" and "seq_buf" infrastructures, but the
>>> 3.10 stable have no both of them.
>>>
>>> The patch 1-13 backport the "seq_buf" infrastructures. in detail, patch 1, 2
>>> and 6 only backport "seq_buf" related code.
>>>
>>> The patch 14-15 backport the "switch printk routine".
>>>
>>> The patch 16-17 is the patch to print all cpu stacks from NMI safely
>>>
>>> as discussed in https://lkml.org/lkml/2015/5/13/497, in 3.10 stable, this is
>>> the only way to solve the problem and the backport code is a bit more.
>>>
>>> v1 -> v2:
>>> * fix the indent error.
>>> * rebase on 3.10.79
>>>
>>> Any thoughts?
>>
>> Please, wait with the integration. I am testing it with a storm of
>> sysrq requests:
>>
>> $> while true ; do echo l >/proc/sysrq-trigger ; done
>>
>> with iptables enabled:
>>
>> $> iptables -A INPUT -j LOG --log-prefix "incomming packet:"
>>
>> and storm of pings from other machine:
>>
>> $> ping -f
>>
>>
>> The machine somehow freezes. It does not make sense. I am trying to investigate.
>
> OK, it seems that the machine freezes because there are still few
> messages printed in the NMI context, e.g.:
>
> [ 3080.286277] Uhhuh. NMI received for unknown reason 3d on CPU 12.
> [ 3637.939276] Uhhuh. NMI received for unknown reason 2d on CPU 13.
>
> I am not exactly sure why I get them on the test machine. But I get
> such messages from time to time when hammering it by the pings and
> sysrq-l requests.
>
> I modified vprintk_emit() to do raw_spin_trylock(&logbuf_lock)
> and do not try to lock console in NMI context. The trylock fails
> from time to time but it does not longer freeze.
>
> I am going to clean up the vprintk_emit() modification and send it for
> review.
>
> Anyway, this patch set seems to work as expected. It heavily reduces
> the risk of NMI/printk-related deadlocks => it is worth having.
>
> Feel free to use the following for the whole patchset (backport):
>
> Reviewed-by: Petr Mladek
> Tested-by: Petr Mladek

Hi Greg,

This patch set is the only way to solve the NMI/printk-related deadlock
problems. Could you please include it in 3.10 stable?

Although the backport adds a fair amount of code, most of it is the
"seq_buf" infrastructure, and it does not affect other parts of the kernel.

Best Regards
Wang Long

>
>
> Best Regards,
> Petr
>
> .
>