Message-ID: <55548503.2050406@huawei.com>
Date: Thu, 14 May 2015 19:20:35 +0800
From: "long.wanglong" <long.wanglong@huawei.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20121010 Thunderbird/16.0.1
MIME-Version: 1.0
To: Jiri Kosina <jkosina@suse.cz>
CC: =?GB2312?B?zfXB+g==?= <wanglong@laoqinren.net>,
        rostedt <rostedt@goodmis.org>, paulmck <paulmck@linux.vnet.ibm.com>,
        pmladek <pmladek@suse.cz>, dzickus <dzickus@redhat.com>,
        johannes <johannes@sipsolutions.net>, koct9i <koct9i@gmail.com>,
        tglx <tglx@linutronix.de>, mingo <mingo@redhat.com>,
        hpa <hpa@zytor.com>, x86 <x86@kernel.org>,
        atomlin <atomlin@redhat.com>, akpm <akpm@linux-foundation.org>,
        "sasha.levin" <sasha.levin@oracle.com>,
        linux-kernel <linux-kernel@vger.kernel.org>,
        peifeiyue <peifeiyue@huawei.com>,
        "morgan.wang" <morgan.wang@huawei.com>
Subject: Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?
References: <tencent_2DEC6ECC6194905D15D8E6D5@qq.com> <alpine.LNX.2.00.1505131621200.8186@pobox.suse.cz>
In-Reply-To: <alpine.LNX.2.00.1505131621200.8186@pobox.suse.cz>
Content-Type: text/plain; charset="GB2312"
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2105
Lines: 54

On 2015/5/13 22:26, Jiri Kosina wrote:
> On Wed, 13 May 2015, Wang Long wrote:
> 
>> Hi all,
>>
>> In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
>> it triggers an NMI on each CPU and calls show_regs(). But this can lead
>> to a hard lockup if the NMI arrives while that CPU is inside another printk().
>>
>> Commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
>> NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
>> triggers, it switches the printk routine for that CPU to an NMI-safe printk
>> function that records the output in a per-CPU seq_buf descriptor. After all
>> NMIs have finished recording their data, the seq_bufs are printed from a safe
>> context. But how do we fix this problem in older kernels (e.g. 3.10 stable)?
>> 3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
>>
>> Could anyone give me some ideas?
> 
> Either you backport the seq_buf-based approach to the older kernel, or, if you
> are working on a 3.4 kernel or earlier (basically any kernel preceding the
> printk() revamp that happened in 7ff9554bb57 and after), you can use a
> slightly simpler approach.
> 
> It's the approach we used initially when we first tracked down the issue,
> and it is proven to work as well (but it's not applicable after Kay
> added all the complexity to printk()).
> 
> You can see it in our SLE11 kernel tree, available on
> 	
> 	http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP4&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
> 
> for example.
> 
> It's up to you to judge which is the least painful way :)
> 

Hi Jiri Kosina,

For 3.10 stable, the only way to solve this problem is to backport the seq_buf-based approach.

I will backport the necessary patches to 3.10 stable. You are welcome to review my backport patches.
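
For reference, the mechanism described in the quoted question (each CPU records
its output into its own buffer while in NMI context, and the buffers are printed
later from a safe context) can be sketched in userspace C roughly as below. This
is only an illustration under stated assumptions, not the kernel code: NR_CPUS,
NMI_BUF_SIZE, struct nmi_buf, nmi_safe_printf() and flush_nmi_bufs() are
hypothetical names, and a plain char array stands in for the per-CPU seq_buf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define NR_CPUS 4        /* hypothetical CPU count for this sketch */
#define NMI_BUF_SIZE 256 /* hypothetical per-CPU buffer size */

/* Stand-in for a per-CPU seq_buf: a fixed buffer plus its fill level. */
struct nmi_buf {
	char data[NMI_BUF_SIZE];
	size_t len;
};

static struct nmi_buf nmi_bufs[NR_CPUS];

/*
 * Role of the "NMI-safe printk": format into this CPU's own buffer
 * without taking any lock, and drop whatever does not fit.
 * Returns the number of bytes actually recorded.
 */
static int nmi_safe_printf(unsigned int cpu, const char *fmt, ...)
{
	struct nmi_buf *b;
	va_list args;
	size_t room;
	int n;

	assert(cpu < NR_CPUS);
	b = &nmi_bufs[cpu];
	room = NMI_BUF_SIZE - b->len; /* always >= 1, room for the NUL */

	va_start(args, fmt);
	n = vsnprintf(b->data + b->len, room, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	if ((size_t)n >= room)
		n = (int)(room - 1); /* truncated: keep only what fit */
	b->len += (size_t)n;
	return n;
}

/*
 * Called later from a normal (non-NMI) context: write every CPU's
 * recorded text through the regular output path, then reset it.
 * Returns the total number of bytes flushed.
 */
static size_t flush_nmi_bufs(FILE *out)
{
	size_t total = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		struct nmi_buf *b = &nmi_bufs[cpu];

		if (b->len == 0)
			continue;
		fwrite(b->data, 1, b->len, out);
		total += b->len;
		b->len = 0;
	}
	return total;
}
```

In this sketch each "NMI handler" would call nmi_safe_printf() with its own CPU
number, and the caller that requested the backtrace would call
flush_nmi_bufs(stdout) once all CPUs have finished recording.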

Best Regards
Wang Long

