2013-10-15 16:40:34

by Steven Rostedt

Subject: Re: Panic and page fault in loop during handling NMI backtrace handler


BTW, please do not send out HTML email, as that gets blocked from going
to LKML.

On Tue, 15 Oct 2013 02:01:04 +0000
"Liu, Chuansheng" <[email protected]> wrote:

> We hit an issue when triggering an all-CPU backtrace: inside the NMI handler arch_trigger_all_cpu_backtrace_handler,
> a page fault occurs, the page fault handler then loops, and eventually the thread stack overflows and the system panics.
>
> Anyone can give some help? Thanks.
>
>
> Panic log as below:
> ===============
> [ 15.069144] BUG: unable to handle kernel [ 15.073635] paging request at 1649736d
> [ 15.076379] IP: [<c200402a>] print_context_stack+0x4a/0xa0
> [ 15.082529] *pde = 00000000
> [ 15.085758] Thread overran stack, or stack corrupted
> [ 15.091303] Oops: 0000 [#1] SMP
> [ 15.094932] Modules linked in: atomisp_css2400b0_v2(+) lm3554 ov2722 imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm bcm43241 kct_daemon(O)
> [ 15.111093] CPU: 2 PID: 2443 Comm: Compiler Tainted: G W O 3.10.1+ #1

I'm curious, what "Out-of-tree" module was loaded?

Read the rest from the bottom up, as that's how I wrote it :-)


> [ 15.119075] task: f213f980 ti: f0c42000 task.ti: f0c42000
> [ 15.125116] EIP: 0060:[<c200402a>] EFLAGS: 00210087 CPU: 2
> [ 15.131255] EIP is at print_context_stack+0x4a/0xa0
> [ 15.136712] EAX: 16497ffc EBX: 1649736d ECX: 986736d8 EDX: 1649736d
> [ 15.143722] ESI: 00000000 EDI: ffffe000 EBP: f0c4220c ESP: f0c421ec
> [ 15.150732] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
> [ 15.156771] CR0: 80050033 CR2: 1649736d CR3: 31245000 CR4: 001007d0
> [ 15.163781] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 15.170789] DR6: ffff0ff0 DR7: 00000400
> [ 15.175076] Stack:
> [ 15.177324] 16497ffc 16496000 986736d8 ffffe000 986736d8 1649736d c282c148 16496000
> [ 15.186067] f0c4223c c20033b0 c282c148 c29ceecf 00000000 f0c4222c 986736d8 f0c4222c
> [ 15.194810] 00000000 c29ceecf 00000000 00000000 f0c42260 c20041a7 f0c4229c c282c148
> [ 15.203549] Call Trace:
> [ 15.206295] [<c20033b0>] dump_trace+0x70/0xf0
> [ 15.211274] [<c20041a7>] show_trace_log_lvl+0x47/0x60
> [ 15.217028] [<c2003482>] show_stack_log_lvl+0x52/0xd0
> [ 15.222782] [<c2004201>] show_stack+0x21/0x50
> [ 15.227762] [<c281b38b>] dump_stack+0x16/0x18
> [ 15.232742] [<c2037cff>] warn_slowpath_common+0x5f/0x80
> [ 15.238693] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 15.244156] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 15.249621] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 15.255472] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [ 15.261228] [<c282553a>] vmalloc_fault+0x5a/0xcf
> [ 15.266497] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [ 15.272154] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [ 15.278106] [<c2039c94>] ? console_unlock+0x314/0x440
> ... //
> [ 16.885364] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 16.891217] [<c2825b08>] do_page_fault+0x8/0x10
> [ 16.896387] [<c2823066>] error_code+0x5a/0x60
> [ 16.901367] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 16.907219] [<c208d6a0>] ? print_modules+0x20/0x90
> [ 16.912685] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> [ 16.918634] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 16.924097] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 16.929562] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 16.935415] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [ 16.941169] [<c282553a>] vmalloc_fault+0x5a/0xcf
> [ 16.946437] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [ 16.952095] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [ 16.958046] [<c2039c94>] ? console_unlock+0x314/0x440
> [ 16.963800] [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> [ 16.969362] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 16.975215] [<c2825b08>] do_page_fault+0x8/0x10
> [ 16.980386] [<c2823066>] error_code+0x5a/0x60
> [ 16.985366] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 16.991215] [<c208d6a0>] ? print_modules+0x20/0x90
> [ 16.996673] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> [ 17.002622] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 17.008086] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 17.013550] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.019403] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [ 17.025159] [<c282553a>] vmalloc_fault+0x5a/0xcf

Oh look, we are constantly warning about this same fault! There's your
infinite loop.

Note the WARN_ON_ONCE() does the WARN_ON() first and then updates
__warned = true. Thus, if the WARN_ON() itself faults, then we are in
an infinite loop.

> [ 17.030428] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [ 17.036085] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [ 17.042037] [<c2039c94>] ? console_unlock+0x314/0x440
> [ 17.047790] [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> [ 17.053352] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.059205] [<c2825b08>] do_page_fault+0x8/0x10
> [ 17.064375] [<c2823066>] error_code+0x5a/0x60
> [ 17.069354] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.075204] [<c208d6a0>] ? print_modules+0x20/0x90
> [ 17.080669] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> [ 17.086619] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 17.092082] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [ 17.097546] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.103399] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [ 17.109154] [<c282553a>] vmalloc_fault+0x5a/0xcf

Yep, the WARN_ON() triggered in vmalloc_fault(). We shouldn't worry
about warning in_nmi() for vmalloc faults anymore.


> [ 17.114422] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [ 17.120080] [<c206b93d>] ? update_group_power+0x1fd/0x240
> [ 17.126224] [<c227827b>] ? number.isra.2+0x32b/0x330
> [ 17.131880] [<c20679bc>] ? update_curr+0xac/0x190
> [ 17.137247] [<c227827b>] ? number.isra.2+0x32b/0x330
> [ 17.142905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.148755] [<c2825b08>] do_page_fault+0x8/0x10
> [ 17.153926] [<c2823066>] error_code+0x5a/0x60
> [ 17.158905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.164760] [<c208d1a9>] ? module_address_lookup+0x29/0xb0
> [ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0

Looks like kallsyms_lookup() faulted?

> [ 17.176462] [<c208de1d>] __sprint_symbol+0x2d/0xd0
> [ 17.181926] [<c22790cc>] ? sprintf+0x1c/0x20
> [ 17.186804] [<c208def4>] sprint_symbol+0x14/0x20
> [ 17.192063] [<c208df1e>] __print_symbol+0x1e/0x40
> [ 17.197430] [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
> [ 17.202895] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [ 17.208845] [<c205bdf5>] ? up+0x25/0x40
> [ 17.213242] [<c2039cb7>] ? console_unlock+0x337/0x440
> [ 17.218998] [<c2818236>] ? printk+0x38/0x3a
> [ 17.223782] [<c20006d0>] __show_regs+0x70/0x190
> [ 17.228954] [<c200353a>] show_regs+0x3a/0x1b0
> [ 17.233931] [<c2818236>] ? printk+0x38/0x3a
> [ 17.238717] [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
> [ 17.246413] [<c2823919>] nmi_handle.isra.0+0x39/0x60
> [ 17.252071] [<c2823a29>] do_nmi+0xe9/0x3f0

Start here and read upward.

Can you try this patch:

From 794197cf3f563d36e5ee5b29cbf8e941163f9bc9 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <[email protected]>
Date: Tue, 15 Oct 2013 12:34:56 -0400
Subject: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

Since the NMI iretq nesting has been fixed, there's no reason that
an NMI handler can not take a page fault for vmalloc'd code. No locks
are taken in that code path, and the software now handles nested NMIs
when the fault re-enables NMIs on iretq.

Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
warn on triggers a vmalloc fault for some reason, then we can go into
an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
the variable to make it happen "once").

Reported-by: "Liu, Chuansheng" <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
arch/x86/mm/fault.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 3aaeffc..78926c6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -268,8 +268,6 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
if (!(address >= VMALLOC_START && address < VMALLOC_END))
return -1;

- WARN_ON_ONCE(in_nmi());
-
/*
* Synchronize this task's top level page-table
* with the 'reference' page table.
--
1.8.1.4


2013-10-16 01:54:57

by Liu, Chuansheng

Subject: RE: Panic and page fault in loop during handling NMI backtrace handler

Hello Steven,

> -----Original Message-----
> From: Steven Rostedt [mailto:[email protected]]
> Sent: Wednesday, October 16, 2013 12:40 AM
> To: Liu, Chuansheng
> Cc: Ingo Molnar ([email protected]); [email protected]; [email protected];
> [email protected]; [email protected]; Peter Zijlstra
> ([email protected]); [email protected]; '[email protected]'
> ([email protected]); Wang, Xiaoming; Li, Zhuangzhi
> Subject: Re: Panic and page fault in loop during handling NMI backtrace handler
>
>
> BTW, please do not send out HTML email, as that gets blocked from going
> to LKML.
Thanks for the reminder; I forgot to convert it to a plain-text email.

>
> On Tue, 15 Oct 2013 02:01:04 +0000
> "Liu, Chuansheng" <[email protected]> wrote:
>
> > We hit an issue when triggering an all-CPU backtrace: inside the NMI handler
> > arch_trigger_all_cpu_backtrace_handler, a page fault occurs, the page fault
> > handler then loops, and eventually the thread stack overflows and the system panics.
> >
> > Anyone can give some help? Thanks.
> >
> >
> > Panic log as below:
> > ===============
> > [ 15.069144] BUG: unable to handle kernel [ 15.073635] paging request at 1649736d
> > [ 15.076379] IP: [<c200402a>] print_context_stack+0x4a/0xa0
> > [ 15.082529] *pde = 00000000
> > [ 15.085758] Thread overran stack, or stack corrupted
> > [ 15.091303] Oops: 0000 [#1] SMP
> > [ 15.094932] Modules linked in: atomisp_css2400b0_v2(+) lm3554 ov2722
> imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm
> bcm43241 kct_daemon(O)
> > [ 15.111093] CPU: 2 PID: 2443 Comm: Compiler Tainted: G W O
> 3.10.1+ #1
>
> I'm curious, what "Out-of-tree" module was loaded?
We do have some out-of-tree modules indeed :)

>
> Read the rest from the bottom up, as that's how I wrote it :-)
>
>
> > [ 15.119075] task: f213f980 ti: f0c42000 task.ti: f0c42000
> > [ 15.125116] EIP: 0060:[<c200402a>] EFLAGS: 00210087 CPU: 2
> > [ 15.131255] EIP is at print_context_stack+0x4a/0xa0
> > [ 15.136712] EAX: 16497ffc EBX: 1649736d ECX: 986736d8 EDX: 1649736d
> > [ 15.143722] ESI: 00000000 EDI: ffffe000 EBP: f0c4220c ESP: f0c421ec
> > [ 15.150732] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
> > [ 15.156771] CR0: 80050033 CR2: 1649736d CR3: 31245000 CR4: 001007d0
> > [ 15.163781] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > [ 15.170789] DR6: ffff0ff0 DR7: 00000400
> > [ 15.175076] Stack:
> > [ 15.177324] 16497ffc 16496000 986736d8 ffffe000 986736d8 1649736d c282c148 16496000
> > [ 15.186067] f0c4223c c20033b0 c282c148 c29ceecf 00000000 f0c4222c 986736d8 f0c4222c
> > [ 15.194810] 00000000 c29ceecf 00000000 00000000 f0c42260 c20041a7 f0c4229c c282c148
> > [ 15.203549] Call Trace:
> > [ 15.206295] [<c20033b0>] dump_trace+0x70/0xf0
> > [ 15.211274] [<c20041a7>] show_trace_log_lvl+0x47/0x60
> > [ 15.217028] [<c2003482>] show_stack_log_lvl+0x52/0xd0
> > [ 15.222782] [<c2004201>] show_stack+0x21/0x50
> > [ 15.227762] [<c281b38b>] dump_stack+0x16/0x18
> > [ 15.232742] [<c2037cff>] warn_slowpath_common+0x5f/0x80
> > [ 15.238693] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 15.244156] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 15.249621] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 15.255472] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [ 15.261228] [<c282553a>] vmalloc_fault+0x5a/0xcf
> > [ 15.266497] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [ 15.272154] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [ 15.278106] [<c2039c94>] ? console_unlock+0x314/0x440
> > ... //
> > [ 16.885364] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 16.891217] [<c2825b08>] do_page_fault+0x8/0x10
> > [ 16.896387] [<c2823066>] error_code+0x5a/0x60
> > [ 16.901367] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 16.907219] [<c208d6a0>] ? print_modules+0x20/0x90
> > [ 16.912685] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> > [ 16.918634] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 16.924097] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 16.929562] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 16.935415] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [ 16.941169] [<c282553a>] vmalloc_fault+0x5a/0xcf
> > [ 16.946437] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [ 16.952095] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [ 16.958046] [<c2039c94>] ? console_unlock+0x314/0x440
> > [ 16.963800] [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> > [ 16.969362] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 16.975215] [<c2825b08>] do_page_fault+0x8/0x10
> > [ 16.980386] [<c2823066>] error_code+0x5a/0x60
> > [ 16.985366] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 16.991215] [<c208d6a0>] ? print_modules+0x20/0x90
> > [ 16.996673] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> > [ 17.002622] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 17.008086] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 17.013550] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 17.019403] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [ 17.025159] [<c282553a>] vmalloc_fault+0x5a/0xcf
>
> Oh look, we are constantly warning about this same fault! There's your
> infinite loop.
Yes, it really is a WARN_ON() infinite loop.

>
> Note the WARN_ON_ONCE() does the WARN_ON() first and then updates
> __warned = true. Thus, if the WARN_ON() itself faults, then we are in
> an infinite loop.
>
> > [ 17.030428] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [ 17.036085] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [ 17.042037] [<c2039c94>] ? console_unlock+0x314/0x440
> > [ 17.047790] [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> > [ 17.053352] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 17.059205] [<c2825b08>] do_page_fault+0x8/0x10
> > [ 17.064375] [<c2823066>] error_code+0x5a/0x60
> > [ 17.069354] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 17.075204] [<c208d6a0>] ? print_modules+0x20/0x90
> > [ 17.080669] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> > [ 17.086619] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 17.092082] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [ 17.097546] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 17.103399] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [ 17.109154] [<c282553a>] vmalloc_fault+0x5a/0xcf
>
> Yep, the WARN_ON() triggered in vmalloc_fault(). We shouldn't worry
> about warning in_nmi() for vmalloc faults anymore.
Got it.

>
>
> > [ 17.114422] [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [ 17.120080] [<c206b93d>] ? update_group_power+0x1fd/0x240
> > [ 17.126224] [<c227827b>] ? number.isra.2+0x32b/0x330
> > [ 17.131880] [<c20679bc>] ? update_curr+0xac/0x190
> > [ 17.137247] [<c227827b>] ? number.isra.2+0x32b/0x330
> > [ 17.142905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 17.148755] [<c2825b08>] do_page_fault+0x8/0x10
> > [ 17.153926] [<c2823066>] error_code+0x5a/0x60
> > [ 17.158905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [ 17.164760] [<c208d1a9>] ? module_address_lookup+0x29/0xb0
> > [ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0
>
> Looks like kallsyms_lookup() faulted?
>
> > [ 17.176462] [<c208de1d>] __sprint_symbol+0x2d/0xd0
> > [ 17.181926] [<c22790cc>] ? sprintf+0x1c/0x20
> > [ 17.186804] [<c208def4>] sprint_symbol+0x14/0x20
> > [ 17.192063] [<c208df1e>] __print_symbol+0x1e/0x40
> > [ 17.197430] [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
> > [ 17.202895] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [ 17.208845] [<c205bdf5>] ? up+0x25/0x40
> > [ 17.213242] [<c2039cb7>] ? console_unlock+0x337/0x440
> > [ 17.218998] [<c2818236>] ? printk+0x38/0x3a
> > [ 17.223782] [<c20006d0>] __show_regs+0x70/0x190
> > [ 17.228954] [<c200353a>] show_regs+0x3a/0x1b0
> > [ 17.233931] [<c2818236>] ? printk+0x38/0x3a
> > [ 17.238717] [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
> > [ 17.246413] [<c2823919>] nmi_handle.isra.0+0x39/0x60
> > [ 17.252071] [<c2823a29>] do_nmi+0xe9/0x3f0
>
> Start here and read upward.
>
> Can you try this patch:
>
> From 794197cf3f563d36e5ee5b29cbf8e941163f9bc9 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <[email protected]>
> Date: Tue, 15 Oct 2013 12:34:56 -0400
> Subject: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault
>
> Since the NMI iretq nesting has been fixed, there's no reason that
I think your patch fixes the infinite loop; we will test it soon.
BTW, we are using 3.10; could you point out which patches fixed the NMI iretq nesting?
Thanks.

> an NMI handler can not take a page fault for vmalloc'd code. No locks
> are taken in that code path, and the software now handles nested NMIs
> when the fault re-enables NMIs on iretq.
>
> Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> warn on triggers a vmalloc fault for some reason, then we can go into
> an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> the variable to make it happen "once").
>
> Reported-by: "Liu, Chuansheng" <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> arch/x86/mm/fault.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 3aaeffc..78926c6 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -268,8 +268,6 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
> if (!(address >= VMALLOC_START && address < VMALLOC_END))
> return -1;
>
> - WARN_ON_ONCE(in_nmi());
> -
> /*
> * Synchronize this task's top level page-table
> * with the 'reference' page table.
> --
> 1.8.1.4

2013-10-16 02:08:07

by Steven Rostedt

Subject: Re: Panic and page fault in loop during handling NMI backtrace handler

On Wed, 16 Oct 2013 01:54:51 +0000
"Liu, Chuansheng" <[email protected]> wrote:


> > Since the NMI iretq nesting has been fixed, there's no reason that
> I think your patch fixes the infinite loop; we will test it soon.
> BTW, we are using 3.10; could you point out which patches fixed the NMI iretq nesting?

There were many. You can read about what was done here:

https://lwn.net/Articles/484932/

The original is here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ccd49c2391773ffbf52bb80d75c4a92b16972517

But more were added.

But that is back in 3.3, so 3.10 has all the required updates.

-- Steve

2013-10-16 02:09:54

by Liu, Chuansheng

Subject: RE: Panic and page fault in loop during handling NMI backtrace handler

Hello Steven,

> -----Original Message-----
> From: Steven Rostedt [mailto:[email protected]]
> Sent: Wednesday, October 16, 2013 10:08 AM
> To: Liu, Chuansheng
> Cc: Ingo Molnar ([email protected]); [email protected]; [email protected];
> [email protected]; [email protected]; Peter Zijlstra
> ([email protected]); [email protected]; '[email protected]'
> ([email protected]); Wang, Xiaoming; Li, Zhuangzhi
> Subject: Re: Panic and page fault in loop during handling NMI backtrace handler
>
> On Wed, 16 Oct 2013 01:54:51 +0000
> "Liu, Chuansheng" <[email protected]> wrote:
>
>
> > > Since the NMI iretq nesting has been fixed, there's no reason that
> > I think your patch fixes the infinite loop; we will test it soon.
> > BTW, we are using 3.10; could you point out which patches fixed the NMI iretq nesting?
>
> There were many. You can read about what was done here:
>
> https://lwn.net/Articles/484932/
>
> The original is here:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ccd49c2391773ffbf52bb80d75c4a92b16972517
>
> But more were added.
>
> But that is back in 3.3, so 3.10 has all the required updates.
Thanks for the info; we are trying your patch now.

>
> -- Steve