LinuxLists.cc - [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc

2013-10-15 20:39:10

Subject: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

Since the NMI iretq nesting has been fixed, there's no reason that
an NMI handler can not take a page fault for vmalloc'd code. No locks
are taken in that code path, and the software now handles nested NMIs
when the fault re-enables NMIs on iretq.

Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
warn on triggers a vmalloc fault for some reason, then we can go into
an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
the variable to make it happen "once").

Reported-by: "Liu, Chuansheng" <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
arch/x86/mm/fault.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 3aaeffc..78926c6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -268,8 +268,6 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (!(address >= VMALLOC_START && address < VMALLOC_END))
 		return -1;
 
-	WARN_ON_ONCE(in_nmi());
-
 	/*
 	 * Synchronize this task's top level page-table
 	 * with the 'reference' page table.
--
1.8.1.4
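The recursion described in the changelog can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: sim_vmalloc_fault(), sim_warn(), run_scenario() and MAX_DEPTH are invented names, in_nmi() is assumed to be true throughout, and MAX_DEPTH stands in for the stack overflow seen in the report. It compares the ordering the changelog describes (warn, then set the "once" flag) with setting the flag first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code. MAX_DEPTH stands in
 * for running off the end of the stack, so the demo terminates. */
#define MAX_DEPTH 5

static bool warned;      /* models the static "warned once" flag */
static bool flag_first;  /* false: warn, then set the flag (as described) */
static int warn_count;   /* number of warning bodies emitted */
static int depth;        /* nesting of simulated vmalloc faults */

static void sim_vmalloc_fault(void);

/* Models WARN(): dumping the stack resolves a module symbol that lives
 * in vmalloc space, which faults again (as in the reported trace). */
static void sim_warn(void)
{
    warn_count++;
    sim_vmalloc_fault();
}

/* Models vmalloc_fault() with WARN_ON_ONCE(in_nmi()); in_nmi() is
 * assumed to be true for the whole scenario. */
static void sim_vmalloc_fault(void)
{
    if (depth >= MAX_DEPTH)
        return;
    depth++;
    if (!warned) {
        if (flag_first) {
            warned = true;   /* flag set before warning: no re-entry */
            sim_warn();
        } else {
            sim_warn();      /* re-enters before the flag is set */
            warned = true;
        }
    }
    depth--;
}

/* Runs one scenario and returns how many warnings were emitted. */
static int run_scenario(bool set_flag_first)
{
    warned = false;
    flag_first = set_flag_first;
    warn_count = 0;
    depth = 0;
    sim_vmalloc_fault();
    assert(depth == 0);      /* every simulated fault returned */
    return warn_count;
}
```

Under these assumptions run_scenario(false) emits 5 warnings, limited only by MAX_DEPTH, while run_scenario(true) emits exactly 1.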

2013-10-16 06:11:23

by Ingo Molnar

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

* Steven Rostedt <[email protected]> wrote:

> Since the NMI iretq nesting has been fixed, there's no reason that
> an NMI handler can not take a page fault for vmalloc'd code. No locks
> are taken in that code path, and the software now handles nested NMIs
> when the fault re-enables NMIs on iretq.
>
> Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> warn on triggers a vmalloc fault for some reason, then we can go into
> an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> the variable to make it happen "once").
>
> Reported-by: "Liu, Chuansheng" <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>

Would be nice to see the warning quoted that triggered this.

Thanks,

Ingo

2013-10-16 11:40:44

by Frederic Weisbecker

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Tue, Oct 15, 2013 at 04:39:06PM -0400, Steven Rostedt wrote:
> Since the NMI iretq nesting has been fixed, there's no reason that
> an NMI handler can not take a page fault for vmalloc'd code. No locks
> are taken in that code path, and the software now handles nested NMIs
> when the fault re-enables NMIs on iretq.
>
> Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> warn on triggers a vmalloc fault for some reason, then we can go into
> an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> the variable to make it happen "once").
>
> Reported-by: "Liu, Chuansheng" <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>

Thanks! For now we probably indeed want this patch. But I hope it's only
for the short term.

I still think that allowing faults in NMIs is very nasty, as we expect NMIs to never
be disturbed. I'm not even sure if that interacts correctly with the rcu_nmi_enter()
and preempt_count & NMI_MASK things. Not sure how perf is ready for that either (now
hardware events can be interrupted by fault trace events).

So I hope we can think about something else for the long term.

> ---
> arch/x86/mm/fault.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 3aaeffc..78926c6 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -268,8 +268,6 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
> if (!(address >= VMALLOC_START && address < VMALLOC_END))
> return -1;
>
> - WARN_ON_ONCE(in_nmi());
> -
> /*
> * Synchronize this task's top level page-table
> * with the 'reference' page table.
> --
> 1.8.1.4
>

2013-10-16 12:42:25

by Steven Rostedt

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, 16 Oct 2013 08:11:18 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Steven Rostedt <[email protected]> wrote:
>
> > Since the NMI iretq nesting has been fixed, there's no reason that
> > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > are taken in that code path, and the software now handles nested NMIs
> > when the fault re-enables NMIs on iretq.
> >
> > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > warn on triggers a vmalloc fault for some reason, then we can go into
> > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > the variable to make it happen "once").
> >
> > Reported-by: "Liu, Chuansheng" <[email protected]>
> > Signed-off-by: Steven Rostedt <[email protected]>
>
> Would be nice to see the warning quoted that triggered this.

Sure, want me to add this to the change log?

===============
[ 15.069144] BUG: unable to handle kernel [ 15.073635] paging request at 1649736d
[ 15.076379] IP: [<c200402a>] print_context_stack+0x4a/0xa0
[ 15.082529] *pde = 00000000
[ 15.085758] Thread overran stack, or stack corrupted
[ 15.091303] Oops: 0000 [#1] SMP
[ 15.094932] Modules linked in: atomisp_css2400b0_v2(+) lm3554 ov2722 imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm bcm43241 kct_daemon(O)
[ 15.111093] CPU: 2 PID: 2443 Comm: Compiler Tainted: G W O 3.10.1+ #1
[ 15.119075] task: f213f980 ti: f0c42000 task.ti: f0c42000
[ 15.125116] EIP: 0060:[<c200402a>] EFLAGS: 00210087 CPU: 2
[ 15.131255] EIP is at print_context_stack+0x4a/0xa0
[ 15.136712] EAX: 16497ffc EBX: 1649736d ECX: 986736d8 EDX: 1649736d
[ 15.143722] ESI: 00000000 EDI: ffffe000 EBP: f0c4220c ESP: f0c421ec
[ 15.150732] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
[ 15.156771] CR0: 80050033 CR2: 1649736d CR3: 31245000 CR4: 001007d0
[ 15.163781] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 15.170789] DR6: ffff0ff0 DR7: 00000400
[ 15.175076] Stack:
[ 15.177324] 16497ffc 16496000 986736d8 ffffe000 986736d8 1649736d c282c148 16496000
[ 15.186067] f0c4223c c20033b0 c282c148 c29ceecf 00000000 f0c4222c 986736d8 f0c4222c
[ 15.194810] 00000000 c29ceecf 00000000 00000000 f0c42260 c20041a7 f0c4229c c282c148
[ 15.203549] Call Trace:
[ 15.206295] [<c20033b0>] dump_trace+0x70/0xf0
[ 15.211274] [<c20041a7>] show_trace_log_lvl+0x47/0x60
[ 15.217028] [<c2003482>] show_stack_log_lvl+0x52/0xd0
[ 15.222782] [<c2004201>] show_stack+0x21/0x50
[ 15.227762] [<c281b38b>] dump_stack+0x16/0x18
[ 15.232742] [<c2037cff>] warn_slowpath_common+0x5f/0x80
[ 15.238693] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 15.244156] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 15.249621] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 15.255472] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
[ 15.261228] [<c282553a>] vmalloc_fault+0x5a/0xcf
[ 15.266497] [<c282592f>] __do_page_fault+0x2cf/0x4a0
[ 15.272154] [<c25e13e0>] ? logger_aio_write+0x230/0x230
[ 15.278106] [<c2039c94>] ? console_unlock+0x314/0x440
... //
[ 16.885364] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 16.891217] [<c2825b08>] do_page_fault+0x8/0x10
[ 16.896387] [<c2823066>] error_code+0x5a/0x60
[ 16.901367] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 16.907219] [<c208d6a0>] ? print_modules+0x20/0x90
[ 16.912685] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
[ 16.918634] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 16.924097] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 16.929562] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 16.935415] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
[ 16.941169] [<c282553a>] vmalloc_fault+0x5a/0xcf
[ 16.946437] [<c282592f>] __do_page_fault+0x2cf/0x4a0
[ 16.952095] [<c25e13e0>] ? logger_aio_write+0x230/0x230
[ 16.958046] [<c2039c94>] ? console_unlock+0x314/0x440
[ 16.963800] [<c2003e62>] ? sys_modify_ldt+0x2/0x160
[ 16.969362] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 16.975215] [<c2825b08>] do_page_fault+0x8/0x10
[ 16.980386] [<c2823066>] error_code+0x5a/0x60
[ 16.985366] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 16.991215] [<c208d6a0>] ? print_modules+0x20/0x90
[ 16.996673] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
[ 17.002622] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 17.008086] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 17.013550] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 17.019403] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
[ 17.025159] [<c282553a>] vmalloc_fault+0x5a/0xcf
[ 17.030428] [<c282592f>] __do_page_fault+0x2cf/0x4a0
[ 17.036085] [<c25e13e0>] ? logger_aio_write+0x230/0x230
[ 17.042037] [<c2039c94>] ? console_unlock+0x314/0x440
[ 17.047790] [<c2003e62>] ? sys_modify_ldt+0x2/0x160
[ 17.053352] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 17.059205] [<c2825b08>] do_page_fault+0x8/0x10
[ 17.064375] [<c2823066>] error_code+0x5a/0x60
[ 17.069354] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 17.075204] [<c208d6a0>] ? print_modules+0x20/0x90
[ 17.080669] [<c2037cfa>] warn_slowpath_common+0x5a/0x80
[ 17.086619] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 17.092082] [<c282553a>] ? vmalloc_fault+0x5a/0xcf
[ 17.097546] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 17.103399] [<c2037d3d>] warn_slowpath_null+0x1d/0x20
[ 17.109154] [<c282553a>] vmalloc_fault+0x5a/0xcf
[ 17.114422] [<c282592f>] __do_page_fault+0x2cf/0x4a0
[ 17.120080] [<c206b93d>] ? update_group_power+0x1fd/0x240
[ 17.126224] [<c227827b>] ? number.isra.2+0x32b/0x330
[ 17.131880] [<c20679bc>] ? update_curr+0xac/0x190
[ 17.137247] [<c227827b>] ? number.isra.2+0x32b/0x330
[ 17.142905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 17.148755] [<c2825b08>] do_page_fault+0x8/0x10
[ 17.153926] [<c2823066>] error_code+0x5a/0x60
[ 17.158905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 17.164760] [<c208d1a9>] ? module_address_lookup+0x29/0xb0
[ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0
[ 17.176462] [<c208de1d>] __sprint_symbol+0x2d/0xd0
[ 17.181926] [<c22790cc>] ? sprintf+0x1c/0x20
[ 17.186804] [<c208def4>] sprint_symbol+0x14/0x20
[ 17.192063] [<c208df1e>] __print_symbol+0x1e/0x40
[ 17.197430] [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
[ 17.202895] [<c25e13e0>] ? logger_aio_write+0x230/0x230
[ 17.208845] [<c205bdf5>] ? up+0x25/0x40
[ 17.213242] [<c2039cb7>] ? console_unlock+0x337/0x440
[ 17.218998] [<c2818236>] ? printk+0x38/0x3a
[ 17.223782] [<c20006d0>] __show_regs+0x70/0x190
[ 17.228954] [<c200353a>] show_regs+0x3a/0x1b0
[ 17.233931] [<c2818236>] ? printk+0x38/0x3a
[ 17.238717] [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
[ 17.246413] [<c2823919>] nmi_handle.isra.0+0x39/0x60
[ 17.252071] [<c2823a29>] do_nmi+0xe9/0x3f0

-- Steve

2013-10-16 12:45:21

by Steven Rostedt

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, 16 Oct 2013 13:40:37 +0200
Frederic Weisbecker <[email protected]> wrote:

> On Tue, Oct 15, 2013 at 04:39:06PM -0400, Steven Rostedt wrote:
> > Since the NMI iretq nesting has been fixed, there's no reason that
> > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > are taken in that code path, and the software now handles nested NMIs
> > when the fault re-enables NMIs on iretq.
> >
> > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > warn on triggers a vmalloc fault for some reason, then we can go into
> > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > the variable to make it happen "once").
> >
> > Reported-by: "Liu, Chuansheng" <[email protected]>
> > Signed-off-by: Steven Rostedt <[email protected]>
>
> Thanks! For now we probably indeed want this patch. But I hope it's only
> for the short term.

Why?

>
> I still think that allowing faults in NMIs is very nasty, as we expect NMIs to never
> be disturbed.

We do faults (well, breakpoints really) in NMI to enable tracing.

> I'm not even sure if that interacts correctly with the rcu_nmi_enter()
> and preempt_count & NMI_MASK things. Not sure how perf is ready for that either (now
> hardware events can be interrupted by fault trace events).

I'm a bit confused. What doesn't interact correctly with
rcu_nmi_enter()?

>
> So I hope we can think about something else for the long term.

I still don't understand what's wrong with it. As long as the faulting
code does not grab any locks there shouldn't be anything wrong with
faulting in NMI. For vmalloc, it is just updating page tables.

-- Steve
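What "just updating page tables" means can be sketched with a toy top-level table. This is a hypothetical model, not the kernel's vmalloc_fault(): NR_TOP_ENTRIES, VMALLOC_FIRST_IDX, reference_top and sim_vmalloc_fault() are invented, and the real code walks several page-table levels. The point is that the fix-up copies an already existing entry from the reference table into the faulting task's table, without taking any locks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a tiny top-level table whose upper entries map the
 * shared vmalloc area. Sizes and indices are illustrative only. */
#define NR_TOP_ENTRIES    16
#define VMALLOC_FIRST_IDX 12

typedef uintptr_t top_entry_t;   /* 0 means "not present" */

/* Models the 'reference' page table that vmalloc() keeps up to date. */
static top_entry_t reference_top[NR_TOP_ENTRIES];

/* Returns 0 if the fault was fixed by copying one entry, -1 otherwise.
 * No locks are modeled: the fix-up is a single store of an entry that is
 * assumed to be stable once it exists in the reference table. */
static int sim_vmalloc_fault(top_entry_t *task_top, size_t idx)
{
    assert(task_top != NULL);
    if (idx < VMALLOC_FIRST_IDX || idx >= NR_TOP_ENTRIES)
        return -1;               /* not a vmalloc address: not ours */
    if (reference_top[idx] == 0)
        return -1;               /* unmapped in the reference table too */
    task_top[idx] = reference_top[idx];
    return 0;
}
```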

2013-10-16 12:51:16

by Ingo Molnar

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

* Steven Rostedt <[email protected]> wrote:

> On Wed, 16 Oct 2013 08:11:18 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Steven Rostedt <[email protected]> wrote:
> >
> > > Since the NMI iretq nesting has been fixed, there's no reason that
> > > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > > are taken in that code path, and the software now handles nested NMIs
> > > when the fault re-enables NMIs on iretq.
> > >
> > > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > > warn on triggers a vmalloc fault for some reason, then we can go into
> > > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > > the variable to make it happen "once").
> > >
> > > Reported-by: "Liu, Chuansheng" <[email protected]>
> > > Signed-off-by: Steven Rostedt <[email protected]>
> >
> > Would be nice to see the warning quoted that triggered this.
>
> Sure, want me to add this to the change log?

Yeah, that would be helpful - but only the stack trace portion I suspect,
to make it clear what caused the fault.

The one posted in the thread shows:

[ 17.148755] [<c2825b08>] do_page_fault+0x8/0x10
[ 17.153926] [<c2823066>] error_code+0x5a/0x60
[ 17.158905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
[ 17.164760] [<c208d1a9>] ? module_address_lookup+0x29/0xb0
[ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0
[ 17.186804] [<c208def4>] sprint_symbol+0x14/0x20
[ 17.192063] [<c208df1e>] __print_symbol+0x1e/0x40
[ 17.197430] [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
[ 17.202895] [<c25e13e0>] ? logger_aio_write+0x230/0x230
[ 17.208845] [<c205bdf5>] ? up+0x25/0x40
[ 17.213242] [<c2039cb7>] ? console_unlock+0x337/0x440
[ 17.218998] [<c2818236>] ? printk+0x38/0x3a
[ 17.223782] [<c20006d0>] __show_regs+0x70/0x190
[ 17.228954] [<c200353a>] show_regs+0x3a/0x1b0
[ 17.233931] [<c2818236>] ? printk+0x38/0x3a
[ 17.238717] [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
[ 17.246413] [<c2823919>] nmi_handle.isra.0+0x39/0x60
[ 17.252071] [<c2823a29>] do_nmi+0xe9/0x3f0

So kallsyms_lookup() faulted, while the NMI watchdog triggered a
show_regs()? How is that possible?

Thanks,

Ingo

2013-10-16 13:02:08

by Borislav Petkov

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, Oct 16, 2013 at 02:51:11PM +0200, Ingo Molnar wrote:
> The one posted in the thread shows:
>
> [ 17.148755] [<c2825b08>] do_page_fault+0x8/0x10
> [ 17.153926] [<c2823066>] error_code+0x5a/0x60
> [ 17.158905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.164760] [<c208d1a9>] ? module_address_lookup+0x29/0xb0
> [ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0
> [ 17.186804] [<c208def4>] sprint_symbol+0x14/0x20
> [ 17.192063] [<c208df1e>] __print_symbol+0x1e/0x40
> [ 17.197430] [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
> [ 17.202895] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [ 17.208845] [<c205bdf5>] ? up+0x25/0x40
> [ 17.213242] [<c2039cb7>] ? console_unlock+0x337/0x440
> [ 17.218998] [<c2818236>] ? printk+0x38/0x3a
> [ 17.223782] [<c20006d0>] __show_regs+0x70/0x190
> [ 17.228954] [<c200353a>] show_regs+0x3a/0x1b0
> [ 17.233931] [<c2818236>] ? printk+0x38/0x3a
> [ 17.238717] [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
> [ 17.246413] [<c2823919>] nmi_handle.isra.0+0x39/0x60
> [ 17.252071] [<c2823a29>] do_nmi+0xe9/0x3f0

Btw, you probably should drop all the numbers and leave only the
function names in the stack trace.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
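Boris's suggestion can be automated with a small helper. strip_trace_line() is a hypothetical sketch that assumes lines in the "[ time] [<addr>] func+off/len" format shown above; it keeps the "? " marker for unreliable entries and drops the timestamp, address and offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reduce a line such as
 *   "[ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0"
 * to just "kallsyms_lookup". Lines marked "? " keep their marker.
 * Returns 0 on success, -1 if there is no "[<addr>]" frame or the
 * output buffer is too small. */
static int strip_trace_line(const char *in, char *out, size_t outlen)
{
    const char *p = strstr(in, "[<");
    if (p == NULL)
        return -1;
    p = strchr(p, ']');
    if (p == NULL)
        return -1;
    p++;                              /* skip the closing ']' */
    while (*p == ' ')
        p++;

    size_t n = strcspn(p, "+\n");     /* name ends at "+offset" */
    if (n + 1 > outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```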

2013-10-16 13:03:27

by Steven Rostedt

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, 16 Oct 2013 14:51:11 +0200
Ingo Molnar <[email protected]> wrote:

> So kallsyms_lookup() faulted, while the NMI watchdog triggered a
> show_regs()? How is that possible?

It was a vmalloc fault. Do modules keep their symbol tables in a
vmalloced area? If so, I think the get_ksymbol() can fault when
searching for a module address.

-- Steve

2013-10-16 13:09:04

by Frederic Weisbecker

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, Oct 16, 2013 at 08:45:18AM -0400, Steven Rostedt wrote:
> On Wed, 16 Oct 2013 13:40:37 +0200
> Frederic Weisbecker <[email protected]> wrote:
>
> > On Tue, Oct 15, 2013 at 04:39:06PM -0400, Steven Rostedt wrote:
> > > Since the NMI iretq nesting has been fixed, there's no reason that
> > > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > > are taken in that code path, and the software now handles nested NMIs
> > > when the fault re-enables NMIs on iretq.
> > >
> > > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > > warn on triggers a vmalloc fault for some reason, then we can go into
> > > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > > the variable to make it happen "once").
> > >
> > > Reported-by: "Liu, Chuansheng" <[email protected]>
> > > Signed-off-by: Steven Rostedt <[email protected]>
> >
> > Thanks! For now we probably indeed want this patch. But I hope it's only
> > for the short term.
>
> Why?
>
> >
> > I still think that allowing faults in NMIs is very nasty, as we expect NMIs to never
> > be disturbed.
>
> We do faults (well, breakpoints really) in NMI to enable tracing.
>
> > I'm not even sure if that interacts correctly with the rcu_nmi_enter()
> > and preempt_count & NMI_MASK things. Not sure how perf is ready for that either (now
> > hardware events can be interrupted by fault trace events).
>
> I'm a bit confused. What doesn't interact correctly with
> rcu_nmi_enter()?

Faults can call rcu_user_exit() / rcu_user_enter(). This is not supposed to happen
between rcu_nmi_enter() and rcu_nmi_exit(). rdtp->dynticks would be incremented in the
wrong way.

Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
us against that.

>
> >
> > So I hope we can think about something else for the long term.
>
> I still don't understand what's wrong with it. As long as the faulting
> code does not grab any locks there shouldn't be anything wrong with
> faulting in NMI. For vmalloc, it is just updating page tables.

NMI code is written with the idea that it can't be interrupted. Maybe that's
paranoia (again), you know. And I can't point you to any problem in practice.
I just think that allowing such a thing is asking for trouble.

But I'm OK with your patch: it fixes a real bug, and as long as we don't have
a better solution, we should keep it.

BTW, does faulting in NMIs re-enable NMIs?

2013-10-16 13:14:40

by Steven Rostedt

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, 16 Oct 2013 15:08:57 +0200
Frederic Weisbecker <[email protected]> wrote:

> Faults can call rcu_user_exit() / rcu_user_enter(). This is not supposed to happen
> between rcu_nmi_enter() and rcu_nmi_exit(). rdtp->dynticks would be incremented in the
> wrong way.
>
> Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> us against that.

I will say that we should probably warn if it's any fault other than a
vmalloc fault. A vmalloc fault should only happen in kernel space, and
should not be happening from user code.
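Steve's suggestion could be expressed as a small classification step. This is a hypothetical sketch of the policy only: the SIM_VMALLOC_* window is invented for a 32-bit layout, classify_fault() is not a kernel function, and the user/kernel-mode distinction is not modeled.

```c
#include <stdbool.h>

/* Hypothetical 32-bit vmalloc window, for illustration only. */
#define SIM_VMALLOC_START 0xf8000000UL
#define SIM_VMALLOC_END   0xff000000UL

enum nmi_fault_action {
    ACTION_NORMAL,        /* ordinary page-fault handling */
    ACTION_SYNC_SILENTLY, /* vmalloc fault: copy page-table entries, no warning */
    ACTION_WARN           /* any other fault taken in NMI context */
};

static enum nmi_fault_action classify_fault(unsigned long address, bool in_nmi)
{
    bool is_vmalloc = address >= SIM_VMALLOC_START && address < SIM_VMALLOC_END;

    if (is_vmalloc)
        return ACTION_SYNC_SILENTLY;  /* fine in or out of NMI */
    if (in_nmi)
        return ACTION_WARN;           /* unexpected fault inside an NMI */
    return ACTION_NORMAL;
}
```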

>
> >
> > >
> > > So I hope we can think about something else for the long term.
> >
> > I still don't understand what's wrong with it. As long as the faulting
> > code does not grab any locks there shouldn't be anything wrong with
> > faulting in NMI. For vmalloc, it is just updating page tables.
>
> NMI code is written with the idea that it can't be interrupted. Maybe that's
> paranoia (again), you know. And I can't point you to any problem in practice.
> I just think that allowing such a thing is asking for trouble.

The WARN_ON() that I removed is from vmalloc fault. I don't see an
issue with NMIs faulting via vmalloc. For any other page fault, sure, I
would be concerned about it. But what's wrong with an NMI running
module code?

>
> But I'm OK with your patch: it fixes a real bug, and as long as we don't have
> a better solution, we should keep it.
>
> BTW, does faulting in NMIs re-enable NMIs?

Yes, but we now have code to handle that :-)

-- Steve
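The idea behind handling nested NMIs can be modeled in C. This is a hypothetical simplification: the real x86_64 implementation lives in the assembly entry code and works by rewriting the iret frame, whereas here sim_nmi() just records that a second NMI arrived while one was executing and replays the handler afterwards instead of nesting it. pending_nested is an invented knob for how many NMIs arrive during the handler body.

```c
#include <stdbool.h>

/* Hypothetical C model of the nested-NMI idea, not the entry code. */
static bool nmi_executing;     /* an NMI handler is already running */
static bool nmi_latched;       /* another NMI arrived meanwhile */
static int handler_runs;       /* how many times the handler body ran */
static int body_depth, max_body_depth;
static int pending_nested;     /* NMIs to inject while the body runs */

static void sim_nmi(void);

/* Handler body: a vmalloc fault's iret re-enables NMIs, so a pending
 * NMI may arrive here (simulated by calling sim_nmi()). */
static void nmi_body(void)
{
    body_depth++;
    if (body_depth > max_body_depth)
        max_body_depth = body_depth;
    if (pending_nested > 0) {
        pending_nested--;
        sim_nmi();
    }
    handler_runs++;
    body_depth--;
}

static void sim_nmi(void)
{
    if (nmi_executing) {
        nmi_latched = true;    /* defer: do not run the handler nested */
        return;
    }
    nmi_executing = true;
    do {
        nmi_latched = false;
        nmi_body();
    } while (nmi_latched);     /* replay the NMI that arrived meanwhile */
    nmi_executing = false;
}

static int run_nmi_scenario(int nested_arrivals)
{
    nmi_executing = nmi_latched = false;
    handler_runs = body_depth = max_body_depth = 0;
    pending_nested = nested_arrivals;
    sim_nmi();
    return handler_runs;
}
```

In this model run_nmi_scenario(1) runs the handler twice while max_body_depth stays at 1, i.e. the second NMI is deferred rather than nested.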

2013-10-16 13:28:22

by Frederic Weisbecker

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, Oct 16, 2013 at 09:14:37AM -0400, Steven Rostedt wrote:
> On Wed, 16 Oct 2013 15:08:57 +0200
> Frederic Weisbecker <[email protected]> wrote:
>
>
> > Faults can call rcu_user_exit() / rcu_user_enter(). This is not supposed to happen
> > between rcu_nmi_enter() and rcu_nmi_exit(). rdtp->dynticks would be incremented in the
> > wrong way.
> >
> > Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> > us against that.
>
> I will say that we should probably warn if it's any fault other than a
> vmalloc fault. A vmalloc fault should only happen in kernel space, and
> should not be happening from user code.

The NMI can interrupt userspace. When the fault happens, it sees that context tracking
state is set to userspace (NMIs and interrupts in general don't exit that state, hence
the in_interrupt() check that returns when user_exit/enter is called) so it calls user_enter().
But anyway we should be protected against that.

>
> >
> > >
> > > >
> > > > So I hope we can think about something else for the long term.
> > >
> > > I still don't understand what's wrong with it. As long as the faulting
> > > code does not grab any locks there shouldn't be anything wrong with
> > > faulting in NMI. For vmalloc, it is just updating page tables.
> >
> > NMI code is written with the idea that it can't be interrupted. Maybe that's
> > paranoia (again), you know. And I can't point you to any problem in practice.
> > I just think that allowing such a thing is asking for trouble.
>
> The WARN_ON() that I removed is from vmalloc fault. I don't see an
> issue with NMIs faulting via vmalloc. For any other page fault, sure, I
> would be concerned about it. But what's wrong with an NMI running
> module code?

I won't argue further as none of us is going to change his opinion on this :)

2013-10-16 13:37:16

by Steven Rostedt

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, 16 Oct 2013 15:28:15 +0200
Frederic Weisbecker <[email protected]> wrote:

> On Wed, Oct 16, 2013 at 09:14:37AM -0400, Steven Rostedt wrote:
> > On Wed, 16 Oct 2013 15:08:57 +0200
> > Frederic Weisbecker <[email protected]> wrote:
> >
> >
> > > Faults can call rcu_user_exit() / rcu_user_enter(). This is not supposed to happen
> > > between rcu_nmi_enter() and rcu_nmi_exit(). rdtp->dynticks would be incremented in the
> > > wrong way.
> > >
> > > Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> > > us against that.
> >
> > I will say that we should probably warn if it's any fault other than a
> > vmalloc fault. A vmalloc fault should only happen in kernel space, and
> > should not be happening from user code.
>
> The NMI can interrupt userspace. When the fault happens, it sees that context tracking
> state is set to userspace (NMIs and interrupts in general don't exit that state, hence
> the in_interrupt() check that returns when user_exit/enter is called) so it calls user_enter().
> But anyway we should be protected against that.

IIRC, NMI itself is safe to use rcu_read_lock(), at least I remember
Paul making sure that stuff was lockless and NMI safe.

> > The WARN_ON() that I removed is from vmalloc fault. I don't see an
> > issue with NMIs faulting via vmalloc. For any other page fault, sure, I
> > would be concerned about it. But what's wrong with an NMI running
> > module code?
>
> I won't argue further as none of us is going to change his opinion on this :)

Sure sure, yet another argument continues with two sides stubbornly
refusing to negotiate about a looming future (de)fault!

-- Steve

2013-10-16 19:36:48

by Paul E. McKenney

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, Oct 16, 2013 at 03:08:57PM +0200, Frederic Weisbecker wrote:
> On Wed, Oct 16, 2013 at 08:45:18AM -0400, Steven Rostedt wrote:
> > On Wed, 16 Oct 2013 13:40:37 +0200
> > Frederic Weisbecker <[email protected]> wrote:
> >
> > > On Tue, Oct 15, 2013 at 04:39:06PM -0400, Steven Rostedt wrote:
> > > > Since the NMI iretq nesting has been fixed, there's no reason that
> > > > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > > > are taken in that code path, and the software now handles nested NMIs
> > > > when the fault re-enables NMIs on iretq.
> > > >
> > > > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > > > warn on triggers a vmalloc fault for some reason, then we can go into
> > > > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > > > the variable to make it happen "once").
> > > >
> > > > Reported-by: "Liu, Chuansheng" <[email protected]>
> > > > Signed-off-by: Steven Rostedt <[email protected]>
> > >
> > > Thanks! For now we probably indeed want this patch. But I hope it's only
> > > for the short term.
> >
> > Why?
> >
> > >
> > > I still think that allowing faults in NMIs is very nasty, as we expect NMIs to never
> > > be disturbed.
> >
> > We do faults (well, breakpoints really) in NMI to enable tracing.
> >
> > > I'm not even sure if that interacts correctly with the rcu_nmi_enter()
> > > and preempt_count & NMI_MASK things. Not sure how perf is ready for that either (now
> > > hardware events can be interrupted by fault trace events).
> >
> > I'm a bit confused. What doesn't interact correctly with
> > rcu_nmi_enter()?
>
> Faults can call rcu_user_exit() / rcu_user_enter(). This is not supposed to happen
> between rcu_nmi_enter() and rcu_nmi_exit(). rdtp->dynticks would be incremented in the
> wrong way.

I can attest to this! NMIs check for being nested within
process/irq-based non-idle sojourns, but not the other way around.
The result is that RCU will be ignoring you during that time, and not
even disabling interrupts will save you. It will check rdtp->dynticks,
see that its value is even, and register a quiescent state on behalf of
the hapless CPU.

> Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> us against that.

Here you are relying on the exception being treated as an interrupt,
correct?

Thanx, Paul

> > > So I hope we can think about something else for the long term.
> >
> > I still don't understand what's wrong with it. As long as the faulting
> > code does not grab any locks there shouldn't be anything wrong with
> > faulting in NMI. For vmalloc, it is just updating page tables.
>
> NMI code is written with the idea that it can't be interrupted. Maybe that's
> paranoia (again), you know. And I can't point you to any problem in practice.
> I just think that allowing such a thing is asking for trouble.
>
> But I'm OK with your patch: it fixes a real bug, and as long as we don't have
> a better solution, we should keep it.
>
> BTW, does faulting in NMIs re-enable NMIs?
>
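Paul's description of the dynticks parity rule can be made concrete with a single-CPU model. Everything here is hypothetical and heavily simplified: sim_nmi_enter() folds nmi_enter()'s preempt_count update and rcu_nmi_enter() into one function, sim_fault_user_exit() stands in for a fault path reaching rcu_user_exit() through context tracking, and in_interrupt_check models the in_interrupt() guard Frederic mentions. Without the guard, the fault path flips the counter back to even while the NMI is still running, so RCU would treat the CPU as quiescent; with the guard, the counter stays odd until the NMI exits.

```c
#include <stdbool.h>

/* Hypothetical single-CPU model of the dynticks parity rule:
 * even = RCU may treat the CPU as quiescent, odd = RCU must wait for it. */
static unsigned long dynticks;   /* starts even: CPU running in userspace */
static int nmi_nesting;          /* models dynticks_nmi_nesting */
static int irq_level;            /* models the NMI/HARDIRQ preempt_count bits */
static bool in_interrupt_check;  /* models the context-tracking guard */

static bool rcu_sees_quiescent(void) { return (dynticks & 1) == 0; }

/* Models nmi_enter(): bump preempt_count, then rcu_nmi_enter(). */
static void sim_nmi_enter(void)
{
    irq_level++;                 /* in_interrupt() is now true */
    if (nmi_nesting == 0 && (dynticks & 1))
        return;                  /* already non-idle: nothing to do */
    nmi_nesting++;
    dynticks++;                  /* even -> odd */
}

/* Models nmi_exit(): rcu_nmi_exit(), then drop preempt_count. */
static void sim_nmi_exit(void)
{
    irq_level--;
    if (nmi_nesting == 0 || --nmi_nesting != 0)
        return;
    dynticks++;                  /* odd -> even */
}

/* Models a fault path calling rcu_user_exit() via context tracking.
 * The RCU side does not look for an enclosing NMI; it just increments. */
static void sim_fault_user_exit(void)
{
    if (in_interrupt_check && irq_level > 0)
        return;                  /* guarded: skipped in interrupt context */
    dynticks++;
}

static void sim_reset(bool with_guard)
{
    dynticks = 0;
    nmi_nesting = 0;
    irq_level = 0;
    in_interrupt_check = with_guard;
}
```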

2013-10-16 19:39:55

by Paul E. McKenney

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, Oct 16, 2013 at 09:37:12AM -0400, Steven Rostedt wrote:
> On Wed, 16 Oct 2013 15:28:15 +0200
> Frederic Weisbecker <[email protected]> wrote:
>
> > On Wed, Oct 16, 2013 at 09:14:37AM -0400, Steven Rostedt wrote:
> > > On Wed, 16 Oct 2013 15:08:57 +0200
> > > Frederic Weisbecker <[email protected]> wrote:
> > >
> > >
> > > > Faults can call rcu_user_exit() / rcu_user_enter(). This is not supposed to happen
> > > > between rcu_nmi_enter() and rcu_nmi_exit(). rdtp->dynticks would be incremented in the
> > > > wrong way.
> > > >
> > > > Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> > > > us against that.
> > >
> > > I will say that we should probably warn if it's any fault other than a
> > > vmalloc fault. A vmalloc fault should only happen in kernel space, and
> > > should not be happening from user code.
> >
> > The NMI can interrupt userspace. When the fault happens, it sees that context tracking
> > state is set to userspace (NMIs and interrupts in general don't exit that state, hence
> > the in_interrupt() check that returns when user_exit/enter is called) so it calls user_enter().
> > But anyway we should be protected against that.
>
> IIRC, NMI itself is safe to use rcu_read_lock(), at least I remember
> Paul making sure that stuff was lockless and NMI safe.

Yep, even preemptible RCU. This relies on the fact that we cannot be
preempted within either an NMI handler or an exception handler.

> > > The WARN_ON() that I removed is from vmalloc fault. I don't see an
> > > issue with NMIs faulting via vmalloc. For any other page fault, sure, I
> > > would be concerned about it. But what's wrong with an NMI running
> > > module code?
> >
> > I won't argue further as none of us is going to change his opinion on this :)
>
> Sure sure, yet another argument continues with two sides stubbornly
> refusing to negotiate about a looming future (de)fault!

I figure some good hard testing will bring the truth of the matter to light.
The arguing parties might well then wish that they had compromised so as
to avoid the hard sharp truth, but by then it will be too late. ;-)

Thanx, Paul

2013-10-16 19:57:36

by Steven Rostedt

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, 16 Oct 2013 12:36:32 -0700
"Paul E. McKenney" <[email protected]> wrote:

> > Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> > us against that.
>
> Here you are relying on the exception being treated as an interrupt,
> correct?

I don't think so. It's relying on nmi_enter() also making in_interrupt()
return true.

Like I said before, an NMI interrupting userspace should be no
different from an interrupt interrupting userspace. Both can
trigger vmalloc faults, and we should be able to deal with them.

-- Steve
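Steve's point can be checked against a model of the preempt_count bit layout. This sketch is loosely modeled on include/linux/hardirq.h of that era; the bit widths are an assumption (they have changed between kernel versions), and preempt_count_sim, sim_nmi_enter() and sim_nmi_exit() are hypothetical stand-ins that only do the preempt_count accounting of nmi_enter()/nmi_exit(). Because nmi_enter() adds both NMI_OFFSET and HARDIRQ_OFFSET, in_interrupt() is true for the whole NMI, and since taking a fault does not touch preempt_count, in_nmi() is still true inside vmalloc_fault(), which is what the removed WARN_ON_ONCE() tested.

```c
#include <stdbool.h>

/* Illustrative layout; the exact bit widths are an assumption. */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   10
#define NMI_BITS       1

#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT      (HARDIRQ_SHIFT + HARDIRQ_BITS)

#define MASK(bits, shift) (((1UL << (bits)) - 1) << (shift))
#define SOFTIRQ_MASK   MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK       MASK(NMI_BITS, NMI_SHIFT)

#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET     (1UL << NMI_SHIFT)

static unsigned long preempt_count_sim;  /* hypothetical per-CPU counter */

static bool sim_in_nmi(void) { return preempt_count_sim & NMI_MASK; }

static bool sim_in_interrupt(void)
{
    return preempt_count_sim & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

/* Only the preempt_count part of nmi_enter()/nmi_exit() is modeled. */
static void sim_nmi_enter(void) { preempt_count_sim += NMI_OFFSET + HARDIRQ_OFFSET; }
static void sim_nmi_exit(void)  { preempt_count_sim -= NMI_OFFSET + HARDIRQ_OFFSET; }
```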

2013-10-17 00:29:11

by Liu, Chuansheng

Subject: RE: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

> -----Original Message-----
> From: Ingo Molnar [mailto:[email protected]] On Behalf Of Ingo
> Molnar
> Sent: Wednesday, October 16, 2013 8:51 PM
> To: Steven Rostedt
> Cc: LKML; Thomas Gleixner; H. Peter Anvin; Frederic Weisbecker; Andrew
> Morton; [email protected]; Peter Zijlstra; [email protected]; Wang,
> Xiaoming; Li, Zhuangzhi; Liu, Chuansheng
> Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault
>
>
> * Steven Rostedt <[email protected]> wrote:
>
> > On Wed, 16 Oct 2013 08:11:18 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Steven Rostedt <[email protected]> wrote:
> > >
> > > > Since the NMI iretq nesting has been fixed, there's no reason that
> > > > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > > > are taken in that code path, and the software now handles nested NMIs
> > > > when the fault re-enables NMIs on iretq.
> > > >
> > > > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > > > warn on triggers a vmalloc fault for some reason, then we can go into
> > > > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > > > the variable to make it happen "once").
> > > >
> > > > Reported-by: "Liu, Chuansheng" <[email protected]>
> > > > Signed-off-by: Steven Rostedt <[email protected]>
> > >
> > > Would be nice to see the warning quoted that triggered this.
> >
> > Sure, want me to add this to the change log?
>
> Yeah, that would be helpful - but only the stack trace portion I suspect,
> to make it clear what caused the fault.
>
> The one posted in the thread shows:
>
> [ 17.148755] [<c2825b08>] do_page_fault+0x8/0x10
> [ 17.153926] [<c2823066>] error_code+0x5a/0x60
> [ 17.158905] [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [ 17.164760] [<c208d1a9>] ? module_address_lookup+0x29/0xb0
> [ 17.170999] [<c208dddb>] kallsyms_lookup+0x9b/0xb0
> [ 17.186804] [<c208def4>] sprint_symbol+0x14/0x20
> [ 17.192063] [<c208df1e>] __print_symbol+0x1e/0x40
> [ 17.197430] [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
> [ 17.202895] [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [ 17.208845] [<c205bdf5>] ? up+0x25/0x40
> [ 17.213242] [<c2039cb7>] ? console_unlock+0x337/0x440
> [ 17.218998] [<c2818236>] ? printk+0x38/0x3a
> [ 17.223782] [<c20006d0>] __show_regs+0x70/0x190
> [ 17.228954] [<c200353a>] show_regs+0x3a/0x1b0
> [ 17.233931] [<c2818236>] ? printk+0x38/0x3a
> [ 17.238717] [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
> [ 17.246413] [<c2823919>] nmi_handle.isra.0+0x39/0x60
> [ 17.252071] [<c2823a29>] do_nmi+0xe9/0x3f0
>
> So kallsyms_lookup() faulted, while the NMI watchdog triggered a
> show_regs()? How is that possible?

It was not the NMI watchdog that triggered show_regs(). When we call
arch_trigger_all_cpu_backtrace(), its NMI handler,
arch_trigger_all_cpu_backtrace_handler(), calls show_regs().

>
> Thanks,
>
> Ingo

2013-10-17 13:59:55

by Frederic Weisbecker

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, Oct 16, 2013 at 12:36:32PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 16, 2013 at 03:08:57PM +0200, Frederic Weisbecker wrote:
> > On Wed, Oct 16, 2013 at 08:45:18AM -0400, Steven Rostedt wrote:
> > > On Wed, 16 Oct 2013 13:40:37 +0200
> > > Frederic Weisbecker <[email protected]> wrote:
> > >
> > > > On Tue, Oct 15, 2013 at 04:39:06PM -0400, Steven Rostedt wrote:
> > > > > Since the NMI iretq nesting has been fixed, there's no reason that
> > > > > an NMI handler can not take a page fault for vmalloc'd code. No locks
> > > > > are taken in that code path, and the software now handles nested NMIs
> > > > > when the fault re-enables NMIs on iretq.
> > > > >
> > > > > Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> > > > > warn on triggers a vmalloc fault for some reason, then we can go into
> > > > > an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> > > > > the variable to make it happen "once").
> > > > >
> > > > > Reported-by: "Liu, Chuansheng" <[email protected]>
> > > > > Signed-off-by: Steven Rostedt <[email protected]>
> > > >
> > > > Thanks! For now we probably indeed want this patch. But I hope it's only
> > > > for the short term.
> > >
> > > Why?
> > >
> > > >
> > > > I still think that allowing faults in NMIs is very nasty, as we expect NMIs to never
> > > > be disturbed.
> > >
> > > We do faults (well, breakpoints really) in NMI to enable tracing.
> > >
> > > > I'm not even sure if that interacts correctly with the rcu_nmi_enter()
> > > > and preempt_count & NMI_MASK things. Not sure how perf is ready for that either (now
> > > > hardware events can be interrupted by fault trace events).
> > >
> > > I'm a bit confused. What doesn't interact correctly with
> > > rcu_nmi_enter()?
> >
> > Faults can call rcu_user_exit() / rcu_user_enter(). This is not supposed to happen
> > between rcu_nmi_enter() and rcu_nmi_exit(). rdtp->dynticks would be incremented in the
> > wrong way.
>
> I can attest to this! NMIs check for being nested within
> process/irq-based non-idle sojourns, but not the other way around.
> The result is that RCU will be ignoring you during that time, and not
> even disabling interrupts will save you. It will check rdtp->dynticks,
> see that its value is even, and register a quiescent state on behalf of
> the hapless CPU.

Fortunately, we avoid this with the in_interrupt() check in user_enter()
and user_exit(). Their purpose is precisely to deal with traps/faults that
happen in interrupts :)

>
> > Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> > us against that.
>
> Here you are relying on the exception being treated as an interrupt,
> correct?

From an RCU point of view, yes. In these cases the exception is protected by
either the rcu_irq_*() or the rcu_nmi_*() APIs, depending on where it happened.

2013-10-18 11:55:03

by Paul E. McKenney

Subject: Re: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

On Wed, Oct 16, 2013 at 03:57:30PM -0400, Steven Rostedt wrote:
> On Wed, 16 Oct 2013 12:36:32 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
>
> > > Ah but we have an in_interrupt() check in context_tracking_user_enter() that protects
> > > us against that.
> >
> > Here you are relying on the exception being treated as an interrupt,
> > correct?
>
> I don't think so. It's relying on nmi_enter() also makes in_interrupt()
> return true.

Got it, never mind!

Thanx, Paul

> Like I said before. An NMI interrupting userspace should be no
> different than an interrupt interrupting userspace. They both can
> trigger vmalloc faults, and we should be able to deal with it.
>
> -- Steve
>