2018-01-04 01:59:17

by Thomas Zeitlhofer

Subject: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

Hello,

on an Ivybridge CPU, I get with 4.14.11:

BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
caller is native_flush_tlb_single+0x57/0xc0
CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
Call Trace:
dump_stack+0x5c/0x86
check_preemption_disabled+0xdd/0xe0
native_flush_tlb_single+0x57/0xc0
? __set_pte_vaddr+0x2d/0x40
__set_pte_vaddr+0x2d/0x40
set_pte_vaddr+0x2f/0x40
cea_set_pte+0x30/0x40
ds_update_cea.constprop.4+0x4d/0x70
reserve_ds_buffers+0x159/0x410
? wp_page_copy+0x36d/0x6a0
x86_reserve_hardware+0x150/0x160
x86_pmu_event_init+0x3e/0x1f0
perf_try_init_event+0x69/0x80
perf_event_alloc+0x652/0x740
SyS_perf_event_open+0x3f6/0xd60
do_syscall_64+0x5c/0x190
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x74a1d94580b9
RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600

This does not show up when booting with pti=off.

Maybe it is related to the issue that is fixed for the upcoming 4.4.110
release by https://lkml.org/lkml/2018/1/3/692

Thanks,

Thomas


2018-01-04 10:20:37

by Thomas Zeitlhofer

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 02:59:06AM +0100, Thomas Zeitlhofer wrote:
> Hello,
>
> on an Ivybridge CPU, I get with 4.14.11:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
> caller is native_flush_tlb_single+0x57/0xc0
> CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
> Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> Call Trace:
> dump_stack+0x5c/0x86
> check_preemption_disabled+0xdd/0xe0
> native_flush_tlb_single+0x57/0xc0
> ? __set_pte_vaddr+0x2d/0x40
> __set_pte_vaddr+0x2d/0x40
> set_pte_vaddr+0x2f/0x40
> cea_set_pte+0x30/0x40
> ds_update_cea.constprop.4+0x4d/0x70
> reserve_ds_buffers+0x159/0x410
> ? wp_page_copy+0x36d/0x6a0
> x86_reserve_hardware+0x150/0x160
> x86_pmu_event_init+0x3e/0x1f0
> perf_try_init_event+0x69/0x80
> perf_event_alloc+0x652/0x740
> SyS_perf_event_open+0x3f6/0xd60
> do_syscall_64+0x5c/0x190
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x74a1d94580b9
> RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
> RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600
>
> This does not show up when booting with pti=off.
>
> Maybe it is related to the issue that is fixed for the upcoming 4.4.110
> release by https://lkml.org/lkml/2018/1/3/692

JFYI, the very same kernel does not show this issue on a Haswell CPU.

Thanks,

Thomas

2018-01-04 10:51:08

by Greg Kroah-Hartman

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 11:20:29AM +0100, Thomas Zeitlhofer wrote:
> On Thu, Jan 04, 2018 at 02:59:06AM +0100, Thomas Zeitlhofer wrote:
> > Hello,
> >
> > on an Ivybridge CPU, I get with 4.14.11:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
> > caller is native_flush_tlb_single+0x57/0xc0
> > CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
> > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > Call Trace:
> > dump_stack+0x5c/0x86
> > check_preemption_disabled+0xdd/0xe0
> > native_flush_tlb_single+0x57/0xc0
> > ? __set_pte_vaddr+0x2d/0x40
> > __set_pte_vaddr+0x2d/0x40
> > set_pte_vaddr+0x2f/0x40
> > cea_set_pte+0x30/0x40
> > ds_update_cea.constprop.4+0x4d/0x70
> > reserve_ds_buffers+0x159/0x410
> > ? wp_page_copy+0x36d/0x6a0
> > x86_reserve_hardware+0x150/0x160
> > x86_pmu_event_init+0x3e/0x1f0
> > perf_try_init_event+0x69/0x80
> > perf_event_alloc+0x652/0x740
> > SyS_perf_event_open+0x3f6/0xd60
> > do_syscall_64+0x5c/0x190
> > entry_SYSCALL64_slow_path+0x25/0x25
> > RIP: 0033:0x74a1d94580b9
> > RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
> > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600
> >
> > This does not show up when booting with pti=off.
> >
> > Maybe it is related to the issue that is fixed for the upcoming 4.4.110
> > release by https://lkml.org/lkml/2018/1/3/692

I don't understand this link. The 4.4 and 4.9 backports are much
different than the 4.14 tree.

> JFYI, the very same kernel does not show this issue on a Haswell CPU.

I have now queued up a bunch of patches that are in Linus's tree, can
you test these out as well:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.14

thanks,

greg k-h

2018-01-04 12:43:39

by Thomas Zeitlhofer

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 11:51:11AM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 11:20:29AM +0100, Thomas Zeitlhofer wrote:
> > On Thu, Jan 04, 2018 at 02:59:06AM +0100, Thomas Zeitlhofer wrote:
> > > Hello,
> > >
> > > on an Ivybridge CPU, I get with 4.14.11:
> > >
> > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
> > > caller is native_flush_tlb_single+0x57/0xc0
> > > CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
> > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > Call Trace:
> > > dump_stack+0x5c/0x86
> > > check_preemption_disabled+0xdd/0xe0
> > > native_flush_tlb_single+0x57/0xc0
> > > ? __set_pte_vaddr+0x2d/0x40
> > > __set_pte_vaddr+0x2d/0x40
> > > set_pte_vaddr+0x2f/0x40
> > > cea_set_pte+0x30/0x40
> > > ds_update_cea.constprop.4+0x4d/0x70
> > > reserve_ds_buffers+0x159/0x410
> > > ? wp_page_copy+0x36d/0x6a0
> > > x86_reserve_hardware+0x150/0x160
> > > x86_pmu_event_init+0x3e/0x1f0
> > > perf_try_init_event+0x69/0x80
> > > perf_event_alloc+0x652/0x740
> > > SyS_perf_event_open+0x3f6/0xd60
> > > do_syscall_64+0x5c/0x190
> > > entry_SYSCALL64_slow_path+0x25/0x25
> > > RIP: 0033:0x74a1d94580b9
> > > RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
> > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
> > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600
> > >
> > > This does not show up when booting with pti=off.
> > >
> > > Maybe it is related to the issue that is fixed for the upcoming 4.4.110
> > > release by https://lkml.org/lkml/2018/1/3/692
>
> I don't understand this link.

I found that link when trying to search for the error message. That
patch touches __native_flush_tlb_single() and mentions hardware
differences in Ivybridge and below:

"We have many machines (Westmere, Sandybridge, Ivybridge)
supporting PCID but not INVPCID..."

As I see the error message only on Ivybridge and not on Haswell, I came
up with the vague guess that this could be related.
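
As a side note, whether a CPU advertises PCID and INVPCID can be read from CPUID: PCID is CPUID.01H:ECX bit 17 and INVPCID is CPUID.(EAX=07H,ECX=0):EBX bit 10. Below is a minimal user-space sketch, assuming GCC or Clang and their <cpuid.h> helper header. The function names are made up for illustration and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper header, assumed to be available */
#endif

/* Illustrative helpers only; the names are hypothetical, not kernel API. */

/* Test one bit of a 32-bit CPUID output register. */
bool cpuid_bit_set(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);
	return (reg >> bit) & 1u;
}

/* PCID: CPUID.01H:ECX, bit 17. */
bool leaf1_ecx_has_pcid(uint32_t ecx)
{
	return cpuid_bit_set(ecx, 17);
}

/* INVPCID: CPUID.(EAX=07H,ECX=0):EBX, bit 10. */
bool leaf7_ebx_has_invpcid(uint32_t ebx)
{
	return cpuid_bit_set(ebx, 10);
}

/* Print what the running CPU reports; returns 0 on success, -1 otherwise. */
int report_pcid_support(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
	bool pcid, invpcid = false;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	pcid = leaf1_ecx_has_pcid(ecx);

	/* Leaf 7 is missing on older CPUs; treat that as "no INVPCID". */
	if (__get_cpuid_max(0, NULL) >= 7) {
		__cpuid_count(7, 0, eax, ebx, ecx, edx);
		invpcid = leaf7_ebx_has_invpcid(ebx);
	}

	printf("PCID: %s, INVPCID: %s\n",
	       pcid ? "yes" : "no", invpcid ? "yes" : "no");
	return 0;
#else
	return -1;	/* not an x86 build */
#endif
}
```

Calling report_pcid_support() from a small main() on the machines named in the quote above should, if the quote is right, show PCID without INVPCID.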

> The 4.4 and 4.9 backports are much different than the 4.14 tree.

Yes, I have seen that.

> > JFYI, the very same kernel does not show this issue on a Haswell CPU.
>
> I have now queued up a bunch of patches that are in Linus's tree, can
> you test these out as well:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.14

Does not seem to make any difference - with those patches applied I
still get:

BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4383
caller is native_flush_tlb_single+0x57/0xc0
CPU: 3 PID: 4383 Comm: ovsdb-server Not tainted 4.14.11-kvm-00435-g3138001170c9 #3
Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
Call Trace:
dump_stack+0x5c/0x86
check_preemption_disabled+0xdd/0xe0
native_flush_tlb_single+0x57/0xc0
? __set_pte_vaddr+0x2d/0x40
__set_pte_vaddr+0x2d/0x40
set_pte_vaddr+0x2f/0x40
cea_set_pte+0x30/0x40
ds_update_cea.constprop.4+0x4d/0x70
reserve_ds_buffers+0x159/0x410
? wp_page_copy+0x36d/0x6a0
x86_reserve_hardware+0x150/0x160
x86_pmu_event_init+0x3e/0x1f0
perf_try_init_event+0x69/0x80
perf_event_alloc+0x652/0x740
SyS_perf_event_open+0x3f6/0xd60
do_syscall_64+0x5c/0x190
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x755c0b8580b9
RSP: 002b:00007fffc87cf9e8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00007fffc87cfbc0 RCX: 0000755c0b8580b9
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fffc87cf9f0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
R13: 0000000000000000 R14: 00007fffc87cfba0 R15: 000062ea2cbff600

Thanks,

Thomas

2018-01-04 12:55:26

by Greg Kroah-Hartman

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 01:43:20PM +0100, Thomas Zeitlhofer wrote:
> On Thu, Jan 04, 2018 at 11:51:11AM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 11:20:29AM +0100, Thomas Zeitlhofer wrote:
> > > On Thu, Jan 04, 2018 at 02:59:06AM +0100, Thomas Zeitlhofer wrote:
> > > > Hello,
> > > >
> > > > on an Ivybridge CPU, I get with 4.14.11:
> > > >
> > > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
> > > > caller is native_flush_tlb_single+0x57/0xc0
> > > > CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
> > > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > Call Trace:
> > > > dump_stack+0x5c/0x86
> > > > check_preemption_disabled+0xdd/0xe0
> > > > native_flush_tlb_single+0x57/0xc0
> > > > ? __set_pte_vaddr+0x2d/0x40
> > > > __set_pte_vaddr+0x2d/0x40
> > > > set_pte_vaddr+0x2f/0x40
> > > > cea_set_pte+0x30/0x40
> > > > ds_update_cea.constprop.4+0x4d/0x70
> > > > reserve_ds_buffers+0x159/0x410
> > > > ? wp_page_copy+0x36d/0x6a0
> > > > x86_reserve_hardware+0x150/0x160
> > > > x86_pmu_event_init+0x3e/0x1f0
> > > > perf_try_init_event+0x69/0x80
> > > > perf_event_alloc+0x652/0x740
> > > > SyS_perf_event_open+0x3f6/0xd60
> > > > do_syscall_64+0x5c/0x190
> > > > entry_SYSCALL64_slow_path+0x25/0x25
> > > > RIP: 0033:0x74a1d94580b9
> > > > RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
> > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
> > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600
> > > >
> > > > This does not show up when booting with pti=off.
> > > >
> > > > Maybe it is related to the issue that is fixed for the upcoming 4.4.110
> > > > release by https://lkml.org/lkml/2018/1/3/692
> >
> > I don't understand this link.
>
> I found that link when trying to search for the error message. That
> patch touches __native_flush_tlb_single() and mentions hardware
> differences in Ivybridge and below:
>
> "We have many machines (Westmere, Sandybridge, Ivybridge)
> supporting PCID but not INVPCID..."
>
> As I see the error message only on Ivybridge and not on Haswell, I came
> up with the vague guess that this could be related.
>
> > The 4.4 and 4.9 backports are much different than the 4.14 tree.
>
> Yes, I have seen that.
>
> > > JFYI, the very same kernel does not show this issue on a Haswell CPU.
> >
> > I have now queued up a bunch of patches that are in Linus's tree, can
> > you test these out as well:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.14
>
> Does not seem to make any difference - with those patches applied I
> still get:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4383
> caller is native_flush_tlb_single+0x57/0xc0
> CPU: 3 PID: 4383 Comm: ovsdb-server Not tainted 4.14.11-kvm-00435-g3138001170c9 #3
> Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> Call Trace:
> dump_stack+0x5c/0x86
> check_preemption_disabled+0xdd/0xe0
> native_flush_tlb_single+0x57/0xc0
> ? __set_pte_vaddr+0x2d/0x40
> __set_pte_vaddr+0x2d/0x40
> set_pte_vaddr+0x2f/0x40
> cea_set_pte+0x30/0x40
> ds_update_cea.constprop.4+0x4d/0x70
> reserve_ds_buffers+0x159/0x410
> ? wp_page_copy+0x36d/0x6a0
> x86_reserve_hardware+0x150/0x160
> x86_pmu_event_init+0x3e/0x1f0
> perf_try_init_event+0x69/0x80
> perf_event_alloc+0x652/0x740
> SyS_perf_event_open+0x3f6/0xd60
> do_syscall_64+0x5c/0x190
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x755c0b8580b9
> RSP: 002b:00007fffc87cf9e8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> RAX: ffffffffffffffda RBX: 00007fffc87cfbc0 RCX: 0000755c0b8580b9
> RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fffc87cf9f0
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> R13: 0000000000000000 R14: 00007fffc87cfba0 R15: 000062ea2cbff600
>

Odd, does 4.15-rc6 also trigger the same error? Thomas is working on an
issue with KASLR (see lkml with:
Subject: Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
)

thanks,

greg k-h

2018-01-04 15:25:27

by Thomas Zeitlhofer

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 01:55:28PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 01:43:20PM +0100, Thomas Zeitlhofer wrote:
> > On Thu, Jan 04, 2018 at 11:51:11AM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Jan 04, 2018 at 11:20:29AM +0100, Thomas Zeitlhofer wrote:
> > > > On Thu, Jan 04, 2018 at 02:59:06AM +0100, Thomas Zeitlhofer wrote:
> > > > > Hello,
> > > > >
> > > > > on an Ivybridge CPU, I get with 4.14.11:
> > > > >
> > > > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
> > > > > caller is native_flush_tlb_single+0x57/0xc0
> > > > > CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
> > > > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > > Call Trace:
> > > > > dump_stack+0x5c/0x86
> > > > > check_preemption_disabled+0xdd/0xe0
> > > > > native_flush_tlb_single+0x57/0xc0
> > > > > ? __set_pte_vaddr+0x2d/0x40
> > > > > __set_pte_vaddr+0x2d/0x40
> > > > > set_pte_vaddr+0x2f/0x40
> > > > > cea_set_pte+0x30/0x40
> > > > > ds_update_cea.constprop.4+0x4d/0x70
> > > > > reserve_ds_buffers+0x159/0x410
> > > > > ? wp_page_copy+0x36d/0x6a0
> > > > > x86_reserve_hardware+0x150/0x160
> > > > > x86_pmu_event_init+0x3e/0x1f0
> > > > > perf_try_init_event+0x69/0x80
> > > > > perf_event_alloc+0x652/0x740
> > > > > SyS_perf_event_open+0x3f6/0xd60
> > > > > do_syscall_64+0x5c/0x190
> > > > > entry_SYSCALL64_slow_path+0x25/0x25
> > > > > RIP: 0033:0x74a1d94580b9
> > > > > RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > > RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
> > > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
> > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > > R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600
> > > > >
> > > > > This does not show up when booting with pti=off.
> > > > >
> > > > > Maybe it is related to the issue that is fixed for the upcoming 4.4.110
> > > > > release by https://lkml.org/lkml/2018/1/3/692
> > >
> > > I don't understand this link.
> >
> > I found that link when trying to search for the error message. That
> > patch touches __native_flush_tlb_single() and mentions hardware
> > differences in Ivybridge and below:
> >
> > "We have many machines (Westmere, Sandybridge, Ivybridge)
> > supporting PCID but not INVPCID..."
> >
> > As I see the error message only on Ivybridge and not on Haswell, I came
> > up with the vague guess that this could be related.
> >
> > > The 4.4 and 4.9 backports are much different than the 4.14 tree.
> >
> > Yes, I have seen that.
> >
> > > > JFYI, the very same kernel does not show this issue on a Haswell CPU.
> > >
> > > I have now queued up a bunch of patches that are in Linus's tree, can
> > > you test these out as well:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.14
> >
> > Does not seem to make any difference - with those patches applied I
> > still get:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4383
> > caller is native_flush_tlb_single+0x57/0xc0
> > CPU: 3 PID: 4383 Comm: ovsdb-server Not tainted 4.14.11-kvm-00435-g3138001170c9 #3
> > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > Call Trace:
> > dump_stack+0x5c/0x86
> > check_preemption_disabled+0xdd/0xe0
> > native_flush_tlb_single+0x57/0xc0
> > ? __set_pte_vaddr+0x2d/0x40
> > __set_pte_vaddr+0x2d/0x40
> > set_pte_vaddr+0x2f/0x40
> > cea_set_pte+0x30/0x40
> > ds_update_cea.constprop.4+0x4d/0x70
> > reserve_ds_buffers+0x159/0x410
> > ? wp_page_copy+0x36d/0x6a0
> > x86_reserve_hardware+0x150/0x160
> > x86_pmu_event_init+0x3e/0x1f0
> > perf_try_init_event+0x69/0x80
> > perf_event_alloc+0x652/0x740
> > SyS_perf_event_open+0x3f6/0xd60
> > do_syscall_64+0x5c/0x190
> > entry_SYSCALL64_slow_path+0x25/0x25
> > RIP: 0033:0x755c0b8580b9
> > RSP: 002b:00007fffc87cf9e8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > RAX: ffffffffffffffda RBX: 00007fffc87cfbc0 RCX: 0000755c0b8580b9
> > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fffc87cf9f0
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > R13: 0000000000000000 R14: 00007fffc87cfba0 R15: 000062ea2cbff600
> >
>
> Odd, does 4.15-rc6 also trigger the same error?

Yes:

BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
caller is native_flush_tlb_single+0x57/0xc0
CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
Call Trace:
dump_stack+0x5c/0x86
check_preemption_disabled+0xdd/0xe0
native_flush_tlb_single+0x57/0xc0
? __set_pte_vaddr+0x2d/0x40
__set_pte_vaddr+0x2d/0x40
set_pte_vaddr+0x2f/0x40
cea_set_pte+0x30/0x40
ds_update_cea.constprop.4+0x4d/0x70
reserve_ds_buffers+0x159/0x410
? wp_page_copy+0x370/0x6c0
x86_reserve_hardware+0x150/0x160
x86_pmu_event_init+0x3e/0x1f0
perf_try_init_event+0x69/0x80
perf_event_alloc+0x652/0x740
SyS_perf_event_open+0x3f6/0xd60
do_syscall_64+0x5c/0x190
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x72bff0a3c0b9
RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
device ovs-system entered promiscuous mode
netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.

In addition, with v4.15-rc6, netlink messages like the one in the last
line show up, but I guess this is a different, openvswitch-related issue.

> Thomas is working on an
> issue with KASLR (see lkml with:
> Subject: Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
> )

Yes, I have also seen that thread, but I did not see any similarities to
my issue. Anyway, I also tried out the patch proposed in
https://lkml.org/lkml/2018/1/4/313 but it does not change anything here.

Thanks,

Thomas

2018-01-04 15:37:28

by Thomas Gleixner

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, 4 Jan 2018, Thomas Zeitlhofer wrote:
> On Thu, Jan 04, 2018 at 01:55:28PM +0100, Greg Kroah-Hartman wrote:
> > > > > > on an Ivybridge CPU, I get with 4.14.11:
> > > > > >
> > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
> > > > > > caller is native_flush_tlb_single+0x57/0xc0
> > > > > > CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
> > > > > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > > > Call Trace:
> > > > > > dump_stack+0x5c/0x86
> > > > > > check_preemption_disabled+0xdd/0xe0
> > > > > > native_flush_tlb_single+0x57/0xc0
> > > > > > ? __set_pte_vaddr+0x2d/0x40
> > > > > > __set_pte_vaddr+0x2d/0x40
> > > > > > set_pte_vaddr+0x2f/0x40
> > > > > > cea_set_pte+0x30/0x40
> > > > > > ds_update_cea.constprop.4+0x4d/0x70
> > > > > > reserve_ds_buffers+0x159/0x410
> > > > > > ? wp_page_copy+0x36d/0x6a0
> > > > > > x86_reserve_hardware+0x150/0x160
> > > > > > x86_pmu_event_init+0x3e/0x1f0
> > > > > > perf_try_init_event+0x69/0x80
> > > > > > perf_event_alloc+0x652/0x740
> > > > > > SyS_perf_event_open+0x3f6/0xd60
> > > > > > do_syscall_64+0x5c/0x190
> > > > > > entry_SYSCALL64_slow_path+0x25/0x25
> > > > > > RIP: 0033:0x74a1d94580b9
> > > > > > RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > > > RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
> > > > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
> > > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > > > R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600
> > > > > >
> > > > > > This does not show up when booting with pti=off.

Right, because the code path is not invoked ....

> > Odd, does 4.15-rc6 also trigger the same error?
>
> Yes:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
> caller is native_flush_tlb_single+0x57/0xc0
> CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
> Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> Call Trace:
> dump_stack+0x5c/0x86
> check_preemption_disabled+0xdd/0xe0
> native_flush_tlb_single+0x57/0xc0
> ? __set_pte_vaddr+0x2d/0x40
> __set_pte_vaddr+0x2d/0x40
> set_pte_vaddr+0x2f/0x40
> cea_set_pte+0x30/0x40
> ds_update_cea.constprop.4+0x4d/0x70
> reserve_ds_buffers+0x159/0x410
> ? wp_page_copy+0x370/0x6c0
> x86_reserve_hardware+0x150/0x160
> x86_pmu_event_init+0x3e/0x1f0
> perf_try_init_event+0x69/0x80
> perf_event_alloc+0x652/0x740
> SyS_perf_event_open+0x3f6/0xd60
> do_syscall_64+0x5c/0x190
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x72bff0a3c0b9
> RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
> RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
> device ovs-system entered promiscuous mode
> netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
>
> In addition, with v4.15-rc6, netlink messages like the one in the last
> line show up, but I guess this is a different, openvswitch-related issue.
>
> > Thomas is working on an
> > issue with KASLR (see lkml with:
> > Subject: Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
> > )
>
> Yes, I have also seen that thread, but I did not see any similarities to
> my issue. Anyway, I also tried out the patch proposed in
> https://lkml.org/lkml/2018/1/4/313 but it does not change anything here.

Correct. I'm looking into a fix. Stay tuned.

Thanks,

tglx

2018-01-04 17:07:21

by Peter Zijlstra

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 04:37:24PM +0100, Thomas Gleixner wrote:
> > Yes:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
> > caller is native_flush_tlb_single+0x57/0xc0
> > CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
> > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > Call Trace:
> > dump_stack+0x5c/0x86
> > check_preemption_disabled+0xdd/0xe0
> > native_flush_tlb_single+0x57/0xc0
> > ? __set_pte_vaddr+0x2d/0x40
> > __set_pte_vaddr+0x2d/0x40
> > set_pte_vaddr+0x2f/0x40
> > cea_set_pte+0x30/0x40
> > ds_update_cea.constprop.4+0x4d/0x70
> > reserve_ds_buffers+0x159/0x410
> > ? wp_page_copy+0x370/0x6c0
> > x86_reserve_hardware+0x150/0x160
> > x86_pmu_event_init+0x3e/0x1f0
> > perf_try_init_event+0x69/0x80
> > perf_event_alloc+0x652/0x740
> > SyS_perf_event_open+0x3f6/0xd60
> > do_syscall_64+0x5c/0x190
> > entry_SYSCALL64_slow_path+0x25/0x25
> > RIP: 0033:0x72bff0a3c0b9
> > RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
> > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600

Fun, so set_pte_vaddr() and the whole cpu_entry_area are supposed to be
per CPU. But the DS crud does cross CPU updates of those tables.

So we need some additional fun and games..

How's the below?

---
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8f0aace08b87..8156e47da7ba 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -5,6 +5,7 @@

 #include <asm/cpu_entry_area.h>
 #include <asm/perf_event.h>
+#include <asm/tlbflush.h>
 #include <asm/insn.h>

 #include "../perf_event.h"
@@ -283,20 +284,35 @@ static DEFINE_PER_CPU(void *, insn_buffer);

 static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot)
 {
+	unsigned long start = (unsigned long)cea;
 	phys_addr_t pa;
 	size_t msz = 0;

 	pa = virt_to_phys(addr);
+
+	preempt_disable();
 	for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE)
 		cea_set_pte(cea, pa, prot);
+
+	/*
+	 * This is a cross-CPU update of the cpu_entry_area, we must shoot down
+	 * all TLB entries for it.
+	 */
+	flush_tlb_kernel_range(start, start + size);
+	preempt_enable();
 }

 static void ds_clear_cea(void *cea, size_t size)
 {
+	unsigned long start = (unsigned long)cea;
 	size_t msz = 0;

+	preempt_disable();
 	for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE)
 		cea_set_pte(cea, 0, PAGE_NONE);
+
+	flush_tlb_kernel_range(start, start + size);
+	preempt_enable();
 }

 static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)
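
Why the preempt_disable()/preempt_enable() pair is enough to silence the warning can be shown with a rough user-space model of the debug check. This is only a sketch. The model_* names and the single global counter are assumptions for illustration. The real check is check_preemption_disabled(), seen in the trace, and as far as I know it also accepts other conditions, such as interrupts being disabled.

```c
#include <assert.h>
#include <stdio.h>

/* User-space model only; every model_* name is hypothetical. */
static int model_preempt_count;	/* 0 means "preemptible" */
static int model_bug_reports;	/* how often the debug check fired */

void model_preempt_disable(void)
{
	model_preempt_count++;
}

void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);	/* unbalanced enable */
	model_preempt_count--;
}

/* Stand-in for smp_processor_id() with the debug check enabled. */
int model_smp_processor_id(void)
{
	if (model_preempt_count == 0) {
		model_bug_reports++;
		fprintf(stderr,
			"BUG: using smp_processor_id() in preemptible code\n");
	}
	return 0;	/* the model only has CPU 0 */
}

/* Stand-in for one PTE update whose local flush asks for the CPU id. */
void model_set_pte_and_flush(void)
{
	(void)model_smp_processor_id();
}

/* Mapping loop as before the patch: runs with preemption enabled. */
void model_update_unprotected(int pages)
{
	for (int i = 0; i < pages; i++)
		model_set_pte_and_flush();
}

/* Mapping loop as in the patch: preemption disabled around the loop. */
void model_update_protected(int pages)
{
	model_preempt_disable();
	for (int i = 0; i < pages; i++)
		model_set_pte_and_flush();
	model_preempt_enable();
}

/* Accessor so callers do not have to touch the counter directly. */
int model_get_bug_reports(void)
{
	return model_bug_reports;
}
```

In this model the unprotected loop reports once per page, while the protected loop reports nothing. The model says nothing about the cross-CPU flush, which is a separate part of the patch.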

2018-01-04 18:38:05

by Thomas Zeitlhofer

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 06:07:12PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 04, 2018 at 04:37:24PM +0100, Thomas Gleixner wrote:
> > > Yes:
> > >
> > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
> > > caller is native_flush_tlb_single+0x57/0xc0
> > > CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
> > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > Call Trace:
> > > dump_stack+0x5c/0x86
> > > check_preemption_disabled+0xdd/0xe0
> > > native_flush_tlb_single+0x57/0xc0
> > > ? __set_pte_vaddr+0x2d/0x40
> > > __set_pte_vaddr+0x2d/0x40
> > > set_pte_vaddr+0x2f/0x40
> > > cea_set_pte+0x30/0x40
> > > ds_update_cea.constprop.4+0x4d/0x70
> > > reserve_ds_buffers+0x159/0x410
> > > ? wp_page_copy+0x370/0x6c0
> > > x86_reserve_hardware+0x150/0x160
> > > x86_pmu_event_init+0x3e/0x1f0
> > > perf_try_init_event+0x69/0x80
> > > perf_event_alloc+0x652/0x740
> > > SyS_perf_event_open+0x3f6/0xd60
> > > do_syscall_64+0x5c/0x190
> > > entry_SYSCALL64_slow_path+0x25/0x25
> > > RIP: 0033:0x72bff0a3c0b9
> > > RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
> > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
> > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
>
> Fun, so set_pte_vaddr() and the whole cpu_entry_area are supposed to be
> per CPU. But the DS crud does cross CPU updates of those tables.
>
> So we need some additional fun and games..
>
> How's the below?
[...]

Looks good - I have successfully tested it on top of 4.14.11 and
4.15-rc6. In both cases, the error message is gone when this patch is
applied.

Thanks,

Thomas

Subject: [tip:x86/pti] x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers

Commit-ID: 2f8411c2e98d93100de55d413f7c54a090bdf04e
Gitweb: https://git.kernel.org/tip/2f8411c2e98d93100de55d413f7c54a090bdf04e
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 4 Jan 2018 18:07:12 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Thu, 4 Jan 2018 23:04:58 +0100

x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers

Thomas reported the following warning:

BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
caller is native_flush_tlb_single+0x57/0xc0
native_flush_tlb_single+0x57/0xc0
__set_pte_vaddr+0x2d/0x40
set_pte_vaddr+0x2f/0x40
cea_set_pte+0x30/0x40
ds_update_cea.constprop.4+0x4d/0x70
reserve_ds_buffers+0x159/0x410
x86_reserve_hardware+0x150/0x160
x86_pmu_event_init+0x3e/0x1f0
perf_try_init_event+0x69/0x80
perf_event_alloc+0x652/0x740
SyS_perf_event_open+0x3f6/0xd60
do_syscall_64+0x5c/0x190

set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
there are two problems with that:

1) The resulting flush is not supposed to be called in preemptible context

2) The cpu entry area is supposed to be per CPU, but the debug store
buffers are mapped for all CPUs so these mappings need to be flushed
globally.

Add the necessary preemption protection across the mapping code and flush
TLBs globally.

Fixes: c1961a4631da ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
Reported-by: Thomas Zeitlhofer <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Thomas Zeitlhofer <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/events/intel/ds.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8f0aace..8156e47 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -5,6 +5,7 @@

 #include <asm/cpu_entry_area.h>
 #include <asm/perf_event.h>
+#include <asm/tlbflush.h>
 #include <asm/insn.h>

 #include "../perf_event.h"
@@ -283,20 +284,35 @@ static DEFINE_PER_CPU(void *, insn_buffer);

 static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot)
 {
+	unsigned long start = (unsigned long)cea;
 	phys_addr_t pa;
 	size_t msz = 0;

 	pa = virt_to_phys(addr);
+
+	preempt_disable();
 	for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE)
 		cea_set_pte(cea, pa, prot);
+
+	/*
+	 * This is a cross-CPU update of the cpu_entry_area, we must shoot down
+	 * all TLB entries for it.
+	 */
+	flush_tlb_kernel_range(start, start + size);
+	preempt_enable();
 }

 static void ds_clear_cea(void *cea, size_t size)
 {
+	unsigned long start = (unsigned long)cea;
 	size_t msz = 0;

+	preempt_disable();
 	for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE)
 		cea_set_pte(cea, 0, PAGE_NONE);
+
+	flush_tlb_kernel_range(start, start + size);
+	preempt_enable();
 }

 static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)

Subject: [tip:x86/pti] x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers

Commit-ID: 42f3bdc5dd962a5958bc024c1e1444248a6b8b4a
Gitweb: https://git.kernel.org/tip/42f3bdc5dd962a5958bc024c1e1444248a6b8b4a
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 4 Jan 2018 18:07:12 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Fri, 5 Jan 2018 00:39:58 +0100

x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers

Thomas reported the following warning:

BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
caller is native_flush_tlb_single+0x57/0xc0
native_flush_tlb_single+0x57/0xc0
__set_pte_vaddr+0x2d/0x40
set_pte_vaddr+0x2f/0x40
cea_set_pte+0x30/0x40
ds_update_cea.constprop.4+0x4d/0x70
reserve_ds_buffers+0x159/0x410
x86_reserve_hardware+0x150/0x160
x86_pmu_event_init+0x3e/0x1f0
perf_try_init_event+0x69/0x80
perf_event_alloc+0x652/0x740
SyS_perf_event_open+0x3f6/0xd60
do_syscall_64+0x5c/0x190

set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
there are two problems with that:

1) The resulting flush is not supposed to be called in preemptible context

2) The cpu entry area is supposed to be per CPU, but the debug store
buffers are mapped for all CPUs so these mappings need to be flushed
globally.

Add the necessary preemption protection across the mapping code and flush
TLBs globally.

Fixes: c1961a4631da ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
Reported-by: Thomas Zeitlhofer <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Thomas Zeitlhofer <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/events/intel/ds.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8f0aace..8156e47 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -5,6 +5,7 @@

 #include <asm/cpu_entry_area.h>
 #include <asm/perf_event.h>
+#include <asm/tlbflush.h>
 #include <asm/insn.h>

 #include "../perf_event.h"
@@ -283,20 +284,35 @@ static DEFINE_PER_CPU(void *, insn_buffer);

 static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot)
 {
+	unsigned long start = (unsigned long)cea;
 	phys_addr_t pa;
 	size_t msz = 0;

 	pa = virt_to_phys(addr);
+
+	preempt_disable();
 	for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE)
 		cea_set_pte(cea, pa, prot);
+
+	/*
+	 * This is a cross-CPU update of the cpu_entry_area, we must shoot down
+	 * all TLB entries for it.
+	 */
+	flush_tlb_kernel_range(start, start + size);
+	preempt_enable();
 }

 static void ds_clear_cea(void *cea, size_t size)
 {
+	unsigned long start = (unsigned long)cea;
 	size_t msz = 0;

+	preempt_disable();
 	for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE)
 		cea_set_pte(cea, 0, PAGE_NONE);
+
+	flush_tlb_kernel_range(start, start + size);
+	preempt_enable();
 }

 static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)
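
The first of the two problems named in the changelog can be modelled in
user space. The sketch below is a toy model, not kernel code: every
model_* name is hypothetical, a plain counter stands in for the preempt
count, the model has a single CPU, and the global TLB flush is left out
entirely. Under those assumptions it only illustrates why the debug check
fires on the unprotected path and stays quiet once the update loop is
bracketed by a disable/enable pair.

```c
#include <assert.h>

/* Toy stand-in for the preempt count; 0 means "preemptible". */
static int model_preempt_count;
/* Number of times the modelled debug check has fired. */
static int model_bug_reports;

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	/* An unbalanced enable would be a bug in the caller. */
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models the debug check: reading the CPU id while preemptible is reported. */
static int model_smp_processor_id(void)
{
	if (model_preempt_count == 0)
		model_bug_reports++;
	return 0; /* the model has a single CPU */
}

/* Models one PTE update whose local TLB flush needs the current CPU id. */
static void model_set_pte(void)
{
	(void)model_smp_processor_id();
}

/* Pre-patch shape: the loop runs preemptible. Returns the new reports. */
int model_update_unprotected(int pages)
{
	int before = model_bug_reports;

	for (int i = 0; i < pages; i++)
		model_set_pte();
	return model_bug_reports - before;
}

/* Post-patch shape: the loop runs with preemption disabled. */
int model_update_protected(int pages)
{
	int before = model_bug_reports;

	model_preempt_disable();
	for (int i = 0; i < pages; i++)
		model_set_pte();
	/* The real patch calls flush_tlb_kernel_range() here; not modelled. */
	model_preempt_enable();
	return model_bug_reports - before;
}
```

In this model, model_update_unprotected(3) returns 3 (one report per
page), while model_update_protected(3) returns 0.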

2018-01-06 21:38:52

by Thomas Zeitlhofer

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Thu, Jan 04, 2018 at 07:38:00PM +0100, Thomas Zeitlhofer wrote:
> On Thu, Jan 04, 2018 at 06:07:12PM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 04, 2018 at 04:37:24PM +0100, Thomas Gleixner wrote:
> > > > Yes:
> > > >
> > > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
> > > > caller is native_flush_tlb_single+0x57/0xc0
> > > > CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
> > > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > Call Trace:
> > > > dump_stack+0x5c/0x86
> > > > check_preemption_disabled+0xdd/0xe0
> > > > native_flush_tlb_single+0x57/0xc0
> > > > ? __set_pte_vaddr+0x2d/0x40
> > > > __set_pte_vaddr+0x2d/0x40
> > > > set_pte_vaddr+0x2f/0x40
> > > > cea_set_pte+0x30/0x40
> > > > ds_update_cea.constprop.4+0x4d/0x70
> > > > reserve_ds_buffers+0x159/0x410
> > > > ? wp_page_copy+0x370/0x6c0
> > > > x86_reserve_hardware+0x150/0x160
> > > > x86_pmu_event_init+0x3e/0x1f0
> > > > perf_try_init_event+0x69/0x80
> > > > perf_event_alloc+0x652/0x740
> > > > SyS_perf_event_open+0x3f6/0xd60
> > > > do_syscall_64+0x5c/0x190
> > > > entry_SYSCALL64_slow_path+0x25/0x25
> > > > RIP: 0033:0x72bff0a3c0b9
> > > > RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
> > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
> > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
> >
> > Fun, so set_pte_vaddr() and the whole cpu_entry_area are supposed to be
> > per CPU. But the DS crud does cross CPU updates of those tables.
> >
> > So we need some additional fun and games..
> >
> > How's the below?
> [...]
>
> Looks good - I have successfully tested it on top of 4.14.11 and
> 4.15-rc6. In both cases, the error message is gone when this patch is
> applied.

While solving the previous problem, this patch also introduces new "fun
and games"...

Now, terminating a systemd-nspawn container reliably crashes the host
(so far tested only on Haswell, if that matters). Once, I was able to
capture the following trace:

BUG: unable to handle kernel paging request at 0000000000206ccc
IP: __task_pid_nr_ns+0x57/0xc0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
Modules linked in: uinput veth ip_vti ip_tunnel esp4 xfrm6_mode_tunnel fuse ccm xt_CHECKSUM tun bridge stp llc xfrm_user xfrm_algo ebtable_filter twofish_generic twofish_avx_x86_64 ebtables twofish_x86_64_3way twofish_x86_64 twofish_common vxlan ip6_udp_tunnel udp_tunnel serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic devlink blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic algif_skcipher camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 ablk_helper camellia_x86_64 xcbc openvswitch nf_nat_ipv6 md4 algif_hash af_alg cmac rfcomm bnep xt_policy nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat msr nf_nat_ipv4 nf_nat xt_TCPMSS iptable_mangle ipt_REJECT
nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack binfmt_misc iptable_filter snd_hda_codec_hdmi hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf industrialio rtsx_pci_sdmmc mmc_core iTCO_wdt wmi_bmof arc4 x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo joydev wacom videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_sensor_hub videodev btusb btrtl hid_multitouch btbcm media btintel rtsx_pci i915 bluetooth snd_hda_codec_conexant lpc_ich snd_hda_codec_generic mfd_core iwlmvm iosf_mbi i2c_algo_bit ecdh_generic
drm_kms_helper mac80211 snd_hda_intel syscopyarea snd_hda_codec sysfillrect sysimgblt snd_hda_core snd_pcm_oss iwlwifi fb_sys_fops thinkpad_acpi snd_mixer_oss drm nvram snd_pcm video cfg80211 intel_gtt snd_timer rfkill snd evdev wmi ecryptfs nfsd ip_tables x_tables ipv6 crc_ccitt
CPU: 2 PID: 1 Comm: systemd Not tainted 4.14.12-kvm-00437-gd6765c06f03d #4
Hardware name: LENOVO 20CD0035GE/20CD0035GE, BIOS GQET40WW (1.20 ) 11/07/2014
task: ffff9c66560e0d00 task.stack: ffffbc6a00038000
RIP: 0010:__task_pid_nr_ns+0x57/0xc0
RSP: 0018:ffffbc6a0003bdb0 EFLAGS: 00010246
RAX: ffff9c66560e8680 RBX: 0000000000000000 RCX: 0000000000206cc8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000004d0
RBP: 0000000000000000 R08: ffffffffb0237b10 R09: 0000000000000005
R10: ffffbc6a0003bee0 R11: ffff9c65aa33c004 R12: ffffffffb02309a0
R13: 0000000000001000 R14: ffff9c65ecbd4a00 R15: ffff9c6624516b00
FS: 0000767a01669980(0000) GS:ffff9c665f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000206ccc CR3: 0000000215476003 CR4: 00000000001606e0
Call Trace:
cgroup_procs_show+0x10/0x30
seq_read+0x30c/0x3d0
__vfs_read+0x2e/0x150
vfs_read+0x84/0x110
SyS_read+0x4d/0xc0
do_syscall_64+0x5c/0x190
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x767a00fa671d
RSP: 002b:00007ffca8edc6e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000057d4d8a02c10 RCX: 0000767a00fa671d
RDX: 0000000000001000 RSI: 000057d4d8a05320 RDI: 0000000000000083
RBP: 0000000000000d68 R08: 0000767a01265178 R09: 0000000000001010
R10: 000057d4d8a03490 R11: 0000000000000293 R12: 0000767a01261440
R13: 0000767a01260900 R14: 00000000ffffffff R15: 0000000000000000
Code: 74 0d 48 8d 44 6d 00 48 8d 3c c5 d0 04 00 00 48 8b 9b 98 04 00 00 48 01 fb 48 8b 0b 48 85 c9 74 37 41 8b b4 24 30 08 00 00 31 db <3b> 71 04 77 0d 48 c1 e6 05 48 01 f1 4c 3b 61 38 74 0c e8 12 db
RIP: __task_pid_nr_ns+0x57/0xc0 RSP: ffffbc6a0003bdb0
CR2: 0000000000206ccc
---[ end trace ce7578070732b5ee ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
IP: pids_free+0xb/0x30
PGD 0 P4D 0
Oops: 0000 [#2] PREEMPT SMP PTI
Modules linked in: uinput veth ip_vti ip_tunnel esp4 xfrm6_mode_tunnel fuse ccm xt_CHECKSUM tun bridge stp llc xfrm_user xfrm_algo ebtable_filter twofish_generic twofish_avx_x86_64 ebtables twofish_x86_64_3way twofish_x86_64 twofish_common vxlan ip6_udp_tunnel udp_tunnel serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic devlink blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic algif_skcipher camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 ablk_helper camellia_x86_64 xcbc openvswitch nf_nat_ipv6 md4 algif_hash af_alg cmac rfcomm bnep xt_policy nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat msr nf_nat_ipv4 nf_nat xt_TCPMSS iptable_mangle ipt_REJECT
nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack binfmt_misc iptable_filter snd_hda_codec_hdmi hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf industrialio rtsx_pci_sdmmc mmc_core iTCO_wdt wmi_bmof arc4 x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo joydev wacom videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_sensor_hub videodev btusb btrtl hid_multitouch btbcm media btintel rtsx_pci i915 bluetooth snd_hda_codec_conexant lpc_ich snd_hda_codec_generic mfd_core iwlmvm iosf_mbi i2c_algo_bit ecdh_generic
drm_kms_helper mac80211 snd_hda_intel syscopyarea snd_hda_codec sysfillrect sysimgblt snd_hda_core snd_pcm_oss iwlwifi fb_sys_fops thinkpad_acpi snd_mixer_oss drm nvram snd_pcm video cfg80211 intel_gtt snd_timer rfkill snd evdev wmi ecryptfs nfsd ip_tables x_tables ipv6 crc_ccitt
CPU: 2 PID: 1 Comm: systemd Tainted: G D 4.14.12-kvm-00437-gd6765c06f03d #4
Hardware name: LENOVO 20CD0035GE/20CD0035GE, BIOS GQET40WW (1.20 ) 11/07/2014
task: ffff9c66560e0d00 task.stack: ffffbc6a00038000
RIP: 0010:pids_free+0xb/0x30
RSP: 0018:ffffbc6a0003bdd8 EFLAGS: 00010297
RAX: 0000000000000000 RBX: 000000000000000a RCX: 000000000000000a
RDX: 000000000000000a RSI: 000000000000000c RDI: ffff9c6624516b00
RBP: ffff9c6624516b00 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9c65bf8a8510 R11: ffff9c6656003800 R12: ffffffffb02387e0
R13: ffff9c662ac6d590 R14: ffff9c66534cc7a0 R15: ffff9c6625d5f1e0
FS: 0000000000000000(0000) GS:ffff9c665f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000b0 CR3: 000000008220a006 CR4: 00000000001606e0
Call Trace:
cgroup_free+0x57/0xd0
__put_task_struct+0x38/0x130
cgroup_procs_release+0x12/0x20
kernfs_fop_release+0x82/0x90
__fput+0x9d/0x220
task_work_run+0x84/0xa0
do_exit+0x2b1/0xab0
rewind_stack_do_exit+0x17/0x20
Code: c7 e8 6a fd ff ff 48 8b 80 b0 00 00 00 48 83 b8 b0 00 00 00 00 75 e7 f3 c3 0f 1f 80 00 00 00 00 48 8b 87 88 07 00 00 48 8b 40 50 <48> 83 b8 b0 00 00 00 00 74 19 48 89 c7 e8 33 fd ff ff 48 8b 80
RIP: pids_free+0xb/0x30 RSP: ffffbc6a0003bdd8
CR2: 00000000000000b0
---[ end trace ce7578070732b5ef ]---
Fixing recursive fault but reboot is needed!
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x27/0x350
Modules linked in: uinput veth ip_vti ip_tunnel esp4 xfrm6_mode_tunnel fuse ccm xt_CHECKSUM tun bridge stp llc xfrm_user xfrm_algo ebtable_filter twofish_generic twofish_avx_x86_64 ebtables twofish_x86_64_3way twofish_x86_64 twofish_common vxlan ip6_udp_tunnel udp_tunnel serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic devlink blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic algif_skcipher camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 ablk_helper camellia_x86_64 xcbc openvswitch nf_nat_ipv6 md4 algif_hash af_alg cmac rfcomm bnep xt_policy nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat msr nf_nat_ipv4 nf_nat xt_TCPMSS iptable_mangle ipt_REJECT
nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack binfmt_misc iptable_filter snd_hda_codec_hdmi hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf industrialio rtsx_pci_sdmmc mmc_core iTCO_wdt wmi_bmof arc4 x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo joydev wacom videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_sensor_hub videodev btusb btrtl hid_multitouch btbcm media btintel rtsx_pci i915 bluetooth snd_hda_codec_conexant lpc_ich snd_hda_codec_generic mfd_core iwlmvm iosf_mbi i2c_algo_bit ecdh_generic
drm_kms_helper mac80211 snd_hda_intel syscopyarea snd_hda_codec sysfillrect sysimgblt snd_hda_core snd_pcm_oss iwlwifi fb_sys_fops thinkpad_acpi snd_mixer_oss drm nvram snd_pcm video cfg80211 intel_gtt snd_timer rfkill snd evdev wmi ecryptfs nfsd ip_tables x_tables ipv6 crc_ccitt
CPU: 2 PID: 1 Comm: systemd Tainted: G D 4.14.12-kvm-00437-gd6765c06f03d #4
Hardware name: LENOVO 20CD0035GE/20CD0035GE, BIOS GQET40WW (1.20 ) 11/07/2014
task: ffff9c66560e0d00 task.stack: ffffbc6a00038000
RIP: 0010:rcu_note_context_switch+0x27/0x350
RSP: 0018:ffffbc6a0003be58 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff9c66560e0d00 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffafff992f RDI: ffffffffaffb7ead
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000365
R10: 0000000000000086 R11: 0000000000000000 R12: ffff9c665f29fbc0
R13: ffff9c66560e0d00 R14: ffff9c66560e12a8 R15: 000000000001fbc0
FS: 0000000000000000(0000) GS:ffff9c665f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000b0 CR3: 000000008220a006 CR4: 00000000001606e0
Call Trace:
__schedule+0x84/0x6f0
schedule+0x37/0x90
do_exit+0x8c2/0xab0
rewind_stack_do_exit+0x17/0x20
Code: 00 00 00 00 41 56 41 55 41 54 55 89 fd 53 65 48 8b 1c 25 00 4d 01 00 e8 48 da ff ff 40 84 ed 8b 83 f8 02 00 00 75 7d 85 c0 7e 7d <0f> ff 80 bb fc 02 00 00 00 0f 84 89 00 00 00 e8 c5 ca ff ff e8
---[ end trace ce7578070732b5f0 ]---
INFO: rcu_preempt detected stalls on CPUs/tasks:
Tasks blocked on level-0 rcu_node (CPUs 0-7): P1
(detected by 2, t=60002 jiffies, g=551687, c=551686, q=11683)
systemd D 0 1 0 0x80080002
Call Trace:
? __schedule+0x292/0x6f0
schedule+0x37/0x90
do_exit+0x8c2/0xab0
rewind_stack_do_exit+0x17/0x20
systemd D 0 1 0 0x80080002
Call Trace:
? __schedule+0x292/0x6f0
schedule+0x37/0x90
do_exit+0x8c2/0xab0
rewind_stack_do_exit+0x17/0x20
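
A note on reading the first two oopses above: in both, the reported CR2
equals a register value plus a small displacement. Going by the bytes
marked <...> in the Code: lines, the faulting instructions appear to be a
compare against 0x4(%rcx) in __task_pid_nr_ns and a compare against
0xb0(%rax) in pids_free; that decoding is an assumption made here, not
something stated in the logs. The sketch below only redoes the arithmetic
with the register and CR2 values from the dumps. The struct and function
names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One faulting access: base register value, displacement, reported CR2. */
struct fault_sample {
	const char *symbol;
	uint64_t base;
	int32_t disp;
	uint64_t cr2;
};

/* Register and CR2 values copied from the two oopses; disp is assumed. */
static const struct fault_sample samples[] = {
	/* base is RCX from the first oops */
	{ "__task_pid_nr_ns", 0x0000000000206cc8ULL, 0x04, 0x0000000000206cccULL },
	/* base is RAX from the second oops */
	{ "pids_free", 0x0000000000000000ULL, 0xb0, 0x00000000000000b0ULL },
};

/* Effective address of a base + displacement memory operand. */
uint64_t effective_address(uint64_t base, int32_t disp)
{
	return base + (uint64_t)(int64_t)disp;
}

/* Counts the samples whose computed address matches the reported CR2. */
size_t count_consistent_samples(void)
{
	size_t n = 0;

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		if (effective_address(samples[i].base, samples[i].disp) ==
		    samples[i].cr2)
			n++;
	}
	return n;
}
```

Both samples are consistent: 0x206cc8 + 0x4 gives 0x206ccc, and 0x0 + 0xb0
gives 0xb0. So the first fault is a read through a small, non-canonical
pointer value in RCX, and the second is a read at offset 0xb0 from a NULL
pointer in RAX.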

The crash does not happen with plain 4.14.11, but when this patch (*) is
included it happens with 4.14.1[12], and 4.14.12 plus the following set
of patches from the current 4.14 stable-queue:

x86-mm-set-modules_end-to-0xffffffffff000000.patch
x86-mm-map-cpu_entry_area-at-the-same-place-on-4-5-level.patch
x86-kaslr-fix-the-vaddr_end-mess.patch
(*) x86-events-intel-ds-use-the-proper-cache-flush-method-for-mapping-ds-buffers.patch
x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
mm-mprotect-add-a-cond_resched-inside-change_pmd_range.patch
mm-sparse.c-wrong-allocation-for-mem_section.patch
userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
btrfs-fix-refcount_t-usage-when-deleting-btrfs_delayed_nodes.patch
efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
crypto-n2-cure-use-after-free.patch
crypto-chacha20poly1305-validate-the-digest-size.patch
crypto-pcrypt-fix-freeing-pcrypt-instances.patch
crypto-chelsio-select-crypto_gf128mul.patch
drm-i915-disable-dc-states-around-gmbus-on-glk.patch
drm-i915-apply-display-wa-1183-on-skl-kbl-and-cfl.patch
sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
fscache-fix-the-default-for-fscache_maybe_release_page.patch

Thanks,

Thomas

2018-01-07 08:17:19

by Greg Kroah-Hartman

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Sat, Jan 06, 2018 at 10:38:38PM +0100, Thomas Zeitlhofer wrote:
> On Thu, Jan 04, 2018 at 07:38:00PM +0100, Thomas Zeitlhofer wrote:
> > On Thu, Jan 04, 2018 at 06:07:12PM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 04, 2018 at 04:37:24PM +0100, Thomas Gleixner wrote:
> > > > > Yes:
> > > > >
> > > > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
> > > > > caller is native_flush_tlb_single+0x57/0xc0
> > > > > CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
> > > > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > > Call Trace:
> > > > > dump_stack+0x5c/0x86
> > > > > check_preemption_disabled+0xdd/0xe0
> > > > > native_flush_tlb_single+0x57/0xc0
> > > > > ? __set_pte_vaddr+0x2d/0x40
> > > > > __set_pte_vaddr+0x2d/0x40
> > > > > set_pte_vaddr+0x2f/0x40
> > > > > cea_set_pte+0x30/0x40
> > > > > ds_update_cea.constprop.4+0x4d/0x70
> > > > > reserve_ds_buffers+0x159/0x410
> > > > > ? wp_page_copy+0x370/0x6c0
> > > > > x86_reserve_hardware+0x150/0x160
> > > > > x86_pmu_event_init+0x3e/0x1f0
> > > > > perf_try_init_event+0x69/0x80
> > > > > perf_event_alloc+0x652/0x740
> > > > > SyS_perf_event_open+0x3f6/0xd60
> > > > > do_syscall_64+0x5c/0x190
> > > > > entry_SYSCALL64_slow_path+0x25/0x25
> > > > > RIP: 0033:0x72bff0a3c0b9
> > > > > RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > > RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
> > > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
> > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > > R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
> > >
> > > Fun, so set_pte_vaddr() and the whole cpu_entry_area are supposed to be
> > > per CPU. But the DS crud does cross CPU updates of those tables.
> > >
> > > So we need some additional fun and games..
> > >
> > > How's the below?
> > [...]
> >
> > Looks good - I have successfully tested it on top of 4.14.11 and
> > 4.15-rc6. In both cases, the error message is gone when this patch is
> > applied.
>
> While solving the previous problem, this patch also introduces new "fun
> and games"...
>
> Now, terminating a systemd-nspawn container reliably crashes the host
> (so far tested only on Haswell, if that matters). Once, I was able to
> capture the following trace:

Is this also reproducible on Linus's tree right now?

I've been running nspawn containers on it with no issues like this at
all :(

thanks,

greg k-h

2018-01-07 08:53:30

by Thomas Zeitlhofer

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Sun, Jan 07, 2018 at 09:17:18AM +0100, Greg Kroah-Hartman wrote:
> On Sat, Jan 06, 2018 at 10:38:38PM +0100, Thomas Zeitlhofer wrote:
> > On Thu, Jan 04, 2018 at 07:38:00PM +0100, Thomas Zeitlhofer wrote:
> > > On Thu, Jan 04, 2018 at 06:07:12PM +0100, Peter Zijlstra wrote:
> > > > On Thu, Jan 04, 2018 at 04:37:24PM +0100, Thomas Gleixner wrote:
> > > > > > Yes:
> > > > > >
> > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
> > > > > > caller is native_flush_tlb_single+0x57/0xc0
> > > > > > CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
> > > > > > Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > > > Call Trace:
> > > > > > dump_stack+0x5c/0x86
> > > > > > check_preemption_disabled+0xdd/0xe0
> > > > > > native_flush_tlb_single+0x57/0xc0
> > > > > > ? __set_pte_vaddr+0x2d/0x40
> > > > > > __set_pte_vaddr+0x2d/0x40
> > > > > > set_pte_vaddr+0x2f/0x40
> > > > > > cea_set_pte+0x30/0x40
> > > > > > ds_update_cea.constprop.4+0x4d/0x70
> > > > > > reserve_ds_buffers+0x159/0x410
> > > > > > ? wp_page_copy+0x370/0x6c0
> > > > > > x86_reserve_hardware+0x150/0x160
> > > > > > x86_pmu_event_init+0x3e/0x1f0
> > > > > > perf_try_init_event+0x69/0x80
> > > > > > perf_event_alloc+0x652/0x740
> > > > > > SyS_perf_event_open+0x3f6/0xd60
> > > > > > do_syscall_64+0x5c/0x190
> > > > > > entry_SYSCALL64_slow_path+0x25/0x25
> > > > > > RIP: 0033:0x72bff0a3c0b9
> > > > > > RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > > > RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
> > > > > > RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
> > > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > > > R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > > > R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
> > > >
> > > > Fun, so set_pte_vaddr() and the whole cpu_entry_area are supposed to be
> > > > per CPU. But the DS crud does cross CPU updates of those tables.
> > > >
> > > > So we need some additional fun and games..
> > > >
> > > > How's the below?
> > > [...]
> > >
> > > Looks good - I have successfully tested it on top of 4.14.11 and
> > > 4.15-rc6. In both cases, the error message is gone when this patch is
> > > applied.
> >
> > While solving the previous problem, this patch also introduces new "fun
> > and games"...
> >
> > Now, terminating a systemd-nspawn container reliably crashes the host
> > (so far tested only on Haswell, if that matters). Once, I was able to
> > capture the following trace:
>
> Is this also reproducible on Linus's tree right now?

It is reproducible with this patch on top of 4.15-rc6 (I might be able to
test Linus's current tree later today).

Thanks,

Thomas

2018-01-08 00:37:28

by Thomas Zeitlhofer

Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on 4.14.11

On Sun, Jan 07, 2018 at 09:53:19AM +0100, Thomas Zeitlhofer wrote:
> On Sun, Jan 07, 2018 at 09:17:18AM +0100, Greg Kroah-Hartman wrote:
> > On Sat, Jan 06, 2018 at 10:38:38PM +0100, Thomas Zeitlhofer wrote:
[...]
> > > While solving the previous problem, this patch also introduces new
> > > "fun and games"...
> > >
> > > Now, terminating a systemd-nspawn container reliably crashes the
> > > host (so far tested only on Haswell, if that matters). Once, I was
> > > able to capture the following trace:
> >
> > Is this also reproducible on Linus's tree right now?
>
> It is reproducible with this patch on top of 4.15-rc6 (might be able
> to test Linus's current tree later today).

Some more testing showed that this is not caused by the patch after all,
sorry for the noise.

The crash happens quite reliably, but occasionally it does not occur.
When I tested 4.14.11 without the patch, I apparently hit one of those
rare runs; in the meantime, 4.14.11 without the patch has crashed as
well. The situation is also unchanged with 4.15-rc7.

Interestingly, it happens only when using the boot switch "-b", i.e.:

systemd-nspawn -b -D <path to rootfs>

_and_ terminating the container by pressing ^] three times. Other
combinations (e.g., no "-b" and terminating with ^]^]^], "-b" and
terminating by running shutdown inside the container) work just fine.
Anyway, this is already off-topic and might be a subject for a different
thread...

Thanks,

Thomas