2023-11-23 09:01:55

by David Woodhouse

Subject: CONFIG_PROVE_RAW_LOCK_NESTING false positive?

Is this telling me that I'm no longer allowed to take a read_lock() in
a callback for an HRTIMER_MODE_ABS_HARD timer? Is that intentional?

If I must, I can probably cope with this by using read_trylock()
instead. The object being locked is a cache, and we opportunistically
try to use it from the fast path but fall back to a slow path in
process context which will revalidate and try again. So if someone
*has* taken the write lock, it's a fairly safe bet that the cache is
going to be invalidated and we were going to take the slow path anyway.
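
For illustration, a minimal sketch of that trylock fallback (hypothetical
names -- my_cache, cache_is_valid, deliver_event, defer_to_slow_path --
not the actual KVM code):

#include <linux/spinlock.h>

struct my_cache {
	rwlock_t lock;
	/* ... cached mapping ... */
};

static bool cache_is_valid(struct my_cache *c);
static void deliver_event(struct my_cache *c);
static void defer_to_slow_path(struct my_cache *c);

/* Fast path, callable from hard IRQ context. */
static bool try_fast_path(struct my_cache *c)
{
	bool delivered = false;

	/*
	 * If the write lock is held, the cache is almost certainly
	 * being invalidated; don't wait, fall back immediately.
	 */
	if (!read_trylock(&c->lock))
		return false;

	if (cache_is_valid(c)) {
		deliver_event(c);
		delivered = true;
	}

	read_unlock(&c->lock);
	return delivered;
}

/* Caller, e.g. a timer callback running in hard IRQ context. */
static void on_timer(struct my_cache *c)
{
	if (!try_fast_path(c))
		defer_to_slow_path(c);	/* revalidates in process context */
}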

[ 62.336965] =============================
[ 62.336973] [ BUG: Invalid wait context ]
[ 62.336992] 6.7.0-rc1+ #1437 Tainted: G I
[ 62.337001] -----------------------------
[ 62.337008] qemu-system-x86/1935 is trying to lock:
[ 62.337017] ffffc900018fecc0 (&gpc->lock){....}-{3:3}, at: kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
[ 62.337133] other info that might help us debug this:
[ 62.337142] context-{2:2}
[ 62.337148] 2 locks held by qemu-system-x86/1935:
[ 62.337156] #0: ffff888108f780b0 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x7f/0x730 [kvm]
[ 62.337239] #1: ffffc900018ff2d8 (&kvm->srcu){.?.+}-{0:0}, at: kvm_xen_set_evtchn_fast+0xcd/0x460 [kvm]
[ 62.337339] stack backtrace:
[ 62.337346] CPU: 7 PID: 1935 Comm: qemu-system-x86 Tainted: G I 6.7.0-rc1+ #1437
[ 62.337370] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[ 62.337384] Call Trace:
[ 62.337390] <IRQ>
[ 62.337395] dump_stack_lvl+0x57/0x90
[ 62.337407] __lock_acquire+0x7bb/0xbb0
[ 62.337416] ? __lock_acquire+0x4f0/0xbb0
[ 62.337425] lock_acquire.part.0+0xad/0x240
[ 62.337433] ? kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
[ 62.337512] ? rcu_is_watching+0xd/0x40
[ 62.337520] ? lock_acquire+0xf2/0x110
[ 62.337529] __raw_read_lock_irqsave+0x4e/0xa0
[ 62.337538] ? kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
[ 62.337604] kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
[ 62.337669] ? kvm_xen_set_evtchn_fast+0xcd/0x460 [kvm]
[ 62.337734] xen_timer_callback+0x86/0xc0 [kvm]
[ 62.337799] ? __pfx_xen_timer_callback+0x10/0x10 [kvm]
[ 62.337863] __hrtimer_run_queues+0x1b2/0x380
[ 62.337874] hrtimer_interrupt+0xfe/0x240
[ 62.337883] __sysvec_apic_timer_interrupt+0x85/0x1d0
[ 62.337893] sysvec_apic_timer_interrupt+0x99/0xc0
[ 62.337905] </IRQ>
[ 62.337910] <TASK>
[ 62.337914] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 62.337925] RIP: 0010:vcpu_enter_guest.constprop.0+0x9c5/0x1240 [kvm]
[ 62.337994] Code: 00 00 48 89 df e8 9b e2 12 00 48 83 bb 38 08 00 00 00 0f 85 ac 01 00 00 c6 83 32 0b 00 00 01 e8 c1 e9 8a e0 fb 0f 1f 44 00 00 <48> 83 83 c0 1c 00 00 01 9c 58 0f 1f 40 00 48 89 c2 fa 0f 1f 44 00
[ 62.338241] RSP: 0018:ffffc90001923d70 EFLAGS: 00000206
[ 62.338465] RAX: 000000000002ce1d RBX: ffff888108f78000 RCX: 0000000000000040
[ 62.338692] RDX: 0000000000000000 RSI: ffffffff82569339 RDI: ffffffff824f5def
[ 62.338919] RBP: ffffc90001923de0 R08: 0000000000000001 R09: 0000000000000000
[ 62.339172] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 62.339396] R13: 0000000000000001 R14: 0000000000000000 R15: ffff888108f78000
[ 62.339617] ? vcpu_run+0x40/0x430 [kvm]
[ 62.339888] vcpu_run+0x40/0x430 [kvm]
[ 62.340173] kvm_arch_vcpu_ioctl_run+0x1da/0x680 [kvm]
[ 62.340443] kvm_vcpu_ioctl+0x1a1/0x730 [kvm]
[ 62.340702] ? __fget_files+0xbc/0x190
[ 62.340915] __x64_sys_ioctl+0x93/0xd0
[ 62.341125] do_syscall_64+0x36/0xb0
[ 62.341335] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 62.341544] RIP: 0033:0x7f180b200ecd
[ 62.341745] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 62.342178] RSP: 002b:00007f1802bb1560 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 62.342398] RAX: ffffffffffffffda RBX: 00005601bc1b8530 RCX: 00007f180b200ecd
[ 62.342616] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c
[ 62.342832] RBP: 00007f1802bb15b0 R08: 00005601ba5a27d0 R09: 0000000000000000
[ 62.343046] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 62.343256] R13: 0000000000000022 R14: 0000000000000007 R15: 00007ffeac3c08a0
[ 62.343463] </TASK>


2023-11-23 15:02:00

by Peter Zijlstra

Subject: Re: CONFIG_PROVE_RAW_LOCK_NESTING false positive?

On Thu, Nov 23, 2023 at 09:00:41AM +0000, David Woodhouse wrote:
> Is this telling me that I'm no longer allowed to take a read_lock() in
> a callback for an HRTIMER_MODE_ABS_HARD timer? Is that intentional?
>
> If I must, I can probably cope with this by using read_trylock()
> instead. The object being locked is a cache, and we opportunistically
> try to use it from the fast path but fall back to a slow path in
> process context which will revalidate and try again. So if someone
> *has* taken the write lock, it's a fairly safe bet that the cache is
> going to be invalidated and we were going to take the slow path anyway.
>
> [ 62.336965] =============================
> [ 62.336973] [ BUG: Invalid wait context ]
> [ 62.336992] 6.7.0-rc1+ #1437 Tainted: G I
> [ 62.337001] -----------------------------
> [ 62.337008] qemu-system-x86/1935 is trying to lock:
> [ 62.337017] ffffc900018fecc0 (&gpc->lock){....}-{3:3}, at: kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
> [ 62.337133] other info that might help us debug this:
> [ 62.337142] context-{2:2}
> [ 62.337148] 2 locks held by qemu-system-x86/1935:
> [ 62.337156] #0: ffff888108f780b0 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x7f/0x730 [kvm]
> [ 62.337239] #1: ffffc900018ff2d8 (&kvm->srcu){.?.+}-{0:0}, at: kvm_xen_set_evtchn_fast+0xcd/0x460 [kvm]
> [ 62.337339] stack backtrace:
> [ 62.337346] CPU: 7 PID: 1935 Comm: qemu-system-x86 Tainted: G I 6.7.0-rc1+ #1437
> [ 62.337370] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> [ 62.337384] Call Trace:
> [ 62.337390] <IRQ>
> [ 62.337395] dump_stack_lvl+0x57/0x90
> [ 62.337407] __lock_acquire+0x7bb/0xbb0
> [ 62.337416] ? __lock_acquire+0x4f0/0xbb0
> [ 62.337425] lock_acquire.part.0+0xad/0x240
> [ 62.337433] ? kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
> [ 62.337512] ? rcu_is_watching+0xd/0x40
> [ 62.337520] ? lock_acquire+0xf2/0x110
> [ 62.337529] __raw_read_lock_irqsave+0x4e/0xa0
> [ 62.337538] ? kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
> [ 62.337604] kvm_xen_set_evtchn_fast+0xe7/0x460 [kvm]
> [ 62.337669] ? kvm_xen_set_evtchn_fast+0xcd/0x460 [kvm]
> [ 62.337734] xen_timer_callback+0x86/0xc0 [kvm]

xen_timer_callback is HRTIMER_MODE_ABS_HARD, which means it will still
run in IRQ context for PREEMPT_RT.

OTOH read_lock_irqsave() is not a raw spinlock and will be turned into a
blocking lock.

This then gives scheduling from IRQ context, which is somewhat frowned
upon.

Warning is real and valid.
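
As an illustration of that combination (a sketch with made-up names, not
the KVM code): the timer below fires in hard IRQ context on every
preemption model, yet on PREEMPT_RT the rwlock_t it takes becomes a
sleeping lock (rt_mutex based) -- exactly the invalid wait context that
lockdep reports above.

#include <linux/hrtimer.h>
#include <linux/spinlock.h>

static DEFINE_RWLOCK(cache_lock);	/* a sleeping lock on PREEMPT_RT */

static enum hrtimer_restart my_timer_fn(struct hrtimer *t)
{
	unsigned long flags;

	/* Runs in hard IRQ context even on PREEMPT_RT ... */
	read_lock_irqsave(&cache_lock, flags);	/* ... but this may sleep on RT */
	/* ... fast-path work ... */
	read_unlock_irqrestore(&cache_lock, flags);

	return HRTIMER_NORESTART;
}

static void arm_timer(struct hrtimer *t, ktime_t expires)
{
	hrtimer_init(t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
	t->function = my_timer_fn;
	hrtimer_start(t, expires, HRTIMER_MODE_ABS_HARD);
}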

2023-11-23 15:05:45

by David Woodhouse

Subject: Re: CONFIG_PROVE_RAW_LOCK_NESTING false positive?

On 23 November 2023 15:01:19 GMT, Peter Zijlstra <[email protected]> wrote:
>On Thu, Nov 23, 2023 at 09:00:41AM +0000, David Woodhouse wrote:
>> Is this telling me that I'm no longer allowed to take a read_lock() in
>> a callback for an HRTIMER_MODE_ABS_HARD timer? Is that intentional?
>>
>> If I must, I can probably cope with this by using read_trylock()
>> instead. The object being locked is a cache, and we opportunistically
>> try to use it from the fast path but fall back to a slow path in
>> process context which will revalidate and try again. So if someone
>> *has* taken the write lock, it's a fairly safe bet that the cache is
>> going to be invalidated and we were going to take the slow path anyway.
>>
>> [... lockdep report trimmed; see above ...]
>
>xen_timer_callback is HRTIMER_MODE_ABS_HARD, which means it will still
>run in IRQ context for PREEMPT_RT.
>
>OTOH read_lock_irqsave() is not a raw spinlock and will be turned into a
>blocking lock.
>
>This then gives scheduling from IRQ context, which is somewhat frowned
>upon.
>
>Warning is real and valid.


... or at least will be when PREEMPT_RT turns the read_lock into a mutex? But there is no raw version of read_lock(). Can we have one please?

Thanks....

2023-11-23 15:14:03

by Peter Zijlstra

Subject: Re: CONFIG_PROVE_RAW_LOCK_NESTING false positive?

On Thu, Nov 23, 2023 at 03:05:15PM +0000, David Woodhouse wrote:
> On 23 November 2023 15:01:19 GMT, Peter Zijlstra <[email protected]> wrote:
> >On Thu, Nov 23, 2023 at 09:00:41AM +0000, David Woodhouse wrote:
> >> Is this telling me that I'm no longer allowed to take a read_lock() in
> >> a callback for an HRTIMER_MODE_ABS_HARD timer? Is that intentional?
> >>
> >> If I must, I can probably cope with this by using read_trylock()
> >> instead. The object being locked is a cache, and we opportunistically
> >> try to use it from the fast path but fall back to a slow path in
> >> process context which will revalidate and try again. So if someone
> >> *has* taken the write lock, it's a fairly safe bet that the cache is
> >> going to be invalidated and we were going to take the slow path anyway.
> >>
> >> [... lockdep report trimmed; see above ...]
> >
> >xen_timer_callback is HRTIMER_MODE_ABS_HARD, which means it will still
> >run in IRQ context for PREEMPT_RT.
> >
> >OTOH read_lock_irqsave() is not a raw spinlock and will be turned into a
> >blocking lock.
> >
> >This then gives scheduling from IRQ context, which is somewhat frowned
> >upon.
> >
> >Warning is real and valid.
>
>
> ... or at least will be when PREEMPT_RT turns the read_lock into a mutex?

Right, this check specifically validates the RT lock nesting rules.

> But there is no raw version of read_lock(). Can we have one please?

Should be possible, but it is somewhat non-trivial; it is very easy to
create significant latencies with RW locks. Definitely not something I'm
going to be able to do in a hurry.

Also, I suspect Thomas is going to strongly suggest not going down that
road and instead looking at whether this can be solved differently.
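
One shape a "different" solution could take (an illustrative sketch, not
something proposed in this thread): protect the data the hard-IRQ fast
path needs with a raw_spinlock_t, which remains a true spinning lock on
PREEMPT_RT and is therefore legal in hard IRQ context -- at the cost of
reader concurrency, so the critical section has to stay short and
strictly bounded.

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(cache_lock);

static void fast_path_check(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&cache_lock, flags);
	/* short, strictly bounded critical section only */
	raw_spin_unlock_irqrestore(&cache_lock, flags);
}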

2023-11-23 15:15:43

by David Woodhouse

Subject: Re: CONFIG_PROVE_RAW_LOCK_NESTING false positive?

On 23 November 2023 15:13:45 GMT, Peter Zijlstra <[email protected]> wrote:
>On Thu, Nov 23, 2023 at 03:05:15PM +0000, David Woodhouse wrote:
>> On 23 November 2023 15:01:19 GMT, Peter Zijlstra <[email protected]> wrote:
>> >On Thu, Nov 23, 2023 at 09:00:41AM +0000, David Woodhouse wrote:
>> >> Is this telling me that I'm no longer allowed to take a read_lock() in
>> >> a callback for an HRTIMER_MODE_ABS_HARD timer? Is that intentional?
>> >>
>> >> If I must, I can probably cope with this by using read_trylock()
>> >> instead. The object being locked is a cache, and we opportunistically
>> >> try to use it from the fast path but fall back to a slow path in
>> >> process context which will revalidate and try again. So if someone
>> >> *has* taken the write lock, it's a fairly safe bet that the cache is
>> >> going to be invalidated and we were going to take the slow path anyway.
>> >>
>> >> [... lockdep report trimmed; see above ...]
>> >
>> >xen_timer_callback is HRTIMER_MODE_ABS_HARD, which means it will still
>> >run in IRQ context for PREEMPT_RT.
>> >
>> >OTOH read_lock_irqsave() is not a raw spinlock and will be turned into a
>> >blocking lock.
>> >
>> >This then gives scheduling from IRQ context, which is somewhat frowned
>> >upon.
>> >
>> >Warning is real and valid.
>>
>>
>> ... or at least will be when PREEMPT_RT turns the read_lock into a mutex?
>
>Right, this check specifically validates the RT lock nesting rules.
>
>> But there is no raw version of read_lock(). Can we have one please?
>
>Should be possible, but it is somewhat non-trivial; it is very easy to
>create significant latencies with RW locks. Definitely not something I'm
>going to be able to do in a hurry.
>
>Also, I suspect Thomas is going to strongly suggest not going down that
>road and instead looking at whether this can be solved differently.

That's the "If I must…" paragraph above. I'll hack it up. Thanks.