2023-01-11 18:57:00

by Paolo Bonzini

Subject: [PATCH] Documentation: kvm: fix SRCU locking order docs

kvm->srcu is taken in KVM_RUN and several other vCPU ioctls, therefore
vcpu->mutex is susceptible to the same deadlock that is documented
for kvm->slots_lock. The same holds for kvm->lock, since kvm->lock
is held outside vcpu->mutex. Fix the documentation and rearrange it
to highlight the difference between these locks and kvm->slots_arch_lock,
and how kvm->slots_arch_lock can be useful while processing a vmexit.

Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virt/kvm/locking.rst | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 897ca39b72bf..53826098183e 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -24,17 +24,18 @@ The acquisition orders for mutexes are as follows:
 
 For SRCU:
 
-- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
-  the kvm->slots_lock critical section, therefore kvm->slots_lock
-  cannot be taken inside a kvm->srcu read-side critical section.
-  Instead, kvm->slots_arch_lock is released before the call
-  to ``synchronize_srcu()`` and _can_ be taken inside a
-  kvm->srcu read-side critical section.
-
-- kvm->lock is taken inside kvm->srcu, therefore
-  ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
-  a kvm->lock critical section.  If you cannot delay the
-  call until after kvm->lock is released, use ``call_srcu``.
+- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
+  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
+  be taken inside a kvm->srcu read-side critical section; that is, the
+  following is broken::
+
+      srcu_read_lock(&kvm->srcu);
+      mutex_lock(&kvm->slots_lock);
+
+- kvm->slots_arch_lock instead is released before the call to
+  ``synchronize_srcu()``.  It _can_ therefore be taken inside a
+  kvm->srcu read-side critical section, for example while processing
+  a vmexit.
 
 On x86:
 
--
2.39.0
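
The ordering rule documented in this patch can be sketched outside the kernel.
The following is a hypothetical toy checker in Python, not the kernel's lockdep
and not KVM code (event and lock names are illustrative only): it records which
mutexes are held across a synchronize_srcu() and which mutexes are taken inside
an SRCU read-side critical section, then flags any pair that appears in both
sets, which is exactly the combination the hunk above declares broken.

```python
# Toy model of the documented rule -- NOT the kernel's lockdep.
# A deadlock is possible when some mutex M is held across
# synchronize_srcu(S) on one path, while another path takes M
# inside an S read-side critical section.

def find_violations(events):
    """events: (op, lock) pairs collected from one or more code paths."""
    held = set()          # mutexes currently held
    read_side = set()     # SRCU domains currently in a read section
    sync_under = set()    # (mutex, srcu): synchronize_srcu(S) ran under M
    taken_inside = set()  # (mutex, srcu): M was taken inside S's read side
    for op, lock in events:
        if op == 'mutex_lock':
            held.add(lock)
            taken_inside.update((lock, s) for s in read_side)
        elif op == 'mutex_unlock':
            held.discard(lock)
        elif op == 'srcu_read_lock':
            read_side.add(lock)
        elif op == 'srcu_read_unlock':
            read_side.discard(lock)
        elif op == 'synchronize_srcu':
            sync_under.update((m, lock) for m in held)
    return sync_under & taken_inside

# KVM_RUN-like path taking kvm->lock inside a kvm->srcu read section,
# plus an ioctl path calling synchronize_srcu() under kvm->lock:
trace = [
    ('srcu_read_lock', 'kvm->srcu'),
    ('mutex_lock', 'kvm->lock'),
    ('mutex_unlock', 'kvm->lock'),
    ('srcu_read_unlock', 'kvm->srcu'),
    ('mutex_lock', 'kvm->lock'),
    ('synchronize_srcu', 'kvm->srcu'),
    ('mutex_unlock', 'kvm->lock'),
]
print(find_violations(trace))  # {('kvm->lock', 'kvm->srcu')}

# kvm->slots_arch_lock is dropped *before* synchronize_srcu(), so the
# same checker reports nothing for it:
ok = [
    ('srcu_read_lock', 'kvm->srcu'),
    ('mutex_lock', 'kvm->slots_arch_lock'),
    ('mutex_unlock', 'kvm->slots_arch_lock'),
    ('srcu_read_unlock', 'kvm->srcu'),
    ('mutex_lock', 'kvm->slots_arch_lock'),
    ('mutex_unlock', 'kvm->slots_arch_lock'),
    ('synchronize_srcu', 'kvm->srcu'),
]
print(find_violations(ok))  # set()
```

The second trace mirrors why the patch singles out kvm->slots_arch_lock: the
grace-period wait never happens while that mutex is held.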


2023-01-12 09:09:16

by David Woodhouse

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
>
> +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> +  be taken inside a kvm->srcu read-side critical section; that is, the
> +  following is broken::
> +
> +      srcu_read_lock(&kvm->srcu);
> +      mutex_lock(&kvm->slots_lock);
> +

"Don't tell me. Tell lockdep!"

Did we conclude in
https://lore.kernel.org/kvm/[email protected]/
that lockdep *could* be clever enough to catch a violation of this rule
by itself?

The general case of the rule would be that 'if mutex A is taken in a
read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
held shall be verboten'. And vice versa.

If we can make lockdep catch it automatically, yay!

If not, I'm inclined to suggest that we have explicit wrappers of our
own for kvm_mutex_lock() which will do the check directly.



2023-01-12 16:12:27

by Paul E. McKenney

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> >
> > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > +  following is broken::
> > +
> > +      srcu_read_lock(&kvm->srcu);
> > +      mutex_lock(&kvm->slots_lock);
> > +
>
> "Don't tell me. Tell lockdep!"
>
> Did we conclude in
> https://lore.kernel.org/kvm/[email protected]/
> that lockdep *could* be clever enough to catch a violation of this rule
> by itself?
>
> The general case of the rule would be that 'if mutex A is taken in a
> read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> held shall be verboten'. And vice versa.
>
> If we can make lockdep catch it automatically, yay!

Unfortunately, lockdep needs to see a writer to complain, and that patch
just adds a reader. And adding that writer would make lockdep complain
about things that are perfectly fine. It should be possible to make
lockdep catch this sort of thing, but from what I can see, doing so
requires modifications to lockdep itself.

> If not, I'm inclined to suggest that we have explicit wrappers of our
> own for kvm_mutex_lock() which will do the check directly.

This does allow much more wiggle room. For example, you guys could decide
to let lockdep complain about things that other SRCU users want to do.
For completeness, here is one such scenario:

CPU 0: read_lock(&rla); srcu_read_lock(&srcua); ...

CPU 1: srcu_read_lock(&srcua); read_lock(&rla); ...

CPU 2: synchronize_srcu(&srcua);

CPU 3: write_lock(&rla); ...

If you guys are OK with lockdep complaining about this, then doing a
currently mythical rcu_write_acquire()/rcu_write_release() pair around
your calls to synchronize_srcu() should catch the other issue.

And probably break something else, but you have to start somewhere! ;-)

Thanx, Paul

2023-01-13 07:31:27

by Boqun Feng

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On Thu, Jan 12, 2023 at 07:20:48AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> > On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> > >
> > > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > > +  following is broken::
> > > +
> > > +      srcu_read_lock(&kvm->srcu);
> > > +      mutex_lock(&kvm->slots_lock);
> > > +
> >
> > "Don't tell me. Tell lockdep!"
> >
> > Did we conclude in
> > https://lore.kernel.org/kvm/[email protected]/
> > that lockdep *could* be clever enough to catch a violation of this rule
> > by itself?
> >
> > The general case of the rule would be that 'if mutex A is taken in a
> > read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> > held shall be verboten'. And vice versa.
> >
> > If we can make lockdep catch it automatically, yay!
>
> Unfortunately, lockdep needs to see a writer to complain, and that patch
> just adds a reader. And adding that writer would make lockdep complain
> about things that are perfectly fine. It should be possible to make
> lockdep catch this sort of thing, but from what I can see, doing so
> requires modifications to lockdep itself.
>

Please see if the follow patchset works:

https://lore.kernel.org/lkml/[email protected]

"I have been called. I must answer. Always." ;-)

> > If not, I'm inclined to suggest that we have explicit wrappers of our
> > own for kvm_mutex_lock() which will do the check directly.
>
> This does allow much more wiggle room. For example, you guys could decide
> to let lockdep complain about things that other SRCU users want to do.
> For completeness, here is one such scenario:
>
> CPU 0: read_lock(&rla); srcu_read_lock(&srcua); ...
>
> CPU 1: srcu_read_lock(&srcua); read_lock(&rla); ...
>
> CPU 2: synchronize_srcu(&srcua);
>
> CPU 3: write_lock(&rla); ...
>
> If you guys are OK with lockdep complaining about this, then doing a

Actually lockdep won't complain about this, since srcu_read_lock() is
always a recursive read lock, so it won't break other srcu_read_lock().
FWIW if CPU 2 or CPU 3 does

	write_lock(&rla);
	synchronize_srcu(&srcua);

it's a deadlock (with CPU 1).

Regards,
Boqun

> currently mythical rcu_write_acquire()/rcu_write_release() pair around
> your calls to synchronize_srcu() should catch the other issue.
>
> And probably break something else, but you have to start somewhere! ;-)
>
> Thanx, Paul
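
Boqun's deadlocking variant reduces to a cycle in a "waits while holding"
graph. The sketch below is a loose illustration only, a plain DFS cycle check
rather than lockdep's internal representation, and the edge encoding is an
assumption made here: CPU 2 holds rla while waiting in synchronize_srcu(&srcua),
giving an rla -> srcua edge, while CPU 1 sits in an srcua read-side section and
waits for read_lock(&rla), giving srcua -> rla.

```python
from collections import defaultdict

def has_cycle(edges):
    """Detect a cycle in a directed 'waits for B while holding A' graph."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def dfs(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True          # back edge found: cycle
            if color[nxt] == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in list(graph))

# CPU 1: srcu_read_lock(&srcua); read_lock(&rla);
#        -> waits for rla while inside srcua's read section
# CPU 2: write_lock(&rla); synchronize_srcu(&srcua);
#        -> waits out srcua's readers while holding rla
deadlock = {('srcua', 'rla'), ('rla', 'srcua')}
print(has_cycle(deadlock))  # True

# Without the write_lock-then-synchronize combination, SRCU readers
# never block each other, so only the srcua -> rla edge exists:
fine = {('srcua', 'rla')}
print(has_cycle(fine))  # False
```

The second case is Boqun's point in miniature: srcu_read_lock() taken under
another lock contributes no blocking edge of its own, so no cycle closes.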

2023-01-13 09:31:24

by Paolo Bonzini

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On 1/13/23 08:18, Boqun Feng wrote:
> On Thu, Jan 12, 2023 at 07:20:48AM -0800, Paul E. McKenney wrote:
>> On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
>>> On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
>>>>
>>>> +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
>>>> +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
>>>> +  be taken inside a kvm->srcu read-side critical section; that is, the
>>>> +  following is broken::
>>>> +
>>>> +      srcu_read_lock(&kvm->srcu);
>>>> +      mutex_lock(&kvm->slots_lock);
>>>> +
>>>
>>> "Don't tell me. Tell lockdep!"
>>>
>>> Did we conclude in
>>> https://lore.kernel.org/kvm/[email protected]/
>>> that lockdep *could* be clever enough to catch a violation of this rule
>>> by itself?
>>>
>>> The general case of the rule would be that 'if mutex A is taken in a
>>> read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
>>> held shall be verboten'. And vice versa.
>>>
>>> If we can make lockdep catch it automatically, yay!
>>
>> Unfortunately, lockdep needs to see a writer to complain, and that patch
>> just adds a reader. And adding that writer would make lockdep complain
>> about things that are perfectly fine. It should be possible to make
>> lockdep catch this sort of thing, but from what I can see, doing so
>> requires modifications to lockdep itself.
>>
>
> Please see if the follow patchset works:
>
> https://lore.kernel.org/lkml/[email protected]
>
> "I have been called. I must answer. Always." ;-)

It's missing an important testcase; if it passes (does not warn), then
it should work:

    CPU 1                          CPU 2
    ----------------------------   ------------------------------
    mutex_lock(&m1);               srcu_read_lock(&srcu1);
    srcu_read_lock(&srcu1);        mutex_lock(&m1);
    srcu_read_unlock(&srcu1);      mutex_unlock(&m1);
    mutex_unlock(&m1);             srcu_read_unlock(&srcu1);

This is the main difference, lockdep-wise, between SRCU and an rwlock.

Paolo
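
Paolo's testcase can be exercised with a toy SRCU stand-in. The model below is
a simplification assumed for illustration, not the kernel's implementation:
read_lock only bumps a reader count and never blocks, and synchronize waits for
the count to drain. Because readers never block, both interleavings complete
and there is nothing for a checker to warn about.

```python
import threading

class SrcuLite:
    """Toy SRCU stand-in (illustrative only): readers never block,
    they just bump a counter; synchronize() waits for zero readers."""
    def __init__(self):
        self.readers = 0
        self.cond = threading.Condition()
    def read_lock(self):
        with self.cond:
            self.readers += 1
    def read_unlock(self):
        with self.cond:
            self.readers -= 1
            self.cond.notify_all()
    def synchronize(self):
        with self.cond:
            self.cond.wait_for(lambda: self.readers == 0)

m1 = threading.Lock()
srcu1 = SrcuLite()
done = []

def cpu1():
    # mutex_lock(&m1); srcu_read_lock(&srcu1); ... and unwind
    with m1:
        srcu1.read_lock()
        srcu1.read_unlock()
    done.append('cpu1')

def cpu2():
    # srcu_read_lock(&srcu1); mutex_lock(&m1); ... and unwind
    srcu1.read_lock()
    with m1:
        pass
    srcu1.read_unlock()
    done.append('cpu2')

threads = [threading.Thread(target=cpu1), threading.Thread(target=cpu2)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=10)

srcu1.synchronize()   # no reader left, returns immediately
print(sorted(done))   # ['cpu1', 'cpu2'] -- both finished, no deadlock
```

If either interleaving could block the other, the joins would time out and the
final list would come up short; that neither happens is the lockdep-relevant
difference from an rwlock, where cpu2's read_lock could block behind a writer.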

2023-01-13 11:12:51

by David Woodhouse

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On Fri, 2023-01-13 at 10:20 +0100, Paolo Bonzini wrote:
> On 1/13/23 08:18, Boqun Feng wrote:
> > On Thu, Jan 12, 2023 at 07:20:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> > > > On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> > > > >
> > > > > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > > > > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > > > > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > > > > +  following is broken::
> > > > > +
> > > > > +      srcu_read_lock(&kvm->srcu);
> > > > > +      mutex_lock(&kvm->slots_lock);
> > > > > +
> > > >
> > > > "Don't tell me. Tell lockdep!"
> > > >
> > > > Did we conclude in
> > > > https://lore.kernel.org/kvm/[email protected]/
> > > > that lockdep *could* be clever enough to catch a violation of this rule
> > > > by itself?
> > > >
> > > > The general case of the rule would be that 'if mutex A is taken in a
> > > > read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> > > > held shall be verboten'. And vice versa.
> > > >
> > > > If we can make lockdep catch it automatically, yay!
> > >
> > > Unfortunately, lockdep needs to see a writer to complain, and that patch
> > > just adds a reader.  And adding that writer would make lockdep complain
> > > about things that are perfectly fine.  It should be possible to make
> > > lockdep catch this sort of thing, but from what I can see, doing so
> > > requires modifications to lockdep itself.
> > >
> >
> > Please see if the follow patchset works:
> >
> >         https://lore.kernel.org/lkml/[email protected]
> >
> > "I have been called. I must answer. Always." ;-)

Amazing! Thank you, Boqun!

> It's missing an important testcase; if it passes (does not warn), then
> it should work:

I think it does.

I started with kvm/master from
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=master so that
lockdep didn't find other things to complain about first. I then:

• Dropped the last two commits, putting us back to using kvm->lock and
  removing the dummy mutex lock that would have told lockdep that it's
  a (different) problem.

• I then added Boqun's three commits

• Reverted a79b53aa so that the synchronize_srcu() deadlock returns

• Couldn't reproduce with xen_shinfo_test, so added Michal's test from
  https://lore.kernel.org/kvm/[email protected]/

The resulting tree is at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kvm-srcu-lockdep


Now when I run tools/testing/selftests/kvm/x86_64/deadlocks_test I do
see lockdep complain about it (shown below; I have a cosmetic
nit/request to make). If I restore the evtchn_reset fix then it also
complains about kvm_xen_set_evtchn() vs. kvm_xen_kvm_set_attr(), and if
I restore the xen_lock fix from the tip of kvm/master then Michal's
deadlock_test passes and there are no complaints.

So everything seems to be working as it should... *except* for the fact
that I don't quite understand why xen_shinfo_test didn't trigger the
warning. Michal, I guess you already worked that out when you came up
with your deadlock-test instead... is there something we should add to
xen_shinfo_test that would mean it *would* have triggered? I even tried
this:

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (init_srcu_struct(&kvm->irq_srcu))
 		goto out_err_no_irq_srcu;
 
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * Ensure lockdep knows that it's not permitted to lock kvm->lock
+	 * from a SRCU read section on kvm->srcu.
+	 */
+	mutex_lock(&kvm->lock);
+	synchronize_srcu(&kvm->srcu);
+	mutex_unlock(&kvm->lock);
+#endif
+
 	refcount_set(&kvm->users_count, 1);
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 		for (j = 0; j < 2; j++) {




> [ 845.474169] ======================================================
> [ 845.474170] WARNING: possible circular locking dependency detected
> [ 845.474172] 6.2.0-rc3+ #1025 Tainted: G E
> [ 845.474175] ------------------------------------------------------
> [ 845.474176] deadlocks_test/22767 is trying to acquire lock:
> [ 845.474178] ffffc9000ba4b868 (&kvm->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x170
> [ 845.474192]
> but task is already holding lock:
> [ 845.474194] ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> [ 845.474319]
> which lock already depends on the new lock.

So the above part is good, and clearly tells me it was synchronize_srcu()

> [ 845.474320]
> the existing dependency chain (in reverse order) is:
> [ 845.474322]
> -> #1 (&kvm->lock){+.+.}-{3:3}:
> [ 845.474327] __lock_acquire+0x4b4/0x940
> [ 845.474333] lock_acquire.part.0+0xa8/0x210
> [ 845.474337] __mutex_lock+0x94/0x920
> [ 845.474344] kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
> [ 845.474437] kvm_xen_inject_timer_irqs+0x79/0xa0 [kvm]
> [ 845.474529] vcpu_run+0x20c/0x450 [kvm]
> [ 845.474618] kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
> [ 845.474707] kvm_vcpu_ioctl+0x279/0x700 [kvm]
> [ 845.474783] __x64_sys_ioctl+0x8a/0xc0
> [ 845.474787] do_syscall_64+0x3b/0x90
> [ 845.474796] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.474804]
> -> #0 (&kvm->srcu){.+.+}-{0:0}:
> [ 845.474809] check_prev_add+0x8f/0xc20
> [ 845.474812] validate_chain+0x3ba/0x450
> [ 845.474814] __lock_acquire+0x4b4/0x940
> [ 845.474817] lock_sync+0x99/0x110
> [ 845.474820] __synchronize_srcu+0x4d/0x170
> [ 845.474824] kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> [ 845.474907] kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> [ 845.474997] kvm_vm_ioctl+0x5ca/0x800 [kvm]
> [ 845.475075] __x64_sys_ioctl+0x8a/0xc0
> [ 845.475079] do_syscall_64+0x3b/0x90
> [ 845.475084] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.475089]
> other info that might help us debug this:
>
> [ 845.475091] Possible unsafe locking scenario:
>
> [ 845.475092]        CPU0                    CPU1
> [ 845.475093]        ----                    ----
> [ 845.475095]   lock(&kvm->lock);
> [ 845.475098]                                lock(&kvm->srcu);
> [ 845.475101]                                lock(&kvm->lock);
> [ 845.475103]   lock(&kvm->srcu);
> [ 845.475106]
> *** DEADLOCK ***

But is there any chance the above could say 'synchronize_srcu' and
'read_lock_srcu' in the appropriate places?

> [ 845.475108] 1 lock held by deadlocks_test/22767:
> [ 845.475110] #0: ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> [ 845.475200]
> stack backtrace:
> [ 845.475202] CPU: 10 PID: 22767 Comm: deadlocks_test Tainted: G E 6.2.0-rc3+ #1025
> [ 845.475206] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> [ 845.475208] Call Trace:
> [ 845.475210] <TASK>
> [ 845.475214] dump_stack_lvl+0x56/0x73
> [ 845.475221] check_noncircular+0x102/0x120
> [ 845.475229] ? check_noncircular+0x7f/0x120
> [ 845.475236] check_prev_add+0x8f/0xc20
> [ 845.475239] ? add_chain_cache+0x10b/0x2d0
> [ 845.475244] validate_chain+0x3ba/0x450
> [ 845.475249] __lock_acquire+0x4b4/0x940
> [ 845.475253] ? __synchronize_srcu+0x5/0x170
> [ 845.475258] lock_sync+0x99/0x110
> [ 845.475261] ? __synchronize_srcu+0x5/0x170
> [ 845.475265] __synchronize_srcu+0x4d/0x170
> [ 845.475269] ? mark_held_locks+0x49/0x80
> [ 845.475272] ? _raw_spin_unlock_irqrestore+0x2d/0x60
> [ 845.475278] ? __pfx_read_tsc+0x10/0x10
> [ 845.475286] ? ktime_get_mono_fast_ns+0x3d/0x90
> [ 845.475292] kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> [ 845.475380] kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> [ 845.475472] ? __lock_acquire+0x4b4/0x940
> [ 845.475485] kvm_vm_ioctl+0x5ca/0x800 [kvm]
> [ 845.475566] ? lockdep_unregister_key+0x76/0x110
> [ 845.475575] __x64_sys_ioctl+0x8a/0xc0
> [ 845.475579] do_syscall_64+0x3b/0x90
> [ 845.475586] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.475591] RIP: 0033:0x7f79de23fd1b
> [ 845.475595] Code: 73 01 c3 48 8b 0d 05 a1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a0 1b 00 f7 d8 64 89 01 48
> [ 845.475598] RSP: 002b:00007f79ddff7c98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 845.475602] RAX: ffffffffffffffda RBX: 00007f79ddff8640 RCX: 00007f79de23fd1b
> [ 845.475605] RDX: 00007f79ddff7ca0 RSI: 000000004188aec6 RDI: 0000000000000004
> [ 845.475607] RBP: 00007f79ddff85c0 R08: 0000000000000000 R09: 00007fffceb1ff2f
> [ 845.475609] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f79ddff7ca0
> [ 845.475611] R13: 0000000001c322a0 R14: 00007f79de2a05f0 R15: 0000000000000000
> [ 845.475617] </TASK>




2023-01-13 11:34:13

by David Woodhouse

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On Fri, 2023-01-13 at 10:33 +0000, David Woodhouse wrote:
>
> So everything seems to be working as it should... *except* for the fact
> that I don't quite understand why xen_shinfo_test didn't trigger the
> warning. Michal, I guess you already worked that out when you came up
> with your deadlock-test instead... is there something we should add to
> xen_shinfo_test that would mean it *would* have triggered?

Got it. It only happens when kvm_xen_set_evtchn() takes the slow path
when kvm_xen_set_evtchn_fast() fails. Not utterly sure why that works
in your deadlock_test but I can make it happen in xen_shinfo_test just
by invalidating the GPC by changing the memslots:


--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -29,6 +29,9 @@
 #define DUMMY_REGION_GPA	(SHINFO_REGION_GPA + (3 * PAGE_SIZE))
 #define DUMMY_REGION_SLOT	11
 
+#define DUMMY_REGION_GPA_2	(SHINFO_REGION_GPA + (4 * PAGE_SIZE))
+#define DUMMY_REGION_SLOT_2	12
+
 #define SHINFO_ADDR	(SHINFO_REGION_GPA)
 #define VCPU_INFO_ADDR	(SHINFO_REGION_GPA + 0x40)
 #define PVTIME_ADDR	(SHINFO_REGION_GPA + PAGE_SIZE)
@@ -765,6 +768,8 @@ int main(int argc, char *argv[])
 
 		if (verbose)
 			printf("Testing guest EVTCHNOP_send direct to evtchn\n");
+		vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+					    DUMMY_REGION_GPA_2, DUMMY_REGION_SLOT_2, 1, 0);
 		evtchn_irq_expected = true;
 		alarm(1);
 		break;



I did also need the trick below. I'll send patches properly, keeping
the fast path test and *adding* the slow one, instead of just changing
it as above.

I also validated that if I put back the evtchn_reset deadlock fix, and
the separate xen_lock which is currently the tip of kvm/master, it all
works without complaints (or deadlocks).

> I even tried this:
>
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>         if (init_srcu_struct(&kvm->irq_srcu))
>                 goto out_err_no_irq_srcu;
>  
> +#ifdef CONFIG_LOCKDEP
> +       /*
> +        * Ensure lockdep knows that it's not permitted to lock kvm->lock
> +        * from a SRCU read section on kvm->srcu.
> +        */
> +       mutex_lock(&kvm->lock);
> +       synchronize_srcu(&kvm->srcu);
> +       mutex_unlock(&kvm->lock);
> +#endif
> +
>         refcount_set(&kvm->users_count, 1);
>         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>                 for (j = 0; j < 2; j++) {
>
>

[ 91.866348] ======================================================
[ 91.866349] WARNING: possible circular locking dependency detected
[ 91.866351] 6.2.0-rc3+ #1025 Tainted: G OE
[ 91.866353] ------------------------------------------------------
[ 91.866354] xen_shinfo_test/2938 is trying to acquire lock:
[ 91.866356] ffffc9000179e3c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[ 91.866453]
but task is already holding lock:
[ 91.866454] ffffc900017a7868 (&kvm->srcu){.+.+}-{0:0}, at: vcpu_enter_guest.constprop.0+0xa89/0x1270 [kvm]
[ 91.866527]
which lock already depends on the new lock.

[ 91.866528]
the existing dependency chain (in reverse order) is:
[ 91.866529]
-> #1 (&kvm->srcu){.+.+}-{0:0}:
[ 91.866532] __lock_acquire+0x4b4/0x940
[ 91.866537] lock_sync+0x99/0x110
[ 91.866540] __synchronize_srcu+0x4d/0x170
[ 91.866543] kvm_create_vm+0x271/0x6e0 [kvm]
[ 91.866621] kvm_dev_ioctl+0x102/0x1c0 [kvm]
[ 91.866694] __x64_sys_ioctl+0x8a/0xc0
[ 91.866697] do_syscall_64+0x3b/0x90
[ 91.866701] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 91.866707]
-> #0 (&kvm->lock){+.+.}-{3:3}:
[ 91.866710] check_prev_add+0x8f/0xc20
[ 91.866712] validate_chain+0x3ba/0x450
[ 91.866714] __lock_acquire+0x4b4/0x940
[ 91.866716] lock_acquire.part.0+0xa8/0x210
[ 91.866717] __mutex_lock+0x94/0x920
[ 91.866721] kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[ 91.866790] kvm_xen_hcall_evtchn_send.constprop.0+0x138/0x1c0 [kvm]
[ 91.866869] kvm_xen_hypercall+0x475/0x5a0 [kvm]
[ 91.866938] vmx_handle_exit+0xe/0x50 [kvm_intel]
[ 91.866955] vcpu_enter_guest.constprop.0+0xb08/0x1270 [kvm]
[ 91.867034] vcpu_run+0x1bd/0x450 [kvm]
[ 91.867100] kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
[ 91.867167] kvm_vcpu_ioctl+0x279/0x700 [kvm]
[ 91.867229] __x64_sys_ioctl+0x8a/0xc0
[ 91.867231] do_syscall_64+0x3b/0x90
[ 91.867235] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 91.867238]
other info that might help us debug this:

[ 91.867239] Possible unsafe locking scenario:

[ 91.867240]        CPU0                    CPU1
[ 91.867241]        ----                    ----
[ 91.867242]   lock(&kvm->srcu);
[ 91.867244]                                lock(&kvm->lock);
[ 91.867246]                                lock(&kvm->srcu);
[ 91.867248]   lock(&kvm->lock);
[ 91.867249]
*** DEADLOCK ***

[ 91.867250] 2 locks held by xen_shinfo_test/2938:
[ 91.867252] #0: ffff88815a8800b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x77/0x700 [kvm]
[ 91.867318] #1: ffffc900017a7868 (&kvm->srcu){.+.+}-{0:0}, at: vcpu_enter_guest.constprop.0+0xa89/0x1270 [kvm]
[ 91.867387]
stack backtrace:
[ 91.867389] CPU: 26 PID: 2938 Comm: xen_shinfo_test Tainted: G OE 6.2.0-rc3+ #1025
[ 91.867392] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[ 91.867394] Call Trace:
[ 91.867395] <TASK>
[ 91.867398] dump_stack_lvl+0x56/0x73
[ 91.867403] check_noncircular+0x102/0x120
[ 91.867409] check_prev_add+0x8f/0xc20
[ 91.867411] ? add_chain_cache+0x10b/0x2d0
[ 91.867415] validate_chain+0x3ba/0x450
[ 91.867418] __lock_acquire+0x4b4/0x940
[ 91.867421] lock_acquire.part.0+0xa8/0x210
[ 91.867424] ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[ 91.867494] ? rcu_read_lock_sched_held+0x43/0x70
[ 91.867498] ? lock_acquire+0x102/0x140
[ 91.867501] __mutex_lock+0x94/0x920
[ 91.867505] ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[ 91.867574] ? find_held_lock+0x2b/0x80
[ 91.867578] ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[ 91.867647] ? __lock_release+0x5f/0x170
[ 91.867652] ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[ 91.867721] kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[ 91.867791] kvm_xen_hcall_evtchn_send.constprop.0+0x138/0x1c0 [kvm]
[ 91.867875] kvm_xen_hypercall+0x475/0x5a0 [kvm]
[ 91.867947] ? rcu_read_lock_sched_held+0x43/0x70
[ 91.867952] vmx_handle_exit+0xe/0x50 [kvm_intel]
[ 91.867966] vcpu_enter_guest.constprop.0+0xb08/0x1270 [kvm]
[ 91.868046] ? lock_acquire.part.0+0xa8/0x210
[ 91.868050] ? vcpu_run+0x1bd/0x450 [kvm]
[ 91.868117] ? lock_acquire+0x102/0x140
[ 91.868121] vcpu_run+0x1bd/0x450 [kvm]
[ 91.868189] kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
[ 91.868257] kvm_vcpu_ioctl+0x279/0x700 [kvm]
[ 91.868322] ? get_cpu_entry_area+0xb/0x30
[ 91.868327] ? _raw_spin_unlock_irq+0x34/0x50
[ 91.868330] ? do_setitimer+0x190/0x1e0
[ 91.868335] __x64_sys_ioctl+0x8a/0xc0
[ 91.868338] do_syscall_64+0x3b/0x90
[ 91.868341] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 91.868345] RIP: 0033:0x7f313103fd1b
[ 91.868348] Code: 73 01 c3 48 8b 0d 05 a1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a0 1b 00 f7 d8 64 89 01 48
[ 91.868350] RSP: 002b:00007ffcdc02dba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 91.868353] RAX: ffffffffffffffda RBX: 00007f31313d2000 RCX: 00007f313103fd1b
[ 91.868355] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
[ 91.868356] RBP: 00007f313139a6c0 R08: 000000000065a2f0 R09: 0000000000000000
[ 91.868357] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000065c800
[ 91.868359] R13: 000000000000000a R14: 00007f31313d0ff1 R15: 000000000065a2a0
[ 91.868363] </TASK>



2023-01-13 23:14:36

by Michal Luczaj

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On 1/13/23 12:03, David Woodhouse wrote:
> On Fri, 2023-01-13 at 10:33 +0000, David Woodhouse wrote:
>> So everything seems to be working as it should... *except* for the fact
>> that I don't quite understand why xen_shinfo_test didn't trigger the
>> warning. Michal, I guess you already worked that out when you came up
>> with your deadlock-test instead... is there something we should add to
>> xen_shinfo_test that would mean it *would* have triggered?

No, it's not that I kept those deadlock selftests out of xen_shinfo_test
because of some problem. I just wanted to have a cleaner workspace and
then, maybe, move them to xen_shinfo_test, which, well, did not happen
:) I guess there's no need for them filthy races anymore; lockdep does a
better job.

> Got it. It only happens when kvm_xen_set_evtchn() takes the slow path
> when kvm_xen_set_evtchn_fast() fails.

I fully agree. And sorry for the late reply.

> Not utterly sure why that works
> in your deadlock_test but I can make it happen in xen_shinfo_test just
> by invalidating the GPC by changing the memslots:

Could it be that deadlocks_test starts with the right conditions, i.e.
invalid KVM_XEN_ATTR_TYPE_SHARED_INFO along with valid
KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO? xen_shinfo_test, on the other hand,
has them both valid, and so the fast path is taken.

I suppose instead of changing memslots, you can invalidate the
KVM_XEN_ATTR_TYPE_SHARED_INFO for that particular test unit, e.g.

	struct kvm_xen_hvm_attr ha = {
		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
		.u.shared_info.gfn = KVM_XEN_INVALID_GFN,
	};
	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &ha);

One more thing concerning the lockdep priming you did in kvm_create_vm():

mutex_lock(&kvm->lock);
synchronize_srcu(&kvm->srcu);
mutex_unlock(&kvm->lock)

It seems that deadlocks_test's set_msr_filter() effectively did the same
thanks to kvm_vm_ioctl_set_msr_filter()'s sync-under-mutex (which won't
happen if those I-used-to-be-a-deadlock optimization patches[*] get
merged). Naturally, xen_shinfo_test do not mess with MSR filters, so that
could be another reason for inconsistencies you've noticed before the
priming?

[*] https://lore.kernel.org/kvm/[email protected]/

Michal

2023-01-14 00:06:35

by Boqun Feng

Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On Fri, Jan 13, 2023 at 10:33:33AM +0000, David Woodhouse wrote:
> On Fri, 2023-01-13 at 10:20 +0100, Paolo Bonzini wrote:
> > On 1/13/23 08:18, Boqun Feng wrote:
> > > On Thu, Jan 12, 2023 at 07:20:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> > > > > On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> > > > > >
> > > > > > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > > > > > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > > > > > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > > > > > +  following is broken::
> > > > > > +
> > > > > > +      srcu_read_lock(&kvm->srcu);
> > > > > > +      mutex_lock(&kvm->slots_lock);
> > > > > > +
> > > > >
> > > > > "Don't tell me. Tell lockdep!"
> > > > >
> > > > > Did we conclude in
> > > > > https://lore.kernel.org/kvm/[email protected]/
> > > > > that lockdep *could* be clever enough to catch a violation of this rule
> > > > > by itself?
> > > > >
> > > > > The general case of the rule would be that 'if mutex A is taken in a
> > > > > read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> > > > > held shall be verboten'. And vice versa.
> > > > >
> > > > > If we can make lockdep catch it automatically, yay!
> > > >
> > > > Unfortunately, lockdep needs to see a writer to complain, and that patch
> > > > just adds a reader.  And adding that writer would make lockdep complain
> > > > about things that are perfectly fine.  It should be possible to make
> > > > lockdep catch this sort of thing, but from what I can see, doing so
> > > > requires modifications to lockdep itself.
> > > >
> > >
> > > Please see if the follow patchset works:
> > >
> > >         https://lore.kernel.org/lkml/[email protected]
> > >
> > > "I have been called. I must answer. Always." ;-)
>
> Amazing! Thank you, Boqun!
>

Thank you for trying it out ;-)

> > It's missing an important testcase; if it passes (does not warn), then
> > it should work:
>
> I think it does.
>
> I started with kvm/master from
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=master so that
> lockdep didn't find other things to complain about first. I then:
>
> • Dropped the last two commits, putting us back to using kvm->lock and
>   removing the dummy mutex lock that would have told lockdep that it's
>   a (different) problem.
>
> • I then added Boqun's three commits
>
> • Reverted a79b53aa so that the synchronize_srcu() deadlock returns
>
> • Couldn't reproduce with xen_shinfo_test, so added Michal's test from
>   https://lore.kernel.org/kvm/[email protected]/
>
> The resulting tree is at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kvm-srcu-lockdep
>
>
> Now when I run tools/testing/selftests/kvm/x86_64/deadlocks_test I do
> see lockdep complain about it (shown below; I have a cosmetic
> nit/request to make). If I restore the evtchn_reset fix then it also
> complains about kvm_xen_set_evtchn() vs. kvm_xen_kvm_set_attr(), and if
> I restore the xen_lock fix from the tip of kvm/master then Michal's
> deadlock_test passes and there are no complaints.
>
> So everything seems to be working as it should... *except* for the fact
> that I don't quite understand why xen_shinfo_test didn't trigger the
> warning. Michal, I guess you already worked that out when you came up
> with your deadlock-test instead... is there something we should add to
> xen_shinfo_test that would mean it *would* have triggered? I even tried
> this:
>
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>  	if (init_srcu_struct(&kvm->irq_srcu))
>  		goto out_err_no_irq_srcu;
>  
> +#ifdef CONFIG_LOCKDEP
> +	/*
> +	 * Ensure lockdep knows that it's not permitted to lock kvm->lock
> +	 * from a SRCU read section on kvm->srcu.
> +	 */
> +	mutex_lock(&kvm->lock);
> +	synchronize_srcu(&kvm->srcu);
> +	mutex_unlock(&kvm->lock);
> +#endif
> +
>  	refcount_set(&kvm->users_count, 1);
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>  		for (j = 0; j < 2; j++) {
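The intent of the hunk above can be sketched as a priming step (again a toy model, on the assumption that lockdep records a kvm->lock -> kvm->srcu dependency from the `synchronize_srcu()` call at VM creation): once that edge exists, any later attempt to take kvm->lock inside a kvm->srcu read section completes the cycle, whether or not the test workload ever reaches a real synchronize path:

```python
# Toy sketch of "priming" a lock-dependency edge at VM creation time.
# An illustrative simplification, not lockdep's implementation.
edges = set()

def acquire(held, new):
    """Record held -> new; return False if the reverse edge already
    exists, i.e. this acquisition would close a cycle."""
    edges.add((held, new))
    return (new, held) not in edges

# kvm_create_vm() primes the kvm->lock -> kvm->srcu (sync) dependency:
primed_ok = acquire("kvm->lock", "kvm->srcu")
# Any later srcu read section that takes kvm->lock now closes the cycle:
runtime_ok = acquire("kvm->srcu", "kvm->lock")
print(primed_ok, runtime_ok)  # True False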
>
>
>
>
> > [ 845.474169] ======================================================
> > [ 845.474170] WARNING: possible circular locking dependency detected
> > [ 845.474172] 6.2.0-rc3+ #1025 Tainted: G E
> > [ 845.474175] ------------------------------------------------------
> > [ 845.474176] deadlocks_test/22767 is trying to acquire lock:
> > [ 845.474178] ffffc9000ba4b868 (&kvm->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x170
> > [ 845.474192]
> > but task is already holding lock:
> > [ 845.474194] ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> > [ 845.474319]
> > which lock already depends on the new lock.
>
> So the above part is good, and clearly tells me it was synchronize_srcu()
>
> > [ 845.474320]
> > the existing dependency chain (in reverse order) is:
> > [ 845.474322]
> > -> #1 (&kvm->lock){+.+.}-{3:3}:
> > [ 845.474327] __lock_acquire+0x4b4/0x940
> > [ 845.474333] lock_acquire.part.0+0xa8/0x210
> > [ 845.474337] __mutex_lock+0x94/0x920
> > [ 845.474344] kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
> > [ 845.474437] kvm_xen_inject_timer_irqs+0x79/0xa0 [kvm]
> > [ 845.474529] vcpu_run+0x20c/0x450 [kvm]
> > [ 845.474618] kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
> > [ 845.474707] kvm_vcpu_ioctl+0x279/0x700 [kvm]
> > [ 845.474783] __x64_sys_ioctl+0x8a/0xc0
> > [ 845.474787] do_syscall_64+0x3b/0x90
> > [ 845.474796] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 845.474804]
> > -> #0 (&kvm->srcu){.+.+}-{0:0}:
> > [ 845.474809] check_prev_add+0x8f/0xc20
> > [ 845.474812] validate_chain+0x3ba/0x450
> > [ 845.474814] __lock_acquire+0x4b4/0x940
> > [ 845.474817] lock_sync+0x99/0x110
> > [ 845.474820] __synchronize_srcu+0x4d/0x170
> > [ 845.474824] kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> > [ 845.474907] kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> > [ 845.474997] kvm_vm_ioctl+0x5ca/0x800 [kvm]
> > [ 845.475075] __x64_sys_ioctl+0x8a/0xc0
> > [ 845.475079] do_syscall_64+0x3b/0x90
> > [ 845.475084] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 845.475089]
> > other info that might help us debug this:
> >
> > [ 845.475091] Possible unsafe locking scenario:
> >
> > [ 845.475092]        CPU0                    CPU1
> > [ 845.475093]        ----                    ----
> > [ 845.475095]   lock(&kvm->lock);
> > [ 845.475098]                                lock(&kvm->srcu);
> > [ 845.475101]                                lock(&kvm->lock);
> > [ 845.475103]   lock(&kvm->srcu);
> > [ 845.475106]
> > *** DEADLOCK ***
>
> But is there any chance the above could say 'synchronize_srcu' and
> 'srcu_read_lock' in the appropriate places?
>

That requires some non-trivial rework of the locking-scenario printing;
it's on my TODO list...

That said, we can improve the "CPU0" column, since at print time we
have all the information for these two locks. I've done a POC at:

https://lore.kernel.org/lkml/[email protected]

, which should improve the printout a little. For example, the above
scenario will now be shown as:

[..]        CPU0                    CPU1
[..]        ----                    ----
[..]   lock(&kvm->lock);
[..]                                lock(&kvm->srcu);
[..]                                lock(&kvm->lock);
[..]   sync(&kvm->srcu);
[..]

Regards,
Boqun

> > [ 845.475108] 1 lock held by deadlocks_test/22767:
> > [ 845.475110] #0: ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> > [ 845.475200]
> > stack backtrace:
> > [ 845.475202] CPU: 10 PID: 22767 Comm: deadlocks_test Tainted: G E 6.2.0-rc3+ #1025
> > [ 845.475206] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> > [ 845.475208] Call Trace:
> > [ 845.475210] <TASK>
> > [ 845.475214] dump_stack_lvl+0x56/0x73
> > [ 845.475221] check_noncircular+0x102/0x120
> > [ 845.475229] ? check_noncircular+0x7f/0x120
> > [ 845.475236] check_prev_add+0x8f/0xc20
> > [ 845.475239] ? add_chain_cache+0x10b/0x2d0
> > [ 845.475244] validate_chain+0x3ba/0x450
> > [ 845.475249] __lock_acquire+0x4b4/0x940
> > [ 845.475253] ? __synchronize_srcu+0x5/0x170
> > [ 845.475258] lock_sync+0x99/0x110
> > [ 845.475261] ? __synchronize_srcu+0x5/0x170
> > [ 845.475265] __synchronize_srcu+0x4d/0x170
> > [ 845.475269] ? mark_held_locks+0x49/0x80
> > [ 845.475272] ? _raw_spin_unlock_irqrestore+0x2d/0x60
> > [ 845.475278] ? __pfx_read_tsc+0x10/0x10
> > [ 845.475286] ? ktime_get_mono_fast_ns+0x3d/0x90
> > [ 845.475292] kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> > [ 845.475380] kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> > [ 845.475472] ? __lock_acquire+0x4b4/0x940
> > [ 845.475485] kvm_vm_ioctl+0x5ca/0x800 [kvm]
> > [ 845.475566] ? lockdep_unregister_key+0x76/0x110
> > [ 845.475575] __x64_sys_ioctl+0x8a/0xc0
> > [ 845.475579] do_syscall_64+0x3b/0x90
> > [ 845.475586] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 845.475591] RIP: 0033:0x7f79de23fd1b
> > [ 845.475595] Code: 73 01 c3 48 8b 0d 05 a1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a0 1b 00 f7 d8 64 89 01 48
> > [ 845.475598] RSP: 002b:00007f79ddff7c98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [ 845.475602] RAX: ffffffffffffffda RBX: 00007f79ddff8640 RCX: 00007f79de23fd1b
> > [ 845.475605] RDX: 00007f79ddff7ca0 RSI: 000000004188aec6 RDI: 0000000000000004
> > [ 845.475607] RBP: 00007f79ddff85c0 R08: 0000000000000000 R09: 00007fffceb1ff2f
> > [ 845.475609] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f79ddff7ca0
> > [ 845.475611] R13: 0000000001c322a0 R14: 00007f79de2a05f0 R15: 0000000000000000
> > [ 845.475617] </TASK>
>
>


2023-01-16 18:06:30

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

On 1/13/23 11:33, David Woodhouse wrote:
>> It's missing an important testcase; if it passes (does not warn), then
>> it should work:
>
> I think it does.

What I'm worried about is a false positive, not a false negative, so I'm
afraid your test may not cover this. I replied in the thread with
Boqun's patches.

Paolo