From: Prasad Pandit <[email protected]>
kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
request for a vcpu even when its 'events->nmi.pending' is zero.
Ex:
qemu_thread_start
kvm_vcpu_thread_fn
qemu_wait_io_event
qemu_wait_io_event_common
process_queued_cpu_work
do_kvm_cpu_synchronize_post_init/_reset
kvm_arch_put_registers
kvm_put_vcpu_events (cpu, level=[2|3])
This leads vCPU threads in QEMU to constantly acquire & release the
global mutex lock, delaying the guest boot due to lock contention.
Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
Signed-off-by: Prasad Pandit <[email protected]>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a3aaa7dafae..468870450b8b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
vcpu->arch.nmi_pending = 0;
atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
- kvm_make_request(KVM_REQ_NMI, vcpu);
+ if (events->nmi.pending)
+ kvm_make_request(KVM_REQ_NMI, vcpu);
}
static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
--
2.43.0
On Wed, 3 Jan 2024 at 13:24, Prasad Pandit <[email protected]> wrote:
> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> request for a vcpu even when its 'events->nmi.pending' is zero.
> Ex:
> qemu_thread_start
> kvm_vcpu_thread_fn
> qemu_wait_io_event
> qemu_wait_io_event_common
> process_queued_cpu_work
> do_kvm_cpu_synchronize_post_init/_reset
> kvm_arch_put_registers
> kvm_put_vcpu_events (cpu, level=[2|3])
>
> This leads vCPU threads in QEMU to constantly acquire & release the
> global mutex lock, delaying the guest boot due to lock contention.
> Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
>
> Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
> Signed-off-by: Prasad Pandit <[email protected]>
> ---
> arch/x86/kvm/x86.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1a3aaa7dafae..468870450b8b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> vcpu->arch.nmi_pending = 0;
> atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> - kvm_make_request(KVM_REQ_NMI, vcpu);
> + if (events->nmi.pending)
> + kvm_make_request(KVM_REQ_NMI, vcpu);
> }
> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> --
> 2.43.0
Ping...!
---
- Prasad
On Mon, Jan 08, 2024, Prasad Pandit wrote:
> On Wed, 3 Jan 2024 at 13:24, Prasad Pandit <[email protected]> wrote:
> > kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> > request for a vcpu even when its 'events->nmi.pending' is zero.
> > Ex:
> > qemu_thread_start
> > kvm_vcpu_thread_fn
> > qemu_wait_io_event
> > qemu_wait_io_event_common
> > process_queued_cpu_work
> > do_kvm_cpu_synchronize_post_init/_reset
> > kvm_arch_put_registers
> > kvm_put_vcpu_events (cpu, level=[2|3])
> >
> > This leads vCPU threads in QEMU to constantly acquire & release the
> > global mutex lock, delaying the guest boot due to lock contention.
> > Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
> >
> > Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
> > Signed-off-by: Prasad Pandit <[email protected]>
> > ---
> > arch/x86/kvm/x86.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1a3aaa7dafae..468870450b8b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> > if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> > vcpu->arch.nmi_pending = 0;
> > atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> > - kvm_make_request(KVM_REQ_NMI, vcpu);
> > + if (events->nmi.pending)
> > + kvm_make_request(KVM_REQ_NMI, vcpu);
> > }
> > static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> > --
> > 2.43.0
>
> Ping...!
This is on my list of things to grab for 6.8, I'm just waiting for various pull
requests to fully land in order to simplify my branch management.
On Mon, 8 Jan 2024 at 23:08, Sean Christopherson <[email protected]> wrote:
> This is on my list of things to grab for 6.8, I'm just waiting for various pull
> requests to fully land in order to simplify my branch management.
* Okay, cool.
Thank you.
---
- Prasad
On Wed, 03 Jan 2024 13:23:43 +0530, Prasad Pandit wrote:
> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> request for a vcpu even when its 'events->nmi.pending' is zero.
> Ex:
> qemu_thread_start
> kvm_vcpu_thread_fn
> qemu_wait_io_event
> qemu_wait_io_event_common
> process_queued_cpu_work
> do_kvm_cpu_synchronize_post_init/_reset
> kvm_arch_put_registers
> kvm_put_vcpu_events (cpu, level=[2|3])
>
> [...]
Applied to kvm-x86 fixes, thanks!
[1/1] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
https://github.com/kvm-x86/linux/commit/6231c9e1a9f3
--
https://github.com/kvm-x86/linux/tree/next
Hi Prasad,
On 1/2/24 23:53, Prasad Pandit wrote:
> From: Prasad Pandit <[email protected]>
>
> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> request for a vcpu even when its 'events->nmi.pending' is zero.
> Ex:
> qemu_thread_start
> kvm_vcpu_thread_fn
> qemu_wait_io_event
> qemu_wait_io_event_common
> process_queued_cpu_work
> do_kvm_cpu_synchronize_post_init/_reset
> kvm_arch_put_registers
> kvm_put_vcpu_events (cpu, level=[2|3])
>
> This leads vCPU threads in QEMU to constantly acquire & release the
> global mutex lock, delaying the guest boot due to lock contention.
Would you mind sharing where and how the lock contention shows up in QEMU
userspace? That is, how is the QEMU mutex lock impacted by KVM's KVM_REQ_NMI?
Or did you mean line 3031 on the QEMU side?
2858 int kvm_cpu_exec(CPUState *cpu)
2859 {
2860 struct kvm_run *run = cpu->kvm_run;
2861 int ret, run_ret;
.. ...
3023 default:
3024 DPRINTF("kvm_arch_handle_exit\n");
3025 ret = kvm_arch_handle_exit(cpu, run);
3026 break;
3027 }
3028 } while (ret == 0);
3029
3030 cpu_exec_end(cpu);
3031 qemu_mutex_lock_iothread();
3032
3033 if (ret < 0) {
3034 cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
3035 vm_stop(RUN_STATE_INTERNAL_ERROR);
3036 }
3037
3038 qatomic_set(&cpu->exit_request, 0);
3039 return ret;
3040 }
Thank you very much!
Dongli Zhang
> Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
>
> Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
> Signed-off-by: Prasad Pandit <[email protected]>
> ---
> arch/x86/kvm/x86.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1a3aaa7dafae..468870450b8b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> vcpu->arch.nmi_pending = 0;
> atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> - kvm_make_request(KVM_REQ_NMI, vcpu);
> + if (events->nmi.pending)
> + kvm_make_request(KVM_REQ_NMI, vcpu);
> }
> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>
> --
> 2.43.0
>
>
On Tue, Feb 06, 2024, Dongli Zhang wrote:
> Hi Prasad,
>
> On 1/2/24 23:53, Prasad Pandit wrote:
> > From: Prasad Pandit <[email protected]>
> >
> > kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> > request for a vcpu even when its 'events->nmi.pending' is zero.
> > Ex:
> > qemu_thread_start
> > kvm_vcpu_thread_fn
> > qemu_wait_io_event
> > qemu_wait_io_event_common
> > process_queued_cpu_work
> > do_kvm_cpu_synchronize_post_init/_reset
> > kvm_arch_put_registers
> > kvm_put_vcpu_events (cpu, level=[2|3])
> >
> > This leads vCPU threads in QEMU to constantly acquire & release the
> > global mutex lock, delaying the guest boot due to lock contention.
>
> Would you mind sharing where and how the lock contention shows up in QEMU
> userspace? That is, how is the QEMU mutex lock impacted by KVM's KVM_REQ_NMI?
>
> Or did you mean line 3031 on the QEMU side?
Yeah, something like that. Details in this thread.
https://lore.kernel.org/all/CAE8KmOyffXD4k69vRAFwesaqrBGzFY3i+kefbkHcQf4=jNYzOA@mail.gmail.com
> 2858 int kvm_cpu_exec(CPUState *cpu)
> 2859 {
> 2860 struct kvm_run *run = cpu->kvm_run;
> 2861 int ret, run_ret;
> ... ...
> 3023 default:
> 3024 DPRINTF("kvm_arch_handle_exit\n");
> 3025 ret = kvm_arch_handle_exit(cpu, run);
> 3026 break;
> 3027 }
> 3028 } while (ret == 0);
> 3029
> 3030 cpu_exec_end(cpu);
> 3031 qemu_mutex_lock_iothread();
> 3032
> 3033 if (ret < 0) {
> 3034 cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
> 3035 vm_stop(RUN_STATE_INTERNAL_ERROR);
> 3036 }
> 3037
> 3038 qatomic_set(&cpu->exit_request, 0);
> 3039 return ret;
> 3040 }
On 2/6/24 12:58, Sean Christopherson wrote:
> On Tue, Feb 06, 2024, Dongli Zhang wrote:
>> Hi Prasad,
>>
>> On 1/2/24 23:53, Prasad Pandit wrote:
>>> From: Prasad Pandit <[email protected]>
>>>
>>> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
>>> request for a vcpu even when its 'events->nmi.pending' is zero.
>>> Ex:
>>> qemu_thread_start
>>> kvm_vcpu_thread_fn
>>> qemu_wait_io_event
>>> qemu_wait_io_event_common
>>> process_queued_cpu_work
>>> do_kvm_cpu_synchronize_post_init/_reset
>>> kvm_arch_put_registers
>>> kvm_put_vcpu_events (cpu, level=[2|3])
>>>
>>> This leads vCPU threads in QEMU to constantly acquire & release the
>>> global mutex lock, delaying the guest boot due to lock contention.
>>
>> Would you mind sharing where and how the lock contention shows up in QEMU
>> userspace? That is, how is the QEMU mutex lock impacted by KVM's KVM_REQ_NMI?
>>
>> Or did you mean line 3031 on the QEMU side?
>
> Yeah, something like that. Details in this thread.
>
> https://lore.kernel.org/all/CAE8KmOyffXD4k69vRAFwesaqrBGzFY3i+kefbkHcQf4=jNYzOA@mail.gmail.com
Thank you very much for pointing to the discussion. I should have found it :)
Here is my understanding.
1. During VM creation, the mp_state of an AP (non-BSP) stays
KVM_MP_STATE_UNINITIALIZED until INIT/SIPI.
2. Ideally, the AP should block in kvm_vcpu_block() below, i.e. the condition at
line 3775 should always be false.
3760 bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
3761 {
3762 struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
3763 bool waited = false;
3764
3765 vcpu->stat.generic.blocking = 1;
3766
3767 preempt_disable();
3768 kvm_arch_vcpu_blocking(vcpu);
3769 prepare_to_rcuwait(wait);
3770 preempt_enable();
3771
3772 for (;;) {
3773 set_current_state(TASK_INTERRUPTIBLE);
3774
3775 if (kvm_vcpu_check_block(vcpu) < 0)
3776 break;
3777
3778 waited = true;
3779 schedule();
3780 }
3. Unfortunately, the issue leaves KVM_REQ_NMI set for the AP.
4. This causes kvm_vcpu_check_block() to return, via the call chain below (a
condensed sketch follows the chain).
kvm_arch_vcpu_ioctl_run()
-> kvm_vcpu_block()
-> kvm_vcpu_check_block()
-> kvm_arch_vcpu_runnable()
-> kvm_vcpu_has_events()
-> kvm_test_request(KVM_REQ_NMI, vcpu)
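A condensed sketch of that chain (simplified, not the exact upstream code; the
sketch_* names are mine): a stale KVM_REQ_NMI by itself is enough to make the
blocked AP look runnable.

/* Sketch only: the real helpers check many more event sources. */
static bool sketch_vcpu_has_events(struct kvm_vcpu *vcpu)
{
	/* A set KVM_REQ_NMI alone reports the vCPU as runnable. */
	if (kvm_test_request(KVM_REQ_NMI, vcpu))
		return true;
	/* ... pending exceptions, interrupts, SMIs, timers, etc. ... */
	return false;
}

static int sketch_vcpu_check_block(struct kvm_vcpu *vcpu)
{
	if (kvm_arch_vcpu_runnable(vcpu))	/* reaches the check above */
		return -EINTR;			/* breaks the loop at line 3775 */
	return 0;
}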
5. kvm_arch_vcpu_ioctl_run() then returns to QEMU with -EAGAIN.
6. The QEMU side can do nothing with -EAGAIN except fall through to line 2984 and
return. On the way out it acquires the global mutex at line 2976 (and releases it
before entering the guest again). KVM_REQ_NMI is never cleared until INIT/SIPI.
2808 int kvm_cpu_exec(CPUState *cpu)
2809 {
.. ...
2868 if (run_ret < 0) {
2869 if (run_ret == -EINTR || run_ret == -EAGAIN) {
2870 trace_kvm_io_window_exit();
2871 kvm_eat_signals(cpu);
2872 ret = EXCP_INTERRUPT;
2873 break;
2874 }
.. ...
2973 } while (ret == 0);
2974
2975 cpu_exec_end(cpu);
2976 bql_lock();
2977
2978 if (ret < 0) {
2979 cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
2980 vm_stop(RUN_STATE_INTERNAL_ERROR);
2981 }
2982
2983 qatomic_set(&cpu->exit_request, 0);
2984 return ret;
2985 }
7. The QEMU AP vCPU thread enters KVM_RUN again, and the flow repeats from step 4
over and over.
The lock is therefore acquired and released constantly, and vCPU 0 suffers from the
contention, especially when the number of APs is large!
I guess it is not an issue after a VM reboot (without re-creating the QEMU
instance), because the mp_state is no longer KVM_MP_STATE_UNINITIALIZED.
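For what it is worth, the trigger is visible even outside QEMU. Below is a minimal
stand-alone sketch (a hypothetical reproducer, not QEMU code, with error handling
trimmed) that issues a KVM_SET_VCPU_EVENTS call analogous to what QEMU's
kvm_put_vcpu_events() ends up doing for a not-yet-started AP, i.e. with
nmi.pending == 0:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	struct kvm_vcpu_events events;

	if (kvm < 0 || vm < 0 || vcpu < 0) {
		perror("kvm setup");
		return 1;
	}

	memset(&events, 0, sizeof(events));
	/* Mark the NMI state as valid while nmi.pending is 0; before the
	 * fix this still raised KVM_REQ_NMI on the vCPU. */
	events.flags = KVM_VCPUEVENT_VALID_NMI_PENDING;
	events.nmi.pending = 0;

	if (ioctl(vcpu, KVM_SET_VCPU_EVENTS, &events) < 0)
		perror("KVM_SET_VCPU_EVENTS");
	return 0;
}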
Thank you very much!
Dongli Zhang
>
>> 2858 int kvm_cpu_exec(CPUState *cpu)
>> 2859 {
>> 2860 struct kvm_run *run = cpu->kvm_run;
>> 2861 int ret, run_ret;
>> ... ...
>> 3023 default:
>> 3024 DPRINTF("kvm_arch_handle_exit\n");
>> 3025 ret = kvm_arch_handle_exit(cpu, run);
>> 3026 break;
>> 3027 }
>> 3028 } while (ret == 0);
>> 3029
>> 3030 cpu_exec_end(cpu);
>> 3031 qemu_mutex_lock_iothread();
>> 3032
>> 3033 if (ret < 0) {
>> 3034 cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
>> 3035 vm_stop(RUN_STATE_INTERNAL_ERROR);
>> 3036 }
>> 3037
>> 3038 qatomic_set(&cpu->exit_request, 0);
>> 3039 return ret;
>> 3040 }