2024-04-26 04:16:39

by Yi Wang

Subject: [v4 RESEND 0/3] KVM: irqchip: synchronize srcu only if needed

From: Yi Wang <[email protected]>

We found that enabling KVM_CAP_SPLIT_IRQCHIP can occasionally take more
than 20 milliseconds on a host that already runs many VMs.

The reason is that when the VMM (QEMU/Cloud Hypervisor) enables
KVM_CAP_SPLIT_IRQCHIP, KVM calls synchronize_srcu_expedited(), and
might_sleep() and the SRCU kworker can add delay during this period.
One approach that makes sense is to set up empty irq routing when
creating the VM, so that x86/s390 don't need to set up empty/dummy irq
routing later.

Note: I have no s390 machine so this patch has not been tested
thoroughly on s390 platform. Thanks to Christian for a quick test on
s390 and it still seems to work[1].

Changelog:
----------
v4:
- replace loop with memset when setting up the empty irq routing table.

v3:
- squash the empty routing setup function and its use into one commit
- drop the comment in s390 part

v2:
- set up empty irq routing in kvm_create_vm
- don't set up irq routing in x86 KVM_CAP_SPLIT_IRQCHIP
- don't set up irq routing in s390 KVM_CREATE_IRQCHIP

v1:
https://lore.kernel.org/kvm/[email protected]/

1. https://lore.kernel.org/lkml/[email protected]/


Yi Wang (3):
KVM: setup empty irq routing when create vm
KVM: x86: don't setup empty irq routing when KVM_CAP_SPLIT_IRQCHIP
KVM: s390: don't setup dummy routing when KVM_CREATE_IRQCHIP

arch/s390/kvm/kvm-s390.c | 9 +--------
arch/x86/kvm/irq.h | 1 -
arch/x86/kvm/irq_comm.c | 5 -----
arch/x86/kvm/x86.c | 3 ---
include/linux/kvm_host.h | 1 +
virt/kvm/irqchip.c | 19 +++++++++++++++++++
virt/kvm/kvm_main.c | 4 ++++
7 files changed, 25 insertions(+), 17 deletions(-)

--
2.39.3



2024-04-26 04:17:12

by Yi Wang

Subject: [v4 RESEND 1/3] KVM: setup empty irq routing when create vm

From: Yi Wang <[email protected]>

Add a new function to set up empty irq routing in common KVM code, which
can be invoked from non-architecture-specific functions. The difference
compared to kvm_setup_empty_irq_routing() is that this function only
allocates the empty irq routing table and does not need to synchronize
SRCU, as we will call it in kvm_create_vm().

Using the newly added function, we can set up empty irq routing in
kvm_create_vm(), so that x86 and s390 no longer need to set
empty/dummy irq routing when creating an IRQCHIP, which avoids a
synchronize_srcu() call.

Signed-off-by: Yi Wang <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/irqchip.c | 19 +++++++++++++++++++
virt/kvm/kvm_main.c | 4 ++++
3 files changed, 24 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 48f31dcd318a..9256539139ef 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2100,6 +2100,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			const struct kvm_irq_routing_entry *entries,
 			unsigned nr,
 			unsigned flags);
+int kvm_setup_empty_irq_routing_lockless(struct kvm *kvm);
 int kvm_set_routing_entry(struct kvm *kvm,
 			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue);
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 1e567d1f6d3d..266bab99a8a8 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -237,3 +237,22 @@ int kvm_set_irq_routing(struct kvm *kvm,

 	return r;
 }
+
+int kvm_setup_empty_irq_routing_lockless(struct kvm *kvm)
+{
+	struct kvm_irq_routing_table *new;
+	int chip_size;
+
+	new = kzalloc(struct_size(new, map, 1), GFP_KERNEL_ACCOUNT);
+	if (!new)
+		return -ENOMEM;
+
+	new->nr_rt_entries = 1;
+
+	chip_size = sizeof(int) * KVM_NR_IRQCHIPS * KVM_IRQCHIP_NUM_PINS;
+	memset(new->chip, -1, chip_size);
+
+	RCU_INIT_POINTER(kvm->irq_routing, new);
+
+	return 0;
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff0a20565f90..b5f4fa9d050d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1285,6 +1285,10 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (r)
 		goto out_err;

+	r = kvm_setup_empty_irq_routing_lockless(kvm);
+	if (r)
+		goto out_err;
+
 	mutex_lock(&kvm_lock);
 	list_add(&kvm->vm_list, &vm_list);
 	mutex_unlock(&kvm_lock);
--
2.39.3


2024-04-26 04:17:13

by Yi Wang

Subject: [v4 RESEND 2/3] KVM: x86: don't setup empty irq routing when KVM_CAP_SPLIT_IRQCHIP

From: Yi Wang <[email protected]>

We found that enabling KVM_CAP_SPLIT_IRQCHIP can occasionally take more
than 20 milliseconds on a host that already runs many VMs.

The reason is that when the VMM (QEMU/Cloud Hypervisor) enables
KVM_CAP_SPLIT_IRQCHIP, KVM calls synchronize_srcu_expedited(), and
might_sleep() and the SRCU kworker can add delay during this period.

Since empty irq routing is now set up when the VM is created, this call
is no longer needed.

Signed-off-by: Yi Wang <[email protected]>
---
arch/x86/kvm/irq.h | 1 -
arch/x86/kvm/irq_comm.c | 5 -----
arch/x86/kvm/x86.c | 3 ---
3 files changed, 9 deletions(-)

diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index c2d7cfe82d00..76d46b2f41dd 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -106,7 +106,6 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 int apic_has_pending_timer(struct kvm_vcpu *vcpu);

 int kvm_setup_default_irq_routing(struct kvm *kvm);
-int kvm_setup_empty_irq_routing(struct kvm *kvm);
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 			     struct kvm_lapic_irq *irq,
 			     struct dest_map *dest_map);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 68f3f6c26046..6ee7ca39466e 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -397,11 +397,6 @@ int kvm_setup_default_irq_routing(struct kvm *kvm)

 static const struct kvm_irq_routing_entry empty_routing[] = {};

-int kvm_setup_empty_irq_routing(struct kvm *kvm)
-{
-	return kvm_set_irq_routing(kvm, empty_routing, 0, 0);
-}
-
 void kvm_arch_post_irq_routing_update(struct kvm *kvm)
 {
 	if (!irqchip_split(kvm))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91478b769af0..01270182757b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6527,9 +6527,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			goto split_irqchip_unlock;
 		if (kvm->created_vcpus)
 			goto split_irqchip_unlock;
-		r = kvm_setup_empty_irq_routing(kvm);
-		if (r)
-			goto split_irqchip_unlock;
 		/* Pairs with irqchip_in_kernel. */
 		smp_wmb();
 		kvm->arch.irqchip_mode = KVM_IRQCHIP_SPLIT;
--
2.39.3


2024-04-26 04:17:32

by Yi Wang

Subject: [v4 RESEND 3/3] KVM: s390: don't setup dummy routing when KVM_CREATE_IRQCHIP

From: Yi Wang <[email protected]>

Since empty irq routing is now set up in kvm_create_vm(), there is
no need to set up dummy routing in KVM_CREATE_IRQCHIP.

Signed-off-by: Yi Wang <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5147b943a864..ba7fd39bcbf4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2998,14 +2998,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		break;
 	}
 	case KVM_CREATE_IRQCHIP: {
-		struct kvm_irq_routing_entry routing;
-
-		r = -EINVAL;
-		if (kvm->arch.use_irqchip) {
-			/* Set up dummy routing. */
-			memset(&routing, 0, sizeof(routing));
-			r = kvm_set_irq_routing(kvm, &routing, 0, 0);
-		}
+		r = 0;
 		break;
 	}
 	case KVM_SET_DEVICE_ATTR: {
--
2.39.3


2024-05-03 22:09:41

by Sean Christopherson

Subject: Re: [v4 RESEND 1/3] KVM: setup empty irq routing when create vm

On Fri, Apr 26, 2024, Yi Wang wrote:
> From: Yi Wang <[email protected]>
>
> Add a new function to setup empty irq routing in kvm path, which
> can be invoded in non-architecture-specific functions. The difference
> compared to the kvm_setup_empty_irq_routing() is this function just
> alloc the empty irq routing and does not need synchronize srcu, as
> we will call it in kvm_create_vm().
>
> Using the new adding function, we can setup empty irq routing when
> kvm_create_vm(), so that x86 and s390 no longer need to set
> empty/dummy irq routing when creating an IRQCHIP 'cause it avoid
> an synchronize_srcu.
>
> Signed-off-by: Yi Wang <[email protected]>
> ---
> include/linux/kvm_host.h | 1 +
> virt/kvm/irqchip.c | 19 +++++++++++++++++++
> virt/kvm/kvm_main.c | 4 ++++
> 3 files changed, 24 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 48f31dcd318a..9256539139ef 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2100,6 +2100,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
> const struct kvm_irq_routing_entry *entries,
> unsigned nr,
> unsigned flags);
> +int kvm_setup_empty_irq_routing_lockless(struct kvm *kvm);

This is in an #ifdef, the #else needs a stub (for MIPS).
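
A minimal user-space sketch of the stub pattern being asked for, under stated assumptions: DEMO_HAVE_IRQ_ROUTING is a hypothetical stand-in for the real config guard, and the stub_* names are made up for illustration. The point is that common code can call the helper unconditionally, and builds without routing support get a no-op that reports success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VM object; only tracks whether routing was initialized. */
struct stub_vm {
	int routing_initialized;
};

#ifdef DEMO_HAVE_IRQ_ROUTING
/* Routing support built in: do the real initialization. */
static inline int stub_init_irq_routing(struct stub_vm *vm)
{
	vm->routing_initialized = 1;
	return 0;
}
#else
/* No routing support: succeed without doing anything. */
static inline int stub_init_irq_routing(struct stub_vm *vm)
{
	(void)vm;
	return 0;
}
#endif

/* Common code calls the helper with no #ifdef at the call site. */
int stub_create_vm(struct stub_vm *vm)
{
	assert(vm != NULL);
	vm->routing_initialized = 0;
	return stub_init_irq_routing(vm);
}
```

With DEMO_HAVE_IRQ_ROUTING undefined, stub_create_vm() returns 0 and leaves routing_initialized at 0.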

> int kvm_set_routing_entry(struct kvm *kvm,
> struct kvm_kernel_irq_routing_entry *e,
> const struct kvm_irq_routing_entry *ue);
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index 1e567d1f6d3d..266bab99a8a8 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -237,3 +237,22 @@ int kvm_set_irq_routing(struct kvm *kvm,
>
> return r;
> }
> +
> +int kvm_setup_empty_irq_routing_lockless(struct kvm *kvm)

I vote for kvm_init_irq_routing() to make it more obvious that the API is purely
for initializing the routing, i.e. only to be used at VM creation. Then the
"lockless" tag is largely redundant. And then maybe add a comment about how this
creates an empty routing table? Because every time I look at this code, it takes
me a few seconds to remember how this is actually an empty table.
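
A minimal user-space sketch of why this layout counts as an empty table, assuming a simplified model of the routing table: the model_* names, the MODEL_* sizes, and the singly linked list standing in for the kernel's hlist are all hypothetical. One list head is allocated and left empty, so a lookup for GSI 0 (or any other GSI) finds no routes, and filling chip[][] with 0xff bytes makes every pin read back as -1, i.e. unmapped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sizes only; not taken from any kernel configuration. */
#define MODEL_NR_IRQCHIPS 3
#define MODEL_NUM_PINS    24

/* Stand-in for one routing entry on a per-GSI list. */
struct model_route {
	struct model_route *next;
};

/* Simplified model of the routing table: pin map plus per-GSI lists. */
struct model_routing_table {
	int chip[MODEL_NR_IRQCHIPS][MODEL_NUM_PINS];
	unsigned int nr_rt_entries;
	struct model_route *map[];	/* nr_rt_entries list heads */
};

/* Build the "empty" table: one list head, no routes, all pins unmapped. */
struct model_routing_table *model_init_irq_routing(void)
{
	struct model_routing_table *t;

	t = calloc(1, sizeof(*t) + 1 * sizeof(t->map[0]));
	if (!t)
		return NULL;

	t->nr_rt_entries = 1;
	t->map[0] = NULL;

	/* Every byte becomes 0xff, so each int reads back as -1. */
	memset(t->chip, -1, sizeof(t->chip));
	assert(t->chip[0][0] == -1);

	return t;
}

/* Count routes for a GSI; an out-of-range GSI has none. */
int model_count_routes(const struct model_routing_table *t, unsigned int gsi)
{
	const struct model_route *r;
	int n = 0;

	if (gsi >= t->nr_rt_entries)
		return 0;

	for (r = t->map[gsi]; r; r = r->next)
		n++;

	return n;
}
```

In this model, model_count_routes() returns 0 for every GSI on a freshly initialized table, which is the sense in which a table with nr_rt_entries == 1 is still empty.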

> +{
> + struct kvm_irq_routing_table *new;
> + int chip_size;
> +
> + new = kzalloc(struct_size(new, map, 1), GFP_KERNEL_ACCOUNT);
> + if (!new)
> + return -ENOMEM;
> +
> + new->nr_rt_entries = 1;
> +
> + chip_size = sizeof(int) * KVM_NR_IRQCHIPS * KVM_IRQCHIP_NUM_PINS;
> + memset(new->chip, -1, chip_size);
> +
> + RCU_INIT_POINTER(kvm->irq_routing, new);
> +
> + return 0;
> +}
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff0a20565f90..b5f4fa9d050d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1285,6 +1285,10 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> if (r)
> goto out_err;
>
> + r = kvm_setup_empty_irq_routing_lockless(kvm);
> + if (r)
> + goto out_err;

This is too late. It might not matter in practice, but the call before this is
to kvm_arch_post_init_vm(), which quite strongly suggests that *all* common setup
is done before that arch hook is invoked.

Calling this right before kvm_arch_init_vm() seems like the best/easiest fit, e.g.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2e388972d856..ab607441686f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1197,9 +1197,13 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 		rcu_assign_pointer(kvm->buses[i],
 			kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
 		if (!kvm->buses[i])
-			goto out_err_no_arch_destroy_vm;
+			goto out_err_no_irq_routing;
 	}

+	r = kvm_init_irq_routing(kvm);
+	if (r)
+		goto out_err_no_irq_routing;
+
 	r = kvm_arch_init_vm(kvm, type);
 	if (r)
 		goto out_err_no_arch_destroy_vm;
@@ -1254,6 +1258,8 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm_get_bus(kvm, i));
+	kvm_free_irq_routing(kvm);
+out_err_no_irq_routing:
 	cleanup_srcu_struct(&kvm->irq_srcu);
 out_err_no_irq_srcu:
 	cleanup_srcu_struct(&kvm->srcu);
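
The label placement follows the usual goto-unwind idiom: each failure jumps to the label that undoes only what has already succeeded, and the labels fall through in reverse order of setup. A minimal user-space sketch of that idiom, assuming hypothetical unwind_* names and a string log standing in for the real cleanup calls (it models only the routing and arch-init steps, not the full kvm_create_vm() sequence):

```c
#include <assert.h>
#include <string.h>

/* Which setup step should fail; UNWIND_NONE means everything succeeds. */
enum unwind_step { UNWIND_NONE = -1, UNWIND_SRCU, UNWIND_ROUTING, UNWIND_ARCH };

/* Records the cleanup actions that ran, in order. */
struct unwind_log {
	char buf[64];
};

static void unwind_note(struct unwind_log *log, const char *what)
{
	assert(strlen(log->buf) + strlen(what) < sizeof(log->buf));
	strcat(log->buf, what);
}

/* Returns 0 on success, -1 on failure after unwinding completed steps. */
int unwind_create_vm(enum unwind_step fail_at, struct unwind_log *log)
{
	log->buf[0] = '\0';

	if (fail_at == UNWIND_SRCU)		/* nothing set up yet */
		goto out_err_no_irq_srcu;
	if (fail_at == UNWIND_ROUTING)		/* srcu is set up */
		goto out_err_no_irq_routing;
	if (fail_at == UNWIND_ARCH)		/* srcu and routing are set up */
		goto out_err_no_arch_destroy_vm;
	return 0;

out_err_no_arch_destroy_vm:
	unwind_note(log, "free_irq_routing;");
out_err_no_irq_routing:
	unwind_note(log, "cleanup_irq_srcu;");
out_err_no_irq_srcu:
	return -1;
}
```

A failure in the routing step skips free_irq_routing (there is nothing to free yet) but still cleans up the SRCU state, while a failure in the later arch step runs both.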


2024-05-05 04:57:10

by Yi Wang

Subject: Re: [v4 RESEND 1/3] KVM: setup empty irq routing when create vm

Hi Sean,

Thanks so much for your patient reply.

On Sat, May 4, 2024 at 6:08 AM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Apr 26, 2024, Yi Wang wrote:
> > From: Yi Wang <[email protected]>
> >
> > Add a new function to setup empty irq routing in kvm path, which
> > can be invoded in non-architecture-specific functions. The difference
> > compared to the kvm_setup_empty_irq_routing() is this function just
> > alloc the empty irq routing and does not need synchronize srcu, as
> > we will call it in kvm_create_vm().
> >
> > Using the new adding function, we can setup empty irq routing when
> > kvm_create_vm(), so that x86 and s390 no longer need to set
> > empty/dummy irq routing when creating an IRQCHIP 'cause it avoid
> > an synchronize_srcu.
> >
> > Signed-off-by: Yi Wang <[email protected]>
> > ---
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/irqchip.c | 19 +++++++++++++++++++
> > virt/kvm/kvm_main.c | 4 ++++
> > 3 files changed, 24 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 48f31dcd318a..9256539139ef 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2100,6 +2100,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
> > const struct kvm_irq_routing_entry *entries,
> > unsigned nr,
> > unsigned flags);
> > +int kvm_setup_empty_irq_routing_lockless(struct kvm *kvm);
>
> This is in an #ifdef, the #else needs a stub (for MIPS).

Okay, I'll update the patch.

>
> > int kvm_set_routing_entry(struct kvm *kvm,
> > struct kvm_kernel_irq_routing_entry *e,
> > const struct kvm_irq_routing_entry *ue);
> > diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> > index 1e567d1f6d3d..266bab99a8a8 100644
> > --- a/virt/kvm/irqchip.c
> > +++ b/virt/kvm/irqchip.c
> > @@ -237,3 +237,22 @@ int kvm_set_irq_routing(struct kvm *kvm,
> >
> > return r;
> > }
> > +
> > +int kvm_setup_empty_irq_routing_lockless(struct kvm *kvm)
>
> I vote for kvm_init_irq_routing() to make it more obvious that the API is purely
> for initializing the routing, i.e. only to be used at VM creation. Then the
> "lockless" tag is largely redundant. And then maybe add a comment about how this
> creates an empty routing table? Because every time I look at this code, it takes
> me a few seconds to remember how this is actually an empty table.

That sounds reasonable to me.

>
> > +{
> > + struct kvm_irq_routing_table *new;
> > + int chip_size;
> > +
> > + new = kzalloc(struct_size(new, map, 1), GFP_KERNEL_ACCOUNT);
> > + if (!new)
> > + return -ENOMEM;
> > +
> > + new->nr_rt_entries = 1;
> > +
> > + chip_size = sizeof(int) * KVM_NR_IRQCHIPS * KVM_IRQCHIP_NUM_PINS;
> > + memset(new->chip, -1, chip_size);
> > +
> > + RCU_INIT_POINTER(kvm->irq_routing, new);
> > +
> > + return 0;
> > +}
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index ff0a20565f90..b5f4fa9d050d 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1285,6 +1285,10 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> > if (r)
> > goto out_err;
> >
> > + r = kvm_setup_empty_irq_routing_lockless(kvm);
> > + if (r)
> > + goto out_err;
>
> This is too late. It might not matter in practice, but the call before this is
> to kvm_arch_post_init_vm(), which quite strongly suggests that *all* common setup
> is done before that arch hook is invoked.
>
> Calling this right before kvm_arch_init_vm() seems like the best/easiest fit, e.g.

Got it. I will update the patch ASAP, after running some tests first.

Thanks again, Sean.

>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2e388972d856..ab607441686f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1197,9 +1197,13 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> rcu_assign_pointer(kvm->buses[i],
> kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
> if (!kvm->buses[i])
> - goto out_err_no_arch_destroy_vm;
> + goto out_err_no_irq_routing;
> }
>
> + r = kvm_init_irq_routing(kvm);
> + if (r)
> + goto out_err_no_irq_routing;
> +
> r = kvm_arch_init_vm(kvm, type);
> if (r)
> goto out_err_no_arch_destroy_vm;
> @@ -1254,6 +1258,8 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
> for (i = 0; i < KVM_NR_BUSES; i++)
> kfree(kvm_get_bus(kvm, i));
> + kvm_free_irq_routing(kvm);
> +out_err_no_irq_routing:
> cleanup_srcu_struct(&kvm->irq_srcu);
> out_err_no_irq_srcu:
> cleanup_srcu_struct(&kvm->srcu);
>


--
---
Best wishes
Yi Wang