2017-08-11 20:12:09

by Denys Vlasenko

Subject: [PATCH] KVM: SVM: delete avic_vm_id_bitmap (2 megabyte static array)

With lightly tweaked defconfig:

text data bss dec hex filename
11259661 5109408 2981888 19350957 12745ad vmlinux.before
11259661 5109408 884736 17253805 10745ad vmlinux.after

Only compile-tested.

Signed-off-by: Denys Vlasenko <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kvm/svm.c | 61 +++++++++++++++++++-----------------------------------
1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1107626..f575089 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -280,10 +280,6 @@ module_param(avic, int, S_IRUGO);
static int vls = true;
module_param(vls, int, 0444);

-/* AVIC VM ID bit masks and lock */
-static DECLARE_BITMAP(avic_vm_id_bitmap, AVIC_VM_ID_NR);
-static DEFINE_SPINLOCK(avic_vm_id_lock);
-
static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
static void svm_flush_tlb(struct kvm_vcpu *vcpu);
static void svm_complete_interrupts(struct vcpu_svm *svm);
@@ -989,6 +985,8 @@ static void disable_nmi_singlestep(struct vcpu_svm *svm)
*/
#define SVM_VM_DATA_HASH_BITS 8
static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
+static u32 next_vm_id = 0;
+static bool next_vm_id_wrapped = 0;
static DEFINE_SPINLOCK(svm_vm_data_hash_lock);

/* Note:
@@ -1387,34 +1385,6 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
return 0;
}

-static inline int avic_get_next_vm_id(void)
-{
- int id;
-
- spin_lock(&avic_vm_id_lock);
-
- /* AVIC VM ID is one-based. */
- id = find_next_zero_bit(avic_vm_id_bitmap, AVIC_VM_ID_NR, 1);
- if (id <= AVIC_VM_ID_MASK)
- __set_bit(id, avic_vm_id_bitmap);
- else
- id = -EAGAIN;
-
- spin_unlock(&avic_vm_id_lock);
- return id;
-}
-
-static inline int avic_free_vm_id(int id)
-{
- if (id <= 0 || id > AVIC_VM_ID_MASK)
- return -EINVAL;
-
- spin_lock(&avic_vm_id_lock);
- __clear_bit(id, avic_vm_id_bitmap);
- spin_unlock(&avic_vm_id_lock);
- return 0;
-}
-
static void avic_vm_destroy(struct kvm *kvm)
{
unsigned long flags;
@@ -1423,8 +1393,6 @@ static void avic_vm_destroy(struct kvm *kvm)
if (!avic)
return;

- avic_free_vm_id(vm_data->avic_vm_id);
-
if (vm_data->avic_logical_id_table_page)
__free_page(vm_data->avic_logical_id_table_page);
if (vm_data->avic_physical_id_table_page)
@@ -1438,19 +1406,16 @@ static void avic_vm_destroy(struct kvm *kvm)
static int avic_vm_init(struct kvm *kvm)
{
unsigned long flags;
- int vm_id, err = -ENOMEM;
+ int err = -ENOMEM;
struct kvm_arch *vm_data = &kvm->arch;
struct page *p_page;
struct page *l_page;
+ struct kvm_arch *ka;
+ u32 vm_id;

if (!avic)
return 0;

- vm_id = avic_get_next_vm_id();
- if (vm_id < 0)
- return vm_id;
- vm_data->avic_vm_id = (u32)vm_id;
-
/* Allocating physical APIC ID table (4KB) */
p_page = alloc_page(GFP_KERNEL);
if (!p_page)
@@ -1468,6 +1433,22 @@ static int avic_vm_init(struct kvm *kvm)
clear_page(page_address(l_page));

spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
+ again:
+ vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
+ if (vm_id == 0) { /* id is 1-based, zero is not okay */
+ next_vm_id_wrapped = 1;
+ goto again;
+ }
+ /* Is it still in use? Only possible if wrapped at least once */
+ if (next_vm_id_wrapped) {
+ hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
+ struct kvm *k2 = container_of(ka, struct kvm, arch);
+ struct kvm_arch *vd2 = &k2->arch;
+ if (vd2->avic_vm_id == vm_id)
+ goto again;
+ }
+ }
+ vm_data->avic_vm_id = vm_id;
hash_add(svm_vm_data_hash, &vm_data->hnode, vm_data->avic_vm_id);
spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);

--
2.9.2


2017-08-15 20:12:06

by Radim Krčmář

Subject: [PATCH] KVM: SVM: refactor avic VM ID allocation

2017-08-11 22:11+0200, Denys Vlasenko:
> With lightly tweaked defconfig:
>
> text data bss dec hex filename
> 11259661 5109408 2981888 19350957 12745ad vmlinux.before
> 11259661 5109408 884736 17253805 10745ad vmlinux.after
>
> Only compile-tested.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---

Ah, I suspected my todo wasn't this short; thanks for the patch!

> @@ -1468,6 +1433,22 @@ static int avic_vm_init(struct kvm *kvm)
> clear_page(page_address(l_page));
>
> spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
> + again:
> + vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
> + if (vm_id == 0) { /* id is 1-based, zero is not okay */

Suravee, did the reserved zero mean something?

> + next_vm_id_wrapped = 1;
> + goto again;
> + }
> + /* Is it still in use? Only possible if wrapped at least once */
> + if (next_vm_id_wrapped) {
> + hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
> + struct kvm *k2 = container_of(ka, struct kvm, arch);
> + struct kvm_arch *vd2 = &k2->arch;
> + if (vd2->avic_vm_id == vm_id)
> + goto again;

Although hitting the case where all 2^24 ids are already used is
practically impossible, I don't like the loose end ...

What about applying something like the following on top?
---8<---
Subject: [PATCH] KVM: SVM: refactor avic VM ID allocation

We missed protection in case someone deserving a medal allocated 2^24
VMs and got a deadlock instead. I think it is nicer without the goto
even if we dropped the error handling.

Signed-off-by: Radim Krčmář <[email protected]>
---
arch/x86/kvm/svm.c | 58 +++++++++++++++++++++++++++++++++++-------------------
1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c21b49b5ee96..4d9ee8d76db3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -985,8 +985,6 @@ static void disable_nmi_singlestep(struct vcpu_svm *svm)
*/
#define SVM_VM_DATA_HASH_BITS 8
static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
-static u32 next_vm_id = 0;
-static bool next_vm_id_wrapped = 0;
static DEFINE_SPINLOCK(svm_vm_data_hash_lock);

/* Note:
@@ -1385,6 +1383,38 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
return 0;
}

+static bool avic_vm_id_is_used(u32 vm_id)
+{
+ struct kvm_arch *ka;
+
+ hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id)
+ if (ka->avic_vm_id == vm_id)
+ return true;
+
+ return false;
+}
+
+static u32 avic_get_next_vm_id(void)
+{
+ static u32 next_vm_id = 0;
+ static bool next_vm_id_wrapped = false;
+ unsigned i;
+
+ for (i = 0; i < AVIC_VM_ID_NR; i++) {
+ next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
+
+ if (next_vm_id == 0) { /* id is 1-based, zero is not okay */
+ next_vm_id = 1;
+ next_vm_id_wrapped = true;
+ }
+
+ if (!next_vm_id_wrapped || !avic_vm_id_is_used(next_vm_id))
+ return next_vm_id;
+ }
+
+ return 0;
+}
+
static void avic_vm_destroy(struct kvm *kvm)
{
unsigned long flags;
@@ -1410,8 +1440,6 @@ static int avic_vm_init(struct kvm *kvm)
struct kvm_arch *vm_data = &kvm->arch;
struct page *p_page;
struct page *l_page;
- struct kvm_arch *ka;
- u32 vm_id;

if (!avic)
return 0;
@@ -1432,28 +1460,18 @@ static int avic_vm_init(struct kvm *kvm)
vm_data->avic_logical_id_table_page = l_page;
clear_page(page_address(l_page));

+ err = -EAGAIN;
spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
- again:
- vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
- if (vm_id == 0) { /* id is 1-based, zero is not okay */
- next_vm_id_wrapped = 1;
- goto again;
- }
- /* Is it still in use? Only possible if wrapped at least once */
- if (next_vm_id_wrapped) {
- hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
- struct kvm *k2 = container_of(ka, struct kvm, arch);
- struct kvm_arch *vd2 = &k2->arch;
- if (vd2->avic_vm_id == vm_id)
- goto again;
- }
- }
- vm_data->avic_vm_id = vm_id;
+ vm_data->avic_vm_id = avic_get_next_vm_id();
+ if (!vm_data->avic_vm_id)
+ goto unlock;
hash_add(svm_vm_data_hash, &vm_data->hnode, vm_data->avic_vm_id);
spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);

return 0;

+unlock:
+ spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
free_avic:
avic_vm_destroy(kvm);
return err;
--
2.13.3

2017-08-17 14:33:17

by Paolo Bonzini

Subject: Re: [PATCH] KVM: SVM: refactor avic VM ID allocation

On 15/08/2017 22:12, Radim Krčmář wrote:
> 2017-08-11 22:11+0200, Denys Vlasenko:
>> With lightly tweaked defconfig:
>>
>> text data bss dec hex filename
>> 11259661 5109408 2981888 19350957 12745ad vmlinux.before
>> 11259661 5109408 884736 17253805 10745ad vmlinux.after
>>
>> Only compile-tested.
>>
>> Signed-off-by: Denys Vlasenko <[email protected]>
>> Cc: Joerg Roedel <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
>
> Ah, I suspected my todo wasn't this short; thanks for the patch!
>
>> @@ -1468,6 +1433,22 @@ static int avic_vm_init(struct kvm *kvm)
>> clear_page(page_address(l_page));
>>
>> spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
>> + again:
>> + vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
>> + if (vm_id == 0) { /* id is 1-based, zero is not okay */
>
> Suravee, did the reserved zero mean something?
>
>> + next_vm_id_wrapped = 1;
>> + goto again;
>> + }
>> + /* Is it still in use? Only possible if wrapped at least once */
>> + if (next_vm_id_wrapped) {
>> + hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
>> + struct kvm *k2 = container_of(ka, struct kvm, arch);
>> + struct kvm_arch *vd2 = &k2->arch;
>> + if (vd2->avic_vm_id == vm_id)
>> + goto again;
>
> Although hitting the case where all 2^24 ids are already used is
> practically impossible, I don't like the loose end ...

I think even the case where 2^16 ids are used deserves a medal. Why
don't we just make the bitmap 8 KiB and call it a day? :)

Paolo


> What about applying something like the following on top?
> ---8<---
> Subject: [PATCH] KVM: SVM: refactor avic VM ID allocation
>
> We missed protection in case someone deserving a medal allocated 2^24
> VMs and got a deadlock instead. I think it is nicer without the goto
> even if we dropped the error handling.
>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> arch/x86/kvm/svm.c | 58 +++++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 38 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c21b49b5ee96..4d9ee8d76db3 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -985,8 +985,6 @@ static void disable_nmi_singlestep(struct vcpu_svm *svm)
> */
> #define SVM_VM_DATA_HASH_BITS 8
> static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
> -static u32 next_vm_id = 0;
> -static bool next_vm_id_wrapped = 0;
> static DEFINE_SPINLOCK(svm_vm_data_hash_lock);
>
> /* Note:
> @@ -1385,6 +1383,38 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> return 0;
> }
>
> +static bool avic_vm_id_is_used(u32 vm_id)
> +{
> + struct kvm_arch *ka;
> +
> + hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id)
> + if (ka->avic_vm_id == vm_id)
> + return true;
> +
> + return false;
> +}
> +
> +static u32 avic_get_next_vm_id(void)
> +{
> + static u32 next_vm_id = 0;
> + static bool next_vm_id_wrapped = false;
> + unsigned i;
> +
> + for (i = 0; i < AVIC_VM_ID_NR; i++) {
> + next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
> +
> + if (next_vm_id == 0) { /* id is 1-based, zero is not okay */
> + next_vm_id = 1;
> + next_vm_id_wrapped = true;
> + }
> +
> + if (!next_vm_id_wrapped || !avic_vm_id_is_used(next_vm_id))
> + return next_vm_id;
> + }
> +
> + return 0;
> +}
> +
> static void avic_vm_destroy(struct kvm *kvm)
> {
> unsigned long flags;
> @@ -1410,8 +1440,6 @@ static int avic_vm_init(struct kvm *kvm)
> struct kvm_arch *vm_data = &kvm->arch;
> struct page *p_page;
> struct page *l_page;
> - struct kvm_arch *ka;
> - u32 vm_id;
>
> if (!avic)
> return 0;
> @@ -1432,28 +1460,18 @@ static int avic_vm_init(struct kvm *kvm)
> vm_data->avic_logical_id_table_page = l_page;
> clear_page(page_address(l_page));
>
> + err = -EAGAIN;
> spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
> - again:
> - vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
> - if (vm_id == 0) { /* id is 1-based, zero is not okay */
> - next_vm_id_wrapped = 1;
> - goto again;
> - }
> - /* Is it still in use? Only possible if wrapped at least once */
> - if (next_vm_id_wrapped) {
> - hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
> - struct kvm *k2 = container_of(ka, struct kvm, arch);
> - struct kvm_arch *vd2 = &k2->arch;
> - if (vd2->avic_vm_id == vm_id)
> - goto again;
> - }
> - }
> - vm_data->avic_vm_id = vm_id;
> + vm_data->avic_vm_id = avic_get_next_vm_id();
> + if (!vm_data->avic_vm_id)
> + goto unlock;
> hash_add(svm_vm_data_hash, &vm_data->hnode, vm_data->avic_vm_id);
> spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
>
> return 0;
>
> +unlock:
> + spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
> free_avic:
> avic_vm_destroy(kvm);
> return err;
>

2017-08-18 15:22:35

by Denys Vlasenko

Subject: Re: [PATCH] KVM: SVM: refactor avic VM ID allocation

On 08/17/2017 04:33 PM, Paolo Bonzini wrote:
> On 15/08/2017 22:12, Radim Krčmář wrote:
>> 2017-08-11 22:11+0200, Denys Vlasenko:
>>> With lightly tweaked defconfig:
>>>
>>> text data bss dec hex filename
>>> 11259661 5109408 2981888 19350957 12745ad vmlinux.before
>>> 11259661 5109408 884736 17253805 10745ad vmlinux.after
>>>
>>> Only compile-tested.
>>>
>>> Signed-off-by: Denys Vlasenko <[email protected]>
>>> Cc: Joerg Roedel <[email protected]>
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> ---
>>
>> Ah, I suspected my todo wasn't this short; thanks for the patch!
>>
>>> @@ -1468,6 +1433,22 @@ static int avic_vm_init(struct kvm *kvm)
>>> clear_page(page_address(l_page));
>>>
>>> spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
>>> + again:
>>> + vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
>>> + if (vm_id == 0) { /* id is 1-based, zero is not okay */
>>
>> Suravee, did the reserved zero mean something?
>>
>>> + next_vm_id_wrapped = 1;
>>> + goto again;
>>> + }
>>> + /* Is it still in use? Only possible if wrapped at least once */
>>> + if (next_vm_id_wrapped) {
>>> + hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
>>> + struct kvm *k2 = container_of(ka, struct kvm, arch);
>>> + struct kvm_arch *vd2 = &k2->arch;
>>> + if (vd2->avic_vm_id == vm_id)
>>> + goto again;
>>
>> Although hitting the case where all 2^24 ids are already used is
>> practically impossible, I don't like the loose end ...
>
> I think even the case where 2^16 ids are used deserves a medal. Why
> don't we just make the bitmap 8 KiB and call it a day? :)

Well, RAM is cheap, but a 4-byte variable is still thousands of times
smaller than an 8K bitmap.

Since a 256-element hash table is used here, you need ~one hundred live
VMs before creation of new VMs becomes (very slightly) slower than with
the bitmap approach, and even that only happens after 16777216 VMs have
been created since reboot.

If you want to spend RAM on speeding this up, you can increase hash table size
instead. That would speed up avic_ga_log_notifier() too.

2017-08-18 16:05:39

by Paolo Bonzini

Subject: Re: [PATCH] KVM: SVM: refactor avic VM ID allocation

On 18/08/2017 17:22, Denys Vlasenko wrote:
> On 08/17/2017 04:33 PM, Paolo Bonzini wrote:
>> On 15/08/2017 22:12, Radim Krčmář wrote:
>>> 2017-08-11 22:11+0200, Denys Vlasenko:
>>>> With lightly tweaked defconfig:
>>>>
>>>> text data bss dec hex filename
>>>> 11259661 5109408 2981888 19350957 12745ad vmlinux.before
>>>> 11259661 5109408 884736 17253805 10745ad vmlinux.after
>>>>
>>>> Only compile-tested.
>>>>
>>>> Signed-off-by: Denys Vlasenko <[email protected]>
>>>> Cc: Joerg Roedel <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> ---
>>>
>>> Ah, I suspected my todo wasn't this short; thanks for the patch!
>>>
>>>> @@ -1468,6 +1433,22 @@ static int avic_vm_init(struct kvm *kvm)
>>>> clear_page(page_address(l_page));
>>>>
>>>> spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
>>>> + again:
>>>> + vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
>>>> + if (vm_id == 0) { /* id is 1-based, zero is not okay */
>>>
>>> Suravee, did the reserved zero mean something?
>>>
>>>> + next_vm_id_wrapped = 1;
>>>> + goto again;
>>>> + }
>>>> + /* Is it still in use? Only possible if wrapped at least once */
>>>> + if (next_vm_id_wrapped) {
>>>> + hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
>>>> + struct kvm *k2 = container_of(ka, struct kvm, arch);
>>>> + struct kvm_arch *vd2 = &k2->arch;
>>>> + if (vd2->avic_vm_id == vm_id)
>>>> + goto again;
>>>
>>> Although hitting the case where all 2^24 ids are already used is
>>> practically impossible, I don't like the loose end ...
>>
>> I think even the case where 2^16 ids are used deserves a medal. Why
>> don't we just make the bitmap 8 KiB and call it a day? :)
>
> Well, the RAM is cheap, but a 4-byte variable is still thousands of times
> smaller than a 8K bitmap.
>
> Since a 256 element hash table is used here, you need to have ~one hundred
> VMs to start seeing (very small) degradation in speed of creation of new
> VMs compared to bitmap approach, and that is only after 16777216 VMs
> were created since reboot.

I guess that's fair since we already have the hash table for other uses.

Paolo

> If you want to spend RAM on speeding this up, you can increase hash
> table size instead. That would speed up avic_ga_log_notifier() too.