2013-03-04 18:05:21

by Raghavendra K T

Subject: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

This patch series further filters for a better vcpu candidate to yield to
in the PLE handler. The main idea is to record the preempted vcpus using
preempt notifiers and to iterate over only those preempted vcpus in the
handler. Note that vcpus which were in a spinloop during pause loop
exit are already filtered out.

Thanks Jiannan and Avi for bringing up the idea, and Gleb and PeterZ for
precious suggestions during the discussion.
Thanks Srikar for suggesting avoiding the rcu lock while checking task
state, which has improved the overcommit cases.

There are basically two approaches for the implementation.

Method 1: Uses per vcpu preempt flag (this series).

Method 2: We keep a bitmap of preempted vcpus. Using this, we can easily
iterate over the preempted vcpus.

Note that method 2 needs an extra index variable to map the bitmap to
vcpus, and it also needs static vcpu allocation.

I am also posting the Method 2 approach for reference, in case it is of
interest.

Result: decent improvement for kernbench and ebizzy.

base = 3.8.0 + undercommit patches
patched = base + preempt patches

Tested on a 32 core (no HT) mx3850 machine with a 32 vcpu guest and 8GB RAM

--+-----------+-----------+-----------+------------+-----------+
        kernbench (exec time in sec, lower is better)
--+-----------+-----------+-----------+------------+-----------+
        base        stdev     patched       stdev    %improve
--+-----------+-----------+-----------+------------+-----------+
1x   47.0383     4.6977     44.2584     1.2899      5.90986
2x   96.0071     7.1873     91.2605     7.3567      4.94401
3x  164.0157    10.3613    156.6750    11.4267      4.47561
4x  212.5768    23.7326    204.4800    13.2908      3.80888
--+-----------+-----------+-----------+------------+-----------+
no ple kernbench 1x result for reference: 46.056133

--+-----------+-----------+-----------+------------+-----------+
        ebizzy (records/sec, higher is better)
--+-----------+-----------+-----------+------------+-----------+
        base        stdev     patched       stdev    %improve
--+-----------+-----------+-----------+------------+-----------+
1x  5609.2000    56.9343   6263.7000    64.7097     11.66833
2x  2071.9000   108.4829   2653.5000   181.8395     28.07085
3x  1557.4167   109.7141   1993.5000   166.3176     28.00043
4x  1254.7500    91.2997   1765.5000   237.5410     40.70532
--+-----------+-----------+-----------+------------+-----------+
no ple ebizzy 1x result for reference: 7394.9 rec/sec

Please let me know if you have any suggestions and comments.

Raghavendra K T (2):
kvm: Record the preemption status of vcpus using preempt notifiers
kvm: Iterate over only vcpus that are preempted

----
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 7 +++++++
2 files changed, 8 insertions(+)

Reference patch for Method 2
---8<---
Use preempt bitmap and optimize vcpu iteration using preempt notifiers

From: Raghavendra K T <[email protected]>

Record the preempted vcpus in a bitmap using preempt notifiers.
Add the logic of iterating over only preempted vcpus, thus making
vcpu iteration fast.
Thanks Jiannan, Avi for initially proposing the patch. Gleb, Peter for
precious suggestions.
Thanks Srikar for suggesting to remove the rcu lock while checking
task state, which helped in reducing overcommit overhead.

Not-yet-signed-off-by: Raghavendra K T <[email protected]>
---
include/linux/kvm_host.h | 7 +++++++
virt/kvm/kvm_main.c | 15 ++++++++++++---
2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cad77fe..8c4a2409 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -252,6 +252,7 @@ struct kvm_vcpu {
bool dy_eligible;
} spin_loop;
#endif
+ int idx;
struct kvm_vcpu_arch arch;
};

@@ -385,6 +386,7 @@ struct kvm {
long mmu_notifier_count;
#endif
long tlbs_dirty;
+ DECLARE_BITMAP(preempt_bitmap, KVM_MAX_VCPUS);
};

#define kvm_err(fmt, ...) \
@@ -413,6 +415,11 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
(vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
idx++)

+#define kvm_for_each_preempted_vcpu(idx, vcpup, kvm, n) \
+ for (idx = find_first_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS); \
+ idx < n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
+ idx = find_next_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS, idx+1))
+
#define kvm_for_each_memslot(memslot, slots) \
for (memslot = &slots->memslots[0]; \
memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index adc68fe..1db16b3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1770,10 +1770,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
struct kvm_vcpu *vcpu;
int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
int yielded = 0;
+ int num_vcpus;
int try = 3;
int pass;
int i;
-
+
+ num_vcpus = atomic_read(&kvm->online_vcpus);
kvm_vcpu_set_in_spin_loop(me, true);
/*
* We boost the priority of a VCPU that is runnable but not
@@ -1783,7 +1785,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
* We approximate round-robin by starting at the last boosted VCPU.
*/
for (pass = 0; pass < 2 && !yielded && try; pass++) {
- kvm_for_each_vcpu(i, vcpu, kvm) {
+ kvm_for_each_preempted_vcpu(i, vcpu, kvm, num_vcpus) {
if (!pass && i <= last_boosted_vcpu) {
i = last_boosted_vcpu;
continue;
@@ -1878,6 +1880,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
{
int r;
+ int curr_idx;
struct kvm_vcpu *vcpu, *v;

vcpu = kvm_arch_vcpu_create(kvm, id);
@@ -1916,7 +1919,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
goto unlock_vcpu_destroy;
}

- kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
+ curr_idx = atomic_read(&kvm->online_vcpus);
+ kvm->vcpus[curr_idx] = vcpu;
+ vcpu->idx = curr_idx;
smp_wmb();
atomic_inc(&kvm->online_vcpus);

@@ -2902,6 +2907,7 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
+ clear_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);

kvm_arch_vcpu_load(vcpu, cpu);
}
@@ -2911,6 +2917,9 @@ static void kvm_sched_out(struct preempt_notifier *pn,
{
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

+ if (current->state == TASK_RUNNING)
+ set_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);
+
kvm_arch_vcpu_put(vcpu);
}


2013-03-04 18:05:38

by Raghavendra K T

Subject: [PATCH RFC 1/2] kvm: Record the preemption status of vcpus using preempt notifiers

From: Raghavendra K T <[email protected]>

Note that we mark a vcpu as preempted only when its task state was
TASK_RUNNING at the time of preemption.

Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
for their precious suggestions. Thanks Srikar for an idea on avoiding
rcu lock while checking task state that improved overcommit numbers.

Signed-off-by: Raghavendra K T <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 5 +++++
2 files changed, 6 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cad77fe..0b31e1c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -252,6 +252,7 @@ struct kvm_vcpu {
bool dy_eligible;
} spin_loop;
#endif
+ bool preempted;
struct kvm_vcpu_arch arch;
};

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index adc68fe..83a804c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -244,6 +244,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)

kvm_vcpu_set_in_spin_loop(vcpu, false);
kvm_vcpu_set_dy_eligible(vcpu, false);
+ vcpu->preempted = false;

r = kvm_arch_vcpu_init(vcpu);
if (r < 0)
@@ -2902,6 +2903,8 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
+ if (vcpu->preempted)
+ vcpu->preempted = false;

kvm_arch_vcpu_load(vcpu, cpu);
}
@@ -2911,6 +2914,8 @@ static void kvm_sched_out(struct preempt_notifier *pn,
{
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

+ if (current->state == TASK_RUNNING)
+ vcpu->preempted = true;
kvm_arch_vcpu_put(vcpu);
}

2013-03-04 18:05:58

by Raghavendra K T

Subject: [PATCH RFC 2/2] kvm: Iterate over only vcpus that are preempted

From: Raghavendra K T <[email protected]>

This helps in further filtering the eligible candidates and thus
potentially helps preempted lock holders to run again quickly.
Note that if a vcpu was spinning during preemption, we filter it out
by checking whether it was preempted due to pause loop exit.

Signed-off-by: Raghavendra K T <[email protected]>
---
virt/kvm/kvm_main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 83a804c..60114e1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1790,6 +1790,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
} else if (pass && i > last_boosted_vcpu)
break;
+ if (!ACCESS_ONCE(vcpu->preempted))
+ continue;
if (vcpu == me)
continue;
if (waitqueue_active(&vcpu->wq))

2013-03-05 09:55:10

by Andrew Jones

Subject: Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
> This patch series further filters better vcpu candidate to yield to
> in PLE handler. The main idea is to record the preempted vcpus using
> preempt notifiers and iterate only those preempted vcpus in the
> handler. Note that the vcpus which were in spinloop during pause loop
> exit are already filtered.

The %improvement and patch series look good.

>
> Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
> precious suggestions during the discussion.
> Thanks Srikar for suggesting to avoid rcu lock while checking task state
> that has improved overcommit cases.
>
> There are basically two approches for the implementation.
>
> Method 1: Uses per vcpu preempt flag (this series).
>
> Method 2: We keep a bitmap of preempted vcpus. using this we can easily
> iterate over preempted vcpus.
>
> Note that method 2 needs an extra index variable to identify/map bitmap to
> vcpu and it also needs static vcpu allocation.

We definitely don't want something that requires static vcpu allocation.
I think it'd be better to add another counter for the vcpu bit assignment.

>
> I am also posting Method 2 approach for reference in case it interests.

I guess the interest in Method2 would come from perf numbers. Did you try
comparing Method1 vs. Method2?

>
> Result: decent improvement for kernbench and ebizzy.
>
> base = 3.8.0 + undercommit patches
> patched = base + preempt patches
>
> Tested on 32 core (no HT) mx3850 machine with 32 vcpu guest 8GB RAM
>
> --+-----------+-----------+-----------+------------+-----------+
> kernbench (exec time in sec lower is beter)
> --+-----------+-----------+-----------+------------+-----------+
> base stdev patched stdev %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x 47.0383 4.6977 44.2584 1.2899 5.90986
> 2x 96.0071 7.1873 91.2605 7.3567 4.94401
> 3x 164.0157 10.3613 156.6750 11.4267 4.47561
> 4x 212.5768 23.7326 204.4800 13.2908 3.80888
> --+-----------+-----------+-----------+------------+-----------+
> no ple kernbench 1x result for reference: 46.056133
>
> --+-----------+-----------+-----------+------------+-----------+
> ebizzy (record/sec higher is better)
> --+-----------+-----------+-----------+------------+-----------+
> base stdev patched stdev %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x 5609.2000 56.9343 6263.7000 64.7097 11.66833
> 2x 2071.9000 108.4829 2653.5000 181.8395 28.07085
> 3x 1557.4167 109.7141 1993.5000 166.3176 28.00043
> 4x 1254.7500 91.2997 1765.5000 237.5410 40.70532
> --+-----------+-----------+-----------+------------+-----------+
> no ple ebizzy 1x result for reference : 7394.9 rec/sec
>
> Please let me know if you have any suggestions and comments.
>
> Raghavendra K T (2):
> kvm: Record the preemption status of vcpus using preempt notifiers
> kvm: Iterate over only vcpus that are preempted
>
> ----
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 7 +++++++
> 2 files changed, 8 insertions(+)
>
> Reference patch for Method 2
> ---8<---
> Use preempt bitmap and optimize vcpu iteration using preempt notifiers
>
> From: Raghavendra K T <[email protected]>
>
> Record the preempted vcpus in a bit map using preempt notifiers.
> Add the logic of iterating over only preempted vcpus thus making
> vcpu iteration fast.
> Thanks Jiannan, Avi for initially proposing patch. Gleb, Peter for
> precious suggestions.
> Thanks srikar for suggesting to remove rcu lock while checking
> task state that helped in reducing overcommit overhead
>
> Not-yet-signed-off-by: Raghavendra K T <[email protected]>
> ---
> include/linux/kvm_host.h | 7 +++++++
> virt/kvm/kvm_main.c | 15 ++++++++++++---
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index cad77fe..8c4a2409 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -252,6 +252,7 @@ struct kvm_vcpu {
> bool dy_eligible;
> } spin_loop;
> #endif
> + int idx;
> struct kvm_vcpu_arch arch;
> };
>
> @@ -385,6 +386,7 @@ struct kvm {
> long mmu_notifier_count;
> #endif
> long tlbs_dirty;
> + DECLARE_BITMAP(preempt_bitmap, KVM_MAX_VCPUS);
> };
>
> #define kvm_err(fmt, ...) \
> @@ -413,6 +415,11 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
> idx++)
>
> +#define kvm_for_each_preempted_vcpu(idx, vcpup, kvm, n) \
> + for (idx = find_first_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS); \
> + idx < n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
> + idx = find_next_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS, idx+1))
> +
> #define kvm_for_each_memslot(memslot, slots) \
> for (memslot = &slots->memslots[0]; \
> memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index adc68fe..1db16b3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1770,10 +1770,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> struct kvm_vcpu *vcpu;
> int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
> int yielded = 0;
> + int num_vcpus;
> int try = 3;
> int pass;
> int i;
> -
> +
> + num_vcpus = atomic_read(&kvm->online_vcpus);
> kvm_vcpu_set_in_spin_loop(me, true);
> /*
> * We boost the priority of a VCPU that is runnable but not
> @@ -1783,7 +1785,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> * We approximate round-robin by starting at the last boosted VCPU.
> */
> for (pass = 0; pass < 2 && !yielded && try; pass++) {
> - kvm_for_each_vcpu(i, vcpu, kvm) {
> + kvm_for_each_preempted_vcpu(i, vcpu, kvm, num_vcpus) {
> if (!pass && i <= last_boosted_vcpu) {
> i = last_boosted_vcpu;
> continue;
> @@ -1878,6 +1880,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
> static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> {
> int r;
> + int curr_idx;
> struct kvm_vcpu *vcpu, *v;
>
> vcpu = kvm_arch_vcpu_create(kvm, id);
> @@ -1916,7 +1919,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> goto unlock_vcpu_destroy;
> }
>
> - kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
> + curr_idx = atomic_read(&kvm->online_vcpus);
> + kvm->vcpus[curr_idx] = vcpu;
> + vcpu->idx = curr_idx;
> smp_wmb();
> atomic_inc(&kvm->online_vcpus);
>
> @@ -2902,6 +2907,7 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
> static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
> {
> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> + clear_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);
>
> kvm_arch_vcpu_load(vcpu, cpu);
> }
> @@ -2911,6 +2917,9 @@ static void kvm_sched_out(struct preempt_notifier *pn,
> {
> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>
> + if (current->state == TASK_RUNNING)
> + set_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);
> +
> kvm_arch_vcpu_put(vcpu);
> }
>
>

2013-03-05 12:27:46

by Raghavendra K T

Subject: Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

On 03/05/2013 03:23 PM, Andrew Jones wrote:
> On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
>> This patch series further filters better vcpu candidate to yield to
>> in PLE handler. The main idea is to record the preempted vcpus using
>> preempt notifiers and iterate only those preempted vcpus in the
>> handler. Note that the vcpus which were in spinloop during pause loop
>> exit are already filtered.
>
> The %improvement and patch series look good.
>

Thank you for the review.

>>
>> Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
>> precious suggestions during the discussion.
>> Thanks Srikar for suggesting to avoid rcu lock while checking task state
>> that has improved overcommit cases.
>>
>> There are basically two approches for the implementation.
>>
>> Method 1: Uses per vcpu preempt flag (this series).
>>
>> Method 2: We keep a bitmap of preempted vcpus. using this we can easily
>> iterate over preempted vcpus.
>>
>> Note that method 2 needs an extra index variable to identify/map bitmap to
>> vcpu and it also needs static vcpu allocation.
>
> We definitely don't want something that requires static vcpu allocation.
> I think it'd be better to add another counter for the vcpu bit assignment.
>

So do you mean something parallel to online_vcpus?

>>
>> I am also posting Method 2 approach for reference in case it interests.
>
> I guess the interest in Method2 would come from perf numbers. Did you try
> comparing Method1 vs. Method2?
>

Yes, I did. Performance-wise, method 2 is almost equal to method 1. But
I believe if there is any difference, it may show when we have a guest
with many vcpus. (Currently I have only a 32 core host.)

2013-03-05 12:40:54

by Andrew Jones

Subject: Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

On Tue, Mar 05, 2013 at 05:54:09PM +0530, Raghavendra K T wrote:
> On 03/05/2013 03:23 PM, Andrew Jones wrote:
> >On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
> >> This patch series further filters better vcpu candidate to yield to
> >>in PLE handler. The main idea is to record the preempted vcpus using
> >>preempt notifiers and iterate only those preempted vcpus in the
> >>handler. Note that the vcpus which were in spinloop during pause loop
> >>exit are already filtered.
> >
> >The %improvement and patch series look good.
> >
>
> Thank you for the review.
>
> >>
> >>Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
> >>precious suggestions during the discussion.
> >>Thanks Srikar for suggesting to avoid rcu lock while checking task state
> >>that has improved overcommit cases.
> >>
> >>There are basically two approches for the implementation.
> >>
> >>Method 1: Uses per vcpu preempt flag (this series).
> >>
> >>Method 2: We keep a bitmap of preempted vcpus. using this we can easily
> >>iterate over preempted vcpus.
> >>
> >>Note that method 2 needs an extra index variable to identify/map bitmap to
> >>vcpu and it also needs static vcpu allocation.
> >
> >We definitely don't want something that requires static vcpu allocation.
> >I think it'd be better to add another counter for the vcpu bit assignment.
> >
>
> So do you mean some thing parallel to online_vcpus?

Yes, one that only grows. However, then, if a vcpu is unplugged, its bit
would have to be skipped over.

>
> >>
> >>I am also posting Method 2 approach for reference in case it interests.
> >
> >I guess the interest in Method2 would come from perf numbers. Did you try
> >comparing Method1 vs. Method2?
> >
>
> Yes I did. Performance wise method2 is almost equal to method1. But I
> believe if there is any difference it may show when we have large vcpu
> guest. (Currently I have only 32 core host).
>

OK, probably not worth it at this point then.

thanks,
drew

2013-03-05 15:19:49

by Vinod, Chegu

Subject: Re: [PATCH RFC 1/2] kvm: Record the preemption status of vcpus using preempt notifiers

On 3/4/2013 10:02 AM, Raghavendra K T wrote:
> From: Raghavendra K T <[email protected]>
>
> Note that we mark as preempted only when vcpu's task state was
> Running during preemption.
>
> Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
> for their precious suggestions. Thanks Srikar for an idea on avoiding
> rcu lock while checking task state that improved overcommit numbers.
>
> Signed-off-by: Raghavendra K T <[email protected]>
> ---
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 5 +++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index cad77fe..0b31e1c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -252,6 +252,7 @@ struct kvm_vcpu {
> bool dy_eligible;
> } spin_loop;
> #endif
> + bool preempted;
> struct kvm_vcpu_arch arch;
> };
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index adc68fe..83a804c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -244,6 +244,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>
> kvm_vcpu_set_in_spin_loop(vcpu, false);
> kvm_vcpu_set_dy_eligible(vcpu, false);
> + vcpu->preempted = false;
>
> r = kvm_arch_vcpu_init(vcpu);
> if (r < 0)
> @@ -2902,6 +2903,8 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
> static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
> {
> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> + if (vcpu->preempted)
> + vcpu->preempted = false;
>
> kvm_arch_vcpu_load(vcpu, cpu);
> }
> @@ -2911,6 +2914,8 @@ static void kvm_sched_out(struct preempt_notifier *pn,
> {
> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>
> + if (current->state == TASK_RUNNING)
> + vcpu->preempted = true;
> kvm_arch_vcpu_put(vcpu);
> }
>
>
> .
>
Reviewed-by: Chegu Vinod <[email protected]>

2013-03-05 15:20:42

by Vinod, Chegu

Subject: Re: [PATCH RFC 2/2] kvm: Iterate over only vcpus that are preempted

On 3/4/2013 10:02 AM, Raghavendra K T wrote:
> From: Raghavendra K T <[email protected]>
>
> This helps in filtering out the eligible candidates further and
> thus potentially helps in quickly allowing preempted lockholders to run.
> Note that if a vcpu was spinning during preemption we filter them
> by checking whether they are preempted due to pause loop exit.
>
> Signed-off-by: Raghavendra K T <[email protected]>
> ---
> virt/kvm/kvm_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 83a804c..60114e1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1790,6 +1790,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> continue;
> } else if (pass && i > last_boosted_vcpu)
> break;
> + if (!ACCESS_ONCE(vcpu->preempted))
> + continue;
> if (vcpu == me)
> continue;
> if (waitqueue_active(&vcpu->wq))
>
> .
>
Reviewed-by: Chegu Vinod <[email protected]>

2013-03-07 09:23:01

by Raghavendra K T

Subject: Re: [PATCH RFC 1/2] kvm: Record the preemption status of vcpus using preempt notifiers

On 03/05/2013 08:49 PM, Chegu Vinod wrote:
> On 3/4/2013 10:02 AM, Raghavendra K T wrote:
>> From: Raghavendra K T <[email protected]>
>>
>> Note that we mark as preempted only when vcpu's task state was
>> Running during preemption.
>>
>> Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
>> for their precious suggestions. Thanks Srikar for an idea on avoiding
>> rcu lock while checking task state that improved overcommit numbers.
>>
>> Signed-off-by: Raghavendra K T <[email protected]>
>> ---
>> include/linux/kvm_host.h | 1 +
>> virt/kvm/kvm_main.c | 5 +++++
>> 2 files changed, 6 insertions(+)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index cad77fe..0b31e1c 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -252,6 +252,7 @@ struct kvm_vcpu {
>> bool dy_eligible;
>> } spin_loop;
>> #endif
>> + bool preempted;
>> struct kvm_vcpu_arch arch;
>> };
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index adc68fe..83a804c 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -244,6 +244,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct
>> kvm *kvm, unsigned id)
>> kvm_vcpu_set_in_spin_loop(vcpu, false);
>> kvm_vcpu_set_dy_eligible(vcpu, false);
>> + vcpu->preempted = false;
>> r = kvm_arch_vcpu_init(vcpu);
>> if (r < 0)
>> @@ -2902,6 +2903,8 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct
>> preempt_notifier *pn)
>> static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
>> {
>> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>> + if (vcpu->preempted)
>> + vcpu->preempted = false;
>> kvm_arch_vcpu_load(vcpu, cpu);
>> }
>> @@ -2911,6 +2914,8 @@ static void kvm_sched_out(struct
>> preempt_notifier *pn,
>> {
>> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>> + if (current->state == TASK_RUNNING)
>> + vcpu->preempted = true;
>> kvm_arch_vcpu_put(vcpu);
>> }
>>
>> .
>>
> Reviewed-by: Chegu Vinod <[email protected]>

Thank you Vinod.

Gleb, Marcelo,
any comment, concern on the patches?

2013-03-07 19:11:32

by Marcelo Tosatti

Subject: Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
> This patch series further filters better vcpu candidate to yield to
> in PLE handler. The main idea is to record the preempted vcpus using
> preempt notifiers and iterate only those preempted vcpus in the
> handler. Note that the vcpus which were in spinloop during pause loop
> exit are already filtered.
>
> Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
> precious suggestions during the discussion.
> Thanks Srikar for suggesting to avoid rcu lock while checking task state
> that has improved overcommit cases.
>
> There are basically two approches for the implementation.
>
> Method 1: Uses per vcpu preempt flag (this series).
>
> Method 2: We keep a bitmap of preempted vcpus. using this we can easily
> iterate over preempted vcpus.
>
> Note that method 2 needs an extra index variable to identify/map bitmap to
> vcpu and it also needs static vcpu allocation.
>
> I am also posting Method 2 approach for reference in case it interests.
>
> Result: decent improvement for kernbench and ebizzy.
>
> base = 3.8.0 + undercommit patches
> patched = base + preempt patches
>
> Tested on 32 core (no HT) mx3850 machine with 32 vcpu guest 8GB RAM
>
> --+-----------+-----------+-----------+------------+-----------+
> kernbench (exec time in sec lower is beter)
> --+-----------+-----------+-----------+------------+-----------+
> base stdev patched stdev %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x 47.0383 4.6977 44.2584 1.2899 5.90986
> 2x 96.0071 7.1873 91.2605 7.3567 4.94401
> 3x 164.0157 10.3613 156.6750 11.4267 4.47561
> 4x 212.5768 23.7326 204.4800 13.2908 3.80888
> --+-----------+-----------+-----------+------------+-----------+
> no ple kernbench 1x result for reference: 46.056133
>
> --+-----------+-----------+-----------+------------+-----------+
> ebizzy (record/sec higher is better)
> --+-----------+-----------+-----------+------------+-----------+
> base stdev patched stdev %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x 5609.2000 56.9343 6263.7000 64.7097 11.66833
> 2x 2071.9000 108.4829 2653.5000 181.8395 28.07085
> 3x 1557.4167 109.7141 1993.5000 166.3176 28.00043
> 4x 1254.7500 91.2997 1765.5000 237.5410 40.70532
> --+-----------+-----------+-----------+------------+-----------+
> no ple ebizzy 1x result for reference : 7394.9 rec/sec
>
> Please let me know if you have any suggestions and comments.
>
> Raghavendra K T (2):
> kvm: Record the preemption status of vcpus using preempt notifiers
> kvm: Iterate over only vcpus that are preempted

Reviewed-by: Marcelo Tosatti <[email protected]>

2013-03-08 07:16:40

by Raghavendra K T

Subject: Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

On 03/08/2013 12:40 AM, Marcelo Tosatti wrote:
> On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
>> This patch series further filters better vcpu candidate to yield to
>> in PLE handler. The main idea is to record the preempted vcpus using
>> preempt notifiers and iterate only those preempted vcpus in the
>> handler. Note that the vcpus which were in spinloop during pause loop
>> exit are already filtered.
>>
>> Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
>> precious suggestions during the discussion.
>> Thanks Srikar for suggesting to avoid rcu lock while checking task state
>> that has improved overcommit cases.
>>
>> There are basically two approches for the implementation.
>>
>> Method 1: Uses per vcpu preempt flag (this series).
>>
>> Method 2: We keep a bitmap of preempted vcpus. using this we can easily
>> iterate over preempted vcpus.
>>
>> Note that method 2 needs an extra index variable to identify/map bitmap to
>> vcpu and it also needs static vcpu allocation.
>>
>> I am also posting Method 2 approach for reference in case it interests.
>>
>> Result: decent improvement for kernbench and ebizzy.
>>
>> base = 3.8.0 + undercommit patches
>> patched = base + preempt patches
>>
>> Tested on 32 core (no HT) mx3850 machine with 32 vcpu guest 8GB RAM
>>
>> --+-----------+-----------+-----------+------------+-----------+
>> kernbench (exec time in sec lower is beter)
>> --+-----------+-----------+-----------+------------+-----------+
>> base stdev patched stdev %improve
>> --+-----------+-----------+-----------+------------+-----------+
>> 1x 47.0383 4.6977 44.2584 1.2899 5.90986
>> 2x 96.0071 7.1873 91.2605 7.3567 4.94401
>> 3x 164.0157 10.3613 156.6750 11.4267 4.47561
>> 4x 212.5768 23.7326 204.4800 13.2908 3.80888
>> --+-----------+-----------+-----------+------------+-----------+
>> no ple kernbench 1x result for reference: 46.056133
>>
>> --+-----------+-----------+-----------+------------+-----------+
>> ebizzy (record/sec higher is better)
>> --+-----------+-----------+-----------+------------+-----------+
>> base stdev patched stdev %improve
>> --+-----------+-----------+-----------+------------+-----------+
>> 1x 5609.2000 56.9343 6263.7000 64.7097 11.66833
>> 2x 2071.9000 108.4829 2653.5000 181.8395 28.07085
>> 3x 1557.4167 109.7141 1993.5000 166.3176 28.00043
>> 4x 1254.7500 91.2997 1765.5000 237.5410 40.70532
>> --+-----------+-----------+-----------+------------+-----------+
>> no ple ebizzy 1x result for reference : 7394.9 rec/sec
>>
>> Please let me know if you have any suggestions and comments.
>>
>> Raghavendra K T (2):
>> kvm: Record the preemption status of vcpus using preempt notifiers
>> kvm: Iterate over only vcpus that are preempted
>
> Reviewed-by: Marcelo Tosatti <[email protected]>
>

Thank you Marcelo.

2013-03-11 09:38:26

by Gleb Natapov

Subject: Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
> This patch series further filters better vcpu candidate to yield to
> in PLE handler. The main idea is to record the preempted vcpus using
> preempt notifiers and iterate only those preempted vcpus in the
> handler. Note that the vcpus which were in spinloop during pause loop
> exit are already filtered.
>
> Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
> precious suggestions during the discussion.
> Thanks Srikar for suggesting to avoid rcu lock while checking task state
> that has improved overcommit cases.
>
> There are basically two approches for the implementation.
>
> Method 1: Uses per vcpu preempt flag (this series).
>
> Method 2: We keep a bitmap of preempted vcpus. using this we can easily
> iterate over preempted vcpus.
>
> Note that method 2 needs an extra index variable to identify/map bitmap to
> vcpu and it also needs static vcpu allocation.
>
> I am also posting Method 2 approach for reference in case it interests.
>
> Result: decent improvement for kernbench and ebizzy.
>
> base = 3.8.0 + undercommit patches
> patched = base + preempt patches
>
> Tested on 32 core (no HT) mx3850 machine with 32 vcpu guest 8GB RAM
>
> --+-----------+-----------+-----------+------------+-----------+
> kernbench (exec time in sec lower is beter)
> --+-----------+-----------+-----------+------------+-----------+
> base stdev patched stdev %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x 47.0383 4.6977 44.2584 1.2899 5.90986
> 2x 96.0071 7.1873 91.2605 7.3567 4.94401
> 3x 164.0157 10.3613 156.6750 11.4267 4.47561
> 4x 212.5768 23.7326 204.4800 13.2908 3.80888
> --+-----------+-----------+-----------+------------+-----------+
> no ple kernbench 1x result for reference: 46.056133
>
> --+-----------+-----------+-----------+------------+-----------+
> ebizzy (record/sec higher is better)
> --+-----------+-----------+-----------+------------+-----------+
> base stdev patched stdev %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x 5609.2000 56.9343 6263.7000 64.7097 11.66833
> 2x 2071.9000 108.4829 2653.5000 181.8395 28.07085
> 3x 1557.4167 109.7141 1993.5000 166.3176 28.00043
> 4x 1254.7500 91.2997 1765.5000 237.5410 40.70532
> --+-----------+-----------+-----------+------------+-----------+
> no ple ebizzy 1x result for reference : 7394.9 rec/sec
>
> Please let me know if you have any suggestions and comments.
>
> Raghavendra K T (2):
> kvm: Record the preemption status of vcpus using preempt notifiers
> kvm: Iterate over only vcpus that are preempted
>
Applied, thanks!

--
Gleb.