2018-06-11 07:39:42

by Wanpeng Li

Subject: [PATCH 1/2] KVM: Fix lock holder candidate yield

From: Wanpeng Li <[email protected]>

After detecting a pause loop executed by a Lock Waiter in the guest, the
pCPU is yielded to a Lock Holder candidate. That candidate may have its
own task affinity constraint; however, the current yield logic yields to
the Lock Holder candidate unconditionally, without checking the affinity
constraint, and sets the task as the next buddy of CFS, which breaks the
scheduler. This patch fixes it by skipping the candidate vCPU if the
current pCPU does not satisfy the candidate's affinity constraint.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
virt/kvm/kvm_main.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index aa7da1d8e..ccf8907 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2239,17 +2239,40 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
 #endif /* !CONFIG_S390 */
 
-int kvm_vcpu_yield_to(struct kvm_vcpu *target)
+struct task_struct *vcpu_to_task(struct kvm_vcpu *target)
 {
 	struct pid *pid;
 	struct task_struct *task = NULL;
-	int ret = 0;
 
 	rcu_read_lock();
 	pid = rcu_dereference(target->pid);
 	if (pid)
 		task = get_pid_task(pid, PIDTYPE_PID);
 	rcu_read_unlock();
+	return task;
+}
+
+bool kvm_vcpu_allow_yield(struct kvm_vcpu *target)
+{
+	struct task_struct *task = NULL;
+	bool ret = false;
+
+	task = vcpu_to_task(target);
+	if (!task)
+		return ret;
+	if (cpumask_test_cpu(raw_smp_processor_id(), &task->cpus_allowed))
+		ret = true;
+	put_task_struct(task);
+
+	return ret;
+}
+
+int kvm_vcpu_yield_to(struct kvm_vcpu *target)
+{
+	struct task_struct *task = NULL;
+	int ret = 0;
+
+	task = vcpu_to_task(target);
 	if (!task)
 		return ret;
 	ret = yield_to(task, 1);
@@ -2333,6 +2356,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
+			if (!kvm_vcpu_allow_yield(vcpu))
+				continue;
 
 			yielded = kvm_vcpu_yield_to(vcpu);
 			if (yielded > 0) {
--
2.7.4



2018-06-11 07:41:17

by Wanpeng Li

Subject: [PATCH 2/2] sched/core: Consider affinity constraint when yielding to a task

From: Wanpeng Li <[email protected]>

Consider the task affinity constraint when yielding to a task.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092f7c4..11001ff 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5102,6 +5102,9 @@ int __sched yield_to(struct task_struct *p, bool preempt)
 	if (task_running(p_rq, p) || p->state)
 		goto out_unlock;
 
+	if (!cpumask_test_cpu(smp_processor_id(), &p->cpus_allowed))
+		goto out_unlock;
+
 	yielded = curr->sched_class->yield_to_task(rq, p, preempt);
 	if (yielded) {
 		schedstat_inc(rq->yld_count);
--
2.7.4


2018-06-11 08:40:32

by Peter Zijlstra

Subject: Re: [PATCH 2/2] sched/core: Consider affinity constraint when yielding to a task

On Mon, Jun 11, 2018 at 03:38:50PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <[email protected]>
>
> Consider the task affinity constraint when yielding to a task.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 092f7c4..11001ff 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5102,6 +5102,9 @@ int __sched yield_to(struct task_struct *p, bool preempt)
>  	if (task_running(p_rq, p) || p->state)
>  		goto out_unlock;
>  
> +	if (!cpumask_test_cpu(smp_processor_id(), &p->cpus_allowed))
> +		goto out_unlock;
> +
>  	yielded = curr->sched_class->yield_to_task(rq, p, preempt);
>  	if (yielded) {
>  		schedstat_inc(rq->yld_count);

I'm confused... why?

So yield_to(@p) is documented as yielding @curr and getting @p to run
'sooner'. If they're on the same rq, yay, that means we'll likely switch
from @curr to @p; however, if they're not on the same rq, it should still
work, except we'll reschedule 2 CPUs.

Look at the code: yield_to() will lock two rqs, rq and p_rq.

First it verifies p_rq == task_rq(p) (@p could've been migrated while we
were waiting to acquire the lock); if this is not so, we unlock and try
again.

Then we check trivial things like actually having a ->yield_to_task and
@curr and @p being of the same class.

Then we check if @p is in fact already running, or not runnable at all;
if either, obviously nothing to do.

So now we have rq and p_rq locked (they need not be the same) and we'll
call ->yield_to_task(); on success we'll force a reschedule of p_rq,
unlock the rqs and reschedule ourselves.

So far, nothing requires @p to be runnable on the current CPU.
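
For reference, a condensed sketch of that flow (paraphrased from
kernel/sched/core.c of this era; IRQ save/restore, the nr_running == 1
shortcut and schedstats are elided):

	int __sched yield_to(struct task_struct *p, bool preempt)
	{
		struct task_struct *curr = current;
		struct rq *rq = this_rq(), *p_rq;
		int yielded = 0;

	again:
		p_rq = task_rq(p);
		double_rq_lock(rq, p_rq);
		if (task_rq(p) != p_rq) {	/* @p got migrated, retry */
			double_rq_unlock(rq, p_rq);
			goto again;
		}

		/* trivial checks: a ->yield_to_task, matching classes */
		if (!curr->sched_class->yield_to_task)
			goto out_unlock;
		if (curr->sched_class != p->sched_class)
			goto out_unlock;

		/* @p already running, or not runnable at all */
		if (task_running(p_rq, p) || p->state)
			goto out_unlock;

		yielded = curr->sched_class->yield_to_task(rq, p, preempt);
		if (yielded && preempt && rq != p_rq)
			resched_curr(p_rq);	/* force reschedule of p_rq */

	out_unlock:
		double_rq_unlock(rq, p_rq);
		if (yielded > 0)
			schedule();		/* reschedule ourselves */

		return yielded;
	}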

So let us look at yield_to_task_fair(), the only yield_to_task() implementation:

That again checks if @p is in fact runnable; if not, nothing to do.

Then it calls set_next_buddy(&p->se), which will mark @p more likely to
get picked on its rq (p_rq) _not_ our rq. Note how set_next_buddy() uses
cfs_rq_of().

Then it yields @curr on the current cpu.
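
Roughly, that is (a sketch of yield_to_task_fair() from
kernel/sched/fair.c of this era, comments added):

	static bool yield_to_task_fair(struct rq *rq, struct task_struct *p,
				       bool preempt)
	{
		struct sched_entity *se = &p->se;

		/* @p not runnable (or throttled): nothing to do */
		if (!se->on_rq || throttled_hierarchy(cfs_rq_of(se)))
			return false;

		/* next buddy on @p's own cfs_rq (p_rq), not our rq */
		set_next_buddy(se);

		/* and yield @curr on the current cpu */
		yield_task_fair(rq);

		return true;
	}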


Again, nothing cares if @curr and @p are on the same CPU or if @p is
allowed to run on the current CPU -- there are no migrations.


So.. why?!




2018-06-11 09:04:15

by Wanpeng Li

Subject: Re: [PATCH 2/2] sched/core: Consider affinity constraint when yielding to a task

On Mon, 11 Jun 2018 at 16:39, Peter Zijlstra <[email protected]> wrote:
[.../...]
>
> Then it calls set_next_buddy(&p->se), which will mark @p more likely to
> get picked on its rq (p_rq) _not_ our rq. Note how set_next_buddy() uses
> cfs_rq_of().

I missed it, thanks for the great explanation. I will drop the two patches.

Regards,
Wanpeng Li