2017-11-16 09:48:38

by Quan Xu

[permalink] [raw]
Subject: Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path



On 2017-11-16 06:03, Thomas Gleixner wrote:
> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>
>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
>>> From: Yang Zhang <[email protected]>
>>>
>>> Implement a generic idle poll which resembles the functionality
>>> found in arch/. Provide weak arch_cpu_idle_poll function which
>>> can be overridden by the architecture code if needed.
>> No, we want less of those magic hooks, not more.
>>
>>> Interrupts arrive which may not cause a reschedule in idle loops.
>>> In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
>>> for interrupts and VM-exit immediately. Also this becomes more
>>> expensive than bare metal. Add a generic idle poll before enter
>>> real idle path. When a reschedule event is pending, we can bypass
>>> the real idle path.
>> Why not do a HV specific idle driver?
> If I understand the problem correctly then he wants to avoid the heavy
> lifting in tick_nohz_idle_enter() in the first place, but there is already
> an interesting quirk there which makes it exit early. See commit
> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
> looks similar. But lets not proliferate that. I'd rather see that go away.

agreed.

Even we can get more benifit than commit 3c5d92a0cfb5 ("nohz: Introduce
arch_needs_cpu")
in kvm guest. I won't proliferate that..

> But the irq_timings stuff is heading into the same direction, with a more
> complex prediction logic which should tell you pretty good how long that
> idle period is going to be and in case of an interrupt heavy workload this
> would skip the extra work of stopping and restarting the tick and provide a
> very good input into a polling decision.


interesting. I have tested with IRQ_TIMINGS related code, which seems
not working so far.
Also I'd like to help as much as I can.
> This can be handled either in a HV specific idle driver or even in the
> generic core code. If the interrupt does not arrive then you can assume
> within the predicted time then you can assume that the flood stopped and
> invoke halt or whatever.
>
> That avoids all of that 'tunable and tweakable' x86 specific hackery and
> utilizes common functionality which is mostly there already.
here is some sample code. Poll for a while before enter halt in
cpuidle_enter_state()
If I get a reschedule event, then don't try to enter halt.  (I hope this
is the right direction as Peter mentioned in another email)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
                target_state = &drv->states[index];
        }

+#ifdef CONFIG_PARAVIRT
+       paravirt_idle_poll();
+
+       if (need_resched())
+               return -EBUSY;
+#endif
+
        /* Take note of the planned idle state. */
        sched_idle_set_state(target_state);




thanks,

Quan
Alibaba Cloud

From 1584173435729790209@xxx Wed Nov 15 22:38:38 +0000 2017
X-GM-THRID: 1584141070007959176
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread