From: Wanpeng Li <wanpeng.li@hotmail.com>
Subject: Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment
To: David Matlack
Cc: Paolo Bonzini, kvm list, linux-kernel@vger.kernel.org
Date: Fri, 28 Aug 2015 13:30:49 +0800

On 8/28/15 12:25 AM, David Matlack wrote:
> On Thu, Aug 27, 2015 at 2:59 AM, Wanpeng Li wrote:
>> Hi David,
>>
>> On 8/26/15 1:19 AM, David Matlack wrote:
>>> Thanks for writing v2, Wanpeng.
>>>
>>> On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wrote:
>>>> There is a downside to halt_poll_ns: polling still happens for an
>>>> idle VCPU, which can waste CPU cycles. This patch adds the ability
>>>> to adjust halt_poll_ns dynamically.
>>> What testing have you done with these patches? Do you know if this
>>> removes the overhead of polling in idle VCPUs? Do we lose any of the
>>> performance from always polling?
>>>
>>>> There are two new kernel parameters for changing the halt_poll_ns:
>>>> halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
>>>> halt_poll_ns_max, controls the maximum halt_poll_ns; it is
>>>> internally rounded down to the closest multiple of
>>>> halt_poll_ns_grow. The shrink/grow matrix is suggested by David:
>>>>
>>>> if (poll successfully for interrupt): stay the same
>>>> else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
>>>> else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow
>>> The way you implemented this wasn't what I expected. I thought you
>>> would time the whole function (kvm_vcpu_block). But I like your
>>> approach better. It's simpler and [by inspection] does what we want.
>>
>> I see even more idle-vCPU overhead with this method than with always
>> halt-polling, so I brought back growing vcpu->halt_poll_ns when an
>> interrupt arrives and shrinking it when an idle VCPU is detected. The
>> performance looks good in v4.
> Why did this patch have a worse idle overhead than always poll?

I'm not sure. I made a mistake when I reported the kernel-build test;
the performance is also worse than always-poll with your method. I
think your method doesn't grow halt_poll_ns according to whether an
interrupt arrives.

Regards,
Wanpeng Li
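For reference, the shrink/grow policy quoted above maps onto roughly
the following shape (a sketch only; the helper name and the block_ns
and poll_successful parameters are illustrative, not taken from the
patch):

	/*
	 * Sketch of the commit-message policy: keep the current poll
	 * window on a successful poll, shrink it after a long block,
	 * grow it after a short one.
	 */
	static void adjust_halt_poll_ns(struct kvm_vcpu *vcpu, u64 block_ns,
					bool poll_successful)
	{
		if (poll_successful)
			return;				/* stay the same */

		if (block_ns > halt_poll_ns_max)
			shrink_halt_poll_ns(vcpu);	/* blocked too long */
		else
			grow_halt_poll_ns(vcpu);	/* short block: poll longer */
	}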
>
>> Regards,
>> Wanpeng Li
>>
>>
>>>> halt_poll_ns_shrink/ |                      |
>>>> halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
>>>> ---------------------+----------------------+-----------------------
>>>> < 1                  | = halt_poll_ns       | = 0
>>>> < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
>>>> otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink
>>> I was curious why you went with this approach rather than just the
>>> middle row, or just the last row. Do you think we'll want the extra
>>> flexibility?
>>>
>>>> Signed-off-by: Wanpeng Li
>>>> ---
>>>>   virt/kvm/kvm_main.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>   1 file changed, 64 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index 93db833..2a4962b 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -66,9 +66,26 @@
>>>>   MODULE_AUTHOR("Qumranet");
>>>>   MODULE_LICENSE("GPL");
>>>>
>>>> -static unsigned int halt_poll_ns;
>>>> +#define KVM_HALT_POLL_NS 500000
>>>> +#define KVM_HALT_POLL_NS_GROW 2
>>>> +#define KVM_HALT_POLL_NS_SHRINK 0
>>>> +#define KVM_HALT_POLL_NS_MAX 2000000
>>> The macros are not necessary. Also, hard-coding the numbers in the
>>> param definitions will make reading the comments above them easier.
>>>
>>>> +
>>>> +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
>>>>   module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
>>>>
>>>> +/* Default doubles per-vcpu halt_poll_ns. */
>>>> +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
>>>> +module_param(halt_poll_ns_grow, int, S_IRUGO);
>>>> +
>>>> +/* Default resets per-vcpu halt_poll_ns. */
>>>> +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
>>>> +module_param(halt_poll_ns_shrink, int, S_IRUGO);
>>>> +
>>>> +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */
>>> Ah, I misspoke before. I was thinking about round-trip latency. The
>>> latency of a single halt is reduced by about 5-7 us.
>>>
>>>> +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
>>>> +module_param(halt_poll_ns_max, int, S_IRUGO);
>>> We can remove halt_poll_ns_max. vcpu->halt_poll_ns can always start
>>> at zero and grow from there. Then we just need one module param to
>>> keep vcpu->halt_poll_ns from growing too large.
>>>
>>> [ It would make more sense to remove halt_poll_ns and keep
>>> halt_poll_ns_max, but since halt_poll_ns already exists in upstream
>>> kernels, we probably can't remove it. ]
>>>
>>>> +
>>>>   /*
>>>>    * Ordering of locks:
>>>>    *
>>>> @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
>>>>
>>>> +static unsigned int __grow_halt_poll_ns(unsigned int val)
>>>> +{
>>>> +	if (halt_poll_ns_grow < 1)
>>>> +		return halt_poll_ns;
>>>> +
>>>> +	val = min(val, halt_poll_ns_max);
>>>> +
>>>> +	if (val == 0)
>>>> +		return halt_poll_ns;
>>>> +
>>>> +	if (halt_poll_ns_grow < halt_poll_ns)
>>>> +		val *= halt_poll_ns_grow;
>>>> +	else
>>>> +		val += halt_poll_ns_grow;
>>>> +
>>>> +	return val;
>>>> +}
>>>> +
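To illustrate David's suggestion above (drop halt_poll_ns_max, start
vcpu->halt_poll_ns at zero, and let the existing halt_poll_ns parameter
act as the cap), the grow path could look roughly like this; the
10000 ns starting value is an assumption made up for this sketch, not
something from the thread:

	/* Sketch only, not the patch: grow from zero, capped by the
	 * existing halt_poll_ns module parameter. */
	static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
	{
		unsigned int val = vcpu->halt_poll_ns;

		if (val == 0)
			val = 10000;	/* assumed starting point */
		else
			val *= halt_poll_ns_grow;

		vcpu->halt_poll_ns = min(val, halt_poll_ns);
	}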
>>>> +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)
>>> minimum never gets used.
>>>
>>>> +{
>>>> +	if (modifier < 1)
>>>> +		return 0;
>>>> +
>>>> +	if (modifier < halt_poll_ns)
>>>> +		val /= modifier;
>>>> +	else
>>>> +		val -= modifier;
>>>> +
>>>> +	return val;
>>>> +}
>>>> +
>>>> +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
>>> These wrappers aren't necessary.
>>>
>>>> +{
>>>> +	vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
>>>> +}
>>>> +
>>>> +static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
>>>> +			halt_poll_ns_shrink, halt_poll_ns);
>>>> +}
>>>> +
>>>>   static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>>>>   {
>>>>   	if (kvm_arch_vcpu_runnable(vcpu)) {
>>>> @@ -1954,6 +2013,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>>>>   			break;
>>>>
>>>>   		waited = true;
>>>> +		if (vcpu->halt_poll_ns > halt_poll_ns_max)
>>>> +			shrink_halt_poll_ns(vcpu);
>>>> +		else
>>>> +			grow_halt_poll_ns(vcpu);
>>> Shouldn't this go after the loop, and before "out:", in case we
>>> schedule more than once? You can gate it on "if (waited)" so it only
>>> runs if we actually scheduled.
>>>
>>>>   		schedule();
>>>>   	}
>>>>
>>>> --
>>>> 1.9.1
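On David's last point, moving the adjustment after the wait loop and
gating it on "waited" would look roughly like this (a sketch against
the quoted hunk; the surrounding kvm_vcpu_block() structure, including
prepare_to_wait()/finish_wait() and the out: label, is assumed from
context rather than shown in the quoted diff):

	for (;;) {
		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);

		if (kvm_vcpu_check_block(vcpu) < 0)
			break;

		waited = true;
		schedule();
	}
	finish_wait(&vcpu->wq, &wait);

	/* Adjust once per kvm_vcpu_block() call, and only if the vCPU
	 * actually blocked, no matter how many times it re-slept. */
	if (waited) {
		if (vcpu->halt_poll_ns > halt_poll_ns_max)
			shrink_halt_poll_ns(vcpu);
		else
			grow_halt_poll_ns(vcpu);
	}
out:
	...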