From: David Matlack
Date: Thu, 5 Feb 2015 12:39:01 -0800
Subject: Re: [PATCH RFC] kvm: x86: add halt_poll module parameter
To: Paolo Bonzini
Cc: "linux-kernel@vger.kernel.org", kvm list, riel@redhat.com, rkrcmar@redhat.com, Marcelo Tosatti
In-Reply-To: <1423152325-5094-1-git-send-email-pbonzini@redhat.com>

On Thu, Feb 5, 2015 at 8:05 AM, Paolo Bonzini wrote:
> This patch introduces a new module parameter for the KVM module; when it
> is present, KVM attempts a bit of polling on every HLT before scheduling
> itself out via kvm_vcpu_block.

Awesome. I have been working on the same feature in parallel so I have
some suggestions :)

>
> This parameter helps a lot for latency-bound workloads---in particular
> I tested it with O_DSYNC writes with a battery-backed disk in the host.
> In this case, writes are fast (because the data doesn't have to go all
> the way to the platters) but they cannot be merged by either the host or
> the guest. KVM's performance here is usually around 30% of bare metal,
> or 50% if you use cache=directsync or cache=writethrough (these
> parameters avoid that the guest sends pointless flush requests, and
> at the same time they are not slow because of the battery-backed cache).
> The bad performance happens because on every halt the host CPU decides
> to halt itself too.
> When the interrupt comes, the vCPU thread is then
> migrated to a new physical CPU, and in general the latency is horrible
> because the vCPU thread has to be scheduled back in.
>
> With this patch performance reaches 60-65% of bare metal and, more
> important, 99% of what you get if you use idle=poll in the guest. This

I used loopback TCP_RR and loopback memcache as benchmarks for halt
polling. My results were very similar to yours (before: 40% of bare
metal, after: 60-65% of bare metal and 95% of guest idle=poll).

> means that the tunable gets rid of this particular bottleneck, and more
> work can be done to improve performance in the kernel or QEMU.
>
> Of course there is some price to pay; every time an otherwise idle vCPUs
> is interrupted by an interrupt, it will poll unnecessarily and thus
> impose a little load on the host. The above results were obtained with
> a mostly random value of the parameter (2000000), and the load was around
> 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
>
> The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
> that can be used to tune the parameter. It counts how many HLT
> instructions received an interrupt during the polling period; each
> successful poll avoids that Linux schedules the VCPU thread out and back
> in, and may also avoid a likely trip to C1 and back for the physical CPU.
>
> While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
> Of these halts, almost all are failed polls. During the benchmark,
> instead, basically all halts end within the polling period, except a more
> or less constant stream of 50 per second coming from vCPUs that are not
> running the benchmark. The wasted time is thus very low. Things may
> be slightly different for Windows VMs, which have a ~10 ms timer tick.
>
> The effect is also visible on Marcelo's recently-introduced latency
> test for the TSC deadline timer.
> Though of course a non-RT kernel has
> awful latency bounds, the latency of the timer is around 8000-10000 clock
> cycles compared to 20000-120000 without setting halt_poll. For the TSC
> deadline timer, thus, the effect is both a smaller average latency and
> a smaller variance.
>
> Signed-off-by: Paolo Bonzini
> ---

Reviewed-by: David Matlack

>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 28 ++++++++++++++++++++++++----
>  include/linux/kvm_host.h        |  1 +
>  virt/kvm/kvm_main.c             | 22 +++++++++++++++-------
>  4 files changed, 41 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 848947ac6ade..a236e39cc385 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -655,6 +655,7 @@ struct kvm_vcpu_stat {
>  	u32 irq_window_exits;
>  	u32 nmi_window_exits;
>  	u32 halt_exits;
> +	u32 halt_successful_poll;
>  	u32 halt_wakeup;
>  	u32 request_irq_exits;
>  	u32 irq_exits;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1373e04e1f19..b7b20828f01c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -96,6 +96,9 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops);
>  static bool ignore_msrs = 0;
>  module_param(ignore_msrs, bool, S_IRUGO | S_IWUSR);
>
> +unsigned int halt_poll = 0;
> +module_param(halt_poll, uint, S_IRUGO | S_IWUSR);

Suggest encoding the units in the name. "halt_poll_cycles" in this case.
> +
>  unsigned int min_timer_period_us = 500;
>  module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR);
>
> @@ -145,6 +148,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	{ "irq_window", VCPU_STAT(irq_window_exits) },
>  	{ "nmi_window", VCPU_STAT(nmi_window_exits) },
>  	{ "halt_exits", VCPU_STAT(halt_exits) },
> +	{ "halt_successful_poll", VCPU_STAT(halt_successful_poll) },
>  	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
>  	{ "hypercalls", VCPU_STAT(hypercalls) },
>  	{ "request_irq", VCPU_STAT(request_irq_exits) },
> @@ -5819,13 +5823,29 @@ void kvm_arch_exit(void)
>  int kvm_emulate_halt(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.halt_exits;
> -	if (irqchip_in_kernel(vcpu->kvm)) {
> -		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> -		return 1;
> -	} else {
> +	if (!irqchip_in_kernel(vcpu->kvm)) {
>  		vcpu->run->exit_reason = KVM_EXIT_HLT;
>  		return 0;
>  	}
> +
> +	vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> +	if (halt_poll) {

Would it be useful to poll in kvm_vcpu_block() for the benefit of all
architectures?

> +		u64 start, curr;
> +		rdtscll(start);

Why cycles instead of time?

> +		do {
> +			/*
> +			 * This sets KVM_REQ_UNHALT if an interrupt
> +			 * arrives.
> +			 */
> +			if (kvm_vcpu_check_block(vcpu) < 0) {
> +				++vcpu->stat.halt_successful_poll;
> +				break;
> +			}
> +			rdtscll(curr);
> +		} while(!need_resched() && curr - start < halt_poll);

I found that using need_resched() was not sufficient to prevent VCPUs
from delaying their own progress. To test this, try running with and
without polling on a 2 VCPU VM, confined to 1 PCPU, that is running
loopback TCP_RR in the VM. The problem goes away if you stop polling as
soon as there are other runnable threads on your cpu (e.g. use
"single_task_running()" instead of "!need_resched()", see
http://lxr.free-electrons.com/source/kernel/sched/core.c#L2398 ). This
also guarantees polling only delays the idle thread.
> +	}
> +
> +	return 1;
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_halt);
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8a82838034f1..1519d48d956f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -584,6 +584,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
>  unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
>  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
>
> +int kvm_vcpu_check_block(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>  int kvm_vcpu_yield_to(struct kvm_vcpu *target);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0c281760a1c5..825fc3ec0509 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1813,6 +1813,20 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
>  }
>  EXPORT_SYMBOL_GPL(mark_page_dirty);
>
> +int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_arch_vcpu_runnable(vcpu)) {
> +		kvm_make_request(KVM_REQ_UNHALT, vcpu);
> +		return -EINTR;
> +	}
> +	if (kvm_cpu_has_pending_timer(vcpu))
> +		return -EINTR;
> +	if (signal_pending(current))
> +		return -EINTR;
> +
> +	return 0;
> +}
> +
>  /*
>   * The vCPU has executed a HLT instruction with in-kernel mode enabled.
>   */
> @@ -1823,13 +1837,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	for (;;) {
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>
> -		if (kvm_arch_vcpu_runnable(vcpu)) {
> -			kvm_make_request(KVM_REQ_UNHALT, vcpu);
> -			break;
> -		}
> -		if (kvm_cpu_has_pending_timer(vcpu))
> -			break;
> -		if (signal_pending(current))
> +		if (kvm_vcpu_check_block(vcpu) < 0)
>  			break;
>
>  		schedule();
> --
> 1.8.3.1
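
For illustration only, here is a minimal userspace sketch of the loop shape
raised in the review comments: a time-based budget instead of TSC cycles, and
a "stop as soon as something else is runnable" check in place of
!need_resched(). Every name in it (halt_poll_sketch, now_ns, the predicate
callbacks and the demo helpers) is a hypothetical stand-in, not a KVM or
scheduler API. In the kernel the two checks would presumably be
kvm_vcpu_check_block() and something like single_task_running(); that mapping
is an assumption, not what the patch does.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type standing in for the kernel-side checks. */
typedef bool (*poll_pred)(void *ctx);

/* Monotonic clock in nanoseconds (userspace stand-in for a kernel clock). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll for at most halt_poll_ns nanoseconds.
 * Returns true if a wake event was seen while polling (a "successful
 * poll"), false if the caller should fall back to blocking.
 * Polling stops early once only_task_running() reports false, so the
 * poller only ever delays an otherwise idle CPU.
 */
static bool halt_poll_sketch(uint64_t halt_poll_ns, poll_pred wake_pending,
			     poll_pred only_task_running, void *ctx)
{
	uint64_t start;

	assert(wake_pending != NULL && only_task_running != NULL);

	if (!halt_poll_ns)
		return false;	/* polling disabled */

	start = now_ns();
	do {
		if (wake_pending(ctx))
			return true;
	} while (only_task_running(ctx) && now_ns() - start < halt_poll_ns);

	return false;
}

/* Demo predicate: reports a wake event on the Nth check (ctx is an int *). */
static bool wake_after_n_checks(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}

/* Demo predicates for "nothing else runnable" / "something else runnable". */
static bool always_alone(void *ctx) { (void)ctx; return true; }
static bool never_alone(void *ctx)  { (void)ctx; return false; }
```

With never_alone() the loop gives up after a single check, which is the
behaviour the 2 VCPU on 1 PCPU test is meant to exercise.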