Message-ID: <54D3DF86.4020300@redhat.com>
Date: Thu, 05 Feb 2015 22:24:22 +0100
From: Paolo Bonzini
To: David Matlack
CC: "linux-kernel@vger.kernel.org", kvm list, riel@redhat.com, rkrcmar@redhat.com, Marcelo Tosatti
Subject: Re: [PATCH RFC] kvm: x86: add halt_poll module parameter
References: <1423152325-5094-1-git-send-email-pbonzini@redhat.com>

On 05/02/2015 21:39, David Matlack wrote:
>> This parameter helps a lot for latency-bound workloads [...]
>> KVM's performance here is usually around 30% of bare metal,
>> or 50% if you use cache=directsync or cache=writethrough.
>> With this patch performance reaches 60-65% of bare metal and, more
>> important, 99% of what you get if you use idle=poll in the guest.
>
> I used loopback TCP_RR and loopback memcache as benchmarks for halt
> polling. I saw very similar results as you (before: 40% bare metal,
> after: 60-65% bare metal and 95% of guest idle=poll).

Good that it also works for network!

> Reviewed-by: David Matlack
>
>>  arch/x86/include/asm/kvm_host.h |  1 +
>>  arch/x86/kvm/x86.c              | 28 ++++++++++++++++++++++++----
>>  include/linux/kvm_host.h        |  1 +
>>  virt/kvm/kvm_main.c             | 22 +++++++++++++++-------
>>  4 files changed, 41 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 848947ac6ade..a236e39cc385 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -655,6 +655,7 @@ struct kvm_vcpu_stat {
>>  	u32 irq_window_exits;
>>  	u32 nmi_window_exits;
>>  	u32 halt_exits;
>> +	u32 halt_successful_poll;
>>  	u32 halt_wakeup;
>>  	u32 request_irq_exits;
>>  	u32 irq_exits;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 1373e04e1f19..b7b20828f01c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -96,6 +96,9 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops);
>>  static bool ignore_msrs = 0;
>>  module_param(ignore_msrs, bool, S_IRUGO | S_IWUSR);
>>
>> +unsigned int halt_poll = 0;
>> +module_param(halt_poll, uint, S_IRUGO | S_IWUSR);
>
> Suggest encoding the units in the name. "halt_poll_cycles" in this case.

I left it out because of the parallel with ple_window/ple_gap.  But I
will call it "halt_poll_ns" in the next version.
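Something along these lines, though take it as a sketch only; the default
value (whether 0, i.e. polling disabled, or something else) is still open:

	/* Upper bound on how long to busy-poll on HLT, in nanoseconds. */
	unsigned int halt_poll_ns;
	module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);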
>> +
>>  unsigned int min_timer_period_us = 500;
>>  module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR);
>>
>> @@ -145,6 +148,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>>  	{ "irq_window", VCPU_STAT(irq_window_exits) },
>>  	{ "nmi_window", VCPU_STAT(nmi_window_exits) },
>>  	{ "halt_exits", VCPU_STAT(halt_exits) },
>> +	{ "halt_successful_poll", VCPU_STAT(halt_successful_poll) },
>>  	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
>>  	{ "hypercalls", VCPU_STAT(hypercalls) },
>>  	{ "request_irq", VCPU_STAT(request_irq_exits) },
>> @@ -5819,13 +5823,29 @@ void kvm_arch_exit(void)
>>  int kvm_emulate_halt(struct kvm_vcpu *vcpu)
>>  {
>>  	++vcpu->stat.halt_exits;
>> -	if (irqchip_in_kernel(vcpu->kvm)) {
>> -		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
>> -		return 1;
>> -	} else {
>> +	if (!irqchip_in_kernel(vcpu->kvm)) {
>>  		vcpu->run->exit_reason = KVM_EXIT_HLT;
>>  		return 0;
>>  	}
>> +
>> +	vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
>> +	if (halt_poll) {
>
> Would it be useful to poll in kvm_vcpu_block() for the benefit of all
> arch's?

Sure.  Especially if I use time instead of cycles.

>> +		u64 start, curr;
>> +		rdtscll(start);
>
> Why cycles instead of time?

People's love for rdtsc grows every time you tell them it's wrong...

>> +		do {
>> +			/*
>> +			 * This sets KVM_REQ_UNHALT if an interrupt
>> +			 * arrives.
>> +			 */
>> +			if (kvm_vcpu_check_block(vcpu) < 0) {
>> +				++vcpu->stat.halt_successful_poll;
>> +				break;
>> +			}
>> +			rdtscll(curr);
>> +		} while(!need_resched() && curr - start < halt_poll);
>
> I found that using need_resched() was not sufficient at preventing
> VCPUs from delaying their own progress. To test this try running with
> and without polling on a 2 VCPU VM, confined to 1 PCPU, that is running
> loopback TCP_RR in the VM. The problem goes away if you stop polling as
> soon as there are runnable threads on your cpu. (e.g. use
> "single_task_running()" instead of "!need_resched()"
> http://lxr.free-electrons.com/source/kernel/sched/core.c#L2398 ). This
> also guarantees polling only delays the idle thread.

Great, I'll include all of your suggestions in the next version of the
patch.

Paolo
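P.S. To double-check that I understood the single_task_running() suggestion,
here is a rough, untested sketch of how the polling loop could look once it
moves into kvm_vcpu_block() and counts nanoseconds instead of cycles.  The
details may well change in v2; halt_poll_ns is the provisional name, and
halt_successful_poll would have to be added to the other architectures'
kvm_vcpu_stat too:

	/* At the top of kvm_vcpu_block(), before the wait-queue loop. */
	ktime_t start, cur;

	start = cur = ktime_get();
	if (halt_poll_ns) {
		ktime_t stop = ktime_add_ns(start, halt_poll_ns);

		do {
			/*
			 * This sets KVM_REQ_UNHALT if an interrupt
			 * arrives.
			 */
			if (kvm_vcpu_check_block(vcpu) < 0) {
				++vcpu->stat.halt_successful_poll;
				break;
			}
			cur = ktime_get();
			/*
			 * Stop polling as soon as any other task is runnable
			 * on this CPU, so polling only ever delays the idle
			 * thread.
			 */
		} while (single_task_running() && ktime_before(cur, stop));
	}
	/*
	 * Then continue with the existing prepare_to_wait()/schedule() loop,
	 * which exits right away if the poll above already saw a wakeup.
	 */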