Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932588AbcDEKFW (ORCPT ); Tue, 5 Apr 2016 06:05:22 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:34457 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757532AbcDEKFS (ORCPT ); Tue, 5 Apr 2016 06:05:18 -0400 Subject: Re: [PATCH] kvm: x86: make lapic hrtimer pinned To: Luiz Capitulino , kvm@vger.kernel.org References: <20160404164607.09e306fa@redhat.com> Cc: linux-kernel@vger.kernel.org, rkrcmar@redhat.com, mtosatti@redhat.com, riel@redhat.com, bsd@redhat.com From: Paolo Bonzini Message-ID: <57038DD8.7070204@redhat.com> Date: Tue, 5 Apr 2016 12:05:12 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160404164607.09e306fa@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3445 Lines: 91 On 04/04/2016 22:46, Luiz Capitulino wrote: > When a vCPU runs on a nohz_full core, the hrtimer used by > the lapic emulation code can be migrated to another core. > When this happens, it's possible to observe milisecond > latency when delivering timer IRQs to KVM guests. > > The huge latency is mainly due to the fact that > apic_timer_fn() expects to run during a kvm exit. It > sets KVM_REQ_PENDING_TIMER and let it be handled on kvm > entry. However, if the timer fires on a different core, > we have to wait until the next kvm exit for the guest > to see KVM_REQ_PENDING_TIMER set. > > This problem became visible after commit 9642d18ee. This > commit changed the timer migration code to always attempt > to migrate timers away from nohz_full cores. While it's > discussable if this is correct/desirable (I don't think > it is), it's clear that the lapic emulation code has > a requirement on firing the hrtimer in the same core > where it was started. This is achieved by making the > hrtimer pinned. > > Lastly, note that KVM has code to migrate timers when a > vCPU is scheduled to run in different core. However, this > forced migration may fail. When this happens, we can have > the same problem. If we want 100% correctness, we'll have > to modify apic_timer_fn() to cause a kvm exit when it runs > on a different core than the vCPU. Not sure if this is > possible. > > Here's a reproducer for the issue being fixed: > > 1. Set all cores but core0 to be nohz_full cores > 2. Start a guest with a single vCPU > 3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs() > > You'll see that apic_timer_fn() will run in core0 while > kvm_inject_apic_timer_irqs() runs in a different core. If > you get both on core0, try running a program that takes 100% > of the CPU and pin it to core0 to force the vCPU out. > > Signed-off-by: Luiz Capitulino > --- > arch/x86/kvm/lapic.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 443d2a5..1a2da0e 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1369,7 +1369,7 @@ static void start_apic_timer(struct kvm_lapic *apic) > > hrtimer_start(&apic->lapic_timer.timer, > ktime_add_ns(now, apic->lapic_timer.period), > - HRTIMER_MODE_ABS); > + HRTIMER_MODE_ABS_PINNED); > > apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" > PRIx64 ", " > @@ -1402,7 +1402,7 @@ static void start_apic_timer(struct kvm_lapic *apic) > expire = ktime_add_ns(now, ns); > expire = ktime_sub_ns(expire, lapic_timer_advance_ns); > hrtimer_start(&apic->lapic_timer.timer, > - expire, HRTIMER_MODE_ABS); > + expire, HRTIMER_MODE_ABS_PINNED); > } else > apic_timer_expired(apic); > > @@ -1868,7 +1868,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu) > apic->vcpu = vcpu; > > hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, > - HRTIMER_MODE_ABS); > + HRTIMER_MODE_ABS_PINNED); > apic->lapic_timer.timer.function = apic_timer_fn; > > /* > @@ -2003,7 +2003,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu) > > timer = &vcpu->arch.apic->lapic_timer.timer; > if (hrtimer_cancel(timer)) > - hrtimer_start_expires(timer, HRTIMER_MODE_ABS); > + hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED); > } > > /* > Queued for 4.6.0-rc3, thanks. Paolo