Date: Wed, 30 Aug 2017 12:31:41 +0200
From: Andrew Jones <drjones@redhat.com>
To: Christoffer Dall
Cc: Marc Zyngier, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Thomas Gleixner, Jason Cooper, Eric Auger, Shanker Donthineni, Mark Rutland, Shameerali Kolothum Thodi
Subject: Re: [PATCH v3 51/59] KVM: arm/arm64: GICv4: Add doorbell interrupt handling
Message-ID: <20170830103141.jarquijxzntv2feo@kamzik.brq.redhat.com>
References: <20170731172637.29355-1-marc.zyngier@arm.com> <20170731172637.29355-52-marc.zyngier@arm.com> <8a35ee18-85a5-168b-b3d2-b9f51390b475@arm.com> <20170828181850.GL24649@cbox>
In-Reply-To: <20170828181850.GL24649@cbox>

On Mon, Aug 28, 2017 at 08:18:50PM +0200, Christoffer Dall wrote:
> On Fri, Aug 04, 2017 at 08:44:04AM +0100, Marc Zyngier wrote:
> > On 31/07/17 18:26, Marc Zyngier wrote:
> > > When a vPE is not running, a VLPI being made pending results in a
> > > doorbell interrupt being delivered. Let's handle this interrupt
> > > and update the pending_last flag that indicates that VLPIs are
> > > pending. The corresponding vcpu is also kicked into action.
> > >
> > > Signed-off-by: Marc Zyngier
> > > ---
> > >  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 34 insertions(+)
> > >
> > > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > > index 534d3051a078..6af3cde6d7d4 100644
> > > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > > @@ -21,6 +21,19 @@
> > >
> > >  #include "vgic.h"
> > >
> > > +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> > > +{
> > > +	struct kvm_vcpu *vcpu = info;
> > > +
> > > +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > > +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > +		kvm_vcpu_kick(vcpu);
> > > +	}
> >
> > This code is so obviously broken that I completely overlooked it.
> >
> > If we take a doorbell interrupt, then it means nothing was
> > otherwise pending (because we'd have been kicked out of the blocking
> > state, and would have masked the doorbell). So checking for pending
> > interrupts is pointless.
> >
> > Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
> > lock. If we take a doorbell interrupt while injecting a virtual
> > interrupt (from userspace, for example) on the same CPU, we end up
> > in deadlock land. This would be solved by Christoffer's latest
> > crop of timer patches, but there is no point getting there in the
> > first place.
> >
> > The patchlet below solves it:
> >
> > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > index 15feb1151797..48e4d6ebeaa8 100644
> > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > @@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> >  {
> >  	struct kvm_vcpu *vcpu = info;
> >
> > -	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > -		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > -		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > -		kvm_vcpu_kick(vcpu);
> > -	}
> > +	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > +	kvm_vcpu_kick(vcpu);
>
> I don't think you need the request and kick, because if you're getting
> this doorbell, doesn't that also mean that the VCPU is not running in
> the guest and you simply need to make sure the VCPU thread gets
> scheduled again, so you could call kvm_vcpu_wake_up() instead.

While we do that in kvm_timer_inject_irq_work(), which is scheduled
from the same function that vgic_v4_doorbell_handler() would be
enabled from (kvm_vcpu_block->kvm_arch_vcpu_blocking), just a wake-up
isn't sufficient in this case.

> Unless the request is there to ensure proper memory barriers around
> setting pending_last?

Right, unlike pending timers, we need the barriers in this handler.
Pending timers are safe because their pending test compares state set
by the VCPU thread itself to state the VCPU thread acquires by reading
a host sysreg itself. IOW, handlers making "set vcpu timer pending
requests" don't need to do anything but wake up the VCPU. Handlers
that set some sort of IRQ pending state, like
vgic_v4_doorbell_handler(), do need to worry about the visibility of
that state, though.

> In that case, is the read barrier taken care of by prepare_to_swait
> in kvm_vcpu_block()?

Thanks for the bug report :-)

There's a barrier, but it's not properly paired.
Currently we have

  VCPU                                        handler
  ----                                        -------
  for (;;) {
    WRITE_ONCE(task->state, INTERRUPTIBLE);
                                              pending=true;
    smp_mb();
                                              smp_wmb(); // kvm_make_request()
    if (pending) {
                                              WRITE_ONCE(vcpu->state, NORMAL);
      ... stop waiting ...
    }
    schedule();
  }

Proper barrier use with swait/swake should instead look like this

  VCPU                                        handler
  ----                                        -------
  for (;;) {
    WRITE_ONCE(task->state, INTERRUPTIBLE);
    smp_mb();
    if (READ_ONCE(task->state) == NORMAL) {
                                              pending=true;
      smp_rmb();
                                              smp_wmb();
      if (pending)
                                              WRITE_ONCE(vcpu->state, NORMAL);
        ... stop waiting ...
      else
        continue;
    }
    schedule();
  }

But checking task state adds complexity and would only cover a small
window anyway (the window between prepare_to_swait() and
kvm_vcpu_check_block()). We need to cover a larger window; for this
particular case it's from kvm_arch_vcpu_blocking() to
kvm_vcpu_check_block(). Better use of VCPU requests is the answer:

  VCPU                                        handler
  ----                                        -------
  kvm_arch_vcpu_blocking();
  for (;;) {
    prepare_to_swait();
    if (test_bit(IRQ_PENDING)) {
                                              pending=true;
      smp_rmb();
                                              smp_wmb();
      if (pending)
                                              set_bit(IRQ_PENDING);
        ... stop waiting ...
      else
        continue;
    }
    schedule();
  }

The handler side is covered by kvm_make_request() and the vcpu kick,
so this patch looks good. However, we need a patch to
kvm_arch_vcpu_runnable() to fix the VCPU side:

 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
-		&& !v->arch.power_off && !v->arch.pause);
+	if (v->arch.power_off || v->arch.pause)
+		return false;
+
+	if (kvm_test_request(KVM_REQ_IRQ_PENDING, v)) {
+		smp_rmb();
+		return true;
+	}
+
+	return false;
 }

We can keep the kvm_vgic_vcpu_pending_irq() check, after the
IRQ_PENDING check, if there are cases where IRQ_PENDING wouldn't be
set but kvm_vgic_vcpu_pending_irq() would return true. Either way, we
also get a bit of an optimization with this fix.

I'll think/test some more, and then send a patch.

Thanks,
drew