Received: by 10.223.185.116 with SMTP id b49csp6486679wrg; Thu, 8 Mar 2018 08:12:38 -0800 (PST) X-Google-Smtp-Source: AG47ELvgnmQjQiH32lAqgHRCb7NAes0HM2T+RbWM2kpMKKmP8vNsBYDW6hvL5xAJ5lnMkJthPjMh X-Received: by 2002:a17:902:7509:: with SMTP id i9-v6mr25112270pll.220.1520525558610; Thu, 08 Mar 2018 08:12:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1520525558; cv=none; d=google.com; s=arc-20160816; b=ahq7798rt+fRVlpTZUYPyCMXnknB1CXYgrXNDfQ4KRUwwJRPGZ9cwkzXinEduw0nTy QEAHrq+sPGn2VZN8qeQHDSIxpAaG49MaAQ3nO2nTUDgiMNL35k4lOkVnI2Gyxthsjo9y Ih8XLJ5IGtkax2QiUh5zf+hN5JkFoAM9SemIx4ECGDoM6o2efRMgdq3tAaEosWQ96t28 am2RjZM5ZdHhw73YGVpYgyZt+0sh+g3rpd8KdCtp2sUq02rMwbKBR2MyOnymus/NCpso IphthnMdoQ2lJUOeHzTknG12K3klYdh8Fm8p9ZhWRzVVWk+b1TABmWm+omxLigM5E0Sq aRuw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject:arc-authentication-results; bh=xGmRgnUlCNe2ZaqxOyKOlj/m9sZSLqYkMoCswCDv5uM=; b=bSLIY86laWSQGtSvsXJq3uCL3gs/66Jzo7UfPg6lQcyP7YeSWnoPJ4VwFJWGqV9ZeG x04TNr9rn06dJWrh2ubFNAvmJY+nO9nQOqEOdkAzvzRg6GZe54fm0je+xg/v1+ERZwAB 7A8ANKXdAJ0eG4KibLK0639Hd1mhAp6R2VgDEB1Vlud/yC0h2sZJirws+XEdmrZoDY0l UTCXmkKQwZ3vrikrRjwaqo1aNy4MvLyeSxGdapW6bd11lPckrnEr48end375jNPxSWfZ LC1hZZ5n8vaYj34kik5Vc7CRDRDw+5GoVQZ+28Sv38zFxrbU5rOZSXnwDW8A2VwmgOh7 lwUg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p25si13109677pgn.151.2018.03.08.08.12.23; Thu, 08 Mar 2018 08:12:38 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935903AbeCHQKA (ORCPT + 99 others); Thu, 8 Mar 2018 11:10:00 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:58342 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S935873AbeCHQJ7 (ORCPT ); Thu, 8 Mar 2018 11:09:59 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 85784F1205; Thu, 8 Mar 2018 16:09:58 +0000 (UTC) Received: from localhost.localdomain (ovpn-116-135.ams2.redhat.com [10.36.116.135]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 64BC911701C0; Thu, 8 Mar 2018 16:09:56 +0000 (UTC) Subject: Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling To: Marc Zyngier , Shunyong Yang References: <1520492490-7943-1-git-send-email-shunyong.yang@hxt-semitech.com> <9ad47673-068e-f732-d2ca-9c76a8fbdfbc@arm.com> <0a15633d-8944-cb9b-3e6b-b08ee5ec42b9@arm.com> Cc: Christoffer Dall , david.daney@cavium.com, ard.biesheuvel@linaro.org, will.deacon@arm.com, linux-kernel@vger.kernel.org, Joey Zheng , kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org From: Auger Eric Message-ID: <155512d1-b169-e59e-beed-6cf66de112d3@redhat.com> Date: Thu, 8 Mar 2018 17:09:55 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <0a15633d-8944-cb9b-3e6b-b08ee5ec42b9@arm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Thu, 08 Mar 2018 16:09:58 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Thu, 08 Mar 2018 16:09:58 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'eric.auger@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Marc, On 08/03/18 12:54, Marc Zyngier wrote: > On 08/03/18 09:49, Marc Zyngier wrote: >> [updated Christoffer's email address] >> >> Hi Shunyong, >> >> On 08/03/18 07:01, Shunyong Yang wrote: >>> When resampling irqfds is enabled, level interrupt should be >>> de-asserted when resampling happens. On page 4-47 of GIC v3 >>> specification IHI0069D, it said, >>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU >>> interface, the IRI changes the status of the interrupt to active >>> and pending if: >>> • It is an edge-triggered interrupt, and another edge has been >>> detected since the interrupt was acknowledged. >>> • It is a level-sensitive interrupt, and the level has not been >>> deasserted since the interrupt was acknowledged." >>> >>> GIC v2 specification IHI0048B.b has similar description on page >>> 3-42 for state machine transition. >>> >>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver >>> in samples/vfio-mdev) triggers a level interrupt, the status >>> transition in LR is pending-->active-->active and pending. >>> Then it will wait resampling to de-assert the interrupt. >>> >>> Current design of lr_signals_eoi_mi() will return false if state >>> in LR is not invalid(Inactive). It causes resampling will not happen >>> in mtty case. >> >> Let me rephrase this, and tell me if I understood it correctly: >> >> - A level interrupt is injected, activated by the guest (LR state=active) >> - guest exits, re-enters, (LR state=pending+active) >> - guest EOIs the interrupt (LR state=pending) >> - maintenance interrupt >> - we don't signal the resampling because we're not in an invalid state >> >> Is that correct? >> >> That's an interesting case, because it seems to invalidate some of the >> optimization that went in over a year ago. >> >> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields >> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state >> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation >> >> We could compare the value of the LR before the guest entry with >> the value at exit time, but we still could miss it if we have a >> transition such as P+A -> P -> A and assume a long enough propagation >> delay for the maintenance interrupt (which is very likely). >> >> In essence, we have lost the benefit of EISR, which was to give us a >> way to deal with asynchronous signalling. >> >>> >>> This will cause interrupt fired continuously to guest even 8250 IIR >>> has no interrupt. When 8250's interrupt is configured in shared mode, >>> it will pass interrupt to other drivers to handle. However, there >>> is no other driver involved. Then, a "nobody cared" kernel complaint >>> occurs. >>> >>> / # cat /dev/ttyS0 >>> [ 4.826836] random: crng init done >>> [ 6.373620] irq 41: nobody cared (try booting with the "irqpoll" >>> option) >>> [ 6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4 >>> [ 6.378927] Hardware name: linux,dummy-virt (DT) >>> [ 6.380876] Call trace: >>> [ 6.381937] dump_backtrace+0x0/0x180 >>> [ 6.383495] show_stack+0x14/0x1c >>> [ 6.384902] dump_stack+0x90/0xb4 >>> [ 6.386312] __report_bad_irq+0x38/0xe0 >>> [ 6.387944] note_interrupt+0x1f4/0x2b8 >>> [ 6.389568] handle_irq_event_percpu+0x54/0x7c >>> [ 6.391433] handle_irq_event+0x44/0x74 >>> [ 6.393056] handle_fasteoi_irq+0x9c/0x154 >>> [ 6.394784] generic_handle_irq+0x24/0x38 >>> [ 6.396483] __handle_domain_irq+0x60/0xb4 >>> [ 6.398207] gic_handle_irq+0x98/0x1b0 >>> [ 6.399796] el1_irq+0xb0/0x128 >>> [ 6.401138] _raw_spin_unlock_irqrestore+0x18/0x40 >>> [ 6.403149] __setup_irq+0x41c/0x678 >>> [ 6.404669] request_threaded_irq+0xe0/0x190 >>> [ 6.406474] univ8250_setup_irq+0x208/0x234 >>> [ 6.408250] serial8250_do_startup+0x1b4/0x754 >>> [ 6.410123] serial8250_startup+0x20/0x28 >>> [ 6.411826] uart_startup.part.21+0x78/0x144 >>> [ 6.413633] uart_port_activate+0x50/0x68 >>> [ 6.415328] tty_port_open+0x84/0xd4 >>> [ 6.416851] uart_open+0x34/0x44 >>> [ 6.418229] tty_open+0xec/0x3c8 >>> [ 6.419610] chrdev_open+0xb0/0x198 >>> [ 6.421093] do_dentry_open+0x200/0x310 >>> [ 6.422714] vfs_open+0x54/0x84 >>> [ 6.424054] path_openat+0x2dc/0xf04 >>> [ 6.425569] do_filp_open+0x68/0xd8 >>> [ 6.427044] do_sys_open+0x16c/0x224 >>> [ 6.428563] SyS_openat+0x10/0x18 >>> [ 6.429972] el0_svc_naked+0x30/0x34 >>> [ 6.431494] handlers: >>> [ 6.432479] [<000000000e9fb4bb>] serial8250_interrupt >>> [ 6.434597] Disabling IRQ #41 >>> >>> This patch changes the lr state condition in lr_signals_eoi_mi() from >>> invalid(Inactive) to active and pending to avoid this. >>> >>> I am not sure about the original design of the condition of >>> invalid(active). So, This RFC is sent out for comments. >>> >>> Cc: Joey Zheng >>> Signed-off-by: Shunyong Yang >>> --- >>> virt/kvm/arm/vgic/vgic-v2.c | 4 ++-- >>> virt/kvm/arm/vgic/vgic-v3.c | 4 ++-- >>> 2 files changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c >>> index e9d840a75e7b..740ee9a5f551 100644 >>> --- a/virt/kvm/arm/vgic/vgic-v2.c >>> +++ b/virt/kvm/arm/vgic/vgic-v2.c >>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) >>> >>> static bool lr_signals_eoi_mi(u32 lr_val) >>> { >>> - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && >>> - !(lr_val & GICH_LR_HW); >>> + return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) && >> >> That feels very wrong. You're now signalling the resampling in both >> invalid and pending+active, and the latter state doesn't mean you've >> EOIed anything. You're now over-signalling, and signalling the >> wrong event. >> >>> + (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); >>> } >>> >>> /* >>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c >>> index 6b329414e57a..43111bba7af9 100644 >>> --- a/virt/kvm/arm/vgic/vgic-v3.c >>> +++ b/virt/kvm/arm/vgic/vgic-v3.c >>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) >>> >>> static bool lr_signals_eoi_mi(u64 lr_val) >>> { >>> - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && >>> - !(lr_val & ICH_LR_HW); >>> + return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) && >>> + (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); >>> } >>> >>> void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) >>> >> >> Assuming I understand the issue correctly, I cannot really see how >> to solve this without reintroducing EISR, which sucks majorly. >> >> I'll try to cook something shortly and we can all have a good >> fight about how crap this is. > > Here's what I came up with. I don't really like it, but that's > the least invasive this I could come up with. Please let me > know if that helps with your test case. Note that I have only > boot-tested this on a sample of 1 machine, so I don't expect this > to be perfect. > > Also, any guideline on how to reproduce this would be much appreciated. > I never used this mdev/mtty thing, so please bear with me. > > Thanks, > > M. > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17 00:00:00 2001 > From: Marc Zyngier > Date: Thu, 8 Mar 2018 11:14:06 +0000 > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to guess EOI MI > status > > We so far rely on the LR state to decide whether the guest has > EOI'd a level interrupt or not. While this looks like a good > idea on the surface, it leads to a couple of annoying corner > cases: > > Example 1: (P = Pending, A = Active, MI = Maintenance Interrupt) > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P -> MI > > The state is now pending, we've really EOI'd the interrupt, and > yet lr_signals_eoi_mi() returns false, since the state is not 0. > The result is that we won't signal anything on the corresponding > irqfd, which people complain about. Meh. > > Example 2: > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI fires In that case aren't we acking the same IRQ occurence twice? > > Same issue: state isn't 0, and nothing happens. > > The core of the problem is that we can't decide on whether an > interrupt has been EOId by just looking at the LR if we ever > want to support the P+A state, as things do change behind our back. > > An alternative to dropping P+A is to bring back our friend EISR, > which indicates which LRs have generated a MI. Instead of dragging > the state around like we used to do, use it to clear the EOI bit > from the in-memory copy, and use that as a predicate to find out > if it fired or not. > > Signed-off-by: Marc Zyngier > --- > virt/kvm/arm/hyp/vgic-v2-sr.c | 8 ++++++++ > virt/kvm/arm/hyp/vgic-v3-sr.c | 6 ++++++ > virt/kvm/arm/vgic/vgic-v2.c | 3 +-- > virt/kvm/arm/vgic/vgic-v3.c | 3 +-- > 4 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c > index 4fe6e797e8b3..475cb2d7fd33 100644 > --- a/virt/kvm/arm/hyp/vgic-v2-sr.c > +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c > @@ -43,6 +43,11 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) > struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; > int i; > u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs; > + u64 eisr; > + > + eisr = readl_relaxed(base + GICH_EISR0); > + if (unlikely(used_lrs > 32)) > + eisr |= (u64)readl_relaxed(base + GICH_EISR1) << 32; > > for (i = 0; i < used_lrs; i++) { > if (cpu_if->vgic_elrsr & (1UL << i)) > @@ -50,6 +55,9 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) > else > cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4)); > > + if ((cpu_if->vgic_lr[i] & GICH_LR_EOI) && !(eisr & (1UL << i))) > + cpu_if->vgic_lr[i] &= ~GICH_LR_EOI; > + > writel_relaxed(0, base + GICH_LR0 + (i * 4)); > } > } > diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c > index b89ce5432214..2ce63d6740b0 100644 > --- a/virt/kvm/arm/hyp/vgic-v3-sr.c > +++ b/virt/kvm/arm/hyp/vgic-v3-sr.c > @@ -223,8 +223,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) > if (used_lrs) { > int i; > u32 nr_pre_bits; > + u32 eisr; > > cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2); > + eisr = read_gicreg(ICH_EISR_EL2); > > write_gicreg(0, ICH_HCR_EL2); > val = read_gicreg(ICH_VTR_EL2); > @@ -236,6 +238,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) > else > cpu_if->vgic_lr[i] = __gic_v3_get_lr(i); > > + if ((cpu_if->vgic_lr[i] & ICH_LR_EOI) && > + !(eisr & (1 << i))) > + cpu_if->vgic_lr[i] &= ~ICH_LR_EOI; > + > __gic_v3_set_lr(0, i); > } > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c > index e9d840a75e7b..0be616e4ee29 100644 > --- a/virt/kvm/arm/vgic/vgic-v2.c > +++ b/virt/kvm/arm/vgic/vgic-v2.c > @@ -46,8 +46,7 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu) > > static bool lr_signals_eoi_mi(u32 lr_val) > { > - return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) && > - !(lr_val & GICH_LR_HW); > + return (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW); do we still need to test lr_val & GICH_LR_HW? Aren't LR_EOI and LR_HW antagonist (maybe it is a reminder of architected timer stuff?) > } > > /* > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c > index 6b329414e57a..c68352b8ed28 100644 > --- a/virt/kvm/arm/vgic/vgic-v3.c > +++ b/virt/kvm/arm/vgic/vgic-v3.c > @@ -35,8 +35,7 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu) > > static bool lr_signals_eoi_mi(u64 lr_val) > { > - return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) && > - !(lr_val & ICH_LR_HW); > + return (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW); > } > > void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) > Otherwise Looks good to me Thanks Eric