Date: Wed, 18 Jan 2023 11:54:02 +0000
Message-ID: <863588njmt.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: James Morse <james.morse@arm.com>,
    Catalin Marinas <catalin.marinas@arm.com>,
    Will Deacon <will@kernel.org>,
    linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev,
    kvmarm@lists.cs.columbia.edu,
    linux-kernel@vger.kernel.org,
    Vikram Sethi <vsethi@nvidia.com>,
    Zenghui Yu <yuzenghui@huawei.com>,
    Oliver Upton <oliver.upton@linux.dev>,
    Suzuki K Poulose <suzuki.poulose@arm.com>
Subject: Re: [PATCH] KVM: arm64: vgic: Fix soft lockup during VM teardown
In-Reply-To: <20230118022348.4137094-1-sdonthineni@nvidia.com>
References: <20230118022348.4137094-1-sdonthineni@nvidia.com>
Shanker,

Please Cc all the KVM/arm64 reviewers when sending KVM/arm64 patches.

On Wed, 18 Jan 2023 02:23:48 +0000,
Shanker Donthineni <sdonthineni@nvidia.com> wrote:
> 
> We are seeing intermittent CPU soft lockups during virtual machine
> teardown on a system with GICv4 features enabled. The function
> __synchronize_hardirq() waits forever for IRQD_IRQ_INPROGRESS to be
> cleared, given the current implementation.
> 
> The CPU is stuck here for a long time, leading to the soft lockup:
>       while (irqd_irq_inprogress(&desc->irq_data))
>               cpu_relax();

Is it a soft lockup from which the system recovers, or a livelock
which leaves the system dead? What kernel version is this? Please
provide all the relevant context that could help analyse the issue.

> 
> Call trace from the lockup CPU:
>  [  87.238866] watchdog: BUG: soft lockup - CPU#37 stuck for 23s!
>  [  87.250025] CPU: 37 PID: 1031 Comm: qemu-system-aarch64
>  [  87.358397] Call trace:
>  [  87.360891]  __synchronize_hardirq+0x48/0x140
>  [  87.365343]  free_irq+0x138/0x424
>  [  87.368727]  vgic_v4_teardown+0xa4/0xe0
>  [  87.372649]  __kvm_vgic_destroy+0x18c/0x194
>  [  87.376922]  kvm_vgic_destroy+0x28/0x3c
>  [  87.380839]  kvm_arch_destroy_vm+0x24/0x44
>  [  87.385024]  kvm_destroy_vm+0x158/0x2c4
>  [  87.388943]  kvm_vm_release+0x6c/0x98
>  [  87.392681]  __fput+0x70/0x220
>  [  87.395800]  ____fput+0x10/0x20
>  [  87.399005]  task_work_run+0xb4/0x23c
>  [  87.402746]  do_exit+0x2bc/0x8a4
>  [  87.406042]  do_group_exit+0x34/0xb0
>  [  87.409693]  get_signal+0x878/0x8a0
>  [  87.413254]  do_notify_resume+0x138/0x1530
>  [  87.417440]  el0_svc+0xdc/0xf0
>  [  87.420559]  el0t_64_sync_handler+0xf0/0x11c
>  [  87.424919]  el0t_64_sync+0x18c/0x190
> 
> The IRQD_IRQ_INPROGRESS state is lost inside irq_domain_activate_irq(),
> which runs before free_irq() is called. Instrumenting the code
> confirmed that the IRQD state changes from 0x10401400 to 0x10441600
> instead of 0x10401600, causing the problem.
> 
> Call trace from irqd_set_activated():
>  [  78.983544] irqd_set_activated: lost IRQD_IRQ_INPROGRESS
>                old=0x10401400, new=0x10441600
>  [  78.992093] CPU: 19 PID: 1511 Comm: qemu-system-aarch64
>  [  79.008461] Call trace:
>  [  79.010956]  dump_backtrace.part.0+0xc8/0xe0
>  [  79.015328]  show_stack+0x18/0x54
>  [  79.018713]  dump_stack_lvl+0x64/0x7c
>  [  79.022459]  dump_stack+0x18/0x30
>  [  79.025842]  irq_domain_activate_irq+0x88/0x94
>  [  79.030385]  vgic_v3_save_pending_tables+0x260/0x29c
>  [  79.035463]  vgic_set_common_attr+0xac/0x23c
>  [  79.039826]  vgic_v3_set_attr+0x48/0x60
>  [  79.043742]  kvm_device_ioctl+0x120/0x19c
>  [  79.047840]  __arm64_sys_ioctl+0x42c/0xe00
>  [  79.052027]  invoke_syscall.constprop.0+0x50/0xe0
>  [  79.056835]  do_el0_svc+0x58/0x180
>  [  79.060308]  el0_svc+0x38/0xf0
>  [  79.063425]  el0t_64_sync_handler+0xf0/0x11c
>  [  79.067785]  el0t_64_sync+0x18c/0x190

Are these two traces an indication of concurrent events? Or are they
far apart?
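To make the lost update behind those numbers concrete, here is a
stand-alone user-space sketch that replays the reported interleaving
sequentially on a plain flags word. It is illustrative only, not
kernel code: the starting value 0x10401400 comes from the trace above,
and the two bit values are inferred from the old/new delta in that
trace; everything else is a simplification.

    #include <stdio.h>

    #define IRQD_ACTIVATED      0x00000200  /* inferred from the trace delta */
    #define IRQD_IRQ_INPROGRESS 0x00040000

    static unsigned int state = 0x10401400; /* "old=" value from the trace */

    int main(void)
    {
        /* CPU A, handle_irq_event(): sets INPROGRESS under desc->lock */
        state |= IRQD_IRQ_INPROGRESS;

        /* CPU B, irqd_set_activated(): snapshots the word WITHOUT
         * taking desc->lock, while INPROGRESS is still set */
        unsigned int snapshot = state;

        /* CPU A: handler done, clears INPROGRESS under desc->lock */
        state &= ~IRQD_IRQ_INPROGRESS;

        /* CPU B: writes back its stale read-modify-write result,
         * resurrecting INPROGRESS: 0x10401400 -> 0x10441600 */
        state = snapshot | IRQD_ACTIVATED;

        /* __synchronize_hardirq() now spins on this bit forever */
        printf("state=0x%08x, INPROGRESS %s\n", state,
               (state & IRQD_IRQ_INPROGRESS) ? "stuck" : "clear");
        return 0;
    }

Running this prints state=0x10441600 with INPROGRESS stuck, matching
the instrumented values in the report.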
> 
> irqreturn_t handle_irq_event(struct irq_desc *desc)
> {
> 	irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
> 	raw_spin_unlock(&desc->lock);
> 
> 	ret = handle_irq_event_percpu(desc);
> 
> 	raw_spin_lock(&desc->lock);
> 	irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
> }

How is this relevant to the trace? Do you see this function running
concurrently with the teardown? If it matters here, it must be a VPE
doorbell, right? But you claim that this is on a GICv4 platform, while
this would only affect GICv4.1... Or are you using GICv4.1?

> 
> In this particular failure, based on the traces, the two functions
> irqd_set_activated() and handle_irq_event() are concurrently modifying
> the IRQD state without both holding desc->lock. The irqd_set_activated()
> path reads 'state_use_accessors' between the set and the clear of
> IRQD_IRQ_INPROGRESS, and writes the modified value back after
> 'irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS)' has executed.
> 
> To fix the lockup, hold desc->lock when calling
> irq_domain_activate_irq() and irq_domain_deactivate_irq().

For that particular issue, the main problem is that we are abusing the
interrupt startup/teardown code. The locking is only a consequence of
this.

> 
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> ---
>  arch/arm64/kvm/vgic/vgic-v3.c | 6 ++++++
>  arch/arm64/kvm/vgic/vgic-v4.c | 4 ++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 2074521d4a8c..e6aa909fcbe2 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -353,22 +353,28 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>  static void unmap_all_vpes(struct vgic_dist *dist)
>  {
>  	struct irq_desc *desc;
> +	unsigned long flags;
>  	int i;
>  
>  	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
>  		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
> +		raw_spin_lock_irqsave(&desc->lock, flags);
>  		irq_domain_deactivate_irq(irq_desc_get_irq_data(desc));
> +		raw_spin_unlock_irqrestore(&desc->lock, flags);

I guess this is the guilty one, based on your analysis, and assuming
this is a v4.1 issue, not v4.

>  	}
>  }
>  
>  static void map_all_vpes(struct vgic_dist *dist)
>  {
>  	struct irq_desc *desc;
> +	unsigned long flags;
>  	int i;
>  
>  	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
>  		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
> +		raw_spin_lock_irqsave(&desc->lock, flags);
>  		irq_domain_activate_irq(irq_desc_get_irq_data(desc), false);
> +		raw_spin_unlock_irqrestore(&desc->lock, flags);
>  	}
>  }
>  
> diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
> index ad06ba6c9b00..a01b8313e82c 100644
> --- a/arch/arm64/kvm/vgic/vgic-v4.c
> +++ b/arch/arm64/kvm/vgic/vgic-v4.c
> @@ -139,8 +139,10 @@ static void vgic_v4_enable_vsgis(struct kvm_vcpu *vcpu)
>  		/* Transfer the full irq state to the vPE */
>  		vgic_v4_sync_sgi_config(vpe, irq);
>  		desc = irq_to_desc(irq->host_irq);
> +		raw_spin_lock(&desc->lock);
>  		ret = irq_domain_activate_irq(irq_desc_get_irq_data(desc),
>  					      false);
> +		raw_spin_unlock(&desc->lock);

This one looks wrong. The interrupt never fires on the host (that's
the whole point of this stuff). There isn't even a handler attached
to it. How can that result in a problem?
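For completeness, the same user-space model as above, with both
read-modify-write paths serialized by a mutex standing in for
desc->lock, shows why the patch closes the window in principle, even
though, as argued below, the preferred fix avoids these calls
altogether. This is a sketch under stated assumptions, not the
kernel's actual locking: the real handle_irq_event() is entered with
desc->lock already held, which the first lock/unlock pair here merely
models.

    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    #define IRQD_ACTIVATED      0x00000200
    #define IRQD_IRQ_INPROGRESS 0x00040000

    static pthread_mutex_t desc_lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned int state = 0x10401400;

    static void *handler_path(void *unused)  /* models handle_irq_event() */
    {
        (void)unused;
        pthread_mutex_lock(&desc_lock);
        state |= IRQD_IRQ_INPROGRESS;
        pthread_mutex_unlock(&desc_lock);

        /* ... handler body runs without the lock held ... */

        pthread_mutex_lock(&desc_lock);
        state &= ~IRQD_IRQ_INPROGRESS;
        pthread_mutex_unlock(&desc_lock);
        return NULL;
    }

    static void *activate_path(void *unused) /* models irqd_set_activated() */
    {
        (void)unused;
        pthread_mutex_lock(&desc_lock);
        state |= IRQD_ACTIVATED;  /* RMW now atomic w.r.t. the handler */
        pthread_mutex_unlock(&desc_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, handler_path, NULL);
        pthread_create(&b, NULL, activate_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* INPROGRESS can never be left set; 0x10401600 is the only
         * possible outcome regardless of thread interleaving */
        assert(state == (0x10401400 | IRQD_ACTIVATED));
        printf("final state: 0x%08x\n", state);
        return 0;
    }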
My take on the whole thing is that we should stop playing games with
the underlying interrupt infrastructure. How about the following hack,
which is only compile-tested. Let me know if that helps.

	M.

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 2074521d4a8c..2624963cb95b 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -350,26 +350,23 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
  * The deactivation of the doorbell interrupt will trigger the
  * unmapping of the associated vPE.
  */
-static void unmap_all_vpes(struct vgic_dist *dist)
+static void unmap_all_vpes(struct kvm *kvm)
 {
-	struct irq_desc *desc;
+	struct vgic_dist *dist = &kvm->arch.vgic;
 	int i;
 
-	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
-		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
-		irq_domain_deactivate_irq(irq_desc_get_irq_data(desc));
-	}
+	for (i = 0; i < dist->its_vm.nr_vpes; i++)
+		free_irq(dist->its_vm.vpes[i]->irq, kvm_get_vcpu(kvm, i));
 }
 
-static void map_all_vpes(struct vgic_dist *dist)
+static void map_all_vpes(struct kvm *kvm)
 {
-	struct irq_desc *desc;
+	struct vgic_dist *dist = &kvm->arch.vgic;
 	int i;
 
-	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
-		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
-		irq_domain_activate_irq(irq_desc_get_irq_data(desc), false);
-	}
+	for (i = 0; i < dist->its_vm.nr_vpes; i++)
+		WARN_ON(vgic_v4_request_vpe_irq(kvm_get_vcpu(kvm, i),
+						dist->its_vm.vpes[i]->irq));
 }
 
 /**
@@ -394,7 +391,7 @@ int vgic_v3_save_pending_tables(struct kvm *kvm)
 	 * and enabling of the doorbells have already been done.
 	 */
 	if (kvm_vgic_global_state.has_gicv4_1) {
-		unmap_all_vpes(dist);
+		unmap_all_vpes(kvm);
 		vlpi_avail = true;
 	}
 
@@ -444,7 +441,7 @@ int vgic_v3_save_pending_tables(struct kvm *kvm)
 
 out:
 	if (vlpi_avail)
-		map_all_vpes(dist);
+		map_all_vpes(kvm);
 
 	return ret;
 }
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index ad06ba6c9b00..a413718be92b 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -222,6 +222,11 @@ void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val)
 	*val = !!(*ptr & mask);
 }
 
+int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq)
+{
+	return request_irq(irq, vgic_v4_doorbell_handler, 0, "vcpu", vcpu);
+}
+
 /**
  * vgic_v4_init - Initialize the GICv4 data structures
  * @kvm:	Pointer to the VM being initialized
@@ -283,8 +288,7 @@ int vgic_v4_init(struct kvm *kvm)
 		irq_flags &= ~IRQ_NOAUTOEN;
 		irq_set_status_flags(irq, irq_flags);
 
-		ret = request_irq(irq, vgic_v4_doorbell_handler,
-				  0, "vcpu", vcpu);
+		ret = vgic_v4_request_vpe_irq(vcpu, irq);
 		if (ret) {
 			kvm_err("failed to allocate vcpu IRQ%d\n", irq);
 			/*
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 0c8da72953f0..23e280fa0a16 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -331,5 +331,6 @@ int vgic_v4_init(struct kvm *kvm);
 void vgic_v4_teardown(struct kvm *kvm);
 void vgic_v4_configure_vsgis(struct kvm *kvm);
 void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
+int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq);
 
 #endif

-- 
Without deviation from the norm, progress is not possible.