Date: Fri, 20 Jan 2023 12:00:01 +0000
Message-ID: <86r0vpmn5q.wl-maz@kernel.org>
From: Marc Zyngier
To: Shanker Donthineni
Cc: James Morse, Catalin Marinas, Will Deacon,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    linux-kernel@vger.kernel.org, Vikram Sethi, Zenghui Yu,
    Oliver Upton, Suzuki K Poulose, Ard Biesheuvel
Subject: Re: [PATCH] KVM: arm64: vgic: Fix soft lockup during VM teardown
References: <20230118022348.4137094-1-sdonthineni@nvidia.com>
    <863588njmt.wl-maz@kernel.org>
    <28061ceb-a7ce-0aca-a97d-8227dcfe6800@nvidia.com>
    <87bkmvdmna.wl-maz@kernel.org>
    <2e0c971a-0199-ff0d-c13c-d007d9f03122@nvidia.com>
    <86wn5imxm9.wl-maz@kernel.org>

On Fri, 20 Jan 2023 05:02:15 +0000,
Shanker Donthineni wrote:
>
> Hi Marc,
>
> On 1/19/23 08:01, Marc Zyngier wrote:
> >> On 1/19/23 01:11, Marc Zyngier wrote:
> >>> So you can see the VM being torn down while the vgic save sequence is
> >>> still in progress?
> >>>
> >>> If you can actually see that, then this is a much bigger bug than the
> >>> simple race you are describing, and we're missing a reference on the
> >>> kvm structure. This would be a *MAJOR* bug.
> >>>
> >> How do we know vGIC save sequence is in progress while VM is being
> >> teardown? I'm launching/terminating ~32 VMs in a loop to reproduce
> >> the issue.
> > Errr... *you* know when you are issuing the save ioctl, right? You
> > also know when you are terminating the VM (closing its fd or killing
> > the VMM).
> >
>
> Added debug statements to trace the code path, and tagged each log message
> with 'struct kvm *'. Attached the complete kernel log messages including
> debug messages.
>
> All 32 VMs launched, time 258s to 291s
> [ 258.519837] kvm_create_vm(1236) called kvm=ffff8000303e0000 --> 1st VM
> ...
> [ 291.801179] kvm_create_vm(1236) called kvm=ffff800057a60000 --> 32nd VM
>
> Test script inside VM issues poweroff command after sleeping 200sec.
>
> Working case kvm=ffff8000303e0000:
>
> $ cat gicv4-debug.txt | grep ffff8000303e0000
> [ 258.519837] kvm_create_vm(1236) called kvm=ffff8000303e0000
> [ 258.667101] vgic_v4_init(267) called kvm=ffff8000303e0000 doorbell=140(64)
> [ 517.942167] vgic_set_common_attr(263) called kvm=ffff8000303e0000
> [ 517.948415] vgic_v3_save_pending_tables(397) called kvm=ffff8000303e0000
> [ 517.955602] vgic_v3_save_pending_tables(448) called kvm=ffff8000303e0000
> [ 518.099696] kvm_vm_release(1374) called kvm=ffff8000303e0000
> [ 518.126833] vgic_v4_teardown(323) started kvm=ffff8000303e0000 doorbell=140(64)
> [ 518.134677] vgic_v4_teardown(333) finished kvm=ffff8000303e0000 doorbell=140(64)
>
> Not working case kvm=ffff80001e0a0000:
>
> $ cat gicv4-debug.txt | grep ffff80001e0a0000
> [ 277.684981] kvm_create_vm(1236) called kvm=ffff80001e0a0000
> [ 278.158511] vgic_v4_init(267) called kvm=ffff80001e0a0000 doorbell=20812(64)
> [ 545.079117] vgic_set_common_attr(263) called kvm=ffff80001e0a0000
> [ 545.085358] vgic_v3_save_pending_tables(397) called kvm=ffff80001e0a0000
> [ 545.092580] vgic_v3_save_pending_tables(448) called kvm=ffff80001e0a0000
> [ 545.099562] irq: irqd_set_activated: CPU49 IRQ20821 lost IRQD_IRQ_INPROGRESS old=0x10401400 new=0x10401600, expected=0x10441600 kvm=ffff80001e0a0000
> [ 545.113177] irq: irqd_set_activated: IRQD_IRQ_INPROGRESS set time [545.099561]
> [ 545.121454] irq: irqd_set_activated: IRQD_IRQ_INPROGRESS clr time [545.099562]
> [ 545.129755] irq: irqd_set_activated: CPU49 IRQ20826 lost IRQD_IRQ_INPROGRESS old=0x10441400 new=0x10441600, expected=0x10401600 kvm=ffff80001e0a0000
> [ 545.143365] irq: irqd_set_activated: IRQD_IRQ_INPROGRESS set time [545.129754]
> [ 545.151654] irq: irqd_set_activated: IRQD_IRQ_INPROGRESS clr time [545.129755]
> [ 545.163250] kvm_vm_release(1374) called kvm=ffff80001e0a0000
> [ 545.169204] vgic_v4_teardown(323) started kvm=ffff80001e0a0000 doorbell=20812(64)
>
> IRQD_IRQ_INPROGRESS is corrupted before calling kvm_vm_release(),

You keep missing my point. Yes, we have established the interrupt race
and have a way to fix it, let's move on...

What I am asking again is: is there any overlap between any vgic ioctl
and the teardown of the VM? Do you ever see kvm_vm_release() being
called before kvm_device_release()?

Because that's the overlap I've been talking about all along.

	M.

-- 
Without deviation from the norm, progress is not possible.
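
For what it's worth, the ordering question could be checked mechanically against the debug log. What follows is only a rough sketch, not something from the thread: it assumes a debug print is added to kvm_device_release() in the same "func(line) ... kvm=<pointer>" format as the existing messages (the posted log has no such line), and the helper names first_seen() and vm_release_before_device_release() are made up for illustration.

```python
import re
from collections import defaultdict

# Matches debug lines of the form used in the posted log, e.g.
#   [ 518.099696] kvm_vm_release(1374) called kvm=ffff8000303e0000
# The "irq: irqd_set_activated: ..." lines do not match and are skipped.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<func>\w+)\(\d+\).*?kvm=(?P<kvm>[0-9a-f]+)"
)


def first_seen(log_text):
    """Map each kvm pointer to {function name: first timestamp seen}."""
    seen = defaultdict(dict)
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            seen[m["kvm"]].setdefault(m["func"], float(m["ts"]))
    return seen


def vm_release_before_device_release(log_text):
    """Return kvm pointers where kvm_vm_release() precedes kvm_device_release().

    Assumption: kvm_device_release() logs in the same format as the other
    debug statements. VMs missing either message are not reported.
    """
    suspects = []
    for kvm, funcs in first_seen(log_text).items():
        vm = funcs.get("kvm_vm_release")
        dev = funcs.get("kvm_device_release")
        if vm is not None and dev is not None and vm < dev:
            suspects.append(kvm)
    return suspects
```

Feeding it the contents of gicv4-debug.txt, for example vm_release_before_device_release(open("gicv4-debug.txt").read()), would list any VM whose teardown started before its vgic device was released. An empty list would suggest no such overlap, at least at the granularity of these timestamps.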