Date: Wed, 20 Dec 2023 14:16:40 +0000
Message-ID: <87v88tt0vr.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
Subject: Re: [PATCH v4 2/3] kvm: arm64: set io memory s2 pte as normalnc for vfio pci devices
In-Reply-To: <20231218090719.22250-3-ankita@nvidia.com>
References: <20231218090719.22250-1-ankita@nvidia.com> <20231218090719.22250-3-ankita@nvidia.com>

On Mon, 18 Dec 2023 09:07:18 +0000, wrote:
>
> From: Ankit Agrawal
>
> To provide VM with the ability to get device IO memory with NormalNC
> property, map device MMIO in KVM for ARM64 at stage2 as NormalNC.
> Having NormalNC S2 default puts guests in control (based on [1],
> "Combining stage 1 and stage 2 memory type attributes") of device
> MMIO regions memory mappings. The rules are summarized below:
> ([(S1) - stage1], [(S2) - stage 2])
>
> S1           |  S2           | Result
> NORMAL-WB    |  NORMAL-NC    | NORMAL-NC
> NORMAL-WT    |  NORMAL-NC    | NORMAL-NC
> NORMAL-NC    |  NORMAL-NC    | NORMAL-NC
> DEVICE       |  NORMAL-NC    | DEVICE
>
> Generalizing this to non PCI devices may be problematic. E.g. GICv2
> vCPU interface, which is effectively a shared peripheral, can allow
> a guest to affect another guest's interrupt distribution. The issue
> may be solved by limiting the relaxation to mappings that have a user
> VMA. Still There is insufficient information and uncertainity in the
> behavior of non PCI driver. Hence caution is maintained and the change
> is restricted to the VFIO-PCI devices. PCIe on the other hand is safe
> because the PCI bridge does not generate errors, and thus do not cause
> uncontained failures.
>
> A new flag VM_VFIO_ALLOW_WC to indicate KVM that the device is WC capable.
> KVM use this flag to activate the code.
>
> This could be extended to other devices in the future once that
> is deemed safe.
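
As a rough illustration of the S1/S2 table quoted above (this is not code
from the patch), the four rows can be modelled as "the more restrictive
type wins". The enum and function names are hypothetical, Device subtypes
are ignored, and the ordering is an assumption chosen to reproduce the
quoted rows:

```c
/*
 * Hypothetical memory-type enum, for illustration only. The values are
 * ordered from least to most restrictive so that a plain comparison
 * picks the stricter type.
 */
enum mem_type {
	NORMAL_WB,
	NORMAL_WT,
	NORMAL_NC,
	DEVICE,
};

/*
 * Sketch of the combining rule summarized in the table: the resulting
 * attribute is the more restrictive of the stage 1 and stage 2 types.
 */
static enum mem_type combine_s1_s2(enum mem_type s1, enum mem_type s2)
{
	return s1 > s2 ? s1 : s2;
}
```

With S2 fixed at NORMAL_NC, a guest stage 1 mapping of DEVICE still ends
up as DEVICE, while any Normal stage 1 type ends up as NORMAL_NC, which
matches the four rows in the commit message.
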
>
> [1] section D8.5.5 of DDI0487J_a_a-profile_architecture_reference_manual.pdf
>
> Signed-off-by: Ankit Agrawal
> Suggested-by: Catalin Marinas
> Acked-by: Jason Gunthorpe
> Tested-by: Ankit Agrawal
> ---
>  arch/arm64/kvm/mmu.c | 18 ++++++++++++++----
>  include/linux/mm.h   | 13 +++++++++++++
>  2 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d14504821b79..e1e6847a793b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1381,7 +1381,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	int ret = 0;
>  	bool write_fault, writable, force_pte = false;
>  	bool exec_fault, mte_allowed;
> -	bool device = false;
> +	bool device = false, vfio_allow_wc = false;
>  	unsigned long mmu_seq;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> @@ -1472,6 +1472,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	gfn = fault_ipa >> PAGE_SHIFT;
>  	mte_allowed = kvm_vma_mte_allowed(vma);
>
> +	vfio_allow_wc = (vma->vm_flags & VM_VFIO_ALLOW_WC);
> +
>  	/* Don't use the VMA after the unlock -- it may have vanished */
>  	vma = NULL;
>
> @@ -1557,10 +1559,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (exec_fault)
>  		prot |= KVM_PGTABLE_PROT_X;
>
> -	if (device)
> -		prot |= KVM_PGTABLE_PROT_DEVICE;
> -	else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
> +	if (device) {
> +		/*
> +		 * To provide VM with the ability to get device IO memory
> +		 * with NormalNC property, map device MMIO as NormalNC in S2.
> +		 */
> +		if (vfio_allow_wc)
> +			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
> +		else
> +			prot |= KVM_PGTABLE_PROT_DEVICE;
> +	} else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC)) {
>  		prot |= KVM_PGTABLE_PROT_X;
> +	}
>
>  	/*
>  	 * Under the premise of getting a FSC_PERM fault, we just need to relax
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2bea89dc0bdf..d2f0f969875c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -391,6 +391,19 @@ extern unsigned int kobjsize(const void *objp);
>  # define VM_UFFD_MINOR		VM_NONE
>  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
>
> +/* This flag is used to connect VFIO to arch specific KVM code. It
> + * indicates that the memory under this VMA is safe for use with any
> + * non-cachable memory type inside KVM. Some VFIO devices, on some
> + * platforms, are thought to be unsafe and can cause machine crashes if
> + * KVM does not lock down the memory type.
> + */

Comment format.

> +#ifdef CONFIG_64BIT
> +#define VM_VFIO_ALLOW_WC_BIT	39
> +#define VM_VFIO_ALLOW_WC	BIT(VM_VFIO_ALLOW_WC_BIT)
> +#else
> +#define VM_VFIO_ALLOW_WC	VM_NONE
> +#endif
> +
>  /* Bits set in the VMA until the stack is in its final location */
>  #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ | VM_STACK_EARLY)

The mm.h change should be standalone, separate from the KVM stuff.

	M.

-- 
Without deviation from the norm, progress is not possible.