Date: Tue, 5 Dec 2023 11:40:47 +0000
From: Catalin Marinas
To: Marc Zyngier
Cc: ankita@nvidia.com, Shameerali Kolothum Thodi, jgg@nvidia.com,
 oliver.upton@linux.dev, suzuki.poulose@arm.com, yuzenghui@huawei.com,
 will@kernel.org, ardb@kernel.org, akpm@linux-foundation.org,
 gshan@redhat.com, aniketa@nvidia.com, cjia@nvidia.com,
 kwankhede@nvidia.com, targupta@nvidia.com, vsethi@nvidia.com,
 acurrid@nvidia.com, apopple@nvidia.com, jhubbard@nvidia.com,
 danw@nvidia.com, mochs@nvidia.com,
 kvmarm@lists.linux.dev, kvm@vger.kernel.org, lpieralisi@kernel.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
References: <20231205033015.10044-1-ankita@nvidia.com> <86fs0hatt3.wl-maz@kernel.org>
In-Reply-To: <86fs0hatt3.wl-maz@kernel.org>

+ Lorenzo, he really needs to be on this thread.

On Tue, Dec 05, 2023 at 09:21:28AM +0000, Marc Zyngier wrote:
> On Tue, 05 Dec 2023 03:30:15 +0000,
> wrote:
> > From: Ankit Agrawal
> >
> > Currently, KVM for ARM64 maps at stage 2 memory that is considered device
> > (i.e. it is not RAM) with DEVICE_nGnRE memory attributes; this setting
> > overrides (as per the ARM architecture [1]) any device MMIO mapping
> > present at stage 1, resulting in a set-up whereby a guest operating
> > system cannot determine device MMIO mapping memory attributes on its
> > own but it is always overridden by the KVM stage 2 default.
[...]
> Despite the considerable increase in the commit message length, a
> number of questions are left unanswered:
>
> - Shameer reported a regression on non-FWB systems, breaking device
>   assignment:
>
>   https://lore.kernel.org/all/af13ed63dc9a4f26a6c958ebfa77d78a@huawei.com/

This referred to the first patch in the old series which was trying to
make the Stage 2 cacheable based on the vma attributes.
That patch has been dropped for now. It would be good if Shameer confirms
this was the problem.

> - Will had unanswered questions in another part of the thread:
>
>   https://lore.kernel.org/all/20231013092954.GB13524@willie-the-truck/
>
>   Can someone please help concluding it?

Is this about reclaiming the device? I think we concluded that we can't
generalise this beyond PCIe, though not sure there was any formal
statement to that thread. The other point Will had was around stating
in the commit message why we only relax this to Normal NC. I haven't
checked the commit message yet, it needs careful reading ;).

> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index d14504821b79..1cb302457d3f 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1071,7 +1071,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> >  	struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO };
> >  	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
> >  	struct kvm_pgtable *pgt = mmu->pgt;
> > -	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
> > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_NORMAL_NC |
> >  				     KVM_PGTABLE_PROT_R |
> >  				     (writable ? KVM_PGTABLE_PROT_W : 0);
>
> Doesn't this affect the GICv2 VCPU interface, which is effectively a
> shared peripheral, now allowing a guest to affect another guest's
> interrupt distribution? If that is the case, this needs to be fixed.
>
> In general, I don't think this should be a blanket statement, but be
> limited to devices that we presume can deal with this (i.e. PCIe, and
> not much else).

Based on other on-list and off-line discussions, I came to the same
conclusion that we can't relax this beyond PCIe. How we do this, it's up
for debate.
Some ideas:

- something in the vfio-pci driver like a new VM_* flag that KVM can
  consume (hard sell for an arch-specific thing)

- changing the vfio-pci driver to allow WC and NC mappings (x86
  terminology) with additional knowledge about what it can safely map
  as NC. KVM would mimic the vma attributes (it doesn't eliminate the
  alias though, the guest can always go for Device while the VMM for
  Normal)

- some per-SoC pfn ranges that are permitted as Normal NC. Can the arch
  code figure out where these PCIe BARs are or is it only the vfio-pci
  driver? I guess we can sort something out around the PCIe but I'm not
  familiar with this subsystem. The alternative is some "safe" ranges
  in firmware tables, assuming the firmware configures the PCIe BARs
  location

-- 
Catalin