Date: Wed, 02 Jun 2021 10:11:09 +0100
Message-ID: <87a6o81viq.wl-maz@kernel.org>
From: Marc Zyngier
To: Alex Williamson
Cc: Vikram Sethi, Mark Kettenis, Shanker Donthineni,
	"will@kernel.org", "catalin.marinas@arm.com",
	"christoffer.dall@arm.com",
	"linux-arm-kernel@lists.infradead.org",
	"kvmarm@lists.cs.columbia.edu",
	"linux-kernel@vger.kernel.org", "kvm@vger.kernel.org",
	Jason Sequeira
Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
In-Reply-To: <20210504120348.2eec075b@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 04 May 2021 19:03:48 +0100,
Alex Williamson wrote:
>
> On Mon, 3 May 2021 22:03:59 +0000
> Vikram Sethi wrote:
>
> > Hi Alex,
> > > From: Alex Williamson
> > > On Mon, 3 May 2021 13:59:43 +0000
> > > Vikram Sethi wrote:
> > > > > From: Mark Kettenis
> > > > > > From: Marc Zyngier
> > > >
> > > > snip
> > > > > > If, by enumerating the properties of Prefetchable, you can show
> > > > > > that they are a strict superset of Normal_NC, I'm on board. I
> > > > > > haven't seen such an enumeration so far.
> > > > > >
> > > > snip
> > > > > > Right, so we have made a small step in the direction of mapping
> > > > > > "prefetchable" onto "Normal_NC", thanks for that. What about all
> > > > > > the other properties (unaligned accesses, ordering, gathering)?
> > > > >
> > > > Regarding gathering/write combining, that is also allowed for
> > > > prefetchable BARs per the PCI spec
> > >
> > > As others have stated, gather/write combining itself is not well defined.
> > >
> > > > From 1.3.2.2 of 5.0 base spec:
> > > > A PCI Express Endpoint requesting memory resources through a BAR must
> > > > set the BAR's Prefetchable bit unless the range contains locations
> > > > with read side-effects or locations in which the Function does not
> > > > tolerate write merging.
> > >
> > > "write merging" This is a very specific thing, per PCI 3.0, 3.2.6:
> > >
> > >   Byte Merging – occurs when a sequence of individual memory writes
> > >   (bytes or words) are merged into a single DWORD.
> > >
> > > The semantics suggest quadword support in addition to dword, but
> > > don't require it. Writes to bytes within a dword can be merged,
> > > but duplicate writes cannot.
> > >
> > > It seems like an extremely liberal application to suggest that
> > > this one write semantic encompasses full write combining
> > > semantics, which itself is not clearly defined.
> > >
> > Talking to our PCIe SIG representative, PCIe switches are not
> > allowed to do any of the byte merging/combining etc. as defined in
> > the PCI spec, and a rather poorly worded Implementation Note in
> > the spec says that no known PCIe Host Bridges/Root Ports do it
> > either. So for PCIe we don't believe there is any byte
> > merging that happens in the PCIe fabric, so it's really a matter of
> > what happens in the CPU core and interconnect before it gets to
> > the PCIe hierarchy.
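As an illustration of the byte-merging rule quoted above (PCI 3.0, 3.2.6), here is a minimal C sketch of a one-DWORD merge buffer. It is a toy model under stated assumptions, not a description of any real bridge, interconnect or CPU: `struct merge_buf` and `merge_byte()` are hypothetical names, only single-byte writes are modelled, and a duplicate write to an already-pending byte is refused rather than merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-DWORD merge buffer (illustration only). */
struct merge_buf {
	uint32_t dword_addr;	/* DWORD-aligned address being merged */
	uint32_t data;		/* pending data, little-endian byte lanes */
	uint8_t  byte_en;	/* bit i set => byte lane i has a pending write */
};

/*
 * Try to merge one byte write into the pending DWORD.
 * Returns false when the write cannot be merged (different DWORD, or a
 * duplicate write to a byte that is already pending); the caller would
 * then have to flush the buffer before issuing this write.
 */
static inline bool merge_byte(struct merge_buf *b, uint32_t addr, uint8_t val)
{
	uint32_t base = addr & ~UINT32_C(3);
	unsigned int lane = addr & 3;

	assert(b);

	if (b->byte_en == 0)
		b->dword_addr = base;	/* empty buffer: start a new DWORD */
	else if (b->dword_addr != base)
		return false;		/* write targets another DWORD */

	if (b->byte_en & (1u << lane))
		return false;		/* duplicate write: must not be merged */

	b->data &= ~(UINT32_C(0xff) << (8 * lane));
	b->data |= (uint32_t)val << (8 * lane);
	b->byte_en |= (uint8_t)(1u << lane);
	return true;
}
```

A caller would flush the pending DWORD (using `byte_en` as the byte enables) whenever `merge_byte()` returns false, and then retry the write. Nothing in this sketch models the broader write-combining semantics expected from `ioremap_wc`.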
>
> Yes, but merged writes, no matter where they happen, are still the only
> type of write combining that a prefetchable BAR on an endpoint is
> required to support.
>
> > Stepping back from this patchset, do you agree that it is
> > desirable to support Write combining as understood by ioremap_wc
> > to work in all ISA guests including ARMv8?
>
> Yes, a userspace vfio driver should be able to take advantage of the
> hardware capabilities. I think where we disagree is whether it's
> universally safe to assume write combining based on the PCI
> prefetchable capability of a BAR. If that's something that can be
> assumed universally for ARMv8 based on the architecture specification
> compatibility with the PCI definition of a prefetchable BAR, then I
> would expect a helper somewhere in arch code that returns the right
> page protection flags, so that arch maintainers don't need to scour
> device drivers for architecture hacks. Otherwise, it needs to be
> exposed through the vfio uAPI to allow the userspace device driver
> itself to select these semantics.
>
> > You note that x86 virtualization doesn't have this issue, but
> > KVM-ARM does because KVM maps all device BARs as Device Memory
> > type nGnRE which doesn't allow ioremap_wc from within the guest to
> > get the actual semantics desired.
> >
> > Marc and others have suggested that userspace should provide the
> > hints. But the question is how would qemu vfio do this either? We
> > would be stuck in the same arguments as here, as to what is the
> > correct way to determine the desired attributes for a given BAR
> > such that eventually when a driver in the guest asks for
> > ioremap_wc it actually has a chance of working in the guest, in
> > all ISAs. Do you have any suggestions on how to make progress
> > here?
>
> We do need some way for userspace drivers to also make use of WC
> semantics, there were some discussions in the past, I think others have
> referenced them as well, but nothing has been proposed for a vfio API.
>
> If we had that API, QEMU deciding to universally enable WC for all
> vfio prefetchable BARs seems only marginally better than this approach.
> Ultimately the mapping should be based on the guest driver semantics,
> and if you don't have any visibility to that on KVM/arm like we have on
> KVM/x86, then it seems like there's nothing to trigger a vfio API here
> anyway.

There isn't much KVM/arm64 can do here unless it is being told what to
do. We don't have visibility into the guest's page tables in a reliable
way, and trusting them is not something I want to entertain anyway.

> If that's the case, I'd probably go back to letting the arch/arm64 folks
> declare that WC is compatible with the definition of PCI prefetchable
> and export some sort of pgprot_pci_prefetchable() helper where the
> default would be to #define it as pgprot_noncached() #ifndef by the
> arch.
>
> > A device specific list of which BARs are OK to allow ioremap_wc
> > for seems terrible and I'm not sure if a commandline qemu option
> > is any better. Is the user of device assignment/sysadmin supposed
> > to know which BAR of which device is OK to allow ioremap_wc for?
>
> No, a device specific userspace driver should know such device
> semantics, but QEMU is not such a driver. Burdening the hypervisor
> user/admin is not a good solution either. I'd lean on KVM/arm64 folks
> to know how the guest driver semantics can be exposed to the
> hypervisor. Thanks,

I don't see a good way for that, unless we make it a per-guest buy-in
where all PCI prefetchable mappings get the same treatment. I'm
prepared to bet that this will break as soon as two devices have
different requirements. It would also require userspace to buy into
this scheme though, which is crap.

Exposing the guest's preference on a per-device basis seems difficult
(KVM knows nothing about the PCI devices) and would require some PV
interface that will quickly become unmaintainable.

	M.

-- 
Without deviation from the norm, progress is not possible.
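
For reference, the arch-overridable helper Alex describes in the quoted text could look roughly like the sketch below. It only illustrates the `#ifndef` fallback pattern and settles none of the semantic questions raised in this thread. `pgprot_pci_prefetchable()` is just the name proposed in the thread. The `mock_*` types and functions are userspace stand-ins for the kernel's `pgprot_t`, `pgprot_noncached()` and `pgprot_writecombine()`, and `bar_mmap_prot()` and `MOCK_ARCH_PREFETCHABLE_IS_WC` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins so the sketch builds outside the kernel. */
typedef struct { unsigned long attr; } pgprot_t;
enum { MOCK_ATTR_NONCACHED = 1, MOCK_ATTR_WRITECOMBINE = 2 };

static inline pgprot_t mock_pgprot_noncached(pgprot_t prot)
{
	prot.attr = MOCK_ATTR_NONCACHED;
	return prot;
}

static inline pgprot_t mock_pgprot_writecombine(pgprot_t prot)
{
	prot.attr = MOCK_ATTR_WRITECOMBINE;
	return prot;
}

/*
 * An architecture whose maintainers declare WC compatible with PCI
 * prefetchable BARs would provide its own definition (assumption:
 * selected here by a hypothetical build-time switch).
 */
#ifdef MOCK_ARCH_PREFETCHABLE_IS_WC
#define pgprot_pci_prefetchable(prot)	mock_pgprot_writecombine(prot)
#endif

/* Generic fallback: every other architecture keeps non-cached mappings. */
#ifndef pgprot_pci_prefetchable
#define pgprot_pci_prefetchable(prot)	mock_pgprot_noncached(prot)
#endif

/* Hypothetical caller: choose the protection for a BAR mapping. */
static inline pgprot_t bar_mmap_prot(pgprot_t prot, bool bar_is_prefetchable)
{
	return bar_is_prefetchable ? pgprot_pci_prefetchable(prot)
				   : mock_pgprot_noncached(prot);
}
```

With the switch left undefined, prefetchable and non-prefetchable BARs both end up non-cached, so the default behaviour is unchanged unless an architecture opts in. This is still the all-or-nothing treatment of prefetchable BARs questioned above. It carries no per-device or per-guest information.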