Date: Tue, 4 May 2021 12:03:48 -0600
From: Alex Williamson
To: Vikram Sethi
Cc: Mark Kettenis, Marc Zyngier, Shanker Donthineni, "will@kernel.org", "catalin.marinas@arm.com", "christoffer.dall@arm.com", "linux-arm-kernel@lists.infradead.org", "kvmarm@lists.cs.columbia.edu", "linux-kernel@vger.kernel.org", "kvm@vger.kernel.org", Jason Sequeira
Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
Message-ID: <20210504120348.2eec075b@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 3 May 2021 22:03:59 +0000
Vikram Sethi wrote:

> Hi Alex,
>
> From: Alex Williamson
> > On Mon, 3 May 2021 13:59:43 +0000
> > Vikram Sethi wrote:
> > > > From: Mark Kettenis
> > > > > From: Marc Zyngier
> > > > > > snip
> > > > > If, by enumerating the properties of Prefetchable, you can show
> > > > > that they are a strict superset of Normal_NC, I'm on board. I
> > > > > haven't seen such an enumeration so far.
> > > > >
> > > snip
> > > > > Right, so we have made a small step in the direction of mapping
> > > > > "prefetchable" onto "Normal_NC", thanks for that. What about all
> > > > > the other properties (unaligned accesses, ordering, gathering)?
> > > >
> > > Regarding gathering/write combining, that is also allowed for
> > > prefetchable BARs per the PCI spec.
> >
> > As others have stated, gather/write combining itself is not well defined.
> >
> > > From 1.3.2.2 of the 5.0 base spec:
> > > A PCI Express Endpoint requesting memory resources through a BAR must
> > > set the BAR's Prefetchable bit unless the range contains locations
> > > with read side-effects or locations in which the Function does not tolerate
> > > write merging.
> >
> > "write merging" This is a very specific thing, per PCI 3.0, 3.2.6:
> >
> > Byte Merging – occurs when a sequence of individual memory writes
> > (bytes or words) are merged into a single DWORD.
> >
> > The semantics suggest quadword support in addition to dword, but don't
> > require it. Writes to bytes within a dword can be merged, but duplicate
> > writes cannot.
> >
> > It seems like an extremely liberal application to suggest that this one write
> > semantic encompasses full write combining semantics, which itself is not
> > clearly defined.
> >
> Talking to our PCIe SIG representative, PCIe switches are not allowed to do any of the byte
> merging/combining etc. as defined in the PCI spec, and a rather poorly
> worded implementation note in the spec says that no known PCIe Host Bridges/Root
> Ports do it either.
> So for PCIe we don't believe there is any byte merging that happens in the PCIe
> fabric, so it's really a matter of what happens in the CPU core and interconnect
> before it gets to the PCIe hierarchy.
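As a rough illustration of how narrow the quoted PCI 3.0 byte-merging rule is, here is a toy C model of a single merge buffer: byte writes that land in the same DWORD are merged, while a write to a different DWORD, or a duplicate write to a byte that is already pending, forces the pending DWORD out first. All names (merge_buf, merge_write8, merge_flush, merge_example) are hypothetical and exist only for this sketch; it models byte writes and one pending DWORD only, and makes no claim about what any real CPU, interconnect, or bridge implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One pending, DWORD-aligned merged write (hypothetical toy model). */
struct merge_buf {
	bool valid;           /* a merged write is pending */
	uint64_t dword_addr;  /* DWORD-aligned address of the pending write */
	uint32_t data;        /* merged data, byte lane i in bits 8*i..8*i+7 */
	uint8_t byte_en;      /* bit i set => byte lane i is pending */
	unsigned int flushes; /* DWORD writes issued to the "bus" so far */
	uint32_t last_data;   /* data of the most recently issued write */
	uint8_t last_byte_en; /* byte enables of that write */
};

/* Issue the pending DWORD write, if any, and clear the buffer. */
static void merge_flush(struct merge_buf *mb)
{
	if (!mb->valid)
		return;
	mb->flushes++;
	mb->last_data = mb->data;
	mb->last_byte_en = mb->byte_en;
	mb->valid = false;
	mb->data = 0;
	mb->byte_en = 0;
}

/* Post a single byte write, merging it into the pending DWORD when allowed. */
static void merge_write8(struct merge_buf *mb, uint64_t addr, uint8_t val)
{
	uint64_t dword = addr & ~(uint64_t)3;
	unsigned int lane = (unsigned int)(addr & 3);
	uint8_t bit = (uint8_t)(1u << lane);

	/* A different DWORD, or a duplicate write to a pending byte: no merge. */
	if (mb->valid && (mb->dword_addr != dword || (mb->byte_en & bit)))
		merge_flush(mb);

	if (!mb->valid) {
		mb->valid = true;
		mb->dword_addr = dword;
	}
	mb->data &= ~(0xffu << (8 * lane));
	mb->data |= (uint32_t)val << (8 * lane);
	mb->byte_en |= bit;
}

/* Five byte writes: four distinct lanes of one DWORD, then a duplicate. */
static unsigned int merge_example(void)
{
	struct merge_buf mb = {0};

	for (unsigned int i = 0; i < 4; i++)
		merge_write8(&mb, 0x1000 + i, (uint8_t)(0x11 * (i + 1)));
	merge_write8(&mb, 0x1000, 0xaa); /* duplicate of lane 0: forces a flush */
	assert(mb.flushes == 1 && mb.last_data == 0x44332211u);
	merge_flush(&mb);                /* drain the remaining write */
	return mb.flushes;               /* 2 DWORD writes for 5 byte writes */
}
```

Note what the model does not do: it never merges across DWORD boundaries and never collapses repeated writes to the same byte, which is the gap between this one write semantic and the broader notion of write combining being debated here.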
Yes, but merged writes, no matter where they happen, are still the only type of write combining that a prefetchable BAR on an endpoint is required to support.

> Stepping back from this patchset, do you agree that it is desirable for
> write combining, as understood by ioremap_wc, to work in all ISA guests, including
> ARMv8?

Yes, a userspace vfio driver should be able to take advantage of the hardware capabilities. I think where we disagree is whether it's universally safe to assume write combining based on the PCI prefetchable capability of a BAR. If that's something that can be assumed universally for ARMv8, based on the compatibility of the architecture specification with the PCI definition of a prefetchable BAR, then I would expect a helper somewhere in arch code that returns the right page protection flags, so that arch maintainers don't need to scour device drivers for architecture hacks. Otherwise, it needs to be exposed through the vfio uAPI to allow the userspace device driver itself to select these semantics.

> You note that x86 virtualization doesn't have this issue, but KVM-ARM does
> because KVM maps all device BARs as Device memory type nGnRE, which
> doesn't allow ioremap_wc from within the guest to get the actual semantics desired.
>
> Marc and others have suggested that userspace should provide the hints. But the
> question is, how would QEMU vfio do this? We would be stuck in the same
> arguments as here, as to what is the correct way to determine the desired attributes
> for a given BAR such that, when a driver in the guest eventually asks for
> ioremap_wc, it actually has a chance of working in the guest, in all ISAs.
> Do you have any suggestions on how to make progress here?

We do need some way for userspace drivers to also make use of WC semantics. There were some discussions in the past, and I think others have referenced them as well, but nothing has been proposed for a vfio API.
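The arch helper suggested above, one that returns the right page protection flags for a prefetchable BAR, could be shaped roughly like the following userspace sketch. It is an assumption-laden mock, not kernel code: pgprot_t, pgprot_noncached() and pgprot_writecombine() are simplified stand-ins for the kernel's, MOCK_IORESOURCE_PREFETCH stands in for IORESOURCE_PREFETCH, and pgprot_pci_prefetchable(), MOCK_ARCH_PREFETCHABLE_IS_WC and bar_mmap_prot() are hypothetical names. The #ifndef fallback mirrors the style the kernel's generic pgtable header already uses for helpers such as pgprot_writecombine().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace mocks standing in for the kernel's pgprot_t and helpers.
 * Only the memory-type attribute is modelled; real pgprot_t values
 * carry arch-specific page-table bits.
 */
enum mock_memtype { MOCK_CACHED, MOCK_UNCACHED, MOCK_WRITECOMBINE };
typedef struct { enum mock_memtype type; } pgprot_t;

static pgprot_t pgprot_noncached(pgprot_t prot)
{
	prot.type = MOCK_UNCACHED;
	return prot;
}

static pgprot_t pgprot_writecombine(pgprot_t prot)
{
	prot.type = MOCK_WRITECOMBINE;
	return prot;
}

/* Stand-in for IORESOURCE_PREFETCH; the value is arbitrary in this mock. */
#define MOCK_IORESOURCE_PREFETCH 0x2000UL

/*
 * What an opting-in arch header might add (hypothetical). Build with
 * -DMOCK_ARCH_PREFETCHABLE_IS_WC to model such an arch.
 */
#ifdef MOCK_ARCH_PREFETCHABLE_IS_WC
#define pgprot_pci_prefetchable(prot) pgprot_writecombine(prot)
#endif

/* Generic fallback: arches that have not opted in keep uncached mappings. */
#ifndef pgprot_pci_prefetchable
#define pgprot_pci_prefetchable(prot) pgprot_noncached(prot)
#endif

/* Roughly how a vfio-pci style mmap path might pick a BAR's protection. */
static pgprot_t bar_mmap_prot(pgprot_t prot, const unsigned long bar_flags[6],
			      unsigned int bar)
{
	assert(bar < 6); /* a type 0 header has six standard BARs */
	if (bar_flags[bar] & MOCK_IORESOURCE_PREFETCH)
		return pgprot_pci_prefetchable(prot);
	return pgprot_noncached(prot);
}
```

Built as-is, every BAR, prefetchable or not, gets the uncached mock, matching the conservative pgprot_noncached() default; building with -DMOCK_ARCH_PREFETCHABLE_IS_WC models an arch that has declared write combining compatible with PCI prefetchable, and only then does a prefetchable BAR get the write-combining mock. The design keeps the policy decision in arch headers rather than in the vfio driver.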
If we had that API, QEMU deciding to universally enable WC for all vfio prefetchable BARs seems only marginally better than this approach. Ultimately the mapping should be based on the guest driver semantics, and if you don't have any visibility into that on KVM/arm like we have on KVM/x86, then it seems like there's nothing to trigger a vfio API here anyway. If that's the case, I'd probably go back to letting the arch/arm64 folks declare that WC is compatible with the definition of PCI prefetchable and export some sort of pgprot_pci_prefetchable() helper, which would default to pgprot_noncached() through an #ifndef when the arch doesn't define it.

> A device-specific list of which BARs are OK to allow ioremap_wc for seems terrible,
> and I'm not sure a command-line QEMU option is any better. Is the user of device
> assignment/sysadmin supposed to know which BAR of which device is OK to allow
> ioremap_wc for?

No, a device-specific userspace driver should know such device semantics, but QEMU is not such a driver. Burdening the hypervisor user/admin is not a good solution either. I'd lean on the KVM/arm64 folks to know how the guest driver semantics can be exposed to the hypervisor. Thanks,

Alex