Date: Fri, 20 Jul 2018 14:27:28 -0600
From: Alex Williamson
To: Srinath Mannam
Cc: Ray Jui, Vikram Prakash, Scott Branden, kvm@vger.kernel.org,
 BCM Kernel Feedback, Linux Kernel Mailing List, aik@ozlabs.ru,
 David Gibson, benh@kernel.crashing.org
Subject: Re: [RFC PATCH] vfio/pci: map prefetchble bars as writecombine
Message-ID: <20180720142728.572dd2d7@t450s.home>
References:
 <1531457777-11825-1-git-send-email-srinath.mannam@broadcom.com>
 <20180717092207.775e381c@t450s.home>
 <20180718152531.60f7df6b@t450s.home>
 <20180719091252.749f2433@t450s.home>

On Thu, 19 Jul 2018 21:49:48 +0530
Srinath Mannam wrote:

> Hi Alex,
>
> On Thu, Jul 19, 2018 at 8:42 PM, Alex Williamson wrote:
> > On Thu, 19 Jul 2018 20:17:11 +0530
> > Srinath Mannam wrote:
> >
> >> Hi Alex,
> >>
> >> On Thu, Jul 19, 2018 at 2:55 AM, Alex Williamson wrote:
> >> > On Thu, 19 Jul 2018 00:05:18 +0530
> >> > Srinath Mannam wrote:
> >> >
> >> >> Hi Alex,
> >> >>
> >> >> On Tue, Jul 17, 2018 at 8:52 PM, Alex Williamson wrote:
> >> >> > On Fri, 13 Jul 2018 10:26:17 +0530
> >> >> > Srinath Mannam wrote:
> >> >> >
> >> >> >> By default all BARs are mapped with VMA access permissions
> >> >> >> of pgprot_noncached.
> >> >> >>
> >> >> >> On ARM64, pgprot_noncached is MT_DEVICE_nGnRnE, which is
> >> >> >> strongly ordered and allows only aligned access.  This type
> >> >> >> of mapping works for NON-PREFETCHABLE BARs containing EP
> >> >> >> controller registers, but it prevents unaligned access to
> >> >> >> PREFETCHABLE BARs.
> >> >> >>
> >> >> >> For CMB NVMe drives, PREFETCHABLE BARs need to be mapped as
> >> >> >> MT_NORMAL_NC to allow unaligned access.
> >> >> >>
> >> >> >> Signed-off-by: Srinath Mannam
> >> >> >> Reviewed-by: Ray Jui
> >> >> >> Reviewed-by: Vikram Prakash
> >> >> >> ---
> >> >> >
> >> >> > This has been discussed before:
> >> >> >
> >> >> > https://www.spinics.net/lists/kvm/msg156548.html
> >> >> Thank you for the pointer; I have gone through the long mail
> >> >> chain discussion.
> >> >> >
> >> >> > CC'ing the usual suspects from the previous thread.  I'm not
> >> >> > convinced that the patch here has considered anything other
> >> >> > than the ARM64 implications, and it's not clear that it
> >> >> > considers compatibility with existing users or devices at
> >> >> > all.  Can we guarantee for all devices and use cases that WC
> >> >> > is semantically equivalent and preferable to UC?  If not,
> >> >> > then we need to devise an extension to the interface that
> >> >> > allows the user to specify WC.  Thanks,
> >> >> >
> >> >> Implementing user-specified WC flags would require many
> >> >> changes.  Suppose DPDK maps prefetchable BARs using a WC flag;
> >> >> the same question arises there, since WC may behave differently
> >> >> on different CPUs.
> >> >> Functionally, WC and PREFETCHABLE are alike: both allow write
> >> >> merging, and WC is typically uncached.  So the BAR mapping
> >> >> should be chosen based on the behavior and usage of
> >> >> prefetchable BARs.  Is it right to map prefetchable BARs as
> >> >> strongly ordered, aligned-access-only and uncached?
> >> >
> >> > Is it possible to answer that question generically?  Whether to
> >> > map a BAR as UC or WC is generally a question for the driver.
> >> > Does the device handle unaligned accesses?  Does the device need
> >> > strong memory ordering?  If this is a driver-level question,
> >> > then the driver that needs to make that decision is the
> >> > userspace driver.
> >> > VFIO is just a pass-through here, and since we don't offer the
> >> > user a choice of mappings, we take the safer and more
> >> > conservative mapping, i.e. UC.
> >> >
> >> Yes, you are right, the driver should make that decision based on
> >> its requirements.  In my case the userspace driver is part of
> >> SPDK, so SPDK should request DPDK, and DPDK should request VFIO,
> >> to map the BAR with its choice of mapping.  Implementing this
> >> needs code changes in VFIO, DPDK and SPDK.
> >>
> >> > You're suggesting that there are many changes to be done if we
> >> > modify the vfio interface to expose WC under the user's control
> >> > rather than simply transparently imposing WC for all mappings,
> >> > but is that really the case?  Most devices on most platforms
> >> > seem to work fine now.  Perhaps WC is a performance
> >> > optimization, but this is the first instance I've seen of it as
> >> > a functional issue.  Does that suggest that the imposed
> >> > alignment on your platform is perhaps unique and the relaxed
> >> > alignment should be implemented in the architecture-specific
> >> > memory flags for UC mappings?  For instance, does x86 require
> >> > this change for the same device?  The chance of regressions for
> >> > other devices on other platforms seems rather high as proposed.
> >> > Thanks,
> >> This issue is not specific to a platform or device; it is a
> >> requirement of CMB-enabled NVMe cards.  The NVMe kernel driver
> >> already maps the CMB BAR as WC using ioremap_wc():
> >>
> >> File: drivers/nvme/host/pci.c
> >> Function: nvme_map_cmb
> >> Code: dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
> >>
> >> That means ioremap_wc() works on all platforms and a WC mapping
> >> of prefetchable BARs does no harm.  The same is required in the
> >> SPDK NVMe driver; without a WC mapping it may work on x86, but it
> >> does not work on ARM platforms.
> >
> > Doesn't this contradict your assertion that it's not specific to
> > platform or device?  The device requires support for unaligned
> > accesses.  The platform chooses to restrict unaligned accesses for
> > non-WC mappings while other platforms do not.  The native driver
> > can still clearly have performance considerations for choosing to
> > use WC mappings, but it's still not clear to me that the
> > functionality issue isn't self-inflicted by the platform
> > definition of UC vs WC.  Thanks,
> >
> The device allows both aligned and unaligned access, so software has
> the flexibility to do unaligned access.
> On ARM64, with a UC mapping, memory access is uncached, must be
> aligned, and is strongly ordered; with a WC mapping, memory access
> can be aligned or unaligned and is uncached.  I consider this a
> property of the platform, not an issue.
> To allow software to do unaligned access to device memory, we need
> to use a WC mapping in the ARM64 case.  On ARM platforms the UC
> mapping is used for controller registers, which are 4-byte aligned
> and exposed by non-prefetchable BARs.  Prefetchable BARs also allow
> write merging, so I thought a WC mapping satisfies both write
> merging (better performance) and unaligned access.
> Can I modify the code as below to enable this only for ARM
> platforms?
>         vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +       if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)
> +               vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +#endif
>         vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;

While the risk of regression is smaller if this is restricted to ARM, I
don't think it's the right solution.  What happens when a device
requires strict ordering?  ARM would then behave differently from every
other architecture, and that's not acceptable.  Thanks,

Alex
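For reference, the native-driver pattern the thread cites (nvme_map_cmb()
choosing a write-combining mapping for the CMB BAR) looks roughly like the
sketch below.  This is an illustrative sketch only, not the actual
nvme_map_cmb() implementation; the function name and the bar/offset/size
parameters are placeholders.

/*
 * Minimal sketch, assuming a native PCI driver that knows its device
 * tolerates write merging and unaligned access on this BAR.  ioremap_wc()
 * yields a write-combining mapping (Normal-NC on arm64); plain ioremap()
 * would yield the default uncached, strongly ordered Device mapping.
 */
#include <linux/io.h>
#include <linux/pci.h>

static void __iomem *map_prefetchable_bar_wc(struct pci_dev *pdev, int bar,
					     resource_size_t offset,
					     resource_size_t size)
{
	/* Only a prefetchable BAR is a reasonable candidate for WC. */
	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_PREFETCH))
		return NULL;

	/* Stay within the BAR. */
	if (offset + size > pci_resource_len(pdev, bar))
		return NULL;

	/*
	 * The caller gives up strong ordering and alignment guarantees for
	 * this window; unmap with iounmap() when done.
	 */
	return ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
}

The vfio-pci question in the thread is the userspace analogue of the same
choice: vfio_pci_mmap() applies pgprot_noncached() to every BAR, and the
diff above swaps in pgprot_writecombine() for prefetchable BARs, but only
on ARM, which is exactly the asymmetry objected to above.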