Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp1148081imm; Wed, 1 Aug 2018 10:59:57 -0700 (PDT) X-Google-Smtp-Source: AAOMgpdrN/wm+IeUBb3d8634FHSWFy/d66EeWV+6ZD3k/8duz5gk/9CB/g6IkNDyKTWY6tj6TNca X-Received: by 2002:a65:5b08:: with SMTP id y8-v6mr25034754pgq.297.1533146397089; Wed, 01 Aug 2018 10:59:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1533146397; cv=none; d=google.com; s=arc-20160816; b=Tv+mwbpAA6BeDSD+FFtgd16HC2zQvuHrG8oqJJQAeIDO3O45BFn+sYJn+YEd20bEKh Cd3yv+Q6/l5/Qxurejb0vs5/anut57MfJQHcAXGI4yky/3F52yEcH9Cj8Hn/oWHBQxzW Ax1E9TV85SsdjUEgqxRzdrjDDyCejP908yIjnfHTmgd0QXiUZo9uAtxl9MN4DS/98dtv d1/npPLK3W1e1qy7XVoW8yuJTqdGa/Vy55hsULI9ka5ZL4sg9TgQHhwyfdDA888+B0Eu 5uNhfvUoH7TAtEO/dVcmp7xDrm8JeArEMz7qfkXa5jn3H4mjkMOkZxHSJriJ+NJgQcfH 1YwA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=+r7+b+0b48kr+UsI63pOhi971NcDB/+j0t/8tpk+5s4=; b=oJgtX0XGAdRrOmU9Kj28byumg9duTw/A7vRGQz8ZiNuN4IIFUB+M1qkC75GLk6233a 8AV18FB3sfjYh6mLoh2YUTSWIpzqjG/aaa8qYNJNUenxQZlVwT/rdpE0VbvuUfofTpWK mrEJanzV1k7hk1EcBNDIxtTRp3Q80B9WCjG+VEE0PmcfblUJVcRwNGJ0kGzuBCnZaKPy zQypk3kTALE3EguLJHVgtH9ZT1GvAEXlv0tcMwYGF2zgSdc8Kj/Boh89BzGWwhHZdfG2 gbXHgN2Fakqwqjgu6n1gYcenYXHYKwVuu0RZs6cc/Pp4khmYLky5kLDKiHYF/6THEZCN ZceA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@broadcom.com header.s=google header.b=gCQwinN2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=broadcom.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b18-v6si15412660pgd.185.2018.08.01.10.59.42; Wed, 01 Aug 2018 10:59:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@broadcom.com header.s=google header.b=gCQwinN2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=broadcom.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732704AbeHATpr (ORCPT + 99 others); Wed, 1 Aug 2018 15:45:47 -0400 Received: from mail-oi0-f66.google.com ([209.85.218.66]:37679 "EHLO mail-oi0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726192AbeHATpq (ORCPT ); Wed, 1 Aug 2018 15:45:46 -0400 Received: by mail-oi0-f66.google.com with SMTP id j205-v6so6365950oib.4 for ; Wed, 01 Aug 2018 10:58:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=+r7+b+0b48kr+UsI63pOhi971NcDB/+j0t/8tpk+5s4=; b=gCQwinN2VnieDxIse7tYtZ5uiVwX9Z2Az4vhU065/CPyH8lK82diDtsMB/DuXKM1/T 4yc8EsPwGaT4WkIlwx5RBTnx2QJLhoLENP5kxyhRwFsGLB7ztkYz9o3Af5In18EuFN/M rZE/alDVOvoC8U5/WPLqjYRXWN1IFjhLPTpBU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=+r7+b+0b48kr+UsI63pOhi971NcDB/+j0t/8tpk+5s4=; b=FKCTFfydWI1p2h272/0Icfnfu+2Kpf83OyF7x/+9WWe6EBXzJH8N2Y1z0NQrc8cOXN /47U9DaVU01MGvQqmN1CcDV601Ss5Smt+T1kb/v+CFSAA/BDg9L8SdIlNJmAfZu2pL5Q Z3lnRoPCfvaeUsPKLVkGQFM4QOa03MpQwYvkxFS8Y4WMBmgnsAoCsuMLtm1CHp2QlW6z r8sBy5F78FHrWYvj00WRUEmI3ONyF6ctylh1tzgjgMPgLJgIe4SJZo+elhbxv3kzNujw Yamp1G8PgDhoxrbl2D28FzH4srkdXoVMbzhuWG7Ge/9Yc5RHvBuyQiiBFvoFIFPVp9IT 6iQw== X-Gm-Message-State: AOUpUlFSgfPmmxnKgaG0nBRhdApbOXN/CBJ05iNXVGYiQBRM52wqR7Jz STYHBUlw2iECLxXLqTUEwf8cgw5JN1DMKFYErRFq9xPVHn0= X-Received: by 2002:aca:ecd0:: with SMTP id k199-v6mr4856569oih.227.1533146333896; Wed, 01 Aug 2018 10:58:53 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a9d:6043:0:0:0:0:0 with HTTP; Wed, 1 Aug 2018 10:58:53 -0700 (PDT) In-Reply-To: References: <1531457777-11825-1-git-send-email-srinath.mannam@broadcom.com> <20180717092207.775e381c@t450s.home> <20180718152531.60f7df6b@t450s.home> <20180719091252.749f2433@t450s.home> <20180720142728.572dd2d7@t450s.home> From: Srinath Mannam Date: Wed, 1 Aug 2018 23:28:53 +0530 Message-ID: Subject: Re: [RFC PATCH] vfio/pci: map prefetchble bars as writecombine To: Alex Williamson Cc: Ray Jui , Vikram Prakash , Scott Branden , kvm@vger.kernel.org, BCM Kernel Feedback , Linux Kernel Mailing List , aik@ozlabs.ru, David Gibson , benh@kernel.crashing.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Alex, In user space UIO driver (DPDK) implementation, sysfs interface "/sys/devices/pci/.../resource0_wc" is used to map prefetchable PCI resources as WC. Platforms which support write-combining maps of PCI resources have arch_can_pci_mmap_wc() flag enabled. So that it allows to map resources as WC. In this approach mmap calls "pci_mmap_resource_range" kernel function with write_combine parameter set. "drivers/pci/pci-sysfs.c" kernel file has this implementation. If this approach fits to vfio driver, then code change in vfio driver are if (arch_can_pci_mmap_wc() && (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)) vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); else vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); Please provide your feedback. Thank you. On Mon, Jul 23, 2018 at 2:03 PM, Srinath Mannam wrote: > On Sat, Jul 21, 2018 at 1:57 AM, Alex Williamson > wrote: >> On Thu, 19 Jul 2018 21:49:48 +0530 >> Srinath Mannam wrote: >> >>> HI Alex, >>> >>> On Thu, Jul 19, 2018 at 8:42 PM, Alex Williamson >>> wrote: >>> > On Thu, 19 Jul 2018 20:17:11 +0530 >>> > Srinath Mannam wrote: >>> > >>> >> HI Alex, >>> >> >>> >> On Thu, Jul 19, 2018 at 2:55 AM, Alex Williamson >>> >> wrote: >>> >> > On Thu, 19 Jul 2018 00:05:18 +0530 >>> >> > Srinath Mannam wrote: >>> >> > >>> >> >> Hi Alex, >>> >> >> >>> >> >> On Tue, Jul 17, 2018 at 8:52 PM, Alex Williamson >>> >> >> wrote: >>> >> >> > On Fri, 13 Jul 2018 10:26:17 +0530 >>> >> >> > Srinath Mannam wrote: >>> >> >> > >>> >> >> >> By default all BARs map with VMA access permissions >>> >> >> >> as pgprot_noncached. >>> >> >> >> >>> >> >> >> In ARM64 pgprot_noncached is MT_DEVICE_nGnRnE which >>> >> >> >> is strongly ordered and allows aligned access. >>> >> >> >> This type of mapping works for NON-PREFETCHABLE bars >>> >> >> >> containing EP controller registers. >>> >> >> >> But it restricts PREFETCHABLE bars from doing >>> >> >> >> unaligned access. >>> >> >> >> >>> >> >> >> In CMB NVMe drives PREFETCHABLE bars are required to >>> >> >> >> map as MT_NORMAL_NC to do unaligned access. >>> >> >> >> >>> >> >> >> Signed-off-by: Srinath Mannam >>> >> >> >> Reviewed-by: Ray Jui >>> >> >> >> Reviewed-by: Vikram Prakash >>> >> >> >> --- >>> >> >> > >>> >> >> > This has been discussed before: >>> >> >> > >>> >> >> > https://www.spinics.net/lists/kvm/msg156548.html >>> >> >> Thank you for inputs.. I have gone through the long list of mail chain >>> >> >> discussion. >>> >> >> > >>> >> >> > CC'ing the usual suspects from the previous thread. I'm not convinced >>> >> >> > that the patch here has considered anything other than the ARM64 >>> >> >> > implications and it's not clear that it considers compatibility with >>> >> >> > existing users or devices at all. Can we guarantee for all devices and >>> >> >> > use cases that WC is semantically equivalent and preferable to UC? If >>> >> >> > not then we need to device an extension to the interface that allows >>> >> >> > the user to specify WC. Thanks, >>> >> >> > >>> >> >> To implement with user specified WC flags, many changes need to be done. >>> >> >> Suppose In DPDK, prefetcable BARs map using WC flag, then also same >>> >> >> question comes >>> >> >> that WC may be different for different CPUs. >>> >> >> As per functionality, both WC and PREFETCHABLE are same, like merging writes and >>> >> >> typically WC is uncached. >>> >> >> So, based on prefetchable BARs behavior and usage we need to map bar memory. >>> >> >> Is it right to map prefetchable BARs as strongly ordered, aligned >>> >> >> access and uncached? >>> >> > >>> >> > Is it possible to answer that question generically? Whether to map a >>> >> > BAR as UC or WC is generally a question for the driver. Does the >>> >> > device handle unaligned accesses? Does the device need strong memory >>> >> > ordering? If this is a driver level question then the driver that >>> >> > needs to make that decision is the userspace driver. VFIO is just a >>> >> > pass-through here and since we don't offer the user a choice of >>> >> > mappings, we take the safer and more conservative mapping, ie. UC. >>> >> > >>> >> Yes, you are right, driver should make the decision based on its requirement. >>> >> In my case, user space driver is part of SPDK, so SPDK should request DPDK >>> >> and DPDK should request VFIO to map BAR for its choice of mapping. >>> >> So to implement this we need code changes in VFIO, DPDK and SPDK. >>> >> >>> >> > You're suggesting that there are many changes to be done if we modify >>> >> > the vfio interface to expose WC under the user's control rather than >>> >> > simply transparently impose WC for all mappings, but is that really the >>> >> > case? Most devices on most platforms seem to work fine now. Perhaps WC >>> >> > is a performance optimization, but this is the first instance I've seen >>> >> > of it as a functional issue. Does that suggest that the imposed >>> >> > alignment on your platform is perhaps unique and the relaxed alignment >>> >> > should be implemented at the architecture specific memory flags for UC >>> >> > mappings? For instance, does x86 require this change for the same >>> >> > device? The chance for regressions of other devices on other platforms >>> >> > seems rather high as proposed. Thanks, >>> >> This issue is not specific to platform or device. this is the requirement of >>> >> CMB enabled NVMe cards. >>> >> NVMe kernel driver already has support to map CMB bar as WC using >>> >> ioremap_wc function. >>> >> File: drivers/nvme/host/pci.c >>> >> Function: nvme_map_cmb >>> >> code: dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size); >>> >> It means ioremap_wc is working with all platforms and WC map of >>> >> perfetchable BARs does not harm. >>> >> Same is required in SPDK NVMe driver also, without WC map it may work >>> >> in x86 platform, but it does not work in ARM platforms. >>> > >>> > Doesn't this contradict your assertion that it's not specific to >>> > platform or device? The device requires support for unaligned >>> > accesses. The platform chooses to restrict unaligned accesses for >>> > non-WC mappings while other platforms do not. The native driver can >>> > still clearly have performance considerations for choosing to use WC >>> > mappings, but it's still not clear to me that the functionality issue >>> > isn't self inflicted by the platform definition of UC vs WC. Thanks, >>> > >>> Device allows both aligned and unaligned access.. so software have >>> flexibility can do unaligned access. >>> As per ARM64 platform, with UC map, memory access should be un-cached, >>> aligned access and strongly order mapping. >>> with WC map, memory access can be aligned/unaligned and un-cached. I >>> thought this is the property of platform not issue. >>> To allow software to do unaligned access of device memory, we need to >>> use WC map of ARM64 platform case. >>> In ARM platforms UC mapping is used to map controller registers which >>> are 4 byte aligned exposed by non-prefetchable bars. >>> Also prefetchable BARs allows write merging so I thought using WC map >>> fulfills both write merging (add performance) and unaligned >>> access. >>> Can I modify the code as below to enable only for ARM platforms. >>> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); >>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64) >>> + if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH) >>> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); >>> +#endif >>> vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff; >> >> While the risk of regression is smaller by restricting this to ARM, I >> don't think it's the right solution. What happens when a device >> requires strict ordering? ARM now behaves differently than any other >> architecture, that's not acceptable. Thanks, > If strict ordering is required for prefetchable bars, driver software > has to add barrier instructions. > With this we can assume for prefetchable bars WC mapping should be fine? > > Regards, > Srinath. >> >> Alex