Date: Tue, 8 May 2018 15:40:39 -0600
From: Alex Williamson
To: Don Dutile
Cc: Bjorn Helgaas, Logan Gunthorpe, linux-kernel@vger.kernel.org,
 linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
 linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-block@vger.kernel.org, Stephen Bates, Christoph Hellwig,
 Jens Axboe, Keith Busch, Sagi Grimberg, Bjorn Helgaas,
 Jason Gunthorpe, Max Gurtovoy, Dan Williams, Jérôme Glisse,
 Benjamin Herrenschmidt, Christian König
Subject: Re: [PATCH v4 00/14] Copy Offload in NVMe Fabrics with P2P PCI Memory
Message-ID:
<20180508154039.5c85a8f8@w520.home>
In-Reply-To:
References: <20180423233046.21476-1-logang@deltatee.com>
 <20180507232346.GI161390@bhelgaas-glaptop.roam.corp.google.com>
 <20180508105759.7dbaa8fe@w520.home>

On Tue, 8 May 2018 17:25:24 -0400
Don Dutile wrote:

> On 05/08/2018 12:57 PM, Alex Williamson wrote:
> > On Mon, 7 May 2018 18:23:46 -0500
> > Bjorn Helgaas wrote:
> >
> >> On Mon, Apr 23, 2018 at 05:30:32PM -0600, Logan Gunthorpe wrote:
> >>> Hi Everyone,
> >>>
> >>> Here's v4 of our series to introduce P2P based copy offload to NVMe
> >>> fabrics. This version has been rebased onto v4.17-rc2. A git repo
> >>> is here:
> >>>
> >>> https://github.com/sbates130272/linux-p2pmem pci-p2p-v4
> >>> ...
> >>
> >>> Logan Gunthorpe (14):
> >>>   PCI/P2PDMA: Support peer-to-peer memory
> >>>   PCI/P2PDMA: Add sysfs group to display p2pmem stats
> >>>   PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset
> >>>   PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
> >>>   docs-rst: Add a new directory for PCI documentation
> >>>   PCI/P2PDMA: Add P2P DMA driver writer's documentation
> >>>   block: Introduce PCI P2P flags for request and request queue
> >>>   IB/core: Ensure we map P2P memory correctly in
> >>>     rdma_rw_ctx_[init|destroy]()
> >>>   nvme-pci: Use PCI p2pmem subsystem to manage the CMB
> >>>   nvme-pci: Add support for P2P memory in requests
> >>>   nvme-pci: Add a quirk for a pseudo CMB
> >>>   nvmet: Introduce helper functions to allocate and free request SGLs
> >>>   nvmet-rdma: Use new SGL alloc/free helper for requests
> >>>   nvmet: Optionally use PCI P2P memory
> >>>
> >>>  Documentation/ABI/testing/sysfs-bus-pci    |  25 +
> >>>  Documentation/PCI/index.rst                |  14 +
> >>>  Documentation/driver-api/index.rst         |   2 +-
> >>>  Documentation/driver-api/pci/index.rst     |  20 +
> >>>  Documentation/driver-api/pci/p2pdma.rst    | 166 ++++++
> >>>  Documentation/driver-api/{ => pci}/pci.rst |   0
> >>>  Documentation/index.rst                    |   3 +-
> >>>  block/blk-core.c                           |   3 +
> >>>  drivers/infiniband/core/rw.c               |  13 +-
> >>>  drivers/nvme/host/core.c                   |   4 +
> >>>  drivers/nvme/host/nvme.h                   |   8 +
> >>>  drivers/nvme/host/pci.c                    | 118 +++--
> >>>  drivers/nvme/target/configfs.c             |  67 +++
> >>>  drivers/nvme/target/core.c                 | 143 ++++-
> >>>  drivers/nvme/target/io-cmd.c               |   3 +
> >>>  drivers/nvme/target/nvmet.h                |  15 +
> >>>  drivers/nvme/target/rdma.c                 |  22 +-
> >>>  drivers/pci/Kconfig                        |  26 +
> >>>  drivers/pci/Makefile                       |   1 +
> >>>  drivers/pci/p2pdma.c                       | 814 +++++++++++++++++++++++++++++
> >>>  drivers/pci/pci.c                          |   6 +
> >>>  include/linux/blk_types.h                  |  18 +-
> >>>  include/linux/blkdev.h                     |   3 +
> >>>  include/linux/memremap.h                   |  19 +
> >>>  include/linux/pci-p2pdma.h                 | 118 +++++
> >>>  include/linux/pci.h                        |   4 +
> >>>  26 files changed, 1579 insertions(+), 56 deletions(-)
> >>>  create mode 100644 Documentation/PCI/index.rst
> >>>  create mode 100644 Documentation/driver-api/pci/index.rst
> >>>  create mode 100644 Documentation/driver-api/pci/p2pdma.rst
> >>>  rename Documentation/driver-api/{ => pci}/pci.rst (100%)
> >>>  create mode 100644 drivers/pci/p2pdma.c
> >>>  create mode 100644 include/linux/pci-p2pdma.h
> >>
> >> How do you envison merging this? There's a big chunk in drivers/pci, but
> >> really no opportunity for conflicts there, and there's significant stuff in
> >> block and nvme that I don't really want to merge.
> >>
> >> If Alex is OK with the ACS situation, I can ack the PCI parts and you could
> >> merge it elsewhere?
> >
> > AIUI from previously questioning this, the change is hidden behind a
> > build-time config option and only custom kernels or distros optimized
> > for this sort of support would enable that build option. I'm more than
> > a little dubious though that we're not going to have a wave of distros
> > enabling this only to get user complaints that they can no longer make
> > effective use of their devices for assignment due to the resulting span
> > of the IOMMU groups, nor is there any sort of compromise, configure
> > the kernel for p2p or device assignment, not both. Is this really such
> > a unique feature that distro users aren't going to be asking for both
> > features? Thanks,
> >
> > Alex
> At least 1/2 the cases presented to me by existing customers want it in a tunable kernel,
> and tunable btwn two points, if the hw allows it to be 'contained' in that manner, which
> a (layer of) switch(ing) provides.
> To me, that means a kernel cmdline parameter to _enable_, and another sysfs (configfs? ... i'm not a configfs afficionato to say which is best),
> method to make two points p2p dma capable.

That's not what's done here AIUI.
There are also some complications to making IOMMU groups dynamic. For
instance, could a downstream endpoint already be in use by a userspace
tool while ACS is being twiddled in sysfs? Probably the easiest solution
would be to soft unplug all devices affected by the ACS change before
the change and re-add them afterwards. Note that "affected" is not
necessarily only the downstream devices if the downstream port at which
we're playing with ACS is part of a multifunction device. Thanks,

Alex
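
To make the meaning of "affected" concrete, here is a minimal sketch in Python over a toy topology. It is not kernel code and not part of the series under discussion. The topology dictionary, the function names, and the rule that sibling functions of a multifunction port are pulled in along with everything below them are all assumptions chosen for illustration; the real set depends on how the IOMMU groups end up being formed.

```python
from typing import Dict, Optional, Set

# Hypothetical toy topology: PCI function address -> upstream bridge address
# (None for a root port). Addresses use the usual DDDD:BB:DD.F form.
Topology = Dict[str, Optional[str]]

EXAMPLE_TOPOLOGY: Topology = {
    "0000:00:01.0": None,            # root port
    "0000:01:00.0": "0000:00:01.0",  # switch upstream port
    "0000:02:00.0": "0000:01:00.0",  # downstream port, function 0
    "0000:02:00.1": "0000:01:00.0",  # function 1 of the same multifunction device
    "0000:02:01.0": "0000:01:00.0",  # separate single-function downstream port
    "0000:03:00.0": "0000:02:00.0",  # endpoint below 0000:02:00.0
    "0000:04:00.0": "0000:02:01.0",  # endpoint below 0000:02:01.0
}


def slot_of(bdf: str) -> str:
    """Return the domain:bus:device part of an address, dropping the function."""
    return bdf.rsplit(".", 1)[0]


def descendants(topo: Topology, root: str) -> Set[str]:
    """Collect every function reachable below `root` via upstream links."""
    found: Set[str] = set()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for dev, upstream in topo.items():
            if upstream == parent and dev not in found:
                found.add(dev)
                frontier.append(dev)
    return found


def affected_by_acs_change(topo: Topology, port: str) -> Set[str]:
    """Devices to soft unplug before changing ACS on `port`.

    Assumption (conservative): besides everything below the port, any other
    function sharing the port's slot is affected, together with its subtree.
    The port itself is excluded because it is the device being reconfigured.
    """
    siblings = {d for d in topo if slot_of(d) == slot_of(port) and d != port}
    affected = set(siblings)
    for start in siblings | {port}:
        affected |= descendants(topo, start)
    return affected
```

With the example topology, changing ACS on 0000:02:00.0 pulls in its sibling function 0000:02:00.1 as well as the endpoint 0000:03:00.0, whereas changing it on the single-function port 0000:02:01.0 only affects 0000:04:00.0. On a real system the unplug and re-add steps would presumably go through the per-device `remove` file and the bus `rescan` file in PCI sysfs.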