Subject: Re: [PATCH v4 00/14] Copy Offload in NVMe Fabrics with P2P PCI Memory
To: Alex Williamson, Bjorn Helgaas
Cc: Logan Gunthorpe, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-block@vger.kernel.org, Stephen Bates,
    Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Bjorn Helgaas,
    Jason Gunthorpe, Max Gurtovoy, Dan Williams, Jérôme Glisse,
    Benjamin Herrenschmidt, Christian König
References: <20180423233046.21476-1-logang@deltatee.com>
    <20180507232346.GI161390@bhelgaas-glaptop.roam.corp.google.com>
    <20180508105759.7dbaa8fe@w520.home>
From: Don Dutile
Date: Tue, 8 May 2018 17:25:24 -0400
In-Reply-To: <20180508105759.7dbaa8fe@w520.home>

On 05/08/2018 12:57 PM, Alex Williamson wrote:
> On Mon, 7 May 2018 18:23:46 -0500
> Bjorn Helgaas wrote:
>
>> On Mon, Apr 23, 2018 at 05:30:32PM -0600, Logan Gunthorpe wrote:
>>> Hi Everyone,
>>>
>>> Here's v4 of our series to introduce P2P based copy offload to NVMe
>>> fabrics. This version has been rebased onto v4.17-rc2. A git repo
>>> is here:
>>>
>>> https://github.com/sbates130272/linux-p2pmem pci-p2p-v4
>>> ...
>>
>>> Logan Gunthorpe (14):
>>>   PCI/P2PDMA: Support peer-to-peer memory
>>>   PCI/P2PDMA: Add sysfs group to display p2pmem stats
>>>   PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset
>>>   PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
>>>   docs-rst: Add a new directory for PCI documentation
>>>   PCI/P2PDMA: Add P2P DMA driver writer's documentation
>>>   block: Introduce PCI P2P flags for request and request queue
>>>   IB/core: Ensure we map P2P memory correctly in
>>>     rdma_rw_ctx_[init|destroy]()
>>>   nvme-pci: Use PCI p2pmem subsystem to manage the CMB
>>>   nvme-pci: Add support for P2P memory in requests
>>>   nvme-pci: Add a quirk for a pseudo CMB
>>>   nvmet: Introduce helper functions to allocate and free request SGLs
>>>   nvmet-rdma: Use new SGL alloc/free helper for requests
>>>   nvmet: Optionally use PCI P2P memory
>>>
>>>  Documentation/ABI/testing/sysfs-bus-pci    |  25 +
>>>  Documentation/PCI/index.rst                |  14 +
>>>  Documentation/driver-api/index.rst         |   2 +-
>>>  Documentation/driver-api/pci/index.rst     |  20 +
>>>  Documentation/driver-api/pci/p2pdma.rst    | 166 ++++++
>>>  Documentation/driver-api/{ => pci}/pci.rst |   0
>>>  Documentation/index.rst                    |   3 +-
>>>  block/blk-core.c                           |   3 +
>>>  drivers/infiniband/core/rw.c               |  13 +-
>>>  drivers/nvme/host/core.c                   |   4 +
>>>  drivers/nvme/host/nvme.h                   |   8 +
>>>  drivers/nvme/host/pci.c                    | 118 +++--
>>>  drivers/nvme/target/configfs.c             |  67 +++
>>>  drivers/nvme/target/core.c                 | 143 ++++-
>>>  drivers/nvme/target/io-cmd.c               |   3 +
>>>  drivers/nvme/target/nvmet.h                |  15 +
>>>  drivers/nvme/target/rdma.c                 |  22 +-
>>>  drivers/pci/Kconfig                        |  26 +
>>>  drivers/pci/Makefile                       |   1 +
>>>  drivers/pci/p2pdma.c                       | 814 +++++++++++++++++++++++++++++
>>>  drivers/pci/pci.c                          |   6 +
>>>  include/linux/blk_types.h                  |  18 +-
>>>  include/linux/blkdev.h                     |   3 +
>>>  include/linux/memremap.h                   |  19 +
>>>  include/linux/pci-p2pdma.h                 | 118 +++++
>>>  include/linux/pci.h                        |   4 +
>>>  26 files changed, 1579 insertions(+), 56 deletions(-)
>>>  create mode 100644 Documentation/PCI/index.rst
>>>  create mode 100644 Documentation/driver-api/pci/index.rst
>>>  create mode 100644 Documentation/driver-api/pci/p2pdma.rst
>>>  rename Documentation/driver-api/{ => pci}/pci.rst (100%)
>>>  create mode 100644 drivers/pci/p2pdma.c
>>>  create mode 100644 include/linux/pci-p2pdma.h
>>
>> How do you envison merging this? There's a big chunk in drivers/pci, but
>> really no opportunity for conflicts there, and there's significant stuff in
>> block and nvme that I don't really want to merge.
>>
>> If Alex is OK with the ACS situation, I can ack the PCI parts and you could
>> merge it elsewhere?
>
> AIUI from previously questioning this, the change is hidden behind a
> build-time config option and only custom kernels or distros optimized
> for this sort of support would enable that build option. I'm more than
> a little dubious though that we're not going to have a wave of distros
> enabling this only to get user complaints that they can no longer make
> effective use of their devices for assignment due to the resulting span
> of the IOMMU groups, nor is there any sort of compromise, configure
> the kernel for p2p or device assignment, not both. Is this really such
> a unique feature that distro users aren't going to be asking for both
> features? Thanks,
>
> Alex

At least half of the cases presented to me by existing customers want it in
a tunable kernel, tunable between two points, if the hardware allows it to be
'contained' in that manner, which a (layer of) switching provides.
To me, that means a kernel command-line parameter to _enable_ it, and another
method, via sysfs (or configfs? I'm not enough of a configfs aficionado to say
which is best), to make two points P2P-DMA capable.
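To make the trade-off concrete, here is a toy user-space model of how the span of IOMMU groups differs between a build-time "clear ACS for everything behind a switch" option and a per-pair knob. It is not kernel code and does not reflect how the kernel actually computes IOMMU groups. It assumes a one-level topology where every endpoint sits directly below one switch, that ACS redirect otherwise isolates each endpoint into its own group, and that two endpoints allowed to DMA to each other must share a group. Every name in it (iommu_groups, p2p_pairs, clear_all_behind_switches, the device and switch names) is hypothetical.

```python
"""Toy model of IOMMU group span vs. PCI P2P enablement (illustrative only).

Assumptions (hypothetical, not kernel behavior): each endpoint sits directly
below one switch; with ACS P2P redirect in force every endpoint gets its own
group; endpoints allowed to DMA to each other must share a group.
"""
from collections import defaultdict


class DisjointSet:
    """Minimal union-find used to merge endpoints into groups."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        # Path halving: point each visited node at its grandparent.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def iommu_groups(topology, p2p_pairs=(), clear_all_behind_switches=False):
    """Return the groups as a set of frozensets of endpoint names.

    topology: dict mapping endpoint name -> name of its upstream switch.
    p2p_pairs: pairs explicitly made P2P capable (hypothetical per-pair knob).
    clear_all_behind_switches: models a build-time option that drops ACS
        isolation for every device behind a switch.
    """
    groups = DisjointSet(topology)
    if clear_all_behind_switches:
        by_switch = defaultdict(list)
        for endpoint, switch in topology.items():
            by_switch[switch].append(endpoint)
        for members in by_switch.values():
            for other in members[1:]:
                groups.union(members[0], other)
    for a, b in p2p_pairs:
        # Per-pair enablement only makes sense where a switch contains it.
        if topology[a] != topology[b]:
            raise ValueError(f"{a} and {b} are not behind the same switch")
        groups.union(a, b)
    result = defaultdict(set)
    for endpoint in topology:
        result[groups.find(endpoint)].add(endpoint)
    return {frozenset(members) for members in result.values()}


if __name__ == "__main__":
    topo = {"nvme0": "sw0", "rnic0": "sw0", "gpu0": "sw0", "nic1": "sw1"}
    print(len(iommu_groups(topo)))                                  # 4
    print(len(iommu_groups(topo, clear_all_behind_switches=True)))  # 2
    print(len(iommu_groups(topo, p2p_pairs=[("nvme0", "rnic0")])))  # 3
```

In this model, the global option folds all three endpoints behind sw0 into one group, so none of them can be assigned on its own, while enabling only the nvme0/rnic0 pair leaves gpu0 individually assignable. That is the 'contained' per-pair case described above, under the model's assumptions.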
Worst case, the whole system is one large IOMMU group (the current mindset of
this static or run-time config option); best case (over time, with more
hardware), a secure primary system with P2P-enabled sections that are deemed
either 'safe' or 'self-inflicting-unsecure'. The latter is the case of today's
VM with an assigned device: the device can scribble all over that VM, but not
over any other VM or the host/hypervisor.