From: Oliver
Date: Mon, 5 Mar 2018 12:33:29 +1100
Subject: Re: [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB
To: Logan Gunthorpe
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
    Jens Axboe, Benjamin Herrenschmidt, Alex Williamson, Keith Busch,
    Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas, Max Gurtovoy,
    Christoph Hellwig
In-Reply-To: <20180228234006.21093-8-logang@deltatee.com>
References: <20180228234006.21093-1-logang@deltatee.com>
            <20180228234006.21093-8-logang@deltatee.com>

On Thu, Mar 1, 2018 at 10:40 AM, Logan Gunthorpe wrote:
> Register the CMB buffer as p2pmem and use the appropriate allocation
> functions to create and destroy the IO SQ.
>
> If the CMB supports WDS and RDS, publish it for use as p2p memory
> by other devices.
>
> Signed-off-by: Logan Gunthorpe
> ---
>  drivers/nvme/host/pci.c | 75 +++++++++++++++++++++++++++----------------------
>  1 file changed, 41 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 73036d2fbbd5..56ca79be8476 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -29,6 +29,7 @@
>  #include
>  #include
>  #include
> +#include <linux/pci-p2pdma.h>
>
>  #include "nvme.h"
>
> @@ -91,9 +92,8 @@ struct nvme_dev {
>         struct work_struct remove_work;
>         struct mutex shutdown_lock;
>         bool subsystem;
> -       void __iomem *cmb;
> -       pci_bus_addr_t cmb_bus_addr;
>         u64 cmb_size;
> +       bool cmb_use_sqes;
>         u32 cmbsz;
>         u32 cmbloc;
>         struct nvme_ctrl ctrl;
> @@ -148,7 +148,7 @@ struct nvme_queue {
>         struct nvme_dev *dev;
>         spinlock_t q_lock;
>         struct nvme_command *sq_cmds;
> -       struct nvme_command __iomem *sq_cmds_io;
> +       bool sq_cmds_is_io;
>         volatile struct nvme_completion *cqes;
>         struct blk_mq_tags **tags;
>         dma_addr_t sq_dma_addr;
> @@ -429,10 +429,7 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
>  {
>         u16 tail = nvmeq->sq_tail;
>
> -       if (nvmeq->sq_cmds_io)
> -               memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
> -       else
> -               memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
> +       memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));

Hmm, how safe is replacing memcpy_toio() with a regular memcpy()? On PPC
the _toio() variant enforces alignment, does the copy with 4-byte stores,
and has a full barrier after the copy. In comparison, our regular memcpy()
does none of those things and may use unaligned and vector load/stores.
For normal (cacheable) memory that is perfectly fine, but those can cause
alignment faults when targeted at MMIO (cache-inhibited) memory.

I think in this particular case it might be OK since we know SQEs are
aligned to 64-byte boundaries and the copy is too small to use our
vectorised memcpy(). I'll assume we don't need explicit ordering between
writes of SQEs since the existing code doesn't seem to care unless the
doorbell is being rung, so you're probably fine there too. That said, I
still think this is a little bit sketchy, and at the very least you should
add a comment explaining what's going on when the CMB is being used. If
someone more familiar with the NVMe driver could chime in I would
appreciate it.
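To make the comparison concrete, here is an untested sketch (the helper
name is made up; this is not an existing kernel function) of a copy that
keeps the guarantees memcpy_toio() gives us on PPC, assuming the usual
kernel context (<linux/io.h>, <linux/types.h>):

static void nvme_sqe_copy_toio(void __iomem *dst, const void *src,
                               size_t size)
{
        const u32 *s = src;
        u32 __iomem *d = dst;
        size_t i;

        /*
         * Fixed-width, naturally aligned stores are safe on
         * cache-inhibited mappings, where a vectorised or unaligned
         * memcpy() can take an alignment fault. __raw_writel() does no
         * byte swapping, matching memcpy_toio(). This relies on the SQE
         * being 4-byte aligned and a multiple of 4 bytes in size.
         */
        for (i = 0; i < size / sizeof(u32); i++)
                __raw_writel(s[i], &d[i]);

        /*
         * Full barrier after the copy, as memcpy_toio() has on PPC, so
         * the SQE is visible to the device before the doorbell write.
         */
        mb();
}

If the plain memcpy() stays, a comment in __nvme_submit_cmd() spelling
out why these properties hold in the CMB case would cover my concern.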
>         if (++tail == nvmeq->q_depth)
>                 tail = 0;
> @@ -1286,9 +1283,18 @@ static void nvme_free_queue(struct nvme_queue *nvmeq)
>  {
>         dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
>                                 (void *)nvmeq->cqes, nvmeq->cq_dma_addr);
> -       if (nvmeq->sq_cmds)
> -               dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth),
> -                                 nvmeq->sq_cmds, nvmeq->sq_dma_addr);
> +
> +       if (nvmeq->sq_cmds) {
> +               if (nvmeq->sq_cmds_is_io)
> +                       pci_free_p2pmem(to_pci_dev(nvmeq->q_dmadev),
> +                                       nvmeq->sq_cmds,
> +                                       SQ_SIZE(nvmeq->q_depth));
> +               else
> +                       dma_free_coherent(nvmeq->q_dmadev,
> +                                         SQ_SIZE(nvmeq->q_depth),
> +                                         nvmeq->sq_cmds,
> +                                         nvmeq->sq_dma_addr);
> +       }
>  }
>
>  static void nvme_free_queues(struct nvme_dev *dev, int lowest)
> @@ -1368,12 +1374,21 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
>  static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
>                               int qid, int depth)
>  {
> -       /* CMB SQEs will be mapped before creation */
> -       if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS))
> -               return 0;
> +       struct pci_dev *pdev = to_pci_dev(dev->dev);
> +
> +       if (qid && dev->cmb_use_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
> +               nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(depth));
> +               nvmeq->sq_dma_addr = pci_p2pmem_virt_to_bus(pdev,
> +                                                           nvmeq->sq_cmds);
> +               nvmeq->sq_cmds_is_io = true;
> +       }
> +
> +       if (!nvmeq->sq_cmds) {
> +               nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
> +                                       &nvmeq->sq_dma_addr, GFP_KERNEL);
> +               nvmeq->sq_cmds_is_io = false;
> +       }
>
> -       nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
> -                                           &nvmeq->sq_dma_addr, GFP_KERNEL);
>         if (!nvmeq->sq_cmds)
>                 return -ENOMEM;
>         return 0;
> @@ -1449,13 +1464,6 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
>         struct nvme_dev *dev = nvmeq->dev;
>         int result;
>
> -       if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
> -               unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth),
> -                                                     dev->ctrl.page_size);
> -               nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
> -               nvmeq->sq_cmds_io = dev->cmb + offset;
> -       }
> -
>         nvmeq->cq_vector = qid - 1;
>         result = adapter_alloc_cq(dev, qid, nvmeq);
>         if (result < 0)
> @@ -1685,9 +1693,6 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>                 return;
>         dev->cmbloc = readl(dev->bar + NVME_REG_CMBLOC);
>
> -       if (!use_cmb_sqes)
> -               return;
> -
>         size = nvme_cmb_size_unit(dev) * nvme_cmb_size(dev);
>         offset = nvme_cmb_size_unit(dev) * NVME_CMB_OFST(dev->cmbloc);
>         bar = NVME_CMB_BIR(dev->cmbloc);
> @@ -1704,11 +1709,15 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>         if (size > bar_size - offset)
>                 size = bar_size - offset;
>
> -       dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
> -       if (!dev->cmb)
> +       if (pci_p2pdma_add_resource(pdev, bar, size, offset))
>                 return;
> -       dev->cmb_bus_addr = pci_bus_address(pdev, bar) + offset;
> +
>         dev->cmb_size = size;
> +       dev->cmb_use_sqes = use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS);
> +
> +       if ((dev->cmbsz & (NVME_CMBSZ_WDS | NVME_CMBSZ_RDS)) ==
> +                       (NVME_CMBSZ_WDS | NVME_CMBSZ_RDS))
> +               pci_p2pmem_publish(pdev, true);
>
>         if (sysfs_add_file_to_group(&dev->ctrl.device->kobj,
>                                     &dev_attr_cmb.attr, NULL))
> @@ -1718,12 +1727,10 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>
>  static inline void nvme_release_cmb(struct nvme_dev *dev)
>  {
> -       if (dev->cmb) {
> -               iounmap(dev->cmb);
> -               dev->cmb = NULL;
> +       if (dev->cmb_size) {
>                 sysfs_remove_file_from_group(&dev->ctrl.device->kobj,
>                                              &dev_attr_cmb.attr, NULL);
> -               dev->cmbsz = 0;
> +               dev->cmb_size = 0;
>         }
>  }
>
> @@ -1918,13 +1925,13 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
>         if (nr_io_queues == 0)
>                 return 0;
>
> -       if (dev->cmb && (dev->cmbsz & NVME_CMBSZ_SQS)) {
> +       if (dev->cmb_use_sqes) {
>                 result = nvme_cmb_qdepth(dev, nr_io_queues,
>                                 sizeof(struct nvme_command));
>                 if (result > 0)
>                         dev->q_depth = result;
>                 else
> -                       nvme_release_cmb(dev);
> +                       dev->cmb_use_sqes = false;
>         }
>
>         do {
> --
> 2.11.0
>
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm