To: Sagi Grimberg <sagi@grimberg.me>, linux-kernel@vger.kernel.org,
 linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
 linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-block@vger.kernel.org
Cc: Stephen Bates <sbates@raithlin.com>, Christoph Hellwig <hch@lst.de>,
 Jens Axboe <axboe@kernel.dk>, Keith Busch <keith.busch@intel.com>,
 Bjorn Helgaas <bhelgaas@google.com>, Jason Gunthorpe <jgg@mellanox.com>,
 Max Gurtovoy <maxg@mellanox.com>, Dan Williams <dan.j.williams@intel.com>,
 Jérôme Glisse <jglisse@redhat.com>,
 Benjamin Herrenschmidt <benh@kernel.crashing.org>,
 Alex Williamson <alex.williamson@redhat.com>,
 Steve Wise <swise@opengridcomputing.com>
References: <20180228234006.21093-1-logang@deltatee.com>
 <20180228234006.21093-11-logang@deltatee.com>
 <749e3752-4349-0bdf-5243-3d510c2b26db@grimberg.me>
From: Logan Gunthorpe <logang@deltatee.com>
Message-ID: <40d69074-31a8-d06a-ade9-90de7712c553@deltatee.com>
Date: Thu, 1 Mar 2018 10:40:11 -0700
In-Reply-To: <749e3752-4349-0bdf-5243-3d510c2b26db@grimberg.me>
Subject: Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

On 01/03/18 04:03 AM, Sagi Grimberg wrote:
> Can you describe what would be the plan to have it when these devices
> do come along? I'd say that p2p_dev needs to become a nvmet_ns reference
> and not from nvmet_ctrl. Then, when cmb capable devices come along, the
> ns can prefer to use its own cmb instead of locating a p2p_dev device?

The patchset already supports CMB drives; that's essentially what patch 7
is for. It changes the nvme-pci driver to use the p2pmem code to register
and manage the CMB memory, and after that the memory is simply available
to the nvmet code. We have already been using this with a couple of
prototype CMB drives.

>> +static int nvmet_p2pdma_add_client(struct nvmet_ctrl *ctrl,
>> +                   struct nvmet_ns *ns)
>> +{
>> +    int ret;
>> +
>> +    if (!blk_queue_pci_p2pdma(ns->bdev->bd_queue)) {
>> +        pr_err("peer-to-peer DMA is not supported by %s\n",
>> +               ns->device_path);
>> +        return -EINVAL;
>
> I'd say that just skip it instead of failing it, in theory one can
> connect nvme devices via p2p mem and expose other devices in the
> same subsystem. The message would be pr_debug to reduce chattiness.

No, it's actually a bit more complicated than you make it out to be.
There are a couple of cases:

1) The user adds a namespace before any connection has been made, so no
p2pmem has been selected yet. In this case the add_client function is
never called and the user can add whatever namespace they like.

2) When the first connect happens, nvmet_setup_p2pmem() will call
add_client() for each namespace and RDMA device. If any of them fails,
no P2P device is used and we fall back, as you'd like, to regular
memory.

3) The user adds a namespace after a port is in use and a compatible P2P
device has already been found. In this case, if the new namespace is not
compatible with the P2P device in use, adding it fails. The only
alternative would be to tear everything down, ensure there are no P2P
enabled buffers open, and start using regular memory again... which is
very difficult.
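To make case (2) concrete, the connect-time flow looks roughly like the
sketch below. This is a paraphrase of what the patchset does, not the
literal code; the field and helper names (allow_p2pmem,
pci_p2pdma_client_list_free()) may not match v2 exactly:

static void nvmet_setup_p2pmem(struct nvmet_ctrl *ctrl,
			       struct nvmet_req *req)
{
	struct nvmet_ns *ns;

	if (!req->port->allow_p2pmem)
		return;

	/*
	 * Case 2: try to add every enabled namespace as a p2pdma
	 * client. A single incompatible namespace disables P2P memory
	 * for the whole controller and we fall back to regular memory.
	 */
	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
		if (nvmet_p2pdma_add_client(ctrl, ns))
			goto free_clients;
	}

	/* All clients are compatible; pick a p2pmem device for the
	 * controller (currently the first suitable one found). */
	ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients);
	if (ctrl->p2p_dev)
		return;

free_clients:
	pci_p2pdma_client_list_free(&ctrl->p2p_clients);
	ctrl->p2p_dev = NULL;
}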
I also disagree that these messages should be pr_debug. If a user enables
P2P memory but the system chooses not to use it, they need to know why so
they can make the necessary adjustments. If the system doesn't print at
least a dmesg message, they are left in the dark.

>> +    }
>> +
>> +    ret = pci_p2pdma_add_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
>> +    if (ret)
>> +        pr_err("failed to add peer-to-peer DMA client %s: %d\n",
>> +               ns->device_path, ret);
>> +
>> +    return ret;
>> +}
>> +
>>   int nvmet_ns_enable(struct nvmet_ns *ns)
>>   {
>>       struct nvmet_subsys *subsys = ns->subsys;
>> @@ -299,6 +319,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
>>       if (ret)
>>           goto out_blkdev_put;
>> +    list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
>> +        if (ctrl->p2p_dev) {
>> +            ret = nvmet_p2pdma_add_client(ctrl, ns);
>> +            if (ret)
>> +                goto out_remove_clients;
>
> Is this really a fatal failure given that we fall-back to main
> memory? Why not continue with main memory (and warn at best)?

See above. It's fatal because we are already using p2p memory and we
can't easily tear that all down when a user adds a new namespace.

>> +    ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients);
>
> This is the first p2p_dev found right? What happens if I have more than
> a single p2p device? In theory I'd have more p2p memory I can use. Have
> you considered making pci_p2pmem_find return the least used suitable
> device?

Yes, it currently returns the first device found. I imagine a number of
improvements could be made to it in future work. However, I'd probably
start with finding the nearest P2P device and only then fall back to the
least-used device if that's necessary (a rough idea of the latter is in
[1] below). At this time it's not clear how complicated these systems
will get or what's actually needed here.

>> @@ -68,6 +69,7 @@ struct nvmet_rdma_rsp {
>>       u8            n_rdma;
>>       u32            flags;
>>       u32            invalidate_rkey;
>> +    struct pci_dev        *p2p_dev;
>
> Given that p2p_client is in nvmet_req, I think it make sense
> that the p2p_dev itself would also live there. In theory, nothing
> is preventing FC from using it as well.

Fair point. I can look at moving it in v3.

Thanks,

Logan
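[1] A least-used selection could look something like the following.
This is an untested sketch only; pci_p2pdma_clients_compatible() and
pci_p2pmem_available() are hypothetical stand-ins for the compatibility
check pci_p2pmem_find() already does and for per-device free-space
accounting that doesn't exist yet:

struct pci_dev *pci_p2pmem_find_least_used(struct list_head *clients)
{
	struct pci_dev *pdev = NULL, *best = NULL;
	size_t most_avail = 0;

	/* Walk every PCI device that publishes p2pmem instead of
	 * stopping at the first compatible one. */
	while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
		size_t avail;

		if (!pci_has_p2pmem(pdev))
			continue;
		if (!pci_p2pdma_clients_compatible(pdev, clients))
			continue;

		avail = pci_p2pmem_available(pdev);
		if (avail > most_avail) {
			/* Hold a reference to the best candidate so
			 * far; the caller must pci_dev_put() the
			 * returned device. */
			pci_dev_put(best);
			best = pci_dev_get(pdev);
			most_avail = avail;
		}
	}

	return best;
}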