Subject: Re: [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB
To: Keith Busch, Oliver
Cc: Jens Axboe, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org,
    linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
    Alex Williamson, Jason Gunthorpe, Jérôme Glisse,
    Benjamin Herrenschmidt, Bjorn Helgaas, Max Gurtovoy,
    Christoph Hellwig
From: Sagi Grimberg
Date: Mon, 5 Mar 2018 21:57:27 +0200
Message-ID: <36c78987-006a-a97f-1d18-b0a08cbea9d4@grimberg.me>
In-Reply-To: <20180305160004.GA30975@localhost.localdomain>
References: <20180228234006.21093-1-logang@deltatee.com>
 <20180228234006.21093-8-logang@deltatee.com>
 <20180305160004.GA30975@localhost.localdomain>

>>> -	if (nvmeq->sq_cmds_io)
>>> -		memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
>>> -	else
>>> -		memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>>> +	memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>>
>> Hmm, how safe is replacing memcpy_toio() with regular memcpy()? On PPC
>> the _toio() variant enforces alignment, does the copy with 4-byte
>> stores, and has a full barrier after the copy. In comparison, our
>> regular memcpy() does none of those things and may use unaligned and
>> vector load/stores. For normal (cacheable) memory that is perfectly
>> fine, but they can cause alignment faults when targeted at MMIO
>> (cache-inhibited) memory.
>>
>> I think in this particular case it might be ok, since we know SQEs are
>> aligned to 64-byte boundaries and the copy is too small to use our
>> vectorised memcpy(). I'll assume we don't need explicit ordering
>> between writes of SQEs, since the existing code doesn't seem to care
>> unless the doorbell is being rung, so you're probably fine there too.
>> That said, I still think this is a little bit sketchy, and at the very
>> least you should add a comment explaining what's going on when the CMB
>> is being used. If someone more familiar with the NVMe driver could
>> chime in, I would appreciate it.
>
> I may not be understanding the concern, but I'll give it a shot.
>
> You're right, the start of any SQE is always 64-byte aligned, so that
> should satisfy the alignment requirements.
>
> The order in which multiple/successive SQEs are written into a
> submission queue does matter, and that is currently serialized through
> the q_lock.
>
> The order in which the bytes of a single SQE are written doesn't really
> matter, as long as the entire SQE is written into the CMB before that
> SQ's doorbell register is written.
>
> The doorbell register is written immediately after copying a command
> entry into the submission queue (ignoring "shadow buffer" features),
> so doorbell writes map 1:1 to submitted commands.
>
> If the ordering between the CMB SQE memcpy and the doorbell write is
> not enforced, then we do need a barrier after the SQE's memcpy and
> before the doorbell's writel.
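
For reference, the submission path after this patch ends up looking
roughly like the sketch below (paraphrased, with the shadow-doorbell
/ dbbuf handling elided, so treat the function and field names as
approximate rather than a verbatim quote of the driver). The open
question is whether anything orders the memcpy() against the writel()
when sq_cmds lives in the CMB:

--
static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
			      struct nvme_command *cmd)
{
	u16 tail = nvmeq->sq_tail;

	/* sq_cmds may now be host RAM or p2pmem backed by the CMB */
	memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));

	if (++tail == nvmeq->q_depth)
		tail = 0;

	/*
	 * Question: if the controller can observe the doorbell write
	 * below before the SQE copy above is visible in the CMB, a
	 * wmb() would be needed here.
	 */
	writel(tail, nvmeq->q_db);
	nvmeq->sq_tail = tail;
}
--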
Keith, while we're on this: regardless of the CMB, is the ordering
between the SQE memcpy and the DB update always guaranteed? If you look
at mlx4 (an RDMA device driver that works in exactly the same way as
nvme), you will find:

--
		qp->sq.head += nreq;

		/*
		 * Make sure that descriptors are written before
		 * doorbell record.
		 */
		wmb();

		writel(qp->doorbell_qpn,
		       to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);

		/*
		 * Make sure doorbells don't leak out of SQ spinlock
		 * and reach the HCA out of order.
		 */
		mmiowb();
--

That code explicitly places a barrier before updating the doorbell. So
as I see it, either the ordering is guaranteed and the above mlx4 code
is redundant, or nvme needs to do the same (a sketch of what that might
look like follows below).

Thoughts?
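
For completeness, if the answer is that the ordering is not guaranteed,
the nvme equivalent of the mlx4 pattern would presumably be something
like the hypothetical sketch below (not a proposed patch; q_lock is
assumed to be held around submission, as it is today):

--
	/* copy the SQE into the (possibly CMB-resident) submission queue */
	memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));

	/*
	 * Make sure the entire SQE is visible to the controller
	 * before the doorbell write tells it to fetch the command.
	 */
	wmb();

	writel(tail, nvmeq->q_db);

	/*
	 * Keep the doorbell write from leaking out of the q_lock and
	 * reaching the device out of order (mirrors mlx4's mmiowb()).
	 */
	mmiowb();
--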