Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp4392980imm; Fri, 18 May 2018 04:29:59 -0700 (PDT) X-Google-Smtp-Source: AB8JxZrbNTwxchCo8aqbx+kksxvGHru6NrQMZFbz7W8LsmpS4Ip79KunDlPuo2Q6W8zc8p2TNbPv X-Received: by 2002:a17:902:7e06:: with SMTP id b6-v6mr1290425plm.151.1526642998948; Fri, 18 May 2018 04:29:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526642998; cv=none; d=google.com; s=arc-20160816; b=Q3m+G5CaSzgoT0KfNgzYfbVF+3HAvrD/BXFG1+Gl+hO2zn0AAv76yx357EU++6U8Ta yB0pv3SC+xVE0OHCTJIOE8nA5krG77N7zHSqiqY+qGutQawWt0wq6mdYrQRJwOtieuMx bjCWR2Wk6aa3ay6BYqUyuaviBpH/uNVFCvZqGeM27hrO1p8MWBwadycX1GapiEyWs+dY qJ6XtkTWZ96kPyz/kDaR/18n21cVCKVdUbBHZREW/F8774fdxka9gXZAIR0GhWN4BD9E qbr1hrMNLvK4vZzbnEO0ns04gH49NHHSv1bYTOAvgr6x0u4KTpCZtWKsUl6MAl5T0PDz jUGQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date :arc-authentication-results; bh=IlVkkMaSTkKr8dHEy/I8CipP9oR2tRCXZZhwdoZce+s=; b=O9FKQD1sNAdrRTdltpRa7GDveKmBorLLa2Vz/kBXfp6CCbCBvNX9vPnszihUZ12VMq xivH69/yZfD5lCN4vi0e7LBwZwlYo4sHp+ql3pmi0Syh5L/s0T7JIe3L/poq/oAFGbh+ SzGLhJK4b+ee2Z1kvqrTgnFmoOqJ1s4g4zWCV9QwlEf8tElUQajAT01NpyMMjtu3C8CJ Ri15xHTHNe+D3WLShlNoLtsdUzrMO0MzKU7wCY6P3uP42fD9wqnq9yoDVaxulpjfBUJ5 xUYb8z9laIGO5G0Nzq558Tj3gtJ8/AQsrI3YLXfx2PuMsjeHx1Pth+IsBbq+icYuXVJ7 MMrA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d3-v6si2036042pgv.562.2018.05.18.04.29.44; Fri, 18 May 2018 04:29:58 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752336AbeERL32 (ORCPT + 99 others); Fri, 18 May 2018 07:29:28 -0400 Received: from mga04.intel.com ([192.55.52.120]:25567 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752103AbeERL3Z (ORCPT ); Fri, 18 May 2018 07:29:25 -0400 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 May 2018 04:29:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,414,1520924400"; d="scan'208";a="40974070" Received: from debian.sh.intel.com (HELO debian) ([10.67.104.203]) by fmsmga008.fm.intel.com with ESMTP; 18 May 2018 04:29:23 -0700 Date: Fri, 18 May 2018 19:29:50 +0800 From: Tiwei Bie To: Jason Wang Cc: mst@redhat.com, virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, wexu@redhat.com, jfreimann@redhat.com Subject: Re: [RFC v4 3/5] virtio_ring: add packed ring support Message-ID: <20180518112950.GA28224@debian> References: <20180516083737.26504-1-tiwei.bie@intel.com> <20180516083737.26504-4-tiwei.bie@intel.com> <2000f635-bc34-71ff-ff51-a711c2e9726d@redhat.com> <20180516123909.GB986@debian> <20180516134550.GB4171@debian> <20180516143332.GA1957@debian> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.9.5 (2018-04-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote: > On 2018年05月16日 22:33, Tiwei Bie wrote: > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote: > > > On 2018年05月16日 21:45, Tiwei Bie wrote: > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote: > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote: > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote: > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote: > > [...] > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head, > > > > > > > > + unsigned int id, void **ctx) > > > > > > > > +{ > > > > > > > > + struct vring_packed_desc *desc; > > > > > > > > + unsigned int i, j; > > > > > > > > + > > > > > > > > + /* Clear data ptr. */ > > > > > > > > + vq->desc_state[id].data = NULL; > > > > > > > > + > > > > > > > > + i = head; > > > > > > > > + > > > > > > > > + for (j = 0; j < vq->desc_state[id].num; j++) { > > > > > > > > + desc = &vq->vring_packed.desc[i]; > > > > > > > > + vring_unmap_one_packed(vq, desc); > > > > > > > As mentioned in previous discussion, this probably won't work for the case > > > > > > > of out of order completion since it depends on the information in the > > > > > > > descriptor ring. We probably need to extend ctx to record such information. > > > > > > Above code doesn't depend on the information in the descriptor > > > > > > ring. The vq->desc_state[] is the extended ctx. > > > > > > > > > > > > Best regards, > > > > > > Tiwei Bie > > > > > Yes, but desc is a pointer to descriptor ring I think so > > > > > vring_unmap_one_packed() still depends on the content of descriptor ring? > > > > > > > > > I got your point now. I think it makes sense to reserve > > > > the bits of the addr field. Driver shouldn't try to get > > > > addrs from the descriptors when cleanup the descriptors > > > > no matter whether we support out-of-order or not. > > > Maybe I was wrong, but I remember spec mentioned something like this. > > You're right. Spec mentioned this. I was just repeating > > the spec to emphasize that it does make sense. :) > > > > > > But combining it with the out-of-order support, it will > > > > mean that the driver still needs to maintain a desc/ctx > > > > list that is very similar to the desc ring in the split > > > > ring. I'm not quite sure whether it's something we want. > > > > If it is true, I'll do it. So do you think we also want > > > > to maintain such a desc/ctx list for packed ring? > > > To make it work for OOO backends I think we need something like this > > > (hardware NIC drivers are usually have something like this). > > Which hardware NIC drivers have this? > > It's quite common I think, e.g driver track e.g dma addr and page frag > somewhere. e.g the ring->rx_info in mlx4 driver. It seems that I had a misunderstanding on your previous comments. I know it's quite common for drivers to track e.g. DMA addrs somewhere (and I think one reason behind this is that they want to reuse the bits of addr field). But tracking addrs somewhere doesn't means supporting OOO. I thought you were saying it's quite common for hardware NIC drivers to support OOO (i.e. NICs will return the descriptors OOO): I'm not familiar with mlx4, maybe I'm wrong. I just had a quick glance. And I found below comments in mlx4_en_process_rx_cq(): ``` /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx * descriptor offset can be deduced from the CQE index instead of * reading 'cqe->index' */ index = cq->mcq.cons_index & ring->size_mask; cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor; ``` It seems that although they have a completion queue, they are still using the ring in order. I guess maybe storage device may want OOO. Best regards, Tiwei Bie > > Thanks > > > > > > Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is > > > much more simpler to be started with. > > +1 > > > > Best regards, > > Tiwei Bie >