Date: Mon, 23 Apr 2018 17:29:09 +0800
From: Tiwei Bie
To: Jason Wang
Cc: mst@redhat.com, wexu@redhat.com, virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, jfreimann@redhat.com
Subject: Re: [RFC v2] virtio: support packed ring
Message-ID: <20180423092908.77rii3gi7dcaf7o6@debian>
References: <20180401141216.8969-1-tiwei.bie@intel.com> <515e635b-bc80-9b8d-72f9-b390ae5103ec@redhat.com>
In-Reply-To: <515e635b-bc80-9b8d-72f9-b390ae5103ec@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> On 2018-04-01 22:12, Tiwei Bie wrote:
> > Hello everyone,
> >
> > This RFC implements packed ring support for virtio driver.
> >
> > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > Minor changes are needed for the vhost code, e.g. to kick the guest.
> >
> > TODO:
> > - Refinements and bug fixes;
> > - Split into small patches;
> > - Test indirect descriptor support;
> > - Test/fix event suppression support;
> > - Test devices other than net;
> >
> > RFC v1 -> RFC v2:
> > - Add indirect descriptor support - compile test only;
> > - Add event suppression supprt - compile test only;
> > - Move vring_packed_init() out of uapi (Jason, MST);
> > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > - Avoid using '%' operator (Jason);
> > - Rename free_head -> next_avail_idx (Jason);
> > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > - Some other refinements and bug fixes;
> >
> > Thanks!
> >
> > Signed-off-by: Tiwei Bie
> > ---
> >  drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
> >  include/linux/virtio_ring.h        |    8 +-
> >  include/uapi/linux/virtio_config.h |   12 +-
> >  include/uapi/linux/virtio_ring.h   |   61 ++
> >  4 files changed, 980 insertions(+), 195 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71458f493cf8..0515dca34d77 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -58,14 +58,15 @@
> > [...]
>
> > +
> > +	if (vq->indirect) {
> > +		u32 len;
> > +
> > +		desc = vq->desc_state[head].indir_desc;
> > +		/* Free the indirect table, if any, now that it's unmapped. */
> > +		if (!desc)
> > +			goto out;
> > +
> > +		len = virtio32_to_cpu(vq->vq.vdev,
> > +				      vq->vring_packed.desc[head].len);
> > +
> > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
>
> It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> can safely remove this BUG_ON() here.
>
> > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
>
> Len could be ignored for used descriptor according to the spec, so we need
> remove this BUG_ON() too.

Yeah, you're right! The BUG_ON() is wrong. I'll remove it.
I also think the spec is not very clear on a related point.
The spec says:

https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272

"""
In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
is reserved and is ignored by the device.
"""

So when the device writes back a used descriptor in this case,
it may not set the VIRTQ_DESC_F_WRITE flag, as the flag is
reserved and should be ignored.

https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170

"""
Element Length is reserved for used descriptors without the
VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
"""

And this is how the driver comes to ignore the `len` in a used
descriptor.

https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241

"""
To increase ring capacity the driver can store a (read-only
by the device) table of indirect descriptors anywhere in memory,
and insert a descriptor in the main virtqueue (with \field{Flags}
bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
containing this indirect descriptor table;
"""

So the indirect descriptors in the table are read-only for the
device, and the only descriptor the device can write is the one
in the main virtqueue (with Flags bit VIRTQ_DESC_F_INDIRECT on).
So if we ignore the `len` in this descriptor, we won't be able
to get the length of the data written by the device.

So I think the `len` in this descriptor will carry the length of
the data written by the device (if the buffers are writable by
the device) even if VIRTQ_DESC_F_WRITE isn't set by the device.
What do you think?

>
> The reason is we don't touch descriptor ring in the case of split, so
> BUG_ON()s may help there.
>
> > +
> > +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> > +			vring_unmap_one_packed(vq, &desc[j]);
> > +
> > +		kfree(desc);
> > +		vq->desc_state[head].indir_desc = NULL;
> > +	} else if (ctx) {
> > +		*ctx = vq->desc_state[head].indir_desc;
> > +	}
> > +
> > +out:
> > +	return vq->desc_state[head].num;
> > +}
> > +
> > +static inline bool more_used_split(const struct vring_virtqueue *vq)
> >  {
> >  	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> >  }
> > +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > +{
> > +	u16 last_used, flags;
> > +	bool avail, used;
> > +
> > +	if (vq->vq.num_free == vq->vring_packed.num)
> > +		return false;
> > +
> > +	last_used = vq->last_used_idx;
> > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > +				vq->vring_packed.desc[last_used].flags);
> > +	avail = flags & VRING_DESC_F_AVAIL(1);
> > +	used = flags & VRING_DESC_F_USED(1);
> > +
> > +	return avail == used;
> > +}
>
> This looks interesting, spec said:
>
> "
> Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
> available descriptor and
> equal for a used descriptor.
> Note that this observation is mostly useful for sanity-checking as these are
> necessary but not sufficient
> conditions - for example, all descriptors are zero-initialized. To detect
> used and available descriptors it is
> possible for drivers and devices to keep track of the last observed value of
> VIRTQ_DESC_F_USED/VIRTQ_DESC_F_AVAIL.
> Other techniques to detect
> VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> might also be possible.
> "
>
> So it looks to me it was not sufficient, looking at the example codes in
> spec, do we need to track last seen used_wrap_counter here?

I don't think we have to track used_wrap_counter in the driver.
There was a discussion on this:

https://lists.oasis-open.org/archives/virtio-dev/201802/msg00177.html

After that discussion, the sentence below was added (it is also
in the text you quoted above):

"""
Other techniques to detect VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit
changes might also be possible.
"""

Best regards,
Tiwei Bie

>
> Thanks
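
To make the two detection schemes discussed above concrete, here is a
minimal, standalone sketch. It is not part of the RFC patch. The names
`used_tracker`, `desc_used_equal_bits()`, `desc_used_wrap()` and
`advance_used()` are hypothetical and chosen only for illustration; the
flag bit positions (7 for AVAIL, 15 for USED) are the ones given in the
packed ring spec. The first helper mirrors the `avail == used` test in
more_used_packed(), the second additionally compares against a tracked
wrap counter, which is the alternative raised in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bit positions as given in the packed ring spec. */
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical, simplified driver-side state (not the patch's struct). */
struct used_tracker {
	uint16_t last_used_idx;  /* next slot expected to become used */
	bool used_wrap_counter;  /* flips each time last_used_idx wraps */
};

/* Check as in the RFC: used if the AVAIL and USED bits are equal. */
static bool desc_used_equal_bits(uint16_t flags)
{
	bool avail = flags & DESC_F_AVAIL;
	bool used = flags & DESC_F_USED;

	return avail == used;
}

/* Stricter check: both bits must also match the tracked wrap counter. */
static bool desc_used_wrap(uint16_t flags, bool used_wrap_counter)
{
	bool avail = flags & DESC_F_AVAIL;
	bool used = flags & DESC_F_USED;

	return avail == used && used == used_wrap_counter;
}

/* Consume `count` used descriptors, flipping the counter on wrap. */
static void advance_used(struct used_tracker *t, uint16_t count,
			 uint16_t ring_size)
{
	assert(ring_size > 0 && count <= ring_size);

	t->last_used_idx += count;
	if (t->last_used_idx >= ring_size) {
		t->last_used_idx -= ring_size;
		t->used_wrap_counter = !t->used_wrap_counter;
	}
}
```

With a zero-initialized descriptor (flags == 0), the equal-bits check
reports "used" while the wrap-counter check, starting with the counter
at 1, does not. In the RFC, that case is covered by the early return
when num_free equals the ring size, so the sketch only shows where the
two checks differ, not that either one is required.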