Date: Wed, 30 May 2018 19:42:00 +0800
From: Wei Xu <wexu@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: mst@redhat.com, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, jfreimann@redhat.com, tiwei.bie@intel.com
Subject: Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
Message-ID: <20180530114200.GA23792@wei-ubt>
References: <1527559830-8133-1-git-send-email-jasowang@redhat.com>
 <1527559830-8133-9-git-send-email-jasowang@redhat.com>
In-Reply-To: <1527559830-8133-9-git-send-email-jasowang@redhat.com>
User-Agent: Mutt/1.5.24 (2015-08-30)

On Tue, May 29, 2018 at 10:10:30AM +0800, Jason Wang wrote:
> This patch introduces basic support for event suppression aka driver
> and device area.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c            | 191 ++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.h            |  10 +-
>  include/uapi/linux/virtio_ring.h |  19 ++++
>  3 files changed, 204 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a36e5ad2..112f680 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1112,10 +1112,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
>  			       struct vring_used __user *used)
>  {
>  	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
> +	struct vring_packed_desc_event *driver_event =
> +		(struct vring_packed_desc_event *)avail;
> +	struct vring_packed_desc_event *device_event =
> +		(struct vring_packed_desc_event *)used;
>  
> -	/* FIXME: check device area and driver area */
>  	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
> -	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
> +	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
> +	       access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
> +	       access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
>  }
>  
>  static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
> @@ -1190,14 +1195,27 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
>  	return true;
>  }
>  
> -int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
> +{
> +	int num = vq->num;
> +
> +	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
> +			       (u64)(uintptr_t)vq->driver_event,
> +			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
> +			       (u64)(uintptr_t)vq->device_event,
> +			       sizeof(*vq->device_event), VHOST_ADDR_USED);
> +}
> +
> +int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
>  {
>  	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
>  	unsigned int num = vq->num;
>  
> -	if (!vq->iotlb)
> -		return 1;
> -
>  	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
>  			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
>  	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
> @@ -1209,6 +1227,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
>  			       num * sizeof(*vq->used->ring) + s,
>  			       VHOST_ADDR_USED);
>  }
> +
> +int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +{
> +	if (!vq->iotlb)
> +		return 1;
> +
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vq_iotlb_prefetch_packed(vq);
> +	else
> +		return vq_iotlb_prefetch_split(vq);
> +}
>  EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
>  
>  /* Can we log writes? */
> @@ -1730,6 +1759,50 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>  	return 0;
>  }
>  
> +static int vhost_update_device_flags(struct vhost_virtqueue *vq,
> +				     __virtio16 device_flags)
> +{
> +	void __user *flags;
> +
> +	if (vhost_put_user(vq, device_flags, &vq->device_event->flags,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		flags = &vq->device_event->flags;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (flags - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->flags));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
> +static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
> +					__virtio16 device_off_wrap)
> +{
> +	void __user *off_wrap;
> +
> +	if (vhost_put_user(vq, device_off_wrap, &vq->device_event->off_wrap,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		off_wrap = &vq->device_event->off_wrap;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (off_wrap - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->off_wrap));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
>  static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>  {
>  	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
> @@ -2683,16 +2756,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(vhost_add_used_n);
>  
> -static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static bool vhost_notify_split(struct vhost_dev *dev,
> +			       struct vhost_virtqueue *vq)
>  {
>  	__u16 old, new;
>  	__virtio16 event;
>  	bool v;
>  
> -	/* FIXME: check driver area */
> -	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> -		return true;
> -
>  	/* Flush out used index updates. This is paired
>  	 * with the barrier that the Guest executes when enabling
>  	 * interrupts. */
> @@ -2725,6 +2795,64 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> +				struct vhost_virtqueue *vq)
> +{
> +	__virtio16 event_off_wrap, event_flags;
> +	__u16 old, new, off_wrap;
> +	bool v;
> +
> +	/* Flush out used descriptors updates. This is paired
> +	 * with the barrier that the Guest executes when enabling
> +	 * interrupts.
> +	 */
> +	smp_mb();
> +
> +	if (vhost_get_avail(vq, event_flags,
> +			    &vq->driver_event->flags) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_flags");
> +		return true;
> +	}
> +
> +	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	old = vq->signalled_used;
> +	v = vq->signalled_used_valid;
> +	new = vq->signalled_used = vq->last_used_idx;
> +	vq->signalled_used_valid = true;
> +
> +	if (event_flags != cpu_to_vhost16(vq, RING_EVENT_FLAGS_DESC))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	/* Read desc event flags before event_off and event_wrap */
> +	smp_rmb();
> +
> +	if (vhost_get_avail(vq, event_off_wrap,
> +			    &vq->driver_event->off_wrap) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_off/wrap");
> +		return true;
> +	}
> +
> +	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> +	if (unlikely(!v))
> +		return true;
> +
> +	return vhost_vring_packed_need_event(vq, vq->used_wrap_counter,
> +					     off_wrap, new, old);
> +}
> +
> +static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +{
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vhost_notify_packed(dev, vq);
> +	else
> +		return vhost_notify_split(dev, vq);
> +}
> +
>  /* This actually signals the guest, using eventfd. */
>  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  {
> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>  				       struct vhost_virtqueue *vq)
>  {
>  	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
> -	__virtio16 flags;
> +	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
>  	int ret;
>  
> -	/* FIXME: disable notification through device area */
> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> +		return false;
> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;

'used_flags' was originally designed for 1.0, why should we pay
attention to it here?
Wei

> +
> +	if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
> +		__virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
> +					vq->avail_wrap_counter << 15);
> +
> +		ret = vhost_update_device_off_wrap(vq, off_wrap);
> +		if (ret) {
> +			vq_err(vq, "Failed to write to off warp at %p: %d\n",
> +			       &vq->device_event->off_wrap, ret);
> +			return false;
> +		}
> +		/* Make sure off_wrap is wrote before flags */
> +		smp_wmb();
> +		flags = RING_EVENT_FLAGS_DESC;
> +	}
> +
> +	ret = vhost_update_device_flags(vq, flags);
> +	if (ret) {
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +		       &vq->device_event->flags, ret);
> +		return false;
> +	}
>  
>  	/* They could have slipped one in as we were doing that: make
>  	 * sure it's written, then check again. */
> @@ -2871,7 +3023,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);
>  static void vhost_disable_notify_packed(struct vhost_dev *dev,
>  					struct vhost_virtqueue *vq)
>  {
> -	/* FIXME: disable notification through device area */
> +	__virtio16 flags;
> +	int r;
> +
> +	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
> +		return;
> +	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
> +
> +	flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +	r = vhost_update_device_flags(vq, flags);
> +	if (r)
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +		       &vq->device_event->flags, r);
>  }
>  
>  static void vhost_disable_notify_split(struct vhost_dev *dev,
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 7543a46..b920582 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -96,8 +96,14 @@ struct vhost_virtqueue {
>  		struct vring_desc __user *desc;
>  		struct vring_desc_packed __user *desc_packed;
>  	};
> -	struct vring_avail __user *avail;
> -	struct vring_used __user *used;
> +	union {
> +		struct vring_avail __user *avail;
> +		struct vring_packed_desc_event __user *driver_event;
> +	};
> +	union {
> +		struct vring_used __user *used;
> +		struct vring_packed_desc_event __user *device_event;
> +	};
>  
>  	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
>  	struct file *kick;
>  	struct eventfd_ctx *call_ctx;
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index e297580..71c7a46 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -75,6 +75,25 @@ struct vring_desc_packed {
>  	__virtio16 flags;
>  };
>  
> +/* Enable events */
> +#define RING_EVENT_FLAGS_ENABLE 0x0
> +/* Disable events */
> +#define RING_EVENT_FLAGS_DISABLE 0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
> + */
> +#define RING_EVENT_FLAGS_DESC 0x2
> +/* The value 0x3 is reserved */
> +
> +struct vring_packed_desc_event {
> +	/* Descriptor Ring Change Event Offset and Wrap Counter */
> +	__virtio16 off_wrap;
> +	/* Descriptor Ring Change Event Flags */
> +	__virtio16 flags;
> +};
> +
>  /* Virtio ring descriptors: 16 bytes. These can chain together via "next". */
>  struct vring_desc {
>  	/* Address (guest-physical). */
> -- 
> 2.7.4
> 