The issue was reported by Yihuang Yu on NVidia's grace-hopper (ARM64)
platform. The guest sees a wrong head (available ring entry) when
'netperf' runs in the guest and 'netserver' runs on another NVidia
grace-grace machine.
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host -cpu host \
-smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
-m 4096M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=4096M \
: \
-netdev tap,id=tap0,vhost=true \
-device virtio-net-pci,bus=pcie.8,netdev=tap0,mac=52:54:00:f1:26:b0
:
guest# ifconfig eth0 | grep 'inet addr'
inet addr:10.26.1.220
guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
virtio_net virtio0: output.0:id 100 is not a head!
An smp_rmb() is missing in vhost_{vq_avail_empty, enable_notify}().
Without it, vq->avail_idx can be advanced before the available ring
entries have become visible to the vhost side, so a stale available
ring entry can be fetched in vhost_get_vq_desc().
Fix it by adding smp_rmb() to those two functions. The fix is split into
two patches so that each can be easily picked up by the stable kernels.
With the changes, I'm unable to hit the issue again. In addition,
vhost_get_avail_idx() is improved to handle the memory barrier itself
so that its callers needn't worry about it.
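As an illustration of the ordering requirement described above, here is
a minimal user-space sketch using C11 atomics. It is not the vhost code:
the names toy_ring, toy_publish() and toy_consume() are made up for this
example, and the C11 release/acquire fences only stand in for the
kernel's smp_wmb()/smp_rmb().

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u

/* Hypothetical stand-in for the available ring shared with the guest. */
struct toy_ring {
	uint16_t ring[RING_SIZE];	/* available ring entries (heads) */
	_Atomic uint16_t idx;		/* index published by the producer */
};

/* Producer side: write the ring entry first, then publish the index. */
static void toy_publish(struct toy_ring *r, uint16_t head)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	r->ring[idx % RING_SIZE] = head;
	/* Stands in for smp_wmb(): entry is written before the index. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}

/* Consumer side: returns 1 and fills *head if an entry is available. */
static int toy_consume(struct toy_ring *r, uint16_t *last_avail_idx,
		       uint16_t *head)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	if (idx == *last_avail_idx)
		return 0;

	/* Stands in for smp_rmb(): the ring entry read below must not be
	 * satisfied before the index read above.
	 */
	atomic_thread_fence(memory_order_acquire);
	*head = r->ring[*last_avail_idx % RING_SIZE];
	(*last_avail_idx)++;
	return 1;
}
```

If the acquire fence in toy_consume() is dropped, a weakly ordered CPU
is allowed to return an old ring entry even though the new index was
observed, which matches the stale-head symptom described in this series.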
v2: https://lore.kernel.org/virtualization/[email protected]
v1: https://lore.kernel.org/virtualization/[email protected]
Changelog
=========
v3:
Improved change log (Jason)
Improved comments and added PATCH[v3 3/3] to execute
smp_rmb() in vhost_get_avail_idx() (Michael)
Gavin Shan (3):
vhost: Add smp_rmb() in vhost_vq_avail_empty()
vhost: Add smp_rmb() in vhost_enable_notify()
vhost: Improve vhost_get_avail_idx() with smp_rmb()
drivers/vhost/vhost.c | 51 ++++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 27 deletions(-)
--
2.44.0
An smp_rmb() is missing in vhost_vq_avail_empty(), as spotted by
Will. Without it, there is no guarantee that the available ring entries
pushed by the guest are observed by vhost in time, so vhost can fetch
stale available ring entries in vhost_get_vq_desc(), as reported by
Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host -cpu host \
-smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
-m 4096M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=4096M \
: \
-netdev tap,id=vnet0,vhost=true \
-device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
:
guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
virtio_net virtio0: output.0:id 100 is not a head!
Add the missing smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch()
returns true, there are still pending tx buffers. Since
vhost_vq_avail_empty() might read the indices, the subsequent
vhost_get_vq_desc() can bypass its own smp_rmb(). Note that this should
have been safe until vq->avail_idx started being updated here by commit
275bf960ac697 ("vhost: better detection of available buffers").
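The bypass described above can be modelled in a few lines of
single-threaded C. This is only a sketch under simplifying assumptions:
struct toy_vq and all toy_*() helpers are invented for the illustration,
and the barrier is replaced by a flag that records whether a read
barrier followed the most recent index refresh.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields involved; not the real vhost_virtqueue. */
struct toy_vq {
	uint16_t guest_idx;		/* stands in for vq->avail->idx */
	uint16_t avail_idx;		/* cached copy, like vq->avail_idx */
	uint16_t last_avail_idx;	/* next entry to consume */
	bool ordered;			/* barrier seen since last refresh? */
};

static void toy_refresh_idx(struct toy_vq *vq)
{
	vq->avail_idx = vq->guest_idx;
	vq->ordered = false;
}

/* Placeholder for smp_rmb(): only records that a barrier was issued. */
static void toy_rmb(struct toy_vq *vq)
{
	vq->ordered = true;
}

/* Shape of vhost_vq_avail_empty() before the fix. */
static bool toy_avail_empty_old(struct toy_vq *vq)
{
	if (vq->avail_idx != vq->last_avail_idx)
		return false;
	toy_refresh_idx(vq);
	return vq->avail_idx == vq->last_avail_idx;
}

/* Shape of vhost_vq_avail_empty() after the fix. */
static bool toy_avail_empty_new(struct toy_vq *vq)
{
	if (vq->avail_idx != vq->last_avail_idx)
		return false;
	toy_refresh_idx(vq);
	if (vq->avail_idx != vq->last_avail_idx) {
		toy_rmb(vq);
		return false;
	}
	return true;
}

/* Shape of vhost_get_vq_desc(): it refreshes the index and issues the
 * barrier only when the cached index looks empty. Returns true if the
 * ring entry read would be ordered after the index read.
 */
static bool toy_get_desc_is_ordered(struct toy_vq *vq)
{
	if (vq->avail_idx == vq->last_avail_idx) {
		toy_refresh_idx(vq);
		if (vq->avail_idx == vq->last_avail_idx)
			return true;	/* nothing to read */
		toy_rmb(vq);
	}
	vq->last_avail_idx++;		/* ring entry would be read here */
	return vq->ordered;
}
```

With guest_idx = 1 and an otherwise zeroed toy_vq (ordered = true),
calling toy_avail_empty_old() and then toy_get_desc_is_ordered() reports
an unordered read, while the same sequence with toy_avail_empty_new()
reports an ordered one.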
Fixes: 275bf960ac69 ("vhost: better detection of available buffers")
Cc: <[email protected]> # v4.11+
Reported-by: Yihuang Yu <[email protected]>
Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
drivers/vhost/vhost.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..29df65b2ebf2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2799,9 +2799,19 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
r = vhost_get_avail_idx(vq, &avail_idx);
if (unlikely(r))
return false;
+
vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+ if (vq->avail_idx != vq->last_avail_idx) {
+ /* Since we have updated avail_idx, the following
+ * call to vhost_get_vq_desc() will read available
+ * ring entries. Make sure that read happens after
+ * the avail_idx read.
+ */
+ smp_rmb();
+ return false;
+ }
- return vq->avail_idx == vq->last_avail_idx;
+ return true;
}
EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
--
2.44.0
An smp_rmb() is missing in vhost_enable_notify(), as inspired by
Will. Without it, there is no guarantee that the available ring entries
pushed by the guest are observed by vhost in time, so vhost can fetch
stale available ring entries in vhost_get_vq_desc(), as reported by
Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host -cpu host \
-smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
-m 4096M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=4096M \
: \
-netdev tap,id=vnet0,vhost=true \
-device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
:
guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
virtio_net virtio0: output.0:id 100 is not a head!
Add the missing smp_rmb() in vhost_enable_notify(). When it returns true,
there are still pending tx buffers. Since vhost_enable_notify() might
read the indices, the subsequent vhost_get_vq_desc() can bypass its own
smp_rmb(). Note that this should have been safe until vq->avail_idx
started being updated here by commit d3bb267bbdcb
("vhost: cache avail index in vhost_enable_notify()").
Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: <[email protected]> # v5.18+
Reported-by: Yihuang Yu <[email protected]>
Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Acked-by: Jason Wang <[email protected]>
---
drivers/vhost/vhost.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 29df65b2ebf2..32686c79c41d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2848,9 +2848,19 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
&vq->avail->idx, r);
return false;
}
+
vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+ if (vq->avail_idx != vq->last_avail_idx) {
+ /* Since we have updated avail_idx, the following
+ * call to vhost_get_vq_desc() will read available
+ * ring entries. Make sure that read happens after
+ * the avail_idx read.
+ */
+ smp_rmb();
+ return true;
+ }
- return vq->avail_idx != vq->last_avail_idx;
+ return false;
}
EXPORT_SYMBOL_GPL(vhost_enable_notify);
--
2.44.0
All callers of vhost_get_avail_idx() need the memory barrier imposed
by smp_rmb() to order the available ring entry read after the
avail_idx read.
Improve vhost_get_avail_idx() so that smp_rmb() is executed when
avail_idx has advanced. With it, the callers needn't worry about
the memory barrier.
Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
---
drivers/vhost/vhost.c | 75 +++++++++++++++----------------------------
1 file changed, 26 insertions(+), 49 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 32686c79c41d..e6882f4f6ce2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1290,10 +1290,28 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
mutex_unlock(&d->vqs[i]->mutex);
}
-static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
- __virtio16 *idx)
+static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
{
- return vhost_get_avail(vq, *idx, &vq->avail->idx);
+ __virtio16 avail_idx;
+ int r;
+
+ r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
+ if (unlikely(r)) {
+ vq_err(vq, "Failed to access avail idx at %p\n",
+ &vq->avail->idx);
+ return r;
+ }
+
+ vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+ if (vq->avail_idx != vq->last_avail_idx) {
+ /* Ensure the available ring entry read happens
+ * after the avail_idx read when the avail_idx
+ * is advanced.
+ */
+ smp_rmb();
+ }
+
+ return 0;
}
static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
@@ -2499,7 +2517,6 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
struct vring_desc desc;
unsigned int i, head, found = 0;
u16 last_avail_idx;
- __virtio16 avail_idx;
__virtio16 ring_head;
int ret, access;
@@ -2507,12 +2524,8 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
last_avail_idx = vq->last_avail_idx;
if (vq->avail_idx == vq->last_avail_idx) {
- if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
- vq_err(vq, "Failed to access avail idx at %p\n",
- &vq->avail->idx);
+ if (unlikely(vhost_get_avail_idx(vq)))
return -EFAULT;
- }
- vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
vq_err(vq, "Guest moved used index from %u to %u",
@@ -2525,11 +2538,6 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
*/
if (vq->avail_idx == last_avail_idx)
return vq->num;
-
- /* Only get avail ring entries after they have been
- * exposed by guest.
- */
- smp_rmb();
}
/* Grab the next descriptor number they're advertising, and increment
@@ -2790,35 +2798,19 @@ EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
/* return true if we're sure that avaiable ring is empty */
bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
- __virtio16 avail_idx;
- int r;
-
if (vq->avail_idx != vq->last_avail_idx)
return false;
- r = vhost_get_avail_idx(vq, &avail_idx);
- if (unlikely(r))
- return false;
-
- vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
- if (vq->avail_idx != vq->last_avail_idx) {
- /* Since we have updated avail_idx, the following
- * call to vhost_get_vq_desc() will read available
- * ring entries. Make sure that read happens after
- * the avail_idx read.
- */
- smp_rmb();
+ if (unlikely(vhost_get_avail_idx(vq)))
return false;
- }
- return true;
+ return vq->avail_idx == vq->last_avail_idx;
}
EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
/* OK, now we need to know about added descriptors. */
bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
- __virtio16 avail_idx;
int r;
if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
@@ -2842,25 +2834,10 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
/* They could have slipped one in as we were doing that: make
* sure it's written, then check again. */
smp_mb();
- r = vhost_get_avail_idx(vq, &avail_idx);
- if (r) {
- vq_err(vq, "Failed to check avail idx at %p: %d\n",
- &vq->avail->idx, r);
+ if (unlikely(vhost_get_avail_idx(vq)))
return false;
- }
-
- vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
- if (vq->avail_idx != vq->last_avail_idx) {
- /* Since we have updated avail_idx, the following
- * call to vhost_get_vq_desc() will read available
- * ring entries. Make sure that read happens after
- * the avail_idx read.
- */
- smp_rmb();
- return true;
- }
- return false;
+ return vq->avail_idx != vq->last_avail_idx;
}
EXPORT_SYMBOL_GPL(vhost_enable_notify);
--
2.44.0
On Thu, Mar 28, 2024 at 8:22 AM Gavin Shan <[email protected]> wrote:
>
> All the callers of vhost_get_avail_idx() are concerned to the memory
> barrier, imposed by smp_rmb() to ensure the order of the available
> ring entry read and avail_idx read.
>
> Improve vhost_get_avail_idx() so that smp_rmb() is executed when
> the avail_idx is advanced. With it, the callers needn't to worry
> about the memory barrier.
>
> Suggested-by: Michael S. Tsirkin <[email protected]>
> Signed-off-by: Gavin Shan <[email protected]>
Acked-by: Jason Wang <[email protected]>
Thanks
On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
> All the callers of vhost_get_avail_idx() are concerned to the memory
> barrier, imposed by smp_rmb() to ensure the order of the available
> ring entry read and avail_idx read.
>
> Improve vhost_get_avail_idx() so that smp_rmb() is executed when
> the avail_idx is advanced. With it, the callers needn't to worry
> about the memory barrier.
>
> Suggested-by: Michael S. Tsirkin <[email protected]>
> Signed-off-by: Gavin Shan <[email protected]>
Previous patches are ok. This one I feel needs more work -
first more code such as sanity checking should go into
this function, second there's actually a difference
between comparing to last_avail_idx and just comparing
to the previous value of avail_idx.
I will pick patches 1-2 and post a cleanup on top so you can
take a look, ok?
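To make the second point concrete, the two comparisons can give
different answers for the same state. The sketch below only shows that
the predicates differ; the helper names are hypothetical, and which
condition the cleanup should use is not settled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the condition used in this patch, comparing the
 * freshly read index against last_avail_idx.
 */
static bool toy_differs_from_last(uint16_t new_idx, uint16_t last_avail_idx)
{
	return new_idx != last_avail_idx;
}

/* Hypothetical helper: the alternative condition, comparing the freshly
 * read index against the previously cached avail_idx.
 */
static bool toy_differs_from_cached(uint16_t new_idx, uint16_t cached_avail_idx)
{
	return new_idx != cached_avail_idx;
}
```

For example, with a cached avail_idx of 5, a last_avail_idx of 3 and a
freshly read index of 5, the first predicate is true while the second is
false: entries are still pending, but the index has not moved since the
previous read.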
> ---
> drivers/vhost/vhost.c | 75 +++++++++++++++----------------------------
> 1 file changed, 26 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 32686c79c41d..e6882f4f6ce2 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1290,10 +1290,28 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> mutex_unlock(&d->vqs[i]->mutex);
> }
>
> -static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
> - __virtio16 *idx)
> +static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> {
> - return vhost_get_avail(vq, *idx, &vq->avail->idx);
> + __virtio16 avail_idx;
> + int r;
> +
> + r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
> + if (unlikely(r)) {
> + vq_err(vq, "Failed to access avail idx at %p\n",
> + &vq->avail->idx);
> + return r;
> + }
> +
> + vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> + if (vq->avail_idx != vq->last_avail_idx) {
> + /* Ensure the available ring entry read happens
> + * before the avail_idx read when the avail_idx
> + * is advanced.
> + */
> + smp_rmb();
> + }
> +
> + return 0;
> }
>
> static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
> @@ -2499,7 +2517,6 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
> struct vring_desc desc;
> unsigned int i, head, found = 0;
> u16 last_avail_idx;
> - __virtio16 avail_idx;
> __virtio16 ring_head;
> int ret, access;
>
> @@ -2507,12 +2524,8 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
> last_avail_idx = vq->last_avail_idx;
>
> if (vq->avail_idx == vq->last_avail_idx) {
> - if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
> - vq_err(vq, "Failed to access avail idx at %p\n",
> - &vq->avail->idx);
> + if (unlikely(vhost_get_avail_idx(vq)))
> return -EFAULT;
> - }
> - vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>
> if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
> vq_err(vq, "Guest moved used index from %u to %u",
> @@ -2525,11 +2538,6 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
> */
> if (vq->avail_idx == last_avail_idx)
> return vq->num;
> -
> - /* Only get avail ring entries after they have been
> - * exposed by guest.
> - */
> - smp_rmb();
> }
>
> /* Grab the next descriptor number they're advertising, and increment
> @@ -2790,35 +2798,19 @@ EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
> /* return true if we're sure that avaiable ring is empty */
> bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> {
> - __virtio16 avail_idx;
> - int r;
> -
> if (vq->avail_idx != vq->last_avail_idx)
> return false;
>
> - r = vhost_get_avail_idx(vq, &avail_idx);
> - if (unlikely(r))
> - return false;
> -
> - vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> - if (vq->avail_idx != vq->last_avail_idx) {
> - /* Since we have updated avail_idx, the following
> - * call to vhost_get_vq_desc() will read available
> - * ring entries. Make sure that read happens after
> - * the avail_idx read.
> - */
> - smp_rmb();
> + if (unlikely(vhost_get_avail_idx(vq)))
> return false;
> - }
>
> - return true;
> + return vq->avail_idx == vq->last_avail_idx;
> }
> EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
>
> /* OK, now we need to know about added descriptors. */
> bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> {
> - __virtio16 avail_idx;
> int r;
>
> if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> @@ -2842,25 +2834,10 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> /* They could have slipped one in as we were doing that: make
> * sure it's written, then check again. */
> smp_mb();
> - r = vhost_get_avail_idx(vq, &avail_idx);
> - if (r) {
> - vq_err(vq, "Failed to check avail idx at %p: %d\n",
> - &vq->avail->idx, r);
> + if (unlikely(vhost_get_avail_idx(vq)))
> return false;
> - }
> -
> - vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> - if (vq->avail_idx != vq->last_avail_idx) {
> - /* Since we have updated avail_idx, the following
> - * call to vhost_get_vq_desc() will read available
> - * ring entries. Make sure that read happens after
> - * the avail_idx read.
> - */
> - smp_rmb();
> - return true;
> - }
>
> - return false;
> + return vq->avail_idx != vq->last_avail_idx;
> }
> EXPORT_SYMBOL_GPL(vhost_enable_notify);
>
> --
> 2.44.0
On Thu, Mar 28, 2024 at 10:21:47AM +1000, Gavin Shan wrote:
>A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
>Will. Otherwise, it's not ensured the available ring entries pushed
>by guest can be observed by vhost in time, leading to stale available
>ring entries fetched by vhost in vhost_get_vq_desc(), as reported by
>Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
>
> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> -accel kvm -machine virt,gic-version=host -cpu host \
> -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
> -m 4096M,slots=16,maxmem=64G \
> -object memory-backend-ram,id=mem0,size=4096M \
> : \
> -netdev tap,id=vnet0,vhost=true \
> -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
> :
> guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
> virtio_net virtio0: output.0:id 100 is not a head!
>
>Add the missed smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch()
>returns true, it means there's still pending tx buffers. Since it might
>read indices, so it still can bypass the smp_rmb() in vhost_get_vq_desc().
>Note that it should be safe until vq->avail_idx is changed by commit
>275bf960ac697 ("vhost: better detection of available buffers").
>
>Fixes: 275bf960ac69 ("vhost: better detection of available buffers")
>Cc: <[email protected]> # v4.11+
>Reported-by: Yihuang Yu <[email protected]>
>Suggested-by: Will Deacon <[email protected]>
>Signed-off-by: Gavin Shan <[email protected]>
>Acked-by: Jason Wang <[email protected]>
>---
> drivers/vhost/vhost.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
Reviewed-by: Stefano Garzarella <[email protected]>
>
>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>index 045f666b4f12..29df65b2ebf2 100644
>--- a/drivers/vhost/vhost.c
>+++ b/drivers/vhost/vhost.c
>@@ -2799,9 +2799,19 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> r = vhost_get_avail_idx(vq, &avail_idx);
> if (unlikely(r))
> return false;
>+
> vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>+ if (vq->avail_idx != vq->last_avail_idx) {
>+ /* Since we have updated avail_idx, the following
>+ * call to vhost_get_vq_desc() will read available
>+ * ring entries. Make sure that read happens after
>+ * the avail_idx read.
>+ */
>+ smp_rmb();
>+ return false;
>+ }
>
>- return vq->avail_idx == vq->last_avail_idx;
>+ return true;
> }
> EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
>
>--
>2.44.0
>
On Thu, Mar 28, 2024 at 10:21:48AM +1000, Gavin Shan wrote:
>A smp_rmb() has been missed in vhost_enable_notify(), inspired by
>Will. Otherwise, it's not ensured the available ring entries pushed
>by guest can be observed by vhost in time, leading to stale available
>ring entries fetched by vhost in vhost_get_vq_desc(), as reported by
>Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
>
> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> -accel kvm -machine virt,gic-version=host -cpu host \
> -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
> -m 4096M,slots=16,maxmem=64G \
> -object memory-backend-ram,id=mem0,size=4096M \
> : \
> -netdev tap,id=vnet0,vhost=true \
> -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
> :
> guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
> virtio_net virtio0: output.0:id 100 is not a head!
>
>Add the missed smp_rmb() in vhost_enable_notify(). When it returns true,
>it means there's still pending tx buffers. Since it might read indices,
>so it still can bypass the smp_rmb() in vhost_get_vq_desc(). Note that
>it should be safe until vq->avail_idx is changed by commit d3bb267bbdcb
>("vhost: cache avail index in vhost_enable_notify()").
>
>Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
>Cc: <[email protected]> # v5.18+
>Reported-by: Yihuang Yu <[email protected]>
>Suggested-by: Will Deacon <[email protected]>
>Signed-off-by: Gavin Shan <[email protected]>
>Acked-by: Jason Wang <[email protected]>
>---
> drivers/vhost/vhost.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
Thanks for fixing this!
Reviewed-by: Stefano Garzarella <[email protected]>
>
>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>index 29df65b2ebf2..32686c79c41d 100644
>--- a/drivers/vhost/vhost.c
>+++ b/drivers/vhost/vhost.c
>@@ -2848,9 +2848,19 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> &vq->avail->idx, r);
> return false;
> }
>+
> vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>+ if (vq->avail_idx != vq->last_avail_idx) {
>+ /* Since we have updated avail_idx, the following
>+ * call to vhost_get_vq_desc() will read available
>+ * ring entries. Make sure that read happens after
>+ * the avail_idx read.
>+ */
>+ smp_rmb();
>+ return true;
>+ }
>
>- return vq->avail_idx != vq->last_avail_idx;
>+ return false;
> }
> EXPORT_SYMBOL_GPL(vhost_enable_notify);
>
>--
>2.44.0
>
On 3/28/24 19:31, Michael S. Tsirkin wrote:
> On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
>> All the callers of vhost_get_avail_idx() are concerned to the memory
>> barrier, imposed by smp_rmb() to ensure the order of the available
>> ring entry read and avail_idx read.
>>
>> Improve vhost_get_avail_idx() so that smp_rmb() is executed when
>> the avail_idx is advanced. With it, the callers needn't to worry
>> about the memory barrier.
>>
>> Suggested-by: Michael S. Tsirkin <[email protected]>
>> Signed-off-by: Gavin Shan <[email protected]>
>
> Previous patches are ok. This one I feel needs more work -
> first more code such as sanity checking should go into
> this function, second there's actually a difference
> between comparing to last_avail_idx and just comparing
> to the previous value of avail_idx.
> I will pick patches 1-2 and post a cleanup on top so you can
> take a look, ok?
>
Thanks, Michael. That's fine with me.
Thanks,
Gavin
Hi Michael,
On 3/30/24 19:02, Gavin Shan wrote:
> On 3/28/24 19:31, Michael S. Tsirkin wrote:
>> On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
>>> All the callers of vhost_get_avail_idx() are concerned to the memory
>>> barrier, imposed by smp_rmb() to ensure the order of the available
>>> ring entry read and avail_idx read.
>>>
>>> Improve vhost_get_avail_idx() so that smp_rmb() is executed when
>>> the avail_idx is advanced. With it, the callers needn't to worry
>>> about the memory barrier.
>>>
>>> Suggested-by: Michael S. Tsirkin <[email protected]>
>>> Signed-off-by: Gavin Shan <[email protected]>
>>
>> Previous patches are ok. This one I feel needs more work -
>> first more code such as sanity checking should go into
>> this function, second there's actually a difference
>> between comparing to last_avail_idx and just comparing
>> to the previous value of avail_idx.
>> I will pick patches 1-2 and post a cleanup on top so you can
>> take a look, ok?
>>
>
> Thanks, Michael. It's fine to me.
>
A gentle ping.
If it's OK with you, could you please merge PATCH[1-2]? Our downstream
9.4 needs the fixes, especially for NVidia's grace-hopper and grace-grace
platforms.
For PATCH[3], I can also help with the improvement if you don't have time
for it. Please let me know.
Thanks,
Gavin
On Mon, Apr 08, 2024 at 02:15:24PM +1000, Gavin Shan wrote:
> Hi Michael,
>
> On 3/30/24 19:02, Gavin Shan wrote:
> > On 3/28/24 19:31, Michael S. Tsirkin wrote:
> > > On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
> > > > All the callers of vhost_get_avail_idx() are concerned to the memory
> > > > barrier, imposed by smp_rmb() to ensure the order of the available
> > > > ring entry read and avail_idx read.
> > > >
> > > > Improve vhost_get_avail_idx() so that smp_rmb() is executed when
> > > > the avail_idx is advanced. With it, the callers needn't to worry
> > > > about the memory barrier.
> > > >
> > > > Suggested-by: Michael S. Tsirkin <[email protected]>
> > > > Signed-off-by: Gavin Shan <[email protected]>
> > >
> > > Previous patches are ok. This one I feel needs more work -
> > > first more code such as sanity checking should go into
> > > this function, second there's actually a difference
> > > between comparing to last_avail_idx and just comparing
> > > to the previous value of avail_idx.
> > > I will pick patches 1-2 and post a cleanup on top so you can
> > > take a look, ok?
> > >
> >
> > Thanks, Michael. It's fine to me.
> >
>
> A kindly ping.
>
> If it's ok to you, could you please merge PATCH[1-2]? Our downstream
> 9.4 need the fixes, especially for NVidia's grace-hopper and grace-grace
> platforms.
Yes - in the next rc hopefully.
> For PATCH[3], I also can help with the improvement if you don't have time
> for it. Please let me know.
>
> Thanks,
> Gavin
That would be great.
--
MST
On Mon, Apr 08, 2024 at 02:15:24PM +1000, Gavin Shan wrote:
> Hi Michael,
>
> On 3/30/24 19:02, Gavin Shan wrote:
> > On 3/28/24 19:31, Michael S. Tsirkin wrote:
> > > On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
> > > > All the callers of vhost_get_avail_idx() are concerned to the memory
> > > > barrier, imposed by smp_rmb() to ensure the order of the available
> > > > ring entry read and avail_idx read.
> > > >
> > > > Improve vhost_get_avail_idx() so that smp_rmb() is executed when
> > > > the avail_idx is advanced. With it, the callers needn't to worry
> > > > about the memory barrier.
> > > >
> > > > Suggested-by: Michael S. Tsirkin <[email protected]>
> > > > Signed-off-by: Gavin Shan <[email protected]>
> > >
> > > Previous patches are ok. This one I feel needs more work -
> > > first more code such as sanity checking should go into
> > > this function, second there's actually a difference
> > > between comparing to last_avail_idx and just comparing
> > > to the previous value of avail_idx.
> > > I will pick patches 1-2 and post a cleanup on top so you can
> > > take a look, ok?
> > >
> >
> > Thanks, Michael. It's fine to me.
> >
>
> A kindly ping.
>
> If it's ok to you, could you please merge PATCH[1-2]? Our downstream
> 9.4 need the fixes, especially for NVidia's grace-hopper and grace-grace
> platforms.
>
> For PATCH[3], I also can help with the improvement if you don't have time
> for it. Please let me know.
>
> Thanks,
> Gavin
The thing to do is basically diff with the patch I wrote :)
We can also do a bit more cleanups on top of *that*, like unifying
error handling.
--
MST
On Mon, Apr 08, 2024 at 02:15:24PM +1000, Gavin Shan wrote:
> Hi Michael,
>
> On 3/30/24 19:02, Gavin Shan wrote:
> > On 3/28/24 19:31, Michael S. Tsirkin wrote:
> > > On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
> > > > All the callers of vhost_get_avail_idx() are concerned to the memory
> > > > barrier, imposed by smp_rmb() to ensure the order of the available
> > > > ring entry read and avail_idx read.
> > > >
> > > > Improve vhost_get_avail_idx() so that smp_rmb() is executed when
> > > > the avail_idx is advanced. With it, the callers needn't to worry
> > > > about the memory barrier.
> > > >
> > > > Suggested-by: Michael S. Tsirkin <[email protected]>
> > > > Signed-off-by: Gavin Shan <[email protected]>
> > >
> > > Previous patches are ok. This one I feel needs more work -
> > > first more code such as sanity checking should go into
> > > this function, second there's actually a difference
> > > between comparing to last_avail_idx and just comparing
> > > to the previous value of avail_idx.
> > > I will pick patches 1-2 and post a cleanup on top so you can
> > > take a look, ok?
> > >
> >
> > Thanks, Michael. It's fine to me.
> >
>
> A kindly ping.
>
> If it's ok to you, could you please merge PATCH[1-2]? Our downstream
> 9.4 need the fixes, especially for NVidia's grace-hopper and grace-grace
> platforms.
>
> For PATCH[3], I also can help with the improvement if you don't have time
> for it. Please let me know.
>
> Thanks,
> Gavin
1-2 are upstream; go ahead and post the cleanup.
--
MST
On 4/23/24 06:46, Michael S. Tsirkin wrote:
> On Mon, Apr 08, 2024 at 02:15:24PM +1000, Gavin Shan wrote:
>> On 3/30/24 19:02, Gavin Shan wrote:
>>> On 3/28/24 19:31, Michael S. Tsirkin wrote:
>>>> On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
>>>>> All the callers of vhost_get_avail_idx() are concerned to the memory
>>>>> barrier, imposed by smp_rmb() to ensure the order of the available
>>>>> ring entry read and avail_idx read.
>>>>>
>>>>> Improve vhost_get_avail_idx() so that smp_rmb() is executed when
>>>>> the avail_idx is advanced. With it, the callers needn't to worry
>>>>> about the memory barrier.
>>>>>
>>>>> Suggested-by: Michael S. Tsirkin <[email protected]>
>>>>> Signed-off-by: Gavin Shan <[email protected]>
>>>>
>>>> Previous patches are ok. This one I feel needs more work -
>>>> first more code such as sanity checking should go into
>>>> this function, second there's actually a difference
>>>> between comparing to last_avail_idx and just comparing
>>>> to the previous value of avail_idx.
>>>> I will pick patches 1-2 and post a cleanup on top so you can
>>>> take a look, ok?
>>>>
>>>
>>> Thanks, Michael. It's fine to me.
>>>
>>
>> A kindly ping.
>>
>> If it's ok to you, could you please merge PATCH[1-2]? Our downstream
>> 9.4 need the fixes, especially for NVidia's grace-hopper and grace-grace
>> platforms.
>>
>> For PATCH[3], I also can help with the improvement if you don't have time
>> for it. Please let me know.
>>
>
> 1-2 are upstream go ahead and post the cleanup.
>
Michael, a cleanup series has been sent for review.
https://lore.kernel.org/virtualization/[email protected]/T/#t
Thanks,
Gavin