2015-12-17 10:32:58

by Michael S. Tsirkin

[permalink] [raw]
Subject: [PATCH] virtio_ring: use smp_store_mb

We need a full barrier after writing out event index, using smp_store_mb
there seems better than open-coding.
As usual, we need a wrapper to account for strong barriers/non smp.

It's tempting to use this in vhost as well, for that, we'll
need a variant of smp_store_mb that works on __user pointers.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---

Seems to give a speedup on my box but I'm less sure about this one. E.g. as
xchng faster than mfence on all/most intel CPUs? Anyone has an opinion?

include/linux/virtio_ring.h | 14 ++++++++++++++
drivers/virtio/virtio_ring.c | 15 +++++++++------
2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 0135c16..8912189 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -47,6 +47,20 @@ static inline void virtio_wmb(bool weak_barriers)
wmb();
}

+static inline void virtio_store_mb(bool weak_barriers,
+ __virtio16 *p, __virtio16 v)
+{
+#ifdef CONFIG_SMP
+ if (weak_barriers)
+ smp_store_mb(*p, v);
+ else
+#endif
+ {
+ WRITE_ONCE(*p, v);
+ mb();
+ }
+}
+
static inline __virtio16 virtio_load_acquire(bool weak_barriers, __virtio16 *p)
{
if (!weak_barriers) {
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f822cab..b0aea67 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -517,10 +517,10 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
/* If we expect an interrupt for the next entry, tell host
* by writing event index and flush out the write before
* the read in the next get_buf call. */
- if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
- vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx);
- virtio_mb(vq->weak_barriers);
- }
+ if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx));

#ifdef DEBUG
vq->last_add_time_valid = false;
@@ -653,8 +653,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
}
/* TODO: tune this threshold */
bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
- vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
- virtio_mb(vq->weak_barriers);
+
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+
if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
END_USE(vq);
return false;
--
MST


2015-12-17 10:52:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> +static inline void virtio_store_mb(bool weak_barriers,
> + __virtio16 *p, __virtio16 v)
> +{
> +#ifdef CONFIG_SMP
> + if (weak_barriers)
> + smp_store_mb(*p, v);
> + else
> +#endif
> + {
> + WRITE_ONCE(*p, v);
> + mb();
> + }
> +}

This is a different barrier depending on SMP, that seems wrong.

smp_mb(), as (should be) used by smp_store_mb() does not provide a
barrier against IO. mb() otoh does.

Since this is virtIO I would expect you always want mb().

2015-12-17 11:22:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> Seems to give a speedup on my box but I'm less sure about this one. E.g. as
> xchng faster than mfence on all/most intel CPUs? Anyone has an opinion?

Would help if you Cc people who would actually know this :-)

Yes, we've recently established that xchg is indeed faster than mfence
on at least recent machines, see:

lkml.kernel.org/r/CA+55aFynbkeuUGs9s-q+fLY6MeRBA6MjEyWWbbe7A5AaqsAknw@mail.gmail.com

> +static inline void virtio_store_mb(bool weak_barriers,
> + __virtio16 *p, __virtio16 v)
> +{
> +#ifdef CONFIG_SMP
> + if (weak_barriers)
> + smp_store_mb(*p, v);
> + else
> +#endif
> + {
> + WRITE_ONCE(*p, v);
> + mb();
> + }
> +}

Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
that they use dma_* ops for weak_barriers, while virtio_mb() uses
smp_mb().

As previously stated, smp_mb() does not cover the same memory domains as
dma_mb() would.

2015-12-17 13:16:24

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > +static inline void virtio_store_mb(bool weak_barriers,
> > + __virtio16 *p, __virtio16 v)
> > +{
> > +#ifdef CONFIG_SMP
> > + if (weak_barriers)
> > + smp_store_mb(*p, v);
> > + else
> > +#endif
> > + {
> > + WRITE_ONCE(*p, v);
> > + mb();
> > + }
> > +}
>
> This is a different barrier depending on SMP, that seems wrong.

Of course it's wrong in the sense that it's
suboptimal on UP. What we would really like is to
have, on UP, exactly the same barrier as on SMP.
This is because a UP guest can run on an SMP host.

But Linux doesn't provide this ability: if CONFIG_SMP is
not defined is optimizes most barriers out to a
compiler barrier.

Consider for example x86: what we want is xchg (NOT
mfence - see below for why) but if built without CONFIG_SMP
smp_store_mb does not include this.



> smp_mb(), as (should be) used by smp_store_mb() does not provide a
> barrier against IO. mb() otoh does.
>
> Since this is virtIO I would expect you always want mb().

No because it's VIRTio not real io :) It just switches to the hyprevisor
mode - kind of like a function call really.
The weak_barriers flag is cleared for when it's used
with real devices with real IO.


All this is explained in some detail at the top of
include/linux/virtio.h



--
MST

2015-12-17 13:26:35

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 12:22:22PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > Seems to give a speedup on my box but I'm less sure about this one. E.g. as
> > xchng faster than mfence on all/most intel CPUs? Anyone has an opinion?
>
> Would help if you Cc people who would actually know this :-)

Good point. Glad you still saw this. Thanks!

> Yes, we've recently established that xchg is indeed faster than mfence
> on at least recent machines, see:
>
> lkml.kernel.org/r/CA+55aFynbkeuUGs9s-q+fLY6MeRBA6MjEyWWbbe7A5AaqsAknw@mail.gmail.com
>
> > +static inline void virtio_store_mb(bool weak_barriers,
> > + __virtio16 *p, __virtio16 v)
> > +{
> > +#ifdef CONFIG_SMP
> > + if (weak_barriers)
> > + smp_store_mb(*p, v);
> > + else
> > +#endif
> > + {
> > + WRITE_ONCE(*p, v);
> > + mb();
> > + }
> > +}
>
> Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
> that they use dma_* ops for weak_barriers, while virtio_mb() uses
> smp_mb().

It's a hack really. I think I'll clean it up a bit to
make it more consistent.

To simplify things, you may consider things before
the optimization brought in by
commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
Author: Alexander Duyck <[email protected]>
Date: Mon Apr 13 21:03:49 2015 +0930

virtio_ring: Update weak barriers to use dma_wmb/rmb

> As previously stated, smp_mb() does not cover the same memory domains as
> dma_mb() would.

I know. We used to consistently do the right thing on SMP,
but on UP Linux does not have good portable APIs for us
to use. So we hack around with what's available which is
typically stronger than what's really needed.

I guess no one cares about UP that much.

The Alexander came and tried to optimize UP using
dma_wmb/dma_rmb. I guess he did not find dma_mb so
left it as is.

Maybe we should make virtio depend on SMP, and be done with it,
but the amount of code to maintain !SMP is small enough
to not be worth the potential pain to users (if any).



--
MST

2015-12-17 13:57:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 03:16:20PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > > +static inline void virtio_store_mb(bool weak_barriers,
> > > + __virtio16 *p, __virtio16 v)
> > > +{
> > > +#ifdef CONFIG_SMP
> > > + if (weak_barriers)
> > > + smp_store_mb(*p, v);
> > > + else
> > > +#endif
> > > + {
> > > + WRITE_ONCE(*p, v);
> > > + mb();
> > > + }
> > > +}
> >
> > This is a different barrier depending on SMP, that seems wrong.
>
> Of course it's wrong in the sense that it's
> suboptimal on UP. What we would really like is to
> have, on UP, exactly the same barrier as on SMP.
> This is because a UP guest can run on an SMP host.
>
> But Linux doesn't provide this ability: if CONFIG_SMP is
> not defined is optimizes most barriers out to a
> compiler barrier.
>
> Consider for example x86: what we want is xchg (NOT
> mfence - see below for why) but if built without CONFIG_SMP
> smp_store_mb does not include this.

You could of course go fix that instead of mutilating things into
sort-of functional state.

>
>
> > smp_mb(), as (should be) used by smp_store_mb() does not provide a
> > barrier against IO. mb() otoh does.
> >
> > Since this is virtIO I would expect you always want mb().
>
> No because it's VIRTio not real io :) It just switches to the hyprevisor
> mode - kind of like a function call really.
> The weak_barriers flag is cleared for when it's used
> with real devices with real IO.
>
>
> All this is explained in some detail at the top of
> include/linux/virtio.h

I did read that, it didn't make any sense wrt the code below it.

For instance it seems to imply weak_barriers is for smp like stuff while
!weak_barriers is for actual devices.

But then you go use dma_*mb() ops, which are specifially for devices
only for weak_barrier.

2015-12-17 14:02:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 03:26:29PM +0200, Michael S. Tsirkin wrote:
> > Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
> > that they use dma_* ops for weak_barriers, while virtio_mb() uses
> > smp_mb().
>
> It's a hack really. I think I'll clean it up a bit to
> make it more consistent.
>
> To simplify things, you may consider things before
> the optimization brought in by
> commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> Author: Alexander Duyck <[email protected]>
> Date: Mon Apr 13 21:03:49 2015 +0930
>
> virtio_ring: Update weak barriers to use dma_wmb/rmb

That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
cover the smp_*mb() part.

Again, look at the ARM definitions, the smp_*mb() primitives use the
inner coherence stuff, while the dma_*mb() primitives use the outer
coherent stuff.

the *mb() primitives cover both.

2015-12-17 14:33:49

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 03:16:20PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> > > On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > > > +static inline void virtio_store_mb(bool weak_barriers,
> > > > + __virtio16 *p, __virtio16 v)
> > > > +{
> > > > +#ifdef CONFIG_SMP
> > > > + if (weak_barriers)
> > > > + smp_store_mb(*p, v);
> > > > + else
> > > > +#endif
> > > > + {
> > > > + WRITE_ONCE(*p, v);
> > > > + mb();
> > > > + }
> > > > +}
> > >
> > > This is a different barrier depending on SMP, that seems wrong.
> >
> > Of course it's wrong in the sense that it's
> > suboptimal on UP. What we would really like is to
> > have, on UP, exactly the same barrier as on SMP.
> > This is because a UP guest can run on an SMP host.
> >
> > But Linux doesn't provide this ability: if CONFIG_SMP is
> > not defined is optimizes most barriers out to a
> > compiler barrier.
> >
> > Consider for example x86: what we want is xchg (NOT
> > mfence - see below for why) but if built without CONFIG_SMP
> > smp_store_mb does not include this.
>
> You could of course go fix that instead of mutilating things into
> sort-of functional state.

Yes, we'd just need to touch all architectures, all for
the sake of UP which almost no one uses.
Basically, we need APIs that explicitly are
for talking to another kernel on a different CPU on
the same SMP system, and implemented identically
between CONFIG_SMP and !CONFIG_SMP on all architectures.

Do you think this is something of general usefulness,
outside virtio?

> >
> >
> > > smp_mb(), as (should be) used by smp_store_mb() does not provide a
> > > barrier against IO. mb() otoh does.
> > >
> > > Since this is virtIO I would expect you always want mb().
> >
> > No because it's VIRTio not real io :) It just switches to the hyprevisor
> > mode - kind of like a function call really.
> > The weak_barriers flag is cleared for when it's used
> > with real devices with real IO.
> >
> >
> > All this is explained in some detail at the top of
> > include/linux/virtio.h
>
> I did read that, it didn't make any sense wrt the code below it.
>
> For instance it seems to imply weak_barriers is for smp like stuff while
> !weak_barriers is for actual devices.
>
> But then you go use dma_*mb() ops, which are specifially for devices
> only for weak_barrier.

Yes the dma_*mb thing is a kind of an interface misuse.
It's an optimization for UP introduced here:

commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
Author: Alexander Duyck <[email protected]>
Date: Mon Apr 13 21:03:49 2015 +0930

virtio_ring: Update weak barriers to use dma_wmb/rmb
This change makes it so that instead of using smp_wmb/rmb which varies
depending on the kernel configuration we can can use dma_wmb/rmb which for
most architectures should be equal to or slightly more strict than
smp_wmb/rmb.

The advantage to this is that these barriers are available to uniprocessor
builds as well so the performance should improve under such a
configuration.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>

but given the confusion this causes, I'm inclined to revert
this, and later switch to for cleaner API if that
appears. Cc Alexander (at the new address).


--
MST

2015-12-17 14:35:03

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 03:26:29PM +0200, Michael S. Tsirkin wrote:
> > > Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
> > > that they use dma_* ops for weak_barriers, while virtio_mb() uses
> > > smp_mb().
> >
> > It's a hack really. I think I'll clean it up a bit to
> > make it more consistent.
> >
> > To simplify things, you may consider things before
> > the optimization brought in by
> > commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > Author: Alexander Duyck <[email protected]>
> > Date: Mon Apr 13 21:03:49 2015 +0930
> >
> > virtio_ring: Update weak barriers to use dma_wmb/rmb
>
> That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> cover the smp_*mb() part.
>
> Again, look at the ARM definitions, the smp_*mb() primitives use the
> inner coherence stuff, while the dma_*mb() primitives use the outer
> coherent stuff.

Does outer coherent imply inner coherent?

> the *mb() primitives cover both.

2015-12-17 14:39:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 04:33:44PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
> >
> > You could of course go fix that instead of mutilating things into
> > sort-of functional state.
>
> Yes, we'd just need to touch all architectures, all for
> the sake of UP which almost no one uses.
> Basically, we need APIs that explicitly are
> for talking to another kernel on a different CPU on
> the same SMP system, and implemented identically
> between CONFIG_SMP and !CONFIG_SMP on all architectures.
>
> Do you think this is something of general usefulness,
> outside virtio?

I'm not aware of any other case, but if there are more parts of virt
that need this then I see no problem adding it.

That is, virt in general is the only use-case that I can think of,
because this really is an artifact of interfacing with an SMP host while
running an UP kernel.

But I'm really not familiar with virt, so I do not know if there's more
sites outside of virtio that could use this.

Touching all archs is a tad tedious, but its fairly straight forward.

2015-12-17 14:43:57

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 03:39:10PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 04:33:44PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
> > >
> > > You could of course go fix that instead of mutilating things into
> > > sort-of functional state.
> >
> > Yes, we'd just need to touch all architectures, all for
> > the sake of UP which almost no one uses.
> > Basically, we need APIs that explicitly are
> > for talking to another kernel on a different CPU on
> > the same SMP system, and implemented identically
> > between CONFIG_SMP and !CONFIG_SMP on all architectures.
> >
> > Do you think this is something of general usefulness,
> > outside virtio?
>
> I'm not aware of any other case, but if there are more parts of virt
> that need this then I see no problem adding it.
>
> That is, virt in general is the only use-case that I can think of,
> because this really is an artifact of interfacing with an SMP host while
> running an UP kernel.
>
> But I'm really not familiar with virt, so I do not know if there's more
> sites outside of virtio that could use this.
>
> Touching all archs is a tad tedious, but its fairly straight forward.

Thanks, will take a stub at this.

2015-12-17 15:09:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 04:34:57PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:

> > > commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > > Author: Alexander Duyck <[email protected]>
> > > Date: Mon Apr 13 21:03:49 2015 +0930
> > >
> > > virtio_ring: Update weak barriers to use dma_wmb/rmb
> >
> > That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> > cover the smp_*mb() part.
> >
> > Again, look at the ARM definitions, the smp_*mb() primitives use the
> > inner coherence stuff, while the dma_*mb() primitives use the outer
> > coherent stuff.
>
> Does outer coherent imply inner coherent?
>
> > the *mb() primitives cover both.

I do not think so, but lets add Will, he dreams this stuff.

2015-12-17 15:52:31

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 04:09:17PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 04:34:57PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:
>
> > > > commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > > > Author: Alexander Duyck <[email protected]>
> > > > Date: Mon Apr 13 21:03:49 2015 +0930
> > > >
> > > > virtio_ring: Update weak barriers to use dma_wmb/rmb
> > >
> > > That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> > > cover the smp_*mb() part.
> > >
> > > Again, look at the ARM definitions, the smp_*mb() primitives use the
> > > inner coherence stuff, while the dma_*mb() primitives use the outer
> > > coherent stuff.
> >
> > Does outer coherent imply inner coherent?
> >
> > > the *mb() primitives cover both.
>
> I do not think so, but lets add Will, he dreams this stuff.

Right, and I don't sleep well these days.

Anyway, the outer-shareable domain (osh) is a superset of the
inner-shareable domain (ish). The inner-shareable domain contains the
CPUs and any peripherals that you and I would call "cache coherent". The
outer-shareable domain extends this to cover a strange set of "less cache
coherent" devices, which we would just call "not cache coherent" for the
purposes of Linux. Normal, non-cacheable memory (i.e. the memory type we
use for non-coherent DMA buffers) is outer-shareable.

Since the barrier macros don't know if the device is coherent or not, we
use the stronger semantics of outer-shareable.

I've not been following the thread, but I reckon we could add dma_mb()
(as dmb(osh) on arm), then use that to build dma_load_acquire and
dma_store_release accessors. On arm64, we could probably use the
acquire/release instructions directly, since they inherit the shareability
domain of the address (which has the nice property of being inner-shareable
for coherent devices).

The massive pain with adding new accessors is defining the semantics.
memory-barriers.txt is already lacking for the CPU side, and we're
struggling to express the kind of transitivity guarantees we provide
today, let alone with new primitives :(

Will

2015-12-17 19:21:57

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio_ring: use smp_store_mb

On Thu, Dec 17, 2015 at 03:52:24PM +0000, Will Deacon wrote:
> On Thu, Dec 17, 2015 at 04:09:17PM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 17, 2015 at 04:34:57PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:
> >
> > > > > commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > > > > Author: Alexander Duyck <[email protected]>
> > > > > Date: Mon Apr 13 21:03:49 2015 +0930
> > > > >
> > > > > virtio_ring: Update weak barriers to use dma_wmb/rmb
> > > >
> > > > That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> > > > cover the smp_*mb() part.
> > > >
> > > > Again, look at the ARM definitions, the smp_*mb() primitives use the
> > > > inner coherence stuff, while the dma_*mb() primitives use the outer
> > > > coherent stuff.
> > >
> > > Does outer coherent imply inner coherent?
> > >
> > > > the *mb() primitives cover both.
> >
> > I do not think so, but lets add Will, he dreams this stuff.
>
> Right, and I don't sleep well these days.
>
> Anyway, the outer-shareable domain (osh) is a superset of the
> inner-shareable domain (ish). The inner-shareable domain contains the
> CPUs and any peripherals that you and I would call "cache coherent". The
> outer-shareable domain extends this to cover a strange set of "less cache
> coherent" devices, which we would just call "not cache coherent" for the
> purposes of Linux. Normal, non-cacheable memory (i.e. the memory type we
> use for non-coherent DMA buffers) is outer-shareable.
>
> Since the barrier macros don't know if the device is coherent or not, we
> use the stronger semantics of outer-shareable.
>
> I've not been following the thread, but I reckon we could add dma_mb()
> (as dmb(osh) on arm), then use that to build dma_load_acquire and
> dma_store_release accessors. On arm64, we could probably use the
> acquire/release instructions directly, since they inherit the shareability
> domain of the address (which has the nice property of being inner-shareable
> for coherent devices).
>
> The massive pain with adding new accessors is defining the semantics.
> memory-barriers.txt is already lacking for the CPU side, and we're
> struggling to express the kind of transitivity guarantees we provide
> today, let alone with new primitives :(
>
> Will

Well virtio (might) have wanted dma_mb in the past.

But if are adding barrier stuff anyway, we really want
pv_ counterparts to smp_ that do the same on CONFIG_SMP
but don't change when CONFIG_SMP is disabled.

--
MST

2015-12-20 09:26:08

by Michael S. Tsirkin

[permalink] [raw]
Subject: new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

On Thu, Dec 17, 2015 at 03:39:10PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 04:33:44PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
> > >
> > > You could of course go fix that instead of mutilating things into
> > > sort-of functional state.
> >
> > Yes, we'd just need to touch all architectures, all for
> > the sake of UP which almost no one uses.
> > Basically, we need APIs that explicitly are
> > for talking to another kernel on a different CPU on
> > the same SMP system, and implemented identically
> > between CONFIG_SMP and !CONFIG_SMP on all architectures.
> >
> > Do you think this is something of general usefulness,
> > outside virtio?
>
> I'm not aware of any other case, but if there are more parts of virt
> that need this then I see no problem adding it.

When I wrote this, I assumed there are no other users, and I'm still not
sure there are other users at the moment. Do you see a problem then?

> That is, virt in general is the only use-case that I can think of,
> because this really is an artifact of interfacing with an SMP host while
> running an UP kernel.

Or another guest kernel on an SMP host.

> But I'm really not familiar with virt, so I do not know if there's more
> sites outside of virtio that could use this.
> Touching all archs is a tad tedious, but its fairly straight forward.

So I looked and I was only able to find other another possible user in Xen.

Cc Xen folks.

I noticed that drivers/xen/xenbus/xenbus_comms.c uses
full memory barriers to communicate with the other side.
For example:

/* Must write data /after/ reading the consumer index. * */
mb();

memcpy(dst, data, avail);
data += avail;
len -= avail;

/* Other side must not see new producer until data is * there. */
wmb();
intf->req_prod += avail;

/* Implies mb(): other side will see the updated producer. */
notify_remote_via_evtchn(xen_store_evtchn);

To me, it looks like for guests compiled with CONFIG_SMP, smp_wmb and smp_mb
would be sufficient, so mb() and wmb() here are only needed if
a non-SMP guest runs on an SMP host.

Is my analysis correct?

So what I'm suggesting is something like the below patch,
except instead of using virtio directly, a new set of barriers
that behaves identically for SMP and non-SMP guests will be introduced.

And of course the weak barriers flag is not needed for Xen -
that's a virtio only thing.

For example:

smp_pv_wmb()
smp_pv_rmb()
smp_pv_mb()

I'd like to get confirmation from Xen folks before posting
this patchset.

Comments/suggestions?

Signed-off-by: Michael S. Tsirkin <[email protected]>

--

compile-tested only.

diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
index fdb0f33..a28f049 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -36,6 +36,7 @@
#include <linux/interrupt.h>
#include <linux/sched.h>
#include <linux/err.h>
+#include <linux/virtio_ring.h>
#include <xen/xenbus.h>
#include <asm/xen/hypervisor.h>
#include <xen/events.h>
@@ -123,14 +124,14 @@ int xb_write(const void *data, unsigned len)
avail = len;

/* Must write data /after/ reading the consumer index. */
- mb();
+ virtio_mb(true);

memcpy(dst, data, avail);
data += avail;
len -= avail;

/* Other side must not see new producer until data is there. */
- wmb();
+ virtio_wmb(true);
intf->req_prod += avail;

/* Implies mb(): other side will see the updated producer. */
@@ -180,14 +181,14 @@ int xb_read(void *data, unsigned len)
avail = len;

/* Must read data /after/ reading the producer index. */
- rmb();
+ virtio_rmb(true);

memcpy(data, src, avail);
data += avail;
len -= avail;

/* Other side must not see free space until we've copied out */
- mb();
+ virtio_mb(true);
intf->rsp_cons += avail;

pr_debug("Finished read of %i bytes (%i to go)\n", avail, len);

--
MST

2015-12-20 17:07:25

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

On 20/12/15 09:25, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2015 at 03:39:10PM +0100, Peter Zijlstra wrote:
>> On Thu, Dec 17, 2015 at 04:33:44PM +0200, Michael S. Tsirkin wrote:
>>> On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
>>>> You could of course go fix that instead of mutilating things into
>>>> sort-of functional state.
>>> Yes, we'd just need to touch all architectures, all for
>>> the sake of UP which almost no one uses.
>>> Basically, we need APIs that explicitly are
>>> for talking to another kernel on a different CPU on
>>> the same SMP system, and implemented identically
>>> between CONFIG_SMP and !CONFIG_SMP on all architectures.
>>>
>>> Do you think this is something of general usefulness,
>>> outside virtio?
>> I'm not aware of any other case, but if there are more parts of virt
>> that need this then I see no problem adding it.
> When I wrote this, I assumed there are no other users, and I'm still not
> sure there are other users at the moment. Do you see a problem then?
>
>> That is, virt in general is the only use-case that I can think of,
>> because this really is an artifact of interfacing with an SMP host while
>> running an UP kernel.
> Or another guest kernel on an SMP host.
>
>> But I'm really not familiar with virt, so I do not know if there's more
>> sites outside of virtio that could use this.
>> Touching all archs is a tad tedious, but its fairly straight forward.
> So I looked and I was only able to find other another possible user in Xen.
>
> Cc Xen folks.
>
> I noticed that drivers/xen/xenbus/xenbus_comms.c uses
> full memory barriers to communicate with the other side.
> For example:
>
> /* Must write data /after/ reading the consumer index. * */
> mb();
>
> memcpy(dst, data, avail);
> data += avail;
> len -= avail;
>
> /* Other side must not see new producer until data is * there. */
> wmb();
> intf->req_prod += avail;
>
> /* Implies mb(): other side will see the updated producer. */
> notify_remote_via_evtchn(xen_store_evtchn);
>
> To me, it looks like for guests compiled with CONFIG_SMP, smp_wmb and smp_mb
> would be sufficient, so mb() and wmb() here are only needed if
> a non-SMP guest runs on an SMP host.
>
> Is my analysis correct?

Correct. The reason full barriers are used is so non-SMP guests still
function correctly. It is certainly inefficient.

>
> So what I'm suggesting is something like the below patch,
> except instead of using virtio directly, a new set of barriers
> that behaves identically for SMP and non-SMP guests will be introduced.
>
> And of course the weak barriers flag is not needed for Xen -
> that's a virtio only thing.
>
> For example:
>
> smp_pv_wmb()
> smp_pv_rmb()
> smp_pv_mb()
>
> I'd like to get confirmation from Xen folks before posting
> this patchset.
>
> Comments/suggestions?

Very much +1 for fixing this.

Those names would be fine, but they do add yet another set of options in
an already-complicated area.

An alternative might be to have the regular smp_{w,r,}mb() not revert
back to nops if CONFIG_PARAVIRT, or perhaps if pvops have detected a
non-native environment. (I don't know how feasible this suggestion is,
however.)

~Andrew

2015-12-20 19:59:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

On Sun, Dec 20, 2015 at 05:07:19PM +0000, Andrew Cooper wrote:
>
> Very much +1 for fixing this.
>
> Those names would be fine, but they do add yet another set of options in
> an already-complicated area.
>
> An alternative might be to have the regular smp_{w,r,}mb() not revert
> back to nops if CONFIG_PARAVIRT, or perhaps if pvops have detected a
> non-native environment. (I don't know how feasible this suggestion is,
> however.)

So a regular SMP kernel emits the LOCK prefix and will patch it out with
a DS prefix (iirc) when it finds but a single CPU. So for those you
could easily do this.

However an UP kernel will not emit the LOCK and do no patching.

So if you're willing to make CONFIG_PARAVIRT depend on CONFIG_SMP or
similar, this is doable.

I don't see people going to allow emitting the LOCK prefix (and growing
the kernel text size) for UP kernels.

2015-12-21 07:11:02

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

On Sun, Dec 20, 2015 at 08:59:44PM +0100, Peter Zijlstra wrote:
> On Sun, Dec 20, 2015 at 05:07:19PM +0000, Andrew Cooper wrote:
> >
> > Very much +1 for fixing this.
> >
> > Those names would be fine, but they do add yet another set of options in
> > an already-complicated area.
> >
> > An alternative might be to have the regular smp_{w,r,}mb() not revert
> > back to nops if CONFIG_PARAVIRT, or perhaps if pvops have detected a
> > non-native environment. (I don't know how feasible this suggestion is,
> > however.)
>
> So a regular SMP kernel emits the LOCK prefix and will patch it out with
> a DS prefix (iirc) when it finds but a single CPU. So for those you
> could easily do this.
>
> However an UP kernel will not emit the LOCK and do no patching.
>
> So if you're willing to make CONFIG_PARAVIRT depend on CONFIG_SMP or
> similar, this is doable.

One of the uses for virtio is to allow testing an existing kernel on
kvm just by loading a module, and this will break this usecase.

> I don't see people going to allow emitting the LOCK prefix (and growing
> the kernel text size) for UP kernels.

Thinking about this more, maybe __smp_*mb is a better set of names.

The nice thing about it is that we can then have generic code
that does basically

#ifdef CONFIG_SMP
#define smp_mb() __smp_mb()
#else
#define smp_mb() barrier()
#endif

and reuse this on all architectures.

So instead of a maintainance burden, we are actually
removing code duplication.

--
MST

2015-12-21 07:22:26

by Michael S. Tsirkin

[permalink] [raw]
Subject: [PATCH RFC] smp_store_mb should use smp_mb

On some architectures smp_store_mb() calls mb() which is stronger
than implied by both the name and the documentation.

smp_store_mb is only used by core kernel code at the moment, so
we know no one mis-uses it for an MMIO barrier.
Make it call smp_mb consistently before some arch-specific
code uses it as such by mistake.

Signed-off-by: Michael S. Tsirkin <[email protected]>

---

Note: I'm guessing an ack from arch maintainers will be needed, but
I'm working on a bigger cleanup moving a bunch of duplicated code
into asm-generic/barrier.h which depends on this, so not Cc'ing
widely yet.

Please Ack if appropriate but do not merge yet.

diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index df896a1..425552b 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -77,7 +77,7 @@ do { \
___p1; \
})

-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)

/*
* The group barrier in front of the rsm & ssm are necessary to ensure
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 0eca6ef..4f0169e 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -34,7 +34,7 @@
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")

-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)

#ifdef __SUBARCH_HAS_LWSYNC
# define SMPWMB LWSYNC
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index d68e11e..6c1d8b5 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -36,7 +36,7 @@
#define smp_mb__before_atomic() smp_mb()
#define smp_mb__after_atomic() smp_mb()

-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)

#define smp_store_release(p, v) \
do { \
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index b42afad..0f45f93 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -93,7 +93,7 @@
#endif /* CONFIG_SMP */

#ifndef smp_store_mb
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
#endif

#ifndef smp_mb__before_atomic

2015-12-21 10:47:56

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

On 20/12/15 09:25, Michael S. Tsirkin wrote:
>
> I noticed that drivers/xen/xenbus/xenbus_comms.c uses
> full memory barriers to communicate with the other side.
> For example:
>
> /* Must write data /after/ reading the consumer index. * */
> mb();
>
> memcpy(dst, data, avail);
> data += avail;
> len -= avail;
>
> /* Other side must not see new producer until data is * there. */
> wmb();
> intf->req_prod += avail;
>
> /* Implies mb(): other side will see the updated producer. */
> notify_remote_via_evtchn(xen_store_evtchn);
>
> To me, it looks like for guests compiled with CONFIG_SMP, smp_wmb and smp_mb
> would be sufficient, so mb() and wmb() here are only needed if
> a non-SMP guest runs on an SMP host.
>
> Is my analysis correct?

For x86, yes.

For arm/arm64 I think so, but would prefer one of the Xen arm
maintainers to confirm. In particular, whether inner-shareable barriers
are sufficient for memory shared with the hypervisor.

> So what I'm suggesting is something like the below patch,
> except instead of using virtio directly, a new set of barriers
> that behaves identically for SMP and non-SMP guests will be introduced.
>
> And of course the weak barriers flag is not needed for Xen -
> that's a virtio only thing.
>
> For example:
>
> smp_pv_wmb()
> smp_pv_rmb()
> smp_pv_mb()

The smp_ prefix doesn't make a lot of sense to me here since these
barriers are going to be the same whether the kernel is SMP or not.

David

2015-12-21 11:52:52

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

On Mon, Dec 21, 2015 at 10:47:49AM +0000, David Vrabel wrote:
> On 20/12/15 09:25, Michael S. Tsirkin wrote:
> >
> > I noticed that drivers/xen/xenbus/xenbus_comms.c uses
> > full memory barriers to communicate with the other side.
> > For example:
> >
> > /* Must write data /after/ reading the consumer index. * */
> > mb();
> >
> > memcpy(dst, data, avail);
> > data += avail;
> > len -= avail;
> >
> > /* Other side must not see new producer until data is * there. */
> > wmb();
> > intf->req_prod += avail;
> >
> > /* Implies mb(): other side will see the updated producer. */
> > notify_remote_via_evtchn(xen_store_evtchn);
> >
> > To me, it looks like for guests compiled with CONFIG_SMP, smp_wmb and smp_mb
> > would be sufficient, so mb() and wmb() here are only needed if
> > a non-SMP guest runs on an SMP host.
> >
> > Is my analysis correct?
>
> For x86, yes.
>
> For arm/arm64 I think so, but would prefer one of the Xen arm
> maintainers to confirm. In particular, whether inner-shareable barriers
> are sufficient for memory shared with the hypervisor.
>
> > So what I'm suggesting is something like the below patch,
> > except instead of using virtio directly, a new set of barriers
> > that behaves identically for SMP and non-SMP guests will be introduced.
> >
> > And of course the weak barriers flag is not needed for Xen -
> > that's a virtio only thing.
> >
> > For example:
> >
> > smp_pv_wmb()
> > smp_pv_rmb()
> > smp_pv_mb()
>
> The smp_ prefix doesn't make a lot of sense to me here since these
> barriers are going to be the same whether the kernel is SMP or not.
>
> David

Guest kernel - yes. But it's only needed because you
are running on an SMP host.

--
MST

2015-12-21 14:51:17

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Qemu-devel] [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

On Mon, 21 Dec 2015, David Vrabel wrote:
> On 20/12/15 09:25, Michael S. Tsirkin wrote:
> >
> > I noticed that drivers/xen/xenbus/xenbus_comms.c uses
> > full memory barriers to communicate with the other side.
> > For example:
> >
> > /* Must write data /after/ reading the consumer index. * */
> > mb();
> >
> > memcpy(dst, data, avail);
> > data += avail;
> > len -= avail;
> >
> > /* Other side must not see new producer until data is * there. */
> > wmb();
> > intf->req_prod += avail;
> >
> > /* Implies mb(): other side will see the updated producer. */
> > notify_remote_via_evtchn(xen_store_evtchn);
> >
> > To me, it looks like for guests compiled with CONFIG_SMP, smp_wmb and smp_mb
> > would be sufficient, so mb() and wmb() here are only needed if
> > a non-SMP guest runs on an SMP host.
> >
> > Is my analysis correct?
>
> For x86, yes.
>
> For arm/arm64 I think so, but would prefer one of the Xen arm
> maintainers to confirm. In particular, whether inner-shareable barriers
> are sufficient for memory shared with the hypervisor.

inner-shareable barriers are certainly OK. In this case there would be
also a switch from dsb to dmb barriers, which I also think should be OK.

What about all the mb() and wmb() in RING_PUSH_REQUESTS and
RING_PUSH_RESPONSES in include/xen/interface/io/ring.h ?


> > So what I'm suggesting is something like the below patch,
> > except instead of using virtio directly, a new set of barriers
> > that behaves identically for SMP and non-SMP guests will be introduced.
> >
> > And of course the weak barriers flag is not needed for Xen -
> > that's a virtio only thing.
> >
> > For example:
> >
> > smp_pv_wmb()
> > smp_pv_rmb()
> > smp_pv_mb()
>
> The smp_ prefix doesn't make a lot of sense to me here since these
> barriers are going to be the same whether the kernel is SMP or not.