2017-12-05 19:29:43

by Michael S. Tsirkin

[permalink] [raw]
Subject: [PATCH] ptr_ring: add barriers

Users of ptr_ring expect that it's safe to give the
data structure a pointer and have it be available
to consumers, but that actually requires an smb_wmb
or a stronger barrier.

In absence of such barriers and on architectures that reorder writes,
consumer might read an un=initialized value from an skb pointer stored
in the skb array. This was observed causing crashes.

To fix, add memory barriers. The barrier we use is a wmb, the
assumption being that producers do not need to read the value so we do
not need to order these reads.

Reported-by: George Cherian <[email protected]>
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
---

George, could you pls report whether this patch fixes
the issue for you?

This seems to be needed in stable as well.




include/linux/ptr_ring.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 37b4bb2..6866df4 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)

/* Note: callers invoking this in a loop must use a compiler barrier,
* for example cpu_relax(). Callers must hold producer_lock.
+ * Callers are responsible for making sure pointer that is being queued
+ * points to a valid data.
*/
static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
{
if (unlikely(!r->size) || r->queue[r->producer])
return -ENOSPC;

+ /* Make sure the pointer we are storing points to a valid data. */
+ /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
+ smp_wmb();
+
r->queue[r->producer++] = ptr;
if (unlikely(r->producer >= r->size))
r->producer = 0;
@@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
if (ptr)
__ptr_ring_discard_one(r);

+ /* Make sure anyone accessing data through the pointer is up to date. */
+ /* Pairs with smp_wmb in __ptr_ring_produce. */
+ smp_read_barrier_depends();
return ptr;
}

--
MST


2017-12-06 02:31:51

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers



On 2017年12月06日 03:29, Michael S. Tsirkin wrote:
> Users of ptr_ring expect that it's safe to give the
> data structure a pointer and have it be available
> to consumers, but that actually requires an smb_wmb
> or a stronger barrier.
>
> In absence of such barriers and on architectures that reorder writes,
> consumer might read an un=initialized value from an skb pointer stored
> in the skb array. This was observed causing crashes.
>
> To fix, add memory barriers. The barrier we use is a wmb, the
> assumption being that producers do not need to read the value so we do
> not need to order these reads.
>
> Reported-by: George Cherian <[email protected]>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
>
> George, could you pls report whether this patch fixes
> the issue for you?
>
> This seems to be needed in stable as well.
>
>
>
>
> include/linux/ptr_ring.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 37b4bb2..6866df4 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
>
> /* Note: callers invoking this in a loop must use a compiler barrier,
> * for example cpu_relax(). Callers must hold producer_lock.
> + * Callers are responsible for making sure pointer that is being queued
> + * points to a valid data.
> */
> static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> {
> if (unlikely(!r->size) || r->queue[r->producer])
> return -ENOSPC;
>
> + /* Make sure the pointer we are storing points to a valid data. */
> + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> + smp_wmb();
> +
> r->queue[r->producer++] = ptr;
> if (unlikely(r->producer >= r->size))
> r->producer = 0;
> @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
> if (ptr)
> __ptr_ring_discard_one(r);
>
> + /* Make sure anyone accessing data through the pointer is up to date. */
> + /* Pairs with smp_wmb in __ptr_ring_produce. */
> + smp_read_barrier_depends();
> return ptr;
> }
>

I was thinking whether or not it's better to move those to the callers.
Then we can save lots of barriers in e.g batch consuming.

Thanks

2017-12-06 02:54:01

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers

On Wed, Dec 06, 2017 at 10:31:39AM +0800, Jason Wang wrote:
>
>
> On 2017年12月06日 03:29, Michael S. Tsirkin wrote:
> > Users of ptr_ring expect that it's safe to give the
> > data structure a pointer and have it be available
> > to consumers, but that actually requires an smb_wmb
> > or a stronger barrier.
> >
> > In absence of such barriers and on architectures that reorder writes,
> > consumer might read an un=initialized value from an skb pointer stored
> > in the skb array. This was observed causing crashes.
> >
> > To fix, add memory barriers. The barrier we use is a wmb, the
> > assumption being that producers do not need to read the value so we do
> > not need to order these reads.
> >
> > Reported-by: George Cherian <[email protected]>
> > Suggested-by: Jason Wang <[email protected]>
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > ---
> >
> > George, could you pls report whether this patch fixes
> > the issue for you?
> >
> > This seems to be needed in stable as well.
> >
> >
> >
> >
> > include/linux/ptr_ring.h | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 37b4bb2..6866df4 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
> > /* Note: callers invoking this in a loop must use a compiler barrier,
> > * for example cpu_relax(). Callers must hold producer_lock.
> > + * Callers are responsible for making sure pointer that is being queued
> > + * points to a valid data.
> > */
> > static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> > {
> > if (unlikely(!r->size) || r->queue[r->producer])
> > return -ENOSPC;
> > + /* Make sure the pointer we are storing points to a valid data. */
> > + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> > + smp_wmb();
> > +
> > r->queue[r->producer++] = ptr;
> > if (unlikely(r->producer >= r->size))
> > r->producer = 0;
> > @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
> > if (ptr)
> > __ptr_ring_discard_one(r);
> > + /* Make sure anyone accessing data through the pointer is up to date. */
> > + /* Pairs with smp_wmb in __ptr_ring_produce. */
> > + smp_read_barrier_depends();
> > return ptr;
> > }
>
> I was thinking whether or not it's better to move those to the callers. Then
> we can save lots of barriers in e.g batch consuming.
>
> Thanks

Batch consumers only do smp_read_barrier_depends which is free on
non-alpha. I suggest we do the simple thing for stable and reserve
optimizations for later.

--
MST

2017-12-06 03:22:06

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers



On 2017年12月06日 10:53, Michael S. Tsirkin wrote:
> On Wed, Dec 06, 2017 at 10:31:39AM +0800, Jason Wang wrote:
>>
>> On 2017年12月06日 03:29, Michael S. Tsirkin wrote:
>>> Users of ptr_ring expect that it's safe to give the
>>> data structure a pointer and have it be available
>>> to consumers, but that actually requires an smb_wmb
>>> or a stronger barrier.
>>>
>>> In absence of such barriers and on architectures that reorder writes,
>>> consumer might read an un=initialized value from an skb pointer stored
>>> in the skb array. This was observed causing crashes.
>>>
>>> To fix, add memory barriers. The barrier we use is a wmb, the
>>> assumption being that producers do not need to read the value so we do
>>> not need to order these reads.
>>>
>>> Reported-by: George Cherian <[email protected]>
>>> Suggested-by: Jason Wang <[email protected]>
>>> Signed-off-by: Michael S. Tsirkin <[email protected]>
>>> ---
>>>
>>> George, could you pls report whether this patch fixes
>>> the issue for you?
>>>
>>> This seems to be needed in stable as well.
>>>
>>>
>>>
>>>
>>> include/linux/ptr_ring.h | 9 +++++++++
>>> 1 file changed, 9 insertions(+)
>>>
>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>> index 37b4bb2..6866df4 100644
>>> --- a/include/linux/ptr_ring.h
>>> +++ b/include/linux/ptr_ring.h
>>> @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
>>> /* Note: callers invoking this in a loop must use a compiler barrier,
>>> * for example cpu_relax(). Callers must hold producer_lock.
>>> + * Callers are responsible for making sure pointer that is being queued
>>> + * points to a valid data.
>>> */
>>> static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
>>> {
>>> if (unlikely(!r->size) || r->queue[r->producer])
>>> return -ENOSPC;
>>> + /* Make sure the pointer we are storing points to a valid data. */
>>> + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
>>> + smp_wmb();
>>> +
>>> r->queue[r->producer++] = ptr;
>>> if (unlikely(r->producer >= r->size))
>>> r->producer = 0;
>>> @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
>>> if (ptr)
>>> __ptr_ring_discard_one(r);
>>> + /* Make sure anyone accessing data through the pointer is up to date. */
>>> + /* Pairs with smp_wmb in __ptr_ring_produce. */
>>> + smp_read_barrier_depends();
>>> return ptr;
>>> }
>> I was thinking whether or not it's better to move those to the callers. Then
>> we can save lots of barriers in e.g batch consuming.
>>
>> Thanks
> Batch consumers only do smp_read_barrier_depends which is free on
> non-alpha. I suggest we do the simple thing for stable and reserve
> optimizations for later.
>

Right.

Acked-by: Jason Wang <[email protected]>

2017-12-06 09:22:07

by George Cherian

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers

Hi Michael,


On 12/06/2017 12:59 AM, Michael S. Tsirkin wrote:
> Users of ptr_ring expect that it's safe to give the
> data structure a pointer and have it be available
> to consumers, but that actually requires an smb_wmb
> or a stronger barrier.
This is not the exact situation we are seeing.
Let me try to explain the situation

Affected on ARM64 platform.
1) tun_net_xmit calls skb_array_produce, which pushes the skb to the
ptr_ring, this push is protected by a producer_lock.

2)Prior to this call the tun_net_xmit calls skb_orphan which calls the
skb->destructor and sets skb->destructor and skb->sk as NULL.

2.a) These 2 writes are getting reordered

3) At the same time in the receive side (tun_ring_recv), which gets
executed in another core calls skb_array_consume which pulls the skb
from ptr ring, this pull is protected by a consumer lock.

4) eventually calling the skb->destructor (sock_wfree) with stale values.

Also note that this issue is reproducible in a long run and doesn't
happen immediately after the launch of multiple VM's (infact the
particular test cases launches 56 VM's which does iperf back and forth)

>
> In absence of such barriers and on architectures that reorder writes,
> consumer might read an un=initialized value from an skb pointer stored
> in the skb array. This was observed causing crashes.
>
> To fix, add memory barriers. The barrier we use is a wmb, the
> assumption being that producers do not need to read the value so we do
> not need to order these reads.
It is not the case that producer is reading the value, but the consumer
reading stale value. So we need to have a strict rmb in place .

>
> Reported-by: George Cherian <[email protected]>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
>
> George, could you pls report whether this patch fixes
> the issue for you?
>
> This seems to be needed in stable as well.
>
>
>
>
> include/linux/ptr_ring.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 37b4bb2..6866df4 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
>
> /* Note: callers invoking this in a loop must use a compiler barrier,
> * for example cpu_relax(). Callers must hold producer_lock.
> + * Callers are responsible for making sure pointer that is being queued
> + * points to a valid data.
> */
> static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> {
> if (unlikely(!r->size) || r->queue[r->producer])
> return -ENOSPC;
>
> + /* Make sure the pointer we are storing points to a valid data. */
> + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> + smp_wmb();
> +
> r->queue[r->producer++] = ptr;
> if (unlikely(r->producer >= r->size))
> r->producer = 0;
> @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
> if (ptr)
> __ptr_ring_discard_one(r);
>
> + /* Make sure anyone accessing data through the pointer is up to date. */
> + /* Pairs with smp_wmb in __ptr_ring_produce. */
> + smp_read_barrier_depends();
> return ptr;
> }
>
>

2017-12-06 12:46:59

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers

On Wed, Dec 06, 2017 at 02:51:41PM +0530, George Cherian wrote:
> Hi Michael,
>
>
> On 12/06/2017 12:59 AM, Michael S. Tsirkin wrote:
> > Users of ptr_ring expect that it's safe to give the
> > data structure a pointer and have it be available
> > to consumers, but that actually requires an smb_wmb
> > or a stronger barrier.
> This is not the exact situation we are seeing.

Could you test the patch pls?

> Let me try to explain the situation
>
> Affected on ARM64 platform.
> 1) tun_net_xmit calls skb_array_produce, which pushes the skb to the
> ptr_ring, this push is protected by a producer_lock.
>
> 2)Prior to this call the tun_net_xmit calls skb_orphan which calls the
> skb->destructor and sets skb->destructor and skb->sk as NULL.
>
> 2.a) These 2 writes are getting reordered
>
> 3) At the same time in the receive side (tun_ring_recv), which gets executed
> in another core calls skb_array_consume which pulls the skb from ptr ring,
> this pull is protected by a consumer lock.
>
> 4) eventually calling the skb->destructor (sock_wfree) with stale values.
>
> Also note that this issue is reproducible in a long run and doesn't happen
> immediately after the launch of multiple VM's (infact the particular test
> cases launches 56 VM's which does iperf back and forth)
>
> >
> > In absence of such barriers and on architectures that reorder writes,
> > consumer might read an un=initialized value from an skb pointer stored
> > in the skb array. This was observed causing crashes.
> >
> > To fix, add memory barriers. The barrier we use is a wmb, the
> > assumption being that producers do not need to read the value so we do
> > not need to order these reads.
> It is not the case that producer is reading the value, but the consumer
> reading stale value. So we need to have a strict rmb in place .
>
> >
> > Reported-by: George Cherian <[email protected]>
> > Suggested-by: Jason Wang <[email protected]>
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > ---
> >
> > George, could you pls report whether this patch fixes
> > the issue for you?
> >
> > This seems to be needed in stable as well.
> >
> >
> >
> >
> > include/linux/ptr_ring.h | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 37b4bb2..6866df4 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
> > /* Note: callers invoking this in a loop must use a compiler barrier,
> > * for example cpu_relax(). Callers must hold producer_lock.
> > + * Callers are responsible for making sure pointer that is being queued
> > + * points to a valid data.
> > */
> > static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> > {
> > if (unlikely(!r->size) || r->queue[r->producer])
> > return -ENOSPC;
> > + /* Make sure the pointer we are storing points to a valid data. */
> > + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> > + smp_wmb();
> > +
> > r->queue[r->producer++] = ptr;
> > if (unlikely(r->producer >= r->size))
> > r->producer = 0;
> > @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
> > if (ptr)
> > __ptr_ring_discard_one(r);
> > + /* Make sure anyone accessing data through the pointer is up to date. */
> > + /* Pairs with smp_wmb in __ptr_ring_produce. */
> > + smp_read_barrier_depends();
> > return ptr;
> > }
> >

2017-12-08 18:08:58

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers

From: "Michael S. Tsirkin" <[email protected]>
Date: Tue, 5 Dec 2017 21:29:37 +0200

> Users of ptr_ring expect that it's safe to give the
> data structure a pointer and have it be available
> to consumers, but that actually requires an smb_wmb
> or a stronger barrier.
>
> In absence of such barriers and on architectures that reorder writes,
> consumer might read an un=initialized value from an skb pointer stored
> in the skb array. This was observed causing crashes.
>
> To fix, add memory barriers. The barrier we use is a wmb, the
> assumption being that producers do not need to read the value so we do
> not need to order these reads.
>
> Reported-by: George Cherian <[email protected]>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
>
> George, could you pls report whether this patch fixes
> the issue for you?
>
> This seems to be needed in stable as well.

I really need some testing feedback for this before I apply it
and queue it up for -stable.

George?

2017-12-11 15:53:12

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers

From: "Michael S. Tsirkin" <[email protected]>
Date: Tue, 5 Dec 2017 21:29:37 +0200

> Users of ptr_ring expect that it's safe to give the
> data structure a pointer and have it be available
> to consumers, but that actually requires an smb_wmb
> or a stronger barrier.
>
> In absence of such barriers and on architectures that reorder writes,
> consumer might read an un=initialized value from an skb pointer stored
> in the skb array. This was observed causing crashes.
>
> To fix, add memory barriers. The barrier we use is a wmb, the
> assumption being that producers do not need to read the value so we do
> not need to order these reads.
>
> Reported-by: George Cherian <[email protected]>
> Suggested-by: Jason Wang <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>

I'm asked for asking for testing feedback and did not get it in a
reasonable amount of time.

So I'm applying this as-is, and queueing it up for -stable.

Thank you.

2017-12-12 06:29:20

by George Cherian

[permalink] [raw]
Subject: Re: [PATCH] ptr_ring: add barriers

Hi David,

On 12/11/2017 09:23 PM, David Miller wrote:
> From: "Michael S. Tsirkin" <[email protected]>
> Date: Tue, 5 Dec 2017 21:29:37 +0200
>
>> Users of ptr_ring expect that it's safe to give the
>> data structure a pointer and have it be available
>> to consumers, but that actually requires an smb_wmb
>> or a stronger barrier.
>>
>> In absence of such barriers and on architectures that reorder writes,
>> consumer might read an un=initialized value from an skb pointer stored
>> in the skb array. This was observed causing crashes.
>>
>> To fix, add memory barriers. The barrier we use is a wmb, the
>> assumption being that producers do not need to read the value so we do
>> not need to order these reads.
>>
>> Reported-by: George Cherian <[email protected]>
>> Suggested-by: Jason Wang <[email protected]>
>> Signed-off-by: Michael S. Tsirkin <[email protected]>
>
> I'm asked for asking for testing feedback and did not get it in a
> reasonable amount of time.
>
The tests have completed more than 48 hours without any failures.
I won't interrupt the same and run for longer time.
In case of any issue I will report the same.
> So I'm applying this as-is, and queueing it up for -stable.
>
> Thank you.

Regards,
-George
>