2018-01-12 03:42:34

by jianchao.wang

Subject: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

A customer reported a memory corruption issue on a previous mlx4_en
driver version, where order-3 pages and multiple page reference counting
were still used.

We finally found that one of the root causes is that the HW may see
stale rx_descs because the prod db update reaches the HW before the
rx_desc update does. Especially when crossing an order-3 page boundary
and starting a new one, the HW may write to pages that have already been
freed and reallocated to others.

To fix this, add a wmb() between the rx_desc update and the prod db
update to enforce the ordering. Even though order-0 pages and page
recycling have since been introduced, reordering between the rx_desc and
prod db updates could still corrupt inbound packets.

Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 85e28ef..eefa82c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
break;
ring->prod++;
} while (likely(--missing));
-
+ wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
mlx4_en_update_rx_prod_db(ring);
}

--
2.7.4
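
The ordering requirement the patch enforces can be sketched in
self-contained C11 (an illustration of the pattern only, not the mlx4
code; all names below are made up):

    /* Producer side: the descriptor write must become visible to the
     * consumer (here, the device) before the doorbell write does.
     */
    #include <stdatomic.h>
    #include <stdint.h>

    struct ring {
            uint64_t desc[256];        /* stands in for the rx_descs */
            _Atomic uint32_t prod_db;  /* stands in for the prod doorbell */
    };

    static void refill_and_ring(struct ring *r, uint32_t prod, uint64_t d)
    {
            r->desc[prod & 255] = d;                   /* 1. write descriptor */
            atomic_thread_fence(memory_order_release); /* 2. the "wmb()" */
            atomic_store_explicit(&r->prod_db, prod,
                                  memory_order_relaxed); /* 3. ring doorbell */
    }

Without step 2, the compiler or CPU may make step 3 visible before
step 1, which is exactly the stale-descriptor window described above.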


2018-01-12 16:32:54

by Jason Gunthorpe

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote:
> A customer reported a memory corruption issue on a previous mlx4_en
> driver version, where order-3 pages and multiple page reference counting
> were still used.
>
> We finally found that one of the root causes is that the HW may see
> stale rx_descs because the prod db update reaches the HW before the
> rx_desc update does. Especially when crossing an order-3 page boundary
> and starting a new one, the HW may write to pages that have already been
> freed and reallocated to others.
>
> To fix this, add a wmb() between the rx_desc update and the prod db
> update to enforce the ordering. Even though order-0 pages and page
> recycling have since been introduced, reordering between the rx_desc and
> prod db updates could still corrupt inbound packets.
>
> Signed-off-by: Jianchao Wang <[email protected]>
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 85e28ef..eefa82c 100644
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> break;
> ring->prod++;
> } while (likely(--missing));
> -
> + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
> mlx4_en_update_rx_prod_db(ring);
> }
>

Does this need to be dma_wmb(), and should it be in
mlx4_en_update_rx_prod_db ?

Jason

2018-01-12 16:46:12

by Eric Dumazet

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote:
> On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote:
> > A customer reported a memory corruption issue on a previous mlx4_en
> > driver version, where order-3 pages and multiple page reference counting
> > were still used.
> >
> > We finally found that one of the root causes is that the HW may see
> > stale rx_descs because the prod db update reaches the HW before the
> > rx_desc update does. Especially when crossing an order-3 page boundary
> > and starting a new one, the HW may write to pages that have already been
> > freed and reallocated to others.
> >
> > To fix this, add a wmb() between the rx_desc update and the prod db
> > update to enforce the ordering. Even though order-0 pages and page
> > recycling have since been introduced, reordering between the rx_desc and
> > prod db updates could still corrupt inbound packets.
> >
> > Signed-off-by: Jianchao Wang <[email protected]>
> > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 85e28ef..eefa82c 100644
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> > break;
> > ring->prod++;
> > } while (likely(--missing));
> > -
> > + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
> > mlx4_en_update_rx_prod_db(ring);
> > }
> >
>
> Does this need to be dma_wmb(), and should it be in
> mlx4_en_update_rx_prod_db ?
>

+1 on dma_wmb()

On what architecture was the bug observed?

In any case, the barrier should be moved in mlx4_en_update_rx_prod_db()
I think.

2018-01-12 19:53:47

by Saeed Mahameed

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating



On 01/12/2018 08:46 AM, Eric Dumazet wrote:
> On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote:
>> On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote:
>>> A customer reported a memory corruption issue on a previous mlx4_en
>>> driver version, where order-3 pages and multiple page reference counting
>>> were still used.
>>>
>>> We finally found that one of the root causes is that the HW may see
>>> stale rx_descs because the prod db update reaches the HW before the
>>> rx_desc update does. Especially when crossing an order-3 page boundary
>>> and starting a new one, the HW may write to pages that have already been
>>> freed and reallocated to others.
>>>
>>> To fix this, add a wmb() between the rx_desc update and the prod db
>>> update to enforce the ordering. Even though order-0 pages and page
>>> recycling have since been introduced, reordering between the rx_desc and
>>> prod db updates could still corrupt inbound packets.
>>>
>>> Signed-off-by: Jianchao Wang <[email protected]>
>>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>>> index 85e28ef..eefa82c 100644
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>>> @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
>>> break;
>>> ring->prod++;
>>> } while (likely(--missing));
>>> -
>>> + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
>>> mlx4_en_update_rx_prod_db(ring);
>>> }
>>>
>>
>> Does this need to be dma_wmb(), and should it be in
>> mlx4_en_update_rx_prod_db ?
>>
>
> +1 on dma_wmb()
>
> On what architecture was the bug observed?
>
> In any case, the barrier should be moved in mlx4_en_update_rx_prod_db()
> I think.
>

+1 on dma_wmb(), thanks Eric for reviewing this.

The barrier is also needed elsewhere in the code, but I wouldn't
put it in mlx4_en_update_rx_prod_db(), so as to allow batch filling of
all rx rings and then hitting the barrier only once. As a rule of thumb,
memory barriers are the ring API caller's responsibility.

e.g. in mlx4_en_activate_rx_rings():
between mlx4_en_fill_rx_buffers(priv); and the loop that updates rx prod
for all rings, the dma_wmb() is needed, see below.

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b4d144e67514..65541721a240 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -370,6 +370,8 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
if (err)
goto err_buffers;

+ dma_wmb();
+
for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
ring = priv->rx_ring[ring_ind];



2018-01-12 20:16:36

by Eric Dumazet

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Fri, 2018-01-12 at 11:53 -0800, Saeed Mahameed wrote:
>
> On 01/12/2018 08:46 AM, Eric Dumazet wrote:
> > On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote:
> > > On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote:
> > > > A customer reported a memory corruption issue on a previous mlx4_en
> > > > driver version, where order-3 pages and multiple page reference counting
> > > > were still used.
> > > >
> > > > We finally found that one of the root causes is that the HW may see
> > > > stale rx_descs because the prod db update reaches the HW before the
> > > > rx_desc update does. Especially when crossing an order-3 page boundary
> > > > and starting a new one, the HW may write to pages that have already been
> > > > freed and reallocated to others.
> > > >
> > > > To fix this, add a wmb() between the rx_desc update and the prod db
> > > > update to enforce the ordering. Even though order-0 pages and page
> > > > recycling have since been introduced, reordering between the rx_desc and
> > > > prod db updates could still corrupt inbound packets.
> > > >
> > > > Signed-off-by: Jianchao Wang <[email protected]>
> > > > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > > index 85e28ef..eefa82c 100644
> > > > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > > @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> > > > break;
> > > > ring->prod++;
> > > > } while (likely(--missing));
> > > > -
> > > > + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
> > > > mlx4_en_update_rx_prod_db(ring);
> > > > }
> > > >
> > >
> > > Does this need to be dma_wmb(), and should it be in
> > > mlx4_en_update_rx_prod_db ?
> > >
> >
> > +1 on dma_wmb()
> >
> > On what architecture was the bug observed?
> >
> > In any case, the barrier should be moved in mlx4_en_update_rx_prod_db()
> > I think.
> >
>
> +1 on dma_wmb(), thanks Eric for reviewing this.
>
> The barrier is also needed elsewhere in the code, but I wouldn't
> put it in mlx4_en_update_rx_prod_db(), so as to allow batch filling of
> all rx rings and then hitting the barrier only once. As a rule of thumb,
> memory barriers are the ring API caller's responsibility.
>
> e.g. in mlx4_en_activate_rx_rings():
> between mlx4_en_fill_rx_buffers(priv); and the loop that updates rx prod
> for all rings, the dma_wmb() is needed, see below.
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index b4d144e67514..65541721a240 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -370,6 +370,8 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
> if (err)
> goto err_buffers;
>
> + dma_wmb();
> +
> for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
> ring = priv->rx_ring[ring_ind];


Why bother, considering dma_wmb() is a nop on x86,
simply a compiler barrier?

Put it in mlx4_en_update_rx_prod_db() and have no obscure bugs...

Also we might change the existing wmb() in mlx4_en_process_rx_cq() to
dma_wmb(); that would help performance a bit.
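
For reference, on x86-64 of that era the two barriers differ roughly as
follows (paraphrased from memory of arch/x86/include/asm/barrier.h, so
treat it as an approximation and check the actual tree):

    #define wmb()     asm volatile("sfence" ::: "memory") /* CPU store fence */
    #define dma_wmb() barrier()  /* compiler barrier only: x86's TSO model
                                  * already orders stores to cacheable memory */

which is why replacing a wmb() on the hot path with dma_wmb() saves an
sfence per doorbell ring.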


2018-01-12 21:02:23

by Saeed Mahameed

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating



On 01/12/2018 12:16 PM, Eric Dumazet wrote:
> On Fri, 2018-01-12 at 11:53 -0800, Saeed Mahameed wrote:
>>
>> On 01/12/2018 08:46 AM, Eric Dumazet wrote:
>>> On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote:
>>>> On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote:
>>>>> A customer reported a memory corruption issue on a previous mlx4_en
>>>>> driver version, where order-3 pages and multiple page reference counting
>>>>> were still used.
>>>>>
>>>>> We finally found that one of the root causes is that the HW may see
>>>>> stale rx_descs because the prod db update reaches the HW before the
>>>>> rx_desc update does. Especially when crossing an order-3 page boundary
>>>>> and starting a new one, the HW may write to pages that have already been
>>>>> freed and reallocated to others.
>>>>>
>>>>> To fix this, add a wmb() between the rx_desc update and the prod db
>>>>> update to enforce the ordering. Even though order-0 pages and page
>>>>> recycling have since been introduced, reordering between the rx_desc and
>>>>> prod db updates could still corrupt inbound packets.
>>>>>
>>>>> Signed-off-by: Jianchao Wang <[email protected]>
>>>>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>>>>> index 85e28ef..eefa82c 100644
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>>>>> @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
>>>>> break;
>>>>> ring->prod++;
>>>>> } while (likely(--missing));
>>>>> -
>>>>> + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
>>>>> mlx4_en_update_rx_prod_db(ring);
>>>>> }
>>>>>
>>>>
>>>> Does this need to be dma_wmb(), and should it be in
>>>> mlx4_en_update_rx_prod_db ?
>>>>
>>>
>>> +1 on dma_wmb()
>>>
>>> On what architecture was the bug observed?
>>>
>>> In any case, the barrier should be moved in mlx4_en_update_rx_prod_db()
>>> I think.
>>>
>>
>> +1 on dma_wmb(), thanks Eric for reviewing this.
>>
>> The barrier is also needed elsewhere in the code, but I wouldn't
>> put it in mlx4_en_update_rx_prod_db(), so as to allow batch filling of
>> all rx rings and then hitting the barrier only once. As a rule of thumb,
>> memory barriers are the ring API caller's responsibility.
>>
>> e.g. in mlx4_en_activate_rx_rings():
>> between mlx4_en_fill_rx_buffers(priv); and the loop that updates rx prod
>> for all rings, the dma_wmb() is needed, see below.
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> index b4d144e67514..65541721a240 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> @@ -370,6 +370,8 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
>> if (err)
>> goto err_buffers;
>>
>> + dma_wmb();
>> +
>> for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
>> ring = priv->rx_ring[ring_ind];
>
>
> Why bother, considering dma_wmb() is a nop on x86,
> simply a compiler barrier?
>
> Put it in mlx4_en_update_rx_prod_db() and have no obscure bugs...
>

Simply putting a memory barrier at the top or the bottom of a function
means nothing unless you are looking at the whole picture, i.e. at all
the callers of that function, to understand why it is there.

Which is easier to grasp?:

update_doorbell() {
dma_wmb();
ring->db = prod;
}

or

fill buffers();
dma_wmb();
update_doorbell();

I simply like the 2nd one since with one look you can understand what this dma_wmb is protecting.

Anyway this is truly a nit, Tariq can decide what is better for him :).

> Also we might change the existing wmb() in mlx4_en_process_rx_cq() to
> dma_wmb(); that would help performance a bit.
>
>

+1, Tariq, will you handle it?

2018-01-12 21:22:04

by Eric Dumazet

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Fri, 2018-01-12 at 13:01 -0800, Saeed Mahameed wrote:

> Which is easier to grasp?:
>
> update_doorbell() {
> dma_wmb();
> ring->db = prod;
> }

This one is IMO the most secure one (least surprise)

Considering the time it took to discover this bug, I would really play
safe.

But obviously I do not maintain mlx4.

2018-01-13 19:16:18

by Jason Gunthorpe

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Fri, Jan 12, 2018 at 01:01:56PM -0800, Saeed Mahameed wrote:


> Simply putting a memory barrier at the top or the bottom of a function
> means nothing unless you are looking at the whole picture, i.e. at all
> the callers of that function, to understand why it is there.

When I review code I want to see the memory barrier placed *directly*
before the write which allows the DMA.

So yes, this is my preference:

> update_doorbell() {
> dma_wmb();
> ring->db = prod;
> }

Conceptually what is happening here is very similar to what
smp_store_release() does for SMP cases. In most cases a wmb should
be strongly connected with a following write.

smp_store_release() is called 'release' because the write it
incorporates allows the other CPU to 'see' what is being
protected. Similarly here, the write to the db allows the device to
'see' the new ring data.

And this is a bad idea:

> fill buffers();
> dma_wmb();
> update_doorbell();

> I simply like the 2nd one since with one look you can understand
> what this dma_wmb is protecting.

What do you think the wmb is protecting in the above? It isn't the fill.

Jason
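
A compilable sketch of the helper Jason prefers (C11 for
self-containment; the kernel analogue would be dma_wmb(), or
smp_store_release() in the SMP case, inside the helper - the names here
are made up):

    #include <stdatomic.h>
    #include <stdint.h>

    struct ring {
            uint64_t desc[256];
            _Atomic uint32_t db;
    };

    /* "Release" the new descriptors to the consumer: the barrier is
     * fused to the one store that publishes them, so no caller can
     * forget it.
     */
    static inline void update_doorbell(struct ring *r, uint32_t prod)
    {
            atomic_store_explicit(&r->db, prod, memory_order_release);
    }

A caller then just fills descriptors and calls update_doorbell(); the
ordering is guaranteed by the helper rather than by each call site.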

2018-01-14 02:40:24

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Dear all

Thanks for the kind responses and reviews. That's really appreciated.

On 01/13/2018 12:46 AM, Eric Dumazet wrote:
>> Does this need to be dma_wmb(), and should it be in
>> mlx4_en_update_rx_prod_db ?
>>
> +1 on dma_wmb()
>
> On what architecture was the bug observed?
This issue was observed on x86-64.
I will send the customer a new patch that replaces wmb() with dma_wmb(),
so they can confirm.

Thanks
Jianchao

2018-01-14 09:47:27

by Tariq Toukan

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Thanks Jianchao for your patch.

And thank you guys for your reviews, much appreciated.
I was off-work on Friday and Saturday.

On 14/01/2018 4:40 AM, jianchao.wang wrote:
> Dear all
>
> Thanks for the kind responses and reviews. That's really appreciated.
>
> On 01/13/2018 12:46 AM, Eric Dumazet wrote:
>>> Does this need to be dma_wmb(), and should it be in
>>> mlx4_en_update_rx_prod_db ?
>>>
>> +1 on dma_wmb()
>>
>> On what architecture was the bug observed?
> This issue was observed on x86-64.
> I will send the customer a new patch that replaces wmb() with dma_wmb(),
> so they can confirm.

+1 on dma_wmb, let us know once the customer confirms.
Please place it within mlx4_en_update_rx_prod_db as suggested.
All other calls to mlx4_en_update_rx_prod_db are in the control/slow path,
so I prefer being on the safe side and care less about batching the barrier.

Thanks,
Tariq

2018-01-15 05:50:51

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Tariq

Thanks for your kind response.

On 01/14/2018 05:47 PM, Tariq Toukan wrote:
> Thanks Jianchao for your patch.
>
> And Thank you guys for your reviews, much appreciated.
> I was off-work on Friday and Saturday.
>
> On 14/01/2018 4:40 AM, jianchao.wang wrote:
>> Dear all
>>
>> Thanks for the kind responses and reviews. That's really appreciated.
>>
>> On 01/13/2018 12:46 AM, Eric Dumazet wrote:
>>>> Does this need to be dma_wmb(), and should it be in
>>>> mlx4_en_update_rx_prod_db ?
>>>>
>>> +1 on dma_wmb()
>>>
>>> On what architecture was the bug observed?
>> This issue was observed on x86-64.
>> I will send the customer a new patch that replaces wmb() with dma_wmb(),
>> so they can confirm.
>
> +1 on dma_wmb, let us know once the customer confirms.
> Please place it within mlx4_en_update_rx_prod_db as suggested.
Yes, I have recommended it to the customer.
Once I get the result, I will share it here.
> All other calls to mlx4_en_update_rx_prod_db are in the control/slow path, so I prefer being on the safe side and care less about batching the barrier.
>
> Thanks,
> Tariq
>

2018-01-19 15:20:56

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Tariq

Very sad that the crash was reproduced again after applying the patch.

--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring)

static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
{
+ dma_wmb();
*ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff);
}

I analyzed the kdump, it should be a memory corruption.

Thanks
Jianchao

2018-01-19 15:51:17

by Eric Dumazet

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> Hi Tariq
>
> Very sad that the crash was reproduced again after applying the patch.
>
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring)
>
> static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
> {
> + dma_wmb();

So... is wmb() here fixing the issue ?

> *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff);
> }
>
> I analyzed the kdump, it should be a memory corruption.
>
> Thanks
> Jianchao

2018-01-21 09:33:13

by Tariq Toukan

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating



On 19/01/2018 5:49 PM, Eric Dumazet wrote:
> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>> Hi Tariq
>>
>> Very sad that the crash was reproduced again after applying the patch.
>>
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring)
>>
>> static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
>> {
>> + dma_wmb();
>
> So... is wmb() here fixing the issue ?
>
>> *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff);
>> }
>>
>> I analyzed the kdump, it should be a memory corruption.
>>
>> Thanks
>> Jianchao

Hmm, this is actually consistent with the example below [1].

As I understand from the example, it seems that the dma_wmb/dma_rmb
barriers are good for synchronizing cpu/device accesses to the
"streaming DMA mapped" buffers (the descriptors, which went through the
dma_map_page() API), but not for the doorbell (coherent memory,
typically allocated via dma_alloc_coherent()), which requires using the
stronger wmb() barrier.


[1] Documentation/memory-barriers.txt

(*) dma_wmb();
(*) dma_rmb();

These are for use with consistent memory to guarantee the ordering
of writes or reads of shared memory accessible to both the CPU and a
DMA capable device.

For example, consider a device driver that shares memory with a device
and uses a descriptor status value to indicate if the descriptor belongs
to the device or the CPU, and a doorbell to notify it when new
descriptors are available:

if (desc->status != DEVICE_OWN) {
/* do not read data until we own descriptor */
dma_rmb();

/* read/modify data */
read_data = desc->data;
desc->data = write_data;

/* flush modifications before status update */
dma_wmb();

/* assign ownership */
desc->status = DEVICE_OWN;

/* force memory to sync before notifying device via MMIO */
wmb();

/* notify device of new descriptors */
writel(DESC_NOTIFY, doorbell);
}

The dma_rmb() allows us guarantee the device has released ownership
before we read the data from the descriptor, and the dma_wmb() allows
us to guarantee the data is written to the descriptor before the device
can see it now has ownership. The wmb() is needed to guarantee that the
cache coherent memory writes have completed before attempting a write to
the cache incoherent MMIO region.

See Documentation/DMA-API.txt for more information on consistent memory.



2018-01-21 16:26:00

by Tariq Toukan

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating



On 21/01/2018 11:31 AM, Tariq Toukan wrote:
>
>
> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>>> Hi Tariq
>>>
>>> Very sad that the crash was reproduced again after applying the patch.

Memory barriers vary across different archs; can you please share more
details regarding the arch and repro steps?


2018-01-21 16:43:57

by Eric Dumazet

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote:
>
> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
> >
> >
> > On 19/01/2018 5:49 PM, Eric Dumazet wrote:
> > > On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> > > > Hi Tariq
> > > >
> > > > Very sad that the crash was reproduced again after applying the patch.
>
> Memory barriers vary across different archs; can you please share more
> details regarding the arch and repro steps?

Yeah, mlx4 NICs in Google fleet receive trillions of packets per
second, and we never noticed an issue.

Although we are using a slightly different driver, using order-0 pages
and fast page recycling.


2018-01-21 20:41:14

by Jason Gunthorpe

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

> Hmm, this is actually consistent with the example below [1].
>
> AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good
> for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers
> (the descriptors, which went through the dma_map_page() API), but not for the
> doorbell (a coherent memory, typically allocated via dma_alloc_coherent)
> that requires using the stronger wmb() barrier.

If x86 truly requires a wmb() (aka SFENCE) here then the userspace
RDMA stuff is broken too, and that has been tested to death at this
point..

I looked into this at one point and I thought I concluded that x86 did
not require an SFENCE between a posted PCI write and writes to system
memory to guarantee order with respect to the PCI device?

Well, so long as non-temporal stores and other specialty accesses are
not being used.. Is there a chance a fancy SSE-optimized memcpy or
memset, crypto, or something like that is involved here?

However, Documentation/memory-barriers.txt does seem pretty clear that
the kernel definition of wmb() makes it required here, even if it
might be overkill for x86?

Jason
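
The kind of access being worried about here can be illustrated as
follows (a hypothetical userspace sketch using SSE2 intrinsics;
non-temporal stores are weakly ordered on x86 and fall outside the
usual store-store ordering guarantee):

    #include <emmintrin.h>  /* _mm_stream_si32; pulls in _mm_sfence too */
    #include <stdint.h>

    void publish_nt(int *desc, volatile uint32_t *db, int val, uint32_t prod)
    {
            _mm_stream_si32(desc, val); /* non-temporal (movnti) store: may
                                         * become visible after later stores */
            _mm_sfence();               /* order it before the doorbell */
            *db = prod;                 /* ordinary store: the doorbell */
    }

Without the sfence, the doorbell store could become globally visible
before the streaming store, even on x86.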

2018-01-22 02:12:59

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Tariq and all

Many thanks for your kind and detailed responses and comments.

On 01/22/2018 12:24 AM, Tariq Toukan wrote:
>
>
> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
>>
>>
>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>>>> Hi Tariq
>>>>
>>>> Very sad that the crash was reproduced again after applying the patch.
>
> Memory barriers vary across different archs; can you please share more details regarding the arch and repro steps?

The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015.
Xen is installed. The crash occurred in DOM0.
Regarding the repro steps: it is a customer test which does heavy disk I/O over NFS storage, without any guest.

The patch that fixes this issue is as follows:
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1005,6 +1005,7 @@ out:
wmb(); /* ensure HW sees CQ consumer before we post new buffers */
ring->cons = cq->mcq.cons_index;
mlx4_en_refill_rx_buffers(priv, ring);
+ wmb();
mlx4_en_update_rx_prod_db(ring);
return polled;
}

Thanks
Jianchao

2018-01-22 02:41:45

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Eric

On 01/22/2018 12:43 AM, Eric Dumazet wrote:
> On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote:
>>
>> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
>>>
>>>
>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>>>>> Hi Tariq
>>>>>
>>>>> Very sad that the crash was reproduced again after applying the patch.
>>
>> Memory barriers vary across different archs; can you please share more
>> details regarding the arch and repro steps?
>
> Yeah, mlx4 NICs in Google fleet receive trillions of packets per
> second, and we never noticed an issue.
>
> Although we are using a slightly different driver, using order-0 pages
> and fast pages recycling.
>
>
The driver we use sets the page reference count to (size of pages)/stride;
the pages are freed by the networking stack when the reference count becomes
zero, and the order-3 pages may then be allocated again soon. This gives the
NIC device a chance to corrupt pages which have since been allocated to
others, such as slab.
In the current version with order-0 pages and page recycling, the corruption
may still occur on inbound packets sometimes, but it just causes some bad,
invalid packets which will be dropped.
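
To put illustrative numbers on that window (assuming 4 KB pages and a
2 KB stride, both hypothetical here): an order-3 allocation is 8 pages =
32 KB, so the refcount would be set to 32768 / 2048 = 16. Once the stack
has consumed all 16 frags the refcount drops to zero, the compound page
is freed, and it can immediately be reallocated, e.g. by slab, while a
stale rx_desc could still point the NIC at it.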

Thanks
Jianchao

2018-01-22 15:48:49

by Jason Gunthorpe

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Mon, Jan 22, 2018 at 10:40:53AM +0800, jianchao.wang wrote:
> Hi Eric
>
> On 01/22/2018 12:43 AM, Eric Dumazet wrote:
> > On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote:
> >>
> >> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
> >>>
> >>>
> >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
> >>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> >>>>> Hi Tariq
> >>>>>
> >>>>> Very sad that the crash was reproduced again after applying the patch.
> >>
> >> Memory barriers vary across different archs; can you please share more
> >> details regarding the arch and repro steps?
> >
> > Yeah, mlx4 NICs in Google fleet receive trillions of packets per
> > second, and we never noticed an issue.
> >
> > Although we are using a slightly different driver, using order-0 pages
> > and fast pages recycling.
> >
> >
> The driver we use sets the page reference count to (size of pages)/stride;
> the pages are freed by the networking stack when the reference count becomes
> zero, and the order-3 pages may then be allocated again soon. This gives the
> NIC device a chance to corrupt pages which have since been allocated to
> others, such as slab.

But it looks like the wmb() is placed when stuffing new rx descriptors
into the device - how can it prevent corruption of pages whose
ownership was transferred from the device to the host? That sounds more
like a rmb() is missing someplace to me...

(Granted the missing wmb() is a bug, but it may not be fully solving this
issue??)

Jason

2018-01-23 03:26:47

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Jason

Thanks for your kind response.

On 01/22/2018 11:47 PM, Jason Gunthorpe wrote:
>>> Yeah, mlx4 NICs in Google fleet receive trillions of packets per
>>> second, and we never noticed an issue.
>>>
>>> Although we are using a slightly different driver, using order-0 pages
>>> and fast pages recycling.
>>>
>>>
>> The driver we use sets the page reference count to (size of pages)/stride;
>> the pages are freed by the networking stack when the reference count becomes
>> zero, and the order-3 pages may then be allocated again soon. This gives the
>> NIC device a chance to corrupt pages which have since been allocated to
>> others, such as slab.
> But it looks like the wmb() is placed when stuffing new rx descriptors
> into the device - how can it prevent corruption of pages where
> ownership was transferred from the device to the host? That sounds more like a
> rmb() is missing someplace to me...
>
The device may see the prod_db update before the rx_desc update.
Then it will fetch stale rx_descs.
These stale rx_descs may contain pages that have already been freed back
to the host.


> (Granted the missing wmb() is a bug, but it may not be fully solving this
> issue??)

The wmb() here fixes one of the customer's test cases.

Thanks
Jianchao

2018-01-25 03:28:29

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Tariq

On 01/22/2018 10:12 AM, jianchao.wang wrote:
>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>>>>> Hi Tariq
>>>>>
>>>>> Very sad that the crash was reproduced again after applying the patch.
>> Memory barriers vary across different archs; can you please share more details regarding the arch and repro steps?

> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015.
> Xen is installed. The crash occurred in DOM0.
> Regarding the repro steps: it is a customer test which does heavy disk I/O over NFS storage, without any guest.
>

What is the final suggestion on this?
If we use wmb() there, will the performance be pulled down?

Thanks in advance
Jianchao

2018-01-25 03:56:29

by Eric Dumazet

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote:
> Hi Tariq
>
> On 01/22/2018 10:12 AM, jianchao.wang wrote:
> > > > On 19/01/2018 5:49 PM, Eric Dumazet wrote:
> > > > > On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> > > > > > Hi Tariq
> > > > > >
> > > > > > Very sad that the crash was reproduced again after applying the patch.
> > >
> > > Memory barriers vary across different archs; can you please share more details regarding the arch and repro steps?
> > The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015.
> > Xen is installed. The crash occurred in DOM0.
> > Regarding the repro steps: it is a customer test which does heavy disk I/O over NFS storage, without any guest.
> >
>
> What is the final suggestion on this?
> If we use wmb() there, will the performance be pulled down?

Since https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=dad42c3038a59d27fced28ee4ec1d4a891b28155

we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often.

I doubt the additional wmb() will have serious impact there.


2018-01-25 06:26:08

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Eric

Thanks for your kind response and suggestion.
That's really appreciated.

Jianchao


2018-01-25 09:55:44

by Tariq Toukan

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating



On 25/01/2018 8:25 AM, jianchao.wang wrote:
> Hi Eric
>
> Thanks for your kind response and suggestion.
> That's really appreciated.
>
> Jianchao
>
> On 01/25/2018 11:55 AM, Eric Dumazet wrote:
>> On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote:
>>> Hi Tariq
>>>
>>> On 01/22/2018 10:12 AM, jianchao.wang wrote:
>>>>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>>>>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>>>>>>>> Hi Tariq
>>>>>>>>
>>>>>>>> Very sad that the crash was reproduced again after applying the patch.
>>>>>
>>>>> Memory barriers vary across different archs; can you please share more details regarding the arch and repro steps?
>>>> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015.
>>>> Xen is installed. The crash occurred in DOM0.
>>>> Regarding the repro steps: it is a customer test which does heavy disk I/O over NFS storage, without any guest.
>>>>
>>>
>>> What is the final suggestion on this?
>>> If we use wmb() there, will the performance be pulled down?

I want to evaluate this effect.
I agree with Eric; the expected impact is restricted, especially after
batching the allocations.

>>
>> Since https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=dad42c3038a59d27fced28ee4ec1d4a891b28155
>>
>> we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often.
>>
>> I doubt the additional wmb() will have serious impact there.
>>

I will test the effect (it'll be at the beginning of next week).
I'll update so we can make a more confident decision.

Thanks,
Tariq


2018-01-27 12:42:24

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

Hi Tariq

Thanks for your kind response.
That's really appreciated.

On 01/25/2018 05:54 PM, Tariq Toukan wrote:
>
>
> On 25/01/2018 8:25 AM, jianchao.wang wrote:
>> Hi Eric
>>
>> Thanks for your kind response and suggestion.
>> That's really appreciated.
>>
>> Jianchao
>>
>> On 01/25/2018 11:55 AM, Eric Dumazet wrote:
>>> On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote:
>>>> Hi Tariq
>>>>
>>>> On 01/22/2018 10:12 AM, jianchao.wang wrote:
>>>>>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>>>>>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>>>>>>>>> Hi Tariq
>>>>>>>>>
>>>>>>>>> Very sad that the crash was reproduced again after applying the patch.
>>>>>>
>>>>>> Memory barriers vary across different archs; can you please share more details regarding the arch and repro steps?
>>>>> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015.
>>>>> Xen is installed. The crash occurred in DOM0.
>>>>> Regarding the repro steps: it is a customer test which does heavy disk I/O over NFS storage, without any guest.
>>>>>
>>>>
>>>> What is the final suggestion on this?
>>>> If we use wmb() there, will the performance be pulled down?
>
> I want to evaluate this effect.
> I agree with Eric; the expected impact is restricted, especially after batching the allocations.
>>>
>>> Since https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=dad42c3038a59d27fced28ee4ec1d4a891b28155
>>>
>>> we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often.
>>>
>>> I doubt the additional wmb() will have serious impact there.
>>>
>
> I will test the effect (it'll be at the beginning of next week).
> I'll update so we can make a more confident decision.
>
I have also sent the customer patches with the wmb() and batched
allocations, and let them check whether the performance is impacted.
I will update here asap when I get feedback.

> Thanks,
> Tariq
>

2019-01-02 04:02:19

by jianchao.wang

Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating



On 12/31/18 12:27 AM, Tariq Toukan wrote:
>
> Hi Jianchao,
>
> I am interested in pushing this bug fix.
> Do you want me to submit it, or will you do it yourself?
> Can you elaborate regarding the arch with the repro?
>
> This is the patch I suggest:
>
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -161,6 +161,8 @@ static bool mlx4_en_is_ring_empty(const struct
> mlx4_en_rx_ring *ring)
>
> static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
> {
> + /* ensure rx_desc updating reaches HW before prod db updating */
> + wmb();
> *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff);
> }
>

Hi Tariq

Happy new year!

The customer provided confusing test results for us.
The fix could not fix their issue.

We finally found an upstream fix,
5d70bd5c98d0e655bde2aae2b5251bdd44df5e71
("net/mlx4_en: fix potential use-after-free with dma_unmap_page"),
which killed the issue in Oct 2018. That was really a long way.

Please go ahead with this patch.

Thanks
Jianchao