2018-03-27 17:14:28

by Evgeniy Didin

Subject: dma-mapping: clearing GFP_ZERO flag caused crashes of Ethernet on arc/hsdk board.

Hello,

After commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in common code") we noticed problems with the Ethernet controller on one of our platforms (namely ARC HSDK).
In particular, we see that the removal of the __GFP_ZERO flag in dma_alloc_attrs() was the culprit, because our implementation of arc_dma_alloc() only allocates zeroed pages if
that flag is explicitly set by the caller. Now that dma_alloc_attrs() unconditionally removes that flag, we allocate non-zeroed pages, and that seems to cause problems.

From the mentioned commit message I conclude that architecture code is supposed to always allocate zeroed pages, but I cannot find any such requirement in the kernel's documentation.
Could you please point me to that requirement, if it exists at all? Then we'll implement a fix in our arch code like this:
--------------------->8---------------------
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 1dcc404b5aec..c92e518413aa 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -30,7 +30,7 @@ static void *arc_dma_alloc(struct device *dev, size_t size,
 	void *kvaddr;
 	int need_coh = 1, need_kvaddr = 0;
 
-	page = alloc_pages(gfp, order);
+	page = alloc_pages(gfp | __GFP_ZERO, order);
 
 	if (!page)
 		return NULL;
--------------------->8---------------------

Best regards,
Evgeniy Didin


2018-03-27 18:14:26

by Andy Shevchenko

Subject: Re: dma-mapping: clearing GFP_ZERO flag caused crashes of Ethernet on arc/hsdk board.

On Tue, Mar 27, 2018 at 8:12 PM, Evgeniy Didin wrote:
> Hello,
>
> After commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in common code") we noticed problems with Ethernet controller on one of our platforms (namely ARC HSDK).
> In particular we see that removal of __GFP_ZERO flag in function dma_alloc_attrs() was the culprit because in our implementation of arc_dma_alloc() we only allocate zeroed pages if
> that flag is explicitly set by the caller. Now with unconditional removal of that flag in dma_alloc_attrs() we allocate non-zeroed pages and that seem to cause problems.
>
> From mentioned commit message I may conclude that architectural code is supposed to always allocate zeroed pages but I cannot find any requirement of that in kernel's documentation.
> Could you please point me to that requirement if that exists at all, then we'll implement a fix in our arch code like that:

Can you elaborate on which driver is in use?
stmmac with dwmac-anarion?

If so, this driver (w/o anarion parts, which I believe doesn't have
anything to do with this) is widely used on other platforms.
We would expect to see a lot of reports, yet there is only one so far?

The logical question is why?

Another question: why can't the caller ask for zeroed pages explicitly?

P.S. The current kernel code shows only 3 uses of GFP_ZERO. It seems
arm64 has something similar in mind.

> --------------------->8---------------------
> diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
> index 1dcc404b5aec..c92e518413aa 100644
> --- a/arch/arc/mm/dma.c
> +++ b/arch/arc/mm/dma.c
> @@ -30,7 +30,7 @@ static void *arc_dma_alloc(struct device *dev, size_t size,
> void *kvaddr;
> int need_coh = 1, need_kvaddr = 0;
>
> - page = alloc_pages(gfp, order);
> + page = alloc_pages(gfp | __GFP_ZERO, order);
>
> if (!page)
> return NULL;
> --------------------->8---------------------
>
> Best regards,
> Evgeniy Didin



--
With Best Regards,
Andy Shevchenko

2018-03-27 18:26:53

by Vineet Gupta

Subject: Re: dma-mapping: clearing GFP_ZERO flag caused crashes of Ethernet on arc/hsdk board.

Hi Christoph, Andy

On 03/27/2018 11:11 AM, Andy Shevchenko wrote:
> On Tue, Mar 27, 2018 at 8:12 PM, Evgeniy Didin wrote:
>> Hello,
>>
>> After commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in common code") we noticed problems with Ethernet controller on one of our platforms (namely ARC HSDK).
>> In particular we see that removal of __GFP_ZERO flag in function dma_alloc_attrs() was the culprit because in our implementation of arc_dma_alloc() we only allocate zeroed pages if
>> that flag is explicitly set by the caller. Now with unconditional removal of that flag in dma_alloc_attrs() we allocate non-zeroed pages and that seem to cause problems.
>>
>> From mentioned commit message I may conclude that architectural code is supposed to always allocate zeroed pages but I cannot find any requirement of that in kernel's documentation.
>> Could you please point me to that requirement if that exists at all, then we'll implement a fix in our arch code like that:

[snip]

> Another question why caller can't ask for zero pages explicitly?

Question to whom? The caller can ask for it, but the problem here is that the generic DMA
API code clears out GFP_ZERO and expects arch code to memset unconditionally.
Is that expected of arch code, and is it documented?

That is broken to begin with: arch dma_alloc* simply passes the gfp flags through to
the page allocator and doesn't muck around with them. We could in theory, but it doesn't
seem like the right thing to do IMO.

-Vineet



2018-03-27 21:21:50

by Alexey Brodkin

Subject: Re: dma-mapping: clearing GFP_ZERO flag caused crashes of Ethernet on arc/hsdk board.

Hi Andy,

On Tue, 2018-03-27 at 21:11 +0300, Andy Shevchenko wrote:
> On Tue, Mar 27, 2018 at 8:12 PM, Evgeniy Didin wrote:
> > Hello,
> >
> > After commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in common code") we noticed problems with Ethernet controller on one of our
> > platforms (namely ARC HSDK).
> > In particular we see that removal of __GFP_ZERO flag in function dma_alloc_attrs() was the culprit because in our implementation of arc_dma_alloc() we only allocate zeroed pages if
> > that flag is explicitly set by the caller. Now with unconditional removal of that flag in dma_alloc_attrs() we allocate non-zeroed pages and that seem to cause problems.
> >
> > From mentioned commit message I may conclude that architectural code is supposed to always allocate zeroed pages but I cannot find any requirement of that in kernel's documentation.
> > Could you please point me to that requirement if that exists at all, then we'll implement a fix in our arch code like that:
>
> Can you elaborate what driver is in use?
> stmmac with dwmac-anarion?

It is indeed DW GMAC (AKA STMMAC) with built-in DMA.

> If so, this driver (w/o anarion parts, which I believe doesn't have
> anything to do with this) is widely used on other platforms.
> We have to see a lot of reports, though only one so far?
>
> The logical question is why?

1. Note that this is another platform with an ARC core, so maybe on ARM
the DMA allocator already zeroes pages regardless of the provided flags;
personally, I didn't check that.

2. Even on HSDK we only saw this when running "iperf"; even the DHCP
client works perfectly fine on the same platform, so maybe others
just haven't hit the problem yet.

3. Who knows whether RCs are being tested with networking on other
platforms; maybe similar reports will start to appear once
4.16 is released.

-Alexey

2018-03-28 08:04:33

by Christoph Hellwig

Subject: Re: dma-mapping: clearing GFP_ZERO flag caused crashes of Ethernet on arc/hsdk board.

> > The logical question is why?
>
> 1. See that's another platform with ARC core so maybe in case of ARM
> DMA allocator already zeroes pages regardless provided flags -
> personally I didn't check that.

Yes, most architectures always clear memory returned by dma_alloc*.
It looks like a few don't, and my commit got them into trouble. As usual,
I'd prefer to match x86 semantics for now to avoid problems.

I'll send patches for arc and s390, which seem to be the holdouts that are actually
in use, and will check whether anyone else is also affected.