2015-11-26 16:34:20

by Pavel Machek

Subject: 4.3+: Atheros ethernet fails after resume from s2ram, due to order 4 allocation

Hi!

...and dmesg tells us what is going on:

[ 6961.550240] NetworkManager: page allocation failure: order:4, mode:0x2080020
[ 6961.550249] CPU: 0 PID: 2590 Comm: NetworkManager Tainted: G W 4.3.0+ #124
[ 6961.550250] Hardware name: Acer Aspire 5732Z/Aspire 5732Z, BIOS V3.07 02/10/2010
[ 6961.550252] 00000000 00000000 f2ad1a04 c42ba5b8 00000000 f2ad1a2c c40d650a c4d3ee1c
[ 6961.550260] f34ef600 00000004 02080020 c4eeef40 00000000 00000010 00000000 f2ad1ac8
[ 6961.550266] c40d8caa 02080020 00000004 00000000 00000070 f34ef200 00000060 00000010
[ 6961.550272] Call Trace:
...
[ 6961.550299] [<c4006811>] dma_generic_alloc_coherent+0x71/0x120
[ 6961.550301] [<c40067a0>] ? via_no_dac+0x30/0x30
[ 6961.550307] [<c465b16e>] atl1c_open+0x29e/0x300
[ 6961.550313] [<c48b96f5>] ? call_netdevice_notifiers_info+0x25/0x50
[ 6961.550316] [<c48c081b>] __dev_open+0x7b/0xf0
[ 6961.550318] [<c48c0ac9>] __dev_change_flags+0x89/0x140
[ 6961.550320] [<c48c0ba3>] dev_change_flags+0x23/0x60
[ 6961.550325] [<c48ce416>] do_setlink+0x286/0x7b0
[ 6961.550328] [<c42ded02>] ? nla_parse+0x22/0xd0
[ 6961.550330] [<c48cf906>] rtnl_newlink+0x5d6/0x860
[ 6961.550336] [<c407f8a1>] ? __lock_acquire.isra.24+0x3a1/0xc80
[ 6961.550342] [<c4047ae2>] ? ns_capable+0x22/0x60
[ 6961.550345] [<c48e7c5d>] ? __netlink_ns_capable+0x2d/0x40
[ 6961.550351] [<c49c9c54>] ? xprt_transmit+0x94/0x220
[ 6961.550354] [<c48cd9e6>] rtnetlink_rcv_msg+0x76/0x1f0
[ 6961.550356] [<c48cd970>] ? rtnetlink_rcv+0x30/0x30
[ 6961.550359] [<c48eb35e>] netlink_rcv_skb+0x8e/0xb0
...
[ 6961.550412] Mem-Info:
[ 6961.550417] active_anon:30319 inactive_anon:25075 isolated_anon:0
active_file:327764 inactive_file:152179 isolated_file:16
unevictable:0 dirty:6 writeback:0 unstable:0
slab_reclaimable:149091 slab_unreclaimable:18973
mapped:18100 shmem:4847 pagetables:1538 bounce:0
free:57732 free_pcp:10 free_cma:0
...
[ 6961.550492] 485897 total pagecache pages
[ 6961.550494] 1086 pages in swap cache
[ 6961.550496] Swap cache stats: add 16738, delete 15652, find 6708/8500
[ 6961.550497] Free swap = 1656440kB
[ 6961.550498] Total swap = 1681428kB
[ 6961.550499] 785914 pages RAM
[ 6961.550500] 557663 pages HighMem/MovableOnly
[ 6961.550501] 12639 pages reserved
[ 6961.550506] atl1c 0000:05:00.0: pci_alloc_consistend failed
[ 6962.148358] psmouse serio1: synaptics: queried max coordinates: x [..5772], y [..5086]

Order 4 allocation... probably doable during boot, but not really
suitable during resume.

I'm not sure how repeatable it is, but it definitely happened more
than once.

	/*
	 * real ring DMA buffer
	 * each ring/block may need up to 8 bytes for alignment, hence the
	 * additional bytes tacked onto the end.
	 */
	ring_header->size = size =
		sizeof(struct atl1c_tpd_desc) * tpd_ring->count * 2 +
		sizeof(struct atl1c_rx_free_desc) * rx_desc_count +
		sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
		8 * 4;

	ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
						 &ring_header->dma);
	if (unlikely(!ring_header->desc)) {
		dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
		goto err_nomem;
	}

(Note the typo in dev_err... at least it is easy to grep).
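
To see how the size formula above can end up as an order-4 request, here is a minimal user-space sketch of the arithmetic. The 4 KiB page size and the helper names (order_for_size, ring_header_size) are assumptions for illustration, and the descriptor sizes are passed in as parameters because the struct definitions are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u	/* assumption: 4 KiB pages */

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = PAGE_SIZE_BYTES;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/*
 * Mirrors the quoted size formula. The descriptor sizes and counts are
 * parameters here; any concrete values fed in are assumptions.
 */
static size_t ring_header_size(size_t tpd_desc_sz, size_t tpd_count,
			       size_t rfd_desc_sz, size_t rrs_desc_sz,
			       size_t rx_desc_count)
{
	return tpd_desc_sz * tpd_count * 2 +
	       rfd_desc_sz * rx_desc_count +
	       rrs_desc_sz * rx_desc_count +
	       8 * 4;
}
```

With purely hypothetical inputs of 16-byte TPDs (1024 entries), 8-byte RFDs and 16-byte RRSs (512 entries each), the formula gives 45088 bytes. That needs 12 pages, and because the page allocator works in powers of two it is rounded up to 16 contiguous pages, i.e. order 4.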

Ok, so what went on is easy.. any ideas how to fix it?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2015-11-26 21:56:53

by Francois Romieu

Subject: Re: 4.3+: Atheros ethernet fails after resume from s2ram, due to order 4 allocation

Pavel Machek <[email protected]> :
[...]
> Ok, so what went on is easy.. any ideas how to fix it ?

The driver should 1) prohibit holes in its receive ring, 2) allocate before
pushing data up the stack, 3) drop packets when it can't allocate a
fresh buffer, and 4) stop releasing receive buffers - and any resource for
that matter - during suspend.

Really.
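
Points 1 to 3 above can be sketched as a small user-space model. This is not atl1c code: the ring layout, the names (rx_ring, rx_ring_receive, ...) and the malloc-based allocator with a forced-failure switch are all assumptions made for illustration. The idea is that the replacement buffer is allocated first, and the filled buffer only leaves the ring if that allocation succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_SIZE 4		/* hypothetical ring size */
#define RX_BUF_SIZE  2048	/* hypothetical buffer size */

struct rx_ring {
	void *buf[RX_RING_SIZE];	/* every slot always owns a buffer */
	unsigned long delivered;
	unsigned long dropped;
};

/* Stand-in allocator; 'fail' simulates memory pressure. */
static void *rx_buf_alloc(bool fail)
{
	return fail ? NULL : malloc(RX_BUF_SIZE);
}

static bool rx_ring_init(struct rx_ring *r)
{
	size_t i;

	r->delivered = 0;
	r->dropped = 0;
	for (i = 0; i < RX_RING_SIZE; i++) {
		r->buf[i] = rx_buf_alloc(false);
		if (!r->buf[i]) {
			while (i--)
				free(r->buf[i]);
			return false;
		}
	}
	return true;
}

/*
 * A packet arrived in 'slot'. Allocate the replacement first. On success
 * the filled buffer is returned (the caller now owns it) and the fresh
 * one takes its place. On failure the packet is dropped and the old
 * buffer stays in the ring, so no hole can appear.
 */
static void *rx_ring_receive(struct rx_ring *r, size_t slot, bool alloc_fails)
{
	void *fresh, *filled;

	assert(slot < RX_RING_SIZE);
	fresh = rx_buf_alloc(alloc_fails);
	if (!fresh) {
		r->dropped++;
		return NULL;
	}
	filled = r->buf[slot];
	r->buf[slot] = fresh;
	r->delivered++;
	return filled;
}

static bool rx_ring_has_hole(const struct rx_ring *r)
{
	size_t i;

	for (i = 0; i < RX_RING_SIZE; i++)
		if (!r->buf[i])
			return true;
	return false;
}

static void rx_ring_free(struct rx_ring *r)
{
	size_t i;

	for (i = 0; i < RX_RING_SIZE; i++) {
		free(r->buf[i]);
		r->buf[i] = NULL;
	}
}
```

Point 4 (keeping resources across suspend) is not modelled here; in this sketch it would simply mean not calling rx_ring_free() on the suspend path.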

--
Ueimor

2015-11-27 08:20:16

by Michal Hocko

Subject: Re: 4.3+: Atheros ethernet fails after resume from s2ram, due to order 4 allocation

On Thu 26-11-15 17:34:13, Pavel Machek wrote:
> Hi!
>
> ...and dmesg tells us what is going on:
>
> [ 6961.550240] NetworkManager: page allocation failure: order:4,
> mode:0x2080020

This is a GFP_ATOMIC|___GFP_RECLAIMABLE high-order request. So something
that the caller should tolerate failing.

> [ 6961.550249] CPU: 0 PID: 2590 Comm: NetworkManager Tainted: G
> W 4.3.0+ #124
> [ 6961.550250] Hardware name: Acer Aspire 5732Z/Aspire 5732Z, BIOS
> V3.07 02/10/2010
> [ 6961.550252] 00000000 00000000 f2ad1a04 c42ba5b8 00000000 f2ad1a2c
> c40d650a c4d3ee1c
> [ 6961.550260] f34ef600 00000004 02080020 c4eeef40 00000000 00000010
> 00000000 f2ad1ac8
> [ 6961.550266] c40d8caa 02080020 00000004 00000000 00000070 f34ef200
> 00000060 00000010
> [ 6961.550272] Call Trace:
> ...[ 6961.550299] [<c4006811>] dma_generic_alloc_coherent+0x71/0x120
> [ 6961.550301] [<c40067a0>] ? via_no_dac+0x30/0x30
> [ 6961.550307] [<c465b16e>] atl1c_open+0x29e/0x300
> [ 6961.550313] [<c48b96f5>] ? call_netdevice_notifiers_info+0x25/0x50
> [ 6961.550316] [<c48c081b>] __dev_open+0x7b/0xf0
> [ 6961.550318] [<c48c0ac9>] __dev_change_flags+0x89/0x140
> [ 6961.550320] [<c48c0ba3>] dev_change_flags+0x23/0x60
> [ 6961.550325] [<c48ce416>] do_setlink+0x286/0x7b0
> [ 6961.550328] [<c42ded02>] ? nla_parse+0x22/0xd0
> [ 6961.550330] [<c48cf906>] rtnl_newlink+0x5d6/0x860
> [ 6961.550336] [<c407f8a1>] ? __lock_acquire.isra.24+0x3a1/0xc80
> [ 6961.550342] [<c4047ae2>] ? ns_capable+0x22/0x60
> [ 6961.550345] [<c48e7c5d>] ? __netlink_ns_capable+0x2d/0x40
> [ 6961.550351] [<c49c9c54>] ? xprt_transmit+0x94/0x220
> [ 6961.550354] [<c48cd9e6>] rtnetlink_rcv_msg+0x76/0x1f0
> [ 6961.550356] [<c48cd970>] ? rtnetlink_rcv+0x30/0x30
> [ 6961.550359] [<c48eb35e>] netlink_rcv_skb+0x8e/0xb0
> ...
> [ 6961.550412] Mem-Info:
> [ 6961.550417] active_anon:30319 inactive_anon:25075 isolated_anon:0
> active_file:327764 inactive_file:152179 isolated_file:16
> unevictable:0 dirty:6 writeback:0 unstable:0
> slab_reclaimable:149091 slab_unreclaimable:18973
> mapped:18100 shmem:4847 pagetables:1538 bounce:0
> free:57732 free_pcp:10 free_cma:0
> ...
> [ 6961.550492] 485897 total pagecache pages
> [ 6961.550494] 1086 pages in swap cache
> [ 6961.550496] Swap cache stats: add 16738, delete 15652, find
> 6708/8500
> [ 6961.550497] Free swap = 1656440kB
> [ 6961.550498] Total swap = 1681428kB
> [ 6961.550499] 785914 pages RAM
> [ 6961.550500] 557663 pages HighMem/MovableOnly
> [ 6961.550501] 12639 pages reserved
> [ 6961.550506] atl1c 0000:05:00.0: pci_alloc_consistend failed
> [ 6962.148358] psmouse serio1: synaptics: queried max coordinates: x
> [..5772], y [..5086]
>
> Order 4 allocation... probably doable during boot, but not really
> suitable during resume.
>
> I'm not sure how repeatable it is, but it definitely happened more
> than once.
>
> /*
> * real ring DMA buffer
> * each ring/block may need up to 8 bytes for alignment, hence the
> * additional bytes tacked onto the end.
> */
> ring_header->size = size =
> sizeof(struct atl1c_tpd_desc) * tpd_ring->count * 2 +
> sizeof(struct atl1c_rx_free_desc) * rx_desc_count +
> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> 8 * 4;
>
> ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> &ring_header->dma);

Why is pci_alloc_consistent doing an unconditional GFP_ATOMIC
allocation? atl1c_setup_ring_resources already does a GFP_KERNEL
allocation in the same function, so this should be a sleepable
context. I think we should either add pci_alloc_consistent_gfp, if
there are no explicit reasons not to, or you can work around
that by open-coding it and using dma_alloc_coherent directly with
GFP_KERNEL|__GFP_REPEAT. This doesn't guarantee success, though,
because this is > PAGE_ALLOC_COSTLY_ORDER, but it would increase the chances
considerably. Also, a vmalloc fallback can then be used more safely.

> if (unlikely(!ring_header->desc)) {
> dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> goto err_nomem;
> }
>
> (Note the typo in dev_err... at least it is easy to grep).
>
> Ok, so what went on is easy.. any ideas how to fix it?

--
Michal Hocko
SUSE Labs

2015-11-28 14:50:08

by Pavel Machek

Subject: Re: 4.3+: Atheros ethernet fails after resume from s2ram, due to order 4 allocation

Hi!


> > /*
> > * real ring DMA buffer
> > * each ring/block may need up to 8 bytes for alignment, hence the
> > * additional bytes tacked onto the end.
> > */
> > ring_header->size = size =
> > sizeof(struct atl1c_tpd_desc) * tpd_ring->count * 2 +
> > sizeof(struct atl1c_rx_free_desc) * rx_desc_count +
> > sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> > 8 * 4;
> >
> > ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> > &ring_header->dma);
>
> Why is pci_alloc_consistent doing an unconditional GFP_ATOMIC
> allocation? atl1c_setup_ring_resources already does a GFP_KERNEL
> allocation in the same function, so this should be a sleepable
> context. I think we should either add pci_alloc_consistent_gfp, if
> there are no explicit reasons not to, or you can work around

There's an existing interface, dma_alloc_coherent, which can be used.

I have not yet tried __GFP_REPEAT, but GFP_KERNEL alone should already
be a big improvement.

Let me send a patch..
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-11-28 14:51:17

by Pavel Machek

Subject: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation


atl1c driver is doing order-4 allocation with GFP_ATOMIC
priority. That often breaks networking after resume. Switch to
GFP_KERNEL. Still not ideal, but should be significantly better.

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 2795d6d..afb71e0 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
8 * 4;

- ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
- &ring_header->dma);
+ ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
+ &ring_header->dma, GFP_KERNEL);
if (unlikely(!ring_header->desc)) {
- dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
+ dev_err(&pdev->dev, "could not get memmory for DMA buffer\n");
goto err_nomem;
}
memset(ring_header->desc, 0, ring_header->size);

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-11-29 21:58:08

by Sergei Shtylyov

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

Hello.

On 11/28/2015 5:51 PM, Pavel Machek wrote:

> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> priority. That often breaks networking after resume. Switch to
> GFP_KERNEL. Still not ideal, but should be significantly better.
>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 2795d6d..afb71e0 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> 8 * 4;
>
> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> - &ring_header->dma);
> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
> + &ring_header->dma, GFP_KERNEL);
> if (unlikely(!ring_header->desc)) {
> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> + dev_err(&pdev->dev, "could not get memmory for DMA buffer\n");

s/memmory/memory/.

[...]

MBR, Sergei

2015-11-30 13:21:33

by Michal Hocko

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

On Sat 28-11-15 15:51:13, Pavel Machek wrote:
>
> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> priority. That often breaks networking after resume. Switch to
> GFP_KERNEL. Still not ideal, but should be significantly better.

It is not clear from either the changelog or the patch context why
GFP_KERNEL can safely replace GFP_ATOMIC. It is correct here
because atl1c_setup_ring_resources is a sleepable context (otherwise
tpd_ring->buffer_info = kzalloc(size, GFP_KERNEL) would be incorrect
already), but a short note wouldn't kill us, would it?

> Signed-off-by: Pavel Machek <[email protected]>

Anyway
Reviewed-by: Michal Hocko <[email protected]>

>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 2795d6d..afb71e0 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> 8 * 4;
>
> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> - &ring_header->dma);
> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
> + &ring_header->dma, GFP_KERNEL);
> if (unlikely(!ring_header->desc)) {
> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> + dev_err(&pdev->dev, "could not get memmory for DMA buffer\n");
> goto err_nomem;
> }
> memset(ring_header->desc, 0, ring_header->size);
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

--
Michal Hocko
SUSE Labs

2015-11-30 17:58:26

by Eric Dumazet

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

On Sat, 2015-11-28 at 15:51 +0100, Pavel Machek wrote:
> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> priority. That often breaks networking after resume. Switch to
> GFP_KERNEL. Still not ideal, but should be significantly better.
>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 2795d6d..afb71e0 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> 8 * 4;
>
> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> - &ring_header->dma);
> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
> + &ring_header->dma, GFP_KERNEL);
> if (unlikely(!ring_header->desc)) {
> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> + dev_err(&pdev->dev, "could not get memmory for DMA buffer\n");
> goto err_nomem;
> }
> memset(ring_header->desc, 0, ring_header->size);
>

It seems there is a missed opportunity to get rid of the memset() here,
by adding __GFP_ZERO to the dma_alloc_coherent() GFP_KERNEL mask,
or simply using dma_zalloc_coherent()
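
As a rough user-space analogy only (not kernel code), the difference is the same as between malloc() followed by memset() and a single calloc(): here malloc()/calloc() merely stand in for dma_alloc_coherent() and dma_zalloc_coherent(), and the helper names are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Two-step form: allocate, then clear explicitly. */
static void *alloc_then_clear(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* One-step form: ask the allocator for zeroed memory directly. */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}

/* Helper for checking the result: returns 1 if all bytes are zero. */
static int is_all_zero(const void *p, size_t size)
{
	const unsigned char *c = p;
	size_t i;

	for (i = 0; i < size; i++)
		if (c[i] != 0)
			return 0;
	return 1;
}
```

Both helpers hand back a zero-filled buffer; the second just leaves the clearing to the allocator, which is the point of the suggestion above.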




2015-12-01 20:35:22

by David Miller

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

From: Michal Hocko <[email protected]>
Date: Mon, 30 Nov 2015 14:21:29 +0100

> On Sat 28-11-15 15:51:13, Pavel Machek wrote:
>>
>> atl1c driver is doing order-4 allocation with GFP_ATOMIC
>> priority. That often breaks networking after resume. Switch to
>> GFP_KERNEL. Still not ideal, but should be significantly better.
>
> It is not clear why GFP_KERNEL can replace GFP_ATOMIC safely neither
> from the changelog nor from the patch context.

Earlier in the function we do a GFP_KERNEL kmalloc so:

¯\_(ツ)_/¯

It should be fine.

2015-12-01 20:36:48

by David Miller

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

From: Eric Dumazet <[email protected]>
Date: Mon, 30 Nov 2015 09:58:23 -0800

> On Sat, 2015-11-28 at 15:51 +0100, Pavel Machek wrote:
>> atl1c driver is doing order-4 allocation with GFP_ATOMIC
>> priority. That often breaks networking after resume. Switch to
>> GFP_KERNEL. Still not ideal, but should be significantly better.
>>
>> Signed-off-by: Pavel Machek <[email protected]>
>>
>> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> index 2795d6d..afb71e0 100644
>> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
>> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
>> 8 * 4;
>>
>> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
>> - &ring_header->dma);
>> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
>> + &ring_header->dma, GFP_KERNEL);
>> if (unlikely(!ring_header->desc)) {
>> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
>> + dev_err(&pdev->dev, "could not get memmory for DMA buffer\n");
>> goto err_nomem;
>> }
>> memset(ring_header->desc, 0, ring_header->size);
>>
>
> It seems there is a missed opportunity to get rid of the memset() here,
> by adding __GFP_ZERO to the dma_alloc_coherent() GFP_KERNEL mask,
> or simply using dma_zalloc_coherent()

Also, the Subject line needs to be adjusted. The proper format for
the Subject line is:

[PATCH $TREE] $subsystem: $description.

Where "$TREE" is either 'net' or 'net-next', $subsystem is the lowercase
name of the driver (here 'atl1c') and then a colon, and then a space, and
then the single-line description.
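
The format described above can be sketched as a tiny formatter. The function name format_patch_subject and its error convention are assumptions for illustration; only the "[PATCH $TREE] $subsystem: $description" layout and the two tree names come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "[PATCH <tree>] <subsystem>: <description>" in 'out'.
 * Returns 0 on success, -1 if the tree is neither "net" nor "net-next"
 * or if the result does not fit in 'out_len' bytes.
 */
static int format_patch_subject(char *out, size_t out_len, const char *tree,
				const char *subsystem, const char *description)
{
	int n;

	assert(out && tree && subsystem && description);
	if (strcmp(tree, "net") != 0 && strcmp(tree, "net-next") != 0)
		return -1;

	n = snprintf(out, out_len, "[PATCH %s] %s: %s",
		     tree, subsystem, description);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return 0;
}
```

Called with "net", "atl1c" and the original description, this yields "[PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation".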

Thanks.

2015-12-03 07:49:09

by Pavel Machek

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

On Wed 2015-12-02 22:43:31, Chris Snook wrote:
> On Tue, Dec 1, 2015 at 12:35 PM David Miller <[email protected]> wrote:
>
> > From: Michal Hocko <[email protected]>
> > Date: Mon, 30 Nov 2015 14:21:29 +0100
> >
> > > On Sat 28-11-15 15:51:13, Pavel Machek wrote:
> > >>
> > >> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> > >> priority. That often breaks networking after resume. Switch to
> > >> GFP_KERNEL. Still not ideal, but should be significantly better.
> > >
> > > It is not clear why GFP_KERNEL can replace GFP_ATOMIC safely neither
> > > from the changelog nor from the patch context.
> >
> > Earlier in the function we do a GFP_KERNEL kmalloc so:
> >
> > ¯\_(ツ)_/¯
> >
> > It should be fine.
> >
>
> AFAICT, the people who benefit from GFP_ATOMIC are the people running all
> their storage over NFS/iSCSI who are suspending their machines while
> they're so busy they don't have any clean order 4 pagecache to drop, and
> want the machine to panic rather than hang. The people who benefit
> from

iSCSI on a machine that suspends... is that a joke, or a complicated way of
saying that no one benefits? And the code uses... both GFP_ATOMIC and
GFP_KERNEL so that both sides are equally unhappy? :-).

Do you want to test the patch, update the subject line and send it to
Davem, or should I do it?

Do you see a way to split the allocation? Even an order-4 GFP_KERNEL
allocation is not a nice thing to do...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-12-03 08:16:52

by Michal Hocko

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

On Wed 02-12-15 22:43:31, Chris Snook wrote:
> On Tue, Dec 1, 2015 at 12:35 PM David Miller <[email protected]> wrote:
>
> > From: Michal Hocko <[email protected]>
> > Date: Mon, 30 Nov 2015 14:21:29 +0100
> >
> > > On Sat 28-11-15 15:51:13, Pavel Machek wrote:
> > >>
> > >> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> > >> priority. That often breaks networking after resume. Switch to
> > >> GFP_KERNEL. Still not ideal, but should be significantly better.
> > >
> > > It is not clear why GFP_KERNEL can replace GFP_ATOMIC safely neither
> > > from the changelog nor from the patch context.
> >
> > Earlier in the function we do a GFP_KERNEL kmalloc so:
> >
> > ¯\_(ツ)_/¯
> >
> > It should be fine.
> >
>
> AFAICT, the people who benefit from GFP_ATOMIC are the people running all
> their storage over NFS/iSCSI who are suspending their machines while
> they're so busy they don't have any clean order 4 pagecache to drop, and
> want the machine to panic rather than hang.

Why would a GFP_KERNEL order-4 allocation hang? It will fail if no
pages of order >= 4 are available even after reclaim and/or compaction.
GFP_ATOMIC allocations should be used only when access to memory
reserves is really required. If the allocation just doesn't want to
invoke direct reclaim, then GFP_NOWAIT is a more suitable alternative.
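
The rule of thumb in the paragraph above can be written down as a small decision helper. This is a user-space toy that only returns the flag name as a string; the function suggest_gfp and its two boolean inputs are assumptions for illustration, and real code has more cases to consider.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Suggest a GFP mask name for an allocation site.
 *  - may_sleep:      the caller is in a sleepable context
 *  - needs_reserves: the allocation really needs access to memory reserves
 */
static const char *suggest_gfp(bool may_sleep, bool needs_reserves)
{
	if (may_sleep)
		return "GFP_KERNEL";	/* may reclaim/compact; fails if that is not enough */
	if (needs_reserves)
		return "GFP_ATOMIC";	/* no sleeping, may dip into reserves */
	return "GFP_NOWAIT";		/* no sleeping, no direct reclaim, no reserves */
}
```

For the ring allocation discussed in this thread the context is sleepable, so the helper would suggest GFP_KERNEL.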

--
Michal Hocko
SUSE Labs

2015-12-03 15:59:12

by Pavel Machek

Subject: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation


atl1c driver is doing order-4 allocation with GFP_ATOMIC
priority. That often breaks networking after resume. Switch to
GFP_KERNEL. Still not ideal, but should be significantly better.

atl1c_setup_ring_resources() is called from .open() function, and
already uses GFP_KERNEL, so this change is safe.

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 2795d6d..afb71e0 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
8 * 4;

- ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
- &ring_header->dma);
+ ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
+ &ring_header->dma, GFP_KERNEL);
if (unlikely(!ring_header->desc)) {
- dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
+ dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
goto err_nomem;
}
memset(ring_header->desc, 0, ring_header->size);


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-12-03 15:59:29

by Pavel Machek

Subject: Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

On Tue 2015-12-01 15:36:28, David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Mon, 30 Nov 2015 09:58:23 -0800
>
> > On Sat, 2015-11-28 at 15:51 +0100, Pavel Machek wrote:
> >> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> >> priority. That often breaks networking after resume. Switch to
> >> GFP_KERNEL. Still not ideal, but should be significantly better.
> >>
> >> Signed-off-by: Pavel Machek <[email protected]>
> >>
> >> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> >> index 2795d6d..afb71e0 100644
> >> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> >> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> >> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
> >> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> >> 8 * 4;
> >>
> >> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> >> - &ring_header->dma);
> >> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
> >> + &ring_header->dma, GFP_KERNEL);
> >> if (unlikely(!ring_header->desc)) {
> >> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> >> + dev_err(&pdev->dev, "could not get memmory for DMA buffer\n");
> >> goto err_nomem;
> >> }
> >> memset(ring_header->desc, 0, ring_header->size);
> >>
> >
> > It seems there is a missed opportunity to get rid of the memset() here,
> > by adding __GFP_ZERO to the dma_alloc_coherent() GFP_KERNEL mask,
> > or simply using dma_zalloc_coherent()
>
> Also, the Subject line needs to be adjusted. The proper format for
> the Subject line is:
>
> [PATCH $TREE] $subsystem: $description.
>
> Where "$TREE" is either 'net' or 'net-next', $subsystem is the lowercase
> name of the driver (here 'atl1c') and then a colon, and then a space, and
> then the single-line description.

Done, thanks.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-12-03 16:14:03

by Michal Hocko

Subject: Re: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

On Thu 03-12-15 16:59:05, Pavel Machek wrote:
>
> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> priority. That often breaks networking after resume. Switch to
> GFP_KERNEL. Still not ideal, but should be significantly better.
>
> atl1c_setup_ring_resources() is called from .open() function, and
> already uses GFP_KERNEL, so this change is safe.

Thanks for updating the changelog

> Signed-off-by: Pavel Machek <[email protected]>

Acked-by: Michal Hocko <[email protected]>

>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 2795d6d..afb71e0 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> 8 * 4;
>
> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> - &ring_header->dma);
> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
> + &ring_header->dma, GFP_KERNEL);
> if (unlikely(!ring_header->desc)) {
> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> + dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
> goto err_nomem;
> }
> memset(ring_header->desc, 0, ring_header->size);
>
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

--
Michal Hocko
SUSE Labs

2015-12-03 17:17:32

by Eric Dumazet

Subject: Re: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

On Thu, 2015-12-03 at 16:59 +0100, Pavel Machek wrote:
> atl1c driver is doing order-4 allocation with GFP_ATOMIC
> priority. That often breaks networking after resume. Switch to
> GFP_KERNEL. Still not ideal, but should be significantly better.
>
> atl1c_setup_ring_resources() is called from .open() function, and
> already uses GFP_KERNEL, so this change is safe.
>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 2795d6d..afb71e0 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
> 8 * 4;
>
> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
> - &ring_header->dma);
> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
> + &ring_header->dma, GFP_KERNEL);
> if (unlikely(!ring_header->desc)) {
> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> + dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
> goto err_nomem;
> }
> memset(ring_header->desc, 0, ring_header->size);
>
>

So this memset() will really require a different patch to get removed ?

Sigh, not sure why I review patches.


2015-12-03 17:32:53

by David Miller

Subject: Re: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

From: Eric Dumazet <[email protected]>
Date: Thu, 03 Dec 2015 09:17:28 -0800

> On Thu, 2015-12-03 at 16:59 +0100, Pavel Machek wrote:
>> atl1c driver is doing order-4 allocation with GFP_ATOMIC
>> priority. That often breaks networking after resume. Switch to
>> GFP_KERNEL. Still not ideal, but should be significantly better.
>>
>> atl1c_setup_ring_resources() is called from .open() function, and
>> already uses GFP_KERNEL, so this change is safe.
>>
>> Signed-off-by: Pavel Machek <[email protected]>
>>
>> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> index 2795d6d..afb71e0 100644
>> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
>> sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
>> 8 * 4;
>>
>> - ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
>> - &ring_header->dma);
>> + ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
>> + &ring_header->dma, GFP_KERNEL);
>> if (unlikely(!ring_header->desc)) {
>> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
>> + dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
>> goto err_nomem;
>> }
>> memset(ring_header->desc, 0, ring_header->size);
>>
>>
>
> So this memset() will really require a different patch to get removed ?
>
> Sigh, not sure why I review patches.

Agreed, please use dma_zalloc_coherent() and kill that memset().

Thanks.

2015-12-04 08:11:32

by Pavel Machek

Subject: Re: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

> >> if (unlikely(!ring_header->desc)) {
> >> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> >> + dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
> >> goto err_nomem;
> >> }
> >> memset(ring_header->desc, 0, ring_header->size);
> >>
> >>
> >
> > So this memset() will really require a different patch to get removed ?
> >
> > Sigh, not sure why I review patches.
>
> Agreed, please use dma_zalloc_coherent() and kill that memset().

Ok, updated. I'll also add cc: stable, because the bug makes notebooks
with the affected chipset unusable.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-12-04 08:50:05

by Pavel Machek

Subject: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

atl1c driver is doing order-4 allocation with GFP_ATOMIC
priority. That often breaks networking after resume. Switch to
GFP_KERNEL. Still not ideal, but should be significantly better.

atl1c_setup_ring_resources() is called from .open() function, and
already uses GFP_KERNEL, so this change is safe.

Signed-off-by: Pavel Machek <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: stable <[email protected]>

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 2795d6d..8b5988e 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -1016,13 +1016,12 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
 		sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
 		8 * 4;

-	ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
-				&ring_header->dma);
+	ring_header->desc = dma_zalloc_coherent(&pdev->dev, ring_header->size,
+						&ring_header->dma, GFP_KERNEL);
 	if (unlikely(!ring_header->desc)) {
-		dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
+		dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
 		goto err_nomem;
 	}
-	memset(ring_header->desc, 0, ring_header->size);
 	/* init TPD ring */

 	tpd_ring[0].dma = roundup(ring_header->dma, 8);

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
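
For readers unfamiliar with the term in the subject line: an "order N" allocation asks the page allocator for 2^N physically contiguous pages, so order 4 means 16 pages in one piece. The sketch below is a userspace illustration of that arithmetic, assuming 4 KiB pages. It mirrors the idea behind the kernel's get_order(), not its implementation, and the names SKETCH_PAGE_SIZE, order_for_size and pages_for_order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 */
unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Number of contiguous pages an order-N request asks for. */
unsigned long pages_for_order(unsigned int order)
{
	return 1ul << order;
}
```

Under these assumptions, any request larger than 32 KiB and up to 64 KiB rounds up to order 4, that is, 16 contiguous pages. Finding such a run without being allowed to sleep and reclaim memory is what tends to fail on a fragmented system, which is the motivation for moving from GFP_ATOMIC to GFP_KERNEL in the patch above.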

2015-12-04 16:21:46

by David Miller

Subject: Re: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

From: Pavel Machek <[email protected]>
Date: Fri, 4 Dec 2015 09:11:27 +0100

>> >> if (unlikely(!ring_header->desc)) {
>> >> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
>> >> + dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
>> >> goto err_nomem;
>> >> }
>> >> memset(ring_header->desc, 0, ring_header->size);
>> >>
>> >>
>> >
>> > So this memset() will really require a different patch to get removed ?
>> >
>> > Sigh, not sure why I review patches.
>>
>> Agreed, please use dma_zalloc_coherent() and kill that memset().
>
> Ok, updated. I'll also add cc: stable, because it makes notebooks with
> affected chipset unusable.

Networking patches do not use CC: stable, instead you simply ask me
to queue it up and then I batch submit networking fixes to -stable
periodically myself.

2015-12-04 21:30:33

by Pavel Machek

Subject: Re: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

On Fri 2015-12-04 11:21:40, David Miller wrote:
> From: Pavel Machek <[email protected]>
> Date: Fri, 4 Dec 2015 09:11:27 +0100
>
> >> >> if (unlikely(!ring_header->desc)) {
> >> >> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
> >> >> + dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
> >> >> goto err_nomem;
> >> >> }
> >> >> memset(ring_header->desc, 0, ring_header->size);
> >> >>
> >> >>
> >> >
> >> > So this memset() will really require a different patch to get removed ?
> >> >
> >> > Sigh, not sure why I review patches.
> >>
> >> Agreed, please use dma_zalloc_coherent() and kill that memset().
> >
> > Ok, updated. I'll also add cc: stable, because it makes notebooks with
> > affected chipset unusable.
>
> Networking patches do not use CC: stable, instead you simply ask me
> to queue it up and then I batch submit networking fixes to -stable
> periodically myself.

Ok, can you take the patch and ignore the Cc, or should I do one more
iteration?

Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-12-04 22:01:30

by David Miller

Subject: Re: [PATCH net] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

From: Pavel Machek <[email protected]>
Date: Fri, 4 Dec 2015 22:30:27 +0100

> On Fri 2015-12-04 11:21:40, David Miller wrote:
>> From: Pavel Machek <[email protected]>
>> Date: Fri, 4 Dec 2015 09:11:27 +0100
>>
>> >> >> if (unlikely(!ring_header->desc)) {
>> >> >> - dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
>> >> >> + dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
>> >> >> goto err_nomem;
>> >> >> }
>> >> >> memset(ring_header->desc, 0, ring_header->size);
>> >> >>
>> >> >>
>> >> >
>> >> > So this memset() will really require a different patch to get removed ?
>> >> >
>> >> > Sigh, not sure why I review patches.
>> >>
>> >> Agreed, please use dma_zalloc_coherent() and kill that memset().
>> >
>> > Ok, updated. I'll also add cc: stable, because it makes notebooks with
>> > affected chipset unusable.
>>
>> Networking patches do not use CC: stable, instead you simply ask me
>> to queue it up and then I batch submit networking fixes to -stable
>> periodically myself.
>
> Ok, can you take the patch and ignore the Cc, or should I do one more
> iteration?

I took care of it.