2010-10-06 21:54:34

by Ben Hutchings

[permalink] [raw]
Subject: Re: Bug#599345: linux-image-2.6.32-5-amd64: iwlagn allocation failure

On Wed, 2010-10-06 at 21:19 +0200, Julien Cristau wrote:
> Package: linux-2.6
> Version: 2.6.32-23
> Severity: normal
>
> iwlagn decided 2 weeks of uptime was enough and it wanted me to reboot,
> because it was unhappy after resume.

You also told me that this machine had no swap enabled, which is an
unusual configuration and restricts the ability of the VM to defragment
memory. I would recommend adding a small swap file/partition. However:

> Bit of kern.log:
>
> Oct 6 21:05:02 radis kernel: [378665.326381] NetworkManager: page allocation failure. order:4, mode:0x40d0
> Oct 6 21:05:02 radis kernel: [378665.326393] Pid: 25555, comm: NetworkManager Not tainted 2.6.32-5-amd64 #1
[...]
> Oct 6 21:05:02 radis kernel: [378665.363630] iwlagn 0000:0c:00.0: kmalloc for auxiliary BD structures failed
[...]

This particular allocation is for an array which is not used for DMA and
therefore could be stored in non-contiguous pages allocated with
vmalloc(). But there may be some good reason not to do this.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2010-10-07 07:51:19

by Johannes Berg

[permalink] [raw]
Subject: Re: Bug#599345: linux-image-2.6.32-5-amd64: iwlagn allocation failure

On Wed, 2010-10-06 at 22:54 +0100, Ben Hutchings wrote:
> On Wed, 2010-10-06 at 21:19 +0200, Julien Cristau wrote:
> > Package: linux-2.6
> > Version: 2.6.32-23

> > Oct 6 21:05:02 radis kernel: [378665.326381] NetworkManager: page allocation failure. order:4, mode:0x40d0
> > Oct 6 21:05:02 radis kernel: [378665.326393] Pid: 25555, comm: NetworkManager Not tainted 2.6.32-5-amd64 #1
> [...]
> > Oct 6 21:05:02 radis kernel: [378665.363630] iwlagn 0000:0c:00.0: kmalloc for auxiliary BD structures failed
> [...]
>
> This particular allocation is for an array which is not used for DMA and
> therefore could be stored in non-contiguous pages allocated with
> vmalloc(). But there may be some good reason not to do this.

I have, however much more recently than that kernel, cleaned this up in
commit ff0d91c3eea6e25b47258349b455671f98f1b0cd -- this particular
allocation is now 2048 or 4096 bytes depending on the architecture (32
vs 64 bit pointers). If you want to backport this, there are two or
three more commits right before it that would probably be required.

johannes


2010-10-13 15:30:37

by Johannes Berg

[permalink] [raw]
Subject: Re: Bug#599345: linux-image-2.6.32-5-amd64: iwlagn allocation failure

On Wed, 2010-10-13 at 17:17 +0200, Julien Cristau wrote:

> Getting lots of those in dmesg:
> iwlagn 0000:0c:00.0: Too many chunks: 2
>
> Doesn't seem to prevent the network from working though.

It'll at least leak lots of memory though. But I think the check there
is just wrong -- there are TFDs, and there are SKBs, and we need two
TFDs, but just one SKB.

johannes


2010-10-10 19:03:31

by Ben Hutchings

[permalink] [raw]
Subject: Re: Bug#599345: linux-image-2.6.32-5-amd64: iwlagn allocation failure

On Thu, 2010-10-07 at 09:51 +0200, Johannes Berg wrote:
> On Wed, 2010-10-06 at 22:54 +0100, Ben Hutchings wrote:
> > On Wed, 2010-10-06 at 21:19 +0200, Julien Cristau wrote:
> > > Package: linux-2.6
> > > Version: 2.6.32-23
>
> > > Oct 6 21:05:02 radis kernel: [378665.326381] NetworkManager: page allocation failure. order:4, mode:0x40d0
> > > Oct 6 21:05:02 radis kernel: [378665.326393] Pid: 25555, comm: NetworkManager Not tainted 2.6.32-5-amd64 #1
> > [...]
> > > Oct 6 21:05:02 radis kernel: [378665.363630] iwlagn 0000:0c:00.0: kmalloc for auxiliary BD structures failed
> > [...]
> >
> > This particular allocation is for an array which is not used for DMA and
> > therefore could be stored in non-contiguous pages allocated with
> > vmalloc(). But there may be some good reason not to do this.
>
> I have, however much more recently than that kernel, cleaned this up in
> commit ff0d91c3eea6e25b47258349b455671f98f1b0cd -- this particular
> allocation is now 2048 or 4096 bytes depending on the architecture (32
> vs 64 bit pointers). If you want to backport this, there are two or
> three more commits right before it that would probably be required.

It seems like we can get away with a much smaller change though.
Julien, could you test this patch?

Ben.

---
From: Ben Hutchings <[email protected]>
Date: Mon, 17 May 2010 02:37:34 -0700
Subject: [PATCH] iwlwifi: reduce memory allocation

Vaguely based on
commit ff0d91c3eea6e25b47258349b455671f98f1b0cd upstream,
from which the following description comes.

From: Johannes Berg <[email protected]>

Currently, the driver allocates up to 19 skb pointers
for each TFD, of which we have 256 per queue. This
means that for each TX queue, we allocate 19k/38k
(an order 4 or 5 allocation on 32/64 bit respectively)
just for each queue's "txb" array, which contains only
the SKB pointers.

However, due to the way we use these pointers only the
first one can ever be assigned. When the driver was
initially written, the idea was that it could be
passed multiple SKBs for each TFD and attach all
those to implement gather DMA. However, due to
constraints in the userspace API and lack of TCP/IP
level checksumming in the device, this is in fact not
possible. And even if it were, the SKBs would be
chained, and we wouldn't need to keep pointers to
each anyway.

Change this to only keep track of one SKB per TFD,
and thereby reduce memory consumption to just one
pointer per TFD, which is an order 0 allocation per
transmit queue.

Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl-agn.c | 2 +-
drivers/net/wireless/iwlwifi/iwl-dev.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c
index c56d355..6323bd7 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
@@ -438,7 +438,7 @@ void iwl_hw_txq_free_tfd(struct iwl_priv *priv, struct iwl_tx_queue *txq)
/* Sanity check on number of chunks */
num_tbs = iwl_tfd_get_num_tbs(tfd);

- if (num_tbs >= IWL_NUM_OF_TBS) {
+ if (num_tbs >= 2) {
IWL_ERR(priv, "Too many chunks: %i\n", num_tbs);
/* @todo issue fatal error, it is quite serious situation */
return;
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index 8f98d72..48927f2 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -197,7 +197,7 @@ struct iwl_queue {

/* One for each TFD */
struct iwl_tx_info {
- struct sk_buff *skb[IWL_NUM_OF_TBS - 1];
+ struct sk_buff *skb[1];
};

/**
---

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2010-10-10 19:37:20

by Johannes Berg

[permalink] [raw]
Subject: Re: Bug#599345: linux-image-2.6.32-5-amd64: iwlagn allocation failure

On Sun, 2010-10-10 at 20:03 +0100, Ben Hutchings wrote:

> > I have, however much more recently than that kernel, cleaned this up in
> > commit ff0d91c3eea6e25b47258349b455671f98f1b0cd -- this particular
> > allocation is now 2048 or 4096 bytes depending on the architecture (32
> > vs 64 bit pointers). If you want to backport this, there are two or
> > three more commits right before it that would probably be required.
>
> It seems like we can get away with a much smaller change though.
> Julien, could you test this patch?

Yes, that patch seems alright -- we can't end up using more than one.

johannes


2010-10-13 15:25:22

by Julien Cristau

[permalink] [raw]
Subject: Re: Bug#599345: linux-image-2.6.32-5-amd64: iwlagn allocation failure

On Sun, Oct 10, 2010 at 20:03:13 +0100, Ben Hutchings wrote:

> On Thu, 2010-10-07 at 09:51 +0200, Johannes Berg wrote:
> > On Wed, 2010-10-06 at 22:54 +0100, Ben Hutchings wrote:
> > > On Wed, 2010-10-06 at 21:19 +0200, Julien Cristau wrote:
> > > > Package: linux-2.6
> > > > Version: 2.6.32-23
> >
> > > > Oct 6 21:05:02 radis kernel: [378665.326381] NetworkManager: page allocation failure. order:4, mode:0x40d0
> > > > Oct 6 21:05:02 radis kernel: [378665.326393] Pid: 25555, comm: NetworkManager Not tainted 2.6.32-5-amd64 #1
> > > [...]
> > > > Oct 6 21:05:02 radis kernel: [378665.363630] iwlagn 0000:0c:00.0: kmalloc for auxiliary BD structures failed
> > > [...]
> > >
> > > This particular allocation is for an array which is not used for DMA and
> > > therefore could be stored in non-contiguous pages allocated with
> > > vmalloc(). But there may be some good reason not to do this.
> >
> > I have, however much more recently than that kernel, cleaned this up in
> > commit ff0d91c3eea6e25b47258349b455671f98f1b0cd -- this particular
> > allocation is now 2048 or 4096 bytes depending on the architecture (32
> > vs 64 bit pointers). If you want to backport this, there are two or
> > three more commits right before it that would probably be required.
>
> It seems like we can get away with a much smaller change though.
> Julien, could you test this patch?
>
Getting lots of those in dmesg:
iwlagn 0000:0c:00.0: Too many chunks: 2

Doesn't seem to prevent the network from working though.

Cheers,
Julien


Attachments:
(No filename) (1.49 kB)
signature.asc (836.00 B)
Digital signature
Download all attachments

2010-12-22 04:12:16

by Ben Hutchings

[permalink] [raw]
Subject: Re: Bug#599345: linux-image-2.6.32-5-amd64: iwlagn allocation failure

On Wed, 2010-10-13 at 17:30 +0200, Johannes Berg wrote:
> On Wed, 2010-10-13 at 17:17 +0200, Julien Cristau wrote:
>
> > Getting lots of those in dmesg:
> > iwlagn 0000:0c:00.0: Too many chunks: 2
> >
> > Doesn't seem to prevent the network from working though.
>
> It'll at least leak lots of memory though. But I think the check there
> is just wrong -- there are TFDs, and there are SKBs, and we need two
> TFDs, but just one SKB.

Right. The old condition:
if (num_tbs >= IWL_NUM_OF_TBS) {
should have been:
if (num_tbs > IWL_NUM_OF_TBS) {
though in practice neither condition was possible.

In the minimal patch, the condition should be changed to:
if (num_tbs > 2) {

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part