Subject: Re: <IRQ> [<ffffffff810c9204>] ? __alloc_pages_nodemask+0x622/0x696

Some more things from the syslog that may be of relevance. System went
through multiple suspend and resume cycles.

Before the failure we have a wlan authentication and the ATH driver throws a fit.

[ 7339.673270] ll header: ff:ff:ff:ff:ff:ff:00:90:fb:1c:4f:5c:08:00
[ 7340.076865] FIREWALL:INPUT IN=wlan0 OUT= MAC=00:19:e3:06:2c:16:00:90:fb:1c:4f:5c:08:00 SRC=192.168.5.1 DST=192.168.5.11 LEN=48 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=251 SEQ=0
[ 7416.743752] cfg80211: Calling CRDA to update world regulatory domain
[ 7419.292521] wlan0: authenticate with 00:21:55:b4:20:20 (try 1)
[ 7419.312690] wlan0: authenticated
[ 7419.318325] wlan0: associate with 00:21:55:b4:20:20 (try 1)
[ 7419.322764] wlan0: RX AssocResp from 00:21:55:b4:20:20 (capab=0x401 status=0 aid=218)
[ 7419.322770] wlan0: associated
[ 7427.161366] ath: Failed to stop TX DMA in 100 msec after killing last frame
[ 7427.161432] ath: Failed to stop TX DMA!
[ 7434.492398] ath: Failed to stop TX DMA in 100 msec after killing last frame
[ 7434.492462] ath: Failed to stop TX DMA!
[ 7603.339047] =============================================================================
[ 7603.339053] BUG radix_tree_node: Padding overwritten. 0xffff88000008fe00-0xffff88000008fe32
[ 7603.339055] -----------------------------------------------------------------------------
[ 7603.339056]


Could the ath driver corrupt memory if the TX DMA cannot be stopped?

Maybe fix the driver first?
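
To make the timing in the excerpt above easier to see, here is a minimal Python sketch that parses those log lines and reports how many TX DMA stop failures precede the slab BUG report, and how long before it the last one occurred. The helper names (parse, dma_failures_before_bug) are made up for illustration. Closeness in time is only a hint, not proof that the driver caused the corruption.

```python
import re

# Excerpt copied from the syslog above (timestamps are seconds since boot).
LOG = """\
[ 7419.322770] wlan0: associated
[ 7427.161366] ath: Failed to stop TX DMA in 100 msec after killing last frame
[ 7427.161432] ath: Failed to stop TX DMA!
[ 7434.492398] ath: Failed to stop TX DMA in 100 msec after killing last frame
[ 7434.492462] ath: Failed to stop TX DMA!
[ 7603.339053] BUG radix_tree_node: Padding overwritten. 0xffff88000008fe00-0xffff88000008fe32
"""

# "[ <seconds>.<micros>] <message>"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return (timestamp, message) pairs for every well-formed log line."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def dma_failures_before_bug(events):
    """Return timestamps of ath TX DMA failures before the first BUG line,
    together with the BUG timestamp (None if no BUG line is present)."""
    bug_time = next((t for t, msg in events if msg.startswith("BUG ")), None)
    if bug_time is None:
        return [], None
    failures = [
        t for t, msg in events
        if t < bug_time and msg.startswith("ath: Failed to stop TX DMA")
    ]
    return failures, bug_time


events = parse(LOG)
failures, bug_time = dma_failures_before_bug(events)
gap = bug_time - failures[-1]
print(f"{len(failures)} DMA stop failures, last one {gap:.1f} s before the BUG report")
```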




2011-03-15 22:13:15

by Luis R. Rodriguez

Subject: Re: <IRQ> [<ffffffff810c9204>] ? __alloc_pages_nodemask+0x622/0x696

On Tue, Mar 15, 2011 at 02:28:52PM -0700, Justin P. Mattock wrote:
> On 03/15/2011 01:59 PM, Luis R. Rodriguez wrote:
> > On Wed, Mar 09, 2011 at 04:05:02PM -0800, Bob Copeland wrote:
> >> On Wed, Mar 09, 2011 at 03:54:24PM -0600, Christoph Lameter wrote:
> >>> [ 7419.322770] wlan0: associated
> >>> [ 7427.161366] ath: Failed to stop TX DMA in 100 msec after killing last frame
> >>> [ 7427.161432] ath: Failed to stop TX DMA!
> >>> [ 7434.492398] ath: Failed to stop TX DMA in 100 msec after killing last frame
> >>> [ 7434.492462] ath: Failed to stop TX DMA!
> >>> [ 7603.339047]
> >>> =============================================================================
> >>> [ 7603.339053] BUG radix_tree_node: Padding overwritten. 0xffff88000008fe00-0xffff88000008fe32
> >>
> >>> Could the ath driver corrupt memory if the TX DMA cannot be stopped?
> >
> > Yes, this is why we have had a series of patches to address this upstream,
> > the latest one was for RX DMA failing to stop but Felix found a fix for that.
> > Apart from that I am not aware of more issues. Can you please try with
> > wireless-testing.git ?
> >
> > Luis
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> sure..
>
> as for seeing this issue. nothing really now on the current. only
> message I see is:
>
> [ 69.551295] ath: Failed to stop TX DMA in 100 msec after killing last frame
> [ 69.551336] ath: Failed to stop TX DMA!

Can you easily reproduce?

Luis

2011-03-15 23:56:54

by Justin P. Mattock

Subject: Re: <IRQ> [<ffffffff810c9204>] ? __alloc_pages_nodemask+0x622/0x696

On 03/15/2011 03:13 PM, Luis R. Rodriguez wrote:
> On Tue, Mar 15, 2011 at 02:28:52PM -0700, Justin P. Mattock wrote:
>> On 03/15/2011 01:59 PM, Luis R. Rodriguez wrote:
>>> On Wed, Mar 09, 2011 at 04:05:02PM -0800, Bob Copeland wrote:
>>>> On Wed, Mar 09, 2011 at 03:54:24PM -0600, Christoph Lameter wrote:
>>>>> [ 7419.322770] wlan0: associated
>>>>> [ 7427.161366] ath: Failed to stop TX DMA in 100 msec after killing last frame
>>>>> [ 7427.161432] ath: Failed to stop TX DMA!
>>>>> [ 7434.492398] ath: Failed to stop TX DMA in 100 msec after killing last frame
>>>>> [ 7434.492462] ath: Failed to stop TX DMA!
>>>>> [ 7603.339047]
>>>>> =============================================================================
>>>>> [ 7603.339053] BUG radix_tree_node: Padding overwritten. 0xffff88000008fe00-0xffff88000008fe32
>>>>
>>>>> Could the ath driver corrupt memory if the TX DMA cannot be stopped?
>>>
>>> Yes, this is why we have had a series of patches to address this upstream,
>>> the latest one was for RX DMA failing to stop but Felix found a fix for that.
>>> Apart from that I am not aware of more issues. Can you please try with
>>> wireless-testing.git ?
>>>
>>> Luis
>>>
>> sure..
>>
>> as for seeing this issue. nothing really now on the current. only
>> message I see is:
>>
>> [ 69.551295] ath: Failed to stop TX DMA in 100 msec after killing last frame
>> [ 69.551336] ath: Failed to stop TX DMA!
>
> Can you easily reproduce?
>
> Luis
>

At this point I have seen this a few times now, but as for reproducing it,
I can try and see (cloning the wireless tree and loading it up).

Justin P. Mattock

2011-03-10 00:05:51

by Bob Copeland

Subject: Re: <IRQ> [<ffffffff810c9204>] ? __alloc_pages_nodemask+0x622/0x696

On Wed, Mar 09, 2011 at 03:54:24PM -0600, Christoph Lameter wrote:
> [ 7419.322770] wlan0: associated
> [ 7427.161366] ath: Failed to stop TX DMA in 100 msec after killing last frame
> [ 7427.161432] ath: Failed to stop TX DMA!
> [ 7434.492398] ath: Failed to stop TX DMA in 100 msec after killing last frame
> [ 7434.492462] ath: Failed to stop TX DMA!
> [ 7603.339047]
> =============================================================================
> [ 7603.339053] BUG radix_tree_node: Padding overwritten. 0xffff88000008fe00-0xffff88000008fe32

> Could the ath driver corrupt memory if the TX DMA cannot be stopped?

Looks like ath_draintxq() can do something bad (e.g. DMA engine uses
an old address from ath_tx_return_buffer()) if DMA is still active
in ath_drain_all_txq().

But this is ath9k, which I'm not so familiar with; maybe Luis can say.
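
The hazard described above can be illustrated with a toy model. The Python sketch below is not ath9k code: BufferPool, DmaEngine, drain and run are hypothetical names, and the model only assumes that the drain path hands a buffer back for reuse even when the DMA engine could not be stopped. In that case the engine's late write lands in memory that already belongs to someone else.

```python
class BufferPool:
    """Toy pool of fixed-size DMA buffers carved out of one memory region."""

    def __init__(self, count, size):
        self.size = size
        self.memory = bytearray(count * size)  # stands in for physical memory
        self.free = list(range(count))         # indices of reusable buffers

    def alloc(self):
        return self.free.pop()

    def release(self, index):
        self.free.append(index)

    def view(self, index):
        start = index * self.size
        return memoryview(self.memory)[start:start + self.size]


class DmaEngine:
    """Toy DMA engine: keeps the buffer it was programmed with until stopped."""

    def __init__(self, pool):
        self.pool = pool
        self.target = None

    def program(self, index):
        self.target = index

    def stop(self, succeeds):
        # succeeds=False models the "Failed to stop TX DMA" case.
        if succeeds:
            self.target = None
        return succeeds

    def tick(self):
        # The hardware writes to whatever address it still holds.
        if self.target is not None:
            self.pool.view(self.target)[:] = b"\xdd" * self.pool.size


def drain(pool, engine, index, stop_succeeds):
    """Try to stop DMA, then return the buffer regardless (the assumed hazard)."""
    engine.stop(stop_succeeds)
    pool.release(index)


def run(stop_succeeds):
    pool = BufferPool(count=2, size=8)
    engine = DmaEngine(pool)
    buf = pool.alloc()
    engine.program(buf)
    drain(pool, engine, buf, stop_succeeds)
    reused = pool.alloc()                       # a new owner gets the same buffer
    pool.view(reused)[:] = b"\x00" * pool.size  # the new owner's data
    engine.tick()                               # late write if the engine is still live
    return bytes(pool.view(reused))


print("stop succeeded:", run(True))
print("stop failed:   ", run(False))
```

In this model the new owner's data survives only when the stop succeeds; when it fails, the buffer is overwritten with the engine's pattern.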

--
Bob Copeland %% http://www.bobcopeland.com


2011-03-15 20:59:49

by Luis R. Rodriguez

Subject: Re: <IRQ> [<ffffffff810c9204>] ? __alloc_pages_nodemask+0x622/0x696

On Wed, Mar 09, 2011 at 04:05:02PM -0800, Bob Copeland wrote:
> On Wed, Mar 09, 2011 at 03:54:24PM -0600, Christoph Lameter wrote:
> > [ 7419.322770] wlan0: associated
> > [ 7427.161366] ath: Failed to stop TX DMA in 100 msec after killing last frame
> > [ 7427.161432] ath: Failed to stop TX DMA!
> > [ 7434.492398] ath: Failed to stop TX DMA in 100 msec after killing last frame
> > [ 7434.492462] ath: Failed to stop TX DMA!
> > [ 7603.339047]
> > =============================================================================
> > [ 7603.339053] BUG radix_tree_node: Padding overwritten. 0xffff88000008fe00-0xffff88000008fe32
>
> > Could the ath driver corrupt memory if the TX DMA cannot be stopped?

Yes, this is why we have had a series of patches to address this upstream,
the latest one was for RX DMA failing to stop but Felix found a fix for that.
Apart from that I am not aware of more issues. Can you please try with
wireless-testing.git ?

Luis

2011-03-15 21:29:00

by Justin P. Mattock

Subject: Re: <IRQ> [<ffffffff810c9204>] ? __alloc_pages_nodemask+0x622/0x696

On 03/15/2011 01:59 PM, Luis R. Rodriguez wrote:
> On Wed, Mar 09, 2011 at 04:05:02PM -0800, Bob Copeland wrote:
>> On Wed, Mar 09, 2011 at 03:54:24PM -0600, Christoph Lameter wrote:
>>> [ 7419.322770] wlan0: associated
>>> [ 7427.161366] ath: Failed to stop TX DMA in 100 msec after killing last frame
>>> [ 7427.161432] ath: Failed to stop TX DMA!
>>> [ 7434.492398] ath: Failed to stop TX DMA in 100 msec after killing last frame
>>> [ 7434.492462] ath: Failed to stop TX DMA!
>>> [ 7603.339047]
>>> =============================================================================
>>> [ 7603.339053] BUG radix_tree_node: Padding overwritten. 0xffff88000008fe00-0xffff88000008fe32
>>
>>> Could the ath driver corrupt memory if the TX DMA cannot be stopped?
>
> Yes, this is why we have had a series of patches to address this upstream,
> the latest one was for RX DMA failing to stop but Felix found a fix for that.
> Apart from that I am not aware of more issues. Can you please try with
> wireless-testing.git ?
>
> Luis
>
Sure.

As for seeing this issue: nothing much shows up now on current. The only
message I see is:

[ 69.551295] ath: Failed to stop TX DMA in 100 msec after killing last frame
[ 69.551336] ath: Failed to stop TX DMA!

I saw that earlier today. Keep in mind I am at the bookstore, so the
internet connection is constantly connecting and disconnecting.

Justin P. Mattock