On Friday, November 25, 2011, Rafael J. Wysocki wrote:
> On Thursday, November 24, 2011, Linus Torvalds wrote:
> > Hey, since most of the US will be in a food-induced coma tomorrow, I
> > just *know* that doing a new release candidate is a good idea.
> >
> > One quarter arch updates, two quarters drivers, and one quarter random
> > changes. Shake vigorously and serve cold..
> >
> > And maybe the rest of the world can try to make up for the lack of any
> > expected US participation? Hmm?
>
> Well, unfortunately, this kernel is unusable on my Acer Ferrari One.
>
> First off, it hangs solid every time several seconds or at last a few
> minutes after boot. I haven't been able to collect any debug data from
> it yet, but one of the symptoms is black screen with (unmovable) mouse
> cursor (this only happens when X has been started, but the box hangs without
> X too).
>
> Second, the wireless is apparently unable to associate with the AP
> (that 3.1-rc10 works with correctly on the same box).
>
> Tomorrow I'll try to identify the offending commits.
Well, it took more time than I had hoped. :-(
Bisection turns up:
commit 2577c6e8f2320f1d2f09be122efef5b9118efee4
Author: Senthil Balasubramanian <[email protected]>
Date: Tue Sep 13 22:38:18 2011 +0530
ath9k_hw: Add support for AR946/8x chipsets.
This patch adds support for AR946/8x chipets.
Signed-off-by: Senthil Balasubramanian <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
which I think is wrong for at lest two reasons. Not that I understand
what it actually does to the driver, but first, it does much more than the
changelog says and, second, it is practically impossible to revert
because of the number of commits on top depending on it. Quite frankly,
it is about to make it to my list of examples of how things should _not_ be
done in the kernel.
The commit immediately preceding it doesn't show any symptoms of failure, so
I'm quite convinced this one really introduced the problem for me.
The chip in the affected box is (according to "lspci -v"):
09:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
Subsystem: Foxconn International, Inc. Device e01f
Flags: bus master, fast devsel, latency 0, IRQ 19
Memory at f0000000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 2
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [60] Express Legacy Endpoint, MSI 00
Capabilities: [90] MSI-X: Enable- Count=1 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: ath9k
Thanks,
Rafael
From: Rafael J. Wysocki <[email protected]>
Revert a hunk in drivers/net/wireless/ath/ath9k/hw.c introduced by
commit 2577c6e8f2320f1d2f09be122efef5b9118efee4 (ath9k_hw: Add
support for AR946/8x chipsets) that caused a nasty regression to
appear on my Acer Ferrari One (the box locks up entirely at random
times after the wireless has been started without any way to get
debug information out of it).
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/net/wireless/ath/ath9k/hw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux/drivers/net/wireless/ath/ath9k/hw.c
===================================================================
--- linux.orig/drivers/net/wireless/ath/ath9k/hw.c
+++ linux/drivers/net/wireless/ath/ath9k/hw.c
@@ -1827,7 +1827,8 @@ static void ath9k_set_power_sleep(struct
}
/* Clear Bit 14 of AR_WA after putting chip into Full Sleep mode. */
- REG_WRITE(ah, AR_WA, ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
+ if (AR_SREV_9300_20_OR_LATER(ah))
+ REG_WRITE(ah, AR_WA, ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
}
/*
On 2011-11-27 8:16 AM, Adrian Chadd wrote:
> Hm, considering the origina patch did this:
>
> /* Clear Bit 14 of AR_WA after putting chip into Full Sleep mode. */
> - if (AR_SREV_9300_20_OR_LATER(ah))
> - REG_WRITE(ah, AR_WA,
> - ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
> + if (!AR_SREV_9480(ah))
> + REG_WRITE(ah, AR_WA, ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
>
> .. something tells me that perhaps the correct statement is:
>
> if (AR_SREV_9300_20_OR_LATER(ah) && !AR_SREV_9480(ah))
> REG_WRITE() ...
>
I checked again, the !AR_SREV_9480(ah) was intentionally removed in a
later change, so using AR_SREV_9300_20_OR_LATER(ah) is correct.
- Felix
On 27 November 2011 05:32, Rafael J. Wysocki <[email protected]> wrote:
>> Tomorrow I'll try to identify the offending commits.
>
> Well, it took more time than I had hoped. :-(
>
> Bisection turns up:
>
> commit 2577c6e8f2320f1d2f09be122efef5b9118efee4
> Author: Senthil Balasubramanian <[email protected]>
> Date: ? Tue Sep 13 22:38:18 2011 +0530
>
> ? ?ath9k_hw: Add support for AR946/8x chipsets.
>
> ? ?This patch adds support for AR946/8x chipets.
Oh god. That's going to be fun to figure out :)
> which I think is wrong for at lest two reasons. ?Not that I understand
> what it actually does to the driver, but first, it does much more than the
> changelog says and, second, it is practically impossible to revert
> because of the number of commits on top depending on it. ?Quite frankly,
> it is about to make it to my list of examples of how things should _not_ be
> done in the kernel.
>
> The commit immediately preceding it doesn't show any symptoms of failure, so
> I'm quite convinced this one really introduced the problem for me.
>
> The chip in the affected box is (according to "lspci -v"):
Can you please find out what the actual PCI device id is? That way I
can help you (hopefully) isolate which bits of that change would cause
your regression.
That commit almost exclusively touches the AR93xx chipset support, so
I can't (easily) see how this would break support for the AR928x NICs.
Adrian
On 2011-11-27 4:32 AM, Rafael J. Wysocki wrote:
> On Friday, November 25, 2011, Rafael J. Wysocki wrote:
>> On Thursday, November 24, 2011, Linus Torvalds wrote:
>> > Hey, since most of the US will be in a food-induced coma tomorrow, I
>> > just *know* that doing a new release candidate is a good idea.
>> >
>> > One quarter arch updates, two quarters drivers, and one quarter random
>> > changes. Shake vigorously and serve cold..
>> >
>> > And maybe the rest of the world can try to make up for the lack of any
>> > expected US participation? Hmm?
>>
>> Well, unfortunately, this kernel is unusable on my Acer Ferrari One.
>>
>> First off, it hangs solid every time several seconds or at last a few
>> minutes after boot. I haven't been able to collect any debug data from
>> it yet, but one of the symptoms is black screen with (unmovable) mouse
>> cursor (this only happens when X has been started, but the box hangs without
>> X too).
>>
>> Second, the wireless is apparently unable to associate with the AP
>> (that 3.1-rc10 works with correctly on the same box).
>>
>> Tomorrow I'll try to identify the offending commits.
>
> Well, it took more time than I had hoped. :-(
>
> Bisection turns up:
>
> commit 2577c6e8f2320f1d2f09be122efef5b9118efee4
> Author: Senthil Balasubramanian <[email protected]>
> Date: Tue Sep 13 22:38:18 2011 +0530
>
> ath9k_hw: Add support for AR946/8x chipsets.
>
> This patch adds support for AR946/8x chipets.
>
> Signed-off-by: Senthil Balasubramanian <[email protected]>
> Signed-off-by: John W. Linville <[email protected]>
>
> which I think is wrong for at lest two reasons. Not that I understand
> what it actually does to the driver, but first, it does much more than the
> changelog says and, second, it is practically impossible to revert
> because of the number of commits on top depending on it. Quite frankly,
> it is about to make it to my list of examples of how things should _not_ be
> done in the kernel.
>
> The commit immediately preceding it doesn't show any symptoms of failure, so
> I'm quite convinced this one really introduced the problem for me.
>
> The chip in the affected box is (according to "lspci -v"):
>
> 09:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
> Subsystem: Foxconn International, Inc. Device e01f
> Flags: bus master, fast devsel, latency 0, IRQ 19
> Memory at f0000000 (64-bit, non-prefetchable) [size=64K]
> Capabilities: [40] Power Management version 2
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
> Capabilities: [60] Express Legacy Endpoint, MSI 00
> Capabilities: [90] MSI-X: Enable- Count=1 Masked-
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Virtual Channel
> Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
> Kernel driver in use: ath9k
>
> Thanks,
> Rafael
Looking at the diff, the only thing I can find that would break AR928x
is this (in the function ath9k_set_power_sleep in hw.c):
/* Clear Bit 14 of AR_WA after putting chip into Full Sleep mode. */
- if (AR_SREV_9300_20_OR_LATER(ah))
- REG_WRITE(ah, AR_WA,
- ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
+ if (!AR_SREV_9480(ah))
+ REG_WRITE(ah, AR_WA, ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
}
Please try changing the if line to:
if (AR_SREV_9300_20_OR_LATER(ah) && !AR_SREV_9480(ah))
Apparently later changes even removed this check. I'll take a look at the
history to figure out what's going on, this could probably explain other
regressions on pre-AR9380 chips as well.
- Felix
Hm, considering the origina patch did this:
/* Clear Bit 14 of AR_WA after putting chip into Full Sleep mode. */
- if (AR_SREV_9300_20_OR_LATER(ah))
- REG_WRITE(ah, AR_WA,
- ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
+ if (!AR_SREV_9480(ah))
+ REG_WRITE(ah, AR_WA, ah->WARegVal & ~AR_WA_D3_L1_DISABLE);
.. something tells me that perhaps the correct statement is:
if (AR_SREV_9300_20_OR_LATER(ah) && !AR_SREV_9480(ah))
REG_WRITE() ...
?
Adrian
On 27 November 2011 14:25, Felix Fietkau <[email protected]> wrote:
>>
>> if (AR_SREV_9300_20_OR_LATER(ah) && !AR_SREV_9480(ah))
>> ? ? REG_WRITE() ...
>>
> I checked again, the !AR_SREV_9480(ah) was intentionally removed in a
> later change, so using AR_SREV_9300_20_OR_LATER(ah) is correct.
Cool. I'm glad we got to the bottom of this one.
Adrian