2011-02-24 18:20:27

by Tony Houghton

Subject: ath9k causes lockups since kernel 2.6.35

[Posted to linux-wireless and [email protected]. Sorry about the
repost, I got the bug number wrong before]

After upgrading my netbook's kernel from 2.6.32 in Debian squeeze I
found it could not reliably shut down the wireless connection eg when
suspending. It would almost always completely lock up. Ubuntu's 2.6.35
kernel is also affected, and so are certain versions pulled from git,
compiled with a config based on the Debian default.

While testing different kernels I found it would crash at different
times, usually before the screen turned off for suspending, but
sometimes it would crash on resuming and occasionally it locked up while
booting, but it's always a complete lock-up ie the keyboard is
completely responsive, including caps lock, the mouse won't move if the
display is still on, and the only way out is to hold down the power
button.

The adaptor is an AR9285:

84: udi = '/org/freedesktop/Hal/devices/pci_168c_2b'
  pci.device_protocol = 0 (0x0) (int)
  pci.vendor = 'Atheros Communications Inc.' (string)
  info.vendor = 'Atheros Communications Inc.' (string)
  pci.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
  linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
  info.parent = '/org/freedesktop/Hal/devices/pci_10de_ac6' (string)
  info.linux.driver = 'ath9k' (string)
  pci.subsys_vendor = 'Hewlett-Packard Company' (string)
  linux.hotplug_type = 2 (0x2) (int)
  linux.subsystem = 'pci' (string)
  info.subsystem = 'pci' (string)
  info.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
  info.udi = '/org/freedesktop/Hal/devices/pci_168c_2b' (string)
  pci.linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
  pci.product_id = 43 (0x2b) (int)
  pci.vendor_id = 5772 (0x168c) (int)
  pci.subsys_product_id = 12352 (0x3040) (int)
  pci.subsys_vendor_id = 4156 (0x103c) (int)
  pci.device_class = 2 (0x2) (int)
  pci.device_subclass = 128 (0x80) (int)

I haven't tried looking in logs because the crashes are so severe I
don't think they'd be able to record anything useful. But using git
bisect I think I have tracked down the change that started causing this
problem:

53bc7aa08b48e5cd745f986731cc7dc24eef2a9f is the first bad commit
commit 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
Author: Vivek Natarajan <[email protected]>
Date: Mon Apr 5 14:48:04 2010 +0530

ath9k: Add support for newer AR9285 chipsets.

This patch adds support for a modified newer version of AR9285
chipsets.

Signed-off-by: Vivek Natarajan <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

:040000 040000 2ceb3a80ec957f3304308169c4ab9e5356622a95 14b6922350867c88d7ba6823408a9ce9aa15ddf5 M drivers

git bisect start
# good: [66f41d4c5c8a5deed66fdcc84509376c9a0bf9d8] Linux 2.6.34-rc6
git bisect good 66f41d4c5c8a5deed66fdcc84509376c9a0bf9d8
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
# bad: [c316ba3b518bc35ce5aef5421135220389f4eb98] Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6
git bisect bad c316ba3b518bc35ce5aef5421135220389f4eb98
# good: [fb091be08d1acf184e8801dfdcace6e0cb19b1fe] Merge branch 'v4l_for_2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git bisect good fb091be08d1acf184e8801dfdcace6e0cb19b1fe
# bad: [b56f2d55c6c22b0c5774b3b22e336fb6cc5f4094] netfilter: use rcu_dereference_protected()
git bisect bad b56f2d55c6c22b0c5774b3b22e336fb6cc5f4094
# bad: [adfba3c7c026a6a5560d2a43fefc9b198cb74462] mac80211: use fixed channel in ibss join when appropriate
git bisect bad adfba3c7c026a6a5560d2a43fefc9b198cb74462
# bad: [1968cc78d91c79857089713bf3f3cceb5e9c63ae] ath5k: correct channel setting for 2.5 mhz spacing
git bisect bad 1968cc78d91c79857089713bf3f3cceb5e9c63ae
# good: [56b632e8cc7a13cece861d890deb2843116f9372] drivers/net: Remove local #define IW_IOCTL, use IW_HANDLER
git bisect good 56b632e8cc7a13cece861d890deb2843116f9372
# good: [ecdf94b81237d272b1514b76f27a5d22782bcaa6] iwlwifi: remove skb_linearize for rx frames
git bisect good ecdf94b81237d272b1514b76f27a5d22782bcaa6
# skip: [c81494d548d0735f13c04dd2c336cde470d1a5ae] ath9k: rename symbols in enum ath9k_internal_frame_type to avoid confusion
git bisect skip c81494d548d0735f13c04dd2c336cde470d1a5ae
# skip: [f9ea3eb44218b0e12a190f222400f8d56136915f] include/net/iw_handler.h: Use SIOCIWFIRST not SIOCSIWCOMMIT in comment
git bisect skip f9ea3eb44218b0e12a190f222400f8d56136915f
# skip: [152d530d9edbb08424dc1b6561252597a7932c49] ath9k: remove ah->mask_reg, it's never used properly
git bisect skip 152d530d9edbb08424dc1b6561252597a7932c49
# skip: [879999cec9489f8942ebce3ec1b5f23ef948dda7] ar9170usb: fix panic triggered by undersized rxstream buffer
git bisect skip 879999cec9489f8942ebce3ec1b5f23ef948dda7
# skip: [b409894f9d6961bd5feffb86ba1d8dbbebfb5b72] ath: fix coding style/readability in ath/ar9170
git bisect skip b409894f9d6961bd5feffb86ba1d8dbbebfb5b72
# skip: [9fd1ea428590cf6e35e5a7df32ff6bccfd371db2] wireless/ipw2x00: remove trailing space in messages
git bisect skip 9fd1ea428590cf6e35e5a7df32ff6bccfd371db2
# skip: [3069168c82d65f88e4ac76eda09baff02adfd743] ath9k: move imask from sc to ah
git bisect skip 3069168c82d65f88e4ac76eda09baff02adfd743
# good: [7590a550b88b8c3cb025f0a8ed58e279ad62e4c1] wl1251: use DRIVER_NAME macro in wl1251_spi_driver
git bisect good 7590a550b88b8c3cb025f0a8ed58e279ad62e4c1
# bad: [0f2df9eac70423838a1f8d410fd3899ddd88317b] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into merge
git bisect bad 0f2df9eac70423838a1f8d410fd3899ddd88317b
# bad: [2b43ae6daf26f29cec49fa3a3f18025355495500] mac80211: remove irq disabling for sta lock
git bisect bad 2b43ae6daf26f29cec49fa3a3f18025355495500
# bad: [bde748a40d4d5a9915def6772e208848c105e616] ath9k_htc: Add support for power save.
git bisect bad bde748a40d4d5a9915def6772e208848c105e616
# bad: [53bc7aa08b48e5cd745f986731cc7dc24eef2a9f] ath9k: Add support for newer AR9285 chipsets.
git bisect bad 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
# good: [d5cdfacb35ed886271d1ccfffbded98d3447da17] cfg80211: Add local-state-change-only auth/deauth/disassoc
git bisect good d5cdfacb35ed886271d1ccfffbded98d3447da17
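
For anyone wanting to check a bisect log like the one in this message, the following is a minimal Python sketch that counts the good/bad/skip verdicts and reports the last commit marked bad. It assumes the log has one command per line, as `git bisect log` prints it; the function name `summarize_bisect_log` and the two-step sample are illustrative and not part of the original post.

```python
import re
from collections import Counter

# Matches the command lines of a bisect log, e.g. "git bisect bad <40-hex sha>".
BISECT_CMD = re.compile(r"^git bisect (good|bad|skip) ([0-9a-f]{40})$")


def summarize_bisect_log(text):
    """Return (verdict counts, last commit marked bad) for a bisect log."""
    counts = Counter()
    last_bad = None
    for line in text.splitlines():
        match = BISECT_CMD.match(line.strip())
        if match is None:
            # Comment lines ("# good: ...") and "git bisect start" are ignored.
            continue
        verdict, sha = match.groups()
        counts[verdict] += 1
        if verdict == "bad":
            last_bad = sha
    return counts, last_bad


# Illustrative sample: the first two verdicts from the log in this message.
sample = """\
git bisect start
# good: [66f41d4c5c8a5deed66fdcc84509376c9a0bf9d8] Linux 2.6.34-rc6
git bisect good 66f41d4c5c8a5deed66fdcc84509376c9a0bf9d8
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
"""

counts, last_bad = summarize_bisect_log(sample)
print(dict(counts), last_bad)
```

A saved log in this form can also be fed back to git with `git bisect replay <file>` to repeat the same search on another machine.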


2011-02-24 20:02:11

by Jonathan Nieder

Subject: Re: ath9k causes lockups since kernel 2.6.35

(just cc-ing some people listed in MAINTAINERS)
Hi,

Tony Houghton wrote:

> With 2.6.37 I can not use suspend on my Compaq/HP 311c (Intel Atom
> N270/NVidia Ion LE). Originally the machine just kept locking up without
> even blanking the display when I tried to suspend (using the GNOME menu
> or by shutting the lid). I upgraded upower and gnome-power-manager etc
> to experimental and after that the machine suspended OK but could not
> resume. The backlight came on but the screen stayed blank and I could
> not get to a console or anything with Alt+Fn.
[...]
> I tried replacing network-manager with wicd but that crashed the system
> when it connected instead of when disconnected.
[...]
> While testing different kernels I found it would crash at different
> times, usually before the screen turned off for suspending, but
> sometimes it would crash on resuming and occasionally it locked up while
> booting, but it's always a complete lock-up ie the keyboard is
> completely responsive, including caps lock, the mouse won't move if the
> display is still on, and the only way out is to hold down the power
> button.
[...]
> I haven't tried looking in logs because the crashes are so severe I
> don't think they'd be able to record anything useful. But using git
> bisect I think I have tracked down the change that started causing this
> problem:
>
> 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f is the first bad commit
> commit 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
> Author: Vivek Natarajan <[email protected]>
> Date: Mon Apr 5 14:48:04 2010 +0530
>
> ath9k: Add support for newer AR9285 chipsets.
>
> This patch adds support for a modified newer version of AR9285
> chipsets.
>
> Signed-off-by: Vivek Natarajan <[email protected]>
> Signed-off-by: John W. Linville <[email protected]>

The adaptor is an AR9285[1].

That commit is based against v2.6.33 and was merged in v2.6.35-rc1

$ git describe 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
v2.6.33-3523-g53bc7aa
$ git name-rev --tags 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
53bc7aa08b48e5cd745f986731cc7dc24eef2a9f tags/v2.6.35-rc1~473^2~167^2~346
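
As a reading aid for the `git describe` output above: the string is the most recent tag reachable from the commit, the number of commits on top of that tag, and the abbreviated hash after a `g`. The Python sketch below splits it into those parts; the helper name `parse_describe` is made up for illustration.

```python
import re

# "<tag>-<commits since tag>-g<abbreviated hash>", as printed by `git describe`.
DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<abbrev>[0-9a-f]+)$")


def parse_describe(output):
    """Split `git describe` output into (tag, commits on top, abbreviated hash)."""
    match = DESCRIBE.match(output.strip())
    if match is None:
        raise ValueError(f"not in tag-count-ghash form: {output!r}")
    return match.group("tag"), int(match.group("count")), match.group("abbrev")


tag, count, abbrev = parse_describe("v2.6.33-3523-g53bc7aa")
print(f"{count} commits after {tag}, abbreviated hash {abbrev}")
```

So the commit sits 3523 commits past the v2.6.33 tag in its own history, while `git name-rev --tags` shows that v2.6.35-rc1 is the tag it is named relative to.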

Any ideas for tracking this down?

Thanks,
Jonathan

[1]
> 84: udi = '/org/freedesktop/Hal/devices/pci_168c_2b'
>   pci.device_protocol = 0 (0x0) (int)
>   pci.vendor = 'Atheros Communications Inc.' (string)
>   info.vendor = 'Atheros Communications Inc.' (string)
>   pci.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
>   linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
>   info.parent = '/org/freedesktop/Hal/devices/pci_10de_ac6' (string)
>   info.linux.driver = 'ath9k' (string)
>   pci.subsys_vendor = 'Hewlett-Packard Company' (string)
>   linux.hotplug_type = 2 (0x2) (int)
>   linux.subsystem = 'pci' (string)
>   info.subsystem = 'pci' (string)
>   info.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
>   info.udi = '/org/freedesktop/Hal/devices/pci_168c_2b' (string)
>   pci.linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
>   pci.product_id = 43 (0x2b) (int)
>   pci.vendor_id = 5772 (0x168c) (int)
>   pci.subsys_product_id = 12352 (0x3040) (int)
>   pci.subsys_vendor_id = 4156 (0x103c) (int)
>   pci.device_class = 2 (0x2) (int)
>   pci.device_subclass = 128 (0x80) (int)

2011-02-25 14:47:17

by Tony Houghton

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Fri, 25 Feb 2011 13:21:32 +0530
Mohammed Shafi <[email protected]> wrote:

> > Tony Houghton wrote:
> >
> >> With 2.6.37 I can not use suspend on my Compaq/HP 311c (Intel Atom
> >> N270/NVidia Ion LE). Originally the machine just kept locking up without
> >> even blanking the display when I tried to suspend (using the GNOME menu
> >> or by shutting the lid).

[Snip]

The above was from my original message before I worked out that the
problem was in the wireless because disabling and re-enabling wireless,
killing the ath9k module etc, would often produce similar crashes, but
suspend seems to do it more consistently.

> >> booting, but it's always a complete lock-up ie the keyboard is
> >> completely responsive, including caps lock, the mouse won't move if the

I meant *un*responsive of course!

> >> I haven't tried looking in logs because the crashes are so severe I
> >> don't think they'd be able to record anything useful.

If I enable debugging is there a way to get it to sync to disc after
every message? I think the developers are going to need at least that
information to be able to track this down.

> is this issue still reproducible ?

Yes :-(.

> Apart from this reporting I have not seen any other issues for AR9285.

Strange, isn't it. I can't be the only person trying to use a recent
version of Linux on one of these netbooks, but I couldn't find any
similar complaints on Google. Maybe mine has an obscure fault which is
only triggered by a feature supported in the newer kernels.

2011-02-25 14:41:26

by Tony Houghton

Subject: Re: ath9k causes lockups since kernel 2.6.35

On Fri, 25 Feb 2011 08:57:11 +0100
Sedat Dilek <[email protected]> wrote:

> Debian/experimental provides now 2.6.38-rc6 kernel packages, might be
> worth a test?

I just tried that and at first it looked as if it might be working,
because it successfully disabled and reenabled wireless (via
network-manager). But when I tried suspend it crashed on resuming. IIRC
I also tried the newest tagged version from git and that was a fail too.

FWIW this last crash was on battery power, but most of the time I've
been plugged in to the mains. I saw some commit messages about
power-saving mode but it didn't occur to me that battery/mains might
make a difference and it looks like it doesn't.

2011-02-25 08:44:23

by Mohammed Shafi

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Fri, Feb 25, 2011 at 1:21 PM, Mohammed Shafi <[email protected]> wrote:
> On Fri, Feb 25, 2011 at 1:32 AM, Jonathan Nieder <[email protected]> wrote:
>> (just cc-ing some people listed in MAINTAINERS)
>> Hi,
>>
>> Tony Houghton wrote:
>>
>>> With 2.6.37 I can not use suspend on my Compaq/HP 311c (Intel Atom
>>> N270/NVidia Ion LE). Originally the machine just kept locking up without
>>> even blanking the display when I tried to suspend (using the GNOME menu
>>> or by shutting the lid). I upgraded upower and gnome-power-manager etc
>>> to experimental and after that the machine suspended OK but could not
>>> resume. The backlight came on but the screen stayed blank and I could
>>> not get to a console or anything with Alt+Fn.
>> [...]
>>> I tried replacing network-manager with wicd but that crashed the system
>>> when it connected instead of when disconnected.
>> [...]
>>> While testing different kernels I found it would crash at different
>>> times, usually before the screen turned off for suspending, but
>>> sometimes it would crash on resuming and occasionally it locked up while
>>> booting, but it's always a complete lock-up ie the keyboard is
>>> completely responsive, including caps lock, the mouse won't move if the
>>> display is still on, and the only way out is to hold down the power
>>> button.
>> [...]
>>> I haven't tried looking in logs because the crashes are so severe I
>>> don't think they'd be able to record anything useful. But using git
>>> bisect I think I have tracked down the change that started causing this
>>> problem:
>>>
>>> 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f is the first bad commit
>>> commit 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
>>> Author: Vivek Natarajan <[email protected]>
>>> Date:   Mon Apr 5 14:48:04 2010 +0530
>>>
>>>     ath9k: Add support for newer AR9285 chipsets.
>>>
>>>     This patch adds support for a modified newer version of AR9285
>>>     chipsets.
>>>
>>>     Signed-off-by: Vivek Natarajan <[email protected]>
>>>     Signed-off-by: John W. Linville <[email protected]>
>>
>> The adaptor is an AR9285[1].
>>
>> That commit is based against v2.6.33 and was merged in v2.6.35-rc1
>>
>> $ git describe 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
>> v2.6.33-3523-g53bc7aa
>> $ git name-rev --tags 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
>> 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f tags/v2.6.35-rc1~473^2~167^2~346
>>
>> Any ideas for tracking this down?
>
> is this issue still reproducible ?
> Apart from this reporting I have not seen any other issues for AR9285.

Sorry, I only just saw Tony's message on the linux-wireless mailing list.
>
>>
>> Thanks,
>> Jonathan
>>
>> [1]
>>> 84: udi = '/org/freedesktop/Hal/devices/pci_168c_2b'
>>>   pci.device_protocol = 0 (0x0) (int)
>>>   pci.vendor = 'Atheros Communications Inc.' (string)
>>>   info.vendor = 'Atheros Communications Inc.' (string)
>>>   pci.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
>>>   linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
>>>   info.parent = '/org/freedesktop/Hal/devices/pci_10de_ac6' (string)
>>>   info.linux.driver = 'ath9k' (string)
>>>   pci.subsys_vendor = 'Hewlett-Packard Company' (string)
>>>   linux.hotplug_type = 2 (0x2) (int)
>>>   linux.subsystem = 'pci' (string)
>>>   info.subsystem = 'pci' (string)
>>>   info.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
>>>   info.udi = '/org/freedesktop/Hal/devices/pci_168c_2b' (string)
>>>   pci.linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
>>>   pci.product_id = 43 (0x2b) (int)
>>>   pci.vendor_id = 5772 (0x168c) (int)
>>>   pci.subsys_product_id = 12352 (0x3040) (int)
>>>   pci.subsys_vendor_id = 4156 (0x103c) (int)
>>>   pci.device_class = 2 (0x2) (int)
>>>   pci.device_subclass = 128 (0x80) (int)
>> _______________________________________________
>> ath9k-devel mailing list
>> [email protected]
>> https://lists.ath9k.org/mailman/listinfo/ath9k-devel
>>
>

2011-02-26 18:35:38

by Tony Houghton

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Fri, 25 Feb 2011 16:57:45 +0000
Tony Houghton <[email protected]> wrote:

> On Fri, 25 Feb 2011 21:37:25 +0530
> Mohammed Shafi <[email protected]> wrote:
>
> > On Fri, Feb 25, 2011 at 8:17 PM, Tony Houghton <[email protected]> wrote:
> > >
> > >> >> I haven't tried looking in logs because the crashes are so severe I
> > >> >> don't think they'd be able to record anything useful.
> >
> > Can you please get those messages with net console ?

No, I can't get it to work. I found a good guide at
<http://www.novell.com/communities/node/4753/netconsole-howto-send-kernel-boot-messages-over-ethernet>
and adapted it for Debian but absolutely no messages appear. I connected
it via Ethernet to another netbook which runs Ubuntu. On the Debian one
(the one with the problem I want to debug) I added to
/etc/network/interfaces:

auto eth0
iface eth0 inet static
address 10.0.0.20
netmask 255.255.255.0

and on Ubuntu:

auto eth0
iface eth0 inet static
address 10.0.0.10
netmask 255.255.255.0

On the Debian one I added netconsole to
/etc/modules and created /etc/modprobe.d/netconsole.conf containing:

options netconsole [email protected]/eth0,[email protected]/00:1E:68:DD:DB:40

On the Ubuntu one I ran:

nc -l -u 6666

I've double-checked the addresses etc and verified that the link is up
and the netconsole module is loaded, but no messages appear.
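
For reference, the netconsole module parameter has the form `netconsole=src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`. The Python sketch below builds that value from the addresses quoted in this message and shows a bare UDP receiver doing roughly what `nc -l -u 6666` does. The source port 6665 is an assumption (the archived copy of the option line has lost both port numbers; 6666 is taken from the `nc` command), and the helper names are made up for illustration.

```python
import socket


def netconsole_option(src_port, src_ip, dev, tgt_port, tgt_ip, tgt_mac):
    """Build the value of the netconsole= module parameter."""
    return f"{src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def receive_datagram(port, host="0.0.0.0", timeout=None):
    """Wait for one UDP datagram on `port` and return it as text.

    Blocks until a datagram arrives, or raises a timeout error when
    `timeout` (in seconds) is given and nothing arrives in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(timeout)
        data, _sender = sock.recvfrom(65535)
        return data.decode("utf-8", errors="replace")


# Sender is the Debian netbook (10.0.0.20), receiver the Ubuntu one (10.0.0.10).
# Port 6665 is an assumed source port, not taken from the original message.
value = netconsole_option(6665, "10.0.0.20", "eth0",
                          6666, "10.0.0.10", "00:1E:68:DD:DB:40")
print("options netconsole netconsole=" + value)
```

On the receiving machine, calling `receive_datagram(6666)` in a loop would print each kernel message as it arrives, provided the sender's console log level lets the messages through.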

> > > If I enable debugging is there a way to get it to sync to disc after
> > > every message? I think the developers are going to need at least that
> > > information to be able to track this down.
> >
> > Yes sure, that could narrow down the issue.
>
> I'll try that first because it should be easier than setting up net
> console.

I tried to reconfigure rsyslog to sync on each message but it didn't log
anything useful :-(.

2011-02-25 16:07:26

by Mohammed Shafi

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Fri, Feb 25, 2011 at 8:17 PM, Tony Houghton <[email protected]> wrote:
> On Fri, 25 Feb 2011 13:21:32 +0530
> Mohammed Shafi <[email protected]> wrote:
>
>> > Tony Houghton wrote:
>> >
>> >> With 2.6.37 I can not use suspend on my Compaq/HP 311c (Intel Atom
>> >> N270/NVidia Ion LE). Originally the machine just kept locking up without
>> >> even blanking the display when I tried to suspend (using the GNOME menu
>> >> or by shutting the lid).
>
> [Snip]
>
> The above was from my original message before I worked out that the
> problem was in the wireless because disabling and re-enabling wireless,
> killing the ath9k module etc, would often produce similar crashes, but
> suspend seems to do it more consistently.

Ok.

>
>> >> booting, but it's always a complete lock-up ie the keyboard is
>> >> completely responsive, including caps lock, the mouse won't move if the
>
> I meant *un*responsive of course!
>
>> >> I haven't tried looking in logs because the crashes are so severe I
>> >> don't think they'd be able to record anything useful.

Can you please get those messages with net console ?

>
> If I enable debugging is there a way to get it to sync to disc after
> every message? I think the developers are going to need at least that
> information to be able to track this down.

Yes sure, that could narrow down the issue.

>
>> is this issue still reproducible ?
>
> Yes :-(.

It would be highly helpful if this issue is reproducible in the latest
wireless testing or compat wireless. It will help us to fix it cleanly

>
>> Apart from this reporting I have not seen any other issues for AR9285.
>
> Strange, isn't it. I can't be the only person trying to use a recent
> version of Linux on one of these netbooks, but I couldn't find any
> similar complaints on Google. Maybe mine has an obscure fault which is
> only triggered by a feature supported in the newer kernels.
>

We also need to rule out that it's a platform-dependent issue.

2011-02-28 05:36:57

by Mohammed Shafi

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Sun, Feb 27, 2011 at 12:05 AM, Tony Houghton <[email protected]> wrote:
> On Fri, 25 Feb 2011 16:57:45 +0000
> Tony Houghton <[email protected]> wrote:
>
>> On Fri, 25 Feb 2011 21:37:25 +0530
>> Mohammed Shafi <[email protected]> wrote:
>>
>> > On Fri, Feb 25, 2011 at 8:17 PM, Tony Houghton <[email protected]> wrote:
>> > >
>> > >> >> I haven't tried looking in logs because the crashes are so severe I
>> > >> >> don't think they'd be able to record anything useful.
>> >
>> > Can you please get those messages with net console ?
>
> No, I can't get it to work. I found a good guide at
> <http://www.novell.com/communities/node/4753/netconsole-howto-send-kernel-boot-messages-over-ethernet>
> and adapted it for Debian but absolutely no messages appear. I connected
> it via Ethernet to another netbook which runs Ubuntu. On the Debian one
> (the one with the problem I want to debug) I added to
> /etc/network/interfaces:
>
> auto eth0
> iface eth0 inet static
>    address 10.0.0.20
>    netmask 255.255.255.0
>
> and on Ubuntu:
>
> auto eth0
> iface eth0 inet static
>    address 10.0.0.10
>    netmask 255.255.255.0
>
> On the Debian one I added netconsole to
> /etc/modules and created /etc/modprobe.d/netconsole.conf containing:
>
> options netconsole [email protected]/eth0,[email protected]/00:1E:68:DD:DB:40
>
> On the Ubuntu one I ran:
>
> nc -l -u 6666
>
> I've double-checked the addresses etc and verified that the link is up
> and the netconsole module is loaded, but no messages appear.
>
>> > > If I enable debugging is there a way to get it to sync to disc after
>> > > every message? I think the developers are going to need at least that
>> > > information to be able to track this down.
>> >
>> > Yes sure, that could narrow down the issue.
>>
>> I'll try that first because it should be easier than setting up net
>> console.
>
> I tried to reconfigure rsyslog to sync on each message but it didn't log
> anything useful :-(.

I will try to reproduce here with my platform.


2011-02-25 07:51:34

by Mohammed Shafi

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Fri, Feb 25, 2011 at 1:32 AM, Jonathan Nieder <[email protected]> wrote:
> (just cc-ing some people listed in MAINTAINERS)
> Hi,
>
> Tony Houghton wrote:
>
>> With 2.6.37 I can not use suspend on my Compaq/HP 311c (Intel Atom
>> N270/NVidia Ion LE). Originally the machine just kept locking up without
>> even blanking the display when I tried to suspend (using the GNOME menu
>> or by shutting the lid). I upgraded upower and gnome-power-manager etc
>> to experimental and after that the machine suspended OK but could not
>> resume. The backlight came on but the screen stayed blank and I could
>> not get to a console or anything with Alt+Fn.
> [...]
>> I tried replacing network-manager with wicd but that crashed the system
>> when it connected instead of when disconnected.
> [...]
>> While testing different kernels I found it would crash at different
>> times, usually before the screen turned off for suspending, but
>> sometimes it would crash on resuming and occasionally it locked up while
>> booting, but it's always a complete lock-up ie the keyboard is
>> completely responsive, including caps lock, the mouse won't move if the
>> display is still on, and the only way out is to hold down the power
>> button.
> [...]
>> I haven't tried looking in logs because the crashes are so severe I
>> don't think they'd be able to record anything useful. But using git
>> bisect I think I have tracked down the change that started causing this
>> problem:
>>
>> 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f is the first bad commit
>> commit 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
>> Author: Vivek Natarajan <[email protected]>
>> Date:   Mon Apr 5 14:48:04 2010 +0530
>>
>>     ath9k: Add support for newer AR9285 chipsets.
>>
>>     This patch adds support for a modified newer version of AR9285
>>     chipsets.
>>
>>     Signed-off-by: Vivek Natarajan <[email protected]>
>>     Signed-off-by: John W. Linville <[email protected]>
>
> The adaptor is an AR9285[1].
>
> That commit is based against v2.6.33 and was merged in v2.6.35-rc1
>
> $ git describe 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
> v2.6.33-3523-g53bc7aa
> $ git name-rev --tags 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
> 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f tags/v2.6.35-rc1~473^2~167^2~346
>
> Any ideas for tracking this down?

is this issue still reproducible ?
Apart from this reporting I have not seen any other issues for AR9285.

>
> Thanks,
> Jonathan
>
> [1]
>> 84: udi = '/org/freedesktop/Hal/devices/pci_168c_2b'
>>   pci.device_protocol = 0 (0x0) (int)
>>   pci.vendor = 'Atheros Communications Inc.' (string)
>>   info.vendor = 'Atheros Communications Inc.' (string)
>>   pci.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
>>   linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
>>   info.parent = '/org/freedesktop/Hal/devices/pci_10de_ac6' (string)
>>   info.linux.driver = 'ath9k' (string)
>>   pci.subsys_vendor = 'Hewlett-Packard Company' (string)
>>   linux.hotplug_type = 2 (0x2) (int)
>>   linux.subsystem = 'pci' (string)
>>   info.subsystem = 'pci' (string)
>>   info.product = 'AR9285 Wireless Network Adapter (PCI-Express)' (string)
>>   info.udi = '/org/freedesktop/Hal/devices/pci_168c_2b' (string)
>>   pci.linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0' (string)
>>   pci.product_id = 43 (0x2b) (int)
>>   pci.vendor_id = 5772 (0x168c) (int)
>>   pci.subsys_product_id = 12352 (0x3040) (int)
>>   pci.subsys_vendor_id = 4156 (0x103c) (int)
>>   pci.device_class = 2 (0x2) (int)
>>   pci.device_subclass = 128 (0x80) (int)

2011-02-25 16:57:49

by Tony Houghton

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Fri, 25 Feb 2011 21:37:25 +0530
Mohammed Shafi <[email protected]> wrote:

> On Fri, Feb 25, 2011 at 8:17 PM, Tony Houghton <[email protected]> wrote:
> >
> >> >> I haven't tried looking in logs because the crashes are so severe I
> >> >> don't think they'd be able to record anything useful.
>
> Can you please get those messages with net console ?
>
> > If I enable debugging is there a way to get it to sync to disc after
> > every message? I think the developers are going to need at least that
> > information to be able to track this down.
>
> Yes sure, that could narrow down the issue.

I'll try that first because it should be easier than setting up net
console.

> >> is this issue still reproducible ?
> >
> > Yes :-(.
>
> It would be highly helpful if this issue is reproducible in the latest
> wireless testing or compat wireless. It will help us to fix it cleanly

I think I'll need to rebuild anyway to enable ath9k debugging so I might
as well build whichever version would be most useful to developers. Is
there a particular branch I should pull from? If it's maintained
separately from the main kernel please give me instructions on how to
merge it or whatever, because I'm not very familiar with the kernel or
more advanced git usage.

2011-02-25 07:57:12

by Sedat Dilek

Subject: Re: ath9k causes lockups since kernel 2.6.35

Just FYI:

(As I know you are on Debian...)
Debian/experimental provides now 2.6.38-rc6 kernel packages, might be
worth a test?

- Sedat -

2011-03-03 07:05:50

by Vivek Natarajan

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Sun, Feb 27, 2011 at 12:05 AM, Tony Houghton <[email protected]> wrote:
> On Fri, 25 Feb 2011 16:57:45 +0000
> Tony Houghton <[email protected]> wrote:
>
>> On Fri, 25 Feb 2011 21:37:25 +0530
>> Mohammed Shafi <[email protected]> wrote:
>>
>> > On Fri, Feb 25, 2011 at 8:17 PM, Tony Houghton <[email protected]> wrote:
>> > >
>> > >> >> I haven't tried looking in logs because the crashes are so severe I
>> > >> >> don't think they'd be able to record anything useful.
>> >
>> > Can you please get those messages with net console ?
>
> No, I can't get it to work. I found a good guide at
> <http://www.novell.com/communities/node/4753/netconsole-howto-send-kernel-boot-messages-over-ethernet>
> and adapted it for Debian but absolutely no messages appear. I connected
> it via Ethernet to another netbook which runs Ubuntu. On the Debian one
> (the one with the problem I want to debug) I added to
> /etc/network/interfaces:
>
> auto eth0
> iface eth0 inet static
>    address 10.0.0.20
>    netmask 255.255.255.0
>
> and on Ubuntu:
>
> auto eth0
> iface eth0 inet static
>    address 10.0.0.10
>    netmask 255.255.255.0
>
> On the Debian one I added netconsole to
> /etc/modules and created /etc/modprobe.d/netconsole.conf containing:
>
> options netconsole [email protected]/eth0,[email protected]/00:1E:68:DD:DB:40

I used to give
dmesg -n 8
along with the above command and I get all the messages in the remote laptop.

>
> On the Ubuntu one I ran:
>
> nc -l -u 6666
>
> I've double-checked the addresses etc and verified that the link is up
> and the netconsole module is loaded, but no messages appear.
>

Can you please try this once again with the above debug level and see
if you get the crash log?

Vivek.
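
Some background on why `dmesg -n 8` can matter here: netconsole is a kernel console, so it only receives messages whose level is numerically lower than the console log level, and `dmesg -n 8` raises that level so that even debug messages (level 7) get through. The current setting is the first of four numbers in /proc/sys/kernel/printk. The Python sketch below parses that file's contents and applies the comparison; the helper names and the sample contents "4 4 1 7" are illustrative assumptions, not values from this thread.

```python
PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(text):
    """Parse the four integers found in /proc/sys/kernel/printk."""
    values = [int(field) for field in text.split()]
    if len(values) != len(PRINTK_FIELDS):
        raise ValueError(f"expected 4 fields, got {len(values)}")
    return dict(zip(PRINTK_FIELDS, values))


def reaches_console(message_level, console_loglevel):
    """A message is printed on consoles only if its level is below the limit."""
    return message_level < console_loglevel


KERN_DEBUG = 7  # numeric level of debug messages

# Illustrative file contents, not read from the affected machine.
levels = parse_printk("4\t4\t1\t7\n")
print(reaches_console(KERN_DEBUG, levels["console_loglevel"]))
print(reaches_console(KERN_DEBUG, 8))  # the limit after `dmesg -n 8`
```

On a live system the text would come from `open("/proc/sys/kernel/printk").read()`.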

2011-03-15 17:16:07

by Tony Houghton

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Tue, 15 Mar 2011 21:47:14 +0530
Mohammed Shafi <[email protected]> wrote:

> i dont know what to say, may be you can look at
> vim /drivers/net/wireless/ath/ath9k/pci.c
> and
> vim /net/mac80211/pm.c
> let me also see if i can get the problem reproduced

When I installed wireless-compat it didn't replace all the modules it
provides, only the ones I'd activated with driver-select ath9k. I didn't
notice whether mac80211 drivers got replaced too. If I want to alter
pm.c do I need to add anything to the driver-select line to make sure my
altered version gets installed?

2011-03-03 19:17:04

by Tony Houghton

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Thu, 3 Mar 2011 21:27:32 +0530
Mohammed Shafi <[email protected]> wrote:

> sudo modprobe ath9k debug=0xffffffff

That debug option was what I needed. I'm attaching what I captured with
netconsole leading up to a crash.

I also found that I couldn't reproduce the problem while running on
battery, but as soon as I connected the PSU it crashed the very next
time I disabled wireless. I hadn't noticed that PSU/battery made a
difference before, but it does seem to be an issue.


Attachments:
(No filename) (487.00 B)
netconsole.log (18.33 kB)

2011-03-06 22:04:42

by Tony Houghton

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Fri, 4 Mar 2011 21:20:02 +0530
Mohammed Shafi <[email protected]> wrote:

> 2011/3/4 Tony Houghton <[email protected]>:
> >
> > I also found that I couldn't reproduce the problem while running on
> > battery, but as soon as I connected the PSU it crashed the very next
> > time I disabled wireless. I hadn't noticed that PSU/battery made a
> > difference before, but it does seem to be an issue.
> >
>
> Please see whether this is consistently reproducible.

No, today it crashed on battery power. Something else must have been
different too when I tried on battery before, because I restarted the
wireless several times without problem, and usually it crashes almost
every time.

> As a try can you please disable the network manager and use iw command
> to connect?

I tried wicd early on, that crashed too.

2011-03-03 15:58:11

by Mohammed Shafi

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Thu, Mar 3, 2011 at 8:01 PM, Tony Houghton <[email protected]> wrote:
> On Thu, 3 Mar 2011 10:51:15 +0530
> Mohammed Shafi <[email protected]> wrote:
>
>> On Wed, Mar 2, 2011 at 10:00 PM, Tony Houghton <[email protected]> wrote:
>> >
>> > Is this still not reproducible? I'd like to do more to help trace
>> > the problem but I'm a bit stuck. Is there something I can do to
>> > make sure all the debug/log messages from ath9k appear on the
>> > console?
>>
>> sudo dmesg -n 8?
>
> AFAICT that will just make sure the debug messages get logged, but not
> make them appear anywhere the other messages aren't appearing eg on a
> console or net console. I could try tail -f /dev/xconsole.
>
> The ath9k module has a "debug" parameter. It just says it's a mask and I
> don't know how to use it. Do I set bits to enable or disable levels?
> What value to log "everything"?
>

Please make sure the following options are set in your kernel config:
CONFIG_ATH_COMMON=m
CONFIG_ATH_DEBUG=y

CONFIG_ATH9K_HW=m
CONFIG_ATH9K_COMMON=m
CONFIG_ATH9K=m
CONFIG_ATH9K_DEBUGFS=y
CONFIG_ATH9K_RATE_CONTROL=y
CONFIG_ATH9K_HTC=m
CONFIG_ATH9K_HTC_DEBUGFS=y

sudo modprobe ath9k debug=0xffffffff

You will get a lot of messages in the debug log.

2011-03-15 15:24:14

by Tony Houghton

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Tue, 15 Mar 2011 19:05:27 +0530
Mohammed Shafi <[email protected]> wrote:

> No its nothing to do with kernel hacking, its just a wireless package.
> instead of compiling the whole kernel or wireless testing, we can
> install a wireless package within our linux distribution in few
> minutes. This package is called compat-wireless which includes latest
> fixes in wireless testing.
> more information in:
> http://wireless.kernel.org/en/users/Download

I tried that and I can confirm that it still crashes. I used the
compat-wireless-2011-03-14 snapshot with a Debian stock 2.6.37-2 kernel.

BTW one of the files had an error with TASK_INTERRUPTIBLE undefined. I
can't remember which file because I forgot to make a note of it before
make wlunload caused the expected crash. The apparent cause was that
/usr/include/linux/sched.h contains far less than sched.h in the kernel
source. I worked around it by defining the macro as 1 at the top of the
affected file.


2011-03-15 07:47:41

by Mohammed Shafi

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Mon, Mar 14, 2011 at 10:18 PM, Tony Houghton <[email protected]> wrote:
> [I am no longer subscribed to the list so please Cc me]
>
> I'm the person who reported that my system kept freezing when shutting
> down my AR9285 wireless with kernel 2.6.35 or newer.
>
> I've just bought an Acer Aspire Revo 3700 and this had very similar
> symptoms with a different adapter. Luckily another customer had posted
> about it on the vendor's website and his fix works for me. The fix is to
> blacklist the rt2800pci module and the rt2860sta driver seems to work
> quite happily without it.
>
> It looks as if rt2800pci was introduced somewhere between 2.6.32 and
> 2.6.34 and I experienced the crashes with 2.6.35 (Mint 10/Ubuntu 10.10)
> and 2.6.37 (Debian unstable), but not with 2.6.32 (Debian squeeze) which
> doesn't have that module.

Can you please check by disabling the suspicious rt modules and see
whether this problem happens?
For a quick check, please try the latest compat-wireless.

>
> It's probably just coincidence, but I was struck by how similar the
> symptoms are and wondered whether these different drivers have anything
> in common?
>
> lspci output:
>
> 02:00.0 Network controller [0280]: RaLink RT3090 Wireless 802.11n 1T/1R PCIe [1814:3090]
>         Subsystem: Lite-On Communications Inc Device [11ad:6622]
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 32 bytes
>         Interrupt: pin A routed to IRQ 18
>         Region 0: Memory at febf0000 (32-bit, non-prefetchable) [size=64K]
>         Capabilities: <access denied>
>         Kernel driver in use: rt2860
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-03-15 13:35:28

by Mohammed Shafi

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Tue, Mar 15, 2011 at 6:49 PM, Tony Houghton <[email protected]> wrote:
> On Tue, 15 Mar 2011 13:17:40 +0530
> Mohammed Shafi <[email protected]> wrote:
>
>> On Mon, Mar 14, 2011 at 10:18 PM, Tony Houghton <[email protected]> wrote:
>> >
>> > I've just bought an Acer Aspire Revo 3700 and this had very similar
>> > symptoms with a different adapter. Luckily another customer had
>> > posted about it on the vendor's website and his fix works for me.
>> > The fix is to blacklist the rt2800pci module and the rt2860sta
>> > driver seems to work quite happily without it.
>>
>> can you please check by disabling the suspicious rt modules and see
>> whether this problem happens.
>
> Yes, I did blacklist rt2800pci and the system works correctly without
> the module loaded. Even the wireless connection still works.

OK, but one or two other people are also reporting this locking issue,
so we need to be very sure.

>
>> for quick check please try with the latest compat wireless.
>
> How do I do that?
>
> I would also be willing to add extra debugging messages to ath9k to help
> track down the AR9285 problem. I'm a C programmer, but not a kernel
> hacker so I think I would need some advice about which functions to
> examine.
>

No, it has nothing to do with kernel hacking; it's just a wireless package.
Instead of compiling the whole kernel or wireless-testing, we can
install a wireless package within our Linux distribution in a few
minutes. This package is called compat-wireless, and it includes the
latest fixes from wireless-testing.
More information:
http://wireless.kernel.org/en/users/Download

1. Download the compat-wireless package from
   http://linuxwireless.org/download/compat-wireless-2.6/
2. cd compat-wireless-...
3. As you are suspicious about the rt modules, remove them in config.mk
4. ./scripts/driver-select ath9k
5. make
6. make install
7. make unload
8. sudo modprobe ath9k

thanks,
shafi

2011-03-15 16:17:15

by Mohammed Shafi

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Tue, Mar 15, 2011 at 8:47 PM, Tony Houghton <[email protected]> wrote:
> On Tue, 15 Mar 2011 19:05:27 +0530
> Mohammed Shafi <[email protected]> wrote:
>
>> On Tue, Mar 15, 2011 at 6:49 PM, Tony Houghton <[email protected]> wrote:
>> > On Tue, 15 Mar 2011 13:17:40 +0530
>> > Mohammed Shafi <[email protected]> wrote:
>> >
>> >> On Mon, Mar 14, 2011 at 10:18 PM, Tony Houghton <[email protected]>
>> >> wrote:
>> >> >
>> >> > I've just bought an Acer Aspire Revo 3700 and this had very
>> >> > similar symptoms with a different adapter. Luckily another
>> >> > customer had posted about it on the vendor's website and his fix
>> >> > works for me. The fix is to blacklist the rt2800pci module and
>> >> > the rt2860sta driver seems to work quite happily without it.
>> >>
>> >> can you please check by disabling the suspicious rt modules and see
>> >> whether this problem happens.
>> >
>> > Yes, I did blacklist rt2800pci and the system works correctly
>> > without the module loaded. Even the wireless connection still works.
>>
>> Ok still there is also one or two guys reporting this locking issue,
>> so we need to be very sure.
>
> I'm afraid you misunderstood me. The rt28* issue is on a different
> system with an Ralink adapter. AFAIK the actual rt28* modules have
> nothing to do with the Atheros problem. I only made a connection because
> the symptoms are so similar, and I thought it possible that the two
> different drivers might share some code, but it's more likely to be a
> coincidence.

Ok fine.

>
> Has anyone else reported the rt28* problem and/or are the developers
> aware of it? I know I'm not the only affected person because I read
> about it in a customer comment on the vendor's web site for the Acer
> R3700.

No, I am not aware of it.

>
>> >> for quick check please try with the latest compat wireless.
>> >
>> > How do I do that?
>> >
>> > I would also be willing to add extra debugging messages to ath9k to
>> > help track down the AR9285 problem. I'm a C programmer, but not a
>> > kernel hacker so I think I would need some advice about which
>> > functions to examine.
>>
>> No its nothing to do with kernel hacking, its just a wireless package.
>
> For kernel hacking I meant I would like to experiment with the code
> myself. If you can't reproduce the problem I think it would be very
> helpful if I can make it print extra messages to narrow it down. If you
> could tell me something like, "The shutdown process should start at
> function X and end at function Y," I'll know better which code to
> experiment with. BTW, all types of wireless shutdown seem to be
> affected, whether I turn off the WAP, click Disconnect in network
> manager, press the rfkill switch, suspend, shutdown or rmmod ath9k.

I don't know what to say; maybe you can look at
vim /drivers/net/wireless/ath/ath9k/pci.c
and
vim /net/mac80211/pm.c
Let me also see if I can get the problem reproduced.

>
> Should I use printk to print the debugging messages or something else?
> As there is the debug module parameter I guess the latter. Is there also
> some sort of sleep function which I can safely add after each debug
> message to make sure the message is made visible before the crash?

printk is sufficient, and you can enable all debug messages by loading ath9k with debug=0xffffffff

2011-03-15 13:19:00

by Tony Houghton

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Tue, 15 Mar 2011 13:17:40 +0530
Mohammed Shafi <[email protected]> wrote:

> On Mon, Mar 14, 2011 at 10:18 PM, Tony Houghton <[email protected]> wrote:
> >
> > I've just bought an Acer Aspire Revo 3700 and this had very similar
> > symptoms with a different adapter. Luckily another customer had
> > posted about it on the vendor's website and his fix works for me.
> > The fix is to blacklist the rt2800pci module and the rt2860sta
> > driver seems to work quite happily without it.
>
> can you please check by disabling the suspicious rt modules and see
> whether this problem happens.

Yes, I did blacklist rt2800pci and the system works correctly without
the module loaded. Even the wireless connection still works.

> for quick check please try with the latest compat wireless.

How do I do that?

I would also be willing to add extra debugging messages to ath9k to help
track down the AR9285 problem. I'm a C programmer, but not a kernel
hacker so I think I would need some advice about which functions to
examine.

2011-03-03 14:32:00

by Tony Houghton

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Thu, 3 Mar 2011 10:51:15 +0530
Mohammed Shafi <[email protected]> wrote:

> On Wed, Mar 2, 2011 at 10:00 PM, Tony Houghton <[email protected]> wrote:
> >
> > Is this still not reproducible? I'd like to do more to help trace
> > the problem but I'm a bit stuck. Is there something I can do to
> > make sure all the debug/log messages from ath9k appear on the
> > console?
>
> sudo dmesg -n 8?

AFAICT that will just make sure the debug messages get logged, but not
make them appear anywhere the other messages aren't appearing eg on a
console or net console. I could try tail -f /dev/xconsole.

The ath9k module has a "debug" parameter. It just says it's a mask and I
don't know how to use it. Do I set bits to enable or disable levels?
What value to log "everything"?

2011-03-03 05:21:17

by Mohammed Shafi

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Wed, Mar 2, 2011 at 10:00 PM, Tony Houghton <[email protected]> wrote:
> On Mon, 28 Feb 2011 11:06:54 +0530
> Mohammed Shafi <[email protected]> wrote:
>
>> On Sun, Feb 27, 2011 at 12:05 AM, Tony Houghton <[email protected]> wrote:
>> > I tried to reconfigure rsyslog to sync on each message but it
>> > didn't log anything useful :-(.
>>
>> I will try to reproduce here with my platform.
>
> Is this still not reproducible? I'd like to do more to help trace the
> problem but I'm a bit stuck. Is there something I can do to make sure
> all the debug/log messages from ath9k appear on the console?

sudo dmesg -n 8?

>
> I'm using rsyslog ATM, which is Debian's default logger AFAIK, but I'm
> willing to install something different if it will help capture these
> messages.

Tony, it looks like we don't have the exact card you are using. Anyway,
I will try to recreate the issue with the AR9285 card I have.

>

2011-03-04 15:50:04

by Mohammed Shafi

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

2011/3/4 Tony Houghton <[email protected]>:
> On Thu, 3 Mar 2011 21:27:32 +0530
> Mohammed Shafi <[email protected]> wrote:
>
>> sudo modprobe ath9k debug=0xffffffff
>
> That debug option was what I needed. I'm attaching what I captured with
> netconsole leading up to a crash.

I could not find anything suspicious, but I need to look into it more.
The patch you mentioned adds support for a new version of the chip, and
I am not clear how it could directly cause a kernel lock-up. The changes
are mainly hardware code, so I need to see whether something basic was
missed in the patch.

>
> I also found that I couldn't reproduce the problem while running on
> battery, but as soon as I connected the PSU it crashed the very next
> time I disabled wireless. I hadn't noticed that PSU/battery made a
> difference before, but it does seem to be an issue.
>

Please see whether this is consistently reproducible.
As a test, can you please disable NetworkManager and use the iw command
to connect?

2011-03-16 05:09:20

by Mohammed Shafi

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Tue, Mar 15, 2011 at 10:45 PM, Tony Houghton <[email protected]> wrote:
> On Tue, 15 Mar 2011 21:47:14 +0530
> Mohammed Shafi <[email protected]> wrote:
>
>> i dont know what to say, may be you can look at
>> vim /drivers/net/wireless/ath/ath9k/pci.c
>> and
>> vim /net/mac80211/pm.c
>> let me also see if i can get the problem reproduced
>
> When I installed compat-wireless it didn't replace all the modules it
> provides, only the ones I'd activated with driver-select ath9k. I didn't
> notice whether mac80211 drivers got replaced too. If I want to alter
> pm.c do I need to add anything to the driver-select line to make sure my
> altered version gets installed?
>
No, it won't be necessary. Recompilation takes care of it.

2011-03-02 16:30:59

by Tony Houghton

Subject: Re: [ath9k-devel] ath9k causes lockups since kernel 2.6.35

On Mon, 28 Feb 2011 11:06:54 +0530
Mohammed Shafi <[email protected]> wrote:

> On Sun, Feb 27, 2011 at 12:05 AM, Tony Houghton <[email protected]> wrote:
> > I tried to reconfigure rsyslog to sync on each message but it
> > didn't log anything useful :-(.
>
> I will try to reproduce here with my platform.

Is this still not reproducible? I'd like to do more to help trace the
problem but I'm a bit stuck. Is there something I can do to make sure
all the debug/log messages from ath9k appear on the console?

I'm using rsyslog ATM, which is Debian's default logger AFAIK, but I'm
willing to install something different if it will help capture these
messages.

2011-03-15 15:17:38

by Tony Houghton

Subject: Re: Could ath9k and rt2800pci bugs be related?

On Tue, 15 Mar 2011 19:05:27 +0530
Mohammed Shafi <[email protected]> wrote:

> On Tue, Mar 15, 2011 at 6:49 PM, Tony Houghton <[email protected]> wrote:
> > On Tue, 15 Mar 2011 13:17:40 +0530
> > Mohammed Shafi <[email protected]> wrote:
> >
> >> On Mon, Mar 14, 2011 at 10:18 PM, Tony Houghton <[email protected]>
> >> wrote:
> >> >
> >> > I've just bought an Acer Aspire Revo 3700 and this had very
> >> > similar symptoms with a different adapter. Luckily another
> >> > customer had posted about it on the vendor's website and his fix
> >> > works for me. The fix is to blacklist the rt2800pci module and
> >> > the rt2860sta driver seems to work quite happily without it.
> >>
> >> can you please check by disabling the suspicious rt modules and see
> >> whether this problem happens.
> >
> > Yes, I did blacklist rt2800pci and the system works correctly
> > without the module loaded. Even the wireless connection still works.
>
> Ok still there is also one or two guys reporting this locking issue,
> so we need to be very sure.

I'm afraid you misunderstood me. The rt28* issue is on a different
system with an Ralink adapter. AFAIK the actual rt28* modules have
nothing to do with the Atheros problem. I only made a connection because
the symptoms are so similar, and I thought it possible that the two
different drivers might share some code, but it's more likely to be a
coincidence.

Has anyone else reported the rt28* problem and/or are the developers
aware of it? I know I'm not the only affected person because I read
about it in a customer comment on the vendor's web site for the Acer
R3700.

> >> for quick check please try with the latest compat wireless.
> >
> > How do I do that?
> >
> > I would also be willing to add extra debugging messages to ath9k to
> > help track down the AR9285 problem. I'm a C programmer, but not a
> > kernel hacker so I think I would need some advice about which
> > functions to examine.
>
> No its nothing to do with kernel hacking, its just a wireless package.

For kernel hacking I meant I would like to experiment with the code
myself. If you can't reproduce the problem I think it would be very
helpful if I can make it print extra messages to narrow it down. If you
could tell me something like, "The shutdown process should start at
function X and end at function Y," I'll know better which code to
experiment with. BTW, all types of wireless shutdown seem to be
affected, whether I turn off the WAP, click Disconnect in network
manager, press the rfkill switch, suspend, shutdown or rmmod ath9k.

Should I use printk to print the debugging messages or something else?
As there is the debug module parameter I guess the latter. Is there also
some sort of sleep function which I can safely add after each debug
message to make sure the message is made visible before the crash?

2011-03-14 16:48:06

by Tony Houghton

Subject: Could ath9k and rt2800pci bugs be related?

[I am no longer subscribed to the list so please Cc me]

I'm the person who reported that my system kept freezing when shutting
down my AR9285 wireless with kernel 2.6.35 or newer.

I've just bought an Acer Aspire Revo 3700 and this had very similar
symptoms with a different adapter. Luckily another customer had posted
about it on the vendor's website and his fix works for me. The fix is to
blacklist the rt2800pci module and the rt2860sta driver seems to work
quite happily without it.

It looks as if rt2800pci was introduced somewhere between 2.6.32 and
2.6.34 and I experienced the crashes with 2.6.35 (Mint 10/Ubuntu 10.10)
and 2.6.37 (Debian unstable), but not with 2.6.32 (Debian squeeze) which
doesn't have that module.

It's probably just coincidence, but I was struck by how similar the
symptoms are and wondered whether these different drivers have anything
in common?

lspci output:

02:00.0 Network controller [0280]: RaLink RT3090 Wireless 802.11n 1T/1R PCIe [1814:3090]
Subsystem: Lite-On Communications Inc Device [11ad:6622]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at febf0000 (32-bit, non-prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: rt2860

2011-06-13 11:00:46

by Stanislaw Gruszka

Subject: Re: ath9k causes lockups since kernel 2.6.35

On Fri, Jun 10, 2011 at 02:48:00PM +0800, Adrian Chadd wrote:
> This patch against the current wireless-testing tree restores bit 6/7
> being set for the AR9285.
>
> Tony, would you please test this out and see if this works? This is
> against the latest wireless-testing.

cam, a Fedora user, confirms that the patch fixes the lock-ups:
https://bugzilla.redhat.com/show_bug.cgi?id=697157#c26

We are waiting for the fix now :-)

Thanks
Stanislaw

2011-06-27 10:34:02

by Adrian Chadd

Subject: Re: ath9k causes lockups since kernel 2.6.35

Hi guys,

This article just popped up, wrt Linux ASPM handling changes and some
rather negative effects..

http://www.phoronix.com/scan.php?page=article&item=linux_2638_aspm&num=2

The commit is: 2f671e2dbff6eb5ef4e2600adbec550c13b8fe72

From the article:

"Evidently, some BIOSes have their ASPM support misconfigured and thus
problems can arise if the PCI-E link power mode is dropped on an
unsupported device. There are a few mentions of hangs and other issues
under Linux associated with this power management feature. It's not
really a surprise though that the BIOSes would be misconfigured given
all of the other BIOS-related problems under Linux and the once very
poor suspend-and-resume support due to all of the workarounds and
hacks that BIOS/hardware vendors have done to cater towards Microsoft
Windows power management. In this case, it seems a large number of
mobile systems are supporting ASPM but not properly advertising the
support via the standard BIOS ACPI FADT (Fixed ACPI Description
Table). Some Linux drivers even forcibly disable ASPM on Linux (e.g.
this kernel patch)."

Would someone please take charge of testing an unmodified ath9k (ie,
without my ASPM disable fix) and try reverting this kernel patch?

Thanks,


Adrian

On 25 February 2011 04:02, Jonathan Nieder <[email protected]> wrote:
> (just cc-ing some people listed in MAINTAINERS)
> Hi,
>
> Tony Houghton wrote:
>
>> With 2.6.37 I can not use suspend on my Compaq/HP 311c (Intel Atom
>> N270/NVidia Ion LE). Originally the machine just kept locking up without
>> even blanking the display when I tried to suspend (using the GNOME menu
>> or by shutting the lid). I upgraded upower and gnome-power-manager etc
>> to experimental and after that the machine suspended OK but could not
>> resume. The backlight came on but the screen stayed blank and I could
>> not get to a console or anything with Alt+Fn.
> [...]
>> I tried replacing network-manager with wicd but that crashed the system
>> when it connected instead of when disconnected.
> [...]
>> While testing different kernels I found it would crash at different
>> times, usually before the screen turned off for suspending, but
>> sometimes it would crash on resuming and occasionally it locked up while
>> booting, but it's always a complete lock-up ie the keyboard is
>> completely unresponsive, including caps lock, the mouse won't move if the
>> display is still on, and the only way out is to hold down the power
>> button.
> [...]
>> I haven't tried looking in logs because the crashes are so severe I
>> don't think they'd be able to record anything useful. But using git
>> bisect I think I have tracked down the change that started causing this
>> problem:
>>
>> 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f is the first bad commit
>> commit 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
>> Author: Vivek Natarajan <[email protected]>
>> Date:   Mon Apr 5 14:48:04 2010 +0530
>>
>>     ath9k: Add support for newer AR9285 chipsets.
>>
>>     This patch adds support for a modified newer version of AR9285
>>     chipsets.
>>
>>     Signed-off-by: Vivek Natarajan <[email protected]>
>>     Signed-off-by: John W. Linville <[email protected]>
>
> The adaptor is an AR9285[1].
>
> That commit is based against v2.6.33 and was merged in v2.6.35-rc1
>
> $ git describe 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
> v2.6.33-3523-g53bc7aa
> $ git name-rev --tags 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f
> 53bc7aa08b48e5cd745f986731cc7dc24eef2a9f tags/v2.6.35-rc1~473^2~167^2~346
>
> Any ideas for tracking this down?
>
> Thanks,
> Jonathan
>
> [1]
>> 84: udi = '/org/freedesktop/Hal/devices/pci_168c_2b'
>>   pci.device_protocol = 0  (0x0)  (int)
>>   pci.vendor = 'Atheros Communications Inc.'  (string)
>>   info.vendor = 'Atheros Communications Inc.'  (string)
>>   pci.product = 'AR9285 Wireless Network Adapter
>> (PCI-Express)'  (string) linux.sysfs_path =
>> '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0'  (string)
>>   info.parent = '/org/freedesktop/Hal/devices/pci_10de_ac6'  (string)
>>   info.linux.driver = 'ath9k'  (string)
>>   pci.subsys_vendor = 'Hewlett-Packard Company'  (string)
>>   linux.hotplug_type = 2  (0x2)  (int)
>>   linux.subsystem = 'pci'  (string)
>>   info.subsystem = 'pci'  (string)
>>   info.product = 'AR9285 Wireless Network Adapter
>> (PCI-Express)'  (string) info.udi =
>> '/org/freedesktop/Hal/devices/pci_168c_2b'  (string)
>> pci.linux.sysfs_path =
>> '/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0'  (string)
>> pci.product_id = 43  (0x2b)  (int) pci.vendor_id = 5772  (0x168c)
>> (int) pci.subsys_product_id = 12352  (0x3040)  (int)
>> pci.subsys_vendor_id = 4156  (0x103c)  (int) pci.device_class = 2
>> (0x2)  (int) pci.device_subclass = 128  (0x80)  (int)

2011-06-10 06:48:01

by Adrian Chadd

Subject: Re: ath9k causes lockups since kernel 2.6.35

Hi all,

This patch against the current wireless-testing tree restores bit 6/7
being set for the AR9285.

Tony, would you please test this out and see if this works? This is
against the latest wireless-testing.

Thanks,


Adrian

diff --git a/drivers/net/wireless/ath/ath9k/ar9002_hw.c b/drivers/net/wireless/ath/ath9k/ar9002_hw.c
index f344cc2..5e4e37f 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_hw.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_hw.c
@@ -384,6 +384,7 @@ static void ar9002_hw_configpcipowersave(struct ath_hw *ah,
}
}

+#if 0
if (AR_SREV_9280(ah) || AR_SREV_9285(ah) || AR_SREV_9287(ah)) {
/*
* Disable bit 6 and 7 before entering D3 to
@@ -391,6 +392,7 @@ static void ar9002_hw_configpcipowersave(struct ath_hw *ah,
*/
val &= ~(AR_WA_BIT6 | AR_WA_BIT7);
}
+#endif

if (AR_SREV_9280(ah))
val |= AR_WA_BIT22;
diff --git a/drivers/net/wireless/ath/ath9k/reg.h b/drivers/net/wireless/ath/ath9k/reg.h
index c18ee99..a3c893d 100644
--- a/drivers/net/wireless/ath/ath9k/reg.h
+++ b/drivers/net/wireless/ath/ath9k/reg.h
@@ -704,7 +704,7 @@
#define AR_WA_ANALOG_SHIFT (1 << 20)
#define AR_WA_POR_SHORT (1 << 21) /* PCI-E Phy reset control */
#define AR_WA_BIT22 (1 << 22)
-#define AR9285_WA_DEFAULT 0x004a050b
+#define AR9285_WA_DEFAULT 0x004a05cb
#define AR9280_WA_DEFAULT 0x0040073b
#define AR_WA_DEFAULT 0x0000073f