2010-12-28 13:32:52

by Mario 'BitKoenig' Holbe

Subject: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

Hello,

on 2.6.37-rc7 the b43 driver crashes in hwrng_register(). This makes the
system virtually unusable since it appears to block networking syscalls.
For example, ifconfig never returns.
This issue does also exist in 2.6.37-rc5.
This issue does not exist in 2.6.36.2.

The hardware in question is:
02:00.0 Network controller [0280]: Broadcom Corporation BCM4312 802.11b/g LP-PHY [14e4:4315] (rev 01)
on a Lenovo Ideapad S12 with VIA Nano.

dmesg excerpt:
[ 2.056847] b43-pci-bridge 0000:02:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[ 2.056864] b43-pci-bridge 0000:02:00.0: setting latency timer to 64
...
[ 8.643695] b43-phy0: Broadcom 4312 WLAN found (core revision 15)
[ 9.047514] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[ 9.048441] Registered led device: b43-phy0::tx
[ 9.048479] Registered led device: b43-phy0::rx
[ 9.048518] Registered led device: b43-phy0::radio
[ 9.048542] Broadcom 43xx driver loaded [ Features: PMLS, Firmware-ID: FW13 ]
...
[ 24.312100] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
...
[ 29.848400] b43-pci-bridge 0000:02:00.0: PCI: Disallowing DAC for device
[ 29.848407] b43-phy0: DMA mask fallback from 64-bit to 32-bit
[ 29.868632] BUG: unable to handle kernel paging request at 907cde0c
[ 29.868640] IP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core]
[ 29.868655] *pde = 00000000
[ 29.868659] Oops: 0000 [#1] SMP
[ 29.868664] last sysfs file: /sys/bus/pci/drivers/parport_pc/uevent
[ 29.868670] Modules linked in: parport_pc ppdev lp parport sbs sbshc power_meter pci_slot hed fan container acpi_cpufreq mperf cpufreq_conservative cpufreq_userspace cpufreq_stats cpufreq_powersave dm_crypt fuse loop eeprom via_cputemp i2c_dev nvram padlock_aes aes_i586 aes_generic padlock_sha sha256_generic sha1_generic via_rng msr cpuid snd_hda_codec_realtek snd_hda_intel snd_hda_codec arc4 snd_hwdep ecb snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi b43 snd_rawmidi uvcvideo snd_seq_midi_event joydev videodev btusb snd_seq rng_core video ac battery tpm_tis v4l1_compat tpm tpm_bios output power_supply i2c_viapro snd_timer ideapad_laptop snd_seq_device serio_raw wmi mac80211 cfg80211 processor snd pcspkr i2c_core psmouse button bluetooth evdev shpchp soundcore snd_page_alloc rfkill pci_hotplug ext3 jbd mbcache raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod dm_mirror dm_region_hash dm_log dm_mod btrfs zlib_deflate crc32c libcrc32c sd_mod crc_t10dif ata_generic uhci_hcd pata_via libata ssb ehci_hcd tg3 scsi_mod usbcore pcmcia via_sdmmc mmc_core pcmcia_core libphy thermal thermal_sys nls_base [last unloaded: scsi_wait_scan]
[ 29.868810]
[ 29.868816] Pid: 1781, comm: NetworkManager Not tainted 2.6.37-rc7-686 #1 MoutCook/20021,2959
[ 29.868822] EIP: 0060:[<f8d543cc>] EFLAGS: 00010286 CPU: 0
[ 29.868829] EIP is at hwrng_register+0x4c/0x139 [rng_core]
[ 29.868834] EAX: 00000001 EBX: f4b17010 ECX: f6e5db6c EDX: f4b17035
[ 29.868839] ESI: 907cddf0 EDI: 00000000 EBP: 00000036 ESP: f6e5db54
[ 29.868844] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 29.868850] Process NetworkManager (pid: 1781, ti=f6e5c000 task=f6eb6080 task.ti=f6e5c000)
[ 29.868854] Stack:
[ 29.868856] f4b16fc0 f4b17035 f8e5a870 f4b17035 0000001f f8e70095 f8e6f9ca f4b71e70
[ 29.868866] 0000000f f6c95000 f6c95000 f6e97400 f4b162c0 f4b10240 f4b16fc8 f8e5ad67
[ 29.868875] f89e43da f4b162c0 f6cab400 f8b80e44 f6cab000 f8b70889 f8b6fe7a 00000000
[ 29.868884] Call Trace:
[ 29.868909] [<f8e5a870>] ? b43_wireless_core_init+0xd0c/0xdd6 [b43]
[ 29.868925] [<f8e5ad67>] ? b43_op_start+0xf8/0x142 [b43]
[ 29.868947] [<f89e43da>] ? cfg80211_netdev_notifier_call+0x342/0x355 [cfg80211]
[ 29.868984] [<f8b70889>] ? ieee80211_do_open+0xed/0x45f [mac80211]
[ 29.869002] [<f8b6fe7a>] ? ieee80211_check_concurrent_iface+0x1c/0x135 [mac80211]
[ 29.869015] [<c11edcba>] ? __dev_open+0x7d/0xa7
[ 29.869022] [<c11ec683>] ? __dev_change_flags+0x9a/0x10d
[ 29.869028] [<c11edc12>] ? dev_change_flags+0x10/0x3b
[ 29.869036] [<c11f7c77>] ? do_setlink+0x23e/0x532
[ 29.869044] [<c11f803b>] ? rtnl_setlink+0xd0/0xe1
[ 29.869058] [<c1145b00>] ? __strncpy_from_user+0x1d/0x2b
[ 29.869064] [<c11f7f6b>] ? rtnl_setlink+0x0/0xe1
[ 29.869069] [<c11f77a2>] ? rtnetlink_rcv_msg+0x186/0x19c
[ 29.869075] [<c11f761c>] ? rtnetlink_rcv_msg+0x0/0x19c
[ 29.869082] [<c1206818>] ? netlink_rcv_skb+0x2d/0x72
[ 29.869088] [<c11f7616>] ? rtnetlink_rcv+0x18/0x1e
[ 29.869093] [<c120666c>] ? netlink_unicast+0xba/0x10e
[ 29.869099] [<c1207170>] ? netlink_sendmsg+0x23d/0x256
[ 29.869111] [<c11dfe26>] ? __sock_sendmsg+0x48/0x4e
[ 29.869117] [<c11e008f>] ? sock_sendmsg+0x78/0x8f
[ 29.869123] [<c11e008f>] ? sock_sendmsg+0x78/0x8f
[ 29.869131] [<c10c6785>] ? d_kill+0x38/0x3d
[ 29.869141] [<c11e7f0c>] ? verify_iovec+0x3d/0x79
[ 29.869147] [<c11e088d>] ? sys_sendmsg+0x15f/0x1c1
[ 29.869153] [<c11e04c4>] ? sockfd_lookup_light+0x13/0x3f
[ 29.869160] [<c11e0b25>] ? sys_sendto+0xfd/0x121
[ 29.869166] [<c11e43eb>] ? sk_prot_alloc+0x62/0xd6
[ 29.869174] [<c1001e6e>] ? __switch_to+0x6f/0xe2
[ 29.869183] [<c12860de>] ? schedule+0x579/0x5b6
[ 29.869190] [<c11e0723>] ? sys_recvmsg+0x3c/0x47
[ 29.869196] [<c11e1afd>] ? sys_socketcall+0x17f/0x1cb
[ 29.869202] [<c1002f9f>] ? sysenter_do_call+0x12/0x28
[ 29.869206] Code: f8 e8 46 25 53 c8 8b 35 ec 45 d5 f8 eb 1a 8b 13 8b 06 e8 17 11 3f c8 85 c0 75 0a be ef ff ff ff e9 d3 00 00 00 8b 76 1c 83 ee 1c <8b> 46 1c 0f 18 00 90 81 fe d0 45 d5 f8 75 d4 83 3d ec 47 d5 f8
[ 29.869249] EIP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core] SS:ESP 0068:f6e5db54
[ 29.869259] CR2: 00000000907cde0c
[ 29.869264] ---[ end trace 6719399ed79e8cc1 ]---


regards
Mario
--
To err is human. To really foul things up requires a computer.


2010-12-29 00:40:57

by Larry Finger

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

Mario Holbe wrote:

> on 2.6.37-rc7 the b43 driver crashes in hwrng_register(). This makes the
> system virtually unusable since it appears to block networking syscalls.
> For example, ifconfig never returns.
> This issue does also exist in 2.6.37-rc5.
> This issue does not exist in 2.6.36.2.
>
> The hardware in question is:
> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4312 802.11b/g LP-PHY [14e4:4315] (rev 01)
> on a Lenovo Ideapad S12 with VIA Nano.

> dmesg excerpt:
> [ 2.056847] b43-pci-bridge 0000:02:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
> [ 2.056864] b43-pci-bridge 0000:02:00.0: setting latency timer to 64
...
> [ 8.643695] b43-phy0: Broadcom 4312 WLAN found (core revision 15)
> [ 9.047514] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
> [ 9.048441] Registered led device: b43-phy0::tx
> [ 9.048479] Registered led device: b43-phy0::rx
> [ 9.048518] Registered led device: b43-phy0::radio
> [ 9.048542] Broadcom 43xx driver loaded [ Features: PMLS, Firmware-ID: FW13 ]
...
> [ 24.312100] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
...
> [ 29.848400] b43-pci-bridge 0000:02:00.0: PCI: Disallowing DAC for device
> [ 29.848407] b43-phy0: DMA mask fallback from 64-bit to 32-bit
> [ 29.868632] BUG: unable to handle kernel paging request at 907cde0c
> [ 29.868640] IP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core]
> [ 29.868655] *pde = 00000000
> [ 29.868659] Oops: 0000 [#1] SMP
> [ 29.868664] last sysfs file: /sys/bus/pci/drivers/parport_pc/uevent
> [ 29.868670] Modules linked in: parport_pc ppdev lp parport sbs sbshc power_meter pci_slot hed fan container acpi_cpufreq mperf cpufreq_conservative cpufreq_userspace cpufreq_stats cpufreq_powersave dm_crypt fuse loop eeprom via_cputemp i2c_dev nvram padlock_aes aes_i586 aes_generic padlock_sha sha256_generic sha1_generic via_rng msr cpuid snd_hda_codec_realtek snd_hda_intel snd_hda_codec arc4 snd_hwdep ecb snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi b43 snd_rawmidi uvcvideo snd_seq_midi_event joydev videodev btusb snd_seq rng_core video ac battery tpm_tis v4l1_compat tpm tpm_bios output power_supply i2c_viapro snd_timer ideapad_laptop snd_seq_device serio_raw wmi mac80211 cfg80211 processor snd pcspkr i2c_core psmouse button bluetooth evdev shpchp soundcore snd_page_alloc rfkill pci_hotplug ext3 jbd mbcache raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod dm_mirror dm_region_hash dm_log dm_mod btrfs zlib_deflate crc32c libcrc32c sd_mod crc_t10dif ata_generic uhci_hcd pata_via libata ssb ehci_hcd tg3 scsi_mod usbcore pcmcia via_sdmmc mmc_core pcmcia_core libphy thermal thermal_sys nls_base [last unloaded: scsi_wait_scan]
> [ 29.868810]
> [ 29.868816] Pid: 1781, comm: NetworkManager Not tainted 2.6.37-rc7-686 #1 MoutCook/20021,2959
> [ 29.868822] EIP: 0060:[<f8d543cc>] EFLAGS: 00010286 CPU: 0
> [ 29.868829] EIP is at hwrng_register+0x4c/0x139 [rng_core]
> [ 29.868834] EAX: 00000001 EBX: f4b17010 ECX: f6e5db6c EDX: f4b17035
> [ 29.868839] ESI: 907cddf0 EDI: 00000000 EBP: 00000036 ESP: f6e5db54
> [ 29.868844] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 29.868850] Process NetworkManager (pid: 1781, ti=f6e5c000 task=f6eb6080 task.ti=f6e5c000)
> [ 29.868854] Stack:
> [ 29.868856] f4b16fc0 f4b17035 f8e5a870 f4b17035 0000001f f8e70095 f8e6f9ca f4b71e70
> [ 29.868866] 0000000f f6c95000 f6c95000 f6e97400 f4b162c0 f4b10240 f4b16fc8 f8e5ad67
> [ 29.868875] f89e43da f4b162c0 f6cab400 f8b80e44 f6cab000 f8b70889 f8b6fe7a 00000000
> [ 29.868884] Call Trace:
> [ 29.868909] [<f8e5a870>] ? b43_wireless_core_init+0xd0c/0xdd6 [b43]
> [ 29.868925] [<f8e5ad67>] ? b43_op_start+0xf8/0x142 [b43]
> [ 29.868947] [<f89e43da>] ? cfg80211_netdev_notifier_call+0x342/0x355 [cfg80211]
> [ 29.868984] [<f8b70889>] ? ieee80211_do_open+0xed/0x45f [mac80211]
> [ 29.869002] [<f8b6fe7a>] ? ieee80211_check_concurrent_iface+0x1c/0x135 [mac80211]
> [ 29.869015] [<c11edcba>] ? __dev_open+0x7d/0xa7
> [ 29.869022] [<c11ec683>] ? __dev_change_flags+0x9a/0x10d
> [ 29.869028] [<c11edc12>] ? dev_change_flags+0x10/0x3b
> [ 29.869036] [<c11f7c77>] ? do_setlink+0x23e/0x532
> [ 29.869044] [<c11f803b>] ? rtnl_setlink+0xd0/0xe1
> [ 29.869058] [<c1145b00>] ? __strncpy_from_user+0x1d/0x2b
> [ 29.869064] [<c11f7f6b>] ? rtnl_setlink+0x0/0xe1
> [ 29.869069] [<c11f77a2>] ? rtnetlink_rcv_msg+0x186/0x19c
> [ 29.869075] [<c11f761c>] ? rtnetlink_rcv_msg+0x0/0x19c
> [ 29.869082] [<c1206818>] ? netlink_rcv_skb+0x2d/0x72
> [ 29.869088] [<c11f7616>] ? rtnetlink_rcv+0x18/0x1e
> [ 29.869093] [<c120666c>] ? netlink_unicast+0xba/0x10e
> [ 29.869099] [<c1207170>] ? netlink_sendmsg+0x23d/0x256
> [ 29.869111] [<c11dfe26>] ? __sock_sendmsg+0x48/0x4e
> [ 29.869117] [<c11e008f>] ? sock_sendmsg+0x78/0x8f
> [ 29.869123] [<c11e008f>] ? sock_sendmsg+0x78/0x8f
> [ 29.869131] [<c10c6785>] ? d_kill+0x38/0x3d
> [ 29.869141] [<c11e7f0c>] ? verify_iovec+0x3d/0x79
> [ 29.869147] [<c11e088d>] ? sys_sendmsg+0x15f/0x1c1
> [ 29.869153] [<c11e04c4>] ? sockfd_lookup_light+0x13/0x3f
> [ 29.869160] [<c11e0b25>] ? sys_sendto+0xfd/0x121
> [ 29.869166] [<c11e43eb>] ? sk_prot_alloc+0x62/0xd6
> [ 29.869174] [<c1001e6e>] ? __switch_to+0x6f/0xe2
> [ 29.869183] [<c12860de>] ? schedule+0x579/0x5b6
> [ 29.869190] [<c11e0723>] ? sys_recvmsg+0x3c/0x47
> [ 29.869196] [<c11e1afd>] ? sys_socketcall+0x17f/0x1cb
> [ 29.869202] [<c1002f9f>] ? sysenter_do_call+0x12/0x28
> [ 29.869206] Code: f8 e8 46 25 53 c8 8b 35 ec 45 d5 f8 eb 1a 8b 13 8b 06 e8 17 11 3f c8 85 c0 75 0a be ef ff ff ff e9 d3 00 00 00 8b 76 1c 83 ee 1c <8b> 46 1c 0f 18 00 90 81 fe d0 45 d5 f8 75 d4 83 3d ec 47 d5 f8
> [ 29.869249] EIP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core] SS:ESP 0068:f6e5db54
> [ 29.869259] CR2: 00000000907cde0c
> [ 29.869264] ---[ end trace 6719399ed79e8cc1 ]---

I almost missed this posting. Please post wireless problems with
[email protected] for better visibility.

I have a BCM4312 (14e4:4315) on a netbook that does not have this problem, thus
I will have to rely on your debugging. An additional difficulty is that the only
changes to b43 between 2.6.36 and 2.6.37 are adding an additional PCI ID, some
fixes to the SDIO driver, and some code for an 802.11n device. None of these
should affect your 802.11 b/g unit.

Is it possible for you to bisect between 2.6.36 and 2.6.37-rc5? I wish I could
suggest some way to minimize the number of commits and builds, but the problem
could be anywhere.

Larry

2010-12-29 10:30:33

by Maciej Rutecki

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

(CC added)

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=25812
for your bug report, please add your address to the CC list in there, thanks!


On wtorek, 28 grudnia 2010 o 14:32:29 Mario 'BitKoenig' Holbe wrote:
> Hello,
>
> on 2.6.37-rc7 the b43 driver crashes in hwrng_register(). This makes the
> system virtually unusable since it appears to block networking syscalls.
> For example, ifconfig never returns.
> This issue does also exist in 2.6.37-rc5.
> This issue does not exist in 2.6.36.2.
>
> The hardware in question is:
> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4312 802.11b/g
> LP-PHY [14e4:4315] (rev 01) on a Lenovo Ideapad S12 with VIA Nano.
>
> dmesg excerpt:
> [ 2.056847] b43-pci-bridge 0000:02:00.0: PCI INT A -> GSI 28 (level,
> low) -> IRQ 28 [ 2.056864] b43-pci-bridge 0000:02:00.0: setting latency
> timer to 64 ...
> [ 8.643695] b43-phy0: Broadcom 4312 WLAN found (core revision 15)
> [ 9.047514] ieee80211 phy0: Selected rate control algorithm
> 'minstrel_ht' [ 9.048441] Registered led device: b43-phy0::tx
> [ 9.048479] Registered led device: b43-phy0::rx
> [ 9.048518] Registered led device: b43-phy0::radio
> [ 9.048542] Broadcom 43xx driver loaded [ Features: PMLS, Firmware-ID:
> FW13 ] ...
> [ 24.312100] b43-phy0: Loading firmware version 410.2160 (2007-05-26
> 15:32:10) ...
> [ 29.848400] b43-pci-bridge 0000:02:00.0: PCI: Disallowing DAC for device
> [ 29.848407] b43-phy0: DMA mask fallback from 64-bit to 32-bit
> [ 29.868632] BUG: unable to handle kernel paging request at 907cde0c
> [ 29.868640] IP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core]
> [ 29.868655] *pde = 00000000
> [ 29.868659] Oops: 0000 [#1] SMP
> [ 29.868664] last sysfs file: /sys/bus/pci/drivers/parport_pc/uevent
> [ 29.868670] Modules linked in: parport_pc ppdev lp parport sbs sbshc
> power_meter pci_slot hed fan container acpi_cpufreq mperf
> cpufreq_conservative cpufreq_userspace cpufreq_stats cpufreq_powersave
> dm_crypt fuse loop eeprom via_cputemp i2c_dev nvram padlock_aes aes_i586
> aes_generic padlock_sha sha256_generic sha1_generic via_rng msr cpuid
> snd_hda_codec_realtek snd_hda_intel snd_hda_codec arc4 snd_hwdep ecb
> snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi b43 snd_rawmidi uvcvideo
> snd_seq_midi_event joydev videodev btusb snd_seq rng_core video ac battery
> tpm_tis v4l1_compat tpm tpm_bios output power_supply i2c_viapro snd_timer
> ideapad_laptop snd_seq_device serio_raw wmi mac80211 cfg80211 processor
> snd pcspkr i2c_core psmouse button bluetooth evdev shpchp soundcore
> snd_page_alloc rfkill pci_hotplug ext3 jbd mbcache raid10 raid456
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
> raid1 raid0 multipath linear md_mod dm_mirror dm_region_hash dm_log dm_mod
> btrfs zlib_deflate crc32c libcrc32c sd_mod crc_t10dif ata_generic
> uhci_hcd pata_via libata ssb ehci_hcd tg3 scsi_mod usbcore pcmcia
> via_sdmmc mmc_core pcmcia_core libphy thermal thermal_sys nls_base [last
> unloaded: scsi_wait_scan] [ 29.868810]
> [ 29.868816] Pid: 1781, comm: NetworkManager Not tainted 2.6.37-rc7-686
> #1 MoutCook/20021,2959 [ 29.868822] EIP: 0060:[<f8d543cc>] EFLAGS:
> 00010286 CPU: 0
> [ 29.868829] EIP is at hwrng_register+0x4c/0x139 [rng_core]
> [ 29.868834] EAX: 00000001 EBX: f4b17010 ECX: f6e5db6c EDX: f4b17035
> [ 29.868839] ESI: 907cddf0 EDI: 00000000 EBP: 00000036 ESP: f6e5db54
> [ 29.868844] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 29.868850] Process NetworkManager (pid: 1781, ti=f6e5c000 task=f6eb6080
> task.ti=f6e5c000) [ 29.868854] Stack:
> [ 29.868856] f4b16fc0 f4b17035 f8e5a870 f4b17035 0000001f f8e70095
> f8e6f9ca f4b71e70 [ 29.868866] 0000000f f6c95000 f6c95000 f6e97400
> f4b162c0 f4b10240 f4b16fc8 f8e5ad67 [ 29.868875] f89e43da f4b162c0
> f6cab400 f8b80e44 f6cab000 f8b70889 f8b6fe7a 00000000 [ 29.868884] Call
> Trace:
> [ 29.868909] [<f8e5a870>] ? b43_wireless_core_init+0xd0c/0xdd6 [b43]
> [ 29.868925] [<f8e5ad67>] ? b43_op_start+0xf8/0x142 [b43]
> [ 29.868947] [<f89e43da>] ? cfg80211_netdev_notifier_call+0x342/0x355
> [cfg80211] [ 29.868984] [<f8b70889>] ? ieee80211_do_open+0xed/0x45f
> [mac80211] [ 29.869002] [<f8b6fe7a>] ?
> ieee80211_check_concurrent_iface+0x1c/0x135 [mac80211] [ 29.869015]
> [<c11edcba>] ? __dev_open+0x7d/0xa7
> [ 29.869022] [<c11ec683>] ? __dev_change_flags+0x9a/0x10d
> [ 29.869028] [<c11edc12>] ? dev_change_flags+0x10/0x3b
> [ 29.869036] [<c11f7c77>] ? do_setlink+0x23e/0x532
> [ 29.869044] [<c11f803b>] ? rtnl_setlink+0xd0/0xe1
> [ 29.869058] [<c1145b00>] ? __strncpy_from_user+0x1d/0x2b
> [ 29.869064] [<c11f7f6b>] ? rtnl_setlink+0x0/0xe1
> [ 29.869069] [<c11f77a2>] ? rtnetlink_rcv_msg+0x186/0x19c
> [ 29.869075] [<c11f761c>] ? rtnetlink_rcv_msg+0x0/0x19c
> [ 29.869082] [<c1206818>] ? netlink_rcv_skb+0x2d/0x72
> [ 29.869088] [<c11f7616>] ? rtnetlink_rcv+0x18/0x1e
> [ 29.869093] [<c120666c>] ? netlink_unicast+0xba/0x10e
> [ 29.869099] [<c1207170>] ? netlink_sendmsg+0x23d/0x256
> [ 29.869111] [<c11dfe26>] ? __sock_sendmsg+0x48/0x4e
> [ 29.869117] [<c11e008f>] ? sock_sendmsg+0x78/0x8f
> [ 29.869123] [<c11e008f>] ? sock_sendmsg+0x78/0x8f
> [ 29.869131] [<c10c6785>] ? d_kill+0x38/0x3d
> [ 29.869141] [<c11e7f0c>] ? verify_iovec+0x3d/0x79
> [ 29.869147] [<c11e088d>] ? sys_sendmsg+0x15f/0x1c1
> [ 29.869153] [<c11e04c4>] ? sockfd_lookup_light+0x13/0x3f
> [ 29.869160] [<c11e0b25>] ? sys_sendto+0xfd/0x121
> [ 29.869166] [<c11e43eb>] ? sk_prot_alloc+0x62/0xd6
> [ 29.869174] [<c1001e6e>] ? __switch_to+0x6f/0xe2
> [ 29.869183] [<c12860de>] ? schedule+0x579/0x5b6
> [ 29.869190] [<c11e0723>] ? sys_recvmsg+0x3c/0x47
> [ 29.869196] [<c11e1afd>] ? sys_socketcall+0x17f/0x1cb
> [ 29.869202] [<c1002f9f>] ? sysenter_do_call+0x12/0x28
> [ 29.869206] Code: f8 e8 46 25 53 c8 8b 35 ec 45 d5 f8 eb 1a 8b 13 8b 06
> e8 17 11 3f c8 85 c0 75 0a be ef ff ff ff e9 d3 00 00 00 8b 76 1c 83 ee 1c
> <8b> 46 1c 0f 18 00 90 81 fe d0 45 d5 f8 75 d4 83 3d ec 47 d5 f8 [
> 29.869249] EIP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core] SS:ESP
> 0068:f6e5db54 [ 29.869259] CR2: 00000000907cde0c
> [ 29.869264] ---[ end trace 6719399ed79e8cc1 ]---
>
>
> regards
> Mario

--
Maciej Rutecki
http://www.maciek.unixy.pl

2010-12-29 19:55:10

by Mario 'BitKoenig' Holbe

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

Hello Larry,

On Tue, Dec 28, 2010 at 06:34:08PM -0600, Larry Finger wrote:
> Mario Holbe wrote:
> > on 2.6.37-rc7 the b43 driver crashes in hwrng_register(). This makes the
...
> > This issue does also exist in 2.6.37-rc5.
> > This issue does not exist in 2.6.36.2.
...
> > [ 29.868632] BUG: unable to handle kernel paging request at 907cde0c
> > [ 29.868640] IP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core]
...
> > [ 29.868884] Call Trace:
> > [ 29.868909] [<f8e5a870>] ? b43_wireless_core_init+0xd0c/0xdd6 [b43]
>
> I almost missed this posting.

You're welcome :)

> Please post wireless problems with
> [email protected] for better visibility.

Sorry and thanks for completing the CC: list.

> I have a BCM4312 (14e4:4315) on a netbook that does not have this problem, thus
> I will have to rely on your debugging. An additional difficulty is that the only
> changes to b43 between 2.6.36 and 2.6.37 are adding an additional PCI ID, some
> fixes to the SDIO driver, and some code for an 802.11n device. None of these
> should affect your 802.11 b/g unit.
>
> Is it possible for you to bisect between 2.6.36 and 2.6.37-rc5? I wish I could
> suggest some way to minimize the number of commits and builds, but the problem
> could be anywhere.

To be honest, I have never bisected such a large number of commits before and
I'm somewhat afraid of doing it.

However, I think I'm able to nail the issue down to:
commit 84c164a34ffe67908a932a2d641ec1a80c2d5435 which went to 2.6.37-rc1.
Author: John W. Linville <[email protected]>
Date: Fri Aug 6 15:31:45 2010 -0400

b43: move hwrng registration driver to wireless core initialization

Message-ID: <[email protected]>
http://marc.info/?l=linux-wireless&m=128112658829379&w=2

I did 2 things:
1. I (manually) reverted 84c164a34ffe67908a932a2d641ec1a80c2d5435 from
2.6.37-rc7: The crash disappears, b43 is useable.
2. I added 84c164a34ffe67908a932a2d641ec1a80c2d5435 to 2.6.36.2: The
crash shows up as with vanilla 2.6.37-rc7.

I'm not sure why this is not reproducible for you, probably it has
something to do with the VIA Nano having a second HW-RNG driven by
via-rng. I experienced crashes in the past with earlier kernels when I
tried to move RNGs around via /sys/devices/virtual/misc/hw_random, but
never took the time to trace them down since I just got it working :)

Oh, I'm still able to trigger a crash with
$ cat /sys/devices/virtual/misc/hw_random/rng_available
on 2.6.37-rc7 without 84c164a34ffe67908a932a2d641ec1a80c2d5435 as well
as on vanilla 2.6.36.2. Probably this is (better) reproducible for you?

I suspect that both crashes (the 84c164a34ffe67908a932a2d641ec1a80c2d5435
crash as well as the cat rng_available crash) have something to do with a
partially uninitialized rng struct, or rather with parts of the rng struct
being freed too early (i.e. within its lifetime).
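
The suspected failure mode can be sketched in user space. This is only a
minimal illustration, not the kernel's code: struct rng_sketch and all
sketch_* functions are hypothetical names, and a "live" flag stands in for
freed memory so that the example stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered RNG; not the kernel's struct hwrng. */
struct rng_sketch {
    const char *name;
    bool live;                 /* false once the owner has released it */
    struct rng_sketch *next;   /* singly linked registration list */
};

static struct rng_sketch *rng_list_head;

/* Link an entry at the head of the registration list. */
static void sketch_register(struct rng_sketch *rng)
{
    assert(rng != NULL);
    rng->live = true;
    rng->next = rng_list_head;
    rng_list_head = rng;
}

/* Correct teardown: unlink the entry first, then mark it released. */
static void sketch_unregister(struct rng_sketch *rng)
{
    struct rng_sketch **pp = &rng_list_head;

    while (*pp && *pp != rng)
        pp = &(*pp)->next;
    if (*pp)
        *pp = rng->next;
    rng->live = false;
}

/* Buggy teardown: the owner releases the entry but leaves it linked. */
static void sketch_release_without_unlink(struct rng_sketch *rng)
{
    rng->live = false;
}

/* Count entries that are still linked although already released. */
static int sketch_count_stale(void)
{
    int stale = 0;

    for (struct rng_sketch *p = rng_list_head; p; p = p->next)
        if (!p->live)
            stale++;
    return stale;
}
```

Registering two entries and then calling sketch_release_without_unlink()
on one of them leaves sketch_count_stale() at 1. In the kernel there is no
such flag: the memory may be reused, and the next walk over the list
follows whatever bytes now sit where the link pointer used to be.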


regards
Mario
--
Doing it right is no excuse for not meeting the schedule.
-- Plant Manager, Delphi Corporation



2010-12-30 00:30:44

by Larry Finger

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On 12/29/2010 01:54 PM, Mario 'BitKoenig' Holbe wrote:
> Hello Larry,
>
> On Tue, Dec 28, 2010 at 06:34:08PM -0600, Larry Finger wrote:
>> Mario Holbe wrote:
>>> on 2.6.37-rc7 the b43 driver crashes in hwrng_register(). This makes the
> ...
>>> This issue does also exist in 2.6.37-rc5.
>>> This issue does not exist in 2.6.36.2.
> ...
>>> [ 29.868632] BUG: unable to handle kernel paging request at 907cde0c
>>> [ 29.868640] IP: [<f8d543cc>] hwrng_register+0x4c/0x139 [rng_core]
> ...
>>> [ 29.868884] Call Trace:
>>> [ 29.868909] [<f8e5a870>] ? b43_wireless_core_init+0xd0c/0xdd6 [b43]
>>
>> I almost missed this posting.
>
> You're welcome :)
>
>> Please post wireless problems with
>> [email protected] for better visibility.
>
> Sorry and thanks for completing the CC: list.
>
>> I have a BCM4312 (14e4:4315) on a netbook that does not have this problem, thus
>> I will have to rely on your debugging. An additional difficulty is that the only
>> changes to b43 between 2.6.36 and 2.6.37 are adding an additional PCI ID, some
>> fixes to the SDIO driver, and some code for an 802.11n device. None of these
>> should affect your 802.11 b/g unit.
>>
>> Is it possible for you to bisect between 2.6.36 and 2.6.37-rc5? I wish I could
>> suggest some way to minimize the number of commits and builds, but the problem
>> could be anywhere.
>
> To be honest, I never bisected such a huge amount of commits before and
> I'm somewhat afraid of doing it.
>
> However, I think I'm able to nail the issue down to:
> commit 84c164a34ffe67908a932a2d641ec1a80c2d5435 which went to 2.6.37-rc1.
> Author: John W. Linville <[email protected]>
> Date: Fri Aug 6 15:31:45 2010 -0400
>
> b43: move hwrng registration driver to wireless core initialization
>
> Message-ID: <[email protected]>
> http://marc.info/?l=linux-wireless&m=128112658829379&w=2
>
> I did 2 things:
> 1. I (manually) reverted 84c164a34ffe67908a932a2d641ec1a80c2d5435 from
> 2.6.37-rc7: The crash disappears, b43 is useable.
> 2. I added 84c164a34ffe67908a932a2d641ec1a80c2d5435 to 2.6.36.2: The
> crash shows up as with vanilla 2.6.37-rc7.
>
> I'm not sure why this is not reproducible for you, probably it has
> something to do with the VIA Nano having a second HW-RNG driven by
> via-rng. I experienced crashes in the past with earlier kernels when I
> tried to move RNGs around via /sys/devices/virtual/misc/hw_random, but
> never took the time to trace them down since I just got it working :)
>
> Oh, I'm still able to trigger a crash with
> $ cat /sys/devices/virtual/misc/hw_random/rng_available
> on 2.6.37-rc7 without 84c164a34ffe67908a932a2d641ec1a80c2d5435 as well
> as on vanilla 2.6.36.2. Probably this is (better) reproducible for you?
>
> I suspect both (the 84c164a34ffe67908a932a2d641ec1a80c2d5435 crash as
> well as the cat rng_available crash) having something to do with a
> partially uninitialized rng-struct, or better: parts of the rng-struct
> that are free()d too early (i.e. within its lifetime).

Thanks for finding the problem. Obviously, I did not go back far enough in the
record to find the commit that you implicate.

Please show the output of 'egrep "B43|RNG|RANDOM" .config'.

It should not matter, but please try the attached patch.

Larry


Attachments:
b43_fix_hwrng_not_enabled (1.56 kB)

2010-12-30 01:20:19

by Mario 'BitKoenig' Holbe

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Wed, Dec 29, 2010 at 06:30:40PM -0600, Larry Finger wrote:
> On 12/29/2010 01:54 PM, Mario 'BitKoenig' Holbe wrote:
> > I did 2 things:
> > 1. I (manually) reverted 84c164a34ffe67908a932a2d641ec1a80c2d5435 from
> > 2.6.37-rc7: The crash disappears, b43 is useable.
> > 2. I added 84c164a34ffe67908a932a2d641ec1a80c2d5435 to 2.6.36.2: The
> > crash shows up as with vanilla 2.6.37-rc7.
>
> Please show the output of "egrep "B43|RNG|RANDOM" .config".

CONFIG_B43=m
CONFIG_B43_PCI_AUTOSELECT=y
CONFIG_B43_PCICORE_AUTOSELECT=y
CONFIG_B43_PCMCIA=y
CONFIG_B43_SDIO=y
CONFIG_B43_PIO=y
CONFIG_B43_PHY_LP=y
CONFIG_B43_LEDS=y
CONFIG_B43_HWRNG=y
# CONFIG_B43_DEBUG is not set
CONFIG_B43LEGACY=m
CONFIG_B43LEGACY_PCI_AUTOSELECT=y
CONFIG_B43LEGACY_PCICORE_AUTOSELECT=y
CONFIG_B43LEGACY_LEDS=y
CONFIG_B43LEGACY_HWRNG=y
CONFIG_B43LEGACY_DEBUG=y
CONFIG_B43LEGACY_DMA=y
CONFIG_B43LEGACY_PIO=y
CONFIG_B43LEGACY_DMA_AND_PIO_MODE=y
# CONFIG_B43LEGACY_DMA_MODE is not set
# CONFIG_B43LEGACY_PIO_MODE is not set
CONFIG_HW_RANDOM=m
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_HW_RANDOM_GEODE=m
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_SSB_B43_PCI_BRIDGE=y
CONFIG_CRYPTO_RNG=m
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DEV_HIFN_795X_RNG=y

> It should not matter, but please try the attached patch.

It will surely not matter: if CONFIG_B43_HWRNG had not been defined,
hwrng_register() would not have been reached in the dump from my first
mail.
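
This argument rests on how an optional feature is usually compiled in or
out. A minimal sketch of that pattern, assuming nothing about b43's actual
source: fake_hwrng_register() and sketch_rng_init() are hypothetical
stand-ins, and the #define replaces the Kconfig machinery.

```c
/* Stand-in for the Kconfig symbol; remove this line to model a kernel
 * built without the option. */
#define CONFIG_B43_HWRNG 1

static int register_calls;   /* counts calls to the fake registration */

/* Hypothetical stand-in for hwrng_register(); always succeeds. */
static int fake_hwrng_register(void)
{
    register_calls++;
    return 0;
}

#ifdef CONFIG_B43_HWRNG
/* Option enabled: initialization goes through the registration call. */
static int sketch_rng_init(void)
{
    return fake_hwrng_register();
}
#else
/* Option disabled: the stub does nothing and never registers. */
static inline int sketch_rng_init(void)
{
    return 0;
}
#endif
```

With the symbol defined, one call to sketch_rng_init() raises
register_calls to 1. With the #define removed, the counter stays at 0, so
a backtrace that ends in the registration function implies that the option
was enabled.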

If you would really like me to try that patch, I'll do so when I'm awake again
and will then answer you that nothing has changed :)


Mario
--
It is a capital mistake to theorize before one has data.
Insensibly one begins to twist facts to suit theories instead of theories
to suit facts. -- Sherlock Holmes by Arthur Conan Doyle



2010-12-30 02:37:14

by Larry Finger

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On 12/29/2010 07:20 PM, Mario 'BitKoenig' Holbe wrote:
>
> It will surely not matter: if CONFIG_B43_HWRNG would not have been
> defined, hwrng_register() would not have been reached in the dump from
> my first mail.
>
> If you really like me to try that patch, I'll do so when I'm awake again
> and will then answer you that nothing has changed :)

No, don't bother. I do have a different request. The byte counts for my 32-bit
system do not match yours. Could you please use the following command to find
the instructions that are failing?

objdump -l -d drivers/char/hw_random/core.o | less

Use the search to find the start of hwrng_register, then add 0x4c to the
starting address. Once I see the instruction that is failing, I should be able
to find where the failure occurs.

The order in which things are registered should not cause an error, but who knows?

Larry

2010-12-30 14:34:21

by Mario 'BitKoenig' Holbe

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Wed, Dec 29, 2010 at 08:37:10PM -0600, Larry Finger wrote:
> No, don't bother. I do have a different request. The byte counts for my 32-bit
> system do not match yours. Could you please use the following command to find
> the instructions that are failing?
>
> objdump -l -d drivers/char/hw_random/core.o | less
>
> Use the search to find the start of hwrng_register, then add 0x4c to the
> starting address. Once I see the instruction that is failing, I should be able
> to find where the failure occurs.

Alright, here we go...

[ 30.012695] BUG: unable to handle kernel paging request at 4b28f458
[ 30.012708] IP: [<f90703cc>] hwrng_register+0x4c/0x139 [rng_core]

00000380 <hwrng_register>:
hwrng_register():
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:299
380: 56 push %esi
381: 53 push %ebx
...
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:312
3c6: 8b 76 1c mov 0x1c(%esi),%esi
3c9: 83 ee 1c sub $0x1c,%esi
prefetch():
/tmp/1/linux-source-2.6.37-rc7/arch/x86/include/asm/processor.h:837
3cc: 8b 46 1c mov 0x1c(%esi),%eax
3cf: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
hwrng_register():
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:312
3d3: 81 fe f8 ff ff ff cmp $0xfffffff8,%esi
3d9: 75 d4 jne 3af <hwrng_register+0x2f>
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:319

312 list_for_each_entry(tmp, &rng_list, list) {
313 if (strcmp(tmp->name, rng->name) == 0)
314 goto out_unlock;
315 }
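
The offsets in this listing can be cross-checked against the register dump
from the first oops. A minimal sketch, assuming that the 0x1c displacement
in the disassembly is the offset of the list member inside struct hwrng on
this 32-bit build; insn_addr() and prefetch_addr() are names made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement used by the faulting `mov 0x1c(%esi),%eax`; assumed to be
 * the offset of the `list` member in struct hwrng on this 32-bit build. */
#define LIST_OFFSET 0x1cu

/* Address of an instruction, given the start of its function and the
 * +offset printed in the oops (hypothetical helper). */
static uint32_t insn_addr(uint32_t func_start, uint32_t off)
{
    return func_start + off;
}

/* Address read by the prefetch load for a given entry pointer
 * (hypothetical helper): the load fetches entry->list.next. */
static uint32_t prefetch_addr(uint32_t entry)
{
    assert(entry <= UINT32_MAX - LIST_OFFSET);   /* no wrap-around */
    return entry + LIST_OFFSET;
}
```

insn_addr(0x380, 0x4c) yields 0x3cc, the `mov 0x1c(%esi),%eax` line, and
prefetch_addr(0x907cddf0) yields 0x907cde0c, which are the ESI and CR2
values of the first oops. Under the stated assumption, the entry pointer
taken from the previous node was therefore already invalid when the loop
tried to read its link.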

By the way, this is the same data that is accessed in the cat rng_available
crash via hwrng_attr_available_show():

[ 389.303538] BUG: unable to handle kernel paging request at 288dcb5b
[ 389.303553] IP: [<f8dda34c>] hwrng_attr_available_show+0x5c/0x90 [rng_core]

000002f0 <hwrng_attr_available_show>:
hwrng_attr_available_show():
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:236
2f0: 55 push %ebp
...
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:245
346: 8b 5b 1c mov 0x1c(%ebx),%ebx
349: 83 eb 1c sub $0x1c,%ebx
prefetch():
/tmp/1/linux-source-2.6.37-rc7/arch/x86/include/asm/processor.h:837
34c: 8b 43 1c mov 0x1c(%ebx),%eax
34f: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
hwrng_attr_available_show():
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:245

245 list_for_each_entry(rng, &rng_list, list) {
246 strncat(buf, rng->name, PAGE_SIZE - ret - 1);
247 ret += strlen(rng->name);
248 strncat(buf, " ", PAGE_SIZE - ret - 1);
249 ret++;
250 }
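
For reference, the loop above builds a space-separated list of names. A
user-space sketch of the same pattern over a plain array, assuming that the
names together fit in the buffer; sketch_list_names() and SKETCH_PAGE_SIZE
are illustrative names, not kernel symbols.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size, for illustration only */

/* Build a space-separated list of names in buf, which must provide
 * SKETCH_PAGE_SIZE bytes. Returns the resulting string length. The
 * caller must ensure that the names fit; like the quoted loop, this
 * sketch does not handle overflow of the running length. */
static size_t sketch_list_names(char *buf, const char *const *names, size_t n)
{
    size_t ret = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Append the name, bounded by the space that is left. */
        strncat(buf, names[i], SKETCH_PAGE_SIZE - ret - 1);
        ret += strlen(names[i]);
        /* Append the separator, again bounded. */
        strncat(buf, " ", SKETCH_PAGE_SIZE - ret - 1);
        ret++;
    }
    return ret;
}
```

In the kernel version each name comes from an entry on rng_list, so the
walk depends on every linked entry still being valid.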


regards
Mario
--
The problem in the world today is communication. Too much communication.
-- Homer J. Simpson



2010-12-30 18:37:27

by Larry Finger

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On 12/30/2010 08:34 AM, Mario 'BitKoenig' Holbe wrote:
> On Wed, Dec 29, 2010 at 08:37:10PM -0600, Larry Finger wrote:
>> No, don't bother. I do have a different request. The byte counts for my 32-bit
>> system do not match yours. Could you please use the following command to find
>> the instructions that are failing?
>>
>> objdump -l -d drivers/char/hw_random/core.o | less
>>
>> Use the search to find the start of hwrng_register, then add 0x4c to the
>> starting address. Once I see the instruction that is failing, I should be able
>> to find where the failure occurs.
>
> Alright, here we go...
>
> [ 30.012695] BUG: unable to handle kernel paging request at 4b28f458
> [ 30.012708] IP: [<f90703cc>] hwrng_register+0x4c/0x139 [rng_core]
>
> 00000380 <hwrng_register>:
> hwrng_register():
> /tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:299
> 380: 56 push %esi
> 381: 53 push %ebx
> ...
> /tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:312
> 3c6: 8b 76 1c mov 0x1c(%esi),%esi
> 3c9: 83 ee 1c sub $0x1c,%esi
> prefetch():
> /tmp/1/linux-source-2.6.37-rc7/arch/x86/include/asm/processor.h:837
> 3cc: 8b 46 1c mov 0x1c(%esi),%eax
> 3cf: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> hwrng_register():
> /tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:312
> 3d3: 81 fe f8 ff ff ff cmp $0xfffffff8,%esi
> 3d9: 75 d4 jne 3af <hwrng_register+0x2f>
> /tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:319
>
> 312 list_for_each_entry(tmp, &rng_list, list) {
> 313 if (strcmp(tmp->name, rng->name) == 0)
> 314 goto out_unlock;
> 315 }
>
> This is btw. the same data that is accessed in the cat rng_available
> crash via hwrng_attr_available_show():
>
> [ 389.303538] BUG: unable to handle kernel paging request at 288dcb5b
> [ 389.303553] IP: [<f8dda34c>] hwrng_attr_available_show+0x5c/0x90 [rng_core]
>
> 000002f0 <hwrng_attr_available_show>:
> hwrng_attr_available_show():
> /tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:236
> 2f0: 55 push %ebp
> ...
> /tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:245
> 346: 8b 5b 1c mov 0x1c(%ebx),%ebx
> 349: 83 eb 1c sub $0x1c,%ebx
> prefetch():
> /tmp/1/linux-source-2.6.37-rc7/arch/x86/include/asm/processor.h:837
> 34c: 8b 43 1c mov 0x1c(%ebx),%eax
> 34f: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> hwrng_attr_available_show():
> /tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:245
>
> 245 list_for_each_entry(rng, &rng_list, list) {
> 246 strncat(buf, rng->name, PAGE_SIZE - ret - 1);
> 247 ret += strlen(rng->name);
> 248 strncat(buf, " ", PAGE_SIZE - ret - 1);
> 249 ret++;
> 250 }

The head of the rng_list is damaged. It is initialized at compile time and
should be OK. To help discover the order in which hwrng_register() is called,
apply the attached patch. Run it once with commit 84c164a34ffe67908a installed,
and once with it reverted.

Thanks,

Larry




Attachments:
hwrng_debug (822.00 B)

2010-12-30 20:45:46

by Mario 'BitKoenig' Holbe

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Thu, Dec 30, 2010 at 12:37:21PM -0600, Larry Finger wrote:
> The head of the rng_list is damaged. It is initialized at compile time and
> should be OK. To help discover the order in which hwrng_register() is called,
> apply the attached patch. Run it once with commit 84c164a34ffe67908a installed,
> and once with it reverted.

All right, 3 dmesg excerpts attached...
2.6.37-rc7-vanilla.dmesg:
2.6.37-rc7 vanilla (i.e. with 84c164a34ffe67908a), crashing
via-rng is registered first, b43-rng second
2.6.37-rc7-without.dmesg:
2.6.37-rc7 with 84c164a34ffe67908a reverted, not crashing
b43-rng is registered first, via-rng second
2.6.37-rc7-without+modprobe.dmesg:
2.6.37-rc7 with 84c164a34ffe67908a reverted, b43 blacklisted and
manually modprobed after via-rng, crashing
via-rng is registered first, b43-rng second

Seems like the crash shows up when b43-rng is registered second, but not
when via-rng is registered second.
Btw.: `cat rng_available' does also not crash when via-rng is registered
second.


regards
Mario
--
> As Luke Leighton said once on samba-ntdom, "now, what was that about
> rebooting? that was so long ago, i had to look it up with man -k."


Attachments:
(No filename) (0.00 B)
signature.asc (482.00 B)
Digital signature

2010-12-30 22:49:07

by Larry Finger

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

Added the two listed maintainers for hardware random-number generators and
dropped the wireless and b43 lists.

Matt and Herbert:

There is a regression in 2.6.37-rcX relative to 2.6.36. The problem shows as the
following kernel BUG:

[ 30.313362] BUG: unable to handle kernel paging request at 60870667
[ 30.313372] IP: [<f8f4e3df>] hwrng_register+0x5f/0x14d [rng_core]
[ 30.313391] *pdpt = 0000000036c34001 *pde = 0000000000000000
[ 30.313403] Oops: 0000 [#1] SMP
[ 30.313411] last sysfs file: /sys/module/bluetooth/initstate
[ 30.313420] Modules linked in: l2cap crc16 parport_pc ppdev lp parport sbs
sbshc power_meter pci_slot hed fan container acpi_cpufreq mperf
cpufreq_conservative cpufreq_userspace cpufreq_stats cpufreq_powersave dm_crypt
fuse loop eeprom via_cputemp i2c_dev nvram padlock_aes aes_i586 aes_generic
padlock_sha sha256_generic sha1_generic via_rng msr cpuid snd_hda_codec_realtek
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss arc4 snd_pcm ecb
snd_seq_midi snd_rawmidi snd_seq_midi_event b43 snd_seq snd_timer rng_core
uvcvideo video snd_seq_device joydev mac80211 videodev ideapad_laptop output
btusb battery processor bluetooth tpm_tis snd v4l1_compat ac tpm wmi
power_supply cfg80211 soundcore snd_page_alloc tpm_bios rfkill button shpchp
pcspkr i2c_viapro evdev i2c_core psmouse serio_raw pci_hotplug ext3 jbd mbcache
raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx raid1 raid0 multipath linear md_mod dm_mirror dm_region_hash dm_log
dm_mod btrfs zlib_deflate crc32c libcrc32c sd_mod crc_t10dif ata_generic
pata_via libata uhci_hcd ssb ehci_hcd tg3 via_sdmmc usbcore scsi_mod pcmcia
thermal mmc_core pcmcia_core libphy thermal_sys nls_base [last unloaded:
scsi_wait_scan]
[ 30.313670]
[ 30.313681] Pid: 1742, comm: NetworkManager Not tainted 2.6.37-rc7-self #3
MoutCook/20021,2959
[ 30.313692] EIP: 0060:[<f8f4e3df>] EFLAGS: 00010216 CPU: 0
[ 30.313706] EIP is at hwrng_register+0x5f/0x14d [rng_core]
[ 30.313715] EAX: 00000001 EBX: f4f13010 ECX: f8f4e589 EDX: f4f13035
[ 30.313725] ESI: 6087064b EDI: 00000000 EBP: 00000036 ESP: f4fe7b54
[ 30.313735] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 30.313745] Process NetworkManager (pid: 1742, ti=f4fe6000 task=f6d2e8a0
task.ti=f4fe6000)
[ 30.313753] Stack:
[ 30.313757] f4f12fc0 f4f13035 f8fab870 f4f13035 0000001f f8fc10bb f8fc09f0
f6dcce20
[ 30.313775] 0000000f f6dcac00 f6dcac00 f6f27400 f4f122c0 f4f10240 f4f12fc8
f8fabd67
[ 30.313793] f8c273da f4f122c0 f6ec0400 f8e9ee48 f6ec0000 f8e8e889 f8e8de7a
00000000
[ 30.313810] Call Trace:
[ 30.313835] [<f8fab870>] ? b43_wireless_core_init+0xd0c/0xdd6 [b43]
[ 30.313863] [<f8fabd67>] ? b43_op_start+0xf8/0x142 [b43]
[ 30.313889] [<f8c273da>] ? cfg80211_netdev_notifier_call+0x342/0x355 [cfg80211]
[ 30.313926] [<f8e8e889>] ? ieee80211_do_open+0xed/0x45f [mac80211]
[ 30.313958] [<f8e8de7a>] ? ieee80211_check_concurrent_iface+0x1c/0x135
[mac80211]
[ 30.313975] [<c1203247>] ? __dev_open+0x7d/0xa7
[ 30.313986] [<c1201c10>] ? __dev_change_flags+0x9a/0x10d
[ 30.313998] [<c120319f>] ? dev_change_flags+0x10/0x3b
[ 30.314011] [<c120d207>] ? do_setlink+0x23e/0x532
[ 30.314026] [<c129ced6>] ? schedule+0x579/0x5b6
[ 30.314037] [<c120d5cb>] ? rtnl_setlink+0xd0/0xe1
[ 30.314052] [<c114f000>] ? clear_user+0x2b/0x43
[ 30.314063] [<c120d4fb>] ? rtnl_setlink+0x0/0xe1
[ 30.314074] [<c120cd32>] ? rtnetlink_rcv_msg+0x186/0x19c
[ 30.314086] [<c120cbac>] ? rtnetlink_rcv_msg+0x0/0x19c
[ 30.314098] [<c121bda8>] ? netlink_rcv_skb+0x2d/0x72
[ 30.314109] [<c120cba6>] ? rtnetlink_rcv+0x18/0x1e
[ 30.314120] [<c121bbfc>] ? netlink_unicast+0xba/0x10e
[ 30.314132] [<c121c700>] ? netlink_sendmsg+0x23d/0x256
[ 30.314145] [<c11f53a6>] ? __sock_sendmsg+0x48/0x4e
[ 30.314155] [<c11f560f>] ? sock_sendmsg+0x78/0x8f
[ 30.314167] [<c11f560f>] ? sock_sendmsg+0x78/0x8f
[ 30.314179] [<c10cf5dd>] ? d_kill+0x38/0x3d
[ 30.314192] [<c11fd48c>] ? verify_iovec+0x3d/0x79
[ 30.314203] [<c11f5e0d>] ? sys_sendmsg+0x15f/0x1c1
[ 30.314214] [<c11f5a44>] ? sockfd_lookup_light+0x13/0x3f
[ 30.314225] [<c11f60a5>] ? sys_sendto+0xfd/0x121
[ 30.314237] [<c10079ee>] ? __switch_to+0x6f/0xe2
[ 30.314250] [<c129ced6>] ? schedule+0x579/0x5b6
[ 30.314261] [<c11f5ca3>] ? sys_recvmsg+0x3c/0x47
[ 30.314272] [<c11f707d>] ? sys_socketcall+0x17f/0x1cb
[ 30.314284] [<c1008b1f>] ? sysenter_do_call+0x12/0x28
[ 30.314292] Code: 34 c8 8b 35 1c e6 f4 f8 59 83 ee 1c eb 1d 8b 13 8b 06 e8 84
06 20 c8 85 c0 75 0a be ef ff ff ff e9 d3 00 00 00 8b 76 1c 83 ee 1c <8b> 46 1c
0f 18 00 90 81 fe 00 e6 f4 f8 75 d4 83 3d 2c e8 f4 f8
[ 30.314376] EIP: [<f8f4e3df>] hwrng_register+0x5f/0x14d [rng_core] SS:ESP
0068:f4fe7b54
[ 30.314395] CR2: 0000000060870667
[ 30.314404] ---[ end trace f498f4a4e1f00415 ]---

Mario's box with this fault has two RNG devices - b43 and the one provided by
via-rng. Experimentation has shown that if b43 is registered first, then there
is no problem; however if via-rng is first, then the above BUG is triggered when
b43 registers its hardware rng. This problem is a regression in that one of the
changes in 2.6.37 has b43 registering its rng later in the startup sequence.

Are you the correct people to contact? If not, who is maintaining via-rng? I did
not find any entries in MAINTAINERS.

Do you see any problems in the code in drivers/net/wireless/b43/main.c or
drivers/char/hw_random/via-rng.c? As the latter seems to make b43 fail, I am
suspecting via-rng, but I have no proof.

Thanks,

Larry



On 12/30/2010 02:45 PM, Mario 'BitKoenig' Holbe wrote:
> On Thu, Dec 30, 2010 at 12:37:21PM -0600, Larry Finger wrote:
>> The head of the rng_list is damaged. It is initialized at compile time and
>> should be OK. To help discover the order in which hwrng_register() is called,
>> apply the attached patch. Run it once with commit 84c164a34ffe67908a installed,
>> and once with it reverted.
>
> All right, 3 dmesg excerpts attached...
> 2.6.37-rc7-vanilla.dmesg:
> 2.6.37-rc7 vanilla (i.e. with 84c164a34ffe67908a), crashing
> via-rng is registered first, b43-rng second
> 2.6.37-rc7-without.dmesg:
> 2.6.37-rc7 with 84c164a34ffe67908a reverted, not crashing
> b43-rng is registered first, via-rng second
> 2.6.37-rc7-without+modprobe.dmesg:
> 2.6.37-rc7 with 84c164a34ffe67908a reverted, b43 blacklisted and
> manually modprobed after via-rng, crashing
> via-rng is registered first, b43-rng second
>
> Seems like the crash shows up when b43-rng is registered second, but not
> when via-rng is registered second.
> Btw.: `cat rng_available' does also not crash when via-rng is registered
> second.
>
>
> regards
> Mario

2010-12-30 23:18:34

by Mario 'BitKoenig' Holbe

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Thu, Dec 30, 2010 at 04:49:05PM -0600, Larry Finger wrote:
> Added the two listed maintainers for hardware random-number generators and
> dropped the wireless and b43 lists.
>
> Matt and Herbert:
>
> There is a regression in 2.6.37-rcX relative to 2.6.36. The problem shows as the
> following kernel BUG:
> [ 30.313362] BUG: unable to handle kernel paging request at 60870667
> [ 30.313372] IP: [<f8f4e3df>] hwrng_register+0x5f/0x14d [rng_core]
>
> Mario's box with this fault has two RNG devices - b43 and the one provided by
> via-rng. Experimentation has shown that if b43 is registered first, then there
> is no problem; however if via-rng is first, then the above BUG is triggered when
> b43 registers its hardware rng. This problem is a regression in that one of the
> changes in 2.6.37 has b43 registering its rng later in the startup sequence.
...
> Do you see any problems in the code in drivers/net/wireless/b43/main.c or
> drivers/char/hw_random/via-rng.c? As the latter seems to make b43 fail, I am
> suspecting via-rng, but I have no proof.

I believe I can confirm the bug does not directly belong to b43:
I created a second via-rng driver (just copied via-rng.c to via-rng2.c
and changed the via_rng.name) and modprobed it. I blacklisted b43 to
keep it out of the game.

Virtually the same crash dump as with b43 shows up when I modprobe
via-rng2 after via-rng is loaded already.

Attached is a dmesg excerpt from a 2.6.37-rc7 kernel built with Larry's
hwrng_debug patch applied (which basically calls dump_stack() in
hwrng_register()).

objdump of hw_random/core.o:

00000380 <hwrng_register>:
hwrng_register():
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:299
380: 56 push %esi
...
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:315
3d9: 8b 76 1c mov 0x1c(%esi),%esi
3dc: 83 ee 1c sub $0x1c,%esi
prefetch():
/tmp/1/linux-source-2.6.37-rc7/arch/x86/include/asm/processor.h:837
3df: 8b 46 1c mov 0x1c(%esi),%eax
3e2: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
hwrng_register():
/tmp/1/linux-source-2.6.37-rc7/drivers/char/hw_random/core.c:315
3e6: 81 fe f8 ff ff ff cmp $0xfffffff8,%esi

hw_random/core.c:
313 /* Must not register two RNGs with the same name. */
314 err = -EEXIST;
315 list_for_each_entry(tmp, &rng_list, list) {
316 if (strcmp(tmp->name, rng->name) == 0)
317 goto out_unlock;
318 }


Larry: Thanks for your help!


regards
Mario
--
Goethe war nicht gerne Minister. Er beschaeftigte sich lieber geistig.
-- Lukasburger Stilblueten


Attachments:
(No filename) (0.00 B)
signature.asc (482.00 B)
Digital signature

2010-12-31 00:37:47

by Herbert Xu

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Thu, Dec 30, 2010 at 04:49:05PM -0600, Larry Finger wrote:
>
> Do you see any problems in the code in drivers/net/wireless/b43/main.c or
> drivers/char/hw_random/via-rng.c? As the latter seems to make b43 fail, I am
> suspecting via-rng, but I have no proof.

My suspicion is that VIA's xstore is writing more than 4 bytes as
the list pointer happens to lie immediately after rng->priv which
is where xstore is writing to.

Harald, do you know whether this is documented or is this a known
errata item?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2010-12-31 00:46:33

by Larry Finger

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On 12/30/2010 06:37 PM, Herbert Xu wrote:
> On Thu, Dec 30, 2010 at 04:49:05PM -0600, Larry Finger wrote:
>>
>> Do you see any problems in the code in drivers/net/wireless/b43/main.c or
>> drivers/char/hw_random/via-rng.c? As the latter seems to make b43 fail, I am
>> suspecting via-rng, but I have no proof.
>
> My suspicion is that VIA's xstore is writing more than 4 bytes as
> the list pointer happens to lie immediately after rng->priv which
> is where xstore is writing to.
>
> Harald, do you know whether this is documented or is this a known
> errata item?

The following patch should be able to test if xstore is overwriting the list
pointer.

Larry
---

Index: wireless-testing/include/linux/hw_random.h
===================================================================
--- wireless-testing.orig/include/linux/hw_random.h
+++ wireless-testing/include/linux/hw_random.h
@@ -38,6 +38,7 @@ struct hwrng {
int (*data_read)(struct hwrng *rng, u32 *data);
int (*read)(struct hwrng *rng, void *data, size_t max, bool wait);
unsigned long priv;
+ char junk[12];

/* internal. */
struct list_head list;



Attachments:
hwrng_debug (454.00 B)

2010-12-31 02:25:11

by Larry Finger

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On 12/30/2010 07:57 PM, Michael Büsch wrote:
> On Thu, 2010-12-30 at 21:45 +0100, Mario 'BitKoenig' Holbe wrote:
>> On Thu, Dec 30, 2010 at 12:37:21PM -0600, Larry Finger wrote:
>>> The head of the rng_list is damaged. It is initialized at compile time and
>>> should be OK. To help discover the order in which hwrng_register() is called,
>>> apply the attached patch. Run it once with commit 84c164a34ffe67908a installed,
>>> and once with it reverted.
>>
>> All right, 3 dmesg excerpts attached...
>> 2.6.37-rc7-vanilla.dmesg:
>> 2.6.37-rc7 vanilla (i.e. with 84c164a34ffe67908a), crashing
>> via-rng is registered first, b43-rng second
>> 2.6.37-rc7-without.dmesg:
>> 2.6.37-rc7 with 84c164a34ffe67908a reverted, not crashing
>> b43-rng is registered first, via-rng second
>> 2.6.37-rc7-without+modprobe.dmesg:
>> 2.6.37-rc7 with 84c164a34ffe67908a reverted, b43 blacklisted and
>> manually modprobed after via-rng, crashing
>> via-rng is registered first, b43-rng second
>>
>> Seems like the crash shows up when b43-rng is registered second, but not
>> when via-rng is registered second.
>> Btw.: `cat rng_available' does also not crash when via-rng is registered
>> second.
>
>
> I suspect that some "hw_random.h" header version mixup is going
> on here. The layout of struct hwrng was changed recently.
>
> Your crash seems to happen on the list head embedded in struct hwrng.
>
> Please make sure that your build environment is clean and you're not
> using any external stuff such as compat-wireless. All of hwrng-core,
> rng-via and b43 must be compiled against the same hw_random.h.

AFAIK, he is building with the mainline 2.6.37-rc7/8 tree from Linus, thus the
build should be clean, but thanks for the heads-up.

In an email from Herbert Xu that did not go to the wireless or b43 lists, he
suspects that the xstore instruction on a VIA CPU writes more than 4 bytes of
output and clobbers the list head that follows rng->priv. We now also know
that a second copy of via-rng fails in the same way, so b43 is cleared.

Larry

2010-12-31 02:26:59

by Michael Büsch

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Thu, 2010-12-30 at 21:45 +0100, Mario 'BitKoenig' Holbe wrote:
> On Thu, Dec 30, 2010 at 12:37:21PM -0600, Larry Finger wrote:
> > The head of the rng_list is damaged. It is initialized at compile time and
> > should be OK. To help discover the order in which hwrng_register() is called,
> > apply the attached patch. Run it once with commit 84c164a34ffe67908a installed,
> > and once with it reverted.
>
> All right, 3 dmesg excerpts attached...
> 2.6.37-rc7-vanilla.dmesg:
> 2.6.37-rc7 vanilla (i.e. with 84c164a34ffe67908a), crashing
> via-rng is registered first, b43-rng second
> 2.6.37-rc7-without.dmesg:
> 2.6.37-rc7 with 84c164a34ffe67908a reverted, not crashing
> b43-rng is registered first, via-rng second
> 2.6.37-rc7-without+modprobe.dmesg:
> 2.6.37-rc7 with 84c164a34ffe67908a reverted, b43 blacklisted and
> manually modprobed after via-rng, crashing
> via-rng is registered first, b43-rng second
>
> Seems like the crash shows up when b43-rng is registered second, but not
> when via-rng is registered second.
> Btw.: `cat rng_available' does also not crash when via-rng is registered
> second.


I suspect that some "hw_random.h" header version mixup is going
on here. The layout of struct hwrng was changed recently.

Your crash seems to happen on the list head embedded in struct hwrng.

Please make sure that your build environment is clean and you're not
using any external stuff such as compat-wireless. All of hwrng-core,
rng-via and b43 must be compiled against the same hw_random.h.

--
Greetings Michael.

2010-12-31 02:29:24

by Mario 'BitKoenig' Holbe

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Thu, Dec 30, 2010 at 06:46:31PM -0600, Larry Finger wrote:
> On 12/30/2010 06:37 PM, Herbert Xu wrote:
> > My suspicion is that VIA's xstore is writing more than 4 bytes as
> > the list pointer happens to lie immediately after rng->priv which
> > is where xstore is writing to.
> >
> > Harald, do you know whether this is documented or is this a known
> > errata item?
>
> The following patch should be able to test if xstore is overwriting the list
> pointer.

Confirmed. No crashes with the junk buffer in action.
I applied both patches (dump_stack() in hwrng_register() and junk[]
after priv data) to vanilla 2.6.37-rc7 and tested both: via-rng and my
via-rng2 as well as via-rng and b43-rng - no crashes. The (previously
also crashing) `cat rng_available' does survive as well:

$ cat /sys/devices/virtual/misc/hw_random/rng_available
via b43_phy0 via2
$

Attached 2 dmesg excerpts.


regards & g'nite
Mario
--
Tower: "Say fuelstate." Pilot: "Fuelstate."
Tower: "Say again." Pilot: "Again."
Tower: "Arghl, give me your fuel!" Pilot: "Sorry, need it by myself..."


Attachments:
(No filename) (0.00 B)
signature.asc (482.00 B)
Digital signature

2010-12-31 02:47:05

by Herbert Xu

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Fri, Dec 31, 2010 at 03:25:51AM +0100, Mario 'BitKoenig' Holbe wrote:
>
> Confirmed. No crashes with the junk buffer in action.
> I applied both patches (dump_stack() in hwrng_register() and junk[]
> after priv data) to vanilla 2.6.37-rc7 and tested both: via-rng and my
> via-rng2 as well as via-rng and b43-rng - no crashes. The (previously
> also crashing) `cat rng_available' does survive as well:
>
> $ cat /sys/devices/virtual/misc/hw_random/rng_available
> via b43_phy0 via2
> $
>
> Attached 2 dmesg excerpts.

Thanks for checking. Can you provide the output of

cat /proc/cpuinfo

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2010-12-31 08:51:27

by Mario 'BitKoenig' Holbe

Subject: Re: 2.6.37-rc7: Regression: b43: crashes in hwrng_register()

On Fri, Dec 31, 2010 at 01:46:53PM +1100, Herbert Xu wrote:
> Thanks for checking. Can you provide the output of
> cat /proc/cpuinfo

attached.


Mario
--
The only thing to be scared of, son, is tomorrow.
I don't live for tomorrow. Never saw the fun in it.
-- Denny Crane, Boston Legal


Attachments:
(No filename) (0.00 B)
signature.asc (482.00 B)
Digital signature