2009-06-14 19:09:45

by Joerg Platte

[permalink] [raw]
Subject: [2.6.30] Kernel bug with dock driver

Hi,

today I ran into the following bug several times (Kernel 2.6.30, IBM ThinkPad
T40 with ultrabay HD and CD/DVD):

BUG: unable to handle kernel paging request at 00001000
IP: [<c01df5fa>] strcpy+0xe/0x1b
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/class/power_supply/BAT0/energy_full
Modules linked in: nls_iso8859_1 nls_cp850 vfat fat usb_storage usb_libusual
udf crc_itu_t sg sr_mod cdrom nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs
radeon drm sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE
iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT
ipt_LOG xt_limit xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle
ip_tables x_tables nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc
af_packet cpufreq_userspace cpufreq_stats cpufreq_powersave nsc_ircc autofs4
fuse nls_utf8 ntfs nls_base ext2 deadline_iosched as_iosched ircomm_tty
ircomm tun acpi_cpufreq sbs sbshc snd_intel8x0m snd_intel8x0 snd_ac97_codec
ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi
snd_rawmidi snd_seq_midi_event joydev snd_seq snd_timer snd_seq_device
dvb_usb_cinergyT2 thinkpad_acpi irtty_sir dvb_usb yenta_socket
rsrc_nonstatic ipw2200 pcmcia libipw sir_dev rfkill snd video led_class
lib80211 i2c_i801 rng_core soundcore pcmcia_core snd_page_alloc 8250_pci
dvb_core irda parport_pc parport crc_ccitt psmouse pcspkr nvram output
battery ac button processor rtc_cmos 8250_pnp serio_raw rtc_core rtc_lib
8250 serial_core evdev ext3 jbd mbcache usbhid hid sd_mod ata_generic
pata_acpi ata_piix uhci_hcd ehci_hcd libata e1000 usbcore scsi_mod intel_agp
agpgart thermal fan unix cpufreq_conservative cpufreq_ondemand freq_table
radeonfb fb_ddc backlight i2c_algo_bit cfbcopyarea i2c_core cfbimgblt
cfbfillrect fbcon tileblit font bitblit softcursor fb

Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
EIP is at strcpy+0xe/0x1b
EAX: c278a22c EBX: c278a200 ECX: c278a22c EDX: 00001000
ESI: 00001000 EDI: c278a22c EBP: f70a8f34 ESP: f70a8f28
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process kacpi_notify (pid: 52, ti=f70a8000 task=f70c0c40 task.ti=f70a8000)
Stack:
c278a200 c278a214 c278a244 f70a8f64 c01fc898 030a8f54 00001000 f708a879
00000002 f6b3ca80 c278a218 00000014 f97fb132 f0a07a80 00000000 f70a8f70
f97fb14e 00000000 f70a8f7c f97fb023 f7041460 f70a8f8c c020199d f6984ac0
Call Trace:
[<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
[<f97fb132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]
[<f97fb14e>] ? bay_notify+0x1c/0x1f [thinkpad_acpi]
[<f97fb023>] ? dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
[<c020199d>] ? acpi_ev_notify_dispatch+0x4c/0x57
[<c01f4558>] ? acpi_os_execute_deferred+0x20/0x2c
[<c012cff6>] ? worker_thread+0x15a/0x1fd
[<c01f4538>] ? acpi_os_execute_deferred+0x0/0x2c
[<c012fc7d>] ? autoremove_wake_function+0x0/0x33
[<c012ce9c>] ? worker_thread+0x0/0x1fd
[<c012f8bc>] ? kthread+0x42/0x67
[<c012f87a>] ? kthread+0x0/0x67
[<c01030d3>] ? kernel_thread_helper+0x7/0x10
Code: ff ff 21 e3 8b 5b 18 83 eb 07 39 d9 73 08 89 01 89 51 04 31 c0 c3 b8
f2 ff ff ff c3 90 55 89 c1 89 e5 57 89 c7 56 89 d6 83 ec 04 <ac> aa 84 c0 75
fa 5a 89 c8 5e 5f 5d c3 55 89 e5 57 89 c7 56 89
EIP: [<c01df5fa>] strcpy+0xe/0x1b SS:ESP 0068:f70a8f28
CR2: 0000000000001000
---[ end trace fe729b61cde0024d ]---

The bug message appears with a high probability - but not always - when I
release the eject lever to exchange the drive in the ultrabay slot. I'm not
sure if this bug is a regression of 2.6.30, because I did not change the drive
in the ultrabay for several months. Maybe it was already in 2.6.29 or before.
How can I help to resolve this bug?

regards,
Jörg


Subject: Re: [2.6.30] Kernel bug with dock driver

On Sun, 14 Jun 2009, Joerg Platte wrote:
> The bug message appears with a high probability - but not always -
> when I release the eject lever to exchange the drive in the ultrabay
> slot. I'm not sure if this bug is a regression of 2.6.30, because I
> did not change the drive in the ultrabay for several months. Maybe it
> was already in 2.6.29 or before. How can I help to resolve this bug?

Do it in single user mode with udev helpers *killed*, but just release
the lever, _don't_ pull the drive off the bay, unless you do verify that
the bay LED went off.

***NEVER*** remove an Ultrabay device with the Ultrabay LED still lit,
especially if it is a PATA device.

If the bug cannot be reproduced in that condition, it is happening
because userspace is trapping the lever release event (as it should),
and doing something the kernel doesn't like with it (which is a bug
in the kernel in any case). That might help you track down a better
way to reproduce the issue.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2009-06-15 07:36:52

by Joerg Platte

[permalink] [raw]
Subject: Re: [2.6.30] Kernel bug with dock driver

Am Monday, 15. June 2009 schrieb Henrique de Moraes Holschuh:
> Do it in single user mode with udev helpers *killed*, but just release

Today I tried to reproduce the bug in runlevel 2 (Debian based system) without
success, even after a suspend to disk. In the past this was one method to
trigger bugs with the bay.

> the lever, _don't_ pull the drive off the bay, unless you do verify that
> the bay LED went off.
>
> ***NEVER*** remove an Ultrabay device with the Ultrabay LED still lit,
> especially if it is a PATA device.

I did this accidentally some times before and the bay is still working. But
I'll try to not to do it again :)

> If the bug cannot be reproduced in that condition, it is happening
> because userspace is trapping the lever release event (as it should),
> and doing something the kernel doesn't like with it (which is a bug
> in the kernel in any case). That might help you track down a better
> way to reproduce the issue.

Userspace scripts are written by me (based on some information from
thinkwiki).
The udev rule looks like this:
ATTR{type}=="ata_bay", ATTR{docked}=="1", KERNEL=="dock.*", ACTION=="change",
SUBSYSTEM=="platform", RUN+="/usr/local/sbin/ultrabay_open"

and ultrabay_open contains the following lines of shell code:
#!/bin/bash
ULTRABAY_SYSDIR='/sys/class/scsi_device/1:0:0:0/device'
shopt -s nullglob

if [ -d $ULTRABAY_SYSDIR ]; then
sync
echo 1 > $ULTRABAY_SYSDIR/delete
logger "removed $ULTRABAY_SYSDIR"
fi
sync
# Turn off power to the UltraBay:
if [ -f /sys/devices/platform/dock.2/undock ]; then
echo 1 > /sys/devices/platform/dock.2/undock
logger "/sys/devices/platform/dock.2/undock"
else
echo eject > /proc/acpi/ibm/bay
fi
# Tell the user we're OK
logger "/proc/acpi/ibm/beep"
echo 12 > /proc/acpi/ibm/beep

What I discovered today after modifying ultrabay_open was that in some cases
the script is called again directly after inserting a drive to the bay. But
only, if the execution of this script takes some more time (I added a sleep 2
after the second sync to be able to see which command might trigger the bug).

So maybe yesterday this script was called twice and the kernel does not like
to undock a device while it is in the process of being undocked? How should I
modify the udev rule to prevent another execution each time a drive is
inserted into the bay?

regards,
J?rg

Subject: Re: [2.6.30] Kernel bug with dock driver

On Mon, 15 Jun 2009, Joerg Platte wrote:
> > If the bug cannot be reproduced in that condition, it is happening
> > because userspace is trapping the lever release event (as it should),
> > and doing something the kernel doesn't like with it (which is a bug
> > in the kernel in any case). That might help you track down a better
> > way to reproduce the issue.
>
> if [ -d $ULTRABAY_SYSDIR ]; then
> sync
> echo 1 > $ULTRABAY_SYSDIR/delete
> logger "removed $ULTRABAY_SYSDIR"
> fi
> sync
> # Turn off power to the UltraBay:
> if [ -f /sys/devices/platform/dock.2/undock ]; then
> echo 1 > /sys/devices/platform/dock.2/undock
> logger "/sys/devices/platform/dock.2/undock"

There might be a race there, as you call undock but you don't really know if
the SCSI device was deleted.

> What I discovered today after modifying ultrabay_open was that in some cases
> the script is called again directly after inserting a drive to the bay. But
> only, if the execution of this script takes some more time (I added a sleep 2
> after the second sync to be able to see which command might trigger the bug).

So, please try to reproduce this by sending a number of delete requests
back-to-back in a row, and also mixing delete requests with undock requests.
When you find out what causes the OOPS (and it _is_ a bug in the kernel if
any of those oops), we can try to direct the bug report to someone who can
fix the problem.

> to undock a device while it is in the process of being undocked? How should I
> modify the udev rule to prevent another execution each time a drive is
> inserted into the bay?

You will need locking, unfortunately. Also, check if there is a device in
the bay. event+device in bay == hotunplug. event+no device in bay ==
hotplug.

That is, if the events don't already tell you (i.e. different events for
plug and unplug). I don't recall right now if the ACPI events are
different, but I do recall the thinkpad BIOS follows the ACPI spec correctly
on this area.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2009-06-16 09:25:18

by Joerg Platte

[permalink] [raw]
Subject: Re: [2.6.30] Kernel bug with dock driver

Am Monday, 15. June 2009 schrieb Henrique de Moraes Holschuh:
> There might be a race there, as you call undock but you don't really know
> if the SCSI device was deleted.

I tried to undock a device without previously disabling it. Here's the result
after inserting it again:

ata2.00: disabled
ACPI: \_SB_.PCI0.IDE0.SCND.MSTR - undocking
ata2.00: detaching (SCSI 1:0:0:0)
sd 1:0:0:0: [sdb] Synchronizing SCSI cache
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 1:0:0:0: [sdb] Stopping disk
sd 1:0:0:0: [sdb] START_STOP FAILED
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c01df5fa>] strcpy+0xe/0x1b
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/platform/dock.2/docked
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs radeon drm
sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE iptable_nat nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT ipt_LOG xt_limit
xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle ip_tables x_tables
nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc aes_i586 cryptomgr aead
pcompress crypto_blkcipher crypto_hash aes_generic crypto_algapi
lib80211_crypt_ccmp af_packet cpufreq_userspace cpufreq_stats
cpufreq_powersave autofs4 nsc_ircc fuse nls_utf8 ntfs nls_base ext2
deadline_iosched as_iosched ircomm_tty ircomm tun acpi_cpufreq sbs sbshc
joydev snd_intel8x0 snd_intel8x0m snd_seq_oss snd_ac97_codec snd_seq_midi
ac97_bus snd_rawmidi snd_pcm_oss snd_seq_midi_event snd_mixer_oss irtty_sir
snd_pcm snd_seq dvb_usb_cinergyT2 yenta_socket sir_dev snd_seq_device
rsrc_nonstatic thinkpad_acpi dvb_usb pcmcia rfkill ipw2200 snd_timer dvb_core
led_class i2c_i801 rng_core rtc_cmos pcmcia_core video ac libipw parport_pc
8250_pci parport psmouse 8250_pnp snd lib80211 output soundcore rtc_core nvram
serio_raw rtc_lib button snd_page_alloc pcspkr 8250 battery processor irda
serial_core crc_ccitt evdev ext3 jbd mbcache usbhid hid sd_mod ata_generic
pata_acpi ata_piix uhci_hcd ehci_hcd libata e1000 usbcore scsi_mod intel_agp
agpgart thermal fan unix cpufreq_conservative cpufreq_ondemand freq_table
radeonfb fb_ddc backlight i2c_algo_bit cfbcopyarea i2c_core cfbimgblt
cfbfillrect fbcon tileblit font bitblit softcursor fb

Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
EIP is at strcpy+0xe/0x1b
EAX: f302482c EBX: f3024800 ECX: f302482c EDX: 00000000
ESI: 00000000 EDI: f302482c EBP: f70a4f34 ESP: f70a4f28
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process kacpi_notify (pid: 52, ti=f70a4000 task=f704c980 task.ti=f70a4000)
Stack:
f3024800 f3024814 f3024844 f70a4f64 c01fc898 010a4f54 00000000 f70c2879
00000004 f30e83c0 f3024818 00000014 f97c8132 f69b2600 00000000 f70a4f70
f97c814e 00000000 f70a4f7c f97c8023 f7070460 f70a4f8c c020199d f4e37ee0
Call Trace:
[<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
[<f97c8132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]
[<f97c814e>] ? bay_notify+0x1c/0x1f [thinkpad_acpi]
[<f97c8023>] ? dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
[<c020199d>] ? acpi_ev_notify_dispatch+0x4c/0x57
[<c01f4558>] ? acpi_os_execute_deferred+0x20/0x2c
[<c012cff6>] ? worker_thread+0x15a/0x1fd
[<c01f4538>] ? acpi_os_execute_deferred+0x0/0x2c
[<c012fc7d>] ? autoremove_wake_function+0x0/0x33
[<c012ce9c>] ? worker_thread+0x0/0x1fd
[<c012f8bc>] ? kthread+0x42/0x67
[<c012f87a>] ? kthread+0x0/0x67
[<c01030d3>] ? kernel_thread_helper+0x7/0x10
Code: ff ff 21 e3 8b 5b 18 83 eb 07 39 d9 73 08 89 01 89 51 04 31 c0 c3 b8 f2
ff ff ff c3 90 55 89 c1 89 e5 57 89 c7 56 89 d6 83 ec 04 <ac> aa 84 c0 75 fa
5a 89 c8 5e 5f 5d c3 55 89 e5 57 89 c7 56 89
EIP: [<c01df5fa>] strcpy+0xe/0x1b SS:ESP 0068:f70a4f28
CR2: 0000000000000000
---[ end trace 798a63a30da95ce2 ]---

> So, please try to reproduce this by sending a number of delete requests
> back-to-back in a row, and also mixing delete requests with undock
> requests. When you find out what causes the OOPS (and it _is_ a bug in the
> kernel if any of those oops), we can try to direct the bug report to
> someone who can fix the problem.
>
> > to undock a device while it is in the process of being undocked? How
> > should I modify the udev rule to prevent another execution each time a
> > drive is inserted into the bay?
>
> You will need locking, unfortunately. Also, check if there is a device in
> the bay. event+device in bay == hotunplug. event+no device in bay ==
> hotplug.
>
> That is, if the events don't already tell you (i.e. different events for
> plug and unplug). I don't recall right now if the ACPI events are
> different, but I do recall the thinkpad BIOS follows the ACPI spec
> correctly on this area.

Since I was not able to reproduce the initial bug yesterday I remembered that
I deinstalled acpid because of Debian Bug#522756 (acpid terminates each time I
unplug the mouse). After installing acpid again (and disabling the udev rule)
the Bug appeared again. However, in my acpid config I call a script for both
events:

event=ibm/bay device:0f 00000001 00000000
action=/usr/local/sbin/ultrabay_close

This is the content of /usr/local/sbin/ultrabay_close
echo 12 > /proc/acpi/ibm/beep
sync
echo 0 0 0 > /sys/class/scsi_host/host1/scan

event=ibm/bay device:0f 00000003 00000000
action=/usr/local/sbin/ultrabay_open

ultrabay_open is identical to the script posted in my last mail. Maybe there's
a problem with the scan command, because the kernel tries to scan for devices
automatically.

It looks like there are still some bugs in the dock handling when using it
improperly :)

Best regards,
J?rg

2009-06-17 07:12:44

by Zhang, Rui

[permalink] [raw]
Subject: Re: [2.6.30] Kernel bug with dock driver

this seems like an thinkpad_acpi driver problem.
cc henrique.

jplatte,
it would be great if you can verify if this is a regression or not.
And please file a new bug report at
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
attach the full dmesg after the bug and the acpidump output there.

thanks,
rui



On Mon, 2009-06-15 at 03:00 +0800, Joerg Platte wrote:
> Hi,
>
> today I ran into the following bug several times (Kernel 2.6.30, IBM ThinkPad
> T40 with ultrabay HD and CD/DVD):
>
> BUG: unable to handle kernel paging request at 00001000
> IP: [<c01df5fa>] strcpy+0xe/0x1b
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT
> last sysfs file: /sys/class/power_supply/BAT0/energy_full
> Modules linked in: nls_iso8859_1 nls_cp850 vfat fat usb_storage usb_libusual
> udf crc_itu_t sg sr_mod cdrom nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs
> radeon drm sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE
> iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT
> ipt_LOG xt_limit xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle
> ip_tables x_tables nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc
> af_packet cpufreq_userspace cpufreq_stats cpufreq_powersave nsc_ircc autofs4
> fuse nls_utf8 ntfs nls_base ext2 deadline_iosched as_iosched ircomm_tty
> ircomm tun acpi_cpufreq sbs sbshc snd_intel8x0m snd_intel8x0 snd_ac97_codec
> ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi
> snd_rawmidi snd_seq_midi_event joydev snd_seq snd_timer snd_seq_device
> dvb_usb_cinergyT2 thinkpad_acpi irtty_sir dvb_usb yenta_socket
> rsrc_nonstatic ipw2200 pcmcia libipw sir_dev rfkill snd video led_class
> lib80211 i2c_i801 rng_core soundcore pcmcia_core snd_page_alloc 8250_pci
> dvb_core irda parport_pc parport crc_ccitt psmouse pcspkr nvram output
> battery ac button processor rtc_cmos 8250_pnp serio_raw rtc_core rtc_lib
> 8250 serial_core evdev ext3 jbd mbcache usbhid hid sd_mod ata_generic
> pata_acpi ata_piix uhci_hcd ehci_hcd libata e1000 usbcore scsi_mod intel_agp
> agpgart thermal fan unix cpufreq_conservative cpufreq_ondemand freq_table
> radeonfb fb_ddc backlight i2c_algo_bit cfbcopyarea i2c_core cfbimgblt
> cfbfillrect fbcon tileblit font bitblit softcursor fb
>
> Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
> EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
> EIP is at strcpy+0xe/0x1b
> EAX: c278a22c EBX: c278a200 ECX: c278a22c EDX: 00001000
> ESI: 00001000 EDI: c278a22c EBP: f70a8f34 ESP: f70a8f28
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> Process kacpi_notify (pid: 52, ti=f70a8000 task=f70c0c40 task.ti=f70a8000)
> Stack:
> c278a200 c278a214 c278a244 f70a8f64 c01fc898 030a8f54 00001000 f708a879
> 00000002 f6b3ca80 c278a218 00000014 f97fb132 f0a07a80 00000000 f70a8f70
> f97fb14e 00000000 f70a8f7c f97fb023 f7041460 f70a8f8c c020199d f6984ac0
> Call Trace:
> [<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
> [<f97fb132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]
> [<f97fb14e>] ? bay_notify+0x1c/0x1f [thinkpad_acpi]
> [<f97fb023>] ? dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
> [<c020199d>] ? acpi_ev_notify_dispatch+0x4c/0x57
> [<c01f4558>] ? acpi_os_execute_deferred+0x20/0x2c
> [<c012cff6>] ? worker_thread+0x15a/0x1fd
> [<c01f4538>] ? acpi_os_execute_deferred+0x0/0x2c
> [<c012fc7d>] ? autoremove_wake_function+0x0/0x33
> [<c012ce9c>] ? worker_thread+0x0/0x1fd
> [<c012f8bc>] ? kthread+0x42/0x67
> [<c012f87a>] ? kthread+0x0/0x67
> [<c01030d3>] ? kernel_thread_helper+0x7/0x10
> Code: ff ff 21 e3 8b 5b 18 83 eb 07 39 d9 73 08 89 01 89 51 04 31 c0 c3 b8
> f2 ff ff ff c3 90 55 89 c1 89 e5 57 89 c7 56 89 d6 83 ec 04 <ac> aa 84 c0 75
> fa 5a 89 c8 5e 5f 5d c3 55 89 e5 57 89 c7 56 89
> EIP: [<c01df5fa>] strcpy+0xe/0x1b SS:ESP 0068:f70a8f28
> CR2: 0000000000001000
> ---[ end trace fe729b61cde0024d ]---
>
> The bug message appears with a high probability - but not always - when I
> release the eject lever to exchange the drive in the ultrabay slot. I'm not
> sure if this bug is a regression of 2.6.30, because I did not change the drive
> in the ultrabay for several months. Maybe it was already in 2.6.29 or before.
> How can I help to resolve this bug?
>
> regards,
> Jörg
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Subject: Re: [2.6.30] Kernel bug with dock driver

On Tue, 16 Jun 2009, Joerg Platte wrote:
> Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
> EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
> EIP is at strcpy+0xe/0x1b
> EAX: f302482c EBX: f3024800 ECX: f302482c EDX: 00000000
> ESI: 00000000 EDI: f302482c EBP: f70a4f34 ESP: f70a4f28
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> Process kacpi_notify (pid: 52, ti=f70a4000 task=f704c980 task.ti=f70a4000)
> Stack:
> f3024800 f3024814 f3024844 f70a4f64 c01fc898 010a4f54 00000000 f70c2879
> 00000004 f30e83c0 f3024818 00000014 f97c8132 f69b2600 00000000 f70a4f70
> f97c814e 00000000 f70a4f7c f97c8023 f7070460 f70a4f8c c020199d f4e37ee0
> Call Trace:
> [<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
> [<f97c8132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]

Bay notify does this:

static void bay_notify(struct ibm_struct *ibm, u32 event)
{
acpi_bus_generate_proc_event(ibm->acpi->device, event, 0);
acpi_bus_generate_netlink_event(ibm->acpi->device->pnp.device_class,
dev_name(&ibm->acpi->device->dev),
event, 0);
}

If it causes a NULL derreference, it means someone has changed API and did
NOT change all callsites. It is that simple.

Joerg, please apply this patch, and report the results... WARNING: you will
not get bay events from thinkpad-acpi, so be careful to not remove from the
bay a device that is still active...


diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index 912be65..03e4d16 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -4556,6 +4556,14 @@ static int __init bay_init(struct ibm_init_struct *iibm)

static void bay_notify(struct ibm_struct *ibm, u32 event)
{
+ if (WARN(!ibm, TPACPI_ERR "ibm is NULL\n"))
+ return;
+ if (WARN(!ibm->acpi->device, TPACPI_ERR "ibm->acpi->device is NULL\n"))
+ return;
+ if (WARN(!ibm->acpi->device->pnp.device_class, TPACPI_ERR "ibm->acpi->device->pnp.device_class is NULL\n"))
+ return;
+ if (WARN(!dev_name(&ibm->acpi->device->dev), TPACPI_ERR "dev_name() is returning NULL\n"))
+ return;
acpi_bus_generate_proc_event(ibm->acpi->device, event, 0);
acpi_bus_generate_netlink_event(ibm->acpi->device->pnp.device_class,
dev_name(&ibm->acpi->device->dev),


--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2009-06-18 21:23:11

by Joerg Platte

[permalink] [raw]
Subject: Re: [2.6.30] Kernel bug with dock driver

Am Wednesday, 17. June 2009 schrieb Henrique de Moraes Holschuh:
Hi Henrique,

> If it causes a NULL derreference, it means someone has changed API and did
> NOT change all callsites. It is that simple.
>
> Joerg, please apply this patch, and report the results... WARNING: you
> will not get bay events from thinkpad-acpi, so be careful to not remove
> from the bay a device that is still active...

This time I got two BUGs while being on battery and without acpid:

------------[ cut here ]------------
WARNING: at drivers/platform/x86/thinkpad_acpi.c:4565 bay_notify+0x35/0x5e
[thinkpad_acpi]()
Hardware name: 2373G1G
thinkpad_acpi: dev_name() is returning NULL
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs radeon drm
sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE iptable_nat nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT ipt_LOG xt_limit
xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle ip_tables x_tables
nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc af_packet cpufreq_userspace
cpufreq_stats cpufreq_powersave autofs4 nsc_ircc fuse nls_utf8 ntfs nls_base
ext2 deadline_iosched as_iosched ircomm_tty ircomm tun acpi_cpufreq sbs sbshc
joydev snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss
snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
snd_seq snd_timer snd_seq_device irtty_sir thinkpad_acpi ipw2200 libipw
yenta_socket sir_dev rfkill snd rsrc_nonstatic lib80211 soundcore pcmcia
rtc_cmos psmouse led_class pcmcia_core snd_page_alloc i2c_i801 rng_core
8250_pci irda pcspkr serio_raw nvram parport_pc 8250_pnp rtc_core parport
button rtc_lib processor battery ac crc_ccitt 8250 serial_core evdev ext3 jbd
mbcache sd_mod ata_generic pata_acpi ata_piix libata uhci_hcd ehci_hcd e1000
scsi_mod usbcore intel_agp agpgart video output thermal fan unix
cpufreq_conservative cpufreq_ondemand freq_table radeonfb fb_ddc backlight
i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fbcon tileblit font
bitblit softcursor fb
Pid: 52, comm: kacpi_notify Not tainted 2.6.30 #1
Call Trace:
[<c011f839>] warn_slowpath_common+0x60/0x90
[<f9767132>] ? bay_notify+0x0/0x5e [thinkpad_acpi]
[<c011f89d>] warn_slowpath_fmt+0x24/0x27
[<f9767167>] bay_notify+0x35/0x5e [thinkpad_acpi]
[<f9767023>] dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
[<c02018cd>] acpi_ev_notify_dispatch+0x4c/0x57
[<c01f4488>] acpi_os_execute_deferred+0x20/0x2c
[<c012cff6>] worker_thread+0x15a/0x1fd
[<c01f4468>] ? acpi_os_execute_deferred+0x0/0x2c
[<c012fc7d>] ? autoremove_wake_function+0x0/0x33
[<c012ce9c>] ? worker_thread+0x0/0x1fd
[<c012f8bc>] kthread+0x42/0x67
[<c012f87a>] ? kthread+0x0/0x67
[<c01030d3>] kernel_thread_helper+0x7/0x10
---[ end trace 0469584017b9bddb ]---
sd 1:0:0:0: [sdb] Synchronizing SCSI cache
sd 1:0:0:0: [sdb] Stopping disk
ata2.00: disabled
ACPI: \_SB_.PCI0.IDE0.SCND.MSTR - undocking
ACPI Warning (nspredef-0290): \_SB_.PCI0.LPC_.EC__.BEEP: Excess arguments -
needs 1, found 2 [20090320]
------------[ cut here ]------------
WARNING: at drivers/platform/x86/thinkpad_acpi.c:4565 bay_notify+0x35/0x5e
[thinkpad_acpi]()
Hardware name: 2373G1G
thinkpad_acpi: dev_name() is returning NULL
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs radeon drm
sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE iptable_nat nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT ipt_LOG xt_limit
xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle ip_tables x_tables
nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc af_packet cpufreq_userspace
cpufreq_stats cpufreq_powersave autofs4 nsc_ircc fuse nls_utf8 ntfs nls_base
ext2 deadline_iosched as_iosched ircomm_tty ircomm tun acpi_cpufreq sbs sbshc
joydev snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss
snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
snd_seq snd_timer snd_seq_device irtty_sir thinkpad_acpi ipw2200 libipw
yenta_socket sir_dev rfkill snd rsrc_nonstatic lib80211 soundcore pcmcia
rtc_cmos psmouse led_class pcmcia_core snd_page_alloc i2c_i801 rng_core
8250_pci irda pcspkr serio_raw nvram parport_pc 8250_pnp rtc_core parport
button rtc_lib processor battery ac crc_ccitt 8250 serial_core evdev ext3 jbd
mbcache sd_mod ata_generic pata_acpi ata_piix libata uhci_hcd ehci_hcd e1000
scsi_mod usbcore intel_agp agpgart video output thermal fan unix
cpufreq_conservative cpufreq_ondemand freq_table radeonfb fb_ddc backlight
i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fbcon tileblit font
bitblit softcursor fb
Pid: 52, comm: kacpi_notify Tainted: G W 2.6.30 #1
Call Trace:
[<c011f839>] warn_slowpath_common+0x60/0x90
[<f9767132>] ? bay_notify+0x0/0x5e [thinkpad_acpi]
[<c011f89d>] warn_slowpath_fmt+0x24/0x27
[<f9767167>] bay_notify+0x35/0x5e [thinkpad_acpi]
[<f9767023>] dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
[<c02018cd>] acpi_ev_notify_dispatch+0x4c/0x57
[<c01f4488>] acpi_os_execute_deferred+0x20/0x2c
[<c012cff6>] worker_thread+0x15a/0x1fd
[<c01f4468>] ? acpi_os_execute_deferred+0x0/0x2c
[<c012fc7d>] ? autoremove_wake_function+0x0/0x33
[<c012ce9c>] ? worker_thread+0x0/0x1fd
[<c012f8bc>] kthread+0x42/0x67
[<c012f87a>] ? kthread+0x0/0x67
[<c01030d3>] kernel_thread_helper+0x7/0x10
---[ end trace 0469584017b9bddc ]---

Hope it helps...

Best regards,
J?rg

2009-06-18 21:27:16

by Joerg Platte

[permalink] [raw]
Subject: Re: [2.6.30] Kernel bug with dock driver

Am Wednesday, 17. June 2009 schrieb Zhang Rui:

> jplatte,
> it would be great if you can verify if this is a regression or not.

I'm not sure where to start. I'm not using the bay that frequent and the bug
is not that easy to trigger. Sometimes I can trigger it directly, sometimes it
takes a day.

> And please file a new bug report at
> http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
> attach the full dmesg after the bug and the acpidump output there.

I'll file a bug if Henrique is unable to find the bug. Maybe his debugging
output helps to find the problem.

regards,
Jörg

Subject: Re: [2.6.30] Kernel bug with dock driver

On Thu, 18 Jun 2009, Joerg Platte wrote:
> Am Wednesday, 17. June 2009 schrieb Henrique de Moraes Holschuh:
> > If it causes a NULL derreference, it means someone has changed API and did
> > NOT change all callsites. It is that simple.
> >
> thinkpad_acpi: dev_name() is returning NULL

There it is. "Someone set us up the bomb."

> Hope it helps...

Yes, it does. Thank you. Now at least I know what to look for.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

Subject: Re: [2.6.30] Kernel bug with dock driver

On Thu, 18 Jun 2009, Joerg Platte wrote:
> Am Wednesday, 17. June 2009 schrieb Zhang Rui:
>
> > jplatte,
> > it would be great if you can verify if this is a regression or not.

Something is causing ACPI devices to have NULL for dev_name(). This
broke thinkpad-acpi. Any ideas?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh