2009-06-14 18:19:48

by Andrey Borzenkov

[permalink] [raw]
Subject: 2.6.30: hald crashes on resume from STR apparently on LED device

I did not experience this crash under 2.6.29. I got it under 2.6.30
(both rc and final) with varying frequency. At least for me it is 2.6.30
regression.

hald aborts on D-Bus assertion with stack trace below.Program received
signal SIGABRT, Aborted.
0x00007f183ff9e915 in raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007f183ff9e915 in raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f183ff9ff8a in abort () at abort.c:88
#2 0x00007f1840793f35 in _dbus_abort () at dbus-sysdeps.c:88
#3 0x00007f184078fd6d in _dbus_warn_check_failed (
format=0x7f184079c5a0 "arguments to %s() were incorrect, assertion
\"%s\" failed in file %s line %d.\nThis is normally a bug in some
application using the D-Bus library.\n") at dbus-internals.c:283
#4 0x00007f18407873f9 in dbus_pending_call_set_notify (pending=0x0,
function=0x41b320 <reply_from_singleton_device_changed>,
user_data=0xc74c30, free_user_data=0x416000 <del_sdci>) at dbus-
pending-call.c:598
#5 0x0000000000415f6e in singleton_signal_device_changed
(connection=0xbdc140, device=0x7f18411af2c0, added=1)
at hald_dbus.c:3481
#6 0x0000000000419fe2 in hald_singleton_device_added
(command_line=0xc601e0 "hald-addon-leds", device=0x7f18411af2c0)
at hald_dbus.c:3570
#7 0x0000000000413e5f in gdl_store_changed (store=<value optimized
out>, device=0x7f18411af2c0,
is_added=<value optimized out>, user_data=<value optimized out>) at
hald.c:134
#8 0x00007f1840c78bae in IA__g_closure_invoke (closure=0xbcba10,
return_value=0x0, n_param_values=3, param_values=0xc914a0,
invocation_hint=0x7fff3cac7d20) at gclosure.c:767
#9 0x00007f1840c8f108 in signal_emit_unlocked_R (node=0xbca8a0,
detail=<value optimized out>, instance=<value optimized out>,
emission_return=<value optimized out>, instance_and_params=<value
optimized out>) at gsignal.c:3247
#10 0x00007f1840c9059e in IA__g_signal_emit_valist (instance=0xbbd600,
signal_id=<value optimized out>, detail=0,
var_args=0x7fff3cac7f10) at gsignal.c:2980
#11 0x00007f1840c90b33 in IA__g_signal_emit (instance=0xbc7,
signal_id=3015, detail=6) at gsignal.c:3037
#12 0x0000000000411d01 in hal_device_store_add (store=0xbbd600,
device=0x7f18411af2c0) at device_store.c:277
#13 0x0000000000430d72 in dev_callouts_add_done (d=0x7f18411af2c0,
userdata1=0xc6ee20, userdata2=<value optimized out>)
at device.c:4666
#14 0x00000000004086c4 in hal_util_callout_device_add (d=0x7f18411af2c0,
callback=0x430d10 <dev_callouts_add_done>,
userdata1=0xc6ee20, userdata2=0x0) at util.c:996
#15 0x0000000000408524 in hal_util_callout_device_preprobe
(d=0x7f18411af2c0,
callback=0x4310e0 <dev_callouts_preprobing_done>,
userdata1=0xc6ee20, userdata2=0x657fa0) at util.c:1035
#16 0x0000000000425394 in hotplug_event_begin_sysfs
(hotplug_event=0xc6ee20) at hotplug.c:172
#17 0x00000000004257a8 in hotplug_event_process_queue () at
hotplug.c:295
#18 0x00000000004242b7 in hald_udev_data (source=<value optimized out>,
condition=<value optimized out>,
user_data=<value optimized out>) at osspec.c:259
#19 0x00007f18409e1a4e in IA__g_main_context_dispatch (context=0xbc0320)
at gmain.c:1814
#20 0x00007f18409e51b8 in g_main_context_iterate (context=0xbc0320,
block=<value optimized out>,
dispatch=<value optimized out>, self=<value optimized out>) at
gmain.c:2445
#21 0x00007f18409e5645 in IA__g_main_loop_run (loop=0xbbcc20) at
gmain.c:2653
#22 0x00000000004133ed in main (argc=<value optimized out>,
argv=0x7fff3cac9b50) at hald.c:821


Appearance of hald-addon-leds may be red herring; still here is the only
led device I have (for Intel 3946ABG).

udi = '/org/freedesktop/Hal/devices/leds_iwl_phy0_radio'
info.addons.singleton = {'hald-addon-leds'} (string list)
info.capabilities = {'leds'} (string list)
info.category = 'leds' (string)
info.interfaces = {'org.freedesktop.Hal.Device.Leds'} (string list)
info.parent = '/org/freedesktop/Hal/devices/pci_8086_4222' (string)
info.subsystem = 'leds' (string)
info.udi = '/org/freedesktop/Hal/devices/leds_iwl_phy0_radio'
(string)
leds.device_name = 'iwl-phy0' (string)
leds.function = 'radio' (string)
linux.hotplug_type = 2 (0x2) (int)
linux.subsystem = 'leds' (string)
linux.sysfs_path =
'/sys/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::radio'
(string)


This is Dell XPS M1330 notebook. I am ready to provide any required
information; so far crash is reproducible after several suspend/resume
cycles.


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part.

2009-06-21 14:17:20

by Andrey Borzenkov

[permalink] [raw]
Subject: Re: 2.6.30: hald crashes on resume from STR apparently on LED device

On Sunday 14 of June 2009 22:19:02 Andrey Borzenkov wrote:
> I did not experience this crash under 2.6.29. I got it under 2.6.30
> (both rc and final) with varying frequency. At least for me it is
> 2.6.30 regression.
>
> hald aborts on D-Bus assertion with stack trace below.Program
> received signal SIGABRT, Aborted.
> 0x00007f183ff9e915 in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> (gdb) bt
> #0 0x00007f183ff9e915 in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x00007f183ff9ff8a in abort () at abort.c:88
> #2 0x00007f1840793f35 in _dbus_abort () at dbus-sysdeps.c:88
> #3 0x00007f184078fd6d in _dbus_warn_check_failed (
> format=0x7f184079c5a0 "arguments to %s() were incorrect,
> assertion \"%s\" failed in file %s line %d.\nThis is normally a bug
> in some application using the D-Bus library.\n") at
> dbus-internals.c:283 #4 0x00007f18407873f9 in
> dbus_pending_call_set_notify (pending=0x0, function=0x41b320
> <reply_from_singleton_device_changed>,
> user_data=0xc74c30, free_user_data=0x416000 <del_sdci>) at dbus-
> pending-call.c:598
> #5 0x0000000000415f6e in singleton_signal_device_changed
> (connection=0xbdc140, device=0x7f18411af2c0, added=1)
> at hald_dbus.c:3481
> #6 0x0000000000419fe2 in hald_singleton_device_added
> (command_line=0xc601e0 "hald-addon-leds", device=0x7f18411af2c0)
> at hald_dbus.c:3570

OK this is HAL bug which could be triggered by some changes in 2.6.30
series. I leave kernel on Cc list so far to ask whether there was
actually behaviour change in 2.6.30 or it is just different timing that
now hits badly.

So what happens is simple. During suspend led device is removed; during
resume it is recreated. Apparently removing led device cause hald-addon-
leds to exit (thus disconnecting D-Bus connection); and sometimes next
"add" event comes too early, before hald had chance to notice missing
addon.

So attached is POC patch that simply adds extra check to
hald_singleton_device_added() to ensure we do not abort on disconnected
D-Bus connection. It does not seem to have evil effects; it is possible
that we do miss some event but I do not quite understand processing
logic here. Apparently event should be replayed ...

Here is list of hald processes after resume. Notice that hald-addon-leds
is restarted every time:

pts/1}% pgrep -l hald
13996 hald
13997 hald-runner
14098 hald-addon-rfki
14130 hald-addon-gene
14156 hald-addon-inpu
14190 hald-addon-stor
14194 hald-addon-cpuf
14195 hald-addon-acpi
14796 hald-addon-leds
{pts/1}% pgrep -l hald
13996 hald
13997 hald-runner
14098 hald-addon-rfki
14130 hald-addon-gene
14156 hald-addon-inpu
14190 hald-addon-stor
14194 hald-addon-cpuf
14195 hald-addon-acpi
16145 hald-addon-leds
{pts/1}% pgrep -l hald
13996 hald
13997 hald-runner
14098 hald-addon-rfki
14130 hald-addon-gene
14156 hald-addon-inpu
14190 hald-addon-stor
14194 hald-addon-cpuf
14195 hald-addon-acpi
17543 hald-addon-leds
{pts/1}% pgrep -l hald
13996 hald
13997 hald-runner
14098 hald-addon-rfki
14130 hald-addon-gene
14156 hald-addon-inpu
14190 hald-addon-stor
14194 hald-addon-cpuf
14195 hald-addon-acpi
{pts/1}% pgrep -l hald
13996 hald
13997 hald-runner
14098 hald-addon-rfki
14130 hald-addon-gene
14156 hald-addon-inpu
14190 hald-addon-stor
14194 hald-addon-cpuf
14195 hald-addon-acpi
21842 hald-addon-leds
{pts/1}% pgrep -l hald
13996 hald
13997 hald-runner
14098 hald-addon-rfki
14130 hald-addon-gene
14156 hald-addon-inpu
14190 hald-addon-stor
14194 hald-addon-cpuf
14195 hald-addon-acpi
23946 hald-addon-leds

and events that are generated by kernel during suspend/remove cycle:

{pts/0}% sudo udevadm monitor --env --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[1245593084.803490] remove
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::assoc
(leds)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-
phy0::assoc
SUBSYSTEM=leds
SEQNUM=1458

KERNEL[1245593084.803530] remove
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::RX (leds)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::RX
SUBSYSTEM=leds
SEQNUM=1459

KERNEL[1245593084.803550] remove
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::TX (leds)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::TX
SUBSYSTEM=leds
SEQNUM=1460

KERNEL[1245593084.803570] remove
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::radio
(leds)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-
phy0::radio
SUBSYSTEM=leds
SEQNUM=1461

KERNEL[1245593084.803590] remove /devices/system/cpu/cpu1/cpufreq
(cpu)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/system/cpu/cpu1/cpufreq
SUBSYSTEM=cpu
SEQNUM=1462

KERNEL[1245593084.803610] remove /devices/system/cpu/cpu1/cache/index0
(cpu)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/system/cpu/cpu1/cache/index0
SUBSYSTEM=cpu
SEQNUM=1463

KERNEL[1245593084.803629] remove /devices/system/cpu/cpu1/cache/index1
(cpu)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/system/cpu/cpu1/cache/index1
SUBSYSTEM=cpu
SEQNUM=1464

KERNEL[1245593084.803648] remove /devices/system/cpu/cpu1/cache/index2
(cpu)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/system/cpu/cpu1/cache/index2
SUBSYSTEM=cpu
SEQNUM=1465

KERNEL[1245593084.803667] remove /devices/system/cpu/cpu1/cache (cpu)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/system/cpu/cpu1/cache
SUBSYSTEM=cpu
SEQNUM=1466

KERNEL[1245593084.803687] remove
/devices/system/machinecheck/machinecheck1 (machinecheck)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/system/machinecheck/machinecheck1
SUBSYSTEM=machinecheck
SEQNUM=1467

KERNEL[1245593084.803707] add /devices/system/cpu/cpu1/cache/index0
(cpu)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/system/cpu/cpu1/cache/index0
SUBSYSTEM=cpu
SEQNUM=1468

KERNEL[1245593084.803726] add /devices/system/cpu/cpu1/cache/index1
(cpu)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/system/cpu/cpu1/cache/index1
SUBSYSTEM=cpu
SEQNUM=1469

KERNEL[1245593084.803745] add /devices/system/cpu/cpu1/cache/index2
(cpu)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/system/cpu/cpu1/cache/index2
SUBSYSTEM=cpu
SEQNUM=1470

KERNEL[1245593084.803764] add /devices/system/cpu/cpu1/cache (cpu)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/system/cpu/cpu1/cache
SUBSYSTEM=cpu
SEQNUM=1471

KERNEL[1245593084.803784] add
/devices/system/machinecheck/machinecheck1 (machinecheck)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/system/machinecheck/machinecheck1
SUBSYSTEM=machinecheck
SEQNUM=1472

KERNEL[1245593084.803804] add /devices/system/cpu/cpu1/cpufreq
(cpu)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/system/cpu/cpu1/cpufreq
SUBSYSTEM=cpu
SEQNUM=1473

KERNEL[1245593084.803825] change
/devices/LNXSYSTM:00/device:00/ACPI0003:00/power_supply/AC
(power_supply)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/LNXSYSTM:00/device:00/ACPI0003:00/power_supply/AC
SUBSYSTEM=power_supply
POWER_SUPPLY_NAME=AC
POWER_SUPPLY_TYPE=Mains
POWER_SUPPLY_ONLINE=1
SEQNUM=1474

KERNEL[1245593084.803860] change
/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0
(power_supply)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0
SUBSYSTEM=power_supply
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_TYPE=Battery
POWER_SUPPLY_STATUS=Unknown
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=11100000
POWER_SUPPLY_VOLTAGE_NOW=12016000
POWER_SUPPLY_CURRENT_NOW=2595000
POWER_SUPPLY_CHARGE_FULL_DESIGN=5200000
POWER_SUPPLY_CHARGE_FULL=4925000
POWER_SUPPLY_CHARGE_NOW=2819000
POWER_SUPPLY_MODEL_NAME=DELL WR0478
POWER_SUPPLY_MANUFACTURER=Samsung SDI
POWER_SUPPLY_SERIAL_NUMBER=21873
SEQNUM=1475

KERNEL[1245593084.817242] change
/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0
(power_supply)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0
SUBSYSTEM=power_supply
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_TYPE=Battery
POWER_SUPPLY_STATUS=Unknown
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=11100000
POWER_SUPPLY_VOLTAGE_NOW=12016000
POWER_SUPPLY_CURRENT_NOW=2595000
POWER_SUPPLY_CHARGE_FULL_DESIGN=5200000
POWER_SUPPLY_CHARGE_FULL=4925000
POWER_SUPPLY_CHARGE_NOW=2819000
POWER_SUPPLY_MODEL_NAME=DELL WR0478
POWER_SUPPLY_MANUFACTURER=Samsung SDI
POWER_SUPPLY_SERIAL_NUMBER=21873
SEQNUM=1476

KERNEL[1245593084.817297] change
/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0
(power_supply)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0
SUBSYSTEM=power_supply
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_TYPE=Battery
POWER_SUPPLY_STATUS=Charging
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=11100000
POWER_SUPPLY_VOLTAGE_NOW=11717000
POWER_SUPPLY_CURRENT_NOW=2275000
POWER_SUPPLY_CHARGE_FULL_DESIGN=5200000
POWER_SUPPLY_CHARGE_FULL=4925000
POWER_SUPPLY_CHARGE_NOW=2819000
POWER_SUPPLY_MODEL_NAME=DELL WR0478
POWER_SUPPLY_MANUFACTURER=Samsung SDI
POWER_SUPPLY_SERIAL_NUMBER=21873
SEQNUM=1477

KERNEL[1245593084.817342] add
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::radio
(leds)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-
phy0::radio
SUBSYSTEM=leds
SEQNUM=1478

KERNEL[1245593084.817382] add
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::assoc
(leds)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-
phy0::assoc
SUBSYSTEM=leds
SEQNUM=1479

KERNEL[1245593084.817402] add
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::RX (leds)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::RX
SUBSYSTEM=leds
SEQNUM=1480

KERNEL[1245593084.817421] add
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::TX (leds)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/leds/iwl-phy0::TX
SUBSYSTEM=leds
SEQNUM=1481

KERNEL[1245593084.817442] remove
/devices/pci0000:00/0000:00:1a.0/usb2/2-2/2-2.1/2-2.1:1.0/bluetooth/hci0
(bluetooth)
UDEV_LOG=3
ACTION=remove
DEVPATH=/devices/pci0000:00/0000:00:1a.0/usb2/2-2/2-2.1/2-2.1:1.0/bluetooth/hci0
SUBSYSTEM=bluetooth
DEVTYPE=host
SEQNUM=1482

KERNEL[1245593084.817464] add
/devices/pci0000:00/0000:00:1a.0/usb2/2-2/2-2.1/2-2.1:1.0/bluetooth/hci0
(bluetooth)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1a.0/usb2/2-2/2-2.1/2-2.1:1.0/bluetooth/hci0
SUBSYSTEM=bluetooth
DEVTYPE=host
SEQNUM=1483


Attachments:
(No filename) (19.95 kB)
hal-0.5.12-singleton_connection.patch (530.00 B)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2009-06-22 13:59:29

by Richard Purdie

[permalink] [raw]
Subject: Re: 2.6.30: hald crashes on resume from STR apparently on LED device

On Sun, 2009-06-21 at 18:17 +0400, Andrey Borzenkov wrote:
> So what happens is simple. During suspend led device is removed; during
> resume it is recreated. Apparently removing led device cause hald-addon-
> leds to exit (thus disconnecting D-Bus connection); and sometimes next
> "add" event comes too early, before hald had chance to notice missing
> addon.

Note that its the underlying device that is being removed here and since
the LED is connected to that its removed as well. The LED subsystem
hasn't changed in how it works. I don't think that has any bearing on
your patch though.

Regards,

Richard

--
Richard Purdie
Intel Open Source Technology Centre

2009-06-27 20:34:11

by Andrey Borzenkov

[permalink] [raw]
Subject: Re: 2.6.30: hald crashes on resume from STR apparently on LED device

On Monday 22 of June 2009 17:57:24 Richard Purdie wrote:
> On Sun, 2009-06-21 at 18:17 +0400, Andrey Borzenkov wrote:
> > So what happens is simple. During suspend led device is removed;
> > during resume it is recreated. Apparently removing led device cause
> > hald-addon- leds to exit (thus disconnecting D-Bus connection); and
> > sometimes next "add" event comes too early, before hald had chance
> > to notice missing addon.
>
> Note that its the underlying device that is being removed here and
> since the LED is connected to that its removed as well. The LED
> subsystem hasn't changed in how it works. I don't think that has any
> bearing on your patch though.
>

Yes, looking more closely there is inherent race condition between
tearing down existing helper and processing ADD event for it. Most
likely some timing was changed in current kernel that made it more
easily exposed. If Mandriva had not compiled D-Bus with assertions
enabled, this would simply fail silently without anyone noticing.


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part.