Hi,
some people hit this bug, I'm able to reproduce it too, but I'm out of ideas
what could have cause it. Here are some traces of D processes:
tee D ffff8100041e3c28 0 15798 13503
ffff8100041e3bc8 0000000000000086 0000000000000000 ffffffff804a007b
ffff8100041e3be8 ffff8100029d8000 ffff810001fc9560 ffffffff8828e094
ffff810008af1ba0 ffff8100029d8000 ffff810001fc9560 ffff81002ae74e40
Call Trace:
[<ffffffff804a007b>] thread_return+0x0/0x5a5
[<ffffffff8828e094>] :cpufreq_stats:cpufreq_stats_update+0x54/0x70 /*
_spin_unlock */
[<ffffffff804a0731>] wait_for_completion+0xa1/0xf0
[<ffffffff8022ee90>] default_wake_function+0x0/0x10
[<ffffffff802d583f>] sysfs_addrm_finish+0x1ef/0x270 /* wait_for_completion */
[<ffffffff802d3fb6>] sysfs_hash_and_remove+0xa6/0xc0
[<ffffffff880120cc>] :cpufreq_userspace:cpufreq_governor_userspace+0xac/0x220
[<ffffffff803c4cd2>] __cpufreq_governor+0x32/0x110
[<ffffffff803c56f3>] __cpufreq_set_policy+0x113/0x180
[<ffffffff803c5852>] store_scaling_governor+0xf2/0x200
[<ffffffff803c5f50>] handle_update+0x0/0x10
here started the scheduled work ^^^
[<ffffffff8026a2c3>] __alloc_pages+0x73/0x320 /* get_page_from_freelist */
[<ffffffff803c662b>] store+0x7b/0x90 /* fattr->store(), holding write sem */
[<ffffffff802d46ef>] sysfs_write_file+0xcf/0x150
[<ffffffff8028c778>] vfs_write+0xc8/0x170
[<ffffffff8028ce53>] sys_write+0x53/0x90
[<ffffffff8020bd6e>] system_call+0x7e/0x83
tee D 0000000000000000 0 15800 22179
ffff8100058b7e28 0000000000000086 0000000000000000 001bdcd0000f32a0
0000000000000000 ffff810005f8ce40 ffff810001fd6e40 0000000000000001
0000000101281d48 000000008057bec0 0000000000000003 000280d000000000
Call Trace:
[<ffffffff804a1ed9>] __down_write_nested+0x79/0xc0
[<ffffffff803c5e21>] lock_policy_rwsem_write+0x41/0x80
[<ffffffff803c660c>] store+0x5c/0x90
[<ffffffff802d46ef>] sysfs_write_file+0xcf/0x150
[<ffffffff8028c778>] vfs_write+0xc8/0x170
[<ffffffff8028ce53>] sys_write+0x53/0x90
[<ffffffff8020bd6e>] system_call+0x7e/0x83
cat D 0000000000000000 0 15801 13587
ffff810008f69e38 0000000000000082 0000000000000000 ffffffff8026a119
ffffffff80624fa0 ffff810005f8d560 ffffffff805703a0 ffffffff8057bec0
0000000101282286 0000000000000000 0000000000000003 000000010697e4a0
Call Trace:
[<ffffffff8026a119>] get_page_from_freelist+0x339/0x470
[<ffffffff804a1fa9>] __down_read+0x79/0xb2 /* schedule */
[<ffffffff803c6681>] lock_policy_rwsem_read+0x41/0x80
[<ffffffff803c67cd>] show+0x4d/0x80
[<ffffffff802d4bed>] sysfs_read_file+0x9d/0x150
[<ffffffff8028c8e5>] vfs_read+0xc5/0x160
[<ffffffff8028cdc3>] sys_read+0x53/0x90
[<ffffffff8020bd6e>] system_call+0x7e/0x83
I reproduced it by executing these 3 scripts:
#!/bin/bash
while true
do
echo userspace | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
sleep 2
echo performance | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
sleep 4
echo ondemand | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
sleep 10
done
#!/bin/bash
while true
do
echo userspace | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
for (( a = 0; a < 10000; a++ )); do echo aaa >/dev/null; done
echo performance | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
for (( a = 0; a < 11500; a++ )); do echo aaa >/dev/null; done
echo ondemand | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
sleep 1
done
#!/bin/bash
while true
do cat /sys/devices/system/cpu/*/cpufreq/*
sleep 10
done
This was on 2.6.23, however it is known to be also in older kernels. I
suspect same race between wait_for_completion and complete() -- it wakes
only one process, but not the other in sysfs/dir.c (would complete_all
help?)? Any ideas?
Here are events workers (seems OK):
events/1 S 0000000000000000 0 13126 2
ffff81003fe71ec0 0000000000000046 0000000000000000 0000000000000202
ffff810001e120c0 ffff81003d821560 ffff810001fd6e40 00000000ffffffff
00000001012df7e1 0000000080248f2a 0000000000000003 ffff810001e120c8
Call Trace:
[<ffffffff80249060>] worker_thread+0x0/0x130
[<ffffffff80249165>] worker_thread+0x105/0x130
[<ffffffff8024cb20>] autoremove_wake_function+0x0/0x30
[<ffffffff80249060>] worker_thread+0x0/0x130
[<ffffffff80249060>] worker_thread+0x0/0x130
[<ffffffff8024c71b>] kthread+0x4b/0x80
[<ffffffff8020cbb8>] child_rip+0xa/0x12
[<ffffffff8024c6d0>] kthread+0x0/0x80
[<ffffffff8020cbae>] child_rip+0x0/0x12
events/0 S 0000000000000000 0 7 2
ffff81003ff0bec0 0000000000000046 0000000000000000 0000000000000202
ffff810001e0a0c0 ffff810001fd7560 ffffffff805703a0 00000000ffffffff
00000001012df7c6 0000000080248f2a 0000000000000003 ffff810001e0a0c8
Call Trace:
[<ffffffff80249060>] worker_thread+0x0/0x130
[<ffffffff80249165>] worker_thread+0x105/0x130
[<ffffffff8024cb20>] autoremove_wake_function+0x0/0x30
[<ffffffff80249060>] worker_thread+0x0/0x130
[<ffffffff80249060>] worker_thread+0x0/0x130
[<ffffffff8024c71b>] kthread+0x4b/0x80
[<ffffffff8020cbb8>] child_rip+0xa/0x12
[<ffffffff8024c6d0>] kthread+0x0/0x80
[<ffffffff8020cbae>] child_rip+0x0/0x12
I'm now trying to reproduce this with LOCKDEP on, but it's not easy to get
to that state.
thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
On Thu, 2007-11-22 at 11:46 +0100, Jiri Slaby wrote:
> Hi,
>
> some people hit this bug, I'm able to reproduce it too, but I'm out of ideas
> what could have cause it. Here are some traces of D processes:
Could this have to do with the problem Venkatesh/Luming recently sent a
patch for to the cpufreq list:
Eliminate cpufreq_userspace scaling_setspeed deadlock
http://article.gmane.org/gmane.linux.kernel.cpufreq/5629
Thomas
>
> tee D ffff8100041e3c28 0 15798 13503
> ffff8100041e3bc8 0000000000000086 0000000000000000 ffffffff804a007b
> ffff8100041e3be8 ffff8100029d8000 ffff810001fc9560 ffffffff8828e094
> ffff810008af1ba0 ffff8100029d8000 ffff810001fc9560 ffff81002ae74e40
> Call Trace:
> [<ffffffff804a007b>] thread_return+0x0/0x5a5
> [<ffffffff8828e094>] :cpufreq_stats:cpufreq_stats_update+0x54/0x70 /*
> _spin_unlock */
> [<ffffffff804a0731>] wait_for_completion+0xa1/0xf0
> [<ffffffff8022ee90>] default_wake_function+0x0/0x10
> [<ffffffff802d583f>] sysfs_addrm_finish+0x1ef/0x270 /* wait_for_completion */
> [<ffffffff802d3fb6>] sysfs_hash_and_remove+0xa6/0xc0
> [<ffffffff880120cc>] :cpufreq_userspace:cpufreq_governor_userspace+0xac/0x220
> [<ffffffff803c4cd2>] __cpufreq_governor+0x32/0x110
> [<ffffffff803c56f3>] __cpufreq_set_policy+0x113/0x180
> [<ffffffff803c5852>] store_scaling_governor+0xf2/0x200
> [<ffffffff803c5f50>] handle_update+0x0/0x10
>
> here started the scheduled work ^^^
>
> [<ffffffff8026a2c3>] __alloc_pages+0x73/0x320 /* get_page_from_freelist */
> [<ffffffff803c662b>] store+0x7b/0x90 /* fattr->store(), holding write sem */
> [<ffffffff802d46ef>] sysfs_write_file+0xcf/0x150
> [<ffffffff8028c778>] vfs_write+0xc8/0x170
> [<ffffffff8028ce53>] sys_write+0x53/0x90
> [<ffffffff8020bd6e>] system_call+0x7e/0x83
>
> tee D 0000000000000000 0 15800 22179
> ffff8100058b7e28 0000000000000086 0000000000000000 001bdcd0000f32a0
> 0000000000000000 ffff810005f8ce40 ffff810001fd6e40 0000000000000001
> 0000000101281d48 000000008057bec0 0000000000000003 000280d000000000
> Call Trace:
> [<ffffffff804a1ed9>] __down_write_nested+0x79/0xc0
> [<ffffffff803c5e21>] lock_policy_rwsem_write+0x41/0x80
> [<ffffffff803c660c>] store+0x5c/0x90
> [<ffffffff802d46ef>] sysfs_write_file+0xcf/0x150
> [<ffffffff8028c778>] vfs_write+0xc8/0x170
> [<ffffffff8028ce53>] sys_write+0x53/0x90
> [<ffffffff8020bd6e>] system_call+0x7e/0x83
>
> cat D 0000000000000000 0 15801 13587
> ffff810008f69e38 0000000000000082 0000000000000000 ffffffff8026a119
> ffffffff80624fa0 ffff810005f8d560 ffffffff805703a0 ffffffff8057bec0
> 0000000101282286 0000000000000000 0000000000000003 000000010697e4a0
> Call Trace:
> [<ffffffff8026a119>] get_page_from_freelist+0x339/0x470
> [<ffffffff804a1fa9>] __down_read+0x79/0xb2 /* schedule */
> [<ffffffff803c6681>] lock_policy_rwsem_read+0x41/0x80
> [<ffffffff803c67cd>] show+0x4d/0x80
> [<ffffffff802d4bed>] sysfs_read_file+0x9d/0x150
> [<ffffffff8028c8e5>] vfs_read+0xc5/0x160
> [<ffffffff8028cdc3>] sys_read+0x53/0x90
> [<ffffffff8020bd6e>] system_call+0x7e/0x83
>
> I reproduced it by executing these 3 scripts:
> #!/bin/bash
>
> while true
> do
> echo userspace | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
> sleep 2
> echo performance | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
> sleep 4
> echo ondemand | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
> sleep 10
> done
>
> #!/bin/bash
>
> while true
> do
> echo userspace | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
> for (( a = 0; a < 10000; a++ )); do echo aaa >/dev/null; done
> echo performance | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
> for (( a = 0; a < 11500; a++ )); do echo aaa >/dev/null; done
> echo ondemand | tee /sys/devices/system/cpu/*/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
> sleep 1
> done
>
> #!/bin/bash
>
> while true
> do cat /sys/devices/system/cpu/*/cpufreq/*
> sleep 10
> done
>
> This was on 2.6.23, however it is known to be also in older kernels. I
> suspect same race between wait_for_completion and complete() -- it wakes
> only one process, but not the other in sysfs/dir.c (would complete_all
> help?)? Any ideas?
>
> Here are events workers (seems OK):
> events/1 S 0000000000000000 0 13126 2
> ffff81003fe71ec0 0000000000000046 0000000000000000 0000000000000202
> ffff810001e120c0 ffff81003d821560 ffff810001fd6e40 00000000ffffffff
> 00000001012df7e1 0000000080248f2a 0000000000000003 ffff810001e120c8
> Call Trace:
> [<ffffffff80249060>] worker_thread+0x0/0x130
> [<ffffffff80249165>] worker_thread+0x105/0x130
> [<ffffffff8024cb20>] autoremove_wake_function+0x0/0x30
> [<ffffffff80249060>] worker_thread+0x0/0x130
> [<ffffffff80249060>] worker_thread+0x0/0x130
> [<ffffffff8024c71b>] kthread+0x4b/0x80
> [<ffffffff8020cbb8>] child_rip+0xa/0x12
> [<ffffffff8024c6d0>] kthread+0x0/0x80
> [<ffffffff8020cbae>] child_rip+0x0/0x12
>
> events/0 S 0000000000000000 0 7 2
> ffff81003ff0bec0 0000000000000046 0000000000000000 0000000000000202
> ffff810001e0a0c0 ffff810001fd7560 ffffffff805703a0 00000000ffffffff
> 00000001012df7c6 0000000080248f2a 0000000000000003 ffff810001e0a0c8
> Call Trace:
> [<ffffffff80249060>] worker_thread+0x0/0x130
> [<ffffffff80249165>] worker_thread+0x105/0x130
> [<ffffffff8024cb20>] autoremove_wake_function+0x0/0x30
> [<ffffffff80249060>] worker_thread+0x0/0x130
> [<ffffffff80249060>] worker_thread+0x0/0x130
> [<ffffffff8024c71b>] kthread+0x4b/0x80
> [<ffffffff8020cbb8>] child_rip+0xa/0x12
> [<ffffffff8024c6d0>] kthread+0x0/0x80
> [<ffffffff8020cbae>] child_rip+0x0/0x12
>
> I'm now trying to reproduce this with LOCKDEP on, but it's not easy to get
> to that state.