2008-06-23 10:51:39

by Nageswara R Sastry

[permalink] [raw]
Subject: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi,

While changing the cpufreq governor with the following script for about
30 minutes, the kernel hits a BUG in workqueue.c. Detailed trace attached.

Script:
#!/bin/bash

while [ 1 ]
do
for govnrs in ondemand conservative
do
for cpu in 0 1 2 3
do
echo $govnrs > /sys/devices/system/cpu/cpu${cpu}/cpufreq/scaling_governor
if ! [ -d /sys/devices/system/cpu/cpu${cpu}/cpufreq/${govnrs} ] ; then
echo "Error: Enable to create dir $govnrs"
fi
done
done
done

Kernel stack trace:
------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:223!
invalid opcode: 0000 [#1] SMP
Modules linked in: cpufreq_powersave cpufreq_conservative
cpufreq_userspace usb_storage usbhid ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 232, comm: kondemand/1 Not tainted (2.6.25.7 #1)
EIP: 0060:[<c012f61a>] EFLAGS: 00010286 CPU: 1
EIP is at queue_delayed_work_on+0x20/0x97
EAX: 00000000 EBX: c483ba94 ECX: c483ba94 EDX: 00000000
ESI: c483bab0 EDI: f7a3f708 EBP: 00000001 ESP: f7a69f40
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kondemand/1 (pid: 232, ti=f7a68000 task=f794d020 task.ti=f7a68000)
Stack: 00000000 f7a3e7b0 c483ba80 f7ab5e98 c041e54d 00000040 00000000
00000001
00000040 00000246 00000000 00000002 00000000 c012ee7f c483ba98
f7a3e7b0
c483ba94 f7a69f9c c012eeba 00000000 00000002 c012ee7f c041e31e
c099e2a8
Call Trace:
[<c041e54d>] do_dbs_timer+0x22f/0x24f
[<c012ee7f>] run_workqueue+0x81/0x187
[<c012eeba>] run_workqueue+0xbc/0x187
[<c012ee7f>] run_workqueue+0x81/0x187
[<c041e31e>] do_dbs_timer+0x0/0x24f
[<c012f6fa>] worker_thread+0x0/0xbd
[<c012f7ad>] worker_thread+0xb3/0xbd
[<c0131acc>] autoremove_wake_function+0x0/0x2d
[<c0131a1b>] kthread+0x38/0x5d
[<c01319e3>] kthread+0x0/0x5d
[<c0105527>] kernel_thread_helper+0x7/0x10
=======================
Code: c3 a1 dc da 6a c0 e9 78 ff ff ff 55 89 c5 57 89 d7 56 53 89 cb 8d
71 1c f0 0f ba 29 00 19 c0 31 d2 85 c0 75 76 83 79 1c 00 74 04 <0f> 0b
eb fe 8d 41 04 39 41 04 74 04 0f 0b eb fe 89 f8 64 8b 15
EIP: [<c012f61a>] queue_delayed_work_on+0x20/0x97 SS:ESP 0068:f7a69f40
---[ end trace 40ca209e9f1ab79d ]---


Kernel tainted:
# cat /proc/sys/kernel/tainted
128

# /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
stepping : 6
cpu MHz : 1992.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr dca lahf_lm
bogomips : 5990.92
clflush size : 64
power management:

This machine is having two Dual core. Core 0 information provided above.

CPUID information:
# ./cpuid
CPU:
vendor_id = "GenuineIntel"
version information (1/eax):
processor type = primary processor (0)
family = Intel Pentium Pro/II/III/Celeron, AMD
Athlon/Duron, Cyrix M2, VIA C3 (6)
model = 0xf (15)
stepping id = 0x6 (6)
extended family = 0x0 (0)
extended model = 0x0 (0)
(simple synth) = Intel Core 2 Duo (Conroe/Allendale B2) / Core 2
Extreme Processor (Conroe B2) / Dual-Core Xeon Processor 5100 (Woodcrest
B2), 65nm

[...]

feature information (1/ecx):
PNI/SSE3: Prescott New Instructions = true
MONITOR/MWAIT = true
CPL-qualified debug store = true
VMX: virtual machine extensions = true
Enhanced Intel SpeedStep Technology = true
thermal monitor 2 = true
context ID: adaptive or shared L1 data = false
cmpxchg16b available = true
xTPR disable = true

[...]

Kernel configuration used:
# CPU Frequency scaling
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=m
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

# rpm -qa | egrep "binutils|gcc"
binutils-2.17.50.0.6-2.el5
gcc-c++-4.1.1-52.el5
gcc-4.1.1-52.el5
libgcc-4.1.1-52.el5
gcc-gfortran-4.1.1-52.el5

# uname -a
Linux x3550 2.6.25.7 #1 SMP Thu Jun 19 17:24:06 IST 2008 i686 i686 i386
GNU/Linux

Could be able to reproduce the same on 2.6.26-rc6.

Please let me know if you need more information.
Please CC me as I am not subscribed.

Thanks in advance.

Regards
R.Nageswara Sastry, CSTE?,C|EH?,CSTM?
Linux Technology Center
IBM India System & Technology Lab


2008-06-23 15:27:19

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi,

Nageswara R Sastry <[email protected]> writes:

> Hi,
>
> While changing the cpufreq governor with the following script for
> about 30 minutes, the kernel hits a BUG in workqueue.c. Detailed
> trace attached.
>
> Script:
> #!/bin/bash
>
> while [ 1 ]
> do
> for govnrs in ondemand conservative
> do
> for cpu in 0 1 2 3
> do
> echo $govnrs > /sys/devices/system/cpu/cpu${cpu}/cpufreq/scaling_governor
> if ! [ -d /sys/devices/system/cpu/cpu${cpu}/cpufreq/${govnrs} ] ; then
> echo "Error: Enable to create dir $govnrs"
> fi
> done
> done
> done
>
> Kernel stack trace:
> ------------[ cut here ]------------
> kernel BUG at kernel/workqueue.c:223!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: cpufreq_powersave cpufreq_conservative
> cpufreq_userspace usb_storage usbhid ehci_hcd ohci_hcd uhci_hcd
> usbcore
>
> Pid: 232, comm: kondemand/1 Not tainted (2.6.25.7 #1)
> EIP: 0060:[<c012f61a>] EFLAGS: 00010286 CPU: 1
> EIP is at queue_delayed_work_on+0x20/0x97
> EAX: 00000000 EBX: c483ba94 ECX: c483ba94 EDX: 00000000
> ESI: c483bab0 EDI: f7a3f708 EBP: 00000001 ESP: f7a69f40
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process kondemand/1 (pid: 232, ti=f7a68000 task=f794d020 task.ti=f7a68000)
> Stack: 00000000 f7a3e7b0 c483ba80 f7ab5e98 c041e54d 00000040 00000000
> 00000001
> 00000040 00000246 00000000 00000002 00000000 c012ee7f c483ba98
> f7a3e7b0
> c483ba94 f7a69f9c c012eeba 00000000 00000002 c012ee7f c041e31e
> c099e2a8
> Call Trace:
> [<c041e54d>] do_dbs_timer+0x22f/0x24f
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c012eeba>] run_workqueue+0xbc/0x187
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c041e31e>] do_dbs_timer+0x0/0x24f
> [<c012f6fa>] worker_thread+0x0/0xbd
> [<c012f7ad>] worker_thread+0xb3/0xbd
> [<c0131acc>] autoremove_wake_function+0x0/0x2d
> [<c0131a1b>] kthread+0x38/0x5d
> [<c01319e3>] kthread+0x0/0x5d
> [<c0105527>] kernel_thread_helper+0x7/0x10
> =======================
> Code: c3 a1 dc da 6a c0 e9 78 ff ff ff 55 89 c5 57 89 d7 56 53 89 cb
> 8d 71 1c f0 0f ba 29 00 19 c0 31 d2 85 c0 75 76 83 79 1c 00 74 04 <0f>
> 0b eb fe 8d 41 04 39 41 04 74 04 0f 0b eb fe 89 f8 64 8b 15
> EIP: [<c012f61a>] queue_delayed_work_on+0x20/0x97 SS:ESP 0068:f7a69f40
> ---[ end trace 40ca209e9f1ab79d ]---

Added Dave Jones to CC.

The work seems to be queued twice. Might there be a race in the
activation of the governor?

proc #0 proc #1

if (this_dbs_info->enable == 0)
<PREEMPTED>
if (this_dbs_info->enable == 0)
mutex_lock()
dbs_timer_init()
queue_delayed_work_on()
mutex_unlock()
...
mutex_lock()
dbs_timer_init()
queue_delayed_work_on()

If that might happen, would it be feasible to put the check for ->enable
within the mutex-protection?

Hannes

---
From: Johannes Weiner <[email protected]>
Subject: cpufreq: Fix race in enabling ondemand/conservative governors

Prevent double activation of the governor if two processes race on the
check for whether the governor is already active.

Signed-off-by: Johannes Weiner <[email protected]>
---

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 5d3a04b..a4902e4 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -486,10 +486,11 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
if ((!cpu_online(cpu)) || (!policy->cur))
return -EINVAL;

- if (this_dbs_info->enable) /* Already enabled */
- break;
-
mutex_lock(&dbs_mutex);
+ if (this_dbs_info->enable) {
+ mutex_unlock(&dbs_mutex);
+ break;
+ }

rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
if (rc) {
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index d2af20d..61705e1 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -508,10 +508,12 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
if ((!cpu_online(cpu)) || (!policy->cur))
return -EINVAL;

- if (this_dbs_info->enable) /* Already enabled */
+ mutex_lock(&dbs_mutex);
+ if (this_dbs_info->enable) {
+ mutex_unlock(&dbs_mutex);
break;
+ }

- mutex_lock(&dbs_mutex);
dbs_enable++;

rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);

2008-06-24 09:17:18

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Johannes Weiner wrote:
>
> From: Johannes Weiner <[email protected]>
> Subject: cpufreq: Fix race in enabling ondemand/conservative governors
>
> Prevent double activation of the governor if two processes race on the
> check for whether the governor is already active.
>
> Signed-off-by: Johannes Weiner <[email protected]>
> ---
>
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
> index 5d3a04b..a4902e4 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -486,10 +486,11 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> if ((!cpu_online(cpu)) || (!policy->cur))
> return -EINVAL;
>
> - if (this_dbs_info->enable) /* Already enabled */
> - break;
> -
> mutex_lock(&dbs_mutex);
> + if (this_dbs_info->enable) {
> + mutex_unlock(&dbs_mutex);
> + break;
> + }
>
> rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
> if (rc) {
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index d2af20d..61705e1 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -508,10 +508,12 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> if ((!cpu_online(cpu)) || (!policy->cur))
> return -EINVAL;
>
> - if (this_dbs_info->enable) /* Already enabled */
> + mutex_lock(&dbs_mutex);
> + if (this_dbs_info->enable) {
> + mutex_unlock(&dbs_mutex);
> break;
> + }
>
> - mutex_lock(&dbs_mutex);
> dbs_enable++;
>
> rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);

Applied the above patch against 2.6.25.7 kernel and tested the same, But
unluckily the above patch didn't stop hitting kernel bug at
kernel/workqueue.c:223
Hitting bug time got delayed near to 2hrs with the above patch.

>> Kernel stack trace:
------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:223!
invalid opcode: 0000 [#2] SMP
Modules linked in: cpufreq_powersave cpufreq_conservative
cpufreq_userspace usbhid usb_storage ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 232, comm: kondemand/1 Tainted: G D (2.6.25.7.cpufreqpatch #2)
EIP: 0060:[<c012f61a>] EFLAGS: 00010286 CPU: 1
EIP is at queue_delayed_work_on+0x20/0x97
EAX: 00000000 EBX: c483ba94 ECX: c483ba94 EDX: 00000000
ESI: c483bab0 EDI: f7a39708 EBP: 00000001 ESP: f7a69f40
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kondemand/1 (pid: 232, ti=f7a68000 task=f79b4120 task.ti=f7a68000)
Stack: 00000000 f7a377b0 c483ba80 f78bed80 c041e54d 00000040 00000000
00000001
00000040 00000246 00000000 00000002 00000000 c012ee7f c483ba98
f7a377b0
c483ba94 f7a69f9c c012eeba 00000000 00000002 c012ee7f c041e31e
c099e2a8
Call Trace:
[<c041e54d>] do_dbs_timer+0x22f/0x24f
[<c012ee7f>] run_workqueue+0x81/0x187
[<c012eeba>] run_workqueue+0xbc/0x187
[<c012ee7f>] run_workqueue+0x81/0x187
[<c041e31e>] do_dbs_timer+0x0/0x24f
[<c012f6fa>] worker_thread+0x0/0xbd
[<c012f7ad>] worker_thread+0xb3/0xbd
[<c0131acc>] autoremove_wake_function+0x0/0x2d
[<c0131a1b>] kthread+0x38/0x5d
[<c01319e3>] kthread+0x0/0x5d
[<c0105527>] kernel_thread_helper+0x7/0x10
=======================
Code: c3 a1 dc da 6a c0 e9 78 ff ff ff 55 89 c5 57 89 d7 56 53 89 cb 8d
71 1c f0 0f ba 29 00 19 c0 31 d2 85 c0 75 76 83 79 1c 00 74 04 <0f> 0b
eb fe 8d 41 04 39 41 04 74 04 0f 0b eb fe 89 f8 64 8b 15
EIP: [<c012f61a>] queue_delayed_work_on+0x20/0x97 SS:ESP 0068:f7a69f40
---[ end trace 1f82bd8b1ee896cc ]---

And when the script is running opened one more terminal and issued the
following command to observe the state of governors.

# watch -n 1 "cat /proc/cpuinfo | grep 'cpu MHz' ; cat
/proc/acpi/processor/CPU*/performance; cat
/sys/devices/system/cpu/*/cpufreq/scaling_governor; cat
/sys/devices/system/cpu/*/cpufreq/scaling_cur_freq"

The above command hit with an another kernel BUG at
kernel/workqueue.c:272 the kernel stack trace is,

------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:272!
invalid opcode: 0000 [#1] SMP
Modules linked in: cpufreq_powersave cpufreq_conservative
cpufreq_userspace usbhid usb_storage ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 233, comm: kondemand/2 Not tainted (2.6.25.7.cpufreqpatch #2)
EIP: 0060:[<c012ee7b>] EFLAGS: 00010212 CPU: 2
EIP is at run_workqueue+0x7d/0x187
EAX: f7a377b0 EBX: c4844a98 ECX: 00000000 EDX: 00000000
ESI: f7a37718 EDI: c4844a94 EBP: f79f9f9c ESP: f79f9f98
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kondemand/2 (pid: 233, ti=f79f8000 task=f79b5020 task.ti=f79f8000)
Stack: c041e31e c099e2a8 00000000 c061ec35 f7a37718 c012f6fa f7a3773c
00000000
c012f7ad 00000000 f79b5020 c0131acc f79f9fc8 f79f9fc8 f7a37718
00000000
c0131a1b c01319e3 00000000 c0105527 f784bef0 00000000 00000000
00000000
Call Trace:
[<c041e31e>] do_dbs_timer+0x0/0x24f
[<c012f6fa>] worker_thread+0x0/0xbd
[<c012f7ad>] worker_thread+0xb3/0xbd
[<c0131acc>] autoremove_wake_function+0x0/0x2d
[<c0131a1b>] kthread+0x38/0x5d
[<c01319e3>] kthread+0x0/0x5d
[<c0105527>] kernel_thread_helper+0x7/0x10
=======================
Code: e8 d3 2f 16 00 8b 46 1c 89 7e 48 8b 08 8b 50 04 89 51 04 89 0a 89
40 04 89 00 89 f0 e8 59 ff 39 00 8b 43 fc 83 e0 fc 39 f0 74 04 <0f> 0b
eb fe f0 0f ba 73 fc 00 8b 46 4c 31 c9 31 d2 68 7f ee 12
EIP: [<c012ee7b>] run_workqueue+0x7d/0x187 SS:ESP 0068:f79f9f98
---[ end trace 1f82bd8b1ee896cc ]---

Regards
R.Nageswara Sastry, CSTE, CEH, CSTM
Linux Technology Center
IBM India System & Technology Lab

2008-06-25 19:48:29

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi,

Nageswara R Sastry <[email protected]> writes:

> Johannes Weiner wrote:
>>
>> From: Johannes Weiner <[email protected]>
>> Subject: cpufreq: Fix race in enabling ondemand/conservative governors
>>
>> Prevent double activation of the governor if two processes race on the
>> check for whether the governor is already active.
>>
>> Signed-off-by: Johannes Weiner <[email protected]>
>> ---
>>
>> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
>> index 5d3a04b..a4902e4 100644
>> --- a/drivers/cpufreq/cpufreq_conservative.c
>> +++ b/drivers/cpufreq/cpufreq_conservative.c
>> @@ -486,10 +486,11 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>> if ((!cpu_online(cpu)) || (!policy->cur))
>> return -EINVAL;
>>
>> - if (this_dbs_info->enable) /* Already enabled */
>> - break;
>> -
>> mutex_lock(&dbs_mutex);
>> + if (this_dbs_info->enable) {
>> + mutex_unlock(&dbs_mutex);
>> + break;
>> + }
>>
>> rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
>> if (rc) {
>> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
>> index d2af20d..61705e1 100644
>> --- a/drivers/cpufreq/cpufreq_ondemand.c
>> +++ b/drivers/cpufreq/cpufreq_ondemand.c
>> @@ -508,10 +508,12 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>> if ((!cpu_online(cpu)) || (!policy->cur))
>> return -EINVAL;
>>
>> - if (this_dbs_info->enable) /* Already enabled */
>> + mutex_lock(&dbs_mutex);
>> + if (this_dbs_info->enable) {
>> + mutex_unlock(&dbs_mutex);
>> break;
>> + }
>>
>> - mutex_lock(&dbs_mutex);
>> dbs_enable++;
>>
>> rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
>
> Applied the above patch against 2.6.25.7 kernel and tested the same,
> But unluckily the above patch didn't stop hitting kernel bug at
> kernel/workqueue.c:223
> Hitting bug time got delayed near to 2hrs with the above patch.
>
>>> Kernel stack trace:
> ------------[ cut here ]------------
> kernel BUG at kernel/workqueue.c:223!
> invalid opcode: 0000 [#2] SMP
> Modules linked in: cpufreq_powersave cpufreq_conservative
> cpufreq_userspace usbhid usb_storage ehci_hcd ohci_hcd uhci_hcd
> usbcore
>
> Pid: 232, comm: kondemand/1 Tainted: G D (2.6.25.7.cpufreqpatch #2)
> EIP: 0060:[<c012f61a>] EFLAGS: 00010286 CPU: 1
> EIP is at queue_delayed_work_on+0x20/0x97
> EAX: 00000000 EBX: c483ba94 ECX: c483ba94 EDX: 00000000
> ESI: c483bab0 EDI: f7a39708 EBP: 00000001 ESP: f7a69f40
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process kondemand/1 (pid: 232, ti=f7a68000 task=f79b4120 task.ti=f7a68000)
> Stack: 00000000 f7a377b0 c483ba80 f78bed80 c041e54d 00000040 00000000
> 00000001
> 00000040 00000246 00000000 00000002 00000000 c012ee7f c483ba98
> f7a377b0
> c483ba94 f7a69f9c c012eeba 00000000 00000002 c012ee7f c041e31e
> c099e2a8
> Call Trace:
> [<c041e54d>] do_dbs_timer+0x22f/0x24f
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c012eeba>] run_workqueue+0xbc/0x187
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c041e31e>] do_dbs_timer+0x0/0x24f
> [<c012f6fa>] worker_thread+0x0/0xbd
> [<c012f7ad>] worker_thread+0xb3/0xbd
> [<c0131acc>] autoremove_wake_function+0x0/0x2d
> [<c0131a1b>] kthread+0x38/0x5d
> [<c01319e3>] kthread+0x0/0x5d
> [<c0105527>] kernel_thread_helper+0x7/0x10
> =======================
> Code: c3 a1 dc da 6a c0 e9 78 ff ff ff 55 89 c5 57 89 d7 56 53 89 cb
> 8d 71 1c f0 0f ba 29 00 19 c0 31 d2 85 c0 75 76 83 79 1c 00 74 04 <0f>
> 0b eb fe 8d 41 04 39 41 04 74 04 0f 0b eb fe 89 f8 64 8b 15
> EIP: [<c012f61a>] queue_delayed_work_on+0x20/0x97 SS:ESP 0068:f7a69f40
> ---[ end trace 1f82bd8b1ee896cc ]---
>
> And when the script is running opened one more terminal and issued the
> following command to observe the state of governors.
>
> # watch -n 1 "cat /proc/cpuinfo | grep 'cpu MHz' ; cat
> /proc/acpi/processor/CPU*/performance; cat
> /sys/devices/system/cpu/*/cpufreq/scaling_governor; cat
> /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq"
>
> The above command hit with an another kernel BUG at
> kernel/workqueue.c:272 the kernel stack trace is,
>
> ------------[ cut here ]------------
> kernel BUG at kernel/workqueue.c:272!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: cpufreq_powersave cpufreq_conservative
> cpufreq_userspace usbhid usb_storage ehci_hcd ohci_hcd uhci_hcd
> usbcore
>
> Pid: 233, comm: kondemand/2 Not tainted (2.6.25.7.cpufreqpatch #2)
> EIP: 0060:[<c012ee7b>] EFLAGS: 00010212 CPU: 2
> EIP is at run_workqueue+0x7d/0x187
> EAX: f7a377b0 EBX: c4844a98 ECX: 00000000 EDX: 00000000
> ESI: f7a37718 EDI: c4844a94 EBP: f79f9f9c ESP: f79f9f98
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process kondemand/2 (pid: 233, ti=f79f8000 task=f79b5020 task.ti=f79f8000)
> Stack: c041e31e c099e2a8 00000000 c061ec35 f7a37718 c012f6fa f7a3773c
> 00000000
> c012f7ad 00000000 f79b5020 c0131acc f79f9fc8 f79f9fc8 f7a37718
> 00000000
> c0131a1b c01319e3 00000000 c0105527 f784bef0 00000000 00000000
> 00000000
> Call Trace:
> [<c041e31e>] do_dbs_timer+0x0/0x24f
> [<c012f6fa>] worker_thread+0x0/0xbd
> [<c012f7ad>] worker_thread+0xb3/0xbd
> [<c0131acc>] autoremove_wake_function+0x0/0x2d
> [<c0131a1b>] kthread+0x38/0x5d
> [<c01319e3>] kthread+0x0/0x5d
> [<c0105527>] kernel_thread_helper+0x7/0x10
> =======================
> Code: e8 d3 2f 16 00 8b 46 1c 89 7e 48 8b 08 8b 50 04 89 51 04 89 0a
> 89 40 04 89 00 89 f0 e8 59 ff 39 00 8b 43 fc 83 e0 fc 39 f0 74 04 <0f>
> 0b eb fe f0 0f ba 73 fc 00 8b 46 4c 31 c9 31 d2 68 7f ee 12
> EIP: [<c012ee7b>] run_workqueue+0x7d/0x187 SS:ESP 0068:f79f9f98
> ---[ end trace 1f82bd8b1ee896cc ]---

Cool, thanks for testing!

I think the above patch is right but not enough.

Dave, what do you think about the following?

---
From: Johannes Weiner <[email protected]>
Subject: cpufreq: cancel self-rearming work synchroneuously

The ondemand and conservative governor workers are self-rearming.
Cancel them synchroneously to avoid nasty races.

Reported-by: Nageswara R Sastry <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
---

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 5d3a04b..78bac06 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -467,7 +467,7 @@ static inline void dbs_timer_init(void)

static inline void dbs_timer_exit(void)
{
- cancel_delayed_work(&dbs_work);
+ cancel_delayed_work_sync(&dbs_work);
return;
}

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index d2af20d..1eb8c58 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -490,7 +490,7 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
{
dbs_info->enable = 0;
- cancel_delayed_work(&dbs_info->work);
+ cancel_delayed_work_sync(&dbs_info->work);
}

static int cpufreq_governor_dbs(struct cpufreq_policy *policy,

2008-06-25 19:55:16

by Dave Jones

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

On Wed, Jun 25, 2008 at 09:47:50PM +0200, Johannes Weiner wrote:

> Cool, thanks for testing!
>
> I think the above patch is right but not enough.
>
> Dave, what do you think about the following?

Makes sense to me. I'll queue it up.

Dave

--
http://www.codemonkey.org.uk

2008-06-26 12:18:32

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi,

Johannes Weiner wrote:

> From: Johannes Weiner <[email protected]>
> Subject: cpufreq: cancel self-rearming work synchroneuously
>
> The ondemand and conservative governor workers are self-rearming.
> Cancel them synchroneously to avoid nasty races.
>
> Reported-by: Nageswara R Sastry <[email protected]>
> Signed-off-by: Johannes Weiner <[email protected]>
> ---
>
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
> index 5d3a04b..78bac06 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -467,7 +467,7 @@ static inline void dbs_timer_init(void)
>
> static inline void dbs_timer_exit(void)
> {
> - cancel_delayed_work(&dbs_work);
> + cancel_delayed_work_sync(&dbs_work);
> return;
> }
>
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index d2af20d..1eb8c58 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -490,7 +490,7 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
> static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
> {
> dbs_info->enable = 0;
> - cancel_delayed_work(&dbs_info->work);
> + cancel_delayed_work_sync(&dbs_info->work);
> }
>
> static int cpufreq_governor_dbs(struct cpufreq_policy *policy,

Applied the above patch only and compiled the kernel and seeing an
Circular lock related issue at the time of booting. First I am checking
this and will let you the results by applying both the patches.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.25.7.cpufreq_patch #2
-------------------------------------------------------
S06cpuspeed/3493 is trying to acquire lock:
(&(&dbs_info->work)->work){--..}, at: [<c012f46c>]
__cancel_work_timer+0x80/0x177

but task is already holding lock:
(dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (dbs_mutex){--..}:
[<c013aa76>] add_lock_to_list+0x61/0x83
[<c013cfa3>] __lock_acquire+0x953/0xb05
[<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
[<c04cdaa7>] mutex_lock_nested+0xce/0x222
[<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
[<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
[<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
[<c041c87a>] __cpufreq_governor+0x73/0xa6
[<c041c9e8>] __cpufreq_set_policy+0x13b/0x19e
[<c041d6b5>] cpufreq_add_dev+0x3b4/0x4aa
[<c041d296>] handle_update+0x0/0x21
[<c02ee310>] sysdev_driver_register+0x48/0x9a
[<c041c75b>] cpufreq_register_driver+0x9b/0x147
[<c06b742c>] kernel_init+0x130/0x26f
[<c06b72fc>] kernel_init+0x0/0x26f
[<c06b72fc>] kernel_init+0x0/0x26f
[<c0105527>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff

-> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
[<c013cfa3>] __lock_acquire+0x953/0xb05
[<c041d194>] lock_policy_rwsem_write+0x30/0x56
[<c010a83b>] save_stack_trace+0x1a/0x35
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c041d194>] lock_policy_rwsem_write+0x30/0x56
[<c04cdfd9>] down_write+0x2b/0x44
[<c041d194>] lock_policy_rwsem_write+0x30/0x56
[<c041d194>] lock_policy_rwsem_write+0x30/0x56
[<c041e35e>] do_dbs_timer+0x40/0x24f
[<c012ee7f>] run_workqueue+0x81/0x187
[<c012eeba>] run_workqueue+0xbc/0x187
[<c012ee7f>] run_workqueue+0x81/0x187
[<c041e31e>] do_dbs_timer+0x0/0x24f
[<c012f6fa>] worker_thread+0x0/0xbd
[<c012f7ad>] worker_thread+0xb3/0xbd
[<c0131acc>] autoremove_wake_function+0x0/0x2d
[<c0131a1b>] kthread+0x38/0x5d
[<c01319e3>] kthread+0x0/0x5d
[<c0105527>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff

-> #0 (&(&dbs_info->work)->work){--..}:
[<c013b6a2>] print_circular_bug_tail+0x2a/0x61
[<c013cec8>] __lock_acquire+0x878/0xb05
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c012f497>] __cancel_work_timer+0xab/0x177
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c013c0ee>] mark_held_locks+0x39/0x53
[<c04cdbe8>] mutex_lock_nested+0x20f/0x222
[<c013c277>] trace_hardirqs_on+0xe7/0x10e
[<c04cdbf3>] mutex_lock_nested+0x21a/0x222
[<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
[<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
[<c041c87a>] __cpufreq_governor+0x73/0xa6
[<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
[<c041ce0b>] store_scaling_governor+0x112/0x135
[<c041d296>] handle_update+0x0/0x21
[<c0410065>] atkbd_set_leds+0x9/0xcf
[<c041ccf9>] store_scaling_governor+0x0/0x135
[<c041d7e7>] store+0x3c/0x54
[<c01a09a0>] sysfs_write_file+0xa9/0xdd
[<c01a08f7>] sysfs_write_file+0x0/0xdd
[<c016e412>] vfs_write+0x83/0xf6
[<c016e958>] sys_write+0x3c/0x63
[<c0104816>] sysenter_past_esp+0x5f/0xa5
[<ffffffff>] 0xffffffff

other info that might help us debug this:

3 locks held by S06cpuspeed/3493:
#0: (&buffer->mutex){--..}, at: [<c01a091b>] sysfs_write_file+0x24/0xdd
#1: (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<c041d194>]
lock_policy_rwsem_write+0x30/0x56
#2: (dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed

stack backtrace:
Pid: 3493, comm: S06cpuspeed Not tainted 2.6.25.7.cpufreq_patch #2
[<c013b6cf>] print_circular_bug_tail+0x57/0x61
[<c013cec8>] __lock_acquire+0x878/0xb05
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c012f497>] __cancel_work_timer+0xab/0x177
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c013c0ee>] mark_held_locks+0x39/0x53
[<c04cdbe8>] mutex_lock_nested+0x20f/0x222
[<c013c277>] trace_hardirqs_on+0xe7/0x10e
[<c04cdbf3>] mutex_lock_nested+0x21a/0x222
[<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
[<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
[<c041c87a>] __cpufreq_governor+0x73/0xa6
[<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
[<c041ce0b>] store_scaling_governor+0x112/0x135
[<c041d296>] handle_update+0x0/0x21
[<c0410065>] atkbd_set_leds+0x9/0xcf
[<c041ccf9>] store_scaling_governor+0x0/0x135
[<c041d7e7>] store+0x3c/0x54
[<c01a09a0>] sysfs_write_file+0xa9/0xdd
[<c01a08f7>] sysfs_write_file+0x0/0xdd
[<c016e412>] vfs_write+0x83/0xf6
[<c016e958>] sys_write+0x3c/0x63
[<c0104816>] sysenter_past_esp+0x5f/0xa5
=======================

2008-06-26 13:32:51

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Nageswara R Sastry wrote:
> Hi,
>
> Johannes Weiner wrote:
>
>> From: Johannes Weiner <[email protected]>
>> Subject: cpufreq: cancel self-rearming work synchroneuously
>>
>> The ondemand and conservative governor workers are self-rearming.
>> Cancel them synchroneously to avoid nasty races.
>>
>> Reported-by: Nageswara R Sastry <[email protected]>
>> Signed-off-by: Johannes Weiner <[email protected]>
>> ---
>>
>> diff --git a/drivers/cpufreq/cpufreq_conservative.c
>> b/drivers/cpufreq/cpufreq_conservative.c
>> index 5d3a04b..78bac06 100644
>> --- a/drivers/cpufreq/cpufreq_conservative.c
>> +++ b/drivers/cpufreq/cpufreq_conservative.c
>> @@ -467,7 +467,7 @@ static inline void dbs_timer_init(void)
>>
>> static inline void dbs_timer_exit(void)
>> {
>> - cancel_delayed_work(&dbs_work);
>> + cancel_delayed_work_sync(&dbs_work);
>> return;
>> }
>>
>> diff --git a/drivers/cpufreq/cpufreq_ondemand.c
>> b/drivers/cpufreq/cpufreq_ondemand.c
>> index d2af20d..1eb8c58 100644
>> --- a/drivers/cpufreq/cpufreq_ondemand.c
>> +++ b/drivers/cpufreq/cpufreq_ondemand.c
>> @@ -490,7 +490,7 @@ static inline void dbs_timer_init(struct
>> cpu_dbs_info_s *dbs_info)
>> static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
>> {
>> dbs_info->enable = 0;
>> - cancel_delayed_work(&dbs_info->work);
>> + cancel_delayed_work_sync(&dbs_info->work);
>> }
>>
>> static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>
> Applied the above patch only and compiled the kernel and seeing an
> Circular lock related issue at the time of booting. First I am checking
> this and will let you the results by applying both the patches.
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.25.7.cpufreq_patch #2
> -------------------------------------------------------
> S06cpuspeed/3493 is trying to acquire lock:
> (&(&dbs_info->work)->work){--..}, at: [<c012f46c>]
> __cancel_work_timer+0x80/0x177
>
> but task is already holding lock:
> (dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (dbs_mutex){--..}:
> [<c013aa76>] add_lock_to_list+0x61/0x83
> [<c013cfa3>] __lock_acquire+0x953/0xb05
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c04cdaa7>] mutex_lock_nested+0xce/0x222
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c041c87a>] __cpufreq_governor+0x73/0xa6
> [<c041c9e8>] __cpufreq_set_policy+0x13b/0x19e
> [<c041d6b5>] cpufreq_add_dev+0x3b4/0x4aa
> [<c041d296>] handle_update+0x0/0x21
> [<c02ee310>] sysdev_driver_register+0x48/0x9a
> [<c041c75b>] cpufreq_register_driver+0x9b/0x147
> [<c06b742c>] kernel_init+0x130/0x26f
> [<c06b72fc>] kernel_init+0x0/0x26f
> [<c06b72fc>] kernel_init+0x0/0x26f
> [<c0105527>] kernel_thread_helper+0x7/0x10
> [<ffffffff>] 0xffffffff
>
> -> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
> [<c013cfa3>] __lock_acquire+0x953/0xb05
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c010a83b>] save_stack_trace+0x1a/0x35
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c04cdfd9>] down_write+0x2b/0x44
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c041e35e>] do_dbs_timer+0x40/0x24f
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c012eeba>] run_workqueue+0xbc/0x187
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c041e31e>] do_dbs_timer+0x0/0x24f
> [<c012f6fa>] worker_thread+0x0/0xbd
> [<c012f7ad>] worker_thread+0xb3/0xbd
> [<c0131acc>] autoremove_wake_function+0x0/0x2d
> [<c0131a1b>] kthread+0x38/0x5d
> [<c01319e3>] kthread+0x0/0x5d
> [<c0105527>] kernel_thread_helper+0x7/0x10
> [<ffffffff>] 0xffffffff
>
> -> #0 (&(&dbs_info->work)->work){--..}:
> [<c013b6a2>] print_circular_bug_tail+0x2a/0x61
> [<c013cec8>] __lock_acquire+0x878/0xb05
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c012f497>] __cancel_work_timer+0xab/0x177
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c013c0ee>] mark_held_locks+0x39/0x53
> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
> [<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
> [<c041c87a>] __cpufreq_governor+0x73/0xa6
> [<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
> [<c041ce0b>] store_scaling_governor+0x112/0x135
> [<c041d296>] handle_update+0x0/0x21
> [<c0410065>] atkbd_set_leds+0x9/0xcf
> [<c041ccf9>] store_scaling_governor+0x0/0x135
> [<c041d7e7>] store+0x3c/0x54
> [<c01a09a0>] sysfs_write_file+0xa9/0xdd
> [<c01a08f7>] sysfs_write_file+0x0/0xdd
> [<c016e412>] vfs_write+0x83/0xf6
> [<c016e958>] sys_write+0x3c/0x63
> [<c0104816>] sysenter_past_esp+0x5f/0xa5
> [<ffffffff>] 0xffffffff
>
> other info that might help us debug this:
>
> 3 locks held by S06cpuspeed/3493:
> #0: (&buffer->mutex){--..}, at: [<c01a091b>] sysfs_write_file+0x24/0xdd
> #1: (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<c041d194>]
> lock_policy_rwsem_write+0x30/0x56
> #2: (dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>
> stack backtrace:
> Pid: 3493, comm: S06cpuspeed Not tainted 2.6.25.7.cpufreq_patch #2
> [<c013b6cf>] print_circular_bug_tail+0x57/0x61
> [<c013cec8>] __lock_acquire+0x878/0xb05
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c012f497>] __cancel_work_timer+0xab/0x177
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c013c0ee>] mark_held_locks+0x39/0x53
> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
> [<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
> [<c041c87a>] __cpufreq_governor+0x73/0xa6
> [<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
> [<c041ce0b>] store_scaling_governor+0x112/0x135
> [<c041d296>] handle_update+0x0/0x21
> [<c0410065>] atkbd_set_leds+0x9/0xcf
> [<c041ccf9>] store_scaling_governor+0x0/0x135
> [<c041d7e7>] store+0x3c/0x54
> [<c01a09a0>] sysfs_write_file+0xa9/0xdd
> [<c01a08f7>] sysfs_write_file+0x0/0xdd
> [<c016e412>] vfs_write+0x83/0xf6
> [<c016e958>] sys_write+0x3c/0x63
> [<c0104816>] sysenter_past_esp+0x5f/0xa5
> =======================
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

While running the script I observed no stack trace but sysfs hang for a
particular file like scaling_governor under cpufreq for a particular cpu
(this cpu is varying). In the following run I experienced with cpu4.

cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor

The above command is not giving any result and it won't be coming out
after pressing ctrl+c or ctrl+z

Regards
R.Nageswara Sastry

2008-06-27 04:12:30

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Nageswara R Sastry wrote:
Hi,
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.25.7.cpufreq_patch #2
>> -------------------------------------------------------
[...]

> While running the script I observed no stack trace but sysfs hang for a
> particular file like scaling_governor under cpufreq for a particular cpu
> (this cpu is varying). In the following run I experienced with cpu4.
>
> cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
>
> The above command is not giving any result and it won't be coming out
> after pressing ctrl+c or ctrl+z


Applied both the patches and it seems working fine - no stack trace, but
the above "mentioned circular locking dependency and sysfs file hang"
are existing.

Thanks
Regards
R.Nageswara Sastry

2008-07-01 14:01:22

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi,

Nageswara R Sastry <[email protected]> writes:

> Hi,
>
> Johannes Weiner wrote:
>
>> From: Johannes Weiner <[email protected]>
>> Subject: cpufreq: cancel self-rearming work synchroneuously
>>
>> The ondemand and conservative governor workers are self-rearming.
>> Cancel them synchroneously to avoid nasty races.
>>
>> Reported-by: Nageswara R Sastry <[email protected]>
>> Signed-off-by: Johannes Weiner <[email protected]>
>> ---
>>
>> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
>> index 5d3a04b..78bac06 100644
>> --- a/drivers/cpufreq/cpufreq_conservative.c
>> +++ b/drivers/cpufreq/cpufreq_conservative.c
>> @@ -467,7 +467,7 @@ static inline void dbs_timer_init(void)
>>
>> static inline void dbs_timer_exit(void)
>> {
>> - cancel_delayed_work(&dbs_work);
>> + cancel_delayed_work_sync(&dbs_work);
>> return;
>> }
>>
>> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
>> index d2af20d..1eb8c58 100644
>> --- a/drivers/cpufreq/cpufreq_ondemand.c
>> +++ b/drivers/cpufreq/cpufreq_ondemand.c
>> @@ -490,7 +490,7 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
>> static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
>> {
>> dbs_info->enable = 0;
>> - cancel_delayed_work(&dbs_info->work);
>> + cancel_delayed_work_sync(&dbs_info->work);
>> }
>>
>> static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>
> Applied the above patch only and compiled the kernel and seeing an
> Circular lock related issue at the time of booting. First I am
> checking this and will let you the results by applying both the
> patches.
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.25.7.cpufreq_patch #2
> -------------------------------------------------------
> S06cpuspeed/3493 is trying to acquire lock:
> (&(&dbs_info->work)->work){--..}, at: [<c012f46c>]
> __cancel_work_timer+0x80/0x177
>
> but task is already holding lock:
> (dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (dbs_mutex){--..}:
> [<c013aa76>] add_lock_to_list+0x61/0x83
> [<c013cfa3>] __lock_acquire+0x953/0xb05
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c04cdaa7>] mutex_lock_nested+0xce/0x222
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
> [<c041c87a>] __cpufreq_governor+0x73/0xa6
> [<c041c9e8>] __cpufreq_set_policy+0x13b/0x19e
> [<c041d6b5>] cpufreq_add_dev+0x3b4/0x4aa
> [<c041d296>] handle_update+0x0/0x21
> [<c02ee310>] sysdev_driver_register+0x48/0x9a
> [<c041c75b>] cpufreq_register_driver+0x9b/0x147
> [<c06b742c>] kernel_init+0x130/0x26f
> [<c06b72fc>] kernel_init+0x0/0x26f
> [<c06b72fc>] kernel_init+0x0/0x26f
> [<c0105527>] kernel_thread_helper+0x7/0x10
> [<ffffffff>] 0xffffffff
>
> -> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
> [<c013cfa3>] __lock_acquire+0x953/0xb05
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c010a83b>] save_stack_trace+0x1a/0x35
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c04cdfd9>] down_write+0x2b/0x44
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
> [<c041e35e>] do_dbs_timer+0x40/0x24f
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c012eeba>] run_workqueue+0xbc/0x187
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c041e31e>] do_dbs_timer+0x0/0x24f
> [<c012f6fa>] worker_thread+0x0/0xbd
> [<c012f7ad>] worker_thread+0xb3/0xbd
> [<c0131acc>] autoremove_wake_function+0x0/0x2d
> [<c0131a1b>] kthread+0x38/0x5d
> [<c01319e3>] kthread+0x0/0x5d
> [<c0105527>] kernel_thread_helper+0x7/0x10
> [<ffffffff>] 0xffffffff
>
> -> #0 (&(&dbs_info->work)->work){--..}:
> [<c013b6a2>] print_circular_bug_tail+0x2a/0x61
> [<c013cec8>] __lock_acquire+0x878/0xb05
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c012f497>] __cancel_work_timer+0xab/0x177
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c013c0ee>] mark_held_locks+0x39/0x53
> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
> [<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
> [<c041c87a>] __cpufreq_governor+0x73/0xa6
> [<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
> [<c041ce0b>] store_scaling_governor+0x112/0x135
> [<c041d296>] handle_update+0x0/0x21
> [<c0410065>] atkbd_set_leds+0x9/0xcf
> [<c041ccf9>] store_scaling_governor+0x0/0x135
> [<c041d7e7>] store+0x3c/0x54
> [<c01a09a0>] sysfs_write_file+0xa9/0xdd
> [<c01a08f7>] sysfs_write_file+0x0/0xdd
> [<c016e412>] vfs_write+0x83/0xf6
> [<c016e958>] sys_write+0x3c/0x63
> [<c0104816>] sysenter_past_esp+0x5f/0xa5
> [<ffffffff>] 0xffffffff
>
> other info that might help us debug this:
>
> 3 locks held by S06cpuspeed/3493:
> #0: (&buffer->mutex){--..}, at: [<c01a091b>] sysfs_write_file+0x24/0xdd
> #1: (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<c041d194>]
> lock_policy_rwsem_write+0x30/0x56
> #2: (dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>
> stack backtrace:
> Pid: 3493, comm: S06cpuspeed Not tainted 2.6.25.7.cpufreq_patch #2
> [<c013b6cf>] print_circular_bug_tail+0x57/0x61
> [<c013cec8>] __lock_acquire+0x878/0xb05
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c012f497>] __cancel_work_timer+0xab/0x177
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c013c0ee>] mark_held_locks+0x39/0x53
> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
> [<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
> [<c041c87a>] __cpufreq_governor+0x73/0xa6
> [<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
> [<c041ce0b>] store_scaling_governor+0x112/0x135
> [<c041d296>] handle_update+0x0/0x21
> [<c0410065>] atkbd_set_leds+0x9/0xcf
> [<c041ccf9>] store_scaling_governor+0x0/0x135
> [<c041d7e7>] store+0x3c/0x54
> [<c01a09a0>] sysfs_write_file+0xa9/0xdd
> [<c01a08f7>] sysfs_write_file+0x0/0xdd
> [<c016e412>] vfs_write+0x83/0xf6
> [<c016e958>] sys_write+0x3c/0x63
> [<c0104816>] sysenter_past_esp+0x5f/0xa5
> =======================

Okay, the problem is in cpufreq_conservative.c. We
cancel_delayed_work_sync() while holding the mutex, but the work itself
tries to grab it and there it deadlocks; lockdep caught that right.

The hunk for _ondemand is correct, but the one for _conservative is
obviously wrong, sorry :/

I will whip something up and get back to you. Thanks a lot for testing!

Hannes

2008-07-04 13:58:18

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi Nageswara,

Johannes Weiner <[email protected]> writes:

> Hi,
>
> Nageswara R Sastry <[email protected]> writes:
>
>> Hi,
>>
>> Johannes Weiner wrote:
>>
>>> From: Johannes Weiner <[email protected]>
>>> Subject: cpufreq: cancel self-rearming work synchroneuously
>>>
>>> The ondemand and conservative governor workers are self-rearming.
>>> Cancel them synchroneously to avoid nasty races.
>>>
>>> Reported-by: Nageswara R Sastry <[email protected]>
>>> Signed-off-by: Johannes Weiner <[email protected]>
>>> ---
>>>
>>> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
>>> index 5d3a04b..78bac06 100644
>>> --- a/drivers/cpufreq/cpufreq_conservative.c
>>> +++ b/drivers/cpufreq/cpufreq_conservative.c
>>> @@ -467,7 +467,7 @@ static inline void dbs_timer_init(void)
>>>
>>> static inline void dbs_timer_exit(void)
>>> {
>>> - cancel_delayed_work(&dbs_work);
>>> + cancel_delayed_work_sync(&dbs_work);
>>> return;
>>> }
>>>
>>> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
>>> index d2af20d..1eb8c58 100644
>>> --- a/drivers/cpufreq/cpufreq_ondemand.c
>>> +++ b/drivers/cpufreq/cpufreq_ondemand.c
>>> @@ -490,7 +490,7 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
>>> static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
>>> {
>>> dbs_info->enable = 0;
>>> - cancel_delayed_work(&dbs_info->work);
>>> + cancel_delayed_work_sync(&dbs_info->work);
>>> }
>>>
>>> static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>>
>> Applied the above patch only and compiled the kernel and seeing an
>> Circular lock related issue at the time of booting. First I am
>> checking this and will let you the results by applying both the
>> patches.
>>
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.25.7.cpufreq_patch #2
>> -------------------------------------------------------
>> S06cpuspeed/3493 is trying to acquire lock:
>> (&(&dbs_info->work)->work){--..}, at: [<c012f46c>]
>> __cancel_work_timer+0x80/0x177
>>
>> but task is already holding lock:
>> (dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (dbs_mutex){--..}:
>> [<c013aa76>] add_lock_to_list+0x61/0x83
>> [<c013cfa3>] __lock_acquire+0x953/0xb05
>> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
>> [<c013d1b4>] lock_acquire+0x5f/0x79
>> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
>> [<c04cdaa7>] mutex_lock_nested+0xce/0x222
>> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
>> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
>> [<c041e5e1>] cpufreq_governor_dbs+0x74/0x2ed
>> [<c041c87a>] __cpufreq_governor+0x73/0xa6
>> [<c041c9e8>] __cpufreq_set_policy+0x13b/0x19e
>> [<c041d6b5>] cpufreq_add_dev+0x3b4/0x4aa
>> [<c041d296>] handle_update+0x0/0x21
>> [<c02ee310>] sysdev_driver_register+0x48/0x9a
>> [<c041c75b>] cpufreq_register_driver+0x9b/0x147
>> [<c06b742c>] kernel_init+0x130/0x26f
>> [<c06b72fc>] kernel_init+0x0/0x26f
>> [<c06b72fc>] kernel_init+0x0/0x26f
>> [<c0105527>] kernel_thread_helper+0x7/0x10
>> [<ffffffff>] 0xffffffff
>>
>> -> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
>> [<c013cfa3>] __lock_acquire+0x953/0xb05
>> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
>> [<c010a83b>] save_stack_trace+0x1a/0x35
>> [<c013d1b4>] lock_acquire+0x5f/0x79
>> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
>> [<c04cdfd9>] down_write+0x2b/0x44
>> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
>> [<c041d194>] lock_policy_rwsem_write+0x30/0x56
>> [<c041e35e>] do_dbs_timer+0x40/0x24f
>> [<c012ee7f>] run_workqueue+0x81/0x187
>> [<c012eeba>] run_workqueue+0xbc/0x187
>> [<c012ee7f>] run_workqueue+0x81/0x187
>> [<c041e31e>] do_dbs_timer+0x0/0x24f
>> [<c012f6fa>] worker_thread+0x0/0xbd
>> [<c012f7ad>] worker_thread+0xb3/0xbd
>> [<c0131acc>] autoremove_wake_function+0x0/0x2d
>> [<c0131a1b>] kthread+0x38/0x5d
>> [<c01319e3>] kthread+0x0/0x5d
>> [<c0105527>] kernel_thread_helper+0x7/0x10
>> [<ffffffff>] 0xffffffff
>>
>> -> #0 (&(&dbs_info->work)->work){--..}:
>> [<c013b6a2>] print_circular_bug_tail+0x2a/0x61
>> [<c013cec8>] __lock_acquire+0x878/0xb05
>> [<c013d1b4>] lock_acquire+0x5f/0x79
>> [<c012f46c>] __cancel_work_timer+0x80/0x177
>> [<c012f497>] __cancel_work_timer+0xab/0x177
>> [<c012f46c>] __cancel_work_timer+0x80/0x177
>> [<c013c0ee>] mark_held_locks+0x39/0x53
>> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
>> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
>> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
>> [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>> [<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
>> [<c041c87a>] __cpufreq_governor+0x73/0xa6
>> [<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
>> [<c041ce0b>] store_scaling_governor+0x112/0x135
>> [<c041d296>] handle_update+0x0/0x21
>> [<c0410065>] atkbd_set_leds+0x9/0xcf
>> [<c041ccf9>] store_scaling_governor+0x0/0x135
>> [<c041d7e7>] store+0x3c/0x54
>> [<c01a09a0>] sysfs_write_file+0xa9/0xdd
>> [<c01a08f7>] sysfs_write_file+0x0/0xdd
>> [<c016e412>] vfs_write+0x83/0xf6
>> [<c016e958>] sys_write+0x3c/0x63
>> [<c0104816>] sysenter_past_esp+0x5f/0xa5
>> [<ffffffff>] 0xffffffff
>>
>> other info that might help us debug this:
>>
>> 3 locks held by S06cpuspeed/3493:
>> #0: (&buffer->mutex){--..}, at: [<c01a091b>] sysfs_write_file+0x24/0xdd
>> #1: (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<c041d194>]
>> lock_policy_rwsem_write+0x30/0x56
>> #2: (dbs_mutex){--..}, at: [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>>
>> stack backtrace:
>> Pid: 3493, comm: S06cpuspeed Not tainted 2.6.25.7.cpufreq_patch #2
>> [<c013b6cf>] print_circular_bug_tail+0x57/0x61
>> [<c013cec8>] __lock_acquire+0x878/0xb05
>> [<c013d1b4>] lock_acquire+0x5f/0x79
>> [<c012f46c>] __cancel_work_timer+0x80/0x177
>> [<c012f497>] __cancel_work_timer+0xab/0x177
>> [<c012f46c>] __cancel_work_timer+0x80/0x177
>> [<c013c0ee>] mark_held_locks+0x39/0x53
>> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
>> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
>> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
>> [<c041e7cb>] cpufreq_governor_dbs+0x25e/0x2ed
>> [<c041e7dd>] cpufreq_governor_dbs+0x270/0x2ed
>> [<c041c87a>] __cpufreq_governor+0x73/0xa6
>> [<c041c9d6>] __cpufreq_set_policy+0x129/0x19e
>> [<c041ce0b>] store_scaling_governor+0x112/0x135
>> [<c041d296>] handle_update+0x0/0x21
>> [<c0410065>] atkbd_set_leds+0x9/0xcf
>> [<c041ccf9>] store_scaling_governor+0x0/0x135
>> [<c041d7e7>] store+0x3c/0x54
>> [<c01a09a0>] sysfs_write_file+0xa9/0xdd
>> [<c01a08f7>] sysfs_write_file+0x0/0xdd
>> [<c016e412>] vfs_write+0x83/0xf6
>> [<c016e958>] sys_write+0x3c/0x63
>> [<c0104816>] sysenter_past_esp+0x5f/0xa5
>> =======================
>
> Okay, the problem is in cpufreq_conservative.c. We
> cancel_delayed_work_sync() while holding the mutex, but the work itself
> tries to grab it and there it deadlocks; lockdep caught that right.
>
> The hunk for _ondemand is correct, but the one for _conservative is
> obviously wrong, sorry :/
>
> I will whip something up and get back to you. Thanks a lot for
> testing!

Could you try the attached patch instead of the one above?

Dave, I dropped the mutex-grabbing from the conservative worker function
as well as I don't see a reason for it, please correct me if I'm wrong.

Hannes

--
From: Johannes Weiner <[email protected]>
Subject: cpufreq: cancel self-rearming work synchroneously

The ondemand and conservative governor workers are self-rearming.
Cancel them synchroneously to avoid nasty races.

This patch also removes taking a mutex in the conservative worker
function as the locking is dbs_mutex -> work and not the
other way round.

Reported-by: Nageswara R Sastry <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
---
drivers/cpufreq/cpufreq_conservative.c | 4 +---
drivers/cpufreq/cpufreq_ondemand.c | 2 +-
2 files changed, 2 insertions(+), 4 deletions(-)

--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -450,12 +450,10 @@ static void dbs_check_cpu(int cpu)
static void do_dbs_timer(struct work_struct *work)
{
int i;
- mutex_lock(&dbs_mutex);
for_each_online_cpu(i)
dbs_check_cpu(i);
schedule_delayed_work(&dbs_work,
usecs_to_jiffies(dbs_tuners_ins.sampling_rate));
- mutex_unlock(&dbs_mutex);
}

static inline void dbs_timer_init(void)
@@ -467,7 +465,7 @@ static inline void dbs_timer_init(void)

static inline void dbs_timer_exit(void)
{
- cancel_delayed_work(&dbs_work);
+ cancel_delayed_work_sync(&dbs_work);
return;
}

--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -490,7 +490,7 @@ static inline void dbs_timer_init(struct
static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
{
dbs_info->enable = 0;
- cancel_delayed_work(&dbs_info->work);
+ cancel_delayed_work_sync(&dbs_info->work);
}

static int cpufreq_governor_dbs(struct cpufreq_policy *policy,

2008-07-07 09:50:57

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi Johannes,
>>> =======================================================
>>> [ INFO: possible circular locking dependency detected ]
>>> 2.6.25.7.cpufreq_patch #2
>>> -------------------------------------------------------
[...]
>> Okay, the problem is in cpufreq_conservative.c. We
>> cancel_delayed_work_sync() while holding the mutex, but the work itself
>> tries to grab it and there it deadlocks; lockdep caught that right.
>>
>> The hunk for _ondemand is correct, but the one for _conservative is
>> obviously wrong, sorry :/
>>
>> I will whip something up and get back to you. Thanks a lot for
>> testing!
>
> Could you try the attached patch instead of the one above?
>
> Dave, I dropped the mutex-grabbing from the conservative worker function
> as well as I don't see a reason for it, please correct me if I'm wrong.
>
> Hannes
>

The script is running now for more than 6 hours successfully, I will
continue this and let you know if there are any failures.

* I am seeing the circular locking dependency with the above patch too.

Thanks!!
Regards
R.Nageswara Sastry

2008-07-07 11:08:16

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi,

Nageswara R Sastry <[email protected]> writes:

> Hi Johannes,
>>>> =======================================================
>>>> [ INFO: possible circular locking dependency detected ]
>>>> 2.6.25.7.cpufreq_patch #2
>>>> -------------------------------------------------------
> [...]
>>> Okay, the problem is in cpufreq_conservative.c. We
>>> cancel_delayed_work_sync() while holding the mutex, but the work itself
>>> tries to grab it and there it deadlocks; lockdep caught that right.
>>>
>>> The hunk for _ondemand is correct, but the one for _conservative is
>>> obviously wrong, sorry :/
>>>
>>> I will whip something up and get back to you. Thanks a lot for
>>> testing!
>>
>> Could you try the attached patch instead of the one above?
>>
>> Dave, I dropped the mutex-grabbing from the conservative worker function
>> as well as I don't see a reason for it, please correct me if I'm wrong.
>>
>> Hannes
>>
>
> The script is running now for more than 6 hours successfully, I will
> continue this and let you know if there are any failures.
>
> * I am seeing the circular locking dependency with the above patch
> too.

Uhm. Failure or no failure? A possible dead-lock report _is_ a
failure. So, do you get one or not? And if so, could you send me the
dmesg parts?

Thanks a lot,

Hannes

2008-07-07 11:19:39

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Nageswara R Sastry wrote:
> Hi Johannes,
>>>> =======================================================
>>>> [ INFO: possible circular locking dependency detected ]
>>>> 2.6.25.7.cpufreq_patch #2
>>>> -------------------------------------------------------
> [...]
>>> Okay, the problem is in cpufreq_conservative.c. We
>>> cancel_delayed_work_sync() while holding the mutex, but the work itself
>>> tries to grab it and there it deadlocks; lockdep caught that right.
>>>
>>> The hunk for _ondemand is correct, but the one for _conservative is
>>> obviously wrong, sorry :/
>>>
>>> I will whip something up and get back to you. Thanks a lot for
>>> testing!
>>
>> Could you try the attached patch instead of the one above?
>>
>> Dave, I dropped the mutex-grabbing from the conservative worker function
>> as well as I don't see a reason for it, please correct me if I'm wrong.
>>
>> Hannes
>>
>
> The script is running now for more than 6 hours successfully, I will
> continue this and let you know if there are any failures.
>
> * I am seeing the circular locking dependency with the above patch too.
>

Seeing sysfs hang for second cpu this time while cat file
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor

Just to summarize
1. Circular locking dependency issue and
2. sysfs hung for scaling_governor

Thanks!!
Regards
R.Nageswara Sastry

2008-07-08 05:53:11

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi Johannes,

Johannes Weiner wrote:

>> * I am seeing the circular locking dependency with the above patch
>> too.
>
> Uhm. Failure or no failure? A possible dead-lock report _is_ a
> failure. So, do you get one or not? And if so, could you send me the
> dmesg parts?
>
> Thanks a lot,
>
> Hannes

I could see a circular locking dependency and sysfs hang with the new
patch named "cpufreq: cancel self-rearming work synchroneously" also.
Please find the dmesg output. Please let me know if you need more
information. Thank a lot for your coordination.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.25.9.cpufreq #2
-------------------------------------------------------
S06cpuspeed/3427 is trying to acquire lock:
(&(&dbs_info->work)->work){--..}, at: [<c012f46c>]
__cancel_work_timer+0x80/0x177

but task is already holding lock:
(dbs_mutex){--..}, at: [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (dbs_mutex){--..}:
[<c013aa76>] add_lock_to_list+0x61/0x83
[<c013cfa3>] __lock_acquire+0x953/0xb05
[<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
[<c04cdaa7>] mutex_lock_nested+0xce/0x222
[<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
[<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
[<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
[<c041c87e>] __cpufreq_governor+0x73/0xa6
[<c041c9ec>] __cpufreq_set_policy+0x13b/0x19e
[<c041d6b9>] cpufreq_add_dev+0x3b4/0x4aa
[<c041d29a>] handle_update+0x0/0x21
[<c02ee310>] sysdev_driver_register+0x48/0x9a
[<c041c75f>] cpufreq_register_driver+0x9b/0x147
[<c06b742c>] kernel_init+0x130/0x26f
[<c06b72fc>] kernel_init+0x0/0x26f
[<c06b72fc>] kernel_init+0x0/0x26f
[<c0105527>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff

-> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
[<c013cfa3>] __lock_acquire+0x953/0xb05
[<c041d198>] lock_policy_rwsem_write+0x30/0x56
[<c010a83b>] save_stack_trace+0x1a/0x35
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c041d198>] lock_policy_rwsem_write+0x30/0x56
[<c04cdfd9>] down_write+0x2b/0x44
[<c041d198>] lock_policy_rwsem_write+0x30/0x56
[<c041d198>] lock_policy_rwsem_write+0x30/0x56
[<c041e362>] do_dbs_timer+0x40/0x24f
[<c012ee7f>] run_workqueue+0x81/0x187
[<c012eeba>] run_workqueue+0xbc/0x187
[<c012ee7f>] run_workqueue+0x81/0x187
[<c041e322>] do_dbs_timer+0x0/0x24f
[<c012f6fa>] worker_thread+0x0/0xbd
[<c012f7ad>] worker_thread+0xb3/0xbd
[<c0131acc>] autoremove_wake_function+0x0/0x2d
[<c0131a1b>] kthread+0x38/0x5d
[<c01319e3>] kthread+0x0/0x5d
[<c0105527>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff

-> #0 (&(&dbs_info->work)->work){--..}:
[<c013b6a2>] print_circular_bug_tail+0x2a/0x61
[<c013cec8>] __lock_acquire+0x878/0xb05
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c012f497>] __cancel_work_timer+0xab/0x177
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c013c0ee>] mark_held_locks+0x39/0x53
[<c04cdbe8>] mutex_lock_nested+0x20f/0x222
[<c013c277>] trace_hardirqs_on+0xe7/0x10e
[<c04cdbf3>] mutex_lock_nested+0x21a/0x222
[<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
[<c041e7ed>] cpufreq_governor_dbs+0x27c/0x2f7
[<c041c87e>] __cpufreq_governor+0x73/0xa6
[<c041c9da>] __cpufreq_set_policy+0x129/0x19e
[<c041ce0f>] store_scaling_governor+0x112/0x135
[<c041d29a>] handle_update+0x0/0x21
[<c0410065>] atkbd_set_leds+0x5/0xcf
[<c041ccfd>] store_scaling_governor+0x0/0x135
[<c041d7eb>] store+0x3c/0x54
[<c01a09e8>] sysfs_write_file+0xa9/0xdd
[<c01a093f>] sysfs_write_file+0x0/0xdd
[<c016e45a>] vfs_write+0x83/0xf6
[<c016e9a0>] sys_write+0x3c/0x63
[<c0104816>] sysenter_past_esp+0x5f/0xa5
[<ffffffff>] 0xffffffff

other info that might help us debug this:

3 locks held by S06cpuspeed/3427:
#0: (&buffer->mutex){--..}, at: [<c01a0963>] sysfs_write_file+0x24/0xdd
#1: (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<c041d198>]
lock_policy_rwsem_write+0x30/0x56
#2: (dbs_mutex){--..}, at: [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7

stack backtrace:
Pid: 3427, comm: S06cpuspeed Not tainted 2.6.25.9.cpufreq #2
[<c013b6cf>] print_circular_bug_tail+0x57/0x61
[<c013cec8>] __lock_acquire+0x878/0xb05
[<c013d1b4>] lock_acquire+0x5f/0x79
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c012f497>] __cancel_work_timer+0xab/0x177
[<c012f46c>] __cancel_work_timer+0x80/0x177
[<c013c0ee>] mark_held_locks+0x39/0x53
[<c04cdbe8>] mutex_lock_nested+0x20f/0x222
[<c013c277>] trace_hardirqs_on+0xe7/0x10e
[<c04cdbf3>] mutex_lock_nested+0x21a/0x222
[<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
[<c041e7ed>] cpufreq_governor_dbs+0x27c/0x2f7
[<c041c87e>] __cpufreq_governor+0x73/0xa6
[<c041c9da>] __cpufreq_set_policy+0x129/0x19e
[<c041ce0f>] store_scaling_governor+0x112/0x135
[<c041d29a>] handle_update+0x0/0x21
[<c0410065>] atkbd_set_leds+0x5/0xcf
[<c041ccfd>] store_scaling_governor+0x0/0x135
[<c041d7eb>] store+0x3c/0x54
[<c01a09e8>] sysfs_write_file+0xa9/0xdd
[<c01a093f>] sysfs_write_file+0x0/0xdd
[<c016e45a>] vfs_write+0x83/0xf6
[<c016e9a0>] sys_write+0x3c/0x63
[<c0104816>] sysenter_past_esp+0x5f/0xa5
=======================

Thanks!!
Regards
R.Nageswara Sastry

2008-07-10 11:11:51

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi,

[added Peter on CC, lockdep confuses me]

Nageswara R Sastry <[email protected]> writes:

> Hi Johannes,
>
> Johannes Weiner wrote:
>
>>> * I am seeing the circular locking dependency with the above patch
>>> too.
>>
>> Uhm. Failure or no failure? A possible dead-lock report _is_ a
>> failure. So, do you get one or not? And if so, could you send me the
>> dmesg parts?
>>
>> Thanks a lot,
>>
>> Hannes
>
> I could see a circular locking dependency and sysfs hang with the new
> patch named "cpufreq: cancel self-rearming work synchroneously"
> also. Please find the dmesg output. Please let me know if you need
> more information. Thank a lot for your coordination.
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.25.9.cpufreq #2
> -------------------------------------------------------
> S06cpuspeed/3427 is trying to acquire lock:
> (&(&dbs_info->work)->work){--..}, at: [<c012f46c>]
> __cancel_work_timer+0x80/0x177
>
> but task is already holding lock:
> (dbs_mutex){--..}, at: [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7

Hmmm, it's weird. This path should be okay. I wonder where the
dependency work -> dbs_mutex comes from. The mutex is nowhere taken
with the work lock held (I removed this in the new version of the patch,
can you double-check you applied to correct patch?).

So the chain should really be dbs_mutex -> work-lock.

> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (dbs_mutex){--..}:
> [<c013aa76>] add_lock_to_list+0x61/0x83
> [<c013cfa3>] __lock_acquire+0x953/0xb05
> [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> [<c04cdaa7>] mutex_lock_nested+0xce/0x222
> [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> [<c041c87e>] __cpufreq_governor+0x73/0xa6
> [<c041c9ec>] __cpufreq_set_policy+0x13b/0x19e
> [<c041d6b9>] cpufreq_add_dev+0x3b4/0x4aa
> [<c041d29a>] handle_update+0x0/0x21
> [<c02ee310>] sysdev_driver_register+0x48/0x9a
> [<c041c75f>] cpufreq_register_driver+0x9b/0x147
> [<c06b742c>] kernel_init+0x130/0x26f
> [<c06b72fc>] kernel_init+0x0/0x26f
> [<c06b72fc>] kernel_init+0x0/0x26f
> [<c0105527>] kernel_thread_helper+0x7/0x10
> [<ffffffff>] 0xffffffff
>
> -> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
> [<c013cfa3>] __lock_acquire+0x953/0xb05
> [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> [<c010a83b>] save_stack_trace+0x1a/0x35
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> [<c04cdfd9>] down_write+0x2b/0x44
> [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> [<c041e362>] do_dbs_timer+0x40/0x24f
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c012eeba>] run_workqueue+0xbc/0x187
> [<c012ee7f>] run_workqueue+0x81/0x187
> [<c041e322>] do_dbs_timer+0x0/0x24f
> [<c012f6fa>] worker_thread+0x0/0xbd
> [<c012f7ad>] worker_thread+0xb3/0xbd
> [<c0131acc>] autoremove_wake_function+0x0/0x2d
> [<c0131a1b>] kthread+0x38/0x5d
> [<c01319e3>] kthread+0x0/0x5d
> [<c0105527>] kernel_thread_helper+0x7/0x10
> [<ffffffff>] 0xffffffff
>
> -> #0 (&(&dbs_info->work)->work){--..}:
> [<c013b6a2>] print_circular_bug_tail+0x2a/0x61
> [<c013cec8>] __lock_acquire+0x878/0xb05
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c012f497>] __cancel_work_timer+0xab/0x177
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c013c0ee>] mark_held_locks+0x39/0x53
> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
> [<c041e7ed>] cpufreq_governor_dbs+0x27c/0x2f7
> [<c041c87e>] __cpufreq_governor+0x73/0xa6
> [<c041c9da>] __cpufreq_set_policy+0x129/0x19e
> [<c041ce0f>] store_scaling_governor+0x112/0x135
> [<c041d29a>] handle_update+0x0/0x21
> [<c0410065>] atkbd_set_leds+0x5/0xcf
> [<c041ccfd>] store_scaling_governor+0x0/0x135
> [<c041d7eb>] store+0x3c/0x54
> [<c01a09e8>] sysfs_write_file+0xa9/0xdd
> [<c01a093f>] sysfs_write_file+0x0/0xdd
> [<c016e45a>] vfs_write+0x83/0xf6
> [<c016e9a0>] sys_write+0x3c/0x63
> [<c0104816>] sysenter_past_esp+0x5f/0xa5
> [<ffffffff>] 0xffffffff

Uhm, this dependency is as new as the actual lockdep detection (the same
backtrace as the whole event, see below). What is lockdep doing here?
Shouldn't this be the callpath where the lock was taken for the first
time?

I can not see where the chain is ever work-lock -> dbs_mutex, so how
does lockdep come to the conclusion this would be the correct order?

> other info that might help us debug this:
>
> 3 locks held by S06cpuspeed/3427:
> #0: (&buffer->mutex){--..}, at: [<c01a0963>] sysfs_write_file+0x24/0xdd
> #1: (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<c041d198>]
> lock_policy_rwsem_write+0x30/0x56
> #2: (dbs_mutex){--..}, at: [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
>
> stack backtrace:
> Pid: 3427, comm: S06cpuspeed Not tainted 2.6.25.9.cpufreq #2
> [<c013b6cf>] print_circular_bug_tail+0x57/0x61
> [<c013cec8>] __lock_acquire+0x878/0xb05
> [<c013d1b4>] lock_acquire+0x5f/0x79
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c012f497>] __cancel_work_timer+0xab/0x177
> [<c012f46c>] __cancel_work_timer+0x80/0x177
> [<c013c0ee>] mark_held_locks+0x39/0x53
> [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
> [<c041e7ed>] cpufreq_governor_dbs+0x27c/0x2f7
> [<c041c87e>] __cpufreq_governor+0x73/0xa6
> [<c041c9da>] __cpufreq_set_policy+0x129/0x19e
> [<c041ce0f>] store_scaling_governor+0x112/0x135
> [<c041d29a>] handle_update+0x0/0x21
> [<c0410065>] atkbd_set_leds+0x5/0xcf
> [<c041ccfd>] store_scaling_governor+0x0/0x135
> [<c041d7eb>] store+0x3c/0x54
> [<c01a09e8>] sysfs_write_file+0xa9/0xdd
> [<c01a093f>] sysfs_write_file+0x0/0xdd
> [<c016e45a>] vfs_write+0x83/0xf6
> [<c016e9a0>] sys_write+0x3c/0x63
> [<c0104816>] sysenter_past_esp+0x5f/0xa5
> =======================
>
> Thanks!!
> Regards
> R.Nageswara Sastry

Hannes

2008-07-15 03:43:29

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Johannes Weiner wrote:
> Hi,
>
> [added Peter on CC, lockdep confuses me]
> Hmmm, it's weird. This path should be okay. I wonder where the
> dependency work -> dbs_mutex comes from. The mutex is nowhere taken
> with the work lock held (I removed this in the new version of the patch,
> can you double-check you applied to correct patch?).
>
> So the chain should really be dbs_mutex -> work-lock.
>
>
> Uhm, this dependency is as new as the actual lockdep detection (the same
> backtrace as the whole event, see below). What is lockdep doing here?
> Shouldn't this be the callpath where the lock was taken for the first
> time?
>
> I can not see where the chain is ever work-lock -> dbs_mutex, so how
> does lockdep come to the conclusion this would be the correct order?
>

I confirm that Linux kernel patched with the following patches.

--
From: Johannes Weiner <[email protected]>
Subject: cpufreq: cancel self-rearming work synchroneously

The ondemand and conservative governor workers are self-rearming.
Cancel them synchroneously to avoid nasty races.

This patch also removes taking a mutex in the conservative worker
function as the locking is dbs_mutex -> work and not the
other way round.

Reported-by: Nageswara R Sastry <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
---


---
From: Johannes Weiner <[email protected]>
Subject: cpufreq: Fix race in enabling ondemand/conservative governors

Prevent double activation of the governor if two processes race on the
check for whether the governor is already active.

Signed-off-by: Johannes Weiner <[email protected]>
---

Thanks

Regards
R.Nageswara Sastry

2008-07-16 13:45:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

On Thu, 2008-07-10 at 13:11 +0200, Johannes Weiner wrote:
> Hi,
>
> [added Peter on CC, lockdep confuses me]
>
> Nageswara R Sastry <[email protected]> writes:
>
> > Hi Johannes,
> >
> > Johannes Weiner wrote:
> >
> >>> * I am seeing the circular locking dependency with the above patch
> >>> too.
> >>
> >> Uhm. Failure or no failure? A possible dead-lock report _is_ a
> >> failure. So, do you get one or not? And if so, could you send me the
> >> dmesg parts?
> >>
> >> Thanks a lot,
> >>
> >> Hannes
> >
> > I could see a circular locking dependency and sysfs hang with the new
> > patch named "cpufreq: cancel self-rearming work synchroneously"
> > also. Please find the dmesg output. Please let me know if you need
> > more information. Thank a lot for your coordination.
> >
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.25.9.cpufreq #2
> > -------------------------------------------------------
> > S06cpuspeed/3427 is trying to acquire lock:
> > (&(&dbs_info->work)->work){--..}, at: [<c012f46c>]
> > __cancel_work_timer+0x80/0x177
> >
> > but task is already holding lock:
> > (dbs_mutex){--..}, at: [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
>
> Hmmm, it's weird. This path should be okay. I wonder where the
> dependency work -> dbs_mutex comes from. The mutex is nowhere taken
> with the work lock held (I removed this in the new version of the patch,
> can you double-check you applied to correct patch?).
>
> So the chain should really be dbs_mutex -> work-lock.
>
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (dbs_mutex){--..}:
> > [<c013aa76>] add_lock_to_list+0x61/0x83
> > [<c013cfa3>] __lock_acquire+0x953/0xb05
> > [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> > [<c013d1b4>] lock_acquire+0x5f/0x79
> > [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> > [<c04cdaa7>] mutex_lock_nested+0xce/0x222
> > [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> > [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> > [<c041e5e3>] cpufreq_governor_dbs+0x72/0x2f7
> > [<c041c87e>] __cpufreq_governor+0x73/0xa6
> > [<c041c9ec>] __cpufreq_set_policy+0x13b/0x19e
> > [<c041d6b9>] cpufreq_add_dev+0x3b4/0x4aa
> > [<c041d29a>] handle_update+0x0/0x21
> > [<c02ee310>] sysdev_driver_register+0x48/0x9a
> > [<c041c75f>] cpufreq_register_driver+0x9b/0x147
> > [<c06b742c>] kernel_init+0x130/0x26f
> > [<c06b72fc>] kernel_init+0x0/0x26f
> > [<c06b72fc>] kernel_init+0x0/0x26f
> > [<c0105527>] kernel_thread_helper+0x7/0x10
> > [<ffffffff>] 0xffffffff
> >
> > -> #1 (&per_cpu(cpu_policy_rwsem, cpu)){----}:
> > [<c013cfa3>] __lock_acquire+0x953/0xb05
> > [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> > [<c010a83b>] save_stack_trace+0x1a/0x35
> > [<c013d1b4>] lock_acquire+0x5f/0x79
> > [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> > [<c04cdfd9>] down_write+0x2b/0x44
> > [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> > [<c041d198>] lock_policy_rwsem_write+0x30/0x56
> > [<c041e362>] do_dbs_timer+0x40/0x24f
> > [<c012ee7f>] run_workqueue+0x81/0x187
> > [<c012eeba>] run_workqueue+0xbc/0x187
> > [<c012ee7f>] run_workqueue+0x81/0x187
> > [<c041e322>] do_dbs_timer+0x0/0x24f
> > [<c012f6fa>] worker_thread+0x0/0xbd
> > [<c012f7ad>] worker_thread+0xb3/0xbd
> > [<c0131acc>] autoremove_wake_function+0x0/0x2d
> > [<c0131a1b>] kthread+0x38/0x5d
> > [<c01319e3>] kthread+0x0/0x5d
> > [<c0105527>] kernel_thread_helper+0x7/0x10
> > [<ffffffff>] 0xffffffff
> >
> > -> #0 (&(&dbs_info->work)->work){--..}:
> > [<c013b6a2>] print_circular_bug_tail+0x2a/0x61
> > [<c013cec8>] __lock_acquire+0x878/0xb05
> > [<c013d1b4>] lock_acquire+0x5f/0x79
> > [<c012f46c>] __cancel_work_timer+0x80/0x177
> > [<c012f497>] __cancel_work_timer+0xab/0x177
> > [<c012f46c>] __cancel_work_timer+0x80/0x177
> > [<c013c0ee>] mark_held_locks+0x39/0x53
> > [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> > [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> > [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> > [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
> > [<c041e7ed>] cpufreq_governor_dbs+0x27c/0x2f7
> > [<c041c87e>] __cpufreq_governor+0x73/0xa6
> > [<c041c9da>] __cpufreq_set_policy+0x129/0x19e
> > [<c041ce0f>] store_scaling_governor+0x112/0x135
> > [<c041d29a>] handle_update+0x0/0x21
> > [<c0410065>] atkbd_set_leds+0x5/0xcf
> > [<c041ccfd>] store_scaling_governor+0x0/0x135
> > [<c041d7eb>] store+0x3c/0x54
> > [<c01a09e8>] sysfs_write_file+0xa9/0xdd
> > [<c01a093f>] sysfs_write_file+0x0/0xdd
> > [<c016e45a>] vfs_write+0x83/0xf6
> > [<c016e9a0>] sys_write+0x3c/0x63
> > [<c0104816>] sysenter_past_esp+0x5f/0xa5
> > [<ffffffff>] 0xffffffff
>
> Uhm, this dependency is as new as the actual lockdep detection (the same
> backtrace as the whole event, see below). What is lockdep doing here?
> Shouldn't this be the callpath where the lock was taken for the first
> time?
>
> I can not see where the chain is ever work-lock -> dbs_mutex, so how
> does lockdep come to the conclusion this would be the correct order?

Lockdep does not directly observe work-lock -> dbs_mutex, but an
indirect dependency through cpu_policy_rwsem.

So in do_dbs_timer() we get:

work-lock
cpu_policy_rwsem

then from:

cpufreq_add_dev()
__cpufreq_set_policy()
__cpufreq_governor()

we obtain the dependency:

cpu_policy_rwsem
dbs_mutex

So together we get the chain:

work-lock
cpu_policy_rwsem
dbs_mutex

And the below trace shows us trying to obtain the work-lock while
holding the dbs_mutex, which is in clear violation of the above locking
rules.

> > other info that might help us debug this:
> >
> > 3 locks held by S06cpuspeed/3427:
> > #0: (&buffer->mutex){--..}, at: [<c01a0963>] sysfs_write_file+0x24/0xdd
> > #1: (&per_cpu(cpu_policy_rwsem, cpu)){----}, at: [<c041d198>]
> > lock_policy_rwsem_write+0x30/0x56
> > #2: (dbs_mutex){--..}, at: [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
> >
> > stack backtrace:
> > Pid: 3427, comm: S06cpuspeed Not tainted 2.6.25.9.cpufreq #2
> > [<c013b6cf>] print_circular_bug_tail+0x57/0x61
> > [<c013cec8>] __lock_acquire+0x878/0xb05
> > [<c013d1b4>] lock_acquire+0x5f/0x79
> > [<c012f46c>] __cancel_work_timer+0x80/0x177
> > [<c012f497>] __cancel_work_timer+0xab/0x177
> > [<c012f46c>] __cancel_work_timer+0x80/0x177
> > [<c013c0ee>] mark_held_locks+0x39/0x53
> > [<c04cdbe8>] mutex_lock_nested+0x20f/0x222
> > [<c013c277>] trace_hardirqs_on+0xe7/0x10e
> > [<c04cdbf3>] mutex_lock_nested+0x21a/0x222
> > [<c041e7db>] cpufreq_governor_dbs+0x26a/0x2f7
> > [<c041e7ed>] cpufreq_governor_dbs+0x27c/0x2f7
> > [<c041c87e>] __cpufreq_governor+0x73/0xa6
> > [<c041c9da>] __cpufreq_set_policy+0x129/0x19e
> > [<c041ce0f>] store_scaling_governor+0x112/0x135
> > [<c041d29a>] handle_update+0x0/0x21
> > [<c0410065>] atkbd_set_leds+0x5/0xcf
> > [<c041ccfd>] store_scaling_governor+0x0/0x135
> > [<c041d7eb>] store+0x3c/0x54
> > [<c01a09e8>] sysfs_write_file+0xa9/0xdd
> > [<c01a093f>] sysfs_write_file+0x0/0xdd
> > [<c016e45a>] vfs_write+0x83/0xf6
> > [<c016e9a0>] sys_write+0x3c/0x63
> > [<c0104816>] sysenter_past_esp+0x5f/0xa5
> > =======================
> >
> > Thanks!!
> > Regards
> > R.Nageswara Sastry
>
> Hannes

2008-08-12 08:14:06

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Any updates on this bug.

Thanks and Regards
R.Nageswara Sastry

2008-08-12 21:29:48

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Hi Nageswara,

Nageswara R Sastry <[email protected]> writes:

> Any updates on this bug.

Sorry. I was just looking into it again and still see no resolution.
Added Venkatesh and Alexey to CC.

To summarize again:

The problem we have is that cpufreq_ondemand has a self-rearming worker
function which we should but can not cancel synchroneously with the
current code. The locking order is like this:

store()
policy_rwsem write
cpufreq_governor_dbs()
dbs_mutex

work lock
do_dbs_timer()
policy_rwsem write

which gives a locking hierarchy of:

work lock
policy_rwsem
dbs_mutex

Now, we _should_ cancel the worker synchroneously, but that won't fly
since when cpufreq_governor_dbs() is called, we already hold
policy_rwsem and grabbing the work lock is illegal.

I will continue to read code and see if I can come up with a solution,
but any help is really appreciated.

I have not yet looked at cpufreq_conservative...

> Thanks and Regards
> R.Nageswara Sastry

Thanks for your patience and insistence,

Hannes

2008-08-12 21:45:28

by Johannes Weiner

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Johannes Weiner <[email protected]> writes:

> Hi Nageswara,
>
> Nageswara R Sastry <[email protected]> writes:
>
>> Any updates on this bug.
>
> Sorry. I was just looking into it again and still see no resolution.
> Added Venkatesh and Alexey to CC.

Sorry, wrong address for Alexey. Hope this one works.

> To summarize again:
>
> The problem we have is that cpufreq_ondemand has a self-rearming worker
> function which we should but can not cancel synchroneously with the
> current code. The locking order is like this:
>
> store()
> policy_rwsem write
> cpufreq_governor_dbs()
> dbs_mutex
>
> work lock
> do_dbs_timer()
> policy_rwsem write
>
> which gives a locking hierarchy of:
>
> work lock
> policy_rwsem
> dbs_mutex
>
> Now, we _should_ cancel the worker synchroneously, but that won't fly
> since when cpufreq_governor_dbs() is called, we already hold
> policy_rwsem and grabbing the work lock is illegal.
>
> I will continue to read code and see if I can come up with a solution,
> but any help is really appreciated.
>
> I have not yet looked at cpufreq_conservative...
>
>> Thanks and Regards
>> R.Nageswara Sastry
>
> Thanks for your patience and insistence,
>
> Hannes

2008-10-07 09:41:59

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Any updates on this bug. Thanks in advance for your coordination.

Johannes Weiner wrote:
>> I have not yet looked at cpufreq_conservative...
>>
>> Thanks for your patience and insistence,
>>
>> Hannes
Thanks and Regards
R.Nageswara Sastry

2008-10-28 03:30:09

by Nageswara R Sastry

[permalink] [raw]
Subject: Re: [BUG] While changing the cpufreq governor, kernel hits a bug in workqueue.c

Nageswara R Sastry wrote:
> Any updates on this bug. Thanks in advance for your coordination.
>
> Johannes Weiner wrote:
>>> I have not yet looked at cpufreq_conservative...
>>>
>>> Thanks for your patience and insistence,

From long time I didn't see any activity related to this bug so posting
again after testing on the latest stable kernel.

------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:240!
invalid opcode: 0000 [#1] SMP
Modules linked in: cpufreq_powersave cpufreq_conservative
cpufreq_userspace usb_
storage usbhid ehci_hcd
ohci_hcd uhci_hcd usbcore

Pid: 236, comm: kondemand/1 Not tainted (2.6.27.2 #1)
EIP: 0060:[<c0131c6e>] EFLAGS: 00010286 CPU: 1
EIP is at queue_delayed_work_on+0x24/0xa6
EAX: 00000000 EBX: c3a3c9d4 ECX: c3a3c9d4 EDX: 00000000
ESI: c3a3c9f0 EDI: f7a45658 EBP: 00000001 ESP: f7a53f3c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kondemand/1 (pid: 236, ti=f7a52000 task=f7904270 task.ti=f7a52000)
Stack: 00000000 f7a43d08 c3a3c9c0 f7ab5c68 c0424ee5 00000040 00000000
00000001
00000040 00000000 00000002 ffffffff 00000000 ffffffff c3a3c9d8
f7a43d08
c3a3c9d4 f7a53f9c c013156d 00000000 00000002 00000000 c013152f
c0424cbb
Call Trace:
[<c0424ee5>] do_dbs_timer+0x22a/0x24a
[<c013156d>] run_workqueue+0xbf/0x18a
[<c013152f>] run_workqueue+0x81/0x18a
[<c0424cbb>] do_dbs_timer+0x0/0x24a
[<c0131e25>] worker_thread+0x0/0xbd
[<c0131ed8>] worker_thread+0xb3/0xbd
[<c013423c>] autoremove_wake_function+0x0/0x2d
[<c013418a>] kthread+0x38/0x5e
[<c0134152>] kthread+0x0/0x5e
[<c0104503>] kernel_thread_helper+0x7/0x10
=======================
Code: 6c c0 e9 7a ff ff ff 55 89 c5 57 89 d7 56 53 89 cb 8d 71 1c f0 0f
ba 29 00
19 c0 31 d2 85 c0 0f 85 81 00 00 00 83 79
1c 00 74 04 <0f> 0b eb fe 8d 41 04 39
41 04 74 04
0f 0b eb fe 8b 54 24 10 89
EIP: [<c0131c6e>] queue_delayed_work_on+0x24/0xa6 SS:ESP 0068:f7a53f3c
---[ end trace 69cb0cd122ebfb9d ]---


P.S. This time it took just 2 minutes to reproduce this bug. Previously
it used to happen after 30 minutes.

Please let me know if you need more details.

Thanks a lot for your coordination.

Regards
R.Nageswara Sastry