2013-08-29 20:57:37

by Gerlando Falauto

[permalink] [raw]
Subject: kernel deadlock

Hi everyone,

I ran into the deadlock situation reported at the bottom.
Actually, on my latest 3.10 kernel for some reason I don't get the
report (the kernel just hangs for some reason), so it took me quite some
time to track it down.

Once I figured the trigger to the machine hanging was adjtimex(), I
reverted everything (between 3.9 to 3.10) that was touching
kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
checked that indeed the problem was not happening anymore, and finally
started bisecting, landing on the following offending commit.
THEN, and ONLY THEN, did I get the &%""?+"% deadlock report.

Do you guys have any ideas what could be wrong and how to fix it?

Thank you,
Gerlando

commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
Author: John Stultz <[email protected]>
Date: Fri Mar 22 11:37:28 2013 -0700

timekeeping: Hold timekeepering locks in do_adjtimex and hardpps

In moving the NTP state to be protected by the timekeeping locks,
be sure to acquire the timekeeping locks prior to calling
ntp functions.

Cc: Thomas Gleixner <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Signed-off-by: John Stultz <[email protected]>

=================================
[ INFO: inconsistent lock state ]
3.10.0-04864-g346ecc9-dirty #16 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
(timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
{IN-HARDIRQ-W} state was registered at:
[<c0055138>] __lock_acquire+0xabc/0x1bb8
[<c0056838>] lock_acquire+0xa8/0x15c
[<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
[<c00497a4>] do_timer+0x2c/0xa54
[<c004e7f4>] tick_periodic+0x74/0x9c
[<c004e834>] tick_handle_periodic+0x18/0x7c
[<c001349c>] orion_timer_interrupt+0x24/0x34
[<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
[<c0069f0c>] handle_irq_event+0x3c/0x5c
[<c006c194>] handle_level_irq+0x8c/0xe8
[<c0069574>] generic_handle_irq+0x30/0x4c
[<c000951c>] handle_IRQ+0x30/0x84
[<c04c2178>] __irq_svc+0x38/0xa0
[<c06cf15c>] calibrate_delay+0x350/0x4e4
[<c06986e0>] start_kernel+0x23c/0x2c4
[<0000803c>] 0x803c
irq event stamp: 32358
hardirqs last enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
hardirqs last disabled at (32358): [<c04c14bc>]
_raw_spin_lock_irqsave+0x20/0x64
softirqs last enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(timekeeper_lock);
<Interrupt>
lock(timekeeper_lock);

*** DEADLOCK ***

1 lock held by SAKEY/738:
#0: (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc

stack backtrace:
CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
[<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
(show_stack+0x10/0x14)
[<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
(print_usage_bug.part.27+0x218/0x280)
[<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
(mark_lock+0x538/0x6bc)
[<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
(mark_held_locks+0x90/0x124)
[<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
(trace_hardirqs_on_caller+0xa8/0x23c)
[<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
(_raw_spin_unlock_irq+0x24/0x5c)
[<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
(__do_adjtimex+0x17c/0x65c)
[<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
(do_adjtimex+0x84/0xbc)
[<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
(SyS_adjtimex+0x50/0xa8)
[<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
(ret_fast_syscall+0x0/0x44)


2013-08-29 23:45:28

by John Stultz

[permalink] [raw]
Subject: Re: kernel deadlock

On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
> Hi everyone,
>
> I ran into the deadlock situation reported at the bottom.
> Actually, on my latest 3.10 kernel for some reason I don't get the
> report (the kernel just hangs for some reason), so it took me quite some
> time to track it down.
>
> Once I figured the trigger to the machine hanging was adjtimex(), I
> reverted everything (between 3.9 to 3.10) that was touching
> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
> checked that indeed the problem was not happening anymore, and finally
> started bisecting, landing on the following offending commit.
> THEN, and ONLY THEN, did I get the &%""?+"% deadlock report.
>
> Do you guys have any ideas what could be wrong and how to fix it?

Thanks for the report!

What exactly is your process for reproducing the issue?


> [ INFO: inconsistent lock state ]
> 3.10.0-04864-g346ecc9-dirty #16 Not tainted
> ---------------------------------
> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
> (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
> {IN-HARDIRQ-W} state was registered at:
> [<c0055138>] __lock_acquire+0xabc/0x1bb8
> [<c0056838>] lock_acquire+0xa8/0x15c
> [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
> [<c00497a4>] do_timer+0x2c/0xa54
> [<c004e7f4>] tick_periodic+0x74/0x9c
> [<c004e834>] tick_handle_periodic+0x18/0x7c
> [<c001349c>] orion_timer_interrupt+0x24/0x34
> [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
> [<c0069f0c>] handle_irq_event+0x3c/0x5c
> [<c006c194>] handle_level_irq+0x8c/0xe8
> [<c0069574>] generic_handle_irq+0x30/0x4c
> [<c000951c>] handle_IRQ+0x30/0x84
> [<c04c2178>] __irq_svc+0x38/0xa0
> [<c06cf15c>] calibrate_delay+0x350/0x4e4
> [<c06986e0>] start_kernel+0x23c/0x2c4
> [<0000803c>] 0x803c
> irq event stamp: 32358
> hardirqs last enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
> hardirqs last disabled at (32358): [<c04c14bc>]
> _raw_spin_lock_irqsave+0x20/0x64
> softirqs last enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
> softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(timekeeper_lock);
> <Interrupt>
> lock(timekeeper_lock);
>
> *** DEADLOCK ***
>
> 1 lock held by SAKEY/738:
> #0: (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>
> stack backtrace:
> CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
> [<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
> (show_stack+0x10/0x14)
> [<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
> (print_usage_bug.part.27+0x218/0x280)
> [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
> (mark_lock+0x538/0x6bc)
> [<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
> (mark_held_locks+0x90/0x124)
> [<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
> (trace_hardirqs_on_caller+0xa8/0x23c)
> [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
> (_raw_spin_unlock_irq+0x24/0x5c)
> [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
> (__do_adjtimex+0x17c/0x65c)
> [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
> (do_adjtimex+0x84/0xbc)
> [<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
> (SyS_adjtimex+0x50/0xa8)
> [<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
> (ret_fast_syscall+0x0/0x44)

Hrmm. So I'm a little confused by the report, as we hold the write lock
on the timekeeper_lock disabling irqs, so I'm not sure I see how the irq
could trigger to cause the deadlock. In fact, all the timekeeper_lock
users save irqs.

Hrmm. I dunno. :(

Thomas, you have a guess?

Let me know how you trigger it and I'll see if I can't reproduce it myself.

thanks
-john


2013-08-30 23:04:56

by Gerlando Falauto

[permalink] [raw]
Subject: Re: kernel deadlock

Hi,

sorry, it took me a while to narrow it down...

On 08/30/2013 01:45 AM, John Stultz wrote:
> On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
>> Hi everyone,
>>
>> I ran into the deadlock situation reported at the bottom.
>> Actually, on my latest 3.10 kernel for some reason I don't get the
>> report (the kernel just hangs for some reason), so it took me quite some
>> time to track it down.
>>
>> Once I figured the trigger to the machine hanging was adjtimex(), I
>> reverted everything (between 3.9 to 3.10) that was touching
>> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
>> checked that indeed the problem was not happening anymore, and finally
>> started bisecting, landing on the following offending commit.
>> THEN, and ONLY THEN, did I get the &%""?+"% deadlock report.
>>
>> Do you guys have any ideas what could be wrong and how to fix it?
>
> Thanks for the report!
>
> What exactly is your process for reproducing the issue?

Now (well, now...), it's quite easy.

Three ingredients:

1) Kernel 3.10

2) Enable HRTICK

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..294e3ca 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
*/
SCHED_FEAT(ARCH_POWER, true)

-SCHED_FEAT(HRTICK, false)
+SCHED_FEAT(HRTICK, true)
SCHED_FEAT(DOUBLE_TICK, false)
SCHED_FEAT(LB_BIAS, true)

3) Run the following:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
int i;

for (i = 0 ; ; i++) {
struct timex adj = {};
printf("%d\r", i);
fflush(stdout);
adjtimex(&adj);
}
return 0;
}

Notice how:
1) The original issue (with a bit more complicated scenario) was seen on
ARM and PowerPC platforms
2) Under the above test conditions (on ARM) I *don't* get any deadlock
report printed, the machine just hangs
3) The offending commit (below) I had found through a weird (manual)
process of reverting and re-reverting (where some commits could have
been reverted out of order), so I'm not 100% sure you'd come to the same
conclusions.

commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
Author: John Stultz <[email protected]>
Date: Fri Mar 22 11:37:28 2013 -0700

timekeeping: Hold timekeepering locks in do_adjtimex and hardpps

I'm not able to perform any further testing at this very moment, but if
needed, I can try bisecting again sometime next week, so to make an even
more reliable statement.

Thank you!
Gerlando

>
>
>> [ INFO: inconsistent lock state ]
>> 3.10.0-04864-g346ecc9-dirty #16 Not tainted
>> ---------------------------------
>> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>> SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>> {IN-HARDIRQ-W} state was registered at:
>> [<c0055138>] __lock_acquire+0xabc/0x1bb8
>> [<c0056838>] lock_acquire+0xa8/0x15c
>> [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64
>> [<c00497a4>] do_timer+0x2c/0xa54
>> [<c004e7f4>] tick_periodic+0x74/0x9c
>> [<c004e834>] tick_handle_periodic+0x18/0x7c
>> [<c001349c>] orion_timer_interrupt+0x24/0x34
>> [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300
>> [<c0069f0c>] handle_irq_event+0x3c/0x5c
>> [<c006c194>] handle_level_irq+0x8c/0xe8
>> [<c0069574>] generic_handle_irq+0x30/0x4c
>> [<c000951c>] handle_IRQ+0x30/0x84
>> [<c04c2178>] __irq_svc+0x38/0xa0
>> [<c06cf15c>] calibrate_delay+0x350/0x4e4
>> [<c06986e0>] start_kernel+0x23c/0x2c4
>> [<0000803c>] 0x803c
>> irq event stamp: 32358
>> hardirqs last enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44
>> hardirqs last disabled at (32358): [<c04c14bc>]
>> _raw_spin_lock_irqsave+0x20/0x64
>> softirqs last enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308
>> softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(timekeeper_lock);
>> <Interrupt>
>> lock(timekeeper_lock);
>>
>> *** DEADLOCK ***
>>
>> 1 lock held by SAKEY/738:
>> #0: (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc
>>
>> stack backtrace:
>> CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
>> [<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
>> (show_stack+0x10/0x14)
>> [<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>]
>> (print_usage_bug.part.27+0x218/0x280)
>> [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>]
>> (mark_lock+0x538/0x6bc)
>> [<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>]
>> (mark_held_locks+0x90/0x124)
>> [<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>]
>> (trace_hardirqs_on_caller+0xa8/0x23c)
>> [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>]
>> (_raw_spin_unlock_irq+0x24/0x5c)
>> [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>]
>> (__do_adjtimex+0x17c/0x65c)
>> [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>]
>> (do_adjtimex+0x84/0xbc)
>> [<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>]
>> (SyS_adjtimex+0x50/0xa8)
>> [<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
>> (ret_fast_syscall+0x0/0x44)
>
> Hrmm. So I'm a little confused by the report, as we hold the write lock
> on the timekeeper_lock disabling irqs, so I'm not sure I see how the irq
> could trigger to cause the deadlock. In fact, all the timekeeper_lock
> users save irqs.
>
> Hrmm. I dunno. :(
>
> Thomas, you have a guess?
>
> Let me know how you trigger it and I'll see if I can't reproduce it myself.
>
> thanks
> -john
>
>
>

2013-08-30 23:10:13

by John Stultz

[permalink] [raw]
Subject: Re: kernel deadlock

On 08/30/2013 04:04 PM, Gerlando Falauto wrote:
> Hi,
>
> sorry, it took me a while to narrow it down...
>
> On 08/30/2013 01:45 AM, John Stultz wrote:
>> On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
>>> Hi everyone,
>>>
>>> I ran into the deadlock situation reported at the bottom.
>>> Actually, on my latest 3.10 kernel for some reason I don't get the
>>> report (the kernel just hangs for some reason), so it took me quite
>>> some
>>> time to track it down.
>>>
>>> Once I figured the trigger to the machine hanging was adjtimex(), I
>>> reverted everything (between 3.9 to 3.10) that was touching
>>> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
>>> checked that indeed the problem was not happening anymore, and finally
>>> started bisecting, landing on the following offending commit.
>>> THEN, and ONLY THEN, did I get the &%""?+"% deadlock report.
>>>
>>> Do you guys have any ideas what could be wrong and how to fix it?
>>
>> Thanks for the report!
>>
>> What exactly is your process for reproducing the issue?
>
> Now (well, now...), it's quite easy.
>
> Three ingredients:
>
> 1) Kernel 3.10
>
> 2) Enable HRTICK
>
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 99399f8..294e3ca 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
> */
> SCHED_FEAT(ARCH_POWER, true)
>
> -SCHED_FEAT(HRTICK, false)
> +SCHED_FEAT(HRTICK, true)
> SCHED_FEAT(DOUBLE_TICK, false)
> SCHED_FEAT(LB_BIAS, true)
>
> 3) Run the following:
>
> #include <stdio.h>
> #include <sys/timex.h>
>
> int main(void)
> {
> int i;
>
> for (i = 0 ; ; i++) {
> struct timex adj = {};
> printf("%d\r", i);
> fflush(stdout);
> adjtimex(&adj);
> }
> return 0;
> }
>
> Notice how:
> 1) The original issue (with a bit more complicated scenario) was seen
> on ARM and PowerPC platforms
> 2) Under the above test conditions (on ARM) I *don't* get any deadlock
> report printed, the machine just hangs
> 3) The offending commit (below) I had found through a weird (manual)
> process of reverting and re-reverting (where some commits could have
> been reverted out of order), so I'm not 100% sure you'd come to the
> same conclusions.
>
> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
> Author: John Stultz <[email protected]>
> Date: Fri Mar 22 11:37:28 2013 -0700
>
> timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>
> I'm not able to perform any further testing at this very moment, but
> if needed, I can try bisecting again sometime next week, so to make an
> even more reliable statement.
>

Thanks so much for the details! I'll take a shot at reproducing this and
will let you know what comes of it.

thanks
-john

2013-08-31 00:48:58

by Stephen Boyd

[permalink] [raw]
Subject: Re: kernel deadlock

On 08/30/13 16:10, John Stultz wrote:
> On 08/30/2013 04:04 PM, Gerlando Falauto wrote:
>> Hi,
>>
>> sorry, it took me a while to narrow it down...
>>
>> On 08/30/2013 01:45 AM, John Stultz wrote:
>>> On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
>>>> Hi everyone,
>>>>
>>>> I ran into the deadlock situation reported at the bottom.
>>>> Actually, on my latest 3.10 kernel for some reason I don't get the
>>>> report (the kernel just hangs for some reason), so it took me quite
>>>> some
>>>> time to track it down.
>>>>
>>>> Once I figured the trigger to the machine hanging was adjtimex(), I
>>>> reverted everything (between 3.9 to 3.10) that was touching
>>>> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
>>>> checked that indeed the problem was not happening anymore, and finally
>>>> started bisecting, landing on the following offending commit.
>>>> THEN, and ONLY THEN, did I get the &%""?+"% deadlock report.
>>>>
>>>> Do you guys have any ideas what could be wrong and how to fix it?
>>> Thanks for the report!
>>>
>>> What exactly is your process for reproducing the issue?
>> Now (well, now...), it's quite easy.
>>
>> Three ingredients:
>>
>> 1) Kernel 3.10
>>
>> 2) Enable HRTICK
>>
>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>> index 99399f8..294e3ca 100644
>> --- a/kernel/sched/features.h
>> +++ b/kernel/sched/features.h
>> @@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
>> */
>> SCHED_FEAT(ARCH_POWER, true)
>>
>> -SCHED_FEAT(HRTICK, false)
>> +SCHED_FEAT(HRTICK, true)
>> SCHED_FEAT(DOUBLE_TICK, false)
>> SCHED_FEAT(LB_BIAS, true)
>>
>> 3) Run the following:
>>
>> #include <stdio.h>
>> #include <sys/timex.h>
>>
>> int main(void)
>> {
>> int i;
>>
>> for (i = 0 ; ; i++) {
>> struct timex adj = {};
>> printf("%d\r", i);
>> fflush(stdout);
>> adjtimex(&adj);
>> }
>> return 0;
>> }
>>
>> Notice how:
>> 1) The original issue (with a bit more complicated scenario) was seen
>> on ARM and PowerPC platforms
>> 2) Under the above test conditions (on ARM) I *don't* get any deadlock
>> report printed, the machine just hangs
>> 3) The offending commit (below) I had found through a weird (manual)
>> process of reverting and re-reverting (where some commits could have
>> been reverted out of order), so I'm not 100% sure you'd come to the
>> same conclusions.
>>
>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
>> Author: John Stultz <[email protected]>
>> Date: Fri Mar 22 11:37:28 2013 -0700
>>
>> timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>>
>> I'm not able to perform any further testing at this very moment, but
>> if needed, I can try bisecting again sometime next week, so to make an
>> even more reliable statement.
>>
>>

Just curious. Do you have this patch from 3.11 applied to your 3.10
kernel tree?

commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
Author: Peter Zijlstra <[email protected]>
Date: Fri Jun 28 11:18:53 2013 +0200

sched: Fix HRTICK

David reported that the HRTICK sched feature was borken; which was
enough
motivation for me to finally fix it ;-)

We should not allow hrtimer code to do softirq wakeups while holding
schedul
er
locks. The hrtimer code only needs this when we accidentally try to
program
an
expired time. We don't much care about those anyway since we have
the regula
r
tick to fall back to.

Reported-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link:
http://lkml.kernel.org/r/[email protected]
-ass.net
Signed-off-by: Ingo Molnar <[email protected]>

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2013-08-31 08:11:32

by Gerlando Falauto

[permalink] [raw]
Subject: Re: kernel deadlock

Hi Stephen,
>
> Just curious. Do you have this patch from 3.11 applied to your 3.10
> kernel tree?

Nope, I didn't. But I applied it, and it doesn't seem to make a
difference, unfortunately. :-(

Thanks for your help anyway!
Gerlando

>
> commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Jun 28 11:18:53 2013 +0200
>
> sched: Fix HRTICK
>
> David reported that the HRTICK sched feature was borken; which was
> enough
> motivation for me to finally fix it ;-)
>
> We should not allow hrtimer code to do softirq wakeups while holding
> schedul
> er
> locks. The hrtimer code only needs this when we accidentally try to
> program
> an
> expired time. We don't much care about those anyway since we have
> the regula
> r
> tick to fall back to.
>
> Reported-by: David Ahern <[email protected]>
> Tested-by: David Ahern <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Link:
> http://lkml.kernel.org/r/[email protected]
> -ass.net
> Signed-off-by: Ingo Molnar <[email protected]>
>

2013-09-03 14:57:58

by Gerlando Falauto

[permalink] [raw]
Subject: Re: kernel deadlock

Hi,

I tried again from scratch, so let me recap the whole situation, so we
can all view it from the same standpoint. This should make the problem
easier to see and reproduce.

I can confirm that running a stock 3.10 kernel with HRTICK enabled:

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..294e3ca 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -41,7 +41,7 @@ SCHED_FEAT(WAKEUP_PREEMPTION, true)
*/
SCHED_FEAT(ARCH_POWER, true)

-SCHED_FEAT(HRTICK, false)
+SCHED_FEAT(HRTICK, true)
SCHED_FEAT(DOUBLE_TICK, false)
SCHED_FEAT(LB_BIAS, true)

makes the following program (and the whole board, as a matter of fact)
hang with no further notice:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
int i;

for (i = 0 ; ; i++) {
struct timex adj = {};
printf("%d\r", i);
fflush(stdout);
adjtimex(&adj);
}
return 0;
}

If I then revert everything up to (and including) the offending commit
(mind the '~'):

$ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c
kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert

The problem disappears.

If I then cherry-pick again the offending commit:

$ git cherry-pick 06c017f; git log -1

commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
Author: John Stultz <[email protected]>
Date: Fri Mar 22 11:37:28 2013 -0700

timekeeping: Hold timekeepering locks in do_adjtimex and hardpps

In moving the NTP state to be protected by the timekeeping locks,
be sure to acquire the timekeeping locks prior to calling
ntp functions.

Cc: Thomas Gleixner <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Signed-off-by: John Stultz <[email protected]>

I get the following deadlock report:

================================ cut ===============================

=================================
[ INFO: inconsistent lock state ]
3.10.0-00018-gd915798 #3 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
a.out/574 [HC0[0]:SC0[0]:HE1:SE1] takes:
(timekeeper_lock){?.-...}, at: [<c0049d00>] do_adjtimex+0x94/0xf4
{IN-HARDIRQ-W} state was registered at:
[<c00548fc>] __lock_acquire+0xabc/0x1bb8
[<c0055ffc>] lock_acquire+0xa8/0x15c
[<c0432c1c>] _raw_spin_lock_irqsave+0x50/0x64
[<c0049090>] do_timer+0x2c/0xa54
[<c004dfa8>] tick_periodic+0x74/0x9c
[<c004dfe8>] tick_handle_periodic+0x18/0x7c
[<c0012da8>] orion_timer_interrupt+0x24/0x34
[<c00693b8>] handle_irq_event_percpu+0x5c/0x300
[<c0069698>] handle_irq_event+0x3c/0x5c
[<c006b920>] handle_level_irq+0x8c/0xe8
[<c0068d30>] generic_handle_irq+0x28/0x44
[<c000951c>] handle_IRQ+0x30/0x84
[<c04338b8>] __irq_svc+0x38/0xa0
[<c0617824>] calibrate_delay+0x350/0x4e4
[<c05e26e0>] start_kernel+0x23c/0x2c4
[<0000803c>] 0x803c
irq event stamp: 2840
hardirqs last enabled at (2839): [<c0008ccc>] no_work_pending+0x8/0x28
hardirqs last disabled at (2840): [<c0432bec>]
_raw_spin_lock_irqsave+0x20/0x64
softirqs last enabled at (2098): [<c039e908>] rpc_wake_up_first+0x6c/0x15c
softirqs last disabled at (2096): [<c0432e58>] _raw_spin_lock_bh+0x14/0x54

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(timekeeper_lock);
<Interrupt>
lock(timekeeper_lock);

*** DEADLOCK ***

1 lock held by a.out/574:
#0: (timekeeper_lock){?.-...}, at: [<c0049d00>] do_adjtimex+0x94/0xf4

stack backtrace:
CPU: 0 PID: 574 Comm: a.out Not tainted 3.10.0-00018-gd915798 #3
[<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>]
(show_stack+0x10/0x14)
[<c000b530>] (show_stack+0x10/0x14) from [<c042c274>]
(print_usage_bug.part.27+0x218/0x280)
[<c042c274>] (print_usage_bug.part.27+0x218/0x280) from [<c005281c>]
(mark_lock+0x538/0x6bc)
[<c005281c>] (mark_lock+0x538/0x6bc) from [<c0052a30>]
(mark_held_locks+0x90/0x124)
[<c0052a30>] (mark_held_locks+0x90/0x124) from [<c0052b6c>]
(trace_hardirqs_on_caller+0xa8/0x23c)
[<c0052b6c>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c0433390>]
(_raw_spin_unlock_irq+0x24/0x5c)
[<c0433390>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004a490>]
(__do_adjtimex+0xf0/0x580)
[<c004a490>] (__do_adjtimex+0xf0/0x580) from [<c0049d20>]
(do_adjtimex+0xb4/0xf4)
[<c0049d20>] (do_adjtimex+0xb4/0xf4) from [<c001cf38>]
(SyS_adjtimex+0x50/0xa8)
[<c001cf38>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>]
(ret_fast_syscall+0x0/0x44)

================================ cut ===============================

And as soon as I also cherry-pick (notice there is another commit in
between, which seems not to be relevant on this matter):

$ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show

commit a076b2146fabb0894cae5e0189a8ba3f1502d737
Author: John Stultz <[email protected]>
Date: Fri Mar 22 11:52:03 2013 -0700

ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state

In order to properly handle the NTP state in future changes to the
timekeeping lock management, this patch moves the management of
all of the ntp state under the timekeeping locks.

This allows us to remove the ntp_lock.

Cc: Thomas Gleixner <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Signed-off-by: John Stultz <[email protected]>


I end up in the situation where the system hangs completely and NO
deadlock report whatsoever is output.

So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces the
deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to hide
the report.

Notice how I tested the above on an ARM board; on PowerPC I get similar
results, although I am not able to see the deadlock report under any
circumstances (enabling CONFIG_PROVE_LOCKING, which is the flag that
triggers the deadlock report, causes the kernel to hang at startup even
on a vanilla 3.10 kernel).

John, could you please confirm whether you're at least able to reproduce
it somehow?

Thank you,
Gerlando

2013-09-03 17:26:24

by John Stultz

[permalink] [raw]
Subject: Re: kernel deadlock

On 09/03/2013 07:57 AM, Gerlando Falauto wrote:
> Hi,
>
> I tried again from scratch, so let me recap the whole situation, so we
> can all view it from the same standpoint. This should make the problem
> easier to see and reproduce.
>
> I can confirm that running a stock 3.10 kernel with HRTICK enabled:
[snip]
> makes the following program (and the whole board, as a matter of fact)
> hang with no further notice:
>
[snip]
> If I then revert everything up to (and including) the offending commit
> (mind the '~'):
>
> $ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c
> kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert
>
> The problem disappears.
>
> If I then cherry-pick again the offending commit:
>
> $ git cherry-pick 06c017f; git log -1
>
> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
> Author: John Stultz <[email protected]>
> Date: Fri Mar 22 11:37:28 2013 -0700
>
> timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>
[snip]
> And as soon as I also cherry-pick (notice there is another commit in
> between, which seems not to be relevant on this matter):
>
> $ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show
>
> commit a076b2146fabb0894cae5e0189a8ba3f1502d737
> Author: John Stultz <[email protected]>
> Date: Fri Mar 22 11:52:03 2013 -0700
>
> ntp: Remove ntp_lock, using the timekeeping locks to protect ntp
> state
>
[snip]
>
> I end up in the situation where the system hangs completely and NO
> deadlock report whatsoever is output.
>
> So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces
> the deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to
> hide the report.
>
> Notice how I tested the above on an ARM board; on PowerPC I get
> similar results, although I am not able to see the deadlock report
> under any circumstances (enabling CONFIG_PROVE_LOCKING, which is the
> flag that triggers the deadlock report, causes the kernel to hang at
> startup even on a vanilla 3.10 kernel).
>
> John, could you please confirm whether you're at least able to
> reproduce it somehow?

Thanks again for the detailed notes. I've so far been unable to
reproduce this, but I'm still looking at it.

Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
on the various hardware I have, tripping the lockdep warnings on various
other issues:

[ 4.224487] ======================================================
[ 4.230987] [ INFO: possible circular locking dependency detected ]
[ 4.237579] 3.9.0-rc2-00148-g06c017f-dirty #8 Not tainted
[ 4.243255] -------------------------------------------------------
[ 4.249847] kworker/0:1H/1329 is trying to acquire lock:
[ 4.255432] (&p->pi_lock){-.-.-.}, at: [<c006d184>]
try_to_wake_up+0x30/0x3c
[ 4.263092]
[ 4.263092] but task is already holding lock:
[ 4.269226] (&rq->lock){-.-.-.}, at: [<c0721f18>] __schedule+0xb8/0x7d0
[ 4.276306]
[ 4.276306] which lock already depends on the new lock.

and

[ 2.360591] =============================================
[ 2.363072] [ INFO: possible recursive locking detected ]
[ 2.364882] 3.9.0-rc2+ #9 Not tainted
[ 2.365486] ---------------------------------------------
[ 2.366345] blkid/999 is trying to acquire lock:
[ 2.367092] (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
try_to_wake_up+0x2c/0x300
[ 2.367275]
[ 2.367275] but task is already holding lock:
[ 2.367275] (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
try_to_wake_up+0x2c/0x300
[ 2.367275]
[ 2.367275] other info that might help us debug this:
[ 2.367275] Possible unsafe locking scenario:
[ 2.367275]

These warnings disable lockdep, so it may just be I'm not getting a
chance to see the warning you describe.


That said, the one system that doesn't throw those lockdep warnings,
doesn't seem to hit your lockdep warning either, and I've not been able
to trigger any hang using the test code you provided.

Just in case there's something else here at play, could you send me your
.config that you're using?

thanks
-john

2013-09-04 08:11:45

by Gerlando Falauto

[permalink] [raw]
Subject: Re: kernel deadlock

Hi John,

On 09/03/2013 07:26 PM, John Stultz wrote:
> On 09/03/2013 07:57 AM, Gerlando Falauto wrote:
>> Hi,
>>
>> I tried again from scratch, so let me recap the whole situation, so we
>> can all view it from the same standpoint. This should make the problem
>> easier to see and reproduce.
>>
>> I can confirm that running a stock 3.10 kernel with HRTICK enabled:
> [snip]
>> makes the following program (and the whole board, as a matter of fact)
>> hang with no further notice:
>>
> [snip]
>> If I then revert everything up to (and including) the offending commit
>> (mind the '~'):
>>
>> $ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c
>> kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert
>>
>> The problem disappears.
>>
>> If I then cherry-pick again the offending commit:
>>
>> $ git cherry-pick 06c017f; git log -1
>>
>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
>> Author: John Stultz <[email protected]>
>> Date: Fri Mar 22 11:37:28 2013 -0700
>>
>> timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>>
> [snip]
>> And as soon as I also cherry-pick (notice there is another commit in
>> between, which seems not to be relevant on this matter):
>>
>> $ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show
>>
>> commit a076b2146fabb0894cae5e0189a8ba3f1502d737
>> Author: John Stultz <[email protected]>
>> Date: Fri Mar 22 11:52:03 2013 -0700
>>
>> ntp: Remove ntp_lock, using the timekeeping locks to protect ntp
>> state
>>
> [snip]
>>
>> I end up in the situation where the system hangs completely and NO
>> deadlock report whatsoever is output.
>>
>> So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces
>> the deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to
>> hide the report.
>>
>> Notice how I tested the above on an ARM board; on PowerPC I get
>> similar results, although I am not able to see the deadlock report
>> under any circumstances (enabling CONFIG_PROVE_LOCKING, which is the
>> flag that triggers the deadlock report, causes the kernel to hang at
>> startup even on a vanilla 3.10 kernel).
>>
>> John, could you please confirm whether you're at least able to
>> reproduce it somehow?
>
> Thanks again for the detailed notes. I've so far been unable to
> reproduce this, but I'm still looking at it.
>
> Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
> on the various hardware I have, tripping the lockdep warnings on various
> other issues:
>
> [ 4.224487] ======================================================
> [ 4.230987] [ INFO: possible circular locking dependency detected ]
> [ 4.237579] 3.9.0-rc2-00148-g06c017f-dirty #8 Not tainted
> [ 4.243255] -------------------------------------------------------
> [ 4.249847] kworker/0:1H/1329 is trying to acquire lock:
> [ 4.255432] (&p->pi_lock){-.-.-.}, at: [<c006d184>]
> try_to_wake_up+0x30/0x3c
> [ 4.263092]
> [ 4.263092] but task is already holding lock:
> [ 4.269226] (&rq->lock){-.-.-.}, at: [<c0721f18>] __schedule+0xb8/0x7d0
> [ 4.276306]
> [ 4.276306] which lock already depends on the new lock.
>
> and
>
> [ 2.360591] =============================================
> [ 2.363072] [ INFO: possible recursive locking detected ]
> [ 2.364882] 3.9.0-rc2+ #9 Not tainted
> [ 2.365486] ---------------------------------------------
> [ 2.366345] blkid/999 is trying to acquire lock:
> [ 2.367092] (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
> try_to_wake_up+0x2c/0x300
> [ 2.367275]
> [ 2.367275] but task is already holding lock:
> [ 2.367275] (&p->pi_lock){-.-.-.}, at: [<ffffffff810c763c>]
> try_to_wake_up+0x2c/0x300
> [ 2.367275]
> [ 2.367275] other info that might help us debug this:
> [ 2.367275] Possible unsafe locking scenario:
> [ 2.367275]
>
> These warnings disable lockdep, so it may just be I'm not getting a
> chance to see the warning you describe.
>
>
> That said, the one system that doesn't throw those lockdep warnings,
> doesn't seem to hit your lockdep warning either, and I've not been able
> to trigger any hang using the test code you provided.
>

So by the sounds of it, it is more of an issue with HRTICK than with
your changes...

> Just in case there's something else here at play, could you send me your
> .config that you're using?
>

Yes, it's definitely possible that there's something else.
Even though my rate of failure (2 out of 2 boards tested) had me
thinking this problem would have been quite common instead...

Here is my km_kirkwood_defconfig (hope that's the right way to send it):

# CONFIG_ARM_PATCH_PHYS_VIRT is not set
CONFIG_PHYS_OFFSET=0x0
CONFIG_SYSVIPC=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_LOG_BUF_SHIFT=19
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_NAMESPACES=y
CONFIG_EMBEDDED=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_ARCH_KIRKWOOD=y
CONFIG_MACH_D2NET_V2=y
CONFIG_MACH_DOCKSTAR=y
CONFIG_MACH_ESATA_SHEEVAPLUG=y
CONFIG_MACH_GURUPLUG=y
CONFIG_MACH_INETSPACE_V2=y
CONFIG_MACH_MV88F6281GTW_GE=y
CONFIG_MACH_NET2BIG_V2=y
CONFIG_MACH_NET5BIG_V2=y
CONFIG_MACH_NETSPACE_MAX_V2=y
CONFIG_MACH_NETSPACE_V2=y
CONFIG_MACH_OPENRD_BASE=y
CONFIG_MACH_OPENRD_CLIENT=y
CONFIG_MACH_OPENRD_ULTIMATE=y
CONFIG_MACH_RD88F6192_NAS=y
CONFIG_MACH_RD88F6281=y
CONFIG_MACH_SHEEVAPLUG=y
CONFIG_MACH_T5325=y
CONFIG_MACH_TS219=y
CONFIG_MACH_TS41X=y
CONFIG_MACH_DLINK_KIRKWOOD_DT=y
CONFIG_MACH_DOCKSTAR_DT=y
CONFIG_MACH_DREAMPLUG_DT=y
CONFIG_MACH_GOFLEXNET_DT=y
CONFIG_MACH_IB62X0_DT=y
CONFIG_MACH_ICONNECT_DT=y
CONFIG_MACH_INETSPACE_V2_DT=y
CONFIG_MACH_IOMEGA_IX2_200_DT=y
CONFIG_MACH_KM_KIRKWOOD_DT=y
CONFIG_MACH_LSXL_DT=y
CONFIG_MACH_MPLCEC4_DT=y
CONFIG_MACH_NETSPACE_LITE_V2_DT=y
CONFIG_MACH_NETSPACE_MAX_V2_DT=y
CONFIG_MACH_NETSPACE_MINI_V2_DT=y
CONFIG_MACH_NETSPACE_V2_DT=y
CONFIG_MACH_OPENBLOCKS_A6_DT=y
CONFIG_MACH_TOPKICK_DT=y
CONFIG_MACH_TS219_DT=y
# CONFIG_CPU_FEROCEON_OLD_ID is not set
CONFIG_CACHE_FEROCEON_L2_WRITETHROUGH=y
CONFIG_PREEMPT=y
CONFIG_AEABI=y
# CONFIG_OABI_COMPAT is not set
CONFIG_ZBOOT_ROM_TEXT=0x0
CONFIG_ZBOOT_ROM_BSS=0x0
CONFIG_CPU_IDLE=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
# CONFIG_IPV6 is not set
CONFIG_NET_PKTGEN=m
CONFIG_CFG80211=y
CONFIG_MAC80211=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_JEDECPROBE=y
CONFIG_MTD_CFI_ADV_OPTIONS=y
CONFIG_MTD_CFI_GEOMETRY=y
# CONFIG_MTD_MAP_BANK_WIDTH_4 is not set
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_CFI_STAA=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_M25P80=y
CONFIG_MTD_NAND=y
CONFIG_MTD_NAND_ORION=y
CONFIG_BLK_DEV_LOOP=y
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_MV=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_DSA_MV88E6123_61_65=y
CONFIG_MV643XX_ETH=y
CONFIG_MARVELL_PHY=y
CONFIG_LIBERTAS=y
CONFIG_LIBERTAS_SDIO=y
CONFIG_INPUT_EVDEV=y
CONFIG_KEYBOARD_GPIO=y
# CONFIG_INPUT_MOUSE is not set
CONFIG_LEGACY_PTY_COUNT=16
# CONFIG_DEVKMEM is not set
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_RUNTIME_UARTS=2
CONFIG_SERIAL_OF_PLATFORM=y
# CONFIG_HW_RANDOM is not set
CONFIG_I2C=y
# CONFIG_I2C_COMPAT is not set
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MV64XXX=y
CONFIG_SPI=y
CONFIG_SPI_ORION=y
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_ORION_WATCHDOG=y
CONFIG_HID_DRAGONRISE=y
CONFIG_HID_GYRATION=y
CONFIG_HID_TWINHAN=y
CONFIG_HID_NTRIG=y
CONFIG_HID_PANTHERLORD=y
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
CONFIG_HID_SUNPLUS=y
CONFIG_HID_GREENASIA=y
CONFIG_HID_SMARTJOYPLUS=y
CONFIG_HID_TOPSEED=y
CONFIG_HID_THRUSTMASTER=y
CONFIG_HID_ZEROPLUS=y
CONFIG_USB=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_PRINTER=m
CONFIG_USB_STORAGE=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_MMC=y
CONFIG_SDIO_UART=y
CONFIG_MMC_MVSDIO=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_S35390A=y
CONFIG_RTC_DRV_MV=y
CONFIG_DMADEVICES=y
CONFIG_MV_XOR=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_TMPFS=y
CONFIG_JFFS2_FS=y
CONFIG_CRAMFS=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_2=y
CONFIG_NLS_UTF8=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_FS=y
# CONFIG_SCHED_DEBUG is not set
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_INFO=y
# CONFIG_FTRACE is not set
CONFIG_DEBUG_USER=y
CONFIG_DEBUG_LL=y
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_PCBC=m
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_DEV_MV_CESA=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_LIBCRC32C=y

What kind of systems are you using? ARM boards right?
I guess it would be nice for the two of us to work on the same board.
I could try reproducing it on a TK71 or on a Dreamplug, though that's
going to take quite some time.
Before I even try, I'd like to know whether you would have either of
those available.
As I said, we might be looking at an issue that's caused by something
completely unrelated (which is not even turned on by default, and has
several other issues too). So I'm not sure we should keep bothering.

Thanks,
Gerlando

2013-09-09 10:09:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: kernel deadlock

On Tue, Sep 03, 2013 at 10:26:19AM -0700, John Stultz wrote:
> Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
> on the various hardware I have, tripping the lockdep warnings on various
> other issues:

Does whatever kernel you guys are running have this commit:

---
commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
Author: Peter Zijlstra <[email protected]>
Date: Fri Jun 28 11:18:53 2013 +0200

sched: Fix HRTICK

David reported that the HRTICK sched feature was borken; which was enough
motivation for me to finally fix it ;-)

We should not allow hrtimer code to do softirq wakeups while holding scheduler
locks. The hrtimer code only needs this when we accidentally try to program an
expired time. We don't much care about those anyway since we have the regular
tick to fall back to.

Reported-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b1f2e5..0d8eb45 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -370,13 +370,6 @@ static struct rq *this_rq_lock(void)
#ifdef CONFIG_SCHED_HRTICK
/*
* Use HR-timers to deliver accurate preemption points.
- *
- * Its all a bit involved since we cannot program an hrt while holding the
- * rq->lock. So what we do is store a state in in rq->hrtick_* and ask for a
- * reschedule event.
- *
- * When we get rescheduled we reprogram the hrtick_timer outside of the
- * rq->lock.
*/

static void hrtick_clear(struct rq *rq)
@@ -404,6 +397,15 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
}

#ifdef CONFIG_SMP
+
+static int __hrtick_restart(struct rq *rq)
+{
+ struct hrtimer *timer = &rq->hrtick_timer;
+ ktime_t time = hrtimer_get_softexpires(timer);
+
+ return __hrtimer_start_range_ns(timer, time, 0, HRTIMER_MODE_ABS_PINNED, 0);
+}
+
/*
* called from hardirq (IPI) context
*/
@@ -412,7 +414,7 @@ static void __hrtick_start(void *arg)
struct rq *rq = arg;

raw_spin_lock(&rq->lock);
- hrtimer_restart(&rq->hrtick_timer);
+ __hrtick_restart(rq);
rq->hrtick_csd_pending = 0;
raw_spin_unlock(&rq->lock);
}
@@ -430,7 +432,7 @@ void hrtick_start(struct rq *rq, u64 delay)
hrtimer_set_expires(timer, time);

if (rq == this_rq()) {
- hrtimer_restart(timer);
+ __hrtick_restart(rq);
} else if (!rq->hrtick_csd_pending) {
__smp_call_function_single(cpu_of(rq), &rq->hrtick_csd, 0);
rq->hrtick_csd_pending = 1;

2013-09-09 20:29:54

by John Stultz

[permalink] [raw]
Subject: Re: kernel deadlock

On 09/04/2013 01:11 AM, Gerlando Falauto wrote:
> Hi John,
>
> On 09/03/2013 07:26 PM, John Stultz wrote:
>> On 09/03/2013 07:57 AM, Gerlando Falauto wrote:
>>> Hi,
>>>
>>> I tried again from scratch, so let me recap the whole situation, so we
>>> can all view it from the same standpoint. This should make the problem
>>> easier to see and reproduce.
>>>
>>> I can confirm that running a stock 3.10 kernel with HRTICK enabled:
>> [snip]
>>> makes the following program (and the whole board, as a matter of fact)
>>> hang with no further notice:
>>>
>> [snip]
>>> If I then revert everything up to (and including) the offending commit
>>> (mind the '~'):
>>>
>>> $ git log --oneline ...06c017f~ -- kernel/time/timekeeping.c
>>> kernel/time/ntp.c | cut -f1 -d' ' | xargs git revert
>>>
>>> The problem disappears.
>>>
>>> If I then cherry-pick again the offending commit:
>>>
>>> $ git cherry-pick 06c017f; git log -1
>>>
>>> commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
>>> Author: John Stultz <[email protected]>
>>> Date: Fri Mar 22 11:37:28 2013 -0700
>>>
>>> timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
>>>
>> [snip]
>>> And as soon as I also cherry-pick (notice there is another commit in
>>> between, which seems not to be relevant on this matter):
>>>
>>> $ git cherry-pick a076b2146fabb0894cae5e0189a8ba3f1502d737; git show
>>>
>>> commit a076b2146fabb0894cae5e0189a8ba3f1502d737
>>> Author: John Stultz <[email protected]>
>>> Date: Fri Mar 22 11:52:03 2013 -0700
>>>
>>> ntp: Remove ntp_lock, using the timekeeping locks to protect ntp
>>> state
>>>
>> [snip]
>>>
>>> I end up in the situation where the system hangs completely and NO
>>> deadlock report whatsoever is output.
>>>
>>> So it looks like 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 introduces
>>> the deadlock, while a076b2146fabb0894cae5e0189a8ba3f1502d737 cares to
>>> hide the report.
>>>
>>> Notice how I tested the above on an ARM board; on PowerPC I get
>>> similar results, although I am not able to see the deadlock report
>>> under any circumstances (enabling CONFIG_PROVE_LOCKING, which is the
>>> flag that triggers the deadlock report, causes the kernel to hang at
>>> startup even on a vanilla 3.10 kernel).
>>>
>>> John, could you please confirm whether you're at least able to
>>> reproduce it somehow?
>>
>> Thanks again for the detailed notes. I've so far been unable to
>> reproduce this, but I'm still looking at it.

So I think I've managed to finally reproduce this and hunt it down.

With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
function, so I checked out info cpus and that pointed to both cpus being
in ktime_get() and ktime_get_update_offsets(), which suggested a
seqcount deadlock (basically calling something that reads the seqlock
while we hold the write on it).

Unfortunately the seqlock/seqcount infrastructure doesn't support
lockdep, so I added some debug code to take and release the
timekeeper_lock in every function that does a read on the timekeeper_seq.

This popped up with:
[ 15.846386] ======================================================
[ 15.847174] [ INFO: possible circular locking dependency detected ]
[ 15.847959] 3.10.0+ #30 Not tainted
[ 15.848402] -------------------------------------------------------
[ 15.849182] ntpd/4062 is trying to acquire lock:
[ 15.849765] (&(&pool->lock)->rlock){..-...}, at:
[<ffffffff810aa9b5>] __queue_work+0x145/0x480
[ 15.850051]
[ 15.850051] but task is already holding lock:
[ 15.850051] (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>]
do_adjtimex+0x7f/0x100
[ 15.850051]
[ 15.850051] which lock already depends on the new lock.
[ 15.850051]
[ 15.850051]
[ 15.850051] the existing dependency chain (in reverse order) is:
[ 15.850051]
-> #2 (timekeeper_lock){-.-.-.}:
[ 15.850051] [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[ 15.850051] [<ffffffff81900983>] _raw_spin_lock_irqsave+0x53/0x90
[ 15.850051] [<ffffffff810dde62>] ktime_get+0x32/0x110
[ 15.850051] [<ffffffff810c0cd0>] hrtick_start+0x20/0x90
[ 15.850051] [<ffffffff810c8fb3>] hrtick_start_fair+0x83/0xd0
[ 15.850051] [<ffffffff810c94ce>] hrtick_update+0x5e/0x60
[ 15.850051] [<ffffffff810cc5a1>] enqueue_task_fair+0x961/0xfd0
[ 15.850051] [<ffffffff810c0c11>] enqueue_task+0x61/0x70
[ 15.850051] [<ffffffff810c101e>] activate_task+0x1e/0x20
[ 15.850051] [<ffffffff818fedf9>] __schedule+0x799/0xa10
[ 15.850051] [<ffffffff818ff094>] schedule+0x24/0x70
[ 15.850051] [<ffffffff818fc84c>] schedule_timeout+0x16c/0x2c0
[ 15.850051] [<ffffffff818fe60d>] io_schedule_timeout+0x9d/0xf0
[ 15.850051] [<ffffffff818ff74f>] wait_for_completion_io+0xaf/0x120
[ 15.850051] [<ffffffff813468ec>] blk_execute_rq+0x8c/0xe0
[ 15.850051] [<ffffffff8157b23f>] scsi_execute+0xdf/0x170
[ 15.850051] [<ffffffff8157b365>] scsi_execute_req_flags+0x95/0x110
[ 15.850051] [<ffffffff8158c3b4>] sr_check_events+0xa4/0x2e0
[ 15.850051] [<ffffffff815e0393>] cdrom_check_events+0x13/0x30
[ 15.850051] [<ffffffff8158bde4>] sr_block_check_events+0x34/0x50
[ 15.850051] [<ffffffff81349dd7>] disk_check_events+0x57/0x180
[ 15.850051] [<ffffffff81349f17>] disk_events_workfn+0x17/0x20
[ 15.850051] [<ffffffff810ab777>] process_one_work+0x1e7/0x660
[ 15.850051] [<ffffffff810ac069>] worker_thread+0x119/0x370
[ 15.850051] [<ffffffff810b4376>] kthread+0xd6/0xe0
[ 15.850051] [<ffffffff81901fac>] ret_from_fork+0x7c/0xb0
[ 15.850051]
-> #1 (&p->pi_lock){-.-.-.}:
[ 15.850051] [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[ 15.850051] [<ffffffff81900983>] _raw_spin_lock_irqsave+0x53/0x90
[ 15.850051] [<ffffffff810c436c>] try_to_wake_up+0x2c/0x300
[ 15.850051] [<ffffffff810c467e>] wake_up_process+0x1e/0x40
[ 15.850051] [<ffffffff810a89e2>] start_worker+0x22/0x30
[ 15.850051] [<ffffffff810a9d98>] create_and_start_worker+0x48/0x90
[ 15.850051] [<ffffffff820d89a8>] init_workqueues+0x1b8/0x380
[ 15.850051] [<ffffffff820c0e94>] do_one_initcall+0x7b/0x114
[ 15.850051] [<ffffffff820c0fc6>] kernel_init_freeable+0x99/0x1c3
[ 15.850051] [<ffffffff818e2b59>] kernel_init+0x9/0xf0
[ 15.850051] [<ffffffff81901fac>] ret_from_fork+0x7c/0xb0
[ 15.850051]
-> #0 (&(&pool->lock)->rlock){..-...}:
[ 15.850051] [<ffffffff810ed20e>] __lock_acquire+0x1dce/0x1ee0
[ 15.850051] [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[ 15.850051] [<ffffffff819008f1>] _raw_spin_lock+0x41/0x80
[ 15.850051] [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[ 15.850051] [<ffffffff810aadba>] __queue_delayed_work+0xaa/0x1a0
[ 15.850051] [<ffffffff810ac4de>] queue_delayed_work_on+0x8e/0xa0
[ 15.850051] [<ffffffff810dfe68>] __do_adjtimex+0x158/0x560
[ 15.850051] [<ffffffff810df703>] do_adjtimex+0xa3/0x100
[ 15.850051] [<ffffffff81090f7b>] SYSC_adjtimex+0x3b/0x80
[ 15.850051] [<ffffffff810912b9>] SyS_adjtimex+0x9/0x10
[ 15.850051] [<ffffffff81902264>] tracesys+0xdd/0xe2
[ 15.850051]
[ 15.850051] other info that might help us debug this:
[ 15.850051]
[ 15.850051] Chain exists of:
&(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[ 15.850051] Possible unsafe locking scenario:
[ 15.850051]
[ 15.850051] CPU0 CPU1
[ 15.850051] ---- ----
[ 15.850051] lock(timekeeper_lock);
[ 15.850051] lock(&p->pi_lock);
[ 15.850051] lock(timekeeper_lock);
[ 15.850051] lock(&(&pool->lock)->rlock);
[ 15.850051]
[ 15.850051] *** DEADLOCK ***
[ 15.850051]
[ 15.850051] 1 lock held by ntpd/4062:
[ 15.850051] #0: (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>]
do_adjtimex+0x7f/0x100
[ 15.850051]
[ 15.850051] stack backtrace:
[ 15.850051] CPU: 0 PID: 4062 Comm: ntpd Not tainted 3.10.0+ #30
[ 15.850051] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 15.850051] ffffffff82354260 ffff880000239b78 ffffffff818f6cfe
ffff880000239bb8
[ 15.850051] ffffffff818f2f51 ffff880000239c40 ffff88000002ea78
ffff88000002ea40
[ 15.850051] ffff88000002e310 0000000000000000 0000000000000001
ffff880000239ca8
[ 15.850051] Call Trace:
[ 15.850051] [<ffffffff818f6cfe>] dump_stack+0x19/0x1b
[ 15.850051] [<ffffffff818f2f51>] print_circular_bug+0x1f9/0x208
[ 15.850051] [<ffffffff810ed20e>] __lock_acquire+0x1dce/0x1ee0
[ 15.850051] [<ffffffff810e9c2d>] ? trace_hardirqs_off+0xd/0x10
[ 15.850051] [<ffffffff810c6fad>] ? sched_clock_local+0x1d/0x90
[ 15.850051] [<ffffffff810eee7e>] lock_acquire+0x9e/0x1e0
[ 15.850051] [<ffffffff810aa9b5>] ? __queue_work+0x145/0x480
[ 15.850051] [<ffffffff819008f1>] _raw_spin_lock+0x41/0x80
[ 15.850051] [<ffffffff810aa9b5>] ? __queue_work+0x145/0x480
[ 15.850051] [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[ 15.850051] [<ffffffff810aadba>] __queue_delayed_work+0xaa/0x1a0
[ 15.850051] [<ffffffff810ac4de>] queue_delayed_work_on+0x8e/0xa0
[ 15.850051] [<ffffffff810dfe68>] __do_adjtimex+0x158/0x560
[ 15.850051] [<ffffffff810df703>] do_adjtimex+0xa3/0x100
[ 15.850051] [<ffffffff81090f7b>] SYSC_adjtimex+0x3b/0x80
[ 15.850051] [<ffffffff810912b9>] SyS_adjtimex+0x9/0x10
[ 15.850051] [<ffffffff81902264>] tracesys+0xdd/0xe2



So yea, thanks to the lockdep code we can see there's a three-lock deadlock.

And as you discovered, it connected to adjtimex and the timekeeping
locking changes. Specifically that we're now calling
schedule_delayed_work() under a timekeeper_seq write, and that somehow
ends up calling back to a timekeeping accessor function.

Now, I'm still in the dark as to why HRTICK exposes this, but seems like
the following patch should resolve the issue (and quiets the lockdep
warnings in my testing). Let me know how it works for you!

Ingo: This makes me think we really should have some lockdep smarts
added to seqlock/seqcount structures. Is there something already
discovered thats preventing this, or has this just not yet been tried?

thanks
-john

>From 09885df9734aa678ccb61dbef356fea4faeff16c Mon Sep 17 00:00:00 2001
From: John Stultz <[email protected]>
Date: Mon, 9 Sep 2013 13:07:18 -0700
Subject: [PATCH] timekeepeing: Fix HRTICK related deadlock from ntp lock
changes

It was reported that when HRTICK is enabled, its possible to
trigger system deadlocks. These were hard to reproduce, as
HRTICK has been broken in the past, but seemed to be connected
to the timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added
some extra spinlock based locking and triggered the following
lockdep output:

[ 15.849182] ntpd/4062 is trying to acquire lock:
[ 15.849765] (&(&pool->lock)->rlock){..-...}, at: [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[ 15.850051]
[ 15.850051] but task is already holding lock:
[ 15.850051] (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>] do_adjtimex+0x7f/0x100

<snip>

[ 15.850051] Chain exists of:
&(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[ 15.850051] Possible unsafe locking scenario:
[ 15.850051]
[ 15.850051] CPU0 CPU1
[ 15.850051] ---- ----
[ 15.850051] lock(timekeeper_lock);
[ 15.850051] lock(&p->pi_lock);
[ 15.850051] lock(timekeeper_lock);
[ 15.850051] lock(&(&pool->lock)->rlock);
[ 15.850051]
[ 15.850051] *** DEADLOCK ***

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock
critical section.

Reported-by: Gerlando Falauto <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
include/linux/timex.h | 1 +
kernel/time/ntp.c | 6 ++----
kernel/time/timekeeping.c | 2 ++
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index b3726e6..dd3edd7 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *);
extern void hardpps(const struct timespec *, const struct timespec *);

int read_current_timer(unsigned long *timer_val);
+void ntp_notify_cmos_timer(void);

/* The clock frequency of the i8253/i8254 PIT */
#define PIT_TICK_RATE 1193182ul
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8f5b3b9..bb22151 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work)
schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
}

-static void notify_cmos_timer(void)
+void ntp_notify_cmos_timer(void)
{
schedule_delayed_work(&sync_cmos_work, 0);
}

#else
-static inline void notify_cmos_timer(void) { }
+void ntp_notify_cmos_timer(void) { }
#endif


@@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, s32 *time_tai)
if (!(time_status & STA_NANO))
txc->time.tv_usec /= NSEC_PER_USEC;

- notify_cmos_timer();
-
return result;
}

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index baeeb5c..fcc261c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1682,6 +1682,8 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(&timekeeper_seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);

+ ntp_notify_cmos_timer();
+
return ret;
}

--
1.8.1.2

2013-09-10 07:29:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: kernel deadlock


* John Stultz <[email protected]> wrote:

> Now, I'm still in the dark as to why HRTICK exposes this, but seems like
> the following patch should resolve the issue (and quiets the lockdep
> warnings in my testing). Let me know how it works for you!
>
> Ingo: This makes me think we really should have some lockdep smarts
> added to seqlock/seqcount structures. Is there something already
> discovered thats preventing this, or has this just not yet been tried?

There's no fundamental reason to not do it, it's a very good idea!

> From 09885df9734aa678ccb61dbef356fea4faeff16c Mon Sep 17 00:00:00 2001
> From: John Stultz <[email protected]>
> Date: Mon, 9 Sep 2013 13:07:18 -0700
> Subject: [PATCH] timekeepeing: Fix HRTICK related deadlock from ntp lock changes

Cool fix! I think I've seen this deadlock in various incarnations but was
never able to track it down. Do you have a guesstimate about roughly when
it was introduced? Is it an ancient bug?

Thanks,

Ingo

2013-09-10 07:56:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: kernel deadlock

On Mon, Sep 09, 2013 at 01:29:47PM -0700, John Stultz wrote:
> Ingo: This makes me think we really should have some lockdep smarts
> added to seqlock/seqcount structures. Is there something already
> discovered thats preventing this, or has this just not yet been tried?

The only reason this hasn't been done in the past is because of the VDSO
I believe. We use a seqcount_t from userspace.

We either have to open-code the VDSO version or create a variant for
that specific single use case.

2013-09-10 08:59:12

by Ming Lin

[permalink] [raw]
Subject: Re: kernel deadlock

On Tue, Sep 10, 2013 at 4:29 AM, John Stultz <[email protected]> wrote:

[snip]

>
> So I think I've managed to finally reproduce this and hunt it down.
>
> With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
> could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
> function, so I checked out info cpus and that pointed to both cpus being

Hi,

Is "info cpus" a command of kvm/qemu? That's very helpful.
I can reproduce this bug, but there is no any output.
How did you find out that both cpus being in ktime_get() and
ktime_get_update_offsets().

> in ktime_get() and ktime_get_update_offsets(), which suggested a
> seqcount deadlock (basically calling something that reads the seqlock
> while we hold the write on it).

HRTICK enabled, then I can reproduce this simply with,

while [ 1 ] ;
adjtimex -t 9999
done

And your patch fixed it.

Thanks,
Lin Ming

>
> Unfortunately the seqlock/seqcount infrastructure doesn't support
> lockdep, so I added some debug code to take and release the
> timekeeper_lock in every function that does a read on the timekeeper_seq.

2013-09-10 16:23:57

by John Stultz

[permalink] [raw]
Subject: Re: kernel deadlock

On 09/10/2013 12:29 AM, Ingo Molnar wrote:
> * John Stultz <[email protected]> wrote:
>
>> Now, I'm still in the dark as to why HRTICK exposes this, but seems like
>> the following patch should resolve the issue (and quiets the lockdep
>> warnings in my testing). Let me know how it works for you!
>>
>> Ingo: This makes me think we really should have some lockdep smarts
>> added to seqlock/seqcount structures. Is there something already
>> discovered thats preventing this, or has this just not yet been tried?
> There's no fundamental reason to not do it, it's a very good idea!
>
>> From 09885df9734aa678ccb61dbef356fea4faeff16c Mon Sep 17 00:00:00 2001
>> From: John Stultz <[email protected]>
>> Date: Mon, 9 Sep 2013 13:07:18 -0700
>> Subject: [PATCH] timekeepeing: Fix HRTICK related deadlock from ntp lock changes
> Cool fix! I think I've seen this deadlock in various incarnations but was
> never able to track it down. Do you have a guesstimate about roughly when
> it was introduced? Is it an ancient bug?

3.10, I believe. So no not ancient, but we've seen and resolved similar
issues in the past (the old printk while holding xtime_lock deadlock).

Hopefully the lockdep enablement helps sort these out for good.

thanks
-john

2013-09-10 16:38:49

by John Stultz

[permalink] [raw]
Subject: Re: kernel deadlock

On 09/10/2013 01:59 AM, Lin Ming wrote:
> On Tue, Sep 10, 2013 at 4:29 AM, John Stultz <[email protected]> wrote:
>
> [snip]
>
>> So I think I've managed to finally reproduce this and hunt it down.
>>
>> With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
>> could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
>> function, so I checked out info cpus and that pointed to both cpus being
> Hi,
>
> Is "info cpus" a command of kvm/qemu? That's very helpful.
Yes. If you switch into the qemu monitor, you can use "info cpus" to
show the current instruction pointers

> I can reproduce this bug, but there is no any output.
> How did you find out that both cpus being in ktime_get() and
> ktime_get_update_offsets().

Once I had the instruction pointers, I ran gdb on the vmlinux and used
"list *<address>" to show where it was stuck.

>> in ktime_get() and ktime_get_update_offsets(), which suggested a
>> seqcount deadlock (basically calling something that reads the seqlock
>> while we hold the write on it).
> HRTICK enabled, then I can reproduce this simply with,
>
> while [ 1 ] ;
> adjtimex -t 9999
> done
>
> And your patch fixed it.

Great! Can I add a Tested-by: from you on the patch?

thanks
-john

2013-09-11 00:37:57

by Ming Lin

[permalink] [raw]
Subject: Re: kernel deadlock

On Wed, Sep 11, 2013 at 12:38 AM, John Stultz <[email protected]> wrote:
> On 09/10/2013 01:59 AM, Lin Ming wrote:
>> On Tue, Sep 10, 2013 at 4:29 AM, John Stultz <[email protected]> wrote:
>>
>> [snip]
>>
>>> So I think I've managed to finally reproduce this and hunt it down.
>>>
>>> With Peter's "sched: Fix HRTICK" patch and HRTICK enabled, I found I
>>> could trigger a hard hang at boot on my x86_64 kvm system. sysrq didn't
>>> function, so I checked out info cpus and that pointed to both cpus being
>> Hi,
>>
>> Is "info cpus" a command of kvm/qemu? That's very helpful.
> Yes. If you switch into the qemu monitor, you can use "info cpus" to
> show the current instruction pointers
>
>> I can reproduce this bug, but there is no any output.
>> How did you find out that both cpus being in ktime_get() and
>> ktime_get_update_offsets().
>
> Once I had the instruction pointers, I ran gdb on the vmlinux and used
> "list *<address>" to show where it was stuck.
>
>>> in ktime_get() and ktime_get_update_offsets(), which suggested a
>>> seqcount deadlock (basically calling something that reads the seqlock
>>> while we hold the write on it).
>> HRTICK enabled, then I can reproduce this simply with,
>>
>> while [ 1 ] ;
>> adjtimex -t 9999
>> done
>>
>> And your patch fixed it.
>
> Great! Can I add a Tested-by: from you on the patch?

Sure.
Tested-by: Lin Ming <[email protected]>

Thanks.

2013-09-12 14:42:32

by Gerlando Falauto

[permalink] [raw]
Subject: Re: kernel deadlock

Hi Peter,

sorry for the slow response...

On 09/09/2013 12:08 PM, Peter Zijlstra wrote:
> On Tue, Sep 03, 2013 at 10:26:19AM -0700, John Stultz wrote:
>> Enabling the SCHED_FEAT(HRTICK, true) bit tends to cause lots of issues
>> on the various hardware I have, tripping the lockdep warnings on various
>> other issues:
>
> Does whatever kernel you guys are running have this commit:

That didn't seem to make a difference (it had already been pointed out
by Stephen Boyd).
In any case, latest patch from John fixes the problem... Hooray! :-)

Thanks again John!
Gerlando


>
> ---
> commit 971ee28cbd1ccd87b3164facd9359a534c1d2892
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Jun 28 11:18:53 2013 +0200
>
> sched: Fix HRTICK
>
> David reported that the HRTICK sched feature was borken; which was enough
> motivation for me to finally fix it ;-)
>
> We should not allow hrtimer code to do softirq wakeups while holding scheduler
> locks. The hrtimer code only needs this when we accidentally try to program an
> expired time. We don't much care about those anyway since we have the regular
> tick to fall back to.
>
> Reported-by: David Ahern <[email protected]>
> Tested-by: David Ahern <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9b1f2e5..0d8eb45 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -370,13 +370,6 @@ static struct rq *this_rq_lock(void)
> #ifdef CONFIG_SCHED_HRTICK
> /*
> * Use HR-timers to deliver accurate preemption points.
> - *
> - * Its all a bit involved since we cannot program an hrt while holding the
> - * rq->lock. So what we do is store a state in in rq->hrtick_* and ask for a
> - * reschedule event.
> - *
> - * When we get rescheduled we reprogram the hrtick_timer outside of the
> - * rq->lock.
> */
>
> static void hrtick_clear(struct rq *rq)
> @@ -404,6 +397,15 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
> }
>
> #ifdef CONFIG_SMP
> +
> +static int __hrtick_restart(struct rq *rq)
> +{
> + struct hrtimer *timer = &rq->hrtick_timer;
> + ktime_t time = hrtimer_get_softexpires(timer);
> +
> + return __hrtimer_start_range_ns(timer, time, 0, HRTIMER_MODE_ABS_PINNED, 0);
> +}
> +
> /*
> * called from hardirq (IPI) context
> */
> @@ -412,7 +414,7 @@ static void __hrtick_start(void *arg)
> struct rq *rq = arg;
>
> raw_spin_lock(&rq->lock);
> - hrtimer_restart(&rq->hrtick_timer);
> + __hrtick_restart(rq);
> rq->hrtick_csd_pending = 0;
> raw_spin_unlock(&rq->lock);
> }
> @@ -430,7 +432,7 @@ void hrtick_start(struct rq *rq, u64 delay)
> hrtimer_set_expires(timer, time);
>
> if (rq == this_rq()) {
> - hrtimer_restart(timer);
> + __hrtick_restart(rq);
> } else if (!rq->hrtick_csd_pending) {
> __smp_call_function_single(cpu_of(rq), &rq->hrtick_csd, 0);
> rq->hrtick_csd_pending = 1;
>