2014-06-03 10:27:13

by Will Deacon

Subject: Re: [PATCH v6 2/2] arm64: enable context tracking

Hi guys,

On Fri, May 30, 2014 at 08:08:38PM +0100, Kevin Hilman wrote:
> Will Deacon <[email protected]> writes:
> > I'd like to give these some stress testing before it gets merged, so I'm
> > not sure if it'll make it for 3.16 given where we are at the moment.
>
> FWIW, this feature is disabled by default. I use the following kconfig
> fragment to enable the various parts I use for testing:
>
> CONFIG_NO_HZ=y
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ_FULL_ALL=y
> CONFIG_NO_HZ_FULL_SYSIDLE=y
>
> # default to power-efficient workqueues (which are then set to unbound)
> CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
>
> # lockup detector sets a 4s timer on every CPU, which wakes CPUs
> # from idle. (alternately, can be controlled via procfs,
> # e.g: echo 0 > /proc/sys/kernel/watchdog)
> #CONFIG_LOCKUP_DETECTOR=n

I had a go with this, but I couldn't seem to trigger any context tracking
without forcing CONFIG_CONTEXT_TRACKING_FORCE=y. Does that mean we're
missing something else?

Anyway, with that forced on, I see the following during boot:


------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:418 rcu_eqs_enter+0x84/0xa4()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc8+ #5
Call trace:
[<ffffffc000088048>] dump_backtrace+0x0/0x130
[<ffffffc000088188>] show_stack+0x10/0x1c
[<ffffffc0004891a0>] dump_stack+0x74/0xbc
[<ffffffc0000a45e0>] warn_slowpath_common+0x8c/0xb4
[<ffffffc0000a46cc>] warn_slowpath_null+0x14/0x20
[<ffffffc0000efc14>] rcu_eqs_enter+0x80/0xa4
[<ffffffc0000efc58>] rcu_idle_enter+0x20/0x50
[<ffffffc0000dd314>] cpu_startup_entry+0x118/0x184
[<ffffffc0004865ec>] rest_init+0x7c/0x88
[<ffffffc000609800>] start_kernel+0x368/0x37c
---[ end trace c17313e162496e65 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:541 rcu_eqs_exit+0xb0/0xbc()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.15.0-rc8+ #5
Call trace:
[<ffffffc000088048>] dump_backtrace+0x0/0x130
[<ffffffc000088188>] show_stack+0x10/0x1c
[<ffffffc0004891a0>] dump_stack+0x74/0xbc
[<ffffffc0000a45e0>] warn_slowpath_common+0x8c/0xb4
[<ffffffc0000a46cc>] warn_slowpath_null+0x14/0x20
[<ffffffc0000ed384>] rcu_eqs_exit+0xac/0xbc
[<ffffffc0000efdac>] rcu_user_exit+0xc/0x18
[<ffffffc00011d48c>] context_tracking_user_exit+0xc4/0xd4
[<ffffffc000083d98>] el1_irq+0x58/0xd4
[<ffffffc0000dd318>] cpu_startup_entry+0x11c/0x184
[<ffffffc0004865ec>] rest_init+0x7c/0x88
[<ffffffc000609800>] start_kernel+0x368/0x37c
---[ end trace c17313e162496e66 ]---


Can you take a look please? I had to fix up some conflicts to apply your
patches against our for-next branch, so I've put a branch here for you to
look at:

git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git aarch64/context-tracking

Cheers,

Will


2014-06-03 17:34:40

by Kevin Hilman

Subject: Re: [PATCH v6 2/2] arm64: enable context tracking

Will Deacon <[email protected]> writes:

> Hi guys,
>
> On Fri, May 30, 2014 at 08:08:38PM +0100, Kevin Hilman wrote:
>> Will Deacon <[email protected]> writes:
>> > I'd like to give these some stress testing before it gets merged, so I'm
>> > not sure if it'll make it for 3.16 given where we are at the moment.
>>
>> FWIW, this feature is disabled by default. I use the following kconfig
>> fragment to enable the various parts I use for testing:
>>
>> CONFIG_NO_HZ=y
>> CONFIG_NO_HZ_FULL=y
>> CONFIG_NO_HZ_FULL_ALL=y
>> CONFIG_NO_HZ_FULL_SYSIDLE=y
>>
>> # default to power-efficient workqueues (which are then set to unbound)
>> CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
>>
>> # lockup detector sets a 4s timer on every CPU, which wakes CPUs
>> # from idle. (alternately, can be controlled via procfs,
>> # e.g: echo 0 > /proc/sys/kernel/watchdog)
>> #CONFIG_LOCKUP_DETECTOR=n
>
> I had a go with this, but I couldn't seem to trigger any context tracking
> without forcing CONFIG_CONTEXT_TRACKING_FORCE=y. Does that mean we're
> missing something else?

No, it just means that you never hit the conditions to trigger full
NOHZ. Using _FORCE is a good way to do that since it forces the context
tracking paths on whether or not full NOHZ actually needs them.

> Anyway, with that forced on, I see the following during boot:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:418 rcu_eqs_enter+0x84/0xa4()
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc8+ #5
> Call trace:
> [<ffffffc000088048>] dump_backtrace+0x0/0x130
> [<ffffffc000088188>] show_stack+0x10/0x1c
> [<ffffffc0004891a0>] dump_stack+0x74/0xbc
> [<ffffffc0000a45e0>] warn_slowpath_common+0x8c/0xb4
> [<ffffffc0000a46cc>] warn_slowpath_null+0x14/0x20
> [<ffffffc0000efc14>] rcu_eqs_enter+0x80/0xa4
> [<ffffffc0000efc58>] rcu_idle_enter+0x20/0x50
> [<ffffffc0000dd314>] cpu_startup_entry+0x118/0x184
> [<ffffffc0004865ec>] rest_init+0x7c/0x88
> [<ffffffc000609800>] start_kernel+0x368/0x37c
> ---[ end trace c17313e162496e65 ]---

So this suggests that we've told RCU that we've entered userspace twice,
without having left (the context tracker is an extension of the RCU
extended quiescent state machinery).

After some IRC discussion with Will, I was able to reproduce this using a
full Ubuntu rootfs and CONFIG_CONTEXT_TRACKING_FORCE=y, and I think I
found the bug.

Basically, the problem is that we have a ct_user_exit in el1_irq
(interrupt in kernel space) when it should be in el0_irq (interrupt in
user space.)

With the ct_user_exit moved into el0_irq, I can no longer reproduce the
problem.
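
To illustrate why the placement matters, here is a toy model of balanced
enter/exit accounting. It is only a sketch: ToyCpu, run() and the single
in_eqs flag are made up for illustration, and the real RCU nesting
counters and interrupt handling (rcu_irq_enter() and friends) are ignored.
It assumes nothing more than "an exit without a matching enter is flagged".

```python
class ToyCpu:
    """Hypothetical stand-in for one CPU's extended quiescent state (EQS)."""

    def __init__(self):
        self.in_eqs = False
        self.warnings = []

    def eqs_enter(self, who):
        # Entering twice without leaving is unbalanced.
        if self.in_eqs:
            self.warnings.append(f"{who}: already in EQS")
        self.in_eqs = True

    def eqs_exit(self, who):
        # Leaving without having entered is unbalanced.
        if not self.in_eqs:
            self.warnings.append(f"{who}: not in EQS")
        self.in_eqs = False


def run(hook_in_el1_irq):
    """Replay an idle period and a user-space period, return the warnings."""
    cpu = ToyCpu()

    # Idle loop: enter EQS, take two interrupts in kernel mode, leave.
    cpu.eqs_enter("idle_enter")
    for _ in range(2):
        if hook_in_el1_irq:
            # Misplaced hook: a "user exit" on the kernel-mode IRQ path.
            cpu.eqs_exit("user_exit from el1_irq")
    cpu.eqs_exit("idle_exit")

    # User task: enter user mode, then take an interrupt from user space.
    cpu.eqs_enter("user_enter")
    cpu.eqs_exit("user_exit from el0_irq")
    return cpu.warnings


if __name__ == "__main__":
    print("hook in el1_irq:", run(True))
    print("hook in el0_irq only:", run(False))
```

In this model the misplaced hook leaves the accounting unbalanced during
the idle period, while the el0_irq-only placement stays balanced.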

Larry, could you sanity check that and respin a v8 with that change if
it works for you?

Thanks,

Kevin