2014-04-24 07:46:07

by Daniel Sangorrin

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce

Dear Viresh Kumar,

I tried your set of patches for isolating particular CPU cores from unpinned
timers. On x86_64 they were working fine, however I found out that on ARM
they would fail under the following test:

# mount -t cpuset none /cpuset
# cd /cpuset
# mkdir rt
# cd rt
# echo 1 > cpus
# echo 1 > cpu_exclusive
# cd
# taskset 0x2 ./setquiesce.sh <--- contains "echo 1 > /cpuset/rt/quiesce"
[ 75.622375] ------------[ cut here ]------------
[ 75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
[ 75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
[ 75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
[ 75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
[ 75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
[ 75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
[ 75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
[ 75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
[ 75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
[ 75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
[ 75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
[ 75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
[ 75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
....

I also backported your patches to Linux 3.10.y and found the same problem
both in ARM and x86_64. However, I think I figured out the reason for those
errors. Please, could you check the patch below (it applies on the top of
your tree, branch isolate-cpusets) and let me know what you think?

Thanks,
Daniel

-------------------------PATCH STARTS HERE---------------------------------
cpuset: quiesce: change irq disable/enable by irq save/restore

The function __migrate_timers can be called under interrupt context
or thread context depending on the core where the system call was
executed. In case it executes under interrupt context, it
seems a bad idea to leave interrupts enabled after migrating the
timers. In fact, this caused kernel errors on the ARM architecture and
on the x86_64 architecture with the 3.10 kernel (backported version
of the cpuset-quiesce patch).

Signed-off-by: Daniel Sangorrin <[email protected]>
Signed-off-by: Yoshitake Kobayashi <[email protected]>
---
kernel/hrtimer.c | 5 +++--
kernel/timer.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e8cd1db..abb1707 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1701,8 +1701,9 @@ static void __migrate_hrtimers(int scpu, bool remove_pinned)
struct hrtimer_clock_base *clock_base;
unsigned int active_bases;
int i;
+ unsigned long flags;

- local_irq_disable();
+ local_irq_save(flags);
old_base = &per_cpu(hrtimer_bases, scpu);
new_base = &__get_cpu_var(hrtimer_bases);
/*
@@ -1722,7 +1723,7 @@ static void __migrate_hrtimers(int scpu, bool remove_pinned)

/* Check, if we got expired work to do */
__hrtimer_peek_ahead_timers();
- local_irq_enable();
+ local_irq_restore(flags);
}
#endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */

diff --git a/kernel/timer.c b/kernel/timer.c
index 4676a07..2715b7d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1644,6 +1644,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
struct tvec_base *old_base;
struct tvec_base *new_base;
int i;
+ unsigned long flags;

old_base = per_cpu(tvec_bases, cpu);
new_base = get_cpu_var(tvec_bases);
@@ -1651,7 +1652,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
* The caller is globally serialized and nobody else
* takes two locks at once, deadlock is not possible.
*/
- spin_lock_irq(&new_base->lock);
+ spin_lock_irqsave(&new_base->lock, flags);
spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);

BUG_ON(old_base->running_timer);
@@ -1671,7 +1672,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
}

spin_unlock(&old_base->lock);
- spin_unlock_irq(&new_base->lock);
+ spin_unlock_irqrestore(&new_base->lock, flags);
put_cpu_var(tvec_bases);
}
#endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */
--
1.8.1.1
-------------------------PATCH ENDS HERE---------------------------------

On 2014/03/20 22:48, Viresh Kumar wrote:
> For networking platforms we need to provide one isolated CPU per user space data
> plane thread. These CPUs should not be interrupted by kernel at all unless
> userspace has requested for some syscalls. And so must live in isolated mode.
> Currently, there are background kernel activities that are running on almost
> every CPU, like: timers/hrtimers/watchdogs/etc, and these are required to be
> migrated to other CPUs.
>
> This patchset tries to migrate un-pinned timers away in the first attempt. And
> support for migrating others like hrtimers/workqueues/etc would be added later.
>
> This has only went through basic testing currently on ARM Samsung Exynos board
> which only has two CPUs. Separate cpusets were created for these two CPUs and
> then timers were migrated from one cpuset to other.
>
> This option was earlier suggested by Peter Z. here.
>
> https://lkml.org/lkml/2014/1/15/186
>
> Please provide your inputs on how this can be improved..
>
> Viresh Kumar (4):
> timer: track pinned timers with TIMER_PINNED flag
> timer: don't migrate pinned timers
> timer: create timer_quiesce_cpu() for cpusets.quiesce option
> cpuset: Add cpusets.quiesce option
>
> include/linux/timer.h | 10 +++---
> kernel/cpuset.c | 56 +++++++++++++++++++++++++++++++
> kernel/timer.c | 92 +++++++++++++++++++++++++++++++++++++++++----------
> 3 files changed, 136 insertions(+), 22 deletions(-)
>

--
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail: [email protected]


2014-04-24 07:43:43

by Viresh Kumar

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce

On 24 April 2014 12:55, Daniel Sangorrin <[email protected]> wrote:
> I tried your set of patches for isolating particular CPU cores from unpinned
> timers. On x86_64 they were working fine, however I found out that on ARM
> they would fail under the following test:

I am happy that these drew attention from somebody Atleast :)

> # mount -t cpuset none /cpuset
> # cd /cpuset
> # mkdir rt
> # cd rt
> # echo 1 > cpus
> # echo 1 > cpu_exclusive
> # cd
> # taskset 0x2 ./setquiesce.sh <--- contains "echo 1 > /cpuset/rt/quiesce"
> [ 75.622375] ------------[ cut here ]------------
> [ 75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
> [ 75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> [ 75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
> [ 75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
> [ 75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
> [ 75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
> [ 75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
> [ 75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
> [ 75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
> [ 75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
> [ 75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
> [ 75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
> [ 75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)

I couldn't understand why we went via a interrupt here ? Probably CPU1
was idle and was woken up with a IPI and then this happened. But in
that case too,
shouldn't the script run from process context instead ?

> I also backported your patches to Linux 3.10.y and found the same problem
> both in ARM and x86_64.

There are very few changes in between 3.10 and latest for timers/hrtimers
and so things are expected to be the same.

> However, I think I figured out the reason for those
> errors. Please, could you check the patch below (it applies on the top of
> your tree, branch isolate-cpusets) and let me know what you think?

Okay, just to let you know, I have also found some issues and they are
now pushed in my tree.. Also it is rebased over 3.15-rc2 now.

> -------------------------PATCH STARTS HERE---------------------------------
> cpuset: quiesce: change irq disable/enable by irq save/restore
>
> The function __migrate_timers can be called under interrupt context
> or thread context depending on the core where the system call was
> executed. In case it executes under interrupt context, it

How exactly?

> seems a bad idea to leave interrupts enabled after migrating the
> timers. In fact, this caused kernel errors on the ARM architecture and
> on the x86_64 architecture with the 3.10 kernel (backported version
> of the cpuset-quiesce patch).

I can't keep it as a separate patch and so would be required to merge
it into my original patch..

Thanks for your inputs :)

2014-04-24 08:33:07

by Daniel Sangorrin

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce


On 2014/04/24 16:43, Viresh Kumar wrote:
> On 24 April 2014 12:55, Daniel Sangorrin <[email protected]> wrote:
>> I tried your set of patches for isolating particular CPU cores from unpinned
>> timers. On x86_64 they were working fine, however I found out that on ARM
>> they would fail under the following test:
>
> I am happy that these drew attention from somebody Atleast :)

Thanks to you for your hard work.

>> # mount -t cpuset none /cpuset
>> # cd /cpuset
>> # mkdir rt
>> # cd rt
>> # echo 1 > cpus
>> # echo 1 > cpu_exclusive
>> # cd
>> # taskset 0x2 ./setquiesce.sh <--- contains "echo 1 > /cpuset/rt/quiesce"
>> [ 75.622375] ------------[ cut here ]------------
>> [ 75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
>> [ 75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>> [ 75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
>> [ 75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
>> [ 75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
>> [ 75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
>> [ 75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
>> [ 75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
>> [ 75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
>> [ 75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
>> [ 75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
>> [ 75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
>> [ 75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
>
> I couldn't understand why we went via a interrupt here ? Probably CPU1
> was idle and was woken up with a IPI and then this happened. But in
> that case too,
> shouldn't the script run from process context instead ?

In kernel/cpuset.c:quiesce_cpuset() you are using the function
'smp_call_function_any' which asks CPU cores in 'cpumask' to
execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.

In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
the call to be executed from core 1 (by using taskset),
an inter-processor interrupt is sent to core 0 for those functions
to be executed.

>> I also backported your patches to Linux 3.10.y and found the same problem
>> both in ARM and x86_64.
>
> There are very few changes in between 3.10 and latest for timers/hrtimers
> and so things are expected to be the same.
>
>> However, I think I figured out the reason for those
>> errors. Please, could you check the patch below (it applies on the top of
>> your tree, branch isolate-cpusets) and let me know what you think?
>
> Okay, just to let you know, I have also found some issues and they are
> now pushed in my tree.. Also it is rebased over 3.15-rc2 now.

Ok, thank you! I see that you have already fixed the problem. I tested
your tree on ARM and now it seems to work correctly.



>
>> -------------------------PATCH STARTS HERE---------------------------------
>> cpuset: quiesce: change irq disable/enable by irq save/restore
>>
>> The function __migrate_timers can be called under interrupt context
>> or thread context depending on the core where the system call was
>> executed. In case it executes under interrupt context, it
>
> How exactly?

See my reply above.

>> seems a bad idea to leave interrupts enabled after migrating the
>> timers. In fact, this caused kernel errors on the ARM architecture and
>> on the x86_64 architecture with the 3.10 kernel (backported version
>> of the cpuset-quiesce patch).
>
> I can't keep it as a separate patch and so would be required to merge
> it into my original patch..
>
> Thanks for your inputs :)
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

Thanks,
Daniel

--
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail: [email protected]

2014-04-24 08:41:57

by Viresh Kumar

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce

On 24 April 2014 14:01, Daniel Sangorrin <[email protected]> wrote:
> In kernel/cpuset.c:quiesce_cpuset() you are using the function
> 'smp_call_function_any' which asks CPU cores in 'cpumask' to
> execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.
>
> In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
> the call to be executed from core 1 (by using taskset),
> an inter-processor interrupt is sent to core 0 for those functions
> to be executed.

Ahh, I understood that now :) .. So we are setting cpuset.quiesce from CPU1
which will do a IPI to get migrate_timers called on CPU0.. I was setting quiesce
from CPU0 only in my tests :)

But how does this work fine on x86 then? There we should have exactly same
problem, isn't it?

> Ok, thank you! I see that you have already fixed the problem. I tested
> your tree on ARM and now it seems to work correctly.

Yeah, I just pushed your changes as well at the time I wrote last mail :)

2014-04-24 09:25:36

by Daniel Sangorrin

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce

On 2014/04/24 17:41, Viresh Kumar wrote:
> On 24 April 2014 14:01, Daniel Sangorrin <[email protected]> wrote:
>> In kernel/cpuset.c:quiesce_cpuset() you are using the function
>> 'smp_call_function_any' which asks CPU cores in 'cpumask' to
>> execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.
>>
>> In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
>> the call to be executed from core 1 (by using taskset),
>> an inter-processor interrupt is sent to core 0 for those functions
>> to be executed.
>
> Ahh, I understood that now :) .. So we are setting cpuset.quiesce from CPU1
> which will do a IPI to get migrate_timers called on CPU0.. I was setting quiesce
> from CPU0 only in my tests :)
>
> But how does this work fine on x86 then? There we should have exactly same
> problem, isn't it?

Yeah, I'm not sure why it is working on 3.15 x86_64 but not in 3.10 x86_64.
Perhaps it's related to this patch: https://lkml.org/lkml/2013/9/19/349

>> Ok, thank you! I see that you have already fixed the problem. I tested
>> your tree on ARM and now it seems to work correctly.
>
> Yeah, I just pushed your changes as well at the time I wrote last mail :)

Oh, I see!

Why didn't you just apply the patch on top of your tree so that the
information included in the git commit (e.g: my name and mail) remains?

This part:

cpuset: quiesce: change irq disable/enable by irq save/restore

The function __migrate_timers can be called under interrupt context
or thread context depending on the core where the system call was
executed. In case it executes under interrupt context, it
seems a bad idea to leave interrupts enabled after migrating the
timers. In fact, this caused kernel errors on the ARM architecture and
on the x86_64 architecture with the 3.10 kernel (backported version
of the cpuset-quiesce patch).

Signed-off-by: Daniel Sangorrin <[email protected]>
Signed-off-by: Yoshitake Kobayashi <[email protected]>

Thanks,
Daniel

--
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail: [email protected]

2014-04-24 09:30:41

by Viresh Kumar

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce

On 24 April 2014 14:54, Daniel Sangorrin <[email protected]> wrote:
> Why didn't you just apply the patch on top of your tree so that the
> information included in the git commit (e.g: my name and mail) remains?
>
> This part:
>
> cpuset: quiesce: change irq disable/enable by irq save/restore
>
> The function __migrate_timers can be called under interrupt context
> or thread context depending on the core where the system call was
> executed. In case it executes under interrupt context, it
> seems a bad idea to leave interrupts enabled after migrating the
> timers. In fact, this caused kernel errors on the ARM architecture and
> on the x86_64 architecture with the 3.10 kernel (backported version
> of the cpuset-quiesce patch).
>
> Signed-off-by: Daniel Sangorrin <[email protected]>
> Signed-off-by: Yoshitake Kobayashi <[email protected]>

That's what I told you earlier when I said this:

> I can't keep it as a separate patch and so would be required to merge
> it into my original patch..

And the reason being: "No patch is supposed to break things, otherwise
git bisect wouldn't work smoothly".. And so git bisect would complain
this issue after my patch and so I have to merge the fixes you gave into
the original patch as its not yet merged.

2014-04-25 00:31:59

by Daniel Sangorrin

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce


>> I can't keep it as a separate patch and so would be required to merge
>> it into my original patch..
>
> And the reason being: "No patch is supposed to break things, otherwise
> git bisect wouldn't work smoothly".. And so git bisect would complain
> this issue after my patch and so I have to merge the fixes you gave into
> the original patch as its not yet merged.
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Ok, no problem then.

Thanks,
Daniel

--
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail: [email protected]

2014-04-25 04:51:08

by Viresh Kumar

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce

On 25 April 2014 06:01, Daniel Sangorrin <[email protected]> wrote:
>
>>> I can't keep it as a separate patch and so would be required to merge
>>> it into my original patch..
>>
>> And the reason being: "No patch is supposed to break things, otherwise
>> git bisect wouldn't work smoothly".. And so git bisect would complain
>> this issue after my patch and so I have to merge the fixes you gave into
>> the original patch as its not yet merged.
>> --
>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> Ok, no problem then.

Do you want me to add your Tested-by for my next version ?

2014-04-25 05:21:58

by Daniel Sangorrin

[permalink] [raw]
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce



On 2014/04/25 13:51, Viresh Kumar wrote:
> On 25 April 2014 06:01, Daniel Sangorrin <[email protected]> wrote:
>>
>>>> I can't keep it as a separate patch and so would be required to merge
>>>> it into my original patch..
>>>
>>> And the reason being: "No patch is supposed to break things, otherwise
>>> git bisect wouldn't work smoothly".. And so git bisect would complain
>>> this issue after my patch and so I have to merge the fixes you gave into
>>> the original patch as its not yet merged.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> Ok, no problem then.
>
> Do you want me to add your Tested-by for my next version ?
>
>

Sure.

Tested-by: Daniel Sangorrin <[email protected]>

--
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail: [email protected]