Subject: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()

A while ago vtime_init_idle() used to invoke sched_clock_cpu() and had
write_seqlock_irqsave() because sched_clock_cpu() required to be called
with interrupts off.
This requirement is written before the body of the function and was
introduced in 2010 via c676329abb2b ("sched_clock: Add local_clock() API
and improve documentation"). This requirement has been dropped in 2013
via ef08f0fff876 ("sched/clock: Remove local_irq_disable() from the clocks")
but the body of the function still has the comment.

Now, vtime_init_idle() was converted from seqlock to seqcount via
b7ce2277f087 ("sched/cputime: Convert vtime_seqlock to seqcount") and
while doing so the IRQ-off region around sched_clock_cpu() was preserved
while not strictly required (given the first part of this commit).
A little later, sched_clock_cpu() was replaced with jiffies via
ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy
granularity").
Based on this events I assume it is safe to drop the local_irq_safe()
section.

Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/sched/cputime.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5498d5..816cd4ef5873 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -785,14 +785,10 @@ void arch_vtime_task_switch(struct task_struct *prev)

void vtime_init_idle(struct task_struct *t, int cpu)
{
- unsigned long flags;
-
- local_irq_save(flags);
write_seqcount_begin(&t->vtime_seqcount);
t->vtime_snap_whence = VTIME_SYS;
t->vtime_snap = jiffies;
write_seqcount_end(&t->vtime_seqcount);
- local_irq_restore(flags);
}

cputime_t task_gtime(struct task_struct *t)
--
2.8.0.rc3


2016-04-14 13:07:31

by Rik van Riel

[permalink] [raw]
Subject: Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()

On Thu, 2016-04-14 at 14:29 +0200, Sebastian Andrzej Siewior wrote:
> A while ago vtime_init_idle() used to invoke sched_clock_cpu() and
> had
> write_seqlock_irqsave() because sched_clock_cpu() required to be
> called
> with interrupts off.
> This requirement is written before the body of the function and was
> introduced in 2010 via c676329abb2b ("sched_clock: Add local_clock()
> API
> and improve documentation"). This requirement has been dropped in
> 2013
> via ef08f0fff876 ("sched/clock: Remove local_irq_disable() from the
> clocks")
> but the body of the function still has the comment.
>
> Now, vtime_init_idle() was converted from seqlock to seqcount via
> b7ce2277f087 ("sched/cputime: Convert vtime_seqlock to seqcount") and
> while doing so the IRQ-off region around sched_clock_cpu() was
> preserved
> while not strictly required (given the first part of this commit).
> A little later, sched_clock_cpu() was replaced with jiffies via
> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy
> granularity").
> Based on this events I assume it is safe to drop the local_irq_safe()
> section.
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
>

Acked-by: Rik van Riel <[email protected]>

--
All Rights Reversed.


Attachments:
signature.asc (473.00 B)
This is a digitally signed message part

2016-04-14 13:18:39

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()

On Thu, Apr 14, 2016 at 02:29:14PM +0200, Sebastian Andrzej Siewior wrote:
> A while ago vtime_init_idle() used to invoke sched_clock_cpu() and had
> write_seqlock_irqsave() because sched_clock_cpu() required to be called
> with interrupts off.
> This requirement is written before the body of the function and was
> introduced in 2010 via c676329abb2b ("sched_clock: Add local_clock() API
> and improve documentation"). This requirement has been dropped in 2013
> via ef08f0fff876 ("sched/clock: Remove local_irq_disable() from the clocks")
> but the body of the function still has the comment.
>
> Now, vtime_init_idle() was converted from seqlock to seqcount via
> b7ce2277f087 ("sched/cputime: Convert vtime_seqlock to seqcount") and
> while doing so the IRQ-off region around sched_clock_cpu() was preserved
> while not strictly required (given the first part of this commit).
> A little later, sched_clock_cpu() was replaced with jiffies via
> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy
> granularity").
> Based on this events I assume it is safe to drop the local_irq_safe()
> section.

I think I tried this but I got lockdep warnings because the other updaters
happen on IRQ.

In practive we can't be interrupted since the idle task hasn't even started.

Now we can probably get rid of the write_seqcount_stuff here because what
is initialized there is supposed to be visible by the idle task once it runs.

But I'm more worried about readers.

>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> kernel/sched/cputime.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 75f98c5498d5..816cd4ef5873 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -785,14 +785,10 @@ void arch_vtime_task_switch(struct task_struct *prev)
>
> void vtime_init_idle(struct task_struct *t, int cpu)
> {
> - unsigned long flags;
> -
> - local_irq_save(flags);
> write_seqcount_begin(&t->vtime_seqcount);
> t->vtime_snap_whence = VTIME_SYS;
> t->vtime_snap = jiffies;
> write_seqcount_end(&t->vtime_seqcount);
> - local_irq_restore(flags);
> }
>
> cputime_t task_gtime(struct task_struct *t)
> --
> 2.8.0.rc3
>

Subject: Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()

On 04/14/2016 03:18 PM, Frederic Weisbecker wrote:
> I think I tried this but I got lockdep warnings because the other updaters
> happen on IRQ.

I booted and played a little but lockdep didn't scream here so far.

> In practive we can't be interrupted since the idle task hasn't even started.
>
> Now we can probably get rid of the write_seqcount_stuff here because what
> is initialized there is supposed to be visible by the idle task once it runs.
>
> But I'm more worried about readers.

Most idle_thread_get() callers happen at boottime. Then I see two
during CPU-up (cpu-hotplug) at runtime. One from _cpu_up() via
idle_thread_get() and the second via bringup_cpu() and the second in
bringup_cpu().

Sebastian

2016-04-14 14:24:11

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()

On Thu, Apr 14, 2016 at 03:41:17PM +0200, Sebastian Andrzej Siewior wrote:
> On 04/14/2016 03:18 PM, Frederic Weisbecker wrote:
> > I think I tried this but I got lockdep warnings because the other updaters
> > happen on IRQ.
>
> I booted and played a little but lockdep didn't scream here so far.

You need to select CONFIG_NO_HZ_FULL and select some CPU in nohz_full=
in order to trigger it:

[ 0.759869] =================================
[ 0.764220] [ INFO: inconsistent lock state ]
[ 0.768571] 4.6.0-rc2+ #249 Not tainted
[ 0.772404] ---------------------------------
[ 0.776755] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 0.782751] cpuhp/7/49 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 0.787795] (&p->vtime_seqcount){?.+...}, at: [<ffffffff810ad0bb>] irq_exit+0xeb/0x110
[ 0.795867] {HARDIRQ-ON-W} state was registered at:
[ 0.800737] [<ffffffff810fab45>] __lock_acquire+0x895/0x1d00
[ 0.806579] [<ffffffff810fc8ca>] lock_acquire+0xca/0x230
[ 0.812075] [<ffffffff810df0a0>] vtime_init_idle+0x40/0x80
[ 0.817742] [<ffffffff810daf9e>] init_idle+0x2ee/0x380
[ 0.823063] [<ffffffff810a495e>] fork_idle+0xae/0xc0
[ 0.828213] [<ffffffff81f7a0f4>] idle_threads_init+0x6b/0xa4
[ 0.834054] [<ffffffff81f7e1c5>] smp_init+0xf/0x82
[ 0.839028] [<ffffffff81f5b03c>] kernel_init_freeable+0x9b/0x1ab
[ 0.845217] [<ffffffff8193c6be>] kernel_init+0xe/0x100
[ 0.850541] [<ffffffff8194b4f2>] ret_from_fork+0x22/0x50
[ 0.856034] irq event stamp: 28
[ 0.859173] hardirqs last enabled at (27): [<ffffffff8194a9ff>] _raw_spin_unlock_irqrestore+0x5f/0x80
[ 0.868482] hardirqs last disabled at (28): [<ffff88021652bcc8>] 0xffff88021652bcc8
[ 0.876143] softirqs last enabled at (0): [<ffffffff810a2f76>] copy_process.part.38+0x336/0x1c40
[ 0.885020] softirqs last disabled at (0): [< (null)>] (null)
[ 0.892420]
[ 0.892420] other info that might help us debug this:
[ 0.898940] Possible unsafe locking scenario:
[ 0.898940]
[ 0.904849] CPU0
[ 0.907294] ----
[ 0.909737] lock(&p->vtime_seqcount);
[ 0.913605] <Interrupt>
[ 0.916222] lock(&p->vtime_seqcount);
[ 0.920261]
[ 0.920261] *** DEADLOCK ***
[ 0.920261]
[ 0.926173] 1 lock held by cpuhp/7/49:
[ 0.929915] #0: (smpboot_threads_lock){+.+.+.}, at: [<ffffffff810cec9d>] smpboot_unpark_threads+0x1d/0x90
[ 0.939735]
[ 0.939735] stack backtrace:
[ 0.944087] CPU: 7 PID: 49 Comm: cpuhp/7 Not tainted 4.6.0-rc2+ #249
[ 0.950433] Hardware name: MSI MS-7850/Z87-G41 PC Mate(MS-7850), BIOS V1.3 08/18/2013
[ 0.958250] 0000000000000000 ffff88021fbc3d80 ffffffff8141d403 ffff8802165243c0
[ 0.965704] ffffffff827eebd0 ffff88021fbc3dd0 ffffffff811a1743 0000000000000000
[ 0.973159] 0000000000000000 0000000000000001 0000000000000000 ffff880216524bc0
[ 0.980613] Call Trace:
[ 0.983056] <IRQ> [<ffffffff8141d403>] dump_stack+0x85/0xc2
[ 0.988813] [<ffffffff811a1743>] print_usage_bug+0x1f3/0x208
[ 0.994549] [<ffffffff810f9c69>] mark_lock+0x649/0x6c0
[ 0.999768] [<ffffffff810f8e50>] ? check_usage_backwards+0x140/0x140
[ 1.006199] [<ffffffff810fad2b>] __lock_acquire+0xa7b/0x1d00
[ 1.011938] [<ffffffff810fc8ca>] lock_acquire+0xca/0x230
[ 1.017328] [<ffffffff810ad0bb>] ? irq_exit+0xeb/0x110
[ 1.022548] [<ffffffff810ded00>] vtime_gen_account_irq_exit+0x40/0xb0
[ 1.029072] [<ffffffff810ad0bb>] ? irq_exit+0xeb/0x110
[ 1.034292] [<ffffffff810ad0bb>] irq_exit+0xeb/0x110
[ 1.039335] [<ffffffff8103b303>] smp_irq_work_interrupt+0x33/0x40
[ 1.045506] [<ffffffff8194cd21>] irq_work_interrupt+0x91/0xa0
[ 1.051330] <EOI> [<ffffffff8194aa04>] ? _raw_spin_unlock_irqrestore+0x64/0x80
[ 1.058734] [<ffffffff810d69b9>] try_to_wake_up+0x49/0x630
[ 1.064299] [<ffffffff8194a9ea>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[ 1.071084] [<ffffffff810cec80>] ? smpboot_create_threads+0x90/0x90
[ 1.077429] [<ffffffff810d7ea0>] wake_up_state+0x10/0x20
[ 1.082819] [<ffffffff810caf47>] __kthread_unpark+0x67/0x70
[ 1.088472] [<ffffffff810caf70>] kthread_unpark+0x20/0x30
[ 1.093958] [<ffffffff810cecea>] smpboot_unpark_threads+0x6a/0x90
[ 1.100128] [<ffffffff810a68f9>] cpuhp_invoke_callback+0x69/0x2e0
[ 1.106302] [<ffffffff810a6bb3>] cpuhp_up_callbacks+0x43/0xb0
[ 1.112133] [<ffffffff810a6dba>] cpuhp_thread_fun+0xda/0xf0
[ 1.117784] [<ffffffff810ce644>] smpboot_thread_fn+0x1e4/0x300
[ 1.123696] [<ffffffff810ce460>] ? sort_range+0x30/0x30
[ 1.128999] [<ffffffff810cabfa>] kthread+0x10a/0x120
[ 1.134044] [<ffffffff8194b4f2>] ret_from_fork+0x22/0x50
[ 1.139434] [<ffffffff810caaf0>] ? kthread_create_on_node+0x200/0x200


>
> > In practive we can't be interrupted since the idle task hasn't even started.
> >
> > Now we can probably get rid of the write_seqcount_stuff here because what
> > is initialized there is supposed to be visible by the idle task once it runs.
> >
> > But I'm more worried about readers.
>
> Most idle_thread_get() callers happen at boottime. Then I see two
> during CPU-up (cpu-hotplug) at runtime. One from _cpu_up() via
> idle_thread_get() and the second via bringup_cpu() and the second in
> bringup_cpu()

Right, it's on CPU bootup/hotplug. I'm just worried that, if we erase that lock on
CPU (re-)initialization, the readers (ie: task_cputime()) may read unlucky sequences.

Now probably there is not much opportunity to call task_cputime() on secondary idle
tasks, they don't seem to be in /proc but perhaps they are accessible elsewhere.

I'd rather be paranoid and keep the code that way until we have the final
proof that it's not dangerous.

That said we could skip this initialization based on context tracking/vtime static key,
using vtime_accounting_enabled().

>
> Sebastian