2021-11-14 13:31:36

by Thomas Gleixner

[permalink] [raw]
Subject: [GIT pull] timers/urgent for v5.16-rc1

Linus,

please pull the latest timers/urgent branch from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2021-11-14

up to: ca7752caeaa7: posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()

A single fix for POSIX CPU timers to address a problem where POSIX CPU
timer delivery stops working for a new child task because copy_process()
copies state information which is only valid for the parent task.

Thanks,

tglx

------------------>
Michael Pratt (1):
posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()


include/linux/posix-timers.h | 2 ++
kernel/fork.c | 1 +
kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 00fef0064355..5bbcd280bfd2 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -184,8 +184,10 @@ static inline void posix_cputimers_group_init(struct posix_cputimers *pct,
#endif

#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+void clear_posix_cputimers_work(struct task_struct *p);
void posix_cputimers_init_work(void);
#else
+static inline void clear_posix_cputimers_work(struct task_struct *p) { }
static inline void posix_cputimers_init_work(void) { }
#endif

diff --git a/kernel/fork.c b/kernel/fork.c
index 8e9feeef555e..8269ae2e5d7c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2279,6 +2279,7 @@ static __latent_entropy struct task_struct *copy_process(
p->pdeath_signal = 0;
INIT_LIST_HEAD(&p->thread_group);
p->task_works = NULL;
+ clear_posix_cputimers_work(p);

#ifdef CONFIG_KRETPROBES
p->kretprobe_instances.first = NULL;
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 643d412ac623..96b4e7810426 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1158,14 +1158,29 @@ static void posix_cpu_timers_work(struct callback_head *work)
handle_posix_cpu_timers(current);
}

+/*
+ * Clear existing posix CPU timers task work.
+ */
+void clear_posix_cputimers_work(struct task_struct *p)
+{
+ /*
+ * A copied work entry from the old task is not meaningful, clear it.
+ * N.B. init_task_work will not do this.
+ */
+ memset(&p->posix_cputimers_work.work, 0,
+ sizeof(p->posix_cputimers_work.work));
+ init_task_work(&p->posix_cputimers_work.work,
+ posix_cpu_timers_work);
+ p->posix_cputimers_work.scheduled = false;
+}
+
/*
* Initialize posix CPU timers task work in init task. Out of line to
* keep the callback static and to avoid header recursion hell.
*/
void __init posix_cputimers_init_work(void)
{
- init_task_work(&current->posix_cputimers_work.work,
- posix_cpu_timers_work);
+ clear_posix_cputimers_work(current);
}

/*



2021-11-14 19:03:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT pull] timers/urgent for v5.16-rc1

On Sun, Nov 14, 2021 at 5:31 AM Thomas Gleixner <[email protected]> wrote:
>
> + /*
> + * A copied work entry from the old task is not meaningful, clear it.
> + * N.B. init_task_work will not do this.
> + */
> + memset(&p->posix_cputimers_work.work, 0,
> + sizeof(p->posix_cputimers_work.work));
> + init_task_work(&p->posix_cputimers_work.work,
> + posix_cpu_timers_work);

Ugh.

Instead of the added four lines of comment, and two lines of
"memset()", maybe this should just have made init_task_work() DTRT?

Yes,. I see this:

/* Protect against double add, see task_tick_numa and task_numa_work */
p->numa_work.next = &p->numa_work;
...
init_task_work(&p->numa_work, task_numa_work);

but I think that one is so subtle and such a special case that it
should have been updated - just make that magic special flag happen
after the init_task_work.

A lot of the other cases seem to zero-initialize things elsewhere
(generally with kzalloc()), but I note that at least
io_ring_exit_work() seems to have this:

struct io_tctx_exit exit;
...
init_task_work(&exit.task_work, io_tctx_exit_cb);

and the ->next pointer is never set to NULL.

Now, in 99% of all cases the ->next pointer simply doesn't matter,
because task_work_add() will only set it, not caring about the old
value.

But apparently it matters for posix_cputimers_work and for numa_work,
and so I think it's very illogical that init_task_work() will not
actually initialize it properly.

Hmm?

I've pulled this, but it really looks like the wrong solution to the
whole "uninitialized data".

And that task_tick_numa() special case is truly horrendous, and really
should go after the init_task_work() regardless, exactly because you'd
expect that init_task_work() to initialize the work even if it doesn't
happen to right now.

Or is somebody doing init_task_work() to only change the work-function
on an already initialized work entry? Becuase that sounds both racy
and broken to me, and none of the things I looked at from a quick grep
looked like that at all.

Linus

2021-11-14 19:11:15

by pr-tracker-bot

[permalink] [raw]
Subject: Re: [GIT pull] timers/urgent for v5.16-rc1

The pull request you sent on Sun, 14 Nov 2021 14:30:59 +0100 (CET):

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2021-11-14

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/622c72b651c85cb55bae147debc1a2fae0189b53

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2021-11-14 19:24:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [GIT pull] timers/urgent for v5.16-rc1

On Sun, Nov 14 2021 at 11:02, Linus Torvalds wrote:
> On Sun, Nov 14, 2021 at 5:31 AM Thomas Gleixner <[email protected]> wrote:
>>
>> + /*
>> + * A copied work entry from the old task is not meaningful, clear it.
>> + * N.B. init_task_work will not do this.
>> + */
>> + memset(&p->posix_cputimers_work.work, 0,
>> + sizeof(p->posix_cputimers_work.work));
>> + init_task_work(&p->posix_cputimers_work.work,
>> + posix_cpu_timers_work);
>
> Ugh.
>
> Instead of the added four lines of comment, and two lines of
> "memset()", maybe this should just have made init_task_work() DTRT?
>
> Yes,. I see this:
>
> /* Protect against double add, see task_tick_numa and task_numa_work */
> p->numa_work.next = &p->numa_work;
> ...
> init_task_work(&p->numa_work, task_numa_work);
>
> but I think that one is so subtle and such a special case that it
> should have been updated - just make that magic special flag happen
> after the init_task_work.
>
> A lot of the other cases seem to zero-initialize things elsewhere
> (generally with kzalloc()), but I note that at least
> io_ring_exit_work() seems to have this:
>
> struct io_tctx_exit exit;
> ...
> init_task_work(&exit.task_work, io_tctx_exit_cb);
>
> and the ->next pointer is never set to NULL.
>
> Now, in 99% of all cases the ->next pointer simply doesn't matter,
> because task_work_add() will only set it, not caring about the old
> value.
>
> But apparently it matters for posix_cputimers_work and for numa_work,
> and so I think it's very illogical that init_task_work() will not
> actually initialize it properly.
>
> Hmm?
>
> I've pulled this, but it really looks like the wrong solution to the
> whole "uninitialized data".
>
> And that task_tick_numa() special case is truly horrendous, and really
> should go after the init_task_work() regardless, exactly because you'd
> expect that init_task_work() to initialize the work even if it doesn't
> happen to right now.
>
> Or is somebody doing init_task_work() to only change the work-function
> on an already initialized work entry? Becuase that sounds both racy
> and broken to me, and none of the things I looked at from a quick grep
> looked like that at all.

I'll have a deeper look at that tomorrow.

Thanks,

tglx

2021-11-15 15:52:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [GIT pull] timers/urgent for v5.16-rc1

On Sun, Nov 14, 2021 at 11:02:31AM -0800, Linus Torvalds wrote:
> On Sun, Nov 14, 2021 at 5:31 AM Thomas Gleixner <[email protected]> wrote:

> But apparently it matters for posix_cputimers_work and for numa_work,
> and so I think it's very illogical that init_task_work() will not
> actually initialize it properly.

The problem with the posix timers thing seems to be that it can race
against fork() but afaict it can't actually mis-behave if it has garbage
in ->next, so the clearing here is pure paranoia.

> And that task_tick_numa() special case is truly horrendous, and really
> should go after the init_task_work() regardless, exactly because you'd
> expect that init_task_work() to initialize the work even if it doesn't
> happen to right now.

Yeah, it's a wee bit 'special' allright :-)

> Or is somebody doing init_task_work() to only change the work-function
> on an already initialized work entry? Becuase that sounds both racy
> and broken to me, and none of the things I looked at from a quick grep
> looked like that at all.

The worst I found is someone sharing an rcu_head between task_work and
call_rcu (supposedly at different stages in the life-time).

I couldn't find any other weird cases.

---
include/linux/task_work.h | 1 +
kernel/sched/fair.c | 4 ++--
kernel/time/posix-cpu-timers.c | 2 --
3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 5b8a93f288bb..fbbc9aa8e4ae 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -10,6 +10,7 @@ typedef void (*task_work_func_t)(struct callback_head *);
static inline void
init_task_work(struct callback_head *twork, task_work_func_t func)
{
+ twork->next = NULL;
twork->func = func;
}

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e476f6d9435..d03dacdecf73 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2823,14 +2823,14 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
p->node_stamp = 0;
p->numa_scan_seq = mm ? mm->numa_scan_seq : 0;
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
- /* Protect against double add, see task_tick_numa and task_numa_work */
- p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
RCU_INIT_POINTER(p->numa_group, NULL);
p->last_task_numa_placement = 0;
p->last_sum_exec_runtime = 0;

init_task_work(&p->numa_work, task_numa_work);
+ /* Protect against double add, see task_tick_numa and task_numa_work */
+ p->numa_work.next = &p->numa_work;

/* New address space, reset the preferred nid */
if (!(clone_flags & CLONE_VM)) {
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 96b4e7810426..3352759e6916 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1167,8 +1167,6 @@ void clear_posix_cputimers_work(struct task_struct *p)
* A copied work entry from the old task is not meaningful, clear it.
* N.B. init_task_work will not do this.
*/
- memset(&p->posix_cputimers_work.work, 0,
- sizeof(p->posix_cputimers_work.work));
init_task_work(&p->posix_cputimers_work.work,
posix_cpu_timers_work);
p->posix_cputimers_work.scheduled = false;

2021-11-15 16:07:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [GIT pull] timers/urgent for v5.16-rc1

On Mon, Nov 15, 2021 at 04:51:34PM +0100, Peter Zijlstra wrote:
> diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> index 96b4e7810426..3352759e6916 100644
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -1167,8 +1167,6 @@ void clear_posix_cputimers_work(struct task_struct *p)
> * A copied work entry from the old task is not meaningful, clear it.
> * N.B. init_task_work will not do this.
> */
> - memset(&p->posix_cputimers_work.work, 0,
> - sizeof(p->posix_cputimers_work.work));

Also, clearly that comments needs to go..

> init_task_work(&p->posix_cputimers_work.work,
> posix_cpu_timers_work);
> p->posix_cputimers_work.scheduled = false;

2021-11-16 01:26:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT pull] timers/urgent for v5.16-rc1

On Mon, Nov 15, 2021 at 8:07 AM Peter Zijlstra <[email protected]> wrote:
>
> Also, clearly that comments needs to go..

Yeah, with that your patch looks good to me.

Thanks,
Linus