Hi,
commit 1959a60182f4 ("x86/dumpstack: Pin the target stack when dumping
it") slightly changed the behaviour of stack traces dumping for zombie
tasks.
Before the commit (well, this is older SLE12 kernel, but that should not
matter), if one called 'cat /proc/<zombie pid>/stack', they would get
something like this
[<ffffffff8105b877>] do_exit+0x6f7/0xa80
[<ffffffff8105bc79>] do_group_exit+0x39/0xa0
[<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
[<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
[<00007fd128f9c4f9>] 0x7fd128f9c4f9
[<ffffffffffffffff>] 0xffffffffffffffff
After, one gets nothing. The trace is empty. try_get_task_stack() contains
atomic_inc_not_zero() (CONFIG_THREAD_INFO_IN_TASK is now default on
x86_64) and because stack_refcount is 0 for a zombie task, it returns
NULL. Therefore, all save_stack_trace_*() functions return immediately.
I guess that no one has cared about it so far. There is a problem for
live patching though. save_stack_trace_tsk_reliable() returns -EINVAL for
the zombie task and its stack is deemed unreliable. It could block our
transition for quite a long time.
We can skip those tasks in kernel/livepatch/ with a simple test we have in
kGraft. Skip the task if (task->state == TASK_DEAD && task->on_cpu == 0).
But you may want to change it generally, so better to ask first.
Regards,
Miroslav
On Fri, Dec 15, 2017 at 4:54 AM, Miroslav Benes <[email protected]> wrote:
> Hi,
>
> commit 1959a60182f4 ("x86/dumpstack: Pin the target stack when dumping
> it") slightly changed the behaviour of stack traces dumping for zombie
> tasks.
>
> Before the commit (well, this is older SLE12 kernel, but that should not
> matter), if one called 'cat /proc/<zombie pid>/stack', they would get
> something like this
>
> [<ffffffff8105b877>] do_exit+0x6f7/0xa80
> [<ffffffff8105bc79>] do_group_exit+0x39/0xa0
> [<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
> [<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
> [<00007fd128f9c4f9>] 0x7fd128f9c4f9
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> After, one gets nothing. The trace is empty. try_get_task_stack() contains
> atomic_inc_not_zero() (CONFIG_THREAD_INFO_IN_TASK is now default on
> x86_64) and because stack_refcount is 0 for a zombie task, it returns
> NULL. Therefore, all save_stack_trace_*() functions return immediately.
>
> I guess that no one has cared about it so far. There is a problem for
> live patching though. save_stack_trace_tsk_reliable() returns -EINVAL for
> the zombie task and its stack is deemed unreliable. It could block our
> transition for quite a long time.
>
> We can skip those tasks in kernel/livepatch/ with a simple test we have in
> kGraft. Skip the task if (task->state == TASK_DEAD && task->on_cpu == 0).
> But you may want to change it generally, so better to ask first.
>
Sounds like a bug in save_stack_trace_tsk_reliable() to me: if the
task has no stack, then the trace is 100% definitely empty :)
On Fri, Dec 15, 2017 at 07:51:45AM -0800, Andy Lutomirski wrote:
> On Fri, Dec 15, 2017 at 4:54 AM, Miroslav Benes <[email protected]> wrote:
> > Hi,
> >
> > commit 1959a60182f4 ("x86/dumpstack: Pin the target stack when dumping
> > it") slightly changed the behaviour of stack traces dumping for zombie
> > tasks.
> >
> > Before the commit (well, this is older SLE12 kernel, but that should not
> > matter), if one called 'cat /proc/<zombie pid>/stack', they would get
> > something like this
> >
> > [<ffffffff8105b877>] do_exit+0x6f7/0xa80
> > [<ffffffff8105bc79>] do_group_exit+0x39/0xa0
> > [<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
> > [<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
> > [<00007fd128f9c4f9>] 0x7fd128f9c4f9
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > After, one gets nothing. The trace is empty. try_get_task_stack() contains
> > atomic_inc_not_zero() (CONFIG_THREAD_INFO_IN_TASK is now default on
> > x86_64) and because stack_refcount is 0 for a zombie task, it returns
> > NULL. Therefore, all save_stack_trace_*() functions return immediately.
> >
> > I guess that no one has cared about it so far. There is a problem for
> > live patching though. save_stack_trace_tsk_reliable() returns -EINVAL for
> > the zombie task and its stack is deemed unreliable. It could block our
> > transition for quite a long time.
> >
> > We can skip those tasks in kernel/livepatch/ with a simple test we have in
> > kGraft. Skip the task if (task->state == TASK_DEAD && task->on_cpu == 0).
> > But you may want to change it generally, so better to ask first.
> >
>
> Sounds like a bug in save_stack_trace_tsk_reliable() to me: if the
> task has no stack, then the trace is 100% definitely empty :)
I would agree with that, something like the following should fix it?
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 77835bc021c7..20161ef53537 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -164,8 +164,12 @@ int save_stack_trace_tsk_reliable(struct task_struct *tsk,
{
int ret;
+ /*
+ * If the task doesn't have a stack (e.g., a zombie), the stack is
+ * "reliably" empty.
+ */
if (!try_get_task_stack(tsk))
- return -EINVAL;
+ return 0;
ret = __save_stack_trace_reliable(trace, tsk);
On Sun, 17 Dec 2017, Josh Poimboeuf wrote:
> On Fri, Dec 15, 2017 at 07:51:45AM -0800, Andy Lutomirski wrote:
> > On Fri, Dec 15, 2017 at 4:54 AM, Miroslav Benes <[email protected]> wrote:
> > > Hi,
> > >
> > > commit 1959a60182f4 ("x86/dumpstack: Pin the target stack when dumping
> > > it") slightly changed the behaviour of stack traces dumping for zombie
> > > tasks.
> > >
> > > Before the commit (well, this is older SLE12 kernel, but that should not
> > > matter), if one called 'cat /proc/<zombie pid>/stack', they would get
> > > something like this
> > >
> > > [<ffffffff8105b877>] do_exit+0x6f7/0xa80
> > > [<ffffffff8105bc79>] do_group_exit+0x39/0xa0
> > > [<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
> > > [<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
> > > [<00007fd128f9c4f9>] 0x7fd128f9c4f9
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> > >
> > > After, one gets nothing. The trace is empty. try_get_task_stack() contains
> > > atomic_inc_not_zero() (CONFIG_THREAD_INFO_IN_TASK is now default on
> > > x86_64) and because stack_refcount is 0 for a zombie task, it returns
> > > NULL. Therefore, all save_stack_trace_*() functions return immediately.
> > >
> > > I guess that no one has cared about it so far. There is a problem for
> > > live patching though. save_stack_trace_tsk_reliable() returns -EINVAL for
> > > the zombie task and its stack is deemed unreliable. It could block our
> > > transition for quite a long time.
> > >
> > > We can skip those tasks in kernel/livepatch/ with a simple test we have in
> > > kGraft. Skip the task if (task->state == TASK_DEAD && task->on_cpu == 0).
> > > But you may want to change it generally, so better to ask first.
> > >
> >
> > Sounds like a bug in save_stack_trace_tsk_reliable() to me: if the
> > task has no stack, then the trace is 100% definitely empty :)
>
> I would agree with that, something like the following should fix it?
>
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index 77835bc021c7..20161ef53537 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -164,8 +164,12 @@ int save_stack_trace_tsk_reliable(struct task_struct *tsk,
> {
> int ret;
>
> + /*
> + * If the task doesn't have a stack (e.g., a zombie), the stack is
> + * "reliably" empty.
> + */
> if (!try_get_task_stack(tsk))
> - return -EINVAL;
> + return 0;
>
> ret = __save_stack_trace_reliable(trace, tsk);
This obviously fixes the problem, so you can add
Reported-and-tested-by: Miroslav Benes <[email protected]>
Thanks
Miroslav
* Miroslav Benes <[email protected]> wrote:
> On Sun, 17 Dec 2017, Josh Poimboeuf wrote:
>
> > On Fri, Dec 15, 2017 at 07:51:45AM -0800, Andy Lutomirski wrote:
> > > On Fri, Dec 15, 2017 at 4:54 AM, Miroslav Benes <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > commit 1959a60182f4 ("x86/dumpstack: Pin the target stack when dumping
> > > > it") slightly changed the behaviour of stack traces dumping for zombie
> > > > tasks.
> > > >
> > > > Before the commit (well, this is older SLE12 kernel, but that should not
> > > > matter), if one called 'cat /proc/<zombie pid>/stack', they would get
> > > > something like this
> > > >
> > > > [<ffffffff8105b877>] do_exit+0x6f7/0xa80
> > > > [<ffffffff8105bc79>] do_group_exit+0x39/0xa0
> > > > [<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
> > > > [<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
> > > > [<00007fd128f9c4f9>] 0x7fd128f9c4f9
> > > > [<ffffffffffffffff>] 0xffffffffffffffff
> > > >
> > > > After, one gets nothing. The trace is empty. try_get_task_stack() contains
> > > > atomic_inc_not_zero() (CONFIG_THREAD_INFO_IN_TASK is now default on
> > > > x86_64) and because stack_refcount is 0 for a zombie task, it returns
> > > > NULL. Therefore, all save_stack_trace_*() functions return immediately.
> > > >
> > > > I guess that no one has cared about it so far. There is a problem for
> > > > live patching though. save_stack_trace_tsk_reliable() returns -EINVAL for
> > > > the zombie task and its stack is deemed unreliable. It could block our
> > > > transition for quite a long time.
> > > >
> > > > We can skip those tasks in kernel/livepatch/ with a simple test we have in
> > > > kGraft. Skip the task if (task->state == TASK_DEAD && task->on_cpu == 0).
> > > > But you may want to change it generally, so better to ask first.
> > > >
> > >
> > > Sounds like a bug in save_stack_trace_tsk_reliable() to me: if the
> > > task has no stack, then the trace is 100% definitely empty :)
> >
> > I would agree with that, something like the following should fix it?
> >
> > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> > index 77835bc021c7..20161ef53537 100644
> > --- a/arch/x86/kernel/stacktrace.c
> > +++ b/arch/x86/kernel/stacktrace.c
> > @@ -164,8 +164,12 @@ int save_stack_trace_tsk_reliable(struct task_struct *tsk,
> > {
> > int ret;
> >
> > + /*
> > + * If the task doesn't have a stack (e.g., a zombie), the stack is
> > + * "reliably" empty.
> > + */
> > if (!try_get_task_stack(tsk))
> > - return -EINVAL;
> > + return 0;
> >
> > ret = __save_stack_trace_reliable(trace, tsk);
>
> This obviously fixes the problem, so you can add
>
> Reported-and-tested-by: Miroslav Benes <[email protected]>
Great.
Josh, mind sending a changelogged version, or should I distill a commit out of
this discussion, for tip:x86/urgent?
Thanks,
Ingo
* Ingo Molnar <[email protected]> wrote:
> > This obviously fixes the problem, so you can add
> >
> > Reported-and-tested-by: Miroslav Benes <[email protected]>
>
> Great.
>
> Josh, mind sending a changelogged version, or should I distill a commit out of
> this discussion, for tip:x86/urgent?
... never mind, later in my mbox I found the patch that Josh already submitted!
Thanks,
Ingo