2021-10-11 19:41:32

by Dave Jones

[permalink] [raw]
Subject: Should EXIT_DEAD be visible to userspace ?

One of our users reported a crash in some userspace tooling this
morning, which scrapes /proc/pid to gather stack traces, process states
etc of everything running at the time.

The crash occurred because it fell over an unexpected task state,
which was 'X'. According to the procps man-pages, this state should
never be seen, but here it clearly was.

The kernel running at the time was kinda old (5.2) but I don't see much
change in the EXIT_DEAD space that would explain something that got
fixed subsequently. It's also probably going to be difficult to
reproduce unfortunately.

So my question is, is procps wrong and code should expect to see X state
processes in proc ? The code in question is being hardened to handle
unexpected inputs, but I'm curious if the kernel is leaking some state
that it shouldn't.

Dave


2021-10-11 20:35:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: Should EXIT_DEAD be visible to userspace ?

On Mon, Oct 11, 2021 at 12:40 PM Dave Jones <[email protected]> wrote:
>
> One of our users reported a crash in some userspace tooling this
> morning, which scrapes /proc/pid to gather stack traces, process states
> etc of everything running at the time.
>
> The crash occurred because it fell over an unexpected task state,
> which was 'X'. According to the procps man-pages, this state should
> never be seen, but here it clearly was.

Heh.

> The kernel running at the time was kinda old (5.2) but I don't see much
> change in the EXIT_DEAD space that would explain something that got
> fixed subsequently. It's also probably going to be difficult to
> reproduce unfortunately.
>
> So my question is, is procps wrong and code should expect to see X state
> processes in proc ? The code in question is being hardened to handle
> unexpected inputs, but I'm curious if the kernel is leaking some state
> that it shouldn't.

My gut feel is that the man-pages have clearly been wrong - or at
least misleading - for at least the last couple of years (and possibly
longer), and this is the first report we've ever had of it actually
causing problems.

The docs *do* mention 'X'. Even if they say 'should never be seen',
it's not like it's not right there.

So we could either ask to just have the man-pages fixed to be a little
less strongly worded ("never" -> "seldom" or whatever). And apparently
procps is already getting fixed.

Or we could hide the 'X' state in newer kernels, and just call them
zombies to user space. We could literally just change the string from
"X (dead)" to "Z (dead)" and the "dead" part would still be there (and
different from "Z (zombie)").

And either way, it's likely not going to be something that people will
notice ever again. You update your system, and you wouldn't see the
problem, because whether the kernel was changed or not, procps was
updated.

And if the argument is that people didn't update procps, but *did*
update the kernel, then sure, that could avoid somebody hitting this
again, but that's where the "at least a couple of years and nobody has
noticed before" comes in.

So I can certainly take a patch that hides 'X', and we can even mark
it for stable.

But it feels like realistically nobody will actually care, and the
real fix is the one to procps, and that fix will make any kernel
change irrelevant (and possibly even a slight negative, since now
procps might report interesting cases?).

End result: if somebody cares enough and sends me a tested patch, I'll
apply it. But I personally wouldn't care much, and wouldn't push for
it unless we get somebody who does.

And I really think that "should never be seen" in the docs is just wrong.

Yes, we hold the task lock in task_state() when you look at
/proc/<pid>/status. So maybe it will never be seen there, because
maybe (I didn't check) we move from X->Z while holding the lock.

But other parts of /proc don't actually do that lock_task(), I think.
IOW, looking at /proc/<pid>/stat (which shows just the first letter of
the state), doesn't do that, I think. So it's not actually serialized
with the process going through its dying moments.

So I _think_ the "never" was always just "almost never".

In fact, take a look at commit ad86622b478e ("wait: swap EXIT_ZOMBIE
and EXIT_DEAD to hide EXIT_TRACE from user-space"). That 'X' has been
seen before. It's not great when it happens, but I think this is an
example of "the 'never' has never been true, and goes back to at least
2014".

.. and that 2014 was just when we happened on it before. The actual
issue of 'X' showing up looks like it probably predates it (and likely
goes back to before git history).

So I suspect that stray 'X' is not actually a regression. Just rare
enough to be _almost_ "never".

Linus

2021-10-12 10:45:06

by Christian Brauner

[permalink] [raw]
Subject: Re: Should EXIT_DEAD be visible to userspace ?

On Mon, Oct 11, 2021 at 01:33:21PM -0700, Linus Torvalds wrote:
> On Mon, Oct 11, 2021 at 12:40 PM Dave Jones <[email protected]> wrote:
> >
> > One of our users reported a crash in some userspace tooling this
> > morning, which scrapes /proc/pid to gather stack traces, process states
> > etc of everything running at the time.
> >
> > The crash occurred because it fell over an unexpected task state,
> > which was 'X'. According to the procps man-pages, this state should
> > never be seen, but here it clearly was.
>
> Heh.
>
> > The kernel running at the time was kinda old (5.2) but I don't see much
> > change in the EXIT_DEAD space that would explain something that got
> > fixed subsequently. It's also probably going to be difficult to
> > reproduce unfortunately.
> >
> > So my question is, is procps wrong and code should expect to see X state
> > processes in proc ? The code in question is being hardened to handle
> > unexpected inputs, but I'm curious if the kernel is leaking some state
> > that it shouldn't.
>
> My gut feel is that the man-pages have clearly been wrong - or at
> least misleading - for at least the last couple of years (and possibly
> longer), and this is the first report we've ever had of it actually
> causing problems.
>
> The docs *do* mention 'X'. Even if they say 'should never be seen',
> it's not like it's not right there.
>
> So we could either ask to just have the man-pages fixed to be a little
> less strongly worded ("never" -> "seldom" or whatever). And apparently
> procps is already getting fixed.
>
> Or we could hide the 'X' state in newer kernels, and just call them
> zombies to user space. We could literally just change the string from
> "X (dead)" to "Z (dead)" and the "dead" part would still be there (and
> different from "Z (zombie)").
>
> And either way, it's likely not going to be something that people will
> notice ever again. You update your system, and you wouldn't see the
> problem, because whether the kernel was changed or not, procps was
> updated.
>
> And if the argument is that people didn't update procps, but *did*
> update the kernel, then sure, that could avoid somebody hitting this
> again, but that's where the "at least a couple of years and nobody has
> noticed before" comes in.
>
> So I can certainly take a patch that hides 'X', and we can even mark
> it for stable.
>
> But it feels like realistically nobody will actually care, and the
> real fix is the one to procps, and that fix will make any kernel
> change irrelevant (and possibly even a slight negative, since now
> procps might report interesting cases?).
>
> End result: if somebody cares enough and sends me a tested patch, I'll
> apply it. But I personally wouldn't care much, and wouldn't push for
> it unless we get somebody who does.
>
> And I really think that "should never be seen" in the docs is just wrong.
>
> Yes, we hold the task lock in task_state() when you look at
> /proc/<pid>/status. So maybe it will never be seen there, because
> maybe (I didn't check) we move from X->Z while holding the lock.
>
> But other parts of /proc don't actually do that lock_task(), I think.
> IOW, looking at /proc/<pid>/stat (which shows just the first letter of
> the state), doesn't do that, I think. So it's not actually serialized
> with the process going through its dying moments.
>
> So I _think_ the "never" was always just "almost never".
>
> In fact, take a look at commit ad86622b478e ("wait: swap EXIT_ZOMBIE
> and EXIT_DEAD to hide EXIT_TRACE from user-space"). That 'X' has been
> seen before. It's not great when it happens, but I think this is an
> example of "the 'never' has never been true, and goes back to at least
> 2014".
>
> .. and that 2014 was just when we happened on it before. The actual
> issue of 'X' showing up looks like it probably predates it (and likely
> goes back to before git history).
>
> So I suspect that stray 'X' is not actually a regression. Just rare
> enough to be _almost_ "never".

Imho, let's
1. update the manpage
2. update procps
and call it done.

Christian