2016-03-08 00:19:42

by Simon Kirby

Subject: Hung task detector versus NFS (TASK_KILLABLE)

Hello!

Back in 2008, you committed 316d9679f33caf7e683471647d1472bfe133d858
which changed softlockup.c (now moved to hung_task.c) to avoid logging a
spew of soft lockup warnings when the Ethernet cable is unplugged with
active NFS mounts.

Meanwhile, I've been seeing hung task warnings like this for years, so I
wondered what the deal is. It seems there are VFS paths that can enter
uninterruptible sleep as a result of locks held by tasks in interruptible sleep.

For example, I can reproduce hung task warnings by firewalling NFS, then
"cat a" twice: the second hangs in mutex_lock() from path_openat(), which
then spews a hung task warning.

I write this because I would actually find it useful to see the original
backtrace, even if it is interruptible, not just the collateral damage.
Since the "skipping" of NFS is basically incomplete anyway, how big a
deal is this "feature"?

Would anybody object if we just returned this to anything blocked?

The lines in question these days are here in kernel/hung_task.c:

	/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
	if (t->state == TASK_UNINTERRUPTIBLE)
		check_hung_task(t, timeout);

It used to be t->state & TASK_UNINTERRUPTIBLE.

Simon-


2016-03-08 03:11:27

by Andi Kleen

Subject: Re: Hung task detector versus NFS (TASK_KILLABLE)

> I write this because I would actually find it useful to see the original
> backtrace, even if it is interruptible, not just the collateral damage.
> Since the "skipping" of NFS is basically incomplete anyway, how big a
> deal is this "feature"?

Random backtrace spewing is always a misfeature for 99.99+% of the users
for whom it is gibberish.

If you really need it yourself add a kprobe.

-Andi

2016-03-09 20:23:21

by Simon Kirby

Subject: Re: Hung task detector versus NFS (TASK_KILLABLE)

On Mon, Mar 07, 2016 at 07:11:19PM -0800, Andi Kleen wrote:

> > I write this because I would actually find it useful to see the original
> > backtrace, even if it is interruptible, not just the collateral damage.
> > Since the "skipping" of NFS is basically incomplete anyway, how big a
> > deal is this "feature"?
>
> Random backtrace spewing is always a misfeature for 99.99+% of the users
> for whom it is gibberish.

Distributions all seem to ship with it on because apparently some people
can read it. There was even discussion that the default 10 is not enough.

> If you really need it yourself add a kprobe.

To emulate a hung task backtrace even when TASK_KILLABLE? That sounds
like some hoop-jumping, but I don't know kprobes.

I'm just saying the current "NFS filter" is broken ("cat a" twice). But if
this really would make more noise for people (in cases where NFS is stuck
for minutes), I guess I'll just sit in a corner with that line changed in
my tree.

Simon-