2012-08-21 13:05:25

by Eric Dumazet

Subject: [PATCH] task_work: add a scheduling point in task_work_run()

From: Eric Dumazet <[email protected]>

It seems commit 4a9d4b02 (switch fput to task_work_add) reintroduced
the problem addressed in commit 944be0b2 (close_files(): add scheduling
point)

If a server process with a lot of files (say 2 million tcp sockets)
is killed, we can spend a lot of time in task_work_run() and trigger
a soft lockup.

Signed-off-by: Eric Dumazet <[email protected]>
---
kernel/task_work.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 91d4e17..d320d44 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -75,6 +75,7 @@ void task_work_run(void)
 			p = q->next;
 			q->func(q);
 			q = p;
+			cond_resched();
 		}
 	}
 }


2012-08-21 20:39:30

by Mimi Zohar

Subject: Re: [PATCH] task_work: add a scheduling point in task_work_run()

On Tue, 2012-08-21 at 15:05 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <[email protected]>
>
> It seems commit 4a9d4b02 (switch fput to task_work_add) reintroduced
> the problem addressed in commit 944be0b2 (close_files(): add scheduling
> point)
>
> If a server process with a lot of files (say 2 million tcp sockets)
> is killed, we can spend a lot of time in task_work_run() and trigger
> a soft lockup.
>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> kernel/task_work.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 91d4e17..d320d44 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -75,6 +75,7 @@ void task_work_run(void)
>  			p = q->next;
>  			q->func(q);
>  			q = p;
> +			cond_resched();
>  		}
>  	}
>  }

We're here, because fput() called schedule_work() to delay the last
fput(). The execution needs to take place before the syscall returns to
userspace. Need to read __schedule()... Do you know if cond_resched()
can guarantee that it will be executed before the return to userspace?

thanks,

Mimi

2012-08-21 21:32:14

by Eric Dumazet

Subject: Re: [PATCH] task_work: add a scheduling point in task_work_run()

On Tue, 2012-08-21 at 16:37 -0400, Mimi Zohar wrote:

> We're here, because fput() called schedule_work() to delay the last
> fput(). The execution needs to take place before the syscall returns to
> userspace. Need to read __schedule()... Do you know if cond_resched()
> can guarantee that it will be executed before the return to userspace?

Some clarifications:

- fput() does not call schedule_work() in this case, but task_work_add().

- cond_resched() won't return to userspace.

2012-08-22 03:15:19

by Mimi Zohar

Subject: Re: [PATCH] task_work: add a scheduling point in task_work_run()

On Tue, 2012-08-21 at 23:32 +0200, Eric Dumazet wrote:
> On Tue, 2012-08-21 at 16:37 -0400, Mimi Zohar wrote:
>
> > We're here, because fput() called schedule_work() to delay the last
> > fput(). The execution needs to take place before the syscall returns to
> > userspace. Need to read __schedule()... Do you know if cond_resched()
> > can guarantee that it will be executed before the return to userspace?
>
> Some clarifications:
>
> - fput() does not call schedule_work() in this case, but task_work_add().
>
> - cond_resched() won't return to userspace.

Thanks for the clarification.

Mimi

2012-08-22 05:27:34

by Michael Wang

Subject: Re: [PATCH] task_work: add a scheduling point in task_work_run()

Hi, Eric

On 08/21/2012 09:05 PM, Eric Dumazet wrote:
> From: Eric Dumazet <[email protected]>
>
> It seems commit 4a9d4b02 (switch fput to task_work_add) reintroduced
> the problem addressed in commit 944be0b2 (close_files(): add scheduling
> point)
>
> If a server process with a lot of files (say 2 million tcp sockets)
> is killed, we can spend a lot of time in task_work_run() and trigger
> a soft lockup.

The thread will already be rescheduled if kernel preemption is enabled,
so this change may only help when CONFIG_PREEMPT is not set, won't it?
What about wrapping the call in #ifndef?

And can we be sure that it is safe to sleep (schedule) at this point?
It may need thorough testing to cover all the situations...

Regards,
Michael Wang

>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> kernel/task_work.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 91d4e17..d320d44 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -75,6 +75,7 @@ void task_work_run(void)
>  			p = q->next;
>  			q->func(q);
>  			q = p;
> +			cond_resched();
>  		}
>  	}
>  }

2012-08-22 05:38:56

by Al Viro

Subject: Re: [PATCH] task_work: add a scheduling point in task_work_run()

On Wed, Aug 22, 2012 at 01:27:21PM +0800, Michael Wang wrote:

> And can we be sure that it is safe to sleep (schedule) at this point?
> It may need thorough testing to cover all the situations...

task_work callback can bloody well block, so yes, it is safe. Hell,
we are doing final close from that; that can lead to any amount of
IO, up to and including on-disk file freeing and, in case of vfsmount
kept alive by an opened file after we'd done umount -l, actual final
unmount of a filesystem. That can more than just block, that can block
for a long time if that's a network filesystem...