2005-09-22 12:13:53

by Michael Kerrisk

Subject: Process with many NPTL threads terminates slowly on core dump signal

Roland, Ingo,

I'm guessing that one of you might best be able to determine the
cause of the behaviour I'm seeing below.

I wrote a program (below) to investigate the operation of the
RLIMIT_CPU resource limit, and I encountered something strange: if the
program creates a large number of threads, then it takes a very long
time to terminate if it receives a signal that may cause a core dump.

I first noticed this happening on receipt of a SIGXCPU (since the
program is designed to consume infinite CPU time). However, I then
determined that the behaviour occurs on receipt of any signal that
can generate a core dump.

Here's an example run (Linux 2.6.14-rc2, x86, NPTL 2.3.4) -- the
program is asked to set a soft RLIMIT_CPU of 5 seconds, and to
create 20 threads:

==========
$ ulimit -c 0
$ time ./thread_share_RLIMIT_CPU 5 20
Linux tekapo 2.6.14-rc2 #6 SMP PREEMPT Wed Sep 21 09:29:36 CEST 2005 i686
i686 i386 GNU/Linux
Thu Sep 22 14:01:16 CEST 2005
Main thread changed soft CPU resource limit to: 5
Thread 3 cpu=0.21
Thread 3 cpu=0.42
Thread 3 cpu=0.63
Thread 3 cpu=0.84
Thread 3 cpu=1.05
Thread 3 cpu=1.26
[Here, I type ^\, generating SIGQUIT]
Quit

real 0m13.690s
user 0m1.388s
sys 0m11.977s
==========

In the above, one can see a large amount of (system) CPU time
consumed.

In a similar run, sending SIGINT (^C) does not cause this long delay
before process termination. Some further experimentation determined
the following:

* The slow process termination time only occurs for those signals that
may generate core dumps (verified by sending signals using kill(1)
from another terminal window).

* The slow termination occurs even if RLIMIT_CORE is set to 0 (see
the example above).

* If the example program is made set-user-ID-root (thus
mm->dumpable == 0), then the problem does not occur.

* I wondered if the fact that the threads were all trying to hog CPU
might have some bearing on the problem. However, even if I make
the threads pause() (sketched just below), the problem is still
observable.
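(The pause() variant is essentially the following minimal sketch; in the
full program below it corresponds to uncommenting the pause() call in
tfunc(). The function name here is only for illustration.)

static void *
tfunc_pausing(void *x)
{
    /* Block indefinitely instead of burning CPU; with the threads
       doing this, the slow termination on a core dump signal still
       occurs. */
    for (;;)
        pause();
    return NULL;    /* not reached */
}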

All of this makes me think the problem might be somewhere around
do_coredump()/coredump_wait(), but the cause is not immediately
obvious to me.

Cheers,

Michael


/* thread_share_RLIMIT_CPU.c

Usage: thread_share_RLIMIT_CPU [CPU-soft-limit [num-threads]]
*/
#include <sys/types.h>
#include <sys/times.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define errExit(msg) { perror(msg); exit(EXIT_FAILURE); }

#define errExitEN(en, msg) { errno = en; perror(msg); \
exit(EXIT_FAILURE); }

static void *
tfunc(void *x)
{
    struct tms tms;
    int cpuCentisecs, prevCpu, j;

    usleep(20000);          /* Give main() a small chance to create other
                               threads before we start consuming CPU time */

    // pause();

    /* Following consumes CPU time and prints messages
       allowing us to track CPU consumption */

    prevCpu = 0;
    for (;;) {
        for (j = 0; j < 100000; j++)
            continue;       /* So main loop consumes mostly
                               user-mode CPU time */

        if (times(&tms) == -1) errExit("times");
        cpuCentisecs = (tms.tms_utime + tms.tms_stime) * 100 /
                       sysconf(_SC_CLK_TCK);
        if (cpuCentisecs > prevCpu + 20) {
            printf("Thread %d cpu=%0.2f\n", (int) x,
                   cpuCentisecs / 100.0);
            prevCpu = cpuCentisecs;
        }
    }
    return NULL;
}

int
main(int argc, char *argv[])
{
    pthread_t thr;
    int s, numThreads, tn;
    struct rlimit rlim;

    system("uname -a; date");

    if (getrlimit(RLIMIT_CPU, &rlim) == -1) errExit("getrlimit");
    rlim.rlim_cur = (argc > 1) ? atoi(argv[1]) : 3;
    if (setrlimit(RLIMIT_CPU, &rlim) == -1)
        errExit("setrlimit-ORIG_LIMIT");

    if (getrlimit(RLIMIT_CPU, &rlim) == -1) errExit("getrlimit");
    printf("Main thread changed soft CPU resource limit to: %ld\n",
           (long) rlim.rlim_cur);

    numThreads = (argc > 2) ? atoi(argv[2]) : 2;

    for (tn = 0; tn < numThreads; tn++) {
        s = pthread_create(&thr, NULL, tfunc, (void *) tn);
        if (s != 0) errExitEN(s, "pthread_create");
    }

    pause();
}   /* main */



2005-09-22 13:10:29

by Michael Kerrisk

Subject: Re: Process with many NPTL threads terminates slowly on core dump signal

Just one further point to add to my earlier report. Some investigation
shows that the problem seems to have appeared with kernel 2.6.11.

Cheers,

Michael


2005-09-24 15:28:44

by Daniel Jacobowitz

Subject: Re: Process with many NPTL threads terminates slowly on core dump signal

On Thu, Sep 22, 2005 at 02:13:50PM +0200, Michael Kerrisk wrote:
> I first noticed this happening on receipt of a SIGXCPU (since the
> program is designed to consume infinite CPU time). However, I then
> determined that the behaviour occurs on receipt of any signal that
> can generate a core dump.

You might want to try using /proc/profile (profile= option) to see
where all the kernel time is going?


--
Daniel Jacobowitz
CodeSourcery, LLC

2005-10-12 13:34:59

by Michael Kerrisk

Subject: Re: Process with many NPTL threads terminates slowly on core dump signal

Roland,

Following up (belatedly) from my earlier message, I took Daniel
Jacobowitz's suggestion to investigate the result from booting
with "profile=2". When running my program (shwon below) on
2.6.14-rc4 to create 100 threads, and sending a core dump signal,
the program takes 90 seconds to terminate, and readprofile shows
the following:

# readprofile|sort -k3nr|head -10
15504 _spin_unlock_irq 484.5000
3648 do_notify_resume 60.8000
3867 acpi_processor_idle 5.8063
63 resume_userspace 2.6250
630 do_signal 1.9688
44 _spin_unlock_irqrestore 1.3750
21 work_notifysig 0.8400
17 fput 0.5312
185 get_signal_to_deliver 0.2141
26 fget 0.1806

The relevant call chain seems to be:

resume_userspace()
work_notifysig()
do_notify_resume()
do_signal()
get_signal_to_deliver()

And these lines in do_notify_resume() (which is called
heavily, while do_signal() does not seem to be):

	/* deal with pending signal delivery */
	if (thread_info_flags & _TIF_SIGPENDING)
		do_signal(regs,oldset);

make me suspect that the issue could be something to do
with _TIF_SIGPENDING being erroneously cleared. And the
fact that the behaviour appeared in 2.6.11, where you made
some changes to the signal handling code, makes me wonder
whether the following patch might have triggered this behaviour:

http://marc.theaimsgroup.com/?l=linux-kernel&m=110567323014110&w=2
Subject: [PATCH] clear false pending signal indication in core dump
From: Roland McGrath <roland () redhat ! com>
Date: 2005-01-14 3:05:59

This patch adds the following lines to do_coredump()

+	/*
+	 * Clear any false indication of pending signals that might
+	 * be seen by the filesystem code called to write the core file.
+	 */
+	current->signal->group_stop_count = 0;
+	clear_thread_flag(TIF_SIGPENDING);

However, I might be right off the track here, as I haven't
completely understood the code. Do you have any thoughts?

Cheers,

Michael


> Betreff: Process with many NPTL threads terminates slowly on core dump
> signal
> Datum: Thu, 22 Sep 2005 14:13:50 +0200 (MEST)
>
> [...]


2005-10-12 17:21:08

by Oleg Nesterov

Subject: Re: Process with many NPTL threads terminates slowly on core dump signal

Michael Kerrisk wrote:
>
> Following up (belatedly) from my earlier message, I took Daniel
> Jacobowitz's suggestion to investigate the result from booting
> with "profile=2". When running my program (shwon below) on
> 2.6.14-rc4 to create 100 threads, and sending a core dump signal,
> the program takes 90 seconds to terminate, and readprofile shows
> the following:

I think the coredumping code in __group_complete_signal() is bogus
and what happens is:

group_send_sig_info(P, SIGQUIT):

    adds SIGQUIT to ->shared_pending

    __group_complete_signal:

        ->signal->group_exit_task = P;

        for_each_thread(t) {
            P->signal->group_stop_count ++;
            // sets TIF_SIGPENDING
            signal_wake_up(t)
        }


Now, P receives the signal:

get_signal_to_deliver:

    if (->signal->group_stop_count > 0)                     // YES
        handle_group_stop():
            if (->signal->group_exit_task == current) {     // YES
                ->signal->group_exit_task = NULL
                return 0;
            }

    signr = dequeue_signal();                               // SIGQUIT

do_coredump:

    ->signal->flags = SIGNAL_GROUP_EXIT;

    coredump_wait:

        yield();


Now all other threads do:

get_signal_to_deliver:

    if (->signal->group_stop_count > 0)                     // YES
        handle_group_stop():
            if (->signal->flags & SIGNAL_GROUP_EXIT)        // YES
                return 0;

    signr = dequeue_signal();       // no pending signals
    // recalc_sigpending_tsk() DOES NOT clear TIF_SIGPENDING,
    // because it sees ->group_stop_count != 0.

    return 0;

TIF_SIGPENDING is not cleared, so get_signal_to_deliver() will be
called again on every return to userspace. Only once all the other
threads have eaten their ->time_slice will P return from yield() and
kill all the threads.
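(For reference, the recalc_sigpending_tsk() logic in kernel/signal.c
that keeps the flag set looks roughly like this in kernels of this era;
this is paraphrased from the source, not a verbatim copy.)

	void recalc_sigpending_tsk(struct task_struct *t)
	{
		/* PENDING() checks for pending signals that are not blocked.
		 * The flag is kept set whenever a group stop is in progress,
		 * even if no signal is actually pending for this thread. */
		if (t->signal->group_stop_count > 0 ||
		    PENDING(&t->pending, &t->blocked) ||
		    PENDING(&t->signal->shared_pending, &t->blocked))
			set_tsk_thread_flag(t, TIF_SIGPENDING);
		else
			clear_tsk_thread_flag(t, TIF_SIGPENDING);
	}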

Could you try this patch (added to mm tree):
http://marc.theaimsgroup.com/?l=linux-kernel&m=112887453531139
? It does not solve the whole problem, but may help.

Please report the result, if possible.

Oleg.

2005-10-13 08:41:41

by Michael Kerrisk

Subject: Re: Process with many NPTL threads terminates slowly on core dump signal

> Von: Oleg Nesterov <[email protected]>
> Datum: Wed, 12 Oct 2005 21:33:46 +0400
>
> Michael Kerrisk wrote:
> >
> > Following up (belatedly) from my earlier message, I took Daniel
> > Jacobowitz's suggestion to investigate the result from booting
> > with "profile=2". When running my program (shwon below) on
> > 2.6.14-rc4 to create 100 threads, and sending a core dump signal,
> > the program takes 90 seconds to terminate, and readprofile shows
> > the following:
>
> I think the coredumping code in __group_complete_signal() is bogus
> and what happens is:

[...]

> TIF_SIGPENDING is not cleared, so get_signal_to_deliver() will be
> called again on return to userspace. When all threads will eat their
> ->time_slice, P will return from yield() and kill all threads.

Thanks for investigating this further.

> Could you try this patch (added to mm tree):
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112887453531139
> ? It does not solve the whole problem, but may help.
>
> Please report the result, if possible.

Thanks. I've applied it to 2.6.14-rc4: this patch does fix the
specific behaviour that my program demonstrates.

What remains to be solved?

Cheers,

Michael


2005-10-13 17:05:50

by Oleg Nesterov

Subject: Re: Process with many NPTL threads terminates slowly on core dump signal

Michael Kerrisk wrote:
>
> Thanks. I've applied it to 2.6.14-rc4: this patch does fix the
> specific behaviour that my program demonstrates.
>
> What remains to be solved?

The problem remains that all threads spin with the TIF_SIGPENDING flag
set after the coredumping thread sets SIGNAL_GROUP_EXIT in do_coredump()
and before it kills them in zap_threads(). If one of these threads is
SCHED_FIFO, we have a deadlock. Also, the coredumping thread could be
preempted, placed in the ->expired array ...
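(Schematically, each of the remaining threads is stuck doing something
like the following until zap_threads() runs; this is a simplified
sketch of the return-to-user path, not actual kernel code.)

	for (;;) {
		/* arch return-to-user path: TIF_SIGPENDING is still set */
		do_signal(regs, oldset);
		/* get_signal_to_deliver() sees SIGNAL_GROUP_EXIT and a
		 * non-zero ->group_stop_count, returns 0 without dequeueing
		 * a signal and without clearing TIF_SIGPENDING, so the
		 * thread comes straight back around.  A SCHED_FIFO thread
		 * spinning here never lets the coredumping thread run
		 * again. */
	}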

Oleg.