2004-10-21 13:09:32

by Laurent Dufour

Subject: PROBLEM : Thread signal informations are not freed when it is execing.

Hi,

It seems that under specific circumstances...

[1] A thread's pending signal information is not freed when it calls exec.

[2] If a thread calling exec has pending signals with signal
information attached, this information is unlinked but not freed.

This is easy to detect on earlier 2.4.x kernels thanks to the rtsig-nr
sysctl value (see [6]).

On 2.4 kernels the result is that no RT signal with information can be
sent once rtsig-nr reaches its maximum value. On 2.6 kernels the effect
is not directly visible, but the allocated memory is never freed.

The problem seems to be due to the call to init_sigpending at the end of
de_thread, because it resets the signal list without freeing the linked
items. Perhaps it would be better to use flush_sigqueue.

From reading the de_thread function in the 2.6 kernel, it seems that the
problem still exists.

[3] de_thread, pending signals, exec

[4] 2.4.21 but it seems also applicable to 2.6.8.1

[5] NA

[6] This small test cannot be run on a 2.6 kernel because the
"kernel.rtsig-nr" sysctl value no longer exists. But the memory leak
still seems to be present in the 2.6.8.1 kernel.

/*
 * This small program generates a memory leak of signal info structures.
 * It cannot be run on newer kernels (2.6.x) because rtsig-nr no longer
 * exists, but it seems that the leak is still present in 2.6 kernels.
 *
 * The first time the test program is executed, it will push rtsig-nr to
 * its maximum (1024 is the default for kernel.rtsig-max).
 * After that, no process can send a signal with associated information.
 * You can verify this by running the test program again: it will report
 * that no signal can be sent (pthread_kill returns EAGAIN).
 *
 * WARNING: You have to reboot the node to reset the rtsig-nr value.
 *
 * Laurent Dufour <[email protected]>
 */

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>    /* exit, system */
#include <unistd.h>    /* execlp, pause */
#include <errno.h>
#include <string.h>
#include <signal.h>

#define SIGNAL SIGRTMAX

void *thread_fn(void *unused)
{
    sigset_t sset;
    int i;
    pthread_t me = pthread_self();

    /* printf("thread %p started\n", (void *)pthread_self()); */

    /* Block the signal so that every instance we send stays pending. */
    sigemptyset(&sset);
    sigaddset(&sset, SIGNAL);
    if (pthread_sigmask(SIG_BLOCK, &sset, NULL)) {
        perror("pthread_sigmask");
        exit(1);
    }

    /* Queue signals to ourselves until the kernel refuses. */
    i = 0;
    while (1) {
        int err = pthread_kill(me, SIGNAL);
        if (err) {
            if (!i) {
                printf("no signal sent, pthread_kill : %s\n",
                       strerror(err));
                exit(1);
            }
            break;
        }
        i++;
    }
    /* printf("%d signal sent.\n", i); */

    /* Exec from the thread while the signals are still pending. */
    printf("After thread is execing :");
    execlp("sysctl", "sysctl", "kernel.rtsig-nr", (char *)NULL);

    perror("execlp");

    exit(1);
}

void sig_handler(int signo, siginfo_t *info, void *unused)
{
    /* It should never be called since the signal is blocked. */
    printf("thread %p catch signal %s\n",
           (void *)pthread_self(), strsignal(signo));
}

int main(int argc, char **argv)
{
    pthread_t thread;
    struct sigaction act;

    setbuf(stdout, NULL);

    printf("Before thread's execing : ");
    system("sysctl kernel.rtsig-nr");

    act.sa_sigaction = sig_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART | SA_NOMASK | SA_SIGINFO;
    if (sigaction(SIGNAL, &act, NULL)) {
        perror("sigaction");
        exit(1);
    }

    if (pthread_create(&thread, NULL, thread_fn, NULL)) {
        perror("pthread_create");
        exit(1);
    }

    /* printf("wait for thread call exec...\n"); */
    while (1)
        pause();
}


[7] The environment doesn't seem to have any impact.

[8] Hope that helps.



2004-10-22 09:26:13

by Roland McGrath

Subject: Re: PROBLEM : Thread signal informations are not freed when it is execing.

I don't think these problems are still relevant to the current sources,
though some of them might still occur in 2.6.9. The fix I posted for the
semantics bugs of exec vs pending signals makes de_thread not abandon the
old signal_struct at all, so it holds on to all the data structures. If
you can reproduce any kind of leak using the code now in Linus's tree,
please show me the details.


Thanks,
Roland

2004-10-22 14:53:39

by Laurent Dufour

Subject: Re: PROBLEM : Thread signal informations are not freed when it is execing.

On Fri 22/10/2004 at 11:17, Roland McGrath wrote:
> I don't think these problems are still relevant to the current sources,
> though some of them might still occur in 2.6.9. The fix I posted for the
> semantics bugs of exec vs pending signals makes de_thread not abandon the
> old signal_struct at all, so it holds on to all the data structures. If
> you can reproduce any kind of leak using the code now in Linus's tree,
> please show me the details.

I didn't find the post you're talking about, but after reading the 2.6.9
de_thread function, I didn't find any major difference from the previous
release. So I suppose that the leak is still present, and I have some
details.

I wrote a new test case that works with 2.6.x kernels. I ran it on a
Fedora Core 2 node with a 2.6.8-1.521 kernel and also with the new 2.6.9
kernel, and it also produced a leak in sigqueue buffers. It can be
seen by looking at the sigqueue cache info in /proc/slabinfo.

It seems that the bug is due to the call to
init_sigpending(&current->pending) at line 737 of de_thread.
It breaks the current sigqueue pending list, and all the linked entries
are lost. As a consequence, the current->user->pending count is not
updated either.

Here is the new test case. Of course you have to compile it with the
-lpthread flag.

/*
 * This small program generates a memory leak of signal info structures.
 *
 * The first time the test program is executed, it will send about 1024
 * signals, depending on the RLIMIT_SIGPENDING value (default is 1024).
 * Then it will exec itself and try to send signals again. That will
 * fail.
 *
 * After that, you will not be able to run it again since
 * user->sigpending has reached the maximum value. But there is no
 * active task with pending signals!
 *
 * You can also see in /proc/slabinfo that the sigqueue pool has grown
 * by 1024. These sigqueue entries are lost and the user can no longer
 * send signals with signal info.
 *
 * WARNING: You have to reboot the node to restore the sigqueue entries.
 *
 * Laurent Dufour <[email protected]>
 */

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>    /* exit */
#include <unistd.h>    /* execlp, pause */
#include <errno.h>
#include <string.h>
#include <signal.h>

#define SIGNAL SIGRTMAX

char *pname;

void *thread_fn(void *unused)
{
    sigset_t sset;
    int i;
    pthread_t me = pthread_self();

    /* Block the signal so that every instance we send stays pending. */
    sigemptyset(&sset);
    sigaddset(&sset, SIGNAL);
    if (pthread_sigmask(SIG_BLOCK, &sset, NULL)) {
        perror("pthread_sigmask");
        exit(1);
    }

    /* Queue signals to ourselves until the kernel refuses. */
    i = 0;
    while (1) {
        int err = pthread_kill(me, SIGNAL);
        if (err) {
            if (!i) {
                printf("no signal sent, pthread_kill : %s\n",
                       strerror(err));
                exit(1);
            }
            break;
        }
        i++;
    }
    printf("%d signals sent. Calling exec %s\n", i, pname);

    /* Re-exec ourselves from the thread with the signals still pending. */
    execlp(pname, pname, (char *)NULL);

    perror(pname);

    exit(1);
}

void sig_handler(int signo, siginfo_t *info, void *unused)
{
    /* It should never be called since the signal is blocked. */
    printf("thread %p catch signal %s\n",
           (void *)pthread_self(), strsignal(signo));
}

int main(int argc, char **argv)
{
    pthread_t thread;
    struct sigaction act;

    setbuf(stdout, NULL);

    pname = argv[0];

    act.sa_sigaction = sig_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART | SA_NOMASK | SA_SIGINFO;
    if (sigaction(SIGNAL, &act, NULL)) {
        perror("sigaction");
        exit(1);
    }

    if (pthread_create(&thread, NULL, thread_fn, NULL)) {
        perror("pthread_create");
        exit(1);
    }

    while (1)
        pause();
}

I hope that will help you.

Laurent.



2004-10-22 22:01:29

by Roland McGrath

Subject: Re: PROBLEM : Thread signal informations are not freed when it is execing.

> I didn't find the post you're talking about, but after reading the 2.6.9
> de_thread function, I didn't find any major difference from the previous
> release.

Like I said, the relevant fix is in *after* 2.6.9, in the current sources
you can get now from bk/bkcvs or snapshots.

> I wrote a new test case that works with 2.6.x kernels. I ran it on a
> Fedora Core 2 node with a 2.6.8-1.521 kernel and also with the new 2.6.9
> kernel, and it also produced a leak in sigqueue buffers. It can be
> seen by looking at the sigqueue cache info in /proc/slabinfo.

Thanks. I have verified that your test case produces the leak without my
patch, and has no leak with my patch. The patch is in current sources but
not 2.6.9, and you can find it here:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/broken-out/exec-fix-posix-timers-leak-and-pending-signal-loss.patch


Thanks,
Roland