Hi,
I found that the pid in a multi-threaded core file's name is sometimes
wrong in 2.5.59 with NPTL-0.17. The problem is that the pid in the core
file name comes from current->pid, but I think it should be current->tgid.
The following patch fixes this problem.
MAEDA Naoaki
diff -Naur linux-2.5.59/fs/exec.c linux-2.5.59-corepidfix/fs/exec.c
--- linux-2.5.59/fs/exec.c 2003-01-17 11:22:02.000000000 +0900
+++ linux-2.5.59-corepidfix/fs/exec.c 2003-01-25 13:20:50.000000000 +0900
@@ -1166,7 +1166,7 @@
case 'p':
pid_in_pattern = 1;
rc = snprintf(out_ptr, out_end - out_ptr,
- "%d", current->pid);
+ "%d", current->tgid);
if (rc > out_end - out_ptr)
goto out;
out_ptr += rc;
@@ -1238,7 +1238,7 @@
if (!pid_in_pattern
&& (core_uses_pid || atomic_read(&current->mm->mm_users) != 1)) {
rc = snprintf(out_ptr, out_end - out_ptr,
- ".%d", current->pid);
+ ".%d", current->tgid);
if (rc > out_end - out_ptr)
goto out;
out_ptr += rc;
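To illustrate what the patch changes, here is a minimal user-space sketch, not kernel code. The `task_ids` struct and the `core_name` and `names_collide` helpers are hypothetical stand-ins invented for this example; only the `pid` and `tgid` field names come from the patch. It assumes a fixed "core.<id>" name and shows that a non-leader thread gets a different file name depending on which field is used.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two task_struct fields the patch touches. */
struct task_ids {
    int pid;   /* per-thread ID */
    int tgid;  /* thread group ID, shared by all threads of one process */
};

/*
 * Write "core.<id>" into buf, using tgid when use_tgid is non-zero and
 * pid otherwise. Returns 0 on success, -1 if buf is too small.
 */
int core_name(char *buf, size_t len, const struct task_ids *t, int use_tgid)
{
    int id = use_tgid ? t->tgid : t->pid;
    int rc = snprintf(buf, len, "core.%d", id);

    if (rc < 0 || (size_t)rc >= len)
        return -1;
    return 0;
}

/*
 * Return 1 if tasks a and b would be given the same core file name,
 * 0 otherwise (including when a name does not fit in the local buffer).
 */
int names_collide(const struct task_ids *a, const struct task_ids *b,
                  int use_tgid)
{
    char name_a[32];
    char name_b[32];

    if (core_name(name_a, sizeof name_a, a, use_tgid) != 0 ||
        core_name(name_b, sizeof name_b, b, use_tgid) != 0)
        return 0;
    return strcmp(name_a, name_b) == 0;
}
```

With a group leader whose pid and tgid are both 1234 and a worker thread with pid 1237 and tgid 1234, the two names differ when pid is used and are identical when tgid is used.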
On Sat, 2003-01-25 at 04:56, MAEDA Naoaki wrote:
> Hi,
>
> I found that the pid in a multi-threaded core file's name is sometimes
> wrong in 2.5.59 with NPTL-0.17. The problem is that the pid in the core
> file name comes from current->pid, but I think it should be current->tgid.
The value needs to be unique so that you can dump multiple threads
at the same time and not have one overwrite another. You might want
to add the tgid as another format type to the core name formatting so
users can select the behaviour you desire, however?
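The suggestion above, a separate format type for the tgid, could look roughly like the following user-space sketch. It is not the kernel's format_corename. The `%T` specifier is HYPOTHETICAL and chosen only for illustration, and the `task_ids` struct and `expand_core_pattern` function are made-up names. Only `%p`, the `out_ptr`/`out_end` bookkeeping, and the `pid`/`tgid` fields mirror the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two task_struct fields under discussion. */
struct task_ids {
    int pid;   /* per-thread ID */
    int tgid;  /* thread group ID */
};

/*
 * Expand pattern into out (capacity len, including the terminating NUL).
 * Recognised specifiers in this sketch:
 *   %%  literal '%'
 *   %p  per-thread pid
 *   %T  tgid (HYPOTHETICAL specifier, for illustration only)
 * Unknown specifiers and a lone trailing '%' are dropped.
 * Returns 0 on success, -1 if the result does not fit.
 */
int expand_core_pattern(char *out, size_t len, const char *pattern,
                        const struct task_ids *t)
{
    char *out_ptr = out;
    char *out_end = out + len;
    const char *p;
    int rc;

    if (len == 0)
        return -1;

    for (p = pattern; *p != '\0'; p++) {
        if (*p != '%') {
            /* Always keep one byte free for the final NUL. */
            if (out_end - out_ptr < 2)
                return -1;
            *out_ptr++ = *p;
            continue;
        }
        p++;                      /* move to the specifier character */
        if (*p == '\0')
            break;                /* lone trailing '%' */
        switch (*p) {
        case '%':
            if (out_end - out_ptr < 2)
                return -1;
            *out_ptr++ = '%';
            break;
        case 'p':
        case 'T':
            rc = snprintf(out_ptr, (size_t)(out_end - out_ptr), "%d",
                          *p == 'p' ? t->pid : t->tgid);
            if (rc < 0 || rc >= out_end - out_ptr)
                return -1;
            out_ptr += rc;
            break;
        default:
            break;                /* unknown specifier: emit nothing */
        }
    }
    *out_ptr = '\0';
    return 0;
}
```

Keeping `%p` unchanged and adding a second specifier would leave existing patterns alone while letting users opt into tgid-based names.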
On Tue, Jan 28, 2003 at 12:21:25PM +0000, Alan Cox wrote:
> On Sat, 2003-01-25 at 04:56, MAEDA Naoaki wrote:
> > Hi,
> >
> > I found that the pid in a multi-threaded core file's name is sometimes
> > wrong in 2.5.59 with NPTL-0.17. The problem is that the pid in the core
> > file name comes from current->pid, but I think it should be current->tgid.
>
> The value needs to be unique so that you can dump multiple threads
> at the same time and not have one overwrite another. You might want
> to add the tgid as another format type to the core name formatting so
> users can select the behaviour you desire, however?
I think this isn't an issue; multi-threaded core dumps are done by
the core_waiter synchronization, so all other threads will have exited
before the first thread to crash actually writes out its core.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
On Tue, 2003-01-28 at 10:45, Daniel Jacobowitz wrote:
> I think this isn't an issue; multi-threaded core dumps are done by
> the core_waiter synchronization, so all other threads will have exited
> before the first thread to crash actually writes out its core.
I think the problem is the filenames need to not overwrite each other -
not actual synchronization in the kernel (which, as you point out, is
correct).
If we name the coredumps based on ->tgid, then all threads will dump to
the same file. If we use ->pid, each thread will use its unique PID as
its filename.
Robert Love
On Tue, Jan 28, 2003 at 12:27:03PM -0500, Robert Love wrote:
> On Tue, 2003-01-28 at 10:45, Daniel Jacobowitz wrote:
>
> > I think this isn't an issue; multi-threaded core dumps are done by
> > the core_waiter synchronization, so all other threads will have exited
> > before the first thread to crash actually writes out its core.
>
> I think the problem is the filenames need to not overwrite each other -
> not actual synchronization in the kernel (which, as you point out, is
> correct).
>
> If we name the coredumps based on ->tgid, then all threads will dump to
> the same file. If we use ->pid, each thread will use its unique PID as
> its filename.
That wasn't my point. All of the other threads have already terminated
without dumping core at this point; I don't think it's possible for two
threads of a CLONE_THREAD application to both dump core. See
fs/exec.c:coredump_wait.
Also, once one thread gets into do_coredump it clears mm->dumpable;
nothing else will dump core from that MM anyway.
I think using ->tgid is a good idea.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
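The "only one thread dumps" behaviour described above can be modelled in user space as follows. This is a sketch of the effect only, assuming a single flag cleared atomically by the first caller; it does not reproduce the kernel's actual locking in do_coredump or coredump_wait. The `mm_sketch` struct and `try_begin_coredump` function are hypothetical names; only the `dumpable` flag name comes from the message.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for per-mm state; not the kernel's mm_struct. */
struct mm_sketch {
    atomic_int dumpable;  /* 1 while a core dump may still be started */
};

/*
 * Try to claim the right to dump core for this mm.
 * The exchange clears the flag and reports its previous value in one
 * atomic step, so exactly one caller sees 1. Returns 1 for that caller
 * and 0 for every later or concurrent caller.
 */
int try_begin_coredump(struct mm_sketch *mm)
{
    return atomic_exchange(&mm->dumpable, 0) == 1;
}
```

Under this model, if two threads sharing one mm fault at about the same time, only the first to reach the exchange writes a core file, so a tgid-based name cannot be overwritten by a sibling thread.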
On Tue, 2003-01-28 at 12:39, Daniel Jacobowitz wrote:
> That wasn't my point. All of the other threads have already terminated
> without dumping core at this point; I don't think it's possible for two
> threads of a CLONE_THREAD application to both dump core. See
> fs/exec.c:coredump_wait.
>
> Also, once one thread gets into do_coredump it clears mm->dumpable;
> nothing else will dump core from that MM anyway.
Are you telling me only one thread per thread group can coredump,
period? So if two of them segfault (say concurrently on two different
processors) only one will win the race to dump and the others will
simply exit?
If so, I did not know that. You are right, then.
> I think using ->tgid is a good idea.
I think it is, too - even if what I said is true - as a format option.
Anyhow, I stand corrected.
Robert Love
On Tue, Jan 28, 2003 at 12:42:52PM -0500, Robert Love wrote:
> On Tue, 2003-01-28 at 12:39, Daniel Jacobowitz wrote:
>
> > That wasn't my point. All of the other threads have already terminated
> > without dumping core at this point; I don't think it's possible for two
> > threads of a CLONE_THREAD application to both dump core. See
> > fs/exec.c:coredump_wait.
> >
> > Also, once one thread gets into do_coredump it clears mm->dumpable;
> > nothing else will dump core from that MM anyway.
>
> Are you telling me only one thread per thread group can coredump,
> period? So if two of them segfault (say concurrently on two different
> processors) only one will win the race to dump and the others will
> simply exit?
That's right. The dump will include all the threads anyway, now.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
On Tue, 2003-01-28 at 17:49, Daniel Jacobowitz wrote:
> > Are you telling me only one thread per thread group can coredump,
> > period? So if two of them segfault (say concurrently on two different
> > processors) only one will win the race to dump and the others will
> > simply exit?
>
> That's right. The dump will include all the threads anyway, now.
In which case using tgid is probably the right thing.