2002-12-16 17:23:36

by Roberto Fichera

Subject: Multithreaded coredump patch where?

Can anyone point me to where I can download a stable
multithreaded core dump patch for the 2.4.19/20 kernel?

Thanks in advance,

Roberto Fichera.


______________________________________
E-mail protected by the antivirus service of IsolaWeb Agency & ISP
http://wwww.isolaweb.it


2002-12-17 00:08:11

by mgross

Subject: Re: Multithreaded coredump patch where?

I haven't rebased the patches I posted back in June for a while now.

Attached is the patch I posted for the 2.4.18 vanilla kernel. It's a bit
controversial, but it seems to work for a number of folks. Let me know if
you have any trouble rebasing it.

I don't know if there is any plan to back port Ingo's version of this feature
to 2.4.x

--mgross



On Monday 16 December 2002 09:28 am, Roberto Fichera wrote:
> Can anyone point me to where I can download a stable
> multithreaded core dump patch for the 2.4.19/20 kernel?


Attachments:
tcore-2418-061802.pat (25.85 kB)

2002-12-17 11:00:12

by Roberto Fichera

Subject: Re: Multithreaded coredump patch where?

At 13.21 16/12/02 -0800, mgross wrote:

>I haven't rebased the patches I posted back in June for a while now.
>
>Attached is the patch I posted for the 2.4.18 vanilla kernel. It's a bit
>controversial, but it seems to work for a number of folks. Let me know if
>you have any trouble rebasing it.

Only one hunk failed, in include/asm-ia64/elf.h, and I fixed it by hand.
Why do you say it is a bit controversial? One difference that I have
noticed is the core dump size after your patch. However, it seems to be
working well for now. I'll try it later on an SMP machine.


>I don't know if there is any plan to back port Ingo's version of this feature
>to 2.4.x

Roberto Fichera.



2002-12-17 11:57:27

by Arjan van de Ven

Subject: Re: Multithreaded coredump patch where?

On Tue, 2002-12-17 at 12:05, Roberto Fichera wrote:
> At 13.21 16/12/02 -0800, mgross wrote:
>
> >I haven't rebased the patches I posted back in June for a while now.
> >
> >Attached is the patch I posted for the 2.4.18 vanilla kernel. It's a bit
> >controversial, but it seems to work for a number of folks. Let me know if
> >you have any trouble rebasing it.
>
> Only one hunk failed, in include/asm-ia64/elf.h, and I fixed it by hand.
> Why do you say it is a bit controversial?

The design has theoretical deadlocks (though in practice they are probably
not trivial to trigger): by design it prevents sleeping processes from
running, regardless of whether those processes are in kernel space or
not. If they are in kernel space, they may happen to be holding a
semaphore that something in the core dumping path needs to take (but
can't, because it will never be released). There are not that many
such semaphores (the kmap semaphore is one, and filesystems can have
several internally).


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2002-12-17 12:07:04

by Arjan van de Ven

Subject: Re: Multithreaded coredump patch where?

On Mon, 2002-12-16 at 22:21, mgross wrote:

> I don't know if there is any plan to back port Ingo's version of this feature
> to 2.4.x

The current Red Hat Rawhide kernels have an attempt at a backport, but
unfortunately it's not fully working yet.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2002-12-17 12:46:06

by Roberto Fichera

Subject: Re: Multithreaded coredump patch where?

At 13.05 17/12/02 +0100, Arjan van de Ven wrote:
>On Tue, 2002-12-17 at 12:05, Roberto Fichera wrote:
> > At 13.21 16/12/02 -0800, mgross wrote:
> >
> > >I haven't rebased the patches I posted back in June for a while now.
> > >
> > >Attached is the patch I posted for the 2.4.18 vanilla kernel. It's a bit
> > >controversial, but it seems to work for a number of folks. Let me know if
> > >you have any trouble rebasing it.
> >
> > Only one hunk failed, in include/asm-ia64/elf.h, and I fixed it by hand.
> > Why do you say it is a bit controversial?
>
>The design has theoretical deadlocks (though in practice they are probably
>not trivial to trigger): by design it prevents sleeping processes from
>running, regardless of whether those processes are in kernel space or
>not. If they are in kernel space, they may happen to be holding a
>semaphore that something in the core dumping path needs to take (but
>can't, because it will never be released). There are not that many
>such semaphores (the kmap semaphore is one, and filesystems can have
>several internally).

OK, now I see why! This problem could be avoided if the core dump
algorithm let every thread that needs to finish its execution in the
kernel do so, and then froze all the threads at a known point just before
they re-enter userspace, so that the core dump can continue with the
snapshot. Not easy ;-)!



Roberto Fichera.



2002-12-18 19:00:16

by mgross

Subject: Re: Multithreaded coredump patch where?

On Tuesday 17 December 2002 03:05 am, Roberto Fichera wrote:
> >I haven't rebased the patches I posted back in June for a while now.
> >
> >Attached is the patch I posted for the 2.4.18 vanilla kernel. It's a bit
> >controversial, but it seems to work for a number of folks. Let me know if
> >you have any trouble rebasing it.
>
> Only one hunk failed, in include/asm-ia64/elf.h, and I fixed it by hand.
> Why do you say it is a bit controversial? One difference that I have
> noticed is the core dump size after your patch. However, it seems to be
> working well for now. I'll try it later on an SMP machine.

There are two issues with this implementation, neither of which is likely
to affect you.


First, when dumping core of MT applications with LOTS of threads, the pthread
library signals all the threads in the application to exit. Sometimes
the process that is dumping core fails to suspend the other threads in the
application before some of them exit. The result is that, for such
applications, you will not see those threads in the core file.

You have to work at it to see this failure. The way I reproduce it is to
run a test application with about 555 pthread threads in it and send it a
SIGQUIT. When I look at the core file, it won't have all 555 threads. SMP
makes this effect a bit more noticeable.

Ingo's design fixes this by changing the thread exit path to wait for the
core file to be dumped before finishing the process cleanup. I like this
approach; I just wish I had thought of it ;)

Second, the controversial issue is the way my design pauses the other
threads in the MT application. It's not semaphore-safe. Although no
instance of the following failure has been seen, it is possible with new
kernel code.

If one of the processes in the MT application is holding a semaphore
when the dumping process pauses it, AND the dumping process does any
blocking operation that could attempt to grab this same semaphore, THEN the
core dump will deadlock. Boom.

My patch is good for developers, pending the back port of Ingo's version.

Do let me know how it works out for you.

--mgross

2002-12-18 19:05:55

by mgross

Subject: Re: Multithreaded coredump patch where?

On Tuesday 17 December 2002 04:14 am, Arjan van de Ven wrote:
> On Mon, 2002-12-16 at 22:21, mgross wrote:
> > I don't know if there is any plan to back port Ingo's version of this
> > feature to 2.4.x
>
> The current Red Hat Rawhide kernels have an attempt at a backport, but
> unfortunately it's not fully working yet.

Is this because you are doing both the NPTL kernel support and the
multi-threaded core dump in a combined backport, making the effort more
complex?

What types of issues are you seeing with the back port?

I'm just wondering. These are cool features.

--mgross

2002-12-19 09:19:01

by Roberto Fichera

Subject: Re: Multithreaded coredump patch where?

At 08.13 18/12/02 -0800, mgross wrote:

>On Tuesday 17 December 2002 03:05 am, Roberto Fichera wrote:
> > >I haven't rebased the patches I posted back in June for a while now.
> > >
> > >Attached is the patch I posted for the 2.4.18 vanilla kernel. It's a bit
> > >controversial, but it seems to work for a number of folks. Let me know if
> > >you have any trouble rebasing it.
> >
> > Only one hunk failed, in include/asm-ia64/elf.h, and I fixed it by hand.
> > Why do you say it is a bit controversial? One difference that I have
> > noticed is the core dump size after your patch. However, it seems to be
> > working well for now. I'll try it later on an SMP machine.
>
>There are two issues with this implementation, neither of which is likely
>to affect you.
>
>
>First, when dumping core of MT applications with LOTS of threads, the pthread
>library signals all the threads in the application to exit. Sometimes
>the process that is dumping core fails to suspend the other threads in the
>application before some of them exit. The result is that, for such
>applications, you will not see those threads in the core file.
>
>You have to work at it to see this failure. The way I reproduce it is to
>run a test application with about 555 pthread threads in it and send it a
>SIGQUIT. When I look at the core file, it won't have all 555 threads. SMP
>makes this effect a bit more noticeable.

This could be a problem for me. I haven't yet tried your patch on my SMP
machine, but I hope to try it today.

>Ingo's design fixes this by changing the thread exit path to wait for the
>core file to be dumped before finishing the process cleanup. I like this
>approach; I just wish I had thought of it ;)

Yes, that seems to be a good solution.

>Second, the controversial issue is the way my design pauses the other
>threads in the MT application. It's not semaphore-safe. Although no
>instance of the following failure has been seen, it is possible with new
>kernel code.
>
>If one of the processes in the MT application is holding a semaphore
>when the dumping process pauses it, AND the dumping process does any
>blocking operation that could attempt to grab this same semaphore, THEN the
>core dump will deadlock. Boom.

This problem might be reproducible under load: some I/O (filesystem and
network), CPU-bound processes, and swapping (VM I/O). That way we would
have some chance of catching it.


>My patch is good for developers, pending the back port of Ingo's version.
>
>Do let me know how it works out for you.
>
>--mgross

Roberto Fichera.