2013-09-03 08:40:59

by Janani Venkataraman

Subject: RFD: Non-Disruptive Core Dump Infrastructure

Hello,

We are working on an infrastructure to create a system core file of a specific
process at run-time, non-disruptively. It can also be extended to a case where
a process is able to take a self-core dump.

gcore, an existing utility, creates a core image of the specified process. It
attaches to the process using gdb, runs the gdb gcore command and then
detaches. With gcore, the dump cannot be issued from a signal handler context,
as fork() is not signal safe. Moreover, it is disruptive in nature, as gdb
attaches using ptrace, which sends a SIGSTOP signal. Hence the gcore method
cannot be used if the process wants to initiate a self dump.

Previously the non-disruptive dump was tried with the Utrace approach [1].
First, all the threads would be assembled at a common place and quiesced using
UTRACE_INTERRUPT. Then the core dump would be triggered from the quiesce
callback, upon receiving the event indicating that the last thread of the
process has quiesced. After several reviews and discussions, the Linux
community decided not to accept this proposal, and it was not pushed upstream
due to various dependencies and the potential risk of breaking existing
implementations. Hence the Utrace approach is not being pursued. Also, Roland
had mentioned that even if the approach worked smoothly, the pause could be a
significant perturbation [2].

Another approach was using the Freezer subsystem [3]. The freezer functions in
the kernel essentially help start and stop sets of tasks, and this approach
used the existing freezer subsystem kernel interface to quiesce all the
threads of the application before triggering the core dump. This approach was
not accepted due to the potential for DoS attacks. Also, the community noted
that "freeze" is a bit dangerous, because an application which is frozen
cannot be ended while it is frozen, and the usual user commands such as 'ps'
or 'top' give no indication that it is frozen.

So ideally what we are trying to do is to export the infrastructure using
/proc/pid/core. Reading the file would give an ELF format core dump at that
instant, non-disruptively, without killing the process.

This would involve basically three operations:

1) Holding the threads of a process without sending a signal (SIGSTOP). At this
point we can collect the register set snapshot and collect other information
required to create the ELF header. The above operation could be initiated with
the open() call.
2) Once the ELF header is created, read() can return the core dump data,
including the process memory page by page, based on the fpos (file position).
3) The threads could be released upon a close().
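
From user space, the open()/read()/close() sequence above might be consumed
roughly as in the following Python sketch. It assumes the proposed
/proc/<pid>/core file exists, which is not the case in current kernels; the
function name copy_core and the chunk size are illustrative only.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # read granularity; one page per read() call


def copy_core(src_path, dst_path, chunk=PAGE_SIZE):
    """Stream a core image from src_path to dst_path, one chunk at a time.

    src_path is assumed to be the proposed /proc/<pid>/core file. Under the
    proposal, open() would hold the threads, each read() would return the
    data at the current file position, and close() would release them.
    Returns the number of bytes copied.
    """
    total = 0
    # buffering=0 so that every src.read() maps to one read() system call.
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        while True:
            data = src.read(chunk)
            if not data:  # empty read means end of the core image
                break
            dst.write(data)
            total += len(data)
    return total
```

A caller would use something like copy_core("/proc/1234/core", "core.1234"),
where 1234 is a placeholder pid. The threads would stay held for as long as
the file is open, so the reader should copy the data out quickly.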

So the sub-problem here would be "How to hold these threads, collect the data
and release them non-disruptively?" in order to take a consistent dump.

As Roland had mentioned, we could have a user option of having a minimal dump
or a full dump. The minimal dump can get a full register snapshot of the
threads running in user mode, and as much information as possible for those
threads that are blocked, whereas a full dump would additionally include a
memory dump.
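
To make the difference between the two layouts concrete, here is a small
Python sketch that builds only the ELF header and program headers of a core
file. It assumes a little-endian ELF64 core for x86-64 and page-aligned
regions; the function name core_headers and its parameters are made up for
illustration and are not part of any proposed interface.

```python
import struct

ET_CORE = 4      # e_type value for core files
EM_X86_64 = 62   # e_machine; x86-64 is an assumption of this sketch
PT_LOAD = 1
PT_NOTE = 4
PAGE = 4096      # assumed page size
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


def core_headers(note_size, regions=()):
    """Return the ELF header plus program headers of a core file.

    regions is an iterable of (vaddr, size, flags) tuples. With no regions
    the result is the "minimal" layout (a single PT_NOTE segment holding
    the register notes); with regions it is the "full" layout, with one
    PT_LOAD segment per memory region.
    """
    regions = list(regions)
    phnum = 1 + len(regions)
    # e_ident: magic, ELFCLASS64, little-endian, version 1, then padding.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = EHDR.pack(ident, ET_CORE, EM_X86_64, 1, 0, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, phnum, 0, 0, 0)
    offset = EHDR.size + phnum * PHDR.size
    phdrs = [PHDR.pack(PT_NOTE, 0, offset, 0, 0, note_size, note_size, 4)]
    # Memory contents start on the next page boundary after the notes.
    offset = -(-(offset + note_size) // PAGE) * PAGE
    for vaddr, size, flags in regions:
        phdrs.append(PHDR.pack(PT_LOAD, flags, offset, vaddr, 0,
                               size, size, PAGE))
        offset += size
    return ehdr + b"".join(phdrs)
```

Calling core_headers(note_size) alone gives the headers of a minimal dump,
while passing the list of memory regions gives those of a full dump. The note
contents and the memory pages themselves are left out of the sketch.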

If we provide the user a way to abort the operation, say by keeping the
threads in an interruptible state, we should be able to prevent the DoS attack
which was present in the method using the Freezer subsystem. For example, we
can send a signal to the process and it should abort the dump operation and
release the threads.

We have analyzed the following options and would like to know which one people
think is best. If there are any other mechanisms to perform the operation, we
would be happy to look at them.

1) Task work add

task_work_add() is a kernel interface for queuing work on a task. Any queued
work is run before the task returns to user space from the kernel, so that
work is guaranteed to be done before user space can run again.

* Use this function to hold the threads when they are returning to user
space.
* Wait until all the threads of the process to be dumped reach the queued
task work.
* Once all the threads have arrived, the dump is taken and they are released.

Disadvantage:
* A thread which is blocked in kernel space would not return to user space
soon and hence would not be trapped by the task work.
* The dump may be delayed, as the other threads would be waiting for this
specific blocked thread to arrive.

Solution:
* A way to solve this problem is to make the other threads that are waiting,
wait for a fixed time for the blocked thread and then just create a pt_note
with zeroes to indicate the presence of the blocked thread.
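
The timeout idea can be modelled in user space as below. This is only a
Python sketch of the waiting logic, not kernel code: a queue stands in for
threads arriving at the hold point, and the names collect_snapshots and
REGSET_SIZE are invented for the example (the size assumes the x86-64
user_regs_struct).

```python
import queue
import time

REGSET_SIZE = 27 * 8  # x86-64 user_regs_struct: 27 eight-byte registers


def collect_snapshots(expected_tids, arrivals, timeout):
    """Gather per-thread register snapshots, waiting at most timeout seconds.

    arrivals is a queue.Queue of (tid, regs) tuples, one for each thread
    that reaches the hold point on its way back to user space. Threads that
    do not arrive before the deadline get a zero-filled placeholder.
    """
    pending = set(expected_tids)
    notes = {}
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            tid, regs = arrivals.get(timeout=remaining)
        except queue.Empty:
            break  # deadline passed with threads still missing
        if tid in pending:
            pending.discard(tid)
            notes[tid] = regs
    for tid in pending:
        notes[tid] = bytes(REGSET_SIZE)  # zeroed note marks a blocked thread
    return notes
```

With three expected threads and only two arrivals, the result holds two real
snapshots and one all-zero entry, so the dump is not delayed beyond the fixed
wait.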

2) CRIU Approach :

This makes use of the CRIU tool and checkpoints when a dump is called, collects
the required details and continues the running process.
* A self dump cannot be initiated using the command-line CRIU, which is
similar to the limitation of gcore.
* A system call to do the same is being implemented, which would help us
create a self dump. The system call is not upstream yet. We could explore that
option as well.

3) PTRACE (SEIZE + INTERRUPT) via kernel thread

In this approach, a kernel thread will play the role of seizing and registering
the states of the threads of the process to be dumped. We could make use of the
PTRACE_SEIZE + PTRACE_INTERRUPT within the open() to stop the threads without
SIGSTOP. However, during a self dump we cannot make use of PTRACE_SEIZE, as a
self seize isn't permitted. One option is to offload this to a kernel thread
and let it capture the information. Once it is complete, the caller may be
released, so that it could continue with the dump.

* When the open call reaches kernel space during a self dump, a kernel thread
is spawned to seize all the threads of the process, including the caller (the
process that called open), using PTRACE_SEIZE.
* A PTRACE_INTERRUPT is issued and the required information is collected.
* On a self-dump, the kernel thread releases the caller, so that it can proceed
with the dumping.


APPENDIX:

[1] http://www.redhat.com/archives/utrace-devel/2009-July/msg00149.html
[2] http://www.redhat.com/archives/utrace-devel/2009-August/msg00006.html
[3] http://lwn.net/Articles/419756//

Thanking You.
With Regards,
Janani Venkataraman


2013-09-03 10:49:04

by Janani Venkataraman

Subject: Re: RFD: Non-Disruptive Core Dump Infrastructure

On 09/03/2013 04:01 PM, Pavel Emelyanov wrote:
> On 09/03/2013 12:39 PM, Janani Venkataraman wrote:
>> Hello,
>>
>> We are working on an infrastructure to create a system core file of a specific
>> process at run-time, non-disruptively. It can also be extended to a case where
>> a process is able to take a self-core dump.
>
> This is very close to what we're trying to do in CRIU. And although image files
> containing info about processes are not ELF files, an ability to generate ELF-cores
> out of existing CRIU images is one of the features that we were asked for.
>
>> 2) CRIU Approach :
>>
>> This makes use of the CRIU tool and checkpoints when a dump is called, collects
>> the required details and continues the running process.
>> * A self dump cannot be initiated using the command line CRIU which is similar
>> to the limitation of gcore.
>
> This is something we're trying to fix at the moment, as people ask for 'self-dump'
> ability as well. We plan to have this implemented in v0.8 (the v0.7 is coming out
> today/tomorrow) in about a month.
>
> I can shed more light on this, if required.
>
>> * A system call to do the same is being implemented which would help us create
>> a self dump.The system call is not upstream yet. We could explore that option as
>> well.
>
> Thanks,
> Pavel
>
Hi,

I would like to know more about the "self-dump" ability of CRIU. This is
the implementation using system calls if I am not wrong.

Thanking You.
Regards,
Janani Venkataraman

2013-09-04 10:54:53

by Janani Venkataraman

Subject: Re: RFD: Non-Disruptive Core Dump Infrastructure

On 09/03/2013 04:24 PM, Pavel Emelyanov wrote:
> On 09/03/2013 02:47 PM, Janani Venkataraman wrote:
>> On 09/03/2013 04:01 PM, Pavel Emelyanov wrote:
>>> On 09/03/2013 12:39 PM, Janani Venkataraman wrote:
>>>> Hello,
>>>>
>>>> We are working on an infrastructure to create a system core file of a specific
>>>> process at run-time, non-disruptively. It can also be extended to a case where
>>>> a process is able to take a self-core dump.
>>>
>>> This is very close to what we're trying to do in CRIU. And although image files
>>> containing info about processes are not ELF files, an ability to generate ELF-cores
>>> out of existing CRIU images is one of the features that we were asked for.
>>>
>>>> 2) CRIU Approach :
>>>>
>>>> This makes use of the CRIU tool and checkpoints when a dump is called, collects
>>>> the required details and continues the running process.
>>>> * A self dump cannot be initiated using the command line CRIU which is similar
>>>> to the limitation of gcore.
>>>
>>> This is something we're trying to fix at the moment, as people ask for 'self-dump'
>>> ability as well. We plan to have this implemented in v0.8 (the v0.7 is coming out
>>> today/tomorrow) in about a month.
>>>
>>> I can shed more light on this, if required.
>>>
>>>> * A system call to do the same is being implemented which would help us create
>>>> a self dump.The system call is not upstream yet. We could explore that option as
>>>> well.
>>>
>>> Thanks,
>>> Pavel
>>>
>> Hi,
>>
>> I would like to know more about the "self-dump" ability of CRIU. This is
>> the implementation using system calls if I am not wrong.
>
> Not exactly.
>
> In CRIU project since it's earliest days, we had to heavily patch the kernel
> to make it provide additional APIs for getting more info about running tasks
> and kernel objects. You can find all the patches we've created on the page
> http://criu.org/Commits
>
> For almost all the new APIs we proposed the community asked us to restrict them
> with CAP_SYS_ADMIN checks, so CRIU even for very basic stuff should be run from
> root. The intention was to create the proof-of-concept with maximal and most
> strict protection, and then think harder about less strict checks.
>
> With this the self-dump functionality cannot be implemented as just "CRIU in a
> .so file", since this would only be usable by root processes. So, instead of
> just wrapping the whole CRIU stuff into a library, we use a trickier approach.
> It's described here -- http://criu.org/Self_dump
>
> Briefly -- we will implement the CRIU service, which is a daemon running from
> root and listening on a unix socket. When a task wants to dump himself, it sends
> to the service a "dump me" message. The service then goes and dumps the process.
>
> Thanks,
> Pavel
>

Hi,

What we require for our infrastructure is just a register snapshot and a
memory dump. Do we require CAP_SYS privileges if we want to dump only the
regset and memory?

Is it possible to librarize the dump generation routine so that it is
transparent to the user? Also, ideally a single API is preferred for
generating the dump, irrespective of whether it is a self dump or not.

Another aspect we might want to look at is DoS attacks. Are there any
cases where it is prone to such attacks?

We also looked into the Self-dump page you had mentioned and we would
like to know more. Is there any additional information or a prototype
which you could share with us? Also, would it be possible for us to test
a few patches for the self dump case?

If converting the dump to ELF core format from the existing CRIU image
format has not yet been done, we would be happy to contribute towards it.

Thanks,
Janani

2013-09-04 17:52:43

by Andi Kleen

Subject: Re: RFD: Non-Disruptive Core Dump Infrastructure

> Briefly -- we will implement the CRIU service, which is a daemon running from
> root and listening on a unix socket. When a task wants to dump himself, it sends
> to the service a "dump me" message. The service then goes and dumps the process.

Maybe I'm missing something, but if the dump file is then readable by
the process and includes the output of the new interfaces, wouldn't
any potential security leaks exposed by the new interfaces
already be there for unprivileged users?

-Andi

--
[email protected] -- Speaking for myself only.

2013-09-05 07:42:52

by Janani Venkataraman

Subject: Re: RFD: Non-Disruptive Core Dump Infrastructure

On 09/04/2013 05:03 PM, Pavel Emelyanov wrote:
> On 09/04/2013 02:53 PM, Janani Venkataraman wrote:
>> On 09/03/2013 04:24 PM, Pavel Emelyanov wrote:
>>> On 09/03/2013 02:47 PM, Janani Venkataraman wrote:
>>>> On 09/03/2013 04:01 PM, Pavel Emelyanov wrote:
>>>>> On 09/03/2013 12:39 PM, Janani Venkataraman wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We are working on an infrastructure to create a system core file of a specific
>>>>>> process at run-time, non-disruptively. It can also be extended to a case where
>>>>>> a process is able to take a self-core dump.
>>>>>
>>>>> This is very close to what we're trying to do in CRIU. And although image files
>>>>> containing info about processes are not ELF files, an ability to generate ELF-cores
>>>>> out of existing CRIU images is one of the features that we were asked for.
>>>>>
>>>>>> 2) CRIU Approach :
>>>>>>
>>>>>> This makes use of the CRIU tool and checkpoints when a dump is called, collects
>>>>>> the required details and continues the running process.
>>>>>> * A self dump cannot be initiated using the command line CRIU which is similar
>>>>>> to the limitation of gcore.
>>>>>
>>>>> This is something we're trying to fix at the moment, as people ask for 'self-dump'
>>>>> ability as well. We plan to have this implemented in v0.8 (the v0.7 is coming out
>>>>> today/tomorrow) in about a month.
>>>>>
>>>>> I can shed more light on this, if required.
>>>>>
>>>>>> * A system call to do the same is being implemented which would help us create
>>>>>> a self dump.The system call is not upstream yet. We could explore that option as
>>>>>> well.
>>>>>
>>>>> Thanks,
>>>>> Pavel
>>>>>
>>>> Hi,
>>>>
>>>> I would like to know more about the "self-dump" ability of CRIU. This is
>>>> the implementation using system calls if I am not wrong.
>>>
>>> Not exactly.
>>>
>>> In CRIU project since it's earliest days, we had to heavily patch the kernel
>>> to make it provide additional APIs for getting more info about running tasks
>>> and kernel objects. You can find all the patches we've created on the page
>>> http://criu.org/Commits
>>>
>>> For almost all the new APIs we proposed the community asked us to restrict them
>>> with CAP_SYS_ADMIN checks, so CRIU even for very basic stuff should be run from
>>> root. The intention was to create the proof-of-concept with maximal and most
>>> strict protection, and then think harder about less strict checks.
>>>
>>> With this the self-dump functionality cannot be implemented as just "CRIU in a
>>> .so file", since this would only be usable by root processes. So, instead of
>>> just wrapping the whole CRIU stuff into a library, we use a trickier approach.
>>> It's described here -- http://criu.org/Self_dump
>>>
>>> Briefly -- we will implement the CRIU service, which is a daemon running from
>>> root and listening on a unix socket. When a task wants to dump himself, it sends
>>> to the service a "dump me" message. The service then goes and dumps the process.
>>>
>>> Thanks,
>>> Pavel
>>>
>>
>> Hi,
>>
>> What we require for our infrastructure is just a register snapshot and a
>> memory dump.Do we require CAP_SYS privileges,if we want to dump the only
>> regset and memory ?
>
> For registers and just the contents of memory you should just have enough
> rights to attach to the "victim" with the debugger. This usually means
> uid-s equivalence or CAP_SYS_PTRACE capability otherwise.
>
>> Is it possible to librarize the dump generation routine so that it is
>> transparent to the user. Also, ideally a single API for dump generation
>> is preferred for generating the dump, irrespective of whether it is a
>> self dump or not.
>
> We're currently developing a protobuf-RPC protocol to talk to criu service.
> Additionally there will be a .so library, that will provide C API above this
> protocol.
>
>> One another aspect we might want to look at is the DoS attacks. Are
>> there any cases where it is prone to such attacks.
>
> Well, checkpoint takes time, memory and disk, so if performed too often, may
> cause starvation on these resources.
>
>> We also looked into the Self-dump page you had mentioned and we would
>> like to know more. Is there any additional information/prototype which
>> you share with us .Also would it be possible for us to test a few
>> patches for the self dump case ?
>
> Currently this is work-in-progress, you can check criu mailing list
> archives at http://lists.openvz.org/pipermail/criu/, the patches from
> Ruslan Kuprieiev <kupruser@> are mostly about it.
>
>> If converting the dump,to ELF-core format from the existing CRIU Image
>> format has not yet been done,we would be happy to contribute towards it.
>
> Oh, that's great! The criu images format is described at http://criu.org/Images,
> feel free to ask questions if you find this information not enough.

I will look into this and get back to you at the earliest.
>
> Thanks,
> Pavel
>
Thanks,
Janani

2013-09-11 19:27:08

by KOSAKI Motohiro

Subject: Re: RFD: Non-Disruptive Core Dump Infrastructure

(9/3/13 4:39 AM), Janani Venkataraman wrote:
> Hello,
>
> We are working on an infrastructure to create a system core file of a specific
> process at run-time, non-disruptively. It can also be extended to a case where
> a process is able to take a self-core dump.
>
> gcore, an existing utility creates a core image of the specified process. It
> attaches to the process using gdb and runs the gdb gcore command and then
> detaches. In gcore the dump cannot be issued from a signal handler context as
> fork() is not signal safe and moreover it is disruptive in nature as the gdb
> attaches using ptrace which sends a SIGSTOP signal. Hence the gcore method
> cannot be used if the process wants to initiate a self dump.

Maybe I'm missing something. But why does gcore use the C-level fork()? Does
gcore need to call pthread_atfork handlers? No. Does gcore need to flush stdio
buffers? No.

2013-09-12 04:46:02

by suzuki

Subject: Re: RFD: Non-Disruptive Core Dump Infrastructure

On 09/12/2013 12:57 AM, KOSAKI Motohiro wrote:
> (9/3/13 4:39 AM), Janani Venkataraman wrote:
>> Hello,
>>
>> We are working on an infrastructure to create a system core file of a
>> specific
>> process at run-time, non-disruptively. It can also be extended to a
>> case where
>> a process is able to take a self-core dump.
>>
>> gcore, an existing utility creates a core image of the specified
>> process. It
>> attaches to the process using gdb and runs the gdb gcore command and then
>> detaches. In gcore the dump cannot be issued from a signal handler
>> context as
>> fork() is not signal safe and moreover it is disruptive in nature as
>> the gdb
>> attaches using ptrace which sends a SIGSTOP signal. Hence the gcore
>> method
>> cannot be used if the process wants to initiate a self dump.
>
> Maybe I'm missing something. But why gcore uses c-level fork()? gcore
> need to
> call pthread-at-fork handler? No. gcore need to flush stdio buffer? No.
>
Let me clarify. If an application wants to dump itself, it has to do a
fork() and then exec gcore with the pid of the application to
generate the dump.

So, if the application wants to initiate the dump from a signal handler
context, it may lead to trouble.
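
The fork-and-exec sequence described above looks roughly like this, sketched
in Python for brevity. The helper names are made up for the example, and it
assumes gcore is on PATH and accepts "-o prefix pid". The signal-safety
concern applies to a C program doing the same steps from a signal handler;
the sketch only shows the steps themselves.

```python
import os


def gcore_argv(pid, prefix="core"):
    """Build the gcore command line; gcore writes to '<prefix>.<pid>'."""
    return ["gcore", "-o", prefix, str(pid)]


def dump_self_with_gcore(prefix="core"):
    """Fork a child that execs gcore against the parent, then wait for it.

    Returns True if gcore exited with status 0.
    """
    parent = os.getpid()
    child = os.fork()
    if child == 0:
        try:
            os.execvp("gcore", gcore_argv(parent, prefix))
        finally:
            os._exit(127)  # only reached if the exec failed
    _, status = os.waitpid(child, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Note that the attach may be refused on systems that restrict ptrace to
descendants, since here the child is trying to trace its own parent.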

Thanks
Suzuki

2013-09-14 02:47:50

by KOSAKI Motohiro

Subject: Re: RFD: Non-Disruptive Core Dump Infrastructure

On 9/12/2013 12:45 AM, Suzuki K. Poulose wrote:
> On 09/12/2013 12:57 AM, KOSAKI Motohiro wrote:
>> (9/3/13 4:39 AM), Janani Venkataraman wrote:
>>> Hello,
>>>
>>> We are working on an infrastructure to create a system core file of a
>>> specific
>>> process at run-time, non-disruptively. It can also be extended to a
>>> case where
>>> a process is able to take a self-core dump.
>>>
>>> gcore, an existing utility creates a core image of the specified
>>> process. It
>>> attaches to the process using gdb and runs the gdb gcore command and then
>>> detaches. In gcore the dump cannot be issued from a signal handler
>>> context as
>>> fork() is not signal safe and moreover it is disruptive in nature as
>>> the gdb
>>> attaches using ptrace which sends a SIGSTOP signal. Hence the gcore
>>> method
>>> cannot be used if the process wants to initiate a self dump.
>>
>> Maybe I'm missing something. But why gcore uses c-level fork()? gcore
>> need to
>> call pthread-at-fork handler? No. gcore need to flush stdio buffer? No.
>>
> Let me clarify. If an application wants to dump itself, it has to do a
> fork() and then exec the gcore with the pid of the appication to
> generate the dump.

Oh, I thought the fork() was used to take the dump without stopping the
application. But that was incorrect.

Hmm. However, if an application _itself_ wants to dump itself, it can simply
avoid doing so from a signal handler. I'm missing the point of this discussion
completely.

So, I'll keep silent for a while.

>
> So, if the application wants to initiate the dump from a signal handler
> context, it may lead to trouble.