Hello All,
I have been working on an idea of creating an executable from a
running process image.
MOTIVATION:
Process migration among the nodes in distributed computing,
checkpointing process state.
BASIS:
The basis of my idea is to update the existing executable with
extra PHDRs (program headers) of type PT_LOAD, each of these
headers corresponding to a vaddr mapping from /proc/<pid>/maps.
I have done some basic study of the kernel's loader code in
'fs/binfmt_elf.c', especially the 'load_elf_binary' function; the
following is my understanding.
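A minimal sketch of the mapping described above, assuming a 64-bit target and glibc's <elf.h>. It turns one /proc/<pid>/maps line into a PT_LOAD program header. The helper name maps_line_to_phdr and its file_off parameter are hypothetical, and the sketch assumes every region is dumped in full (so p_filesz equals p_memsz) and that pages are 4 KiB.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: fill *ph from one line of /proc/<pid>/maps, e.g.
 *   "00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat"
 * file_off is where the dumped bytes of the region would be stored in the
 * new executable (an assumption of this sketch; it should be page-aligned).
 * Returns 0 on success, -1 if the line cannot be parsed.
 */
static int maps_line_to_phdr(const char *line, Elf64_Off file_off,
                             Elf64_Phdr *ph)
{
    unsigned long long start, end;
    char perms[5];

    assert(line != NULL && ph != NULL);

    if (sscanf(line, "%llx-%llx %4s", &start, &end, perms) != 3)
        return -1;
    if (end <= start || strlen(perms) != 4)
        return -1;

    memset(ph, 0, sizeof(*ph));
    ph->p_type   = PT_LOAD;
    ph->p_offset = file_off;
    ph->p_vaddr  = start;
    ph->p_paddr  = start;
    ph->p_filesz = end - start;   /* whole region is stored in the file */
    ph->p_memsz  = end - start;   /* so no zero-filled tail is requested */
    ph->p_align  = 0x1000;        /* assumes 4 KiB pages */

    if (perms[0] == 'r') ph->p_flags |= PF_R;
    if (perms[1] == 'w') ph->p_flags |= PF_W;
    if (perms[2] == 'x') ph->p_flags |= PF_X;
    return 0;
}
```

Regions such as [stack] or [vdso] would presumably need special handling; the sketch does not attempt that.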
<------------------------------------------>
bss = 0;
brk = 0;
foreach (phdr in elf_header) {
    if (phdr->type == PT_LOAD) {
        if (phdr->filesize < phdr->memsize) {
            /* Segment with .bss, so update brk and bss */
        } else {
            /* Just map it */
        }
    }
    /* Update brk, bss */
}
<------------------------------------>
From the above, the kernel updates brk, thus establishing the initial
value of sbrk(0), only when it sees a PT_LOAD segment with
filesize < memsize. So if I create an ELF executable in which no
PT_LOAD segment has filesize < memsize, will the kernel set the brk
base, i.e. sbrk(0), to phdr->vaddr + phdr->memsize of the last PT_LOAD
segment it maps? If so, do I need to reorder my PT_LOAD segments so
that the heap is the last PT_LOAD segment?
Is there any way to tell the ELF loader to force the vaddr for
sbrk(0), i.e. the brk base?
Please let me know your suggestions on this idea.
I would really appreciate your comments.
Sincerely,
Vamsi
[PS: I don't know whether someone has already implemented this idea.]
> [PS: I dont know if some one has already implemented this idea??]
Have you looked at the Emacs dumper? This more or less describes how
Emacs makes its executable during the build process ;)
(and yes, it's horrid ;)
Arjan van de Ven wrote:
> Have you looked at the emacs dumper? This is more or less describing how
> emacs makes it's executable during the build process ;)
> (and yes it's horrid ;)
The emacs dumper does not address the fundamental problem with the kernel,
which Vamsi Krishna identified:
> Is there any way we can tell the elf loader to force the vaddr for
> sbrk(0) i.e brk base ?
load_elf_binary() in fs/binfmt_elf.c sets the brk base from the PT_LOAD
with the highest virtual address range. A re-executed dump (or a newly
decompressed executable that was stored compressed in the file system,
etc.) may well want to set the brk base below some of its "initial"
PT_LOADs [initial as far as execve() is concerned], but the kernel
provides no means to cooperate.
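A simplified user-space model of the rule just described, offered as a sketch and not as the kernel's actual code: the brk base follows the highest p_vaddr + p_memsz over all PT_LOAD entries, so neither the order of the headers nor whether p_filesz < p_memsz changes the result. The names model_brk_base and SKETCH_PAGE_SIZE are hypothetical, 4 KiB pages are assumed, and load bias and any brk randomization are ignored.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 0x1000ULL   /* assumption: 4 KiB pages */

/* Round an address up to the next page boundary. */
static Elf64_Addr page_align_up(Elf64_Addr a)
{
    return (a + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical model: return the brk base implied by a program header
 * table, i.e. the page-aligned maximum of p_vaddr + p_memsz over all
 * PT_LOAD entries. Non-PT_LOAD entries are ignored.
 */
static Elf64_Addr model_brk_base(const Elf64_Phdr *ph, size_t n)
{
    Elf64_Addr elf_brk = 0;

    assert(ph != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        Elf64_Addr end = ph[i].p_vaddr + ph[i].p_memsz;
        if (end > elf_brk)
            elf_brk = end;   /* highest segment end seen so far */
    }
    return page_align_up(elf_brk);
}
```

Under this model, a dumped image whose table also contains PT_LOADs for high mappings (shared libraries, for example) would get its brk base placed after those mappings, not after the original heap, wherever the heap's PT_LOAD sits in the table.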
Emacs does not care because it colludes on both ends (the state save and
the restore), but in general a user does not want the restored process
to have to know these details of its history.
It seems to me that a proper solution requires a new .p_type PT_BRK
which (if present) would cause the kernel to set the brk base
from the corresponding .p_vaddr, independent of the address ranges
specified in any PT_LOAD. Or, eliminate the whole concept of brk, which
is an anachronism from the days of primitive address-space management.
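To make the proposal concrete, here is a sketch of how a loader might pick the brk base if such a header existed. PT_BRK is NOT a real program header type: the constant PT_BRK_HYPOTHETICAL below is invented for this example (its value is simply taken from the OS-specific PT_LOOS..PT_HIOS range), and choose_brk_base is a hypothetical user-space model, not kernel code. Page alignment is left out for brevity.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* HYPOTHETICAL: no such p_type exists in the ELF spec or in Linux. */
#define PT_BRK_HYPOTHETICAL (PT_LOOS + 0x0b726b)

/*
 * Hypothetical model of the proposed behaviour: if a PT_BRK entry is
 * present, its p_vaddr becomes the brk base regardless of any PT_LOAD;
 * otherwise fall back to the highest p_vaddr + p_memsz of the PT_LOADs.
 */
static Elf64_Addr choose_brk_base(const Elf64_Phdr *ph, size_t n)
{
    Elf64_Addr highest_end = 0;
    Elf64_Addr forced = 0;
    int have_forced = 0;

    assert(ph != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type == PT_BRK_HYPOTHETICAL && !have_forced) {
            forced = ph[i].p_vaddr;      /* first PT_BRK entry wins */
            have_forced = 1;
        } else if (ph[i].p_type == PT_LOAD) {
            Elf64_Addr end = ph[i].p_vaddr + ph[i].p_memsz;
            if (end > highest_end)
                highest_end = end;
        }
    }
    return have_forced ? forced : highest_end;
}
```

With this rule a dump tool could record the original process's brk in one extra header and still emit PT_LOADs at higher addresses.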
--
vamsi krishna wrote:
> from the above the kernel is updating brk, thus creating the start of
> sbrk(0) only when it sees a PT_LOAD segment with filesize<memsize. So
> if I create a elf executable with all PT_LOAD segments with out any
> segments with filesize < memsize. The kernel will set brk base i.e
> sbrk(0) to the value phdr->vaddr+phdr->memsize of the last PT_LOAD
> segment its mapping? so do I need to reoder my PT_LOAD segments so
> that the heap goes as the last PT_LOAD segment?
Why don't you let execve() finish its job before modifying the mapping?
Once execve() has done its work, the segments are mapped and you are free
to remap them however you want and fill them in with a state previously
saved on disk.
C.
> Why don't you let execve() finish its job before modifying the mapping ?
>
> Once execve returns, the segments are mapped and you are free to remap them
> however you want and fill them in with a state previously saved on disk.
>
I don't want to remap the segments myself after execve() because of the
potential problem with ASLR (Address Space Layout Randomization). A
segment may have several sections merged into it; in particular, the
segments with permissions 'rw-p' have the .dynamic and .got sections
merged into them. If I restore such a segment after execve(), .dynamic
and .got are put back with their old contents, which crashes the process.
So I want to write all the virtual address mappings as PT_LOAD
segments and leave the mapping job to the ELF loader itself.
Thank you,
Vamsi kundeti
> I have been working on an idea of creating an executable from a
> running process image.
This has been done before.
> [PS: I dont know if some one has already implemented this idea??]
This may suit your needs:
http://www.phrack.org/phrack/63/p63-0x0c_Process_Dump_and_Binary_Reconstruction.txt
IIRC, Silvio Cesare was the first, a couple of years ago, to write a
proof-of-concept tool that dumped the memory space of a process to an
ELF executable, somewhere around 1999:
http://www.transient-iss.com/pit/elf.txt
Best regards,
Roberto Nibali, ratz
On 3/18/06, vamsi krishna wrote:
> [PS: I dont know if some one has already implemented this idea??]
You might find it useful to look at BProc, specifically its Virtual Memory
Area Dumper. It is used for process migration from a cluster's front
end node to its compute nodes. I don't believe anyone has used it to
checkpoint a process.
http://bproc.sourceforge.net/c268.html#AEN279
Also, are you familiar with http://www.checkpointing.org ?
--
Andrew Shewmaker