2002-12-28 15:39:23

by Jeff Dike

Subject: [PATCH] Allow UML kernel to run in a separate host address space

Please pull either
http://uml.bkbits.net/skas-2.5
or http://jdike.stearns.org:5000/skas-2.5

This allows the UML kernel to run in a different address space from its
processes. The benefits include better security and much improved performance.
This is a large patch, but
	- it's all under arch/um and include/asm-um
	- a lot of it is code movement

This is described fairly completely in
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&selm=fa.ia69pmv.e4qnq1%40ifi.uio.no

Jeff

arch/um/Kconfig | 8
arch/um/Makefile | 77 ++--
arch/um/Makefile-i386 | 4
arch/um/drivers/chan_kern.c | 66 +++
arch/um/drivers/chan_user.c | 33 -
arch/um/drivers/fd.c | 6
arch/um/drivers/line.c | 108 +++++-
arch/um/drivers/mconsole_kern.c | 62 +++
arch/um/drivers/null.c | 5
arch/um/drivers/port_kern.c | 13
arch/um/drivers/port_user.c | 9
arch/um/drivers/pty.c | 19 -
arch/um/drivers/ssl.c | 36 +-
arch/um/drivers/stdio_console.c | 32 +
arch/um/drivers/tty.c | 8
arch/um/drivers/ubd_kern.c | 39 ++
arch/um/drivers/xterm.c | 4
arch/um/include/chan_kern.h | 3
arch/um/include/chan_user.h | 4
arch/um/include/choose-mode.h | 35 ++
arch/um/include/debug.h | 26 -
arch/um/include/frame.h | 18 -
arch/um/include/kern.h | 2
arch/um/include/kern_util.h | 28 -
arch/um/include/line.h | 11
arch/um/include/mconsole_kern.h | 16
arch/um/include/mem.h | 1
arch/um/include/mem_user.h | 12
arch/um/include/mode.h | 30 +
arch/um/include/mode_kern.h | 30 +
arch/um/include/os.h | 8
arch/um/include/sigcontext.h | 2
arch/um/include/syscall_user.h | 13
arch/um/include/sysdep-i386/checksum.h | 217 ++++++++++++
arch/um/include/sysdep-i386/frame_kern.h | 9
arch/um/include/sysdep-i386/ptrace.h | 109 ++++--
arch/um/include/sysdep-i386/sigcontext.h | 33 +
arch/um/include/time_user.h | 2
arch/um/include/um_mmu.h | 40 ++
arch/um/include/um_uaccess.h | 73 ++++
arch/um/include/user_util.h | 26 -
arch/um/kernel/Makefile | 58 +--
arch/um/kernel/checksum.c | 42 ++
arch/um/kernel/exec_kern.c | 64 ---
arch/um/kernel/exec_user.c | 49 --
arch/um/kernel/frame.c | 204 ++++++-----
arch/um/kernel/frame_kern.c | 85 +++-
arch/um/kernel/helper.c | 5
arch/um/kernel/init_task.c | 2
arch/um/kernel/ksyms.c | 17
arch/um/kernel/mem.c | 71 +---
arch/um/kernel/mem_user.c | 44 --
arch/um/kernel/process.c | 136 +++++--
arch/um/kernel/process_kern.c | 477 +--------------------------
arch/um/kernel/ptrace.c | 61 +++
arch/um/kernel/reboot.c | 25 -
arch/um/kernel/sigio_user.c | 13
arch/um/kernel/signal_kern.c | 19 -
arch/um/kernel/signal_user.c | 9
arch/um/kernel/skas/Makefile | 24 +
arch/um/kernel/skas/exec_kern.c | 41 ++
arch/um/kernel/skas/exec_user.c | 61 +++
arch/um/kernel/skas/include/mmu.h | 27 +
arch/um/kernel/skas/include/mode.h | 34 +
arch/um/kernel/skas/include/mode_kern.h | 52 +++
arch/um/kernel/skas/include/proc_mm.h | 55 +++
arch/um/kernel/skas/include/ptrace-skas.h | 57 +++
arch/um/kernel/skas/include/skas.h | 49 ++
arch/um/kernel/skas/include/skas_ptrace.h | 36 ++
arch/um/kernel/skas/include/uaccess.h | 236 +++++++++++++
arch/um/kernel/skas/mem.c | 35 ++
arch/um/kernel/skas/mem_user.c | 95 +++++
arch/um/kernel/skas/mmu.c | 46 ++
arch/um/kernel/skas/process.c | 386 ++++++++++++++++++++++
arch/um/kernel/skas/process_kern.c | 195 +++++++++++
arch/um/kernel/skas/sys-i386/Makefile | 14
arch/um/kernel/skas/sys-i386/sigcontext.c | 114 ++++++
arch/um/kernel/skas/syscall_kern.c | 42 ++
arch/um/kernel/skas/syscall_user.c | 47 ++
arch/um/kernel/skas/time.c | 30 +
arch/um/kernel/skas/tlb.c | 156 +++++++++
arch/um/kernel/skas/trap_user.c | 66 +++
arch/um/kernel/skas/util/Makefile | 10
arch/um/kernel/skas/util/mk_ptregs.c | 50 ++
arch/um/kernel/smp.c | 12
arch/um/kernel/sys_call_table.c | 15
arch/um/kernel/syscall_kern.c | 126 -------
arch/um/kernel/syscall_user.c | 79 ----
arch/um/kernel/sysrq.c | 8
arch/um/kernel/time.c | 22 -
arch/um/kernel/time_kern.c | 3
arch/um/kernel/tlb.c | 225 +------------
arch/um/kernel/trap_kern.c | 361 +++------------------
arch/um/kernel/trap_user.c | 492 +---------------------------
arch/um/kernel/tt/Makefile | 20 +
arch/um/kernel/tt/exec_kern.c | 84 ++++
arch/um/kernel/tt/exec_user.c | 49 ++
arch/um/kernel/tt/gdb.c | 278 ++++++++++++++++
arch/um/kernel/tt/gdb_kern.c | 40 ++
arch/um/kernel/tt/include/debug.h | 29 +
arch/um/kernel/tt/include/mmu.h | 23 +
arch/um/kernel/tt/include/mode.h | 35 ++
arch/um/kernel/tt/include/mode_kern.h | 53 +++
arch/um/kernel/tt/include/ptrace-tt.h | 26 +
arch/um/kernel/tt/include/tt.h | 45 ++
arch/um/kernel/tt/include/uaccess.h | 119 ++++++
arch/um/kernel/tt/ksyms.c | 28 +
arch/um/kernel/tt/mem.c | 77 ++++
arch/um/kernel/tt/process_kern.c | 513 ++++++++++++++++++++++++++++++
arch/um/kernel/tt/ptproxy/Makefile | 13
arch/um/kernel/tt/ptproxy/proxy.c | 370 +++++++++++++++++++++
arch/um/kernel/tt/ptproxy/ptproxy.h | 61 +++
arch/um/kernel/tt/ptproxy/ptrace.c | 239 +++++++++++++
arch/um/kernel/tt/ptproxy/sysdep.c | 71 ++++
arch/um/kernel/tt/ptproxy/sysdep.h | 25 +
arch/um/kernel/tt/ptproxy/wait.c | 86 +++++
arch/um/kernel/tt/ptproxy/wait.h | 15
arch/um/kernel/tt/sys-i386/Makefile | 14
arch/um/kernel/tt/sys-i386/sigcontext.c | 59 +++
arch/um/kernel/tt/syscall_kern.c | 140 ++++++++
arch/um/kernel/tt/syscall_user.c | 90 +++++
arch/um/kernel/tt/time.c | 28 +
arch/um/kernel/tt/tlb.c | 226 +++++++++++++
arch/um/kernel/tt/tracer.c | 466 +++++++++++++++++++++++++++
arch/um/kernel/tt/trap_user.c | 58 +++
arch/um/kernel/tt/uaccess_user.c | 126 +++++++
arch/um/kernel/uaccess_user.c | 126 -------
arch/um/kernel/um_arch.c | 119 +++---
arch/um/kernel/umid.c | 6
arch/um/main.c | 64 ---
arch/um/os-Linux/Makefile | 6
arch/um/os-Linux/drivers/Makefile | 3
arch/um/os-Linux/process.c | 42 ++
arch/um/os-Linux/tty.c | 2
arch/um/ptproxy/Makefile | 10
arch/um/ptproxy/proxy.c | 370 ---------------------
arch/um/ptproxy/ptproxy.h | 61 ---
arch/um/ptproxy/ptrace.c | 238 -------------
arch/um/ptproxy/sysdep.c | 71 ----
arch/um/ptproxy/sysdep.h | 25 -
arch/um/ptproxy/wait.c | 86 -----
arch/um/ptproxy/wait.h | 15
arch/um/sys-i386/Makefile | 36 --
arch/um/sys-i386/checksum.S | 460 ++++++++++++++++++++++++++
arch/um/sys-i386/ksyms.c | 3
arch/um/sys-i386/ldt.c | 69 +++-
arch/um/sys-i386/ptrace.c | 57 ++-
arch/um/sys-i386/ptrace_user.c | 2
arch/um/sys-i386/sigcontext.c | 39 --
arch/um/sys-i386/util/mk_thread_kern.c | 16
arch/um/sys-i386/util/mk_thread_user.c | 32 +
arch/um/uml.lds.S | 1
arch/um/util/Makefile | 11
arch/um/util/mk_constants_kern.c | 24 +
arch/um/util/mk_constants_user.c | 28 +
include/asm-um/a.out.h | 8
include/asm-um/checksum.h | 2
include/asm-um/mmu.h | 18 -
include/asm-um/mmu_context.h | 51 ++
include/asm-um/page.h | 2
include/asm-um/processor-generic.h | 39 +-
include/asm-um/ptrace-generic.h | 2
include/asm-um/uaccess.h | 100 -----
163 files changed, 8121 insertions(+), 3588 deletions(-)

[email protected], 2002-12-17 02:55:00-05:00, [email protected]
Merge jdike.stearns.org:linux/skas-2.5
into uml.karaya.com:/home/jdike/linux/2.5/skas-2.5

[email protected], 2002-12-17 02:34:29-05:00, [email protected]
Merge

[email protected], 2002-12-17 02:21:56-05:00, [email protected]
Removed includes of Rules.mk.

[email protected], 2002-12-17 01:06:44-05:00, [email protected]
Merged the 2.5.52 Makefile changes.

[email protected], 2002-12-06 21:30:55-05:00, [email protected]
Merge jdike.stearns.org:linux/skas-2.5
into uml.karaya.com:/home/jdike/linux/2.5/skas-2.5

[email protected], 2002-12-06 21:25:54-05:00, [email protected]
Added a couple of includes as part of the 2.5.50 update.

[email protected], 2002-12-06 19:04:22-05:00, [email protected]
Merge jdike.wstearns.org:/home/jdike/linux/linus-2.5
into jdike.wstearns.org:/home/jdike/linux/skas-2.5

[email protected], 2002-12-06 18:14:59-05:00, [email protected]
Merge uml.karaya.com:/home/jdike/linux/2.5/linus-2.5
into uml.karaya.com:/home/jdike/linux/2.5/skas-2.5

[email protected], 2002-11-25 22:07:47-05:00, [email protected]
Fixed a stupid compile bug.

[email protected], 2002-11-25 21:03:24-05:00, [email protected]
Small fixes to sync up the 2.4 and 2.5 pools.
Also fixed a stupid signal handling bug.

[email protected], 2002-11-25 13:41:02-05:00, [email protected]
A whole lot of small changes to sync up the 2.4 and 2.5 pools
somewhat. Mostly whitespace changes, plus some code movement.
Also added checksum.S to the repository, which I had somehow
missed before.

[email protected], 2002-11-23 21:37:53-05:00, [email protected]
Merge

[email protected], 2002-11-23 19:25:48-05:00, [email protected]
Updated to 2.5.49, which involved fixing the calls to do_fork.

[email protected], 2002-11-23 16:49:59-05:00, [email protected]
Finished the skas merge by eliminating a syntax error, fixing the
new compilation warnings, and fixing a call to handle_page_fault.

[email protected], 2002-11-22 21:47:15-05:00, [email protected]
Merged the rest of the skas changes.

[email protected], 2002-11-22 21:22:57-05:00, [email protected]
Fixed various build problems with the tlb.c merge.

[email protected], 2002-11-22 20:39:33-05:00, [email protected]
Merged the tlb.c changes from the skas patch.

[email protected], 2002-11-22 14:27:24-05:00, [email protected]
Minor build fixes to the last batch of skas merges.

[email protected], 2002-11-22 12:53:13-05:00, [email protected]
Merged a number of small skas changes.

[email protected], 2002-11-21 23:22:43-05:00, [email protected]
Some small build fixes to the IP checksum merge.

[email protected], 2002-11-21 23:21:41-05:00, [email protected]
Removed the checksum.S symlink from arch/um/sys-i386/Makefile.

[email protected], 2002-11-21 22:30:24-05:00, [email protected]
Merged the IP checksum changes from the skas code.

[email protected], 2002-11-21 22:26:06-05:00, [email protected]
Some minor build and compilation fixes to the copy_sc merge.

[email protected], 2002-11-21 22:00:31-05:00, [email protected]
Applied the sigcontext changes in the skas code.

[email protected], 2002-11-21 21:38:56-05:00, [email protected]
A number of small fixes for the uaccess merge.

[email protected], 2002-11-21 18:54:16-05:00, [email protected]
Added the uaccess changes from the skas merge.

[email protected], 2002-11-21 17:16:25-05:00, [email protected]
Resolved the conflict between the skas and get_config changes in
line.h.

[email protected], 2002-11-21 14:59:43-05:00, [email protected]
Added skas/mem_user.c and tt/gdb.c

[email protected], 2002-11-21 14:48:11-05:00, [email protected]
Added a bunch of C files under arch/um/kernel/skas and
arch/um/kernel/tt.

[email protected], 2002-11-21 14:31:45-05:00, [email protected]
Added a batch of files under arch/um/kernel/skas.

[email protected], 2002-11-21 14:09:26-05:00, [email protected]
Added arch/um/include/mode_kern.h

[email protected], 2002-11-21 14:05:13-05:00, [email protected]
Changed the config to pull in zlib.

[email protected], 2002-11-21 13:23:40-05:00, [email protected]
Added the mode mmu.h and mode.h headers.

[email protected], 2002-11-21 13:15:09-05:00, [email protected]
Added mode.h, mk_constants_kern.c, mk_constants_user.c, and um_mmu.h

[email protected], 2002-11-21 12:58:41-05:00, [email protected]
Added ptrace-skas.h and ptrace-tt.h.

[email protected], 2002-11-21 12:52:36-05:00, [email protected]
Added arch/um/kernel/skas/util/*, which I missed somehow.

[email protected], 2002-11-20 23:04:22-05:00, [email protected]
Merged most of the rest of the skas changes.

[email protected], 2002-11-19 14:54:08-05:00, [email protected]
Declared mode_tt in user_util.h.

[email protected], 2002-11-19 14:53:18-05:00, [email protected]
Merged the skas exec reorg.

[email protected], 2002-11-19 13:47:41-05:00, [email protected]
Fixed a couple of buglets in the signal frame merge.

[email protected], 2002-11-19 00:54:26-05:00, [email protected]
Merged the signal frame cleanups and fixes from 2.4.

[email protected], 2002-11-19 00:47:18-05:00, [email protected]
Fixes to the last merge.

[email protected], 2002-11-19 00:13:26-05:00, [email protected]
Merged the os_kill_process and the driver from_user changes from
the 2.4 pool.
Also merged some other cleanups.

[email protected], 2002-11-18 23:28:32-05:00, [email protected]
Fixed the Makefiles so that the ptproxy move from arch/um/ptproxy
to arch/um/kernel/tt/ptproxy works.

[email protected], 2002-11-18 22:47:18-05:00, [email protected]
Moved the ptproxy code from arch/um/ptproxy to
arch/um/kernel/tt/ptproxy.

[email protected], 2002-11-18 20:03:13-05:00, [email protected]
A few more fixes to get 2.4.48 to boot.

[email protected], 2002-11-18 15:57:40-05:00, [email protected]
Merged the get_config changes from 2.4.

[email protected], 2002-11-18 15:57:00-05:00, [email protected]
Updated to 2.5.48



2002-12-28 19:31:43

by Linus Torvalds

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space


On Sat, 28 Dec 2002, Jeff Dike wrote:
> This is a large patch, but
> 	- it's all under arch/um and include/asm-um
> 	- a lot of it is code movement

Pulled, but that /proc/mm crap has to go (it wasn't in this patch, or I
would have rejected it).

What are the semantics the host code wants/needs, and how can we implement
a sane generic mechanism that doesn't involve opening magic files?

Having co-processes isn't wrong in itself, I just want the support to be
clean and generic, instead of a huge hack.

Linus

2002-12-28 20:12:31

by Jeff Dike

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space

[email protected] said:
> Pulled, but that /proc/mm crap has to go (it wasn't in this patch, or
> I would have rejected it).

Which is exactly why it's not in that patch. I realize that it's a lousy
interface - I'm putting it out there because I don't really have any better
ideas and I'm hoping other people do.

The next iteration of that patch will turn /proc/mm into /dev/mm, but that's
not really a great improvement. It just improves things around the edges a
little.

> What are the semantics the host code wants/needs,

1 - Multiple address spaces per process
2 - Ability to make a child switch between address spaces
3 - Ability to manipulate a child's address space (i.e. mmap, munmap, mprotect
on an address space which is not current->mm)

> and how can we
> implement a sane generic mechanism that doesn't involve opening magic
> files?

Beats me. My first suggestion was to add another file descriptor argument
to mmap et al which would represent the address space to be modified. Alan
didn't like that idea too much.

That still requires getting the descriptor from somewhere. The obvious
alternative to opening a magic file is a system call, new_mm() or something.

BTW, there is some attraction to being able to open /proc/<pid>/mm and get
a handle on that process' address space. UML doesn't need this, but I bet
there are people who could figure out how to put it to good use.

So, here are the alternatives that I know of. Sane replacements are craved.

Creating a new, empty address space -
	open /proc/mm (current UML host patch)
	or system call new_mm()

Switch a child from one address space to another -
	PTRACE_SWITCH_MM (current UML host patch)

Manipulate a child's address space -
	write a request struct to a /proc/mm fd (current UML host patch)
	or add another fd to the mmap et al calls

Some obvious extensions to this (which UML doesn't need)
	switch yourself from one address space to another
	open and change another process' existing address space - if we're
	going with system calls instead of magic files, then get_mm(pid)
	would suffice for the open.

Jeff

2002-12-28 20:48:33

by Linus Torvalds

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space


On Sat, 28 Dec 2002, Jeff Dike wrote:
>
> > What are the semantics the host code wants/needs,
>
> 1 - Multiple address spaces per process
> 2 - Ability to make a child switch between address spaces
> 3 - Ability to manipulate a child's address space (i.e. mmap, munmap, mprotect
> on an address space which is not current->mm)

Well, #3 falls under "ptrace()" as far as I'm concerned, I don't really
want to expose things through /proc (or /dev, which is even _worse_).

We used to have things that could be done with /proc/<pid>/mem, and it was
a total security disaster. It was removed in the 2.3.x series because of
that.

As to #1, that certainly shouldn't be a problem at all. We already do it
temporarily internally inside the kernel for execve() setup and for things
like lazy TLB switching for kernel threads, and there's nothing keeping us
from having multiple "struct mm_struct" per process. The only issue is
what the interfaces should be to create one (/dev/mm is right _out_), and
how to switch them around sanely.

Having a

	int fd = create_mm();

system call is certainly not wrong per se (but thinking that it should be
done using a special file is wrong - we don't have /dev/pipe either). And
creating that system call is trivial - but only worth it if there are good
sane interfaces to switch mm's around and do interesting things with them.

Done right, it should be possible to have "posix_spawn()" etc done using
something like that, ie

	/* Create new VM */
	int fd = create_mm();

	/* populate the dang thing.. */
	mmap_mm(fd, .. );

	/* start it up */
	clone_with_mm(fd, ...);

and the internal implementation should be perfectly trivial, since the
kernel already largely works this way internally anyway (yeah, it is
likely to need some re-organization of clone() to handle pre-created VM's
etc, but that's nothing really fundamental).

> Beats me. My first suggestion was to add another file descriptor argument
> to mmap et al which would represent the address space to be modified. Alan
> didn't like that idea too much.

I do believe that fd's are a natural way to handle it, since it needs
_some_ kind of handle, and the only generic handle the kernel has is a
file descriptor. We could create a new kind of handle, but it would be
likely to be just more complexity.

HOWEVER, the part I worry about is creating tons of new system calls that
just duplicate existing ones by adding a "fd" argument. That part I really
don't much like. Because if this were to really be a generic feature, it
really wants pretty much _all_ system calls supported, ie things like

	fd = open(<mm,ptr>, flags, ...);

	retval = read(<mm,ptr>..

to allow the user to not just mmap but generally "take the guise of" any
other mm for the duration of the system call.

Which really means that I _think_ the right approach would be to literally
have an "indirect-system-call-using-this-mm" system call, which does
something like

	asmlinkage long sys_mm_indirect(int fd, struct syscall_descriptor_block *user_args)
	{
		struct mm_struct *mm, *old_mm;
		struct syscall_descriptor_block args;
		long ret;

		if (copy_from_user(&args, user_args, sizeof(args)))
			return -EFAULT;

		/* Temporarily adopt the mm behind fd */
		mm = get_fd_mm(fd);
		old_mm = current->mm;
		current->mm = mm;
		switch_mm(mm);

		ret = arch_do_syscall(&args);

		/* Switch back to our own mm and drop the reference */
		current->mm = old_mm;
		switch_mm(old_mm);
		put_mm(mm);
		return ret;
	}

which allows _any_ system call to be made for that mm.
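
The save/switch/call/restore pattern above can be modelled in user space. The sketch below is only a toy under stated assumptions: `struct mm`, `struct task`, `mm_op`, `op_add_mapping()` and `mm_indirect()` are hypothetical stand-ins invented for illustration, not kernel interfaces, and the reference count is a plain integer.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel objects; none of these are real kernel APIs. */
struct mm {
	int users;      /* toy reference count */
	int mappings;   /* toy state that an "indirect call" can modify */
};

struct task {
	struct mm *mm;  /* plays the role of current->mm */
};

/* An operation that acts on whichever mm is currently installed in the task. */
typedef long (*mm_op)(struct task *t, long arg);

static long op_add_mapping(struct task *t, long arg)
{
	t->mm->mappings += (int)arg;
	return t->mm->mappings;
}

/* Run op with target installed as t->mm, then restore the original mm. */
static long mm_indirect(struct task *t, struct mm *target, mm_op op, long arg)
{
	struct mm *old_mm = t->mm;
	long ret;

	target->users++;        /* pin the target, in the spirit of get_fd_mm() */
	t->mm = target;         /* stands in for current->mm = mm; switch_mm(mm) */

	ret = op(t, arg);       /* stands in for arch_do_syscall(&args) */

	t->mm = old_mm;         /* restore, as switch_mm(old_mm) would */
	target->users--;        /* drop the pin, in the spirit of put_mm() */
	return ret;
}
```

The operation only ever sees the task's current mm, so the same `op_add_mapping()` works on the caller's own address space or on a borrowed one, which is the property the indirect call is after.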

Linus

2002-12-28 23:25:32

by Jeff Dike

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space

[email protected] said:
> On Sat, 28 Dec 2002, Jeff Dike wrote:
>
> > 3 - Ability to manipulate a child's address space (i.e. mmap, munmap,
> > mprotect on an address space which is not current->mm)
>
> Well, #3 falls under "ptrace()" as far as I'm concerned,

Not exactly. UML needs to be able to fiddle an address space that has no
process in it (swapout, COWing, maybe a few other things).

UML has two relevant processes, one which runs userspace, and one which runs
the kernel and ptraces the userspace process. The kernel process creates
an address space per UML process, and makes the userspace process switch
between them during a UML context switch. So, when there's swapping going
on, pages need to be unmapped from UML processes, and thus from the
corresponding host address spaces.
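
For readers unfamiliar with the tracing half of this arrangement, here is a minimal sketch of one process ptracing another and seeing each of its system calls. It uses only standard ptrace requests (PTRACE_TRACEME, PTRACE_SYSCALL); PTRACE_SWITCH_MM comes from the UML host patch and is not used here. The function name `count_syscall_stops()` is made up for this example, and the sketch assumes a Linux host where ptrace is permitted.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that asks to be traced, then count how many times it stops
 * at a system call boundary before it exits. Returns -1 on error.
 */
static int count_syscall_stops(void)
{
	int status, stops = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: become a tracee, then stop so the parent can take over. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
			_exit(1);
		raise(SIGSTOP);
		getppid();      /* at least one system call for the tracer to see */
		_exit(0);
	}

	/* Parent: wait for the child's initial SIGSTOP. */
	if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
		return -1;

	/* Resume the child, stopping it at every syscall entry and exit. */
	for (;;) {
		if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
			return -1;
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		if (WIFEXITED(status) || WIFSIGNALED(status))
			break;
		if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
			stops++;
	}
	return stops;
}
```

At each of these stops a tracer is free to inspect or redirect the child, which is the point at which a UML-style kernel process would step in.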

> Which really means that I _think_ the right approach would be to
> literally have an "indirect-system-call-using-this-mm" system call,
> which does something like
>
> 	asmlinkage long sys_mm_indirect(int fd,
> 			struct syscall_descriptor_block *user_args)
> 	{
> 		struct mm_struct *mm, *old_mm;
> 		struct syscall_descriptor_block args;
> 		long ret;
>
> 		if (copy_from_user(&args, user_args, sizeof(args)))
> 			return -EFAULT;
>
> 		/* Temporarily adopt the mm behind fd */
> 		mm = get_fd_mm(fd);
> 		old_mm = current->mm;
> 		current->mm = mm;
> 		switch_mm(mm);
>
> 		ret = arch_do_syscall(&args);
>
> 		/* Switch back to our own mm and drop the reference */
> 		current->mm = old_mm;
> 		switch_mm(old_mm);
> 		put_mm(mm);
> 		return ret;
> 	}

Hmmm, I wasn't planning on going that far, but this certainly works for UML,
as long as there is also PTRACE_SWITCH_MM to make a child jump from one mm
to another.

The calls to switch_mm() are needed when the system call is going to modify
data within the address space, but not if it's going to change mappings,
correct?

If those will cause a noticeable performance hit, would it be OK to special-case
syscalls which don't need it?

	if(!dont_need_no_stinkin_switch_mm[args.syscall])
		switch_mm(mm);

	arch_do_syscall(&args);

	current->mm = old_mm;
	if(!dont_need_no_stinkin_switch_mm[args.syscall])
		switch_mm(old_mm);

Sorry about the double negative, but it seems easiest to sparsely populate
an array with system calls that really don't want the switch_mm().
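
A sparsely populated table of this kind might look like the following user-space sketch. The syscall numbers come from the real <sys/syscall.h>; everything else is an assumption made for illustration: `MAX_SYSCALL`, the table name `skips_switch_mm`, the helper `needs_switch_mm()`, and the choice of munmap and mprotect as calls that only change mappings. It also assumes syscall numbers below `MAX_SYSCALL`, as on x86.

```c
#include <assert.h>
#include <sys/syscall.h>

/* Hypothetical upper bound on syscall numbers, chosen only for this sketch. */
#define MAX_SYSCALL 1024

/*
 * Sparsely populated table: every entry defaults to 0 ("needs switch_mm"),
 * and only calls assumed to merely change mappings are marked.
 */
static const unsigned char skips_switch_mm[MAX_SYSCALL] = {
	[SYS_munmap]   = 1,
	[SYS_mprotect] = 1,
};

/* Returns nonzero when the call should run with the target mm switched in. */
static int needs_switch_mm(long nr)
{
	if (nr < 0 || nr >= MAX_SYSCALL)
		return 1;       /* unknown calls take the safe path */
	return !skips_switch_mm[nr];
}
```

Defaulting to "switch" means a call that was forgotten in the table is merely slower, never wrong.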

Jeff

2002-12-29 00:49:32

by Daniel Jacobowitz

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space

On Sat, Dec 28, 2002 at 12:50:53PM -0800, Linus Torvalds wrote:
>
> On Sat, 28 Dec 2002, Jeff Dike wrote:
> >
> > > What are the semantics the host code wants/needs,
> >
> > 1 - Multiple address spaces per process
> > 2 - Ability to make a child switch between address spaces
> > 3 - Ability to manipulate a child's address space (i.e. mmap, munmap, mprotect
> > on an address space which is not current->mm)
>
> Well, #3 falls under "ptrace()" as far as I'm concerned, I don't really
> want to expose things through /proc (or /dev, which is even _worse_).
>
> We used to have things that could be done with /proc/<pid>/mem, and it was
> a total security disaster. It was removed in the 2.3.x series because of
> that.

FWIW, GDB also would like to have #3. We can do without it; GDB
already supports calling functions in the inferior by a stack or code
trampoline, so we could just make the child call mprotect; but it would
be faster and simpler to have a ptrace op for it. HP/UX had, among
other things, TT_PROC_SET_MPROTECT and TT_PROC_GET_MPROTECT; I don't
think we have a system call equivalent to GET_MPROTECT right now.

Of course, without more comprehensive kernel support, doing
protection-based watchpoints this way is murder for performance, almost
as bad as just doing it by single-stepping. You need to disable them
at every syscall entry, which means that you can't have multiple
threads running in userspace while one thread is in a syscall, or you
might miss a watchpoint event.
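
The basic mechanism of a protection-based watchpoint can be shown inside a single process with mprotect() and a SIGSEGV handler. This is only a sketch of the idea, not how GDB does it: a debugger would have to change the protection in the inferior, and would single-step and re-protect after each hit, whereas this version simply lifts the protection on the first write. The names `watch_setup()`, `on_segv()`, `watched_page` and `watch_hits` are invented for the example, and calling mprotect() from a signal handler is assumed to be acceptable on Linux even though POSIX does not list it as async-signal-safe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *watched_page;                /* page-aligned region being watched */
static size_t watched_len;
static volatile sig_atomic_t watch_hits;  /* number of trapped writes */

/* SIGSEGV handler: record the hit, then lift protection so the write retries. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t addr = (uintptr_t)info->si_addr;
	uintptr_t start = (uintptr_t)watched_page;

	(void)sig;
	(void)ctx;
	if (addr < start || addr >= start + watched_len)
		_exit(1);       /* a genuine fault, not our watchpoint */
	watch_hits++;
	mprotect(watched_page, watched_len, PROT_READ | PROT_WRITE);
}

/* Map one page read-only and install the handler. Returns the page or NULL. */
static char *watch_setup(void)
{
	struct sigaction sa;

	watched_len = (size_t)sysconf(_SC_PAGESIZE);
	watched_page = mmap(NULL, watched_len, PROT_READ,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (watched_page == MAP_FAILED)
		return NULL;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL) < 0)
		return NULL;
	return watched_page;
}
```

The first write to the page faults, is counted, and then completes once the handler returns; later writes go through untouched because the page stays writable.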

It would be ideal to have some way to set the permissions such that
accesses from inside the kernel succeeded and from userspace failed
(i.e. render it temporarily a kernel page; but not exactly; we'd want
things like "normally writeable; currently writeable by the kernel;
still currently readable by userspace" for a normal watchpoint).
I don't know if that's practical without impacting MM performance.

Suggestions welcome; I haven't really started to work on this yet
although it's creeping up my list of important debugger projects.
PowerPC MMUs have a mechanism that could be used for this but I don't
know if other architectures do.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2002-12-29 03:55:23

by Jeremy Fitzhardinge

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space

On Sat, 2002-12-28 at 12:24, Jeff Dike wrote:
> 1 - Multiple address spaces per process
> 2 - Ability to make a child switch between address spaces
> 3 - Ability to manipulate a child's address space (i.e. mmap, munmap, mprotect
> on an address space which is not current->mm)

I suspect Valgrind could use this too at some point. There hasn't been
much discussion about it yet, but I think Valgrind may well move towards
a more complete virtualization in a later round of development, and
isolating the virtual virtual address space from Valgrind's real
virtual address space would be very useful. (Jeff suggested the idea of
merging Valgrind and UML at some level, which does raise some
interesting possibilities.)

J

2002-12-29 04:11:02

by Linus Torvalds

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space


On Sat, 28 Dec 2002, Jeff Dike wrote:

> [email protected] said:
> > On Sat, 28 Dec 2002, Jeff Dike wrote:
> >
> > > 3 - Ability to manipulate a child's address space (i.e. mmap, munmap,
> > > mprotect on an address space which is not current->mm)
> >
> > Well, #3 falls under "ptrace()" as far as I'm concerned,
>
> Not exactly. UML needs to be able to fiddle an address space that has no
> process in it (swapout, COWing, maybe a few other things).

But that is an address space that it should already have access to,
since it created it in the first place (ie it would fall under the normal
"sys_mm_indirect()" case).

The thing that I _really_ don't want to have is some uncontrolled way to
generate accesses to existing "struct mm_struct"s, since that is really
dangerous from a security standpoint.

We could have a PTRACE_GET_MM_FD kind of thing for ptrace (and then the
gdb/tracer can use that to create mappings in the process), but the reason
I want that "hook" to be through ptrace itself is simply that it's a known
interface to control other unrelated processes.

So if you create the MM's yourself, you can use the indirection directly.
But if you want to control your children or unrelated processes, you use
ptrace to get the hook.

Linus

2002-12-29 05:00:36

by Jeff Dike

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space

[email protected] said:
> I suspect Valgrind could use this too at some point. There hasn't
> been much discussion about it yet, but I think Valgrind may well move
> towards a more complete virtualization in a later round of
> development, and isolating the virtual virtual address space from
> Valgrind's real virtual address space would be very useful. (Jeff
> suggested the idea of merging Valgrind and UML at some level, which
> does raise some interesting possibilities.)

Yes, Valgrind already has a pseudo-scheduler, a pseudo-threads library, it
delivers signals by hand, and it wants to run its client in a separate
thread so it can get out of the business of being an LD_PRELOAD shared
library.

This is all stuff that UML has, that UML does right (/me crosses fingers),
and that is usable by Valgrind (and anything else that's interested) with
some repackaging of UML as a library.

Replacing Valgrind's signal delivery with UML's is a no-brainer. Replacing
its scheduler and threads library would involve it creating UML processes
by calling UML's do_fork(). Valgrind would need to provide the low-level
switch_to, I think. There are probably other things that Valgrind would
need to provide, but I see no reason this wouldn't work.

Jeff

2002-12-29 05:00:29

by Jeff Dike

Subject: Re: [PATCH] Allow UML kernel to run in a separate host address space

[email protected] said:
> But that is an address space that it should already have access to,
> since it created it in the first place (ie it would fall
> under the normal "sys_mm_indirect()" case).

Yes, and so it doesn't fall under ptrace. I think we're in violent agreement
here.

> The thing that I _really_ don't want to have is some uncontrolled way
> to generate accesses to existing "struct mm_struct"s, since that is
> really dangerous from a security standpoint.

Fine by me. UML has no need for manipulating pre-existing address spaces.

> We could have a PTRACE_GET_MM_FD kind of thing for ptrace (and then
> the gdb/tracer can use that to create mappings in the process), but
> the reason I want that "hook" to be through ptrace itself is simply
> that it's a known interface to control other unrelated processes.
>
> So if you create the MM's yourself, you can use the indirection
> directly. But if you want to control your children or unrelated
> processes, you use ptrace to get the hook.

Yup. As far as UML is concerned, this is all fine. It has no need of
a PTRACE_GET_MM_FD since it creates all address spaces itself, but other
tools might.

Jeff