Hi,
I'm wondering if there is any way to provide per process bitmasks of
available/illegal syscalls. Obviously this should most likely be
inherited through exec/fork.
For example specifying that pid N should return -ENOSYS on all syscalls
except read/write/exit.
The reason I'm asking is because I want to run totally untrusted
statically linked binary code (automatically compiled from user
submitted untrusted sources) which only needs read/write access to stdio
which means it only requires syscalls read/write/exit + a few more for
memory alloc/free (like brk) + a few more generated before main is called
(execve and uname I believe).
Currently I'm running the code in a chroot'ed environment (to an empty
dir) under a 'nobody' uid/gid with no open fd's except for std in/out/err
with limits for mem, processor usage, open files, processes (to 1), etc.
Obviously this still allows calling syscalls like 'time', 'getuid' and
the like.
Modifying the compiler (or removing the headers) won't help since at worst
I can code it in asm in the source or even in a plain byte table.
I have a working (very much a hack) patch which turns off all but 7 (or
so) of the syscalls (via pseudo-bitmaps).
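
[Illustration] A minimal user-space sketch of what a per-process syscall
bitmap could look like. This is not the patch mentioned above; the names
syscall_mask, mask_allow, mask_check and mask_inherit, and the bound
MAX_SYSCALLS, are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/syscall.h>

#define MAX_SYSCALLS 512 /* assumed upper bound on syscall numbers */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-process allow-list: bit n set means syscall n is legal. */
struct syscall_mask {
    unsigned long bits[MAX_SYSCALLS / BITS_PER_WORD + 1];
};

/* Start from "everything illegal". */
void mask_clear_all(struct syscall_mask *m)
{
    memset(m, 0, sizeof *m);
}

/* Mark one syscall number as available. */
void mask_allow(struct syscall_mask *m, unsigned int nr)
{
    assert(nr < MAX_SYSCALLS);
    m->bits[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Returns 0 if the call may proceed, -ENOSYS otherwise. */
int mask_check(const struct syscall_mask *m, unsigned int nr)
{
    if (nr >= MAX_SYSCALLS)
        return -ENOSYS;
    if (m->bits[nr / BITS_PER_WORD] & (1UL << (nr % BITS_PER_WORD)))
        return 0;
    return -ENOSYS;
}

/* Inheritance through fork/exec would amount to copying the parent's mask. */
void mask_inherit(struct syscall_mask *child, const struct syscall_mask *parent)
{
    *child = *parent;
}
```

With only SYS_read, SYS_write and SYS_exit allowed, mask_check() returns 0
for those three and -ENOSYS for anything else, e.g. SYS_getuid.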
Basically my question is: has this been done before (if so where/when?),
what would be considered 'the right' way to do this, would this be a
feature to include in the main kernel source?
Thanks,
MaZe.
> if this syscall activity is so low then it might be much more flexible to
> control the binary via ptrace and reject all but the desired syscalls.
> This will cause a context switch but if it's stdio only then it's not a
> big issue. Plus this would work on any existing Linux kernel.
Unfortunately sometimes the data transfer through stdio can be counted in
hundreds of MB (or even in extreme cases a couple of GB), plus it is
important to not slow down the execution of the code (we're timing and
comparing execution speed of different approaches). Would doing this via
ptrace increase the runtime of the parent pid or of the child pid or both?
ie. would this make any syscall costly timewise (stdio is either from a
ram disk or piped to/from a generating/checking process) or would this be
unnoticeable?
Thx,
MaZe.
On Fri, 26 Sep 2003, Maciej Zenczykowski wrote:
> The reason I'm asking is because I want to run totally untrusted
> statically linked binary code (automatically compiled from user
> submitted untrusted sources) which only needs read/write access to stdio
> which means it only requires syscalls read/write/exit + a few more for
> memory alloc/free (like brk) + a few more generated before main is
> called (execve and uname I believe).
if this syscall activity is so low then it might be much more flexible to
control the binary via ptrace and reject all but the desired syscalls.
This will cause a context switch but if it's stdio only then it's not a
big issue. Plus this would work on any existing Linux kernel.
Ingo
On Fri, 26 Sep 2003, Maciej Zenczykowski wrote:
> Unfortunately sometimes the data transfer through stdio can be counted
> in hundreds of MB (or even in extreme cases a couple of GB), plus it is
> important to not slow down the execution of the code (we're timing and
> comparing execution speed of different approaches). Would doing this
> via ptrace increase the runtime of the parent pid or of the child pid or
> both? ie. would this make any syscall costly timewise (stdio is either
> from a ram disk or piped to/from a generating/checking process) or would
> this be unnoticeable?
you can measure this effect by doing "strace -o /dev/null <program>" of
such a program. (strace will have higher overhead than a simple syscall
filtering ptrace app, but it should show you the kind of effects ptrace
causes.)
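
[Illustration] A rough sketch of how the two runs could be compared from a
small C harness instead of by hand. The struct run_cost and the function
measure() are hypothetical names. When the command is strace itself, the
reported CPU times cover strace plus the traced program, so this shows
the combined cost rather than splitting it between tracer and tracee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct run_cost {
    double wall; /* elapsed seconds as seen by the parent */
    double user; /* CPU seconds spent in user mode by the command */
    double sys;  /* CPU seconds spent in the kernel by the command */
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Runs argv[0] with arguments argv and fills *out; returns 0 on success. */
int measure(char *const argv[], struct run_cost *out)
{
    struct rusage before, after;
    struct timespec t0, t1;
    int status;
    pid_t pid;

    assert(argv && argv[0] && out);
    getrusage(RUSAGE_CHILDREN, &before);
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_CHILDREN, &after);

    out->wall = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* RUSAGE_CHILDREN is cumulative, so take the difference. */
    out->user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out->sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}
```

Calling measure() once with { "<program>", NULL } and once with
{ "strace", "-o", "/dev/null", "<program>", NULL } gives two run_cost
values whose difference approximates the tracing overhead.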
Ingo
On Fri, 26 Sep 2003, Maciej Zenczykowski wrote:
>> if this syscall activity is so low then it might be much more flexible to
>> control the binary via ptrace and reject all but the desired syscalls.
>> This will cause a context switch but if it's stdio only then it's not a
>> big issue. Plus this would work on any existing Linux kernel.
>
>Unfortunately sometimes the data transfer through stdio can be counted in
>hundreds of MB (or even in extreme cases a couple of GB), plus it is
Would running the process under user-mode linux help any? (I'm not sure)
Ruth
--
Ruth Ivimey-Cook
Software engineer and technical writer.
On Fri, 26 Sep 2003, Maciej Zenczykowski wrote:
> > if this syscall activity is so low then it might be much more flexible to
> > control the binary via ptrace and reject all but the desired syscalls.
> > This will cause a context switch but if it's stdio only then it's not a
> > big issue. Plus this would work on any existing Linux kernel.
>
> Unfortunately sometimes the data transfer through stdio can be counted in
> hundreds of MB (or even in extreme cases a couple of GB), plus it is
> important to not slow down the execution of the code (we're timing and
> comparing execution speed of different approaches). Would doing this via
> ptrace increase the runtime of the parent pid or of the child pid or both?
> ie. would this make any syscall costly timewise (stdio is either from a
> ram disk or piped to/from a generating/checking process) or would this be
> unnoticeable?
I believe that what you're trying to do is a little bit more complicated
than simply blocking a few system calls. There are security tools
doing this but they do more than blindly blocking system calls. Parameters
of the system call do matter in this scenario. For example you don't want
to block every write(), since the application you're trying to control
must be able to write to its own installation dir. They do
this by running the given application and "learning" system calls and
params to create a per-application policy. Every behaviour that violates
the policy triggers an event to the user running it (with a
"human readable" description of what is happening) and the user can either
accept it (by training the policy) or reject it.
- Davide
On Fri, Sep 26, 2003 at 04:05:50PM +0200, Maciej Zenczykowski wrote:
> I'm wondering if there is any way to provide per process bitmasks of
> available/illegal syscalls. Obviously this should most likely be
> inherited through exec/fork.
syscalltrack can do it, per executable / user / syscall parameters /
whatever, but it's per syscall. Writing a perl script or C program to
iterate over the supplied syscall list and write the allow/deny rules
is pretty simple. Also, syscalltrack is meant for debugging, not
security, so if you want something that's 100% tight you'd better go
with one of the Linux security modules based on the LSM framework.
> For example specifying that pid N should return -ENOSYS on all syscalls
> except read/write/exit.
Yeah, syscalltrack can do that ;-)
> The reason I'm asking is because I want to run totally untrusted
> statically linked binary code (automatically compiled from user
> submitted untrusted sources) which only needs read/write access to stdio
> which means it only requires syscalls read/write/exit + a few more for
> memory alloc/free (like brk) + a few more generated before main is called
> (execve and uname I believe).
Since it's a known binary, if you can handle the increased run time,
strace is your best shot. syscalltrack and other kernel based
solutions are best when you need something that is "system wide".
> Basically my question is: has this been done before (if so where/when?),
> what would be considered 'the right' way to do this, would this be a
> feature to include in the main kernel source?
Previous discussion seemed to conclude that features like these are
"not interesting enough to the majority of users". Maybe it's time to
revise those discussions (c.f. the inclusion of SELinux, for
example).
--
Muli Ben-Yehuda
http://www.mulix.org
* Maciej Zenczykowski [030926 10:06]:
> Hi,
>
> I'm wondering if there is any way to provide per process bitmasks of
> available/illegal syscalls. Obviously this should most likely be
> inherited through exec/fork.
>
> For example specifying that pid N should return -ENOSYS on all syscalls
> except read/write/exit.
Look at Systrace. http://www.citi.umich.edu/u/provos/systrace/
- Joe
> >Unfortunately sometimes the data transfer through stdio can be counted in
> >hundreds of MB (or even in extreme cases a couple of GB), plus it is
>
> Would running the process under user-mode linux help any? (I'm not sure)
I think that's trying to kill a fly with a cannon. Especially since
afterwards the process in UML would still need to be somehow protected
from calling UML syscalls - I'm not quite sure how UML works (never really
used it), but I'm assuming it will still allow getuid/gettimeofday etc
syscalls. Correct me if I'm wrong _or_ if I'm misinterpreting your idea.
Besides sometimes these processes are spawned in the dozens (sometimes
they spawn massively with very little CPU intensity, other times very
rarely but with massive CPU use) - would I then need a separate UML kernel
per spawn? And if not, then how would this help?
Thx,
MaZe.
* Maciej Zenczykowski wrote:
> I'm wondering if there is any way to provide per process bitmasks of
> available/illegal syscalls. Obviously this should most likely be
> inherited through exec/fork.
A simple LSM module can do this for you. It will have a little
more overhead than denying at the syscall entry point, but it's
certainly going to be more flexible.
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
> I believe that what you're trying to do is a little bit more complicated
> than simply blocking a few system calls. There are security tools
> doing this but they do more than blindly blocking system calls. Parameters
> of the system call do matter in this scenario. For example you don't want
> to block every write(), since the application you're trying to control
> must be able to write to its own installation dir. They do
Actually in this case all disk-access (and net-access) is illegal, and
we're running in an empty chroot environment anyway. :) We're not really
running apps - they're more along the lines of CPU calculation pipes - data
in -> calc in system memory -> data out.
> this by running the given application and "learning" system calls and
> params to create a per-application policy. Every behaviour that violates
> the policy triggers an event to the user running it (with a
> "human readable" description of what is happening) and the user can either
> accept it (by training the policy) or reject it.
I'm afraid this has to run without user-intervention and the policy is
trivial - allow mem-management (brk/mmap...) + exit + read stdin + write
stdout.
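
[Illustration] The policy above is small enough to write as a single
function. A minimal sketch, assuming Linux syscall numbers from
<sys/syscall.h>; policy_check() and policy_demo() are hypothetical names,
and treating munmap and exit_group as part of "mem-management + exit" is
an assumption of this example. Depending on where filtering starts, the
calls made before main (execve, uname) would also have to be let through.

```c
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 0 to allow the call, -ENOSYS to deny it.
 * arg0 is the first syscall argument (the fd for read/write). */
int policy_check(long nr, long arg0)
{
    switch (nr) {
    case SYS_read:
        return arg0 == STDIN_FILENO ? 0 : -ENOSYS;
    case SYS_write:
        return arg0 == STDOUT_FILENO ? 0 : -ENOSYS;
    case SYS_brk:
    case SYS_mmap:
    case SYS_munmap:
    case SYS_exit:
    case SYS_exit_group:
        return 0;
    default:
        return -ENOSYS;
    }
}

/* A few checks showing the intended behaviour. */
void policy_demo(void)
{
    assert(policy_check(SYS_read, STDIN_FILENO) == 0);
    assert(policy_check(SYS_write, STDOUT_FILENO) == 0);
    assert(policy_check(SYS_write, 5) == -ENOSYS);   /* any other fd */
    assert(policy_check(SYS_getuid, 0) == -ENOSYS);  /* not on the list */
}
```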
Thx,
MaZe.
> syscalltrack can do it, per executable / user / syscall parameters /
> whatever, but it's per syscall. Writing a perl script or C program to
> iterate over the supplied syscall list and write the allow/deny rules
> is pretty simple. Also, syscalltrack is meant for debugging, not
> security, so if you want something that's 100% tight you'd better go
> with one of the Linux security modules based on the LSM framework.
OK, thx, I'll take a look.
> Since it's a known binary, if you can handle the increased run time,
> strace is your best shot. syscalltrack and other kernel based
> solutions are best when you need something that is "system wide".
It's known only in the sense that I have it. The process is: accept a
submission (source code) from the outside network, compile it statically
(in a security playbox) to produce a single binary, then run this, time
it, and verify the correctness of the output data. Send the results back
out to the outside world. Iterate for each submission - sometimes one
every couple of seconds, other times one per hour (depends on the current
data set etc).
> Previous discussion seemed to conclude that features like these are
> "not interesting enough to the majority of users". Maybe it's time to
> revise those discussions (c.f. the inclusion of SELinux, for
> example).
This is for an information technology algorithmic programming contest -
currently being used on a single comp, but likely to be required (in
time...) by all such online contests (like the one funded by IBM/ACM)
which might mean a few hundred maybe thousand worldwide.
Cheers,
MaZe.
On Fri, 2003-09-26 at 16:16, Maciej Zenczykowski wrote:
> > if this syscall activity is so low then it might be much more flexible to
> > control the binary via ptrace and reject all but the desired syscalls.
> > This will cause a context switch but if it's stdio only then it's not a
> > big issue. Plus this would work on any existing Linux kernel.
>
> Unfortunately sometimes the data transfer through stdio can be counted in
> hundreds of MB (or even in extreme cases a couple of GB), plus it is
> important to not slow down the execution of the code (we're timing and
> comparing execution speed of different approaches). Would doing this via
> ptrace increase the runtime of the parent pid or of the child pid or both?
> ie. would this make any syscall costly timewise (stdio is either from a
> ram disk or piped to/from a generating/checking process) or would this be
> unnoticeable?
Depends on how the application writes the data: it's not the amount that
is the problem, it's the frequency of the calls.
It should however be possible to measure the overhead and remove that
from the result.
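
[Illustration] The arithmetic behind both points, as a small sketch. The
function names are hypothetical, and per_call_overhead is assumed to have
been measured separately (for instance by timing the same binary with and
without the tracer).

```c
#include <assert.h>

/* Number of write() calls needed to emit total_bytes in chunks of
 * chunk bytes (the last chunk may be short). */
unsigned long long write_calls(unsigned long long total_bytes,
                               unsigned long long chunk)
{
    assert(chunk > 0);
    return (total_bytes + chunk - 1) / chunk;
}

/* Remove the estimated tracing cost from a measured run time.
 * per_call_overhead is in seconds per traced syscall. */
double corrected_seconds(double measured, unsigned long long calls,
                         double per_call_overhead)
{
    return measured - (double)calls * per_call_overhead;
}
```

For 1 GB (2^30 bytes) of output, 4096-byte writes mean 262144 traced
calls, while byte-at-a-time writes mean 1073741824 of them, so the same
amount of data can cost very different amounts of tracing time.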
As far as I know it's not possible to abort a syscall with ptrace on
entry, but you can change the syscall number to something harmless like
getpid and fix the return values on exit. But it is all very much
arch-dependent code.
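
[Illustration] A sketch of that approach, assuming x86-64 Linux with
glibc (register names orig_rax and rax come from struct user_regs_struct
on that architecture only). The function names and the exact allow-list
are hypothetical. The child is assumed to have called
ptrace(PTRACE_TRACEME) and then kill(getpid(), SIGSTOP) before its
execve; error handling and signal forwarding are deliberately minimal.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allow-list based on the thread: stdio, exit, memory management,
 * plus the calls seen before main(). exit_group is an assumption. */
int syscall_allowed(long nr)
{
    switch (nr) {
    case SYS_read: case SYS_write:
    case SYS_exit: case SYS_exit_group:
    case SYS_brk: case SYS_mmap: case SYS_munmap:
    case SYS_execve: case SYS_uname:
        return 1;
    default:
        return 0;
    }
}

/* Syscall entry: swap a denied call for getpid. Returns 1 if swapped. */
int neutralise_on_entry(struct user_regs_struct *regs)
{
    assert(regs != NULL);
    if (syscall_allowed((long)regs->orig_rax))
        return 0;
    regs->orig_rax = SYS_getpid;
    return 1;
}

/* Syscall exit: overwrite getpid's result with -ENOSYS. */
void fail_on_exit(struct user_regs_struct *regs)
{
    assert(regs != NULL);
    regs->rax = (unsigned long long)-ENOSYS;
}

/* Filters the syscalls of a child that requested tracing and stopped.
 * Returns the child's final wait status, or -1 on error. */
int trace_child(pid_t child)
{
    struct user_regs_struct regs;
    int status, at_entry = 1, swapped = 0;

    if (waitpid(child, &status, 0) < 0) /* collect the initial stop */
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
        return -1;

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            return -1;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return status;
        /* Only syscall stops carry the 0x80 bit. Other stops are resumed
         * without forwarding the signal; a real tool should forward it. */
        if (!WIFSTOPPED(status) || WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue;
        if (ptrace(PTRACE_GETREGS, child, NULL, &regs) < 0)
            return -1;
        if (at_entry)
            swapped = neutralise_on_entry(&regs);
        else if (swapped)
            fail_on_exit(&regs);
        if (swapped && ptrace(PTRACE_SETREGS, child, NULL, &regs) < 0)
            return -1;
        if (!at_entry)
            swapped = 0;
        at_entry = !at_entry;
    }
}
```

Every syscall causes two stops (entry and exit), so the per-call cost
discussed earlier in the thread applies twice here.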