2009-01-07 05:58:23

by Michael Stone

Subject: RFC: Network privilege separation.

Dear lkml and netdev,

I'm trying to implement a kernel facility for unprivileged processes to
irrevocably discard their and their future children's ability to perform
unrestricted network I/O. (Restricted network I/O, e.g. on sockets which were
connected before the privilege reduction or on filesystem-based sockets, is
okay.)

I want the kernel to provide a facility like this one because such a facility
will make it much easier for users, authors, and distributors of userland
software to protect themselves and one another from a broad class of malicious
software.

For the sake of discussion, I have written up and documented one possible
implementation of this concept based on the idea of a new rlimit named
RLIMIT_NETWORK in the following patch series.

I eagerly await your questions, comments, suggestions, and improvements.

Thanks very much,

Michael

P.S. - I'm not subscribed to either lkml or netdev, so please CC me on
responses. Thanks!


2009-01-07 05:58:39

by Michael Stone

Subject: [PATCH] Security: Implement and document RLIMIT_NETWORK.

Daniel Bernstein has observed [1] that security-conscious userland processes
may benefit from the ability to irrevocably remove their ability to create,
bind, connect to, or send messages except in the case of previously connected
sockets or AF_UNIX filesystem sockets. We provide this facility by implementing
support for a new rlimit called RLIMIT_NETWORK.

This facility is particularly attractive to security platforms like OLPC
Bitfrost [2] and to isolation programs like Rainbow [3] and Plash [4].

[1]: http://cr.yp.to/unix/disablenetwork.html
[2]: http://wiki.laptop.org/go/OLPC_Bitfrost
[3]: http://wiki.laptop.org/go/Rainbow
[4]: http://plash.beasts.org/
---
Documentation/rlimit_network.txt | 48 ++++++++++++++++++++++++++++++++
fs/proc/base.c | 1 +
include/asm-generic/resource.h | 4 ++-
net/socket.c | 57 +++++++++++++++++++++++++++++--------
net/unix/af_unix.c | 28 ++++++++++++++++++
5 files changed, 124 insertions(+), 14 deletions(-)
create mode 100644 Documentation/rlimit_network.txt

diff --git a/Documentation/rlimit_network.txt b/Documentation/rlimit_network.txt
new file mode 100644
index 0000000..e7cc3e4
--- /dev/null
+++ b/Documentation/rlimit_network.txt
@@ -0,0 +1,48 @@
+Purpose
+-------
+
+Daniel Bernstein has observed [1] that security-conscious userland processes
+may benefit from the ability to irrevocably remove their ability to create,
+bind, connect to, or send messages except in the case of previously connected
+sockets or AF_UNIX filesystem sockets.
+
+This facility is particularly attractive to security platforms like OLPC
+Bitfrost [2] and to isolation programs like Rainbow [3] and Plash [4] because:
+
+ * it integrates well with standard techniques for writing privilege-separated
+ Unix programs
+
+ * it's available to unprivileged programs
+
+ * it's a discretionary feature available to all of distributors,
+ administrators, authors, and users
+
+ * its effect is entirely local, rather than global (like netfilter)
+
+ * it's simple enough to have some hope of being used correctly
+
+Implementation
+--------------
+
+After considering implementations based on the Linux Security Module (LSM)
+framework, on SELinux in particular, and on direct modification of the kernel
+syscall and task_struct APIs, we came to the conclusion that the best way to
+implement the feature was to extend the resource limits framework with a new
+RLIMIT_NETWORK field and to modify the implementations of the relevant socket
+calls to return -EACCES when
+
+ current->signal->rlim[RLIMIT_NETWORK].rlim_cur == 0
+
+unless we are manipulating an AF_UNIX socket whose name does not begin with \0
+or, in the case of sendmsg(), unless we are manipulating a previously connected
+socket, i.e. one with
+
+ msg.msg_name == NULL && msg.msg_namelen == 0
+
+References
+----------
+
+[1]: http://cr.yp.to/unix/disablenetwork.html
+[2]: http://wiki.laptop.org/go/OLPC_Bitfrost
+[3]: http://wiki.laptop.org/go/Rainbow
+[4]: http://plash.beasts.org/
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d467760..75c230a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -455,6 +455,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
[RLIMIT_NICE] = {"Max nice priority", NULL},
[RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
[RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
+ [RLIMIT_NETWORK] = {"Network access permitted", "boolean"},
};

/* Display limits for a process */
diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
index 587566f..7930bd5 100644
--- a/include/asm-generic/resource.h
+++ b/include/asm-generic/resource.h
@@ -45,7 +45,8 @@
0-39 for nice level 19 .. -20 */
#define RLIMIT_RTPRIO 14 /* maximum realtime priority */
#define RLIMIT_RTTIME 15 /* timeout for RT tasks in us */
-#define RLIM_NLIMITS 16
+#define RLIMIT_NETWORK 16 /* permit network access */
+#define RLIM_NLIMITS 17

/*
* SuS says limits have to be unsigned.
@@ -87,6 +88,7 @@
[RLIMIT_NICE] = { 0, 0 }, \
[RLIMIT_RTPRIO] = { 0, 0 }, \
[RLIMIT_RTTIME] = { RLIM_INFINITY, RLIM_INFINITY }, \
+ [RLIMIT_NETWORK] = { RLIM_INFINITY, RLIM_INFINITY }, \
}

#endif /* __KERNEL__ */
diff --git a/net/socket.c b/net/socket.c
index 76ba80a..550722f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -90,6 +90,7 @@

#include <asm/uaccess.h>
#include <asm/unistd.h>
+#include <asm/resource.h>

#include <net/compat.h>
#include <net/wext.h>
@@ -561,6 +562,13 @@ static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
if (err)
return err;

+ /* See Documentation/rlimit_network.txt */
+ err = -EACCES;
+ if (sock->sk->sk_family != AF_UNIX \
+ && !current->signal->rlim[RLIMIT_NETWORK].rlim_cur \
+ && (msg->msg_name != NULL || msg->msg_namelen != 0))
+ return err;
+
return sock->ops->sendmsg(iocb, sock, msg, size);
}

@@ -1126,6 +1134,12 @@ static int __sock_create(struct net *net, int family, int type, int protocol,
if (err)
return err;

+ /* See Documentation/rlimit_network.txt */
+ err = (family == AF_UNIX \
+ || current->signal->rlim[RLIMIT_NETWORK].rlim_cur) ? 0 : -EACCES;
+ if (err)
+ return err;
+
/*
* Allocate the socket and allow the family to set things up. if
* the protocol is 0, the family is instructed to select an appropriate
@@ -1371,19 +1385,30 @@ asmlinkage long sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
int err, fput_needed;

sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (sock) {
- err = move_addr_to_kernel(umyaddr, addrlen, (struct sockaddr *)&address);
- if (err >= 0) {
- err = security_socket_bind(sock,
- (struct sockaddr *)&address,
- addrlen);
- if (!err)
- err = sock->ops->bind(sock,
- (struct sockaddr *)
- &address, addrlen);
- }
- fput_light(sock->file, fput_needed);
- }
+ if (!sock)
+ goto out;
+
+ err = move_addr_to_kernel(umyaddr, addrlen, (struct sockaddr *)&address);
+ if (err < 0)
+ goto out_fput;
+
+ err = security_socket_bind(sock,
+ (struct sockaddr *)&address,
+ addrlen);
+ if (err)
+ goto out_fput;
+
+ /* See Documentation/rlimit_network.txt */
+ err = (((struct sockaddr *)&address)->sa_family == AF_UNIX \
+ || current->signal->rlim[RLIMIT_NETWORK].rlim_cur) ? 0 : -EACCES;
+ if (err)
+ goto out_fput;
+
+ err = sock->ops->bind(sock, (struct sockaddr *) &address, addrlen);
+
+out_fput:
+ fput_light(sock->file, fput_needed);
+out:
return err;
}

@@ -1547,6 +1572,12 @@ asmlinkage long sys_connect(int fd, struct sockaddr __user *uservaddr,
if (err)
goto out_put;

+ /* See Documentation/rlimit_network.txt */
+ err = (((struct sockaddr *)&address)->sa_family == AF_UNIX \
+ || current->signal->rlim[RLIMIT_NETWORK].rlim_cur) ? 0 : -EACCES;
+ if (err)
+ goto out_put;
+
err = sock->ops->connect(sock, (struct sockaddr *)&address, addrlen,
sock->file->f_flags);
out_put:
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 66d5ac4..e536d15 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -99,6 +99,7 @@
#include <linux/fs.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
+#include <asm/resource.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <net/net_namespace.h>
@@ -789,6 +790,12 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
goto out;
addr_len = err;

+ /* See Documentation/rlimit_network.txt */
+ err = (current->signal->rlim[RLIMIT_NETWORK].rlim_cur \
+ || sunaddr->sun_path[0]) ? 0 : -EACCES;
+ if (err)
+ goto out;
+
mutex_lock(&u->readlock);

err = -EINVAL;
@@ -922,6 +929,12 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
goto out;
alen = err;

+ /* See Documentation/rlimit_network.txt */
+ err = (current->signal->rlim[RLIMIT_NETWORK].rlim_cur \
+ || sunaddr->sun_path[0]) ? 0 : -EACCES;
+ if (err)
+ goto out;
+
if (test_bit(SOCK_PASSCRED, &sock->flags) &&
!unix_sk(sk)->addr && (err = unix_autobind(sock)) != 0)
goto out;
@@ -1021,6 +1034,12 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
goto out;
addr_len = err;

+ /* See Documentation/rlimit_network.txt */
+ err = (current->signal->rlim[RLIMIT_NETWORK].rlim_cur \
+ || sunaddr->sun_path[0]) ? 0 : -EACCES;
+ if (err)
+ goto out;
+
if (test_bit(SOCK_PASSCRED, &sock->flags)
&& !u->addr && (err = unix_autobind(sock)) != 0)
goto out;
@@ -1357,6 +1376,12 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
if (err < 0)
goto out;
namelen = err;
+
+ /* See Documentation/rlimit_network.txt */
+ err = -EACCES;
+ if (!current->signal->rlim[RLIMIT_NETWORK].rlim_cur \
+ && !sunaddr->sun_path[0])
+ goto out;
} else {
sunaddr = NULL;
err = -ENOTCONN;
@@ -1506,6 +1531,9 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
if (msg->msg_namelen) {
err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
goto out_err;
+ /* RLIMIT_NETWORK requires no change here since connection-less
+ * unix stream sockets are not supported.
+ * See Documentation/rlimit_network.txt for details. */
} else {
sunaddr = NULL;
err = -ENOTCONN;
--
1.5.6.6

2009-01-07 11:47:38

by Evgeniy Polyakov

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

Hi Michael.

On Wed, Jan 07, 2009 at 12:48:54AM -0500, Michael Stone ([email protected]) wrote:
> Daniel Bernstein has observed [1] that security-conscious userland processes
> may benefit from the ability to irrevocably remove their ability to create,
> bind, connect to, or send messages except in the case of previously connected
> sockets or AF_UNIX filesystem sockets. We provide this facility by implementing
> support for a new rlimit called RLIMIT_NETWORK.
>
> This facility is particularly attractive to security platforms like OLPC
> Bitfrost [2] and to isolation programs like Rainbow [3] and Plash [4].
>
> [1]: http://cr.yp.to/unix/disablenetwork.html
> [2]: http://wiki.laptop.org/go/OLPC_Bitfrost
> [3]: http://wiki.laptop.org/go/Rainbow
> [4]: http://plash.beasts.org/
> ---
> Documentation/rlimit_network.txt | 48 ++++++++++++++++++++++++++++++++
> fs/proc/base.c | 1 +
> include/asm-generic/resource.h | 4 ++-
> net/socket.c | 57 +++++++++++++++++++++++++++++--------
> net/unix/af_unix.c | 28 ++++++++++++++++++
> 5 files changed, 124 insertions(+), 14 deletions(-)
> create mode 100644 Documentation/rlimit_network.txt
>
> diff --git a/Documentation/rlimit_network.txt b/Documentation/rlimit_network.txt
> new file mode 100644
> index 0000000..e7cc3e4
> --- /dev/null
> +++ b/Documentation/rlimit_network.txt
> @@ -0,0 +1,48 @@
> +Purpose
> +-------
> +
> +Daniel Bernstein has observed [1] that security-conscious userland processes
> +may benefit from the ability to irrevocably remove their ability to create,
> +bind, connect to, or send messages except in the case of previously connected
> +sockets or AF_UNIX filesystem sockets.
> +
> +This facility is particularly attractive to security platforms like OLPC
> +Bitfrost [2] and to isolation programs like Rainbow [3] and Plash [4] because:
> +
> + * it integrates well with standard techniques for writing privilege-separated
> + Unix programs
> +
> + * it's available to unprivileged programs
> +

It isn't, since it can not set rlimit, and if it can, it still can drop
it.

Your code does not cover the sendpage() interface (aka splice() and
sendfile()), and with your approach an application will suddenly stop
sending data even into old sockets, but will still be able to receive it
from anywhere. Is that intentional?

The same goal can be achieved with 'owner' iptables match module btw.

--
Evgeniy Polyakov

2009-01-07 17:25:16

by Rémi Denis-Courmont

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wednesday 7 January 2009 13:47:03 Evgeniy Polyakov, you wrote:
> The same goal can be achieved with 'owner' iptables match module btw.

Err no. iptables is _not_ suitable for userland applications dropping their
_own_ privileges. For privileged processes, it's clumsy at best, as iptables
does not quite work if more than one application uses it. That's typically
your firewall configuration wizard or some custom admin-made script.
As for UNprivileged processes, iptables is not allowed.

As I understand it, Michael is trying to build something similar to SECCOMP,
only way less restrictive and way more usable by real-life userland programs.

--
Rémi Denis-Courmont
http://www.remlab.net/

2009-01-07 17:48:28

by Evgeniy Polyakov

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wed, Jan 07, 2009 at 06:52:27PM +0200, Rémi Denis-Courmont ([email protected]) wrote:
> On Wednesday 7 January 2009 13:47:03 Evgeniy Polyakov, you wrote:
> > The same goal can be achieved with 'owner' iptables match module btw.
>
> Err no. iptables is _not_ suitable for userland applications dropping their
> _own_ privileges. For privileged processes, it's clumsy at best, as iptables
> does not quite work if more than one application uses it. That's typically
> your firewall configuration wizard or some custom admin-made script.
> As for UNprivileged processes, iptables is not allowed.

If setting that rlimit does not require admin privileges, then it does
not require to drop this. So it is superuser or admin who does this.
And exactly the same can be achieved with 'owner' iptables module.

If process itself changes own rlimit, then it is not a rlimit, but a
hint to how it is supposed to work.

Plus I did not see how fork is protected, i.e. do children get the
same rlimit? It looks like they do not.

> As I understand it, Michael is trying to build something similar to SECCOMP,
> only way less restrictive and way more usable by real-life userland programs.

Security and unprivileged setup are mutually impossible cases.

--
Evgeniy Polyakov

2009-01-07 18:35:33

by C. Scott Ananian

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wed, Jan 7, 2009 at 6:47 AM, Evgeniy Polyakov <[email protected]> wrote:
>> +This facility is particularly attractive to security platforms like OLPC
>> +Bitfrost [2] and to isolation programs like Rainbow [3] and Plash [4] because:
>> + * it integrates well with standard techniques for writing privilege-separated
>> + Unix programs
>> + * it's available to unprivileged programs

> It isn't, since it can not set rlimit, and if it can, it still can drop it.

Privilege dropping is voluntary, in the same way that a setuid root
program can voluntarily drop root permissions after it has finished
using them. (Standard example: apache starts as root to open port 80,
and then drops root by changing uid to www or nobody before it
actually processes requests.)

If I understand correctly, rlimit has both 'hard' and 'soft' limits.
An unprivileged process can change its soft limit at will, up to the
hard limit, but can only *irrevocably lower* its hard limit. (man 2
setrlimit)

I haven't reviewed the patch to confirm this, but this is how I would
expect RLIMIT_NETWORK to function. A trusted process like inetd (say)
would accept a network connection and create a file handle. It would
then fork, drop the hard and soft RLIMIT_NETWORK to 0, and then exec
the untrusted client program. This would allow the untrusted program
to use the 'trusted' network resource via the open file handle, but
prevent it from (say) leaking sensitive transaction data by making
further connections to some other network resource. (There are better
use cases than inetd, of course.)

According to man 2 setrlimit, "A child process created via fork(2)
inherits its parent's resource limits. Resource limits are preserved
across execve(2).".
--scott

--
( http://cscott.net/ )
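
To make the inheritance behaviour quoted from the man page concrete, here is
a minimal sketch. RLIMIT_NETWORK exists only in the proposed patch, so the
example is parameterised on the resource and can be tried with an existing
limit such as RLIMIT_CORE as a stand-in. The function names are hypothetical
and are not taken from the patch or from Michael's test utilities.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lower both the soft and the hard value of `resource` to 0. */
int drop_limit(int resource)
{
    struct rlimit zero = { 0, 0 };
    return setrlimit(resource, &zero);
}

/*
 * Fork a child that drops `resource` and then forks again; the grandchild
 * reports whether it sees the zeroed limit. The caller's own limit is left
 * untouched. Returns 1 if the limit was inherited, 0 if not, -1 on error.
 */
int limit_is_inherited(int resource)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        pid_t grandchild;

        if (drop_limit(resource) != 0)
            _exit(2);
        grandchild = fork();
        if (grandchild < 0)
            _exit(2);
        if (grandchild == 0) {
            struct rlimit lim;

            if (getrlimit(resource, &lim) != 0)
                _exit(2);
            _exit(lim.rlim_cur == 0 && lim.rlim_max == 0 ? 0 : 1);
        }
        if (waitpid(grandchild, &status, 0) < 0 || !WIFEXITED(status))
            _exit(2);
        _exit(WEXITSTATUS(status));
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

The sketch only exercises fork(); preservation across execve() is stated by
the man page but not demonstrated here.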

2009-01-07 19:02:48

by Evgeniy Polyakov

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wed, Jan 07, 2009 at 01:35:13PM -0500, C. Scott Ananian ([email protected]) wrote:
> I haven't reviewed the patch to confirm this, but this is how I would
> expect RLIMIT_NETWORK to function. A trusted process like inetd (say)
> would accept a network connection and create a file handle. It would
> then fork, drop the hard and soft RLIMIT_NETWORK to 0, and then exec
> the untrusted client program. This would allow the untrusted program
> to use the 'trusted' network resource via the open file handle, but
> prevent it from (say) leaking sensitive transaction data by making
> further connections to some other network resource. (There are better
> use cases than inetd, of course.)

So effectively it requires a higher-priority process to set the limit and then
drop its own privileges. And by default the network rlimit is turned off, so
this does not work for usual processes?

The same inetd may setup iptables rule btw. I do not say this is the way
to go, just that it already exists.

> According to man 2 setrlimit, "A child process created via fork(2)
> inherits its parent's resource limits. Resource limits are preserved
> across execve(2).".

Yes, rlimits are copied in copy_signal(), but when the parent sets the
rlimit it is not updated in the children; that was my question, sorry for
the confusion.

--
Evgeniy Polyakov

2009-01-07 19:39:44

by Evgeniy Polyakov

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wed, Jan 07, 2009 at 10:02:30PM +0300, Evgeniy Polyakov ([email protected]) wrote:
> > I haven't reviewed the patch to confirm this, but this is how I would
> > expect RLIMIT_NETWORK to function. A trusted process like inetd (say)
> > would accept a network connection and create a file handle. It would
> > then fork, drop the hard and soft RLIMIT_NETWORK to 0, and then exec
> > the untrusted client program. This would allow the untrusted program
> > to use the 'trusted' network resource via the open file handle, but
> > prevent it from (say) leaking sensitive transaction data by making
> > further connections to some other network resource. (There are better
> > use cases than inetd, of course.)
>
> So effectively it requires higher-prio process to set the limit and then
> drop own privileges. And by default network rlimit is turned off, so
> this does not work for usual processes?

More on this: the patch operates on rlim_cur, which can be set and cleared
without any permission check, so a process is allowed to set it to 0, which
means disabling all sendmsg calls except for unix sockets, but then the same
(or a child) process is allowed to set rlim_cur back to a non-zero value.

Setting rlim_max to be higher than it is requires CAP_SYS_RESOURCE
capabilities.

--
Evgeniy Polyakov
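
The concern about rlim_cur can be checked from userspace. The sketch below is
an illustration under assumptions, not part of the patch: it lowers only the
soft limit of a resource to 0 and then raises it back to the hard limit, which
an unprivileged process is allowed to do. The function name is hypothetical,
and an existing limit such as RLIMIT_CORE has to stand in for RLIMIT_NETWORK
on kernels without the patch.

```c
#include <sys/resource.h>

/*
 * Lower only the soft limit of `resource` to 0, then try to raise it back
 * up to the unchanged hard limit. Returns 1 if the raise succeeded, 0 if
 * it was refused, -1 if the limits could not be read or lowered.
 */
int soft_limit_can_be_restored(int resource)
{
    struct rlimit saved, lim;

    if (getrlimit(resource, &saved) != 0)
        return -1;

    lim = saved;
    lim.rlim_cur = 0;                 /* soft limit only; hard limit kept */
    if (setrlimit(resource, &lim) != 0)
        return -1;

    lim.rlim_cur = saved.rlim_max;    /* raise soft limit back to the hard one */
    if (setrlimit(resource, &lim) != 0)
        return 0;

    setrlimit(resource, &saved);      /* put the original values back */
    return 1;
}
```

Because the soft limit alone is reversible in this way, a restriction keyed on
rlim_cur only becomes permanent once rlim_max has been lowered to 0 as well.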

2009-01-07 20:53:56

by Rémi Denis-Courmont

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wednesday 7 January 2009 19:48:09 Evgeniy Polyakov, you wrote:
> If setting that rlimit does not require admin privileges, then it does
> not require to drop this. So it is superuser or admin who does this.
> And exactly the same can be achieved with 'owner' iptables module.

No no no.

There is a huge fundamental difference between setrlimit, prctl(SECCOMP),
set*uid and chroot on the one side, and iptables on the other side: The first
ones are APIs for a process to control its own permission. iptables is an
interface to control the _whole_ system.

In other words, the first ones are usable programmatically. iptables is not,
unless you're willing to assume the kernel only operates one single
userland "software".

From the perspective of distros and system admins, perhaps SELinux and
iptables are sufficient to address this. But from that of a third-party,
upstream, distro-independent or whatever-you-want-to-call-it software vendor,
they don't quite work due to their centralized nature.

> > As I understand it, Michael is trying to build something similar to
> > SECCOMP, only way less restrictive and way more usable by real-life
> > userland programs.

> Security and unprivileged setup are mutually impossible cases.

On a high-level, sure. You need a trusted privileged entity somewhere.

But when it comes _specifically_ to "unprivileged" as in "non-root", I believe
there is a use case for something less restrictive than SECCOMP, yet more
restrictive than just being a normal non-root process. Something along the
lines of: cannot debug other processes, cannot send signal to them, cannot
create file descriptors, cannot bind sockets, yet can allocate memory, can
read timers, can read/write from any type of (already opened) file. Or
whatever brighter and more knowledgeable mind than mine could define.

Or can someone prove that there is no set of permissions bigger than those of
SECCOMP that would effectively equate to those of a normal non-privileged
process?

--
Rémi Denis-Courmont
http://www.remlab.net/

2009-01-07 21:08:21

by Michael Stone

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

Evgeniy,

First, thanks very much for all your comments and questions.

On Wed, Jan 07, 2009 at 02:47:03PM +0300, Evgeniy Polyakov wrote:

>> + * it's available to unprivileged programs
>> +
>
>It isn't, since it can not set rlimit, and if it can, it still can drop
>it.

Some sample code will probably clarify the use of my patch:

http://dev.laptop.org/git?p=users/mstone/test-rlimit-network;a=blob;f=disable_network.c;hb=HEAD

This C code describes a 'disable_network' exec-chain script which, when run as
any user, irrevocably disables network access as described in my previous
emails.

As you can see, processes start with full access to the 'network' resource and
may, at any time, irrevocably (modulo CAP_SYS_RESOURCE) limit their and their
future children's access to this resource by lowering both their soft and hard
limits to 0.

>Your code does not cover sendpage() interface (aka splice() and
>sendfile())

Nor should it. Applications should continue to be able to send data on any
sockets which were already connected and should be able to accept new
connections on sockets which were already bound.

I have done some primitive testing to ensure that the patch implements this
functionality by means of the test utilities provided here:

http://dev.laptop.org/git?p=users/mstone/test-rlimit-network;a=tree

Can you confirm my results?

> and with your approach application will suddenly stop sending data even into
> old sockets, but will be able to receive it from anywhere. Is it intentional?

Why do you think this would happen?

(My test results, e.g. via
http://dev.laptop.org/git?p=users/mstone/test-rlimit-network;a=blob;f=positive_localhost_tcp;hb=HEAD
show otherwise.)

>The same goal can be achieved with 'owner' iptables match module btw.

As Rémi explained, the iptables 'owner' match module solves a different
problem.

> but when parent sets the rlimit it is not updated in the children.

This is by design. Limiting my shell's networking privileges in preparation for
running an untrusted command should not limit the privileges of programs that I
have previously started running from the same shell process.

Regards,

Michael

P.S. - Rémi, Scott: Thanks very much for your supportive comments.
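
For readers without access to the linked disable_network.c, the exec-chain
idea described above can be sketched roughly as follows. This is a guess at
the general shape, not the linked source. The helper name is hypothetical,
and the RLIMIT_NETWORK value of 16 is an assumption taken from the
include/asm-generic/resource.h hunk of the patch; kernels without the patch
are expected to reject that resource number with EINVAL.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Assumption: the value the proposed patch assigns; not in mainline headers. */
#ifndef RLIMIT_NETWORK
#define RLIMIT_NETWORK 16
#endif

/*
 * Lower both the soft and hard value of `resource` to 0, then replace the
 * current process image with argv[0]. Returns only on failure, with -1.
 */
int exec_with_limit_dropped(int resource, char *const argv[])
{
    struct rlimit zero = { 0, 0 };

    if (setrlimit(resource, &zero) != 0)
        return -1;

    execvp(argv[0], argv);
    return -1;    /* reached only if execvp() failed */
}
```

A `disable_network` wrapper built on this would call
`exec_with_limit_dropped(RLIMIT_NETWORK, argv + 1)` from its `main()`, so that
`disable_network some-command args...` runs the command with the limit
already lowered.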

2009-01-07 21:10:51

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

Michael Stone <[email protected]> writes:

> For the sake of discussion, I have written up and documented one possible
> implementation of this concept based on the idea of a new rlimit named
> RLIMIT_NETWORK in the following patch series.
>
> I eagerly await your questions, comments, suggestions, and improvements.

At least for outgoing packets you could already do it using the netfilter
owner match and a suitable uid. I suppose that could be also extended
for incoming packets.

-Andi
--
[email protected]

2009-01-07 21:42:54

by Evgeniy Polyakov

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wed, Jan 07, 2009 at 10:54:13PM +0200, Rémi Denis-Courmont ([email protected]) wrote:
> No no no.
>
> There is a huge fundamental difference between setrlimit, prctl(SECCOMP),
> set*uid and chroot on the one side, and iptables on the other side: The first
> ones are APIs for a process to control its own permission. iptables is an
> interface to control the _whole_ system.
>
> In other words, the first ones are usable programmatically. iptables is not,
> unless you're willing to assume the kernel only operates one single
> userland "software".

The iptables 'owner' match module exactly 'operates one single userland software'.

> From the perspective of distros and system admins, perhaps SELinux and
> iptables are sufficient to address this. But from that of a third-party,
> upstream, distro-independent or whatever-you-want-to-call-it software vendor,
> they don't quite work due to their centralized nature.

Actually SELinux is an even better example, although this does depend on the
distro. A system which wants to secure network connections already knows
what netfilter is. This dependency is equivalent to requiring a recent-enough
kernel with the new rlimit.

To be clear: I do _not_ object to this patch. This is likely a good
idea, and while it could potentially be implemented in a different way, it
has a right to exist :)

> > > As I understand it, Michael is trying to build something similar to
> > > SECCOMP, only way less restrictive and way more usable by real-life
> > > userland programs.
>
> > Security and unprivileged setup are mutually impossible cases.
>
> On a high-level, sure. You need a trusted privileged entity somewhere.
>
> But when it comes _specifically_ to "unprivileged" as in "non-root", I believe
> there is a use case for something less restrictive than SECCOMP, yet more
> restrictive than just being a normal non-root process. Something along the
> lines of: cannot debug other processes, cannot send signal to them, cannot
> create file descriptors, cannot bind sockets, yet can allocate memory, can
> read timers, can read/write from any type of (already opened) file. Or
> whatever brighter and more knowledgeable mind than mine could define.
>
> Or can someone prove that there is no set of permissions bigger than those of
> SECCOMP that would effectively equate to those of a normal non-privileged
> process?

We have a good capabilities subsystem and it has proper layered design.
But still rlimit has to be assigned by something higher in this
hierarchy.

--
Evgeniy Polyakov

2009-01-07 22:00:10

by Evgeniy Polyakov

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wed, Jan 07, 2009 at 04:07:58PM -0500, Michael Stone ([email protected]) wrote:
> First, thanks very much for all your comments and questions.

you are welcome :)

> >It isn't, since it can not set rlimit, and if it can, it still can drop
> >it.
>
> Some sample code will probably clarify the use of my patch:
>
> http://dev.laptop.org/git?p=users/mstone/test-rlimit-network;a=blob;f=disable_network.c;hb=HEAD
>
> This C code describes a 'disable_network' exec-chain script which, when run
> as
> any user, irrevocably disables network access as described in my previous
> emails.
>
> As you can see, processes start with full access to the 'network' resource
> and
> may, at any time, irrevocably (modulo CAP_SYS_RESOURCE) limit their and
> their
> future children's access to this resource by lowering both their soft and
> hard
> limits to 0.

Argh, I see. That clarifies most questions indeed.

> >Your code does not cover sendpage() interface (aka splice() and
> >sendfile())
>
> Nor should it. Applications should continue to be able to send data on any
> sockets which were already connected and should be able to accept new
> connections on sockets which were already bound.
>
> I have done some primitive testing to ensure that the patch implements this
> functionality by means of the test utilities provided here:
>
> http://dev.laptop.org/git?p=users/mstone/test-rlimit-network;a=tree
>
> Can you confirm my results?

Your patch adds an rlimit check to the __sock_sendmsg() call, which is
invoked via the usual send() path, but sendfile() and splice() are still
executed without this check and thus will be able to send data after the
rlimit is applied.

> >and with your approach application will suddenly stop sending data even
> >into
> >old sockets, but will be able to receive it from anywhere. Is it
> >intentional?
>
> Why do you think this would happen?
>
> (My test results, e.g. via
> http://dev.laptop.org/git?p=users/mstone/test-rlimit-network;a=blob;f=positive_localhost_tcp;hb=HEAD
> show otherwise.)

I meant that a connected or accepted socket will not be able to send data
via the send() call, but will be able to receive data using recv().

--
Evgeniy Polyakov

2009-01-08 00:56:35

by Michael Stone

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Thu, Jan 08, 2009 at 12:59:36AM +0300, Evgeniy Polyakov wrote:
>I meant that connected or accepted socket will not be able to send data
>via send() call, but will be able to receive data using recv().

A key fact which may not have stood out, since I didn't comment on it
explicitly in the code, is that the disqualification tests inserted by
the __sock_sendmsg() and unix_dgram_sendmsg hunks contain additional
conditions like

__sock_sendmsg():
+ && (msg->msg_name != NULL || msg->msg_namelen != 0))

unix_dgram_sendmsg():
+ && !sunaddr->sun_path[0])

which return us to the usual codepaths whenever we're dealing with an
already-connected socket. Since my tests pass, can you post an example
of a failing send() call which you think should work?

>Your patch adds a rlimit check into __sock_sendmsg() call, which is
>invoked via usual send() path, but sendfile() and splice() are still
>executed without this check and thus will be able to send data after
>rlimit applied.

As far as I can tell, sendfile() and splice(), which operate solely on
fds, cannot be used to send messages via a disconnected socket.
Therefore, I /believe/ that they require no modification. Am I terribly
mistaken about this?

Thanks,

Michael
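
The condition in the __sock_sendmsg() hunk can be restated as a small pure
function, which makes it easier to see which send() calls are refused. This is
a userspace restatement for illustration only: the function name and flat
parameter list are hypothetical, while the boolean expression mirrors the one
in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Returns true when the patched __sock_sendmsg() would return -EACCES:
 * the socket is not AF_UNIX, the soft limit is 0, and the caller supplied
 * a destination address (i.e. is not simply using a connected socket).
 */
bool sendmsg_denied(int family, unsigned long rlim_cur,
                    const void *msg_name, size_t msg_namelen)
{
    return family != AF_UNIX
        && rlim_cur == 0
        && (msg_name != NULL || msg_namelen != 0);
}
```

With the limit at 0, a send() on a connected TCP socket passes no address and
is allowed, while a sendto() carrying an explicit destination is denied.
AF_UNIX sockets skip this check and are handled by the separate tests in
net/unix/af_unix.c.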

2009-01-08 01:22:32

by James Morris

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

[added lsm list to the cc]

> Daniel Bernstein has observed [1] that security-conscious userland
> processes may benefit from the ability to irrevocably remove their
> ability to create, bind, connect to, or send messages except in the
> case of previously connected sockets or AF_UNIX filesystem sockets. We
> provide this facility by implementing support for a new rlimit called
> RLIMIT_NETWORK.
>
> This facility is particularly attractive to security platforms like OLPC
> Bitfrost [2] and to isolation programs like Rainbow [3] and Plash [4].
>
> [1]: http://cr.yp.to/unix/disablenetwork.html
> [2]: http://wiki.laptop.org/go/OLPC_Bitfrost
> [3]: http://wiki.laptop.org/go/Rainbow
> [4]: http://plash.beasts.org/

Have you considered utilizing network namespaces [1] ? A process created
with a private network namespace has no network interfaces configured,
except loopback, which is down. Does this do what you want? The launcher
could optionally allow local IP by bringing up the loopback interface.

[1] http://lxc.sourceforge.net/network.php


- James
--
James Morris
<[email protected]>

2009-01-08 02:31:25

by Michael Stone

Subject: Re: RFC: Network privilege separation.

On Wed, Jan 07, 2009 at 10:10:45PM +0100, Andi Kleen wrote:
>Michael Stone <[email protected]> writes:
>
>> For the sake of discussion, I have written up and documented one possible
>> implementation of this concept based on the idea of a new rlimit named
>> RLIMIT_NETWORK in the following patch series.
>>
>> I eagerly await your questions, comments, suggestions, and improvements.
>
>At least for outgoing packets you could already do it using the netfilter
>owner match and a suitable uid. I suppose that could be also extended
>for incoming packets.

While it's certainly true that you can simulate /some/ of the functionality of
my patch with the (deprecated?) netfilter owner match extension and by
synthesizing new uids as needed, I'm fairly sure that you'll run into some
serious complications involving concurrent manipulations of shared mutable
state which my proposal does not suffer from.

For example:

* in order to use owner-match, you need to specify a uid and you probably
need to back it up with an account in the pwd database in order to keep
random bits of userland happy. What uid should you use?

-- if it's the same as Joe User's uid, then you're probably going to break
random other parts of Joe User's software stack. How is Joe going to
debug this?

+ (unless, of course, you've also got CAP_NET_ADMIN, use the new net
namespaces work, /and/ reconfigure your whole networking stack inside
the new NS.)

-- if it's different from Joe User's regular uid, then where did it come
from and how is Joe going to clean it up when he no longer needs it?

+ again, privilege is required, either in the form of a setuid
executable, CAP_SETUID capability, or an NSS module (or some
combination of these)

* so far as I know, netfilter is only commonly used to filter IP traffic. Can
I really use it to limit connections to abstract unix sockets?

* I think there are some problems with resource acquisition, trust, and
finalization:

-- something has to work out the actual firewall rules which need to be
added.

+ why should you or your sysadmin trust whatever is doing this to pick
the right ones?

-- something (with privilege) needs to install the firewall rules and needs
to remove unneeded rules or you've got a space leak.

+ are there any significant race conditions between whatever is
installing the rules and whatever is removing the dead rules?

Conclusion: so far as I can see, RLIMIT_NETWORK is, in every way, a smaller
expansion of the end user's trusted code base and should therefore be preferred
over netfilter-based solutions for process-level network privilege
separation tasks. Do you see things differently?

Thanks very much,

Michael

2009-01-08 02:56:53

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

On Wed, Jan 07, 2009 at 09:31:11PM -0500, Michael Stone wrote:
> -- if it's different from Joe User's regular uid, then where did it come
> from and how is Joe going to clean it up when he no longer needs it?

You always create a joe-nonet uid when you create joe.

Now writing to joe's files: you can either use ACLs or do everything
through group accesses (it's very common to have a "joe" group for this
purpose for each user)

But perhaps it's a good idea to not allow writing to all of Joe's
files by those "no network" processes too. It at least sounds like
that might be useful to combine.

> * so far as I know, netfilter is only commonly used to filter IP traffic.
> Can I really use it to limit connections to abstract unix sockets?

No you can't. But is that really your requirement? Why limit Unix
sockets and not e.g. named pipes? Unix sockets do not talk to the network.

I suppose I don't understand your requirements very well.

>
> * I think there are some problems with resource acquisition, trust, and
> finalization:
>
> -- something has to work out the actual firewall rules which need to be
> added.
>
> + why should you or your sysadmin trust whatever is doing this to pick
> the right ones?

You always define static ones at system boot.

It would probably not scale to a lot of users, but I understand you're
talking about the OLPC which probably only has a limited set of users?

Even on a true multiuser system it could be done in a PAM module.

>
> -- something (with privilege) needs to install the firewall rules and needs
> to remove unneeded rules or you've got a space leak.
>
> + are there any significant race conditions between whatever is
> installing the rules and whatever is removing the dead rules?
>
> Conclusion: so far as I can see, RLIMIT_NETWORK is, in every way, a smaller
> expansion of the end user's trusted code base and should therefore be
> preferred over netfilter-based solutions for process-level network privilege
> separation tasks. Do you see things differently?

Your arguments don't seem very convincing to me, but
the big problem is more the control of incoming packets. I think
it would be possible to fix OWNER match to support the INPUT chain
though.

-Andi
--
[email protected]

2009-01-08 03:35:14

by Michael Stone

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Thu, Jan 08, 2009 at 12:22:17PM +1100, James Morris wrote:
>Have you considered utilizing network namespaces [1] ? A process created
>with a private network namespace has no network interfaces configured,
>except loopback, which is down. Does this do what you want? The launcher
>could optionally allow local IP by bringing up the loopback interface.

James,

This net-namespaces work sounds quite apropos to some of my other
projects but I'm having trouble figuring out whether it can be used to
solve my current problem. Two questions which immediately occur to me
include:

1) As with the netfilter suggestions provided by Andi and Evgeniy, it
seems that processes require special privileges (e.g. CAP_NET_ADMIN) in
order to drop network privileges by means of entering a new net
namespace. Is this correct? If so, why is it necessary or appropriate?

2) What happens if I call unshare(CLONE_NEWNET) after I've bound some
sockets to an address or connected some sockets to remote endpoints?

Perhaps you can help straighten me out, e.g. by pointing me at the
relevant code?
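
For reference, the first question can be probed from userland with something like the sketch below. It assumes a kernel with CONFIG_NET_NS; `probe_new_netns()` is a hypothetical helper name, and the raw syscall is used only so the sketch does not depend on feature-test macros (glibc's unshare() wrapper needs _GNU_SOURCE). Per unshare(2), CLONE_NEWNET requires CAP_SYS_ADMIN, so an unprivileged caller should normally see EPERM.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWNET */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that tries to enter a fresh network namespace, so that the
 * caller's own namespace is never changed.
 * Returns 0 if the unshare succeeded in the child, otherwise the errno the
 * child saw (e.g. EPERM), or -1 if the probe itself could not be run.
 */
int probe_new_netns(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        long rc = syscall(SYS_unshare, CLONE_NEWNET);
        _exit(rc == 0 ? 0 : errno);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A return value equal to EPERM would confirm that privilege is needed to drop network access this way.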

Thanks very much,

Michael

2009-01-08 04:27:30

by Evgeniy Polyakov

Subject: Re: [PATCH] Security: Implement and document RLIMIT_NETWORK.

On Wed, Jan 07, 2009 at 07:56:21PM -0500, Michael Stone ([email protected]) wrote:
> On Thu, Jan 08, 2009 at 12:59:36AM +0300, Evgeniy Polyakov wrote:
> >I meant that connected or accepted socket will not be able to send data
> >via send() call, but will be able to receive data using recv().
>
> A key fact which may not have stood out, since I didn't comment on it
> explicitly in the code, is that the disqualification tests inserted by
> the __sock_sendmsg() and unix_dgram_sendmsg hunks contain additional
> conditions like
>
> __sock_sendmsg():
> + && (msg->msg_name != NULL || msg->msg_namelen != 0))
>
> unix_dgram_sendmsg():
> + && !sunaddr->sun_path[0])
>
> which return us to the usual codepaths whenever we're dealing with an
> already-connected socket. Since my tests pass, can you post an example
> of a failing send() call which you think should work?

You are right, I misread the documentation part where it is explained
that already connected sockets are allowed to operate. Btw that code
part breaks coding style with trailing '\' and '&&' on the new line.
There should be something wrong in the patch :)

> >Your patch adds a rlimit check into __sock_sendmsg() call, which is
> >invoked via usual send() path, but sendfile() and splice() are still
> >executed without this check and thus will be able to send data after
> >rlimit applied.
>
> As far as I can tell, sendfile() and splice(), which operate solely on
> fds, cannot be used to send messages via a disconnected socket.
> Therefore, I /believe/ that they require no modification. Am I terribly
> mistaken about this?

No, you are not; as long as the user is allowed to operate with already
connected and/or bound sockets there should be no problems.

--
Evgeniy Polyakov

2009-01-08 04:51:38

by Michael Stone

Subject: Re: RFC: Network privilege separation.

On Thu, Jan 08, 2009 at 04:10:42AM +0100, Andi Kleen wrote:
>On Wed, Jan 07, 2009 at 09:31:11PM -0500, Michael Stone wrote:
>I suppose I don't understand your requirements very well.

In that case, allow me to try to state them more clearly so that you can better
appreciate why your proposed solution is wholly unsatisfactory for my purposes.

In short, I'm trying to provide a general-purpose facility for

* limiting networking per _process_, not per user,

* with an api that requires no privilege to exercise,

* which is suitable for widespread adoption by lots of Unix vendors and
related standards bodies,

* which is atomic from userland's perspective (i.e. so that userland never
sees inconsistent or partial state)

* which requires no userland refcounting or gc to maintain on long-running
hosts

* which is brain-dead simple to use

* and which functions according to the standard Unix discretionary access
control paradigm; namely that

+ when your process has privileges, it can open resources (files,
sockets, fds, ...) for later use and

+ when it drops privileges, it can still use any open resources that
it has descriptors for regardless of how it got them but

+ when it drops privileges, it becomes unable to acquire new resources on
its own

+ though other processes may still be able to send your process tokens
which give it access to resources which it couldn't open on its own.

Does this help clarify the causes of my design choices?
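
The "keep what you hold, acquire nothing new" behaviour listed above already exists for file descriptors, and can be sketched with RLIMIT_NOFILE as an analogy. This is not the proposed RLIMIT_NETWORK; `old_fds_survive_limit()` is a hypothetical helper, and the experiment runs in a forked child so the caller's limits are untouched.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 0 if, after RLIMIT_NOFILE is dropped to 0, the child could still
 * use a pipe it already held but could not open a new file.
 */
int old_fds_survive_limit(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int p[2];
        char c = 0;
        /* Soft and hard limit both 0: an unprivileged process cannot
         * raise them again. */
        struct rlimit none = { 0, 0 };

        if (pipe(p) < 0)
            _exit(1);
        if (setrlimit(RLIMIT_NOFILE, &none) < 0)
            _exit(2);
        /* Descriptors acquired before the drop keep working. */
        if (write(p[1], "x", 1) != 1 || read(p[0], &c, 1) != 1 || c != 'x')
            _exit(3);
        /* Acquiring a new descriptor is now refused with EMFILE. */
        if (open("/dev/null", O_RDONLY) >= 0 || errno != EMFILE)
            _exit(4);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The proposal asks for the same shape of guarantee, applied to sockets rather than to the descriptor table.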

>> * so far as I know, netfilter is only commonly used to filter IP traffic.
>> Can I really use it to limit connections to abstract unix sockets?
>
>No you can't. But is that really your requirement?

It's my first-draft proposal but it's not a hard requirement. I picked it from
among several plausible alternate policies like:

* permit localhost/loopback IP and abstract unix sockets
* permit all unix sockets but no IP
* permit only filesystem-based unix sockets

because it's the functionality that I personally want to be available to people
writing privilege-separated software and because Mark Seaborn (the author of
plash) criticised my previous choice of the second option in his review of one
of my previous attempts to implement a similar facility:

http://lists.laptop.org/pipermail/security/2008-April/000391.html

After considering the matter, I came to agree with his position that permitting
low-privilege processes to connect to arbitrary "local" sockets is "not quite
safe" on the grounds that such sockets may be excellent vectors for user-land
privilege-escalation attacks.

(NB: This time, though, I have been careful to leave some room in my proposed
API for other people to implement other variations on this continuum by means
of RLIMIT_NETWORK values between 0 and RLIM_INFINITY.)
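
From userland, dropping the limit would presumably look like any other setrlimit() call. The sketch below is hypothetical: RLIMIT_NETWORK exists only in the proposed patch, the resource number 16 is an assumption made for illustration, and `drop_network()` is an invented helper name. On a kernel without the patch the call is expected to fail with EINVAL.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical resource number: RLIMIT_NETWORK is not in mainline, and the
 * value 16 is only an assumption for this sketch.
 */
#ifndef RLIMIT_NETWORK
#define RLIMIT_NETWORK 16
#endif

/*
 * Drops both the soft and the hard limit to 0, so that an unprivileged
 * process cannot raise them again. Returns 0 on success, or -1 with errno
 * set (EINVAL on kernels that do not know the resource).
 */
int drop_network(void)
{
    struct rlimit none = { 0, 0 };
    return setrlimit(RLIMIT_NETWORK, &none);
}
```

Intermediate policies would then be selected by passing a value other than 0 in both fields.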

> Why limiting Unix sockets and not e.g. named pipes?

Named pipes, like non-abstract unix sockets, are manageable through the
filesystem, e.g. by DAC and namespace manipulating tools like bind-mounts and
chroots.

> Unix sockets do not talk to the network.

Depends on your definition of "network". For the purposes of this discussion,
mine basically means "every sort of inter-process communication which is not
naturally mediated by UNIX DAC."

Regards,

Michael

P.S. - Thanks very much for your questions; I feel that they're definitely
helping me to clarify my thinking and arguments.

2009-01-08 05:27:52

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

On Wed, Jan 07, 2009 at 11:51:08PM -0500, Michael Stone wrote:
> In short, I'm trying to provide a general-purpose facility for
>
> * limiting networking per _process_, not per user,
>
> * with an api that requires no privilege to exercise,

While it may seem old fashioned the good old suid wrapper
is still working well for such things. That assumes you don't
want to do that in the middle of a process execution,
but that seems like a reasonable restriction.

> * which is suitable for widespread adoption by lots of Unix vendors and
> related standards bodies,

Hmm, I suppose RLIMIT_* does not qualify then.

> + when it drops privileges, it becomes unable to acquire new resources on
> its own

That's difficult; for example, are you going to disallow fork/exec?
If not, then the process might still exploit something suid.

Also typically such sandbox schemes want to restrict more system
calls (basically using a white list), just in case to protect against unknown
kernel holes in the more complex ones.

> + though other processes may still be able to send your process tokens
> which give it access to resources which it couldn't open on its own.
>
> Does this help clarify the causes of my design choices?

It would be probably better if you had a few concrete use cases too
(so not just what, but also why)

Anyways, it all sounds more like it should be done as a special
case in a more general sandbox framework to me. Linux already
has several ones (e.g. selinux, secure computing, AA out of tree). Perhaps one
of them could be adapted to your needs.

> >Why limiting Unix sockets and not e.g. named pipes?
>
> Named pipes, like non-abstract unix sockets, are manageable through the
> filesystem, e.g. by DAC and namespace manipulating tools like bind-mounts

In Linux, yes; traditional BSD semantics, at least, ignore file system
access checks on Unix sockets.

> and
> chroots.

There are more IPC mechanisms around too, e.g. sysv message passing
or queued signals with payload. You probably need to consider
all of those.

-Andi

--
[email protected]

2009-01-08 06:50:11

by David Lang

Subject: Re: RFC: Network privilege separation.

On Thu, 8 Jan 2009, Andi Kleen wrote:

> On Wed, Jan 07, 2009 at 09:31:11PM -0500, Michael Stone wrote:
>> -- if it's different from Joe User's regular uid, then where did it come
>> from and how is Joe going to clean it up when he no longer needs it?
>
> You always create joe-nonet one when you create joe
>
> Now writing to joe's files: you can either use ACLs or do everything
> through group accesses (it's very common to have a "joe" group for this
> purpose for each user)
>
> But perhaps it's a good idea to not allow writing to all of Joe's
> files by those "no network" processes too. It at least sounds like
> that might be useful to combine.

there are times when that would be nice, but it's also a bit of a pain to
have to change the permissions so that joe-nonet can access all the files
that joe can access (they will have to be set with the correct group
ownership and hope that there wasn't a reason to use any other group)

David Lang

2009-01-08 07:05:58

by Oliver Hartkopp

Subject: Re: RFC: Network privilege separation.

Andi Kleen wrote:
> On Wed, Jan 07, 2009 at 09:31:11PM -0500, Michael Stone wrote:
>
>> * so far as I know, netfilter is only commonly used to filter IP traffic.
>> Can I really use it to limit connections to abstract unix sockets?
>>
>
> No you can't. But is that really your requirement? Why limiting Unix
> sockets and not e.g. named pipes? Unix sockets do not talk to the network.
>
> I suppose I don't understand your requirements very well.
>

I think it would be very interesting for PF_CAN sockets also.
CAN has no IP at all and the suggested idea of 'self-limiting' a user
process to use only the already open sockets could be a way to address
the use-cases Michael stated in his RFC.

Regards,
Oliver

2009-01-08 10:43:05

by Alan

Subject: Re: RFC: Network privilege separation.

> Conclusion: so far as I can see, RLIMIT_NETWORK is, in every way, a smaller
> expansion of the end user's trusted code base and should therefore be preferred
> in comparison netfilter-based solutions for process-level network privilege
> separation tasks. Do you see things differently?

If you have the same uid then you can just use ptrace to drive another
task with that uid to do the creations for you. Chances are you can also
attack shared executable files (e.g. that uid's .bashrc)

That to me says controlling network access is only useful as part of a
more fine grained and general purpose interface. We already have that
interface in the form of things like SELinux. We already have systems
actively using it to control stuff like which ports are accessed by some
services.

Alan

2009-01-08 12:09:37

by Herbert Xu

Subject: Re: RFC: Network privilege separation.

Michael Stone <[email protected]> wrote:
>
> In short, I'm trying to provide a general-purpose facility for
>
> * limiting networking per _process_, not per user,

You do realise that this is trivial to get around with ptrace,
right? So you'll need to stop ptrace as well. Then you'll have
to think about all the other ways the process can escape this
networking jail because processes belonging to the same user
just aren't designed to be separated from each other.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-01-08 12:10:53

by Herbert Xu

Subject: Re: RFC: Network privilege separation.

Alan Cox <[email protected]> wrote:
>
> That to me says controlling network access is only useful as part of a
> more fine grained and general purpose interface. We already have that
> interface in the form of things like SELinux. We already have systems
> actively using it to control stuff like which ports are accessed by some
> services.

Exactly. If people want this they should go the SELinux/LSM route.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2009-01-12 18:44:35

by Valdis Klētnieks

Subject: Re: RFC: Network privilege separation.

On Thu, 08 Jan 2009 10:43:05 GMT, Alan Cox said:

> If you have the same uid then you can just use ptrace to drive another
> task with that uid to do the creations for you. Chances are you can also
> attack shared executable files (eg that uids .bashrc)
>
> That to me says controlling network access is only useful as part of a
> more fine grained and general purpose interface. We already have that
> interface in the form of things like SELinux. We already have systems
> actively using it to control stuff like which ports are accessed by some
> services.

Yes, the network access part *is* something that should be part of a more
general interface. Having said that, we currently are lacking a way for a
*general user* program to say "I'm all set up, and would like to disavow any
other further resource access (except maybe r/o access as "other" to file
systems)".

It's pretty easy for stuff running as root to play setuid()/capability() games
to throw away access rights. It's damned hard for mortal users to do it.




2009-01-12 19:09:56

by Bryan Donlan

Subject: Re: RFC: Network privilege separation.

On Mon, Jan 12, 2009 at 1:44 PM, <[email protected]> wrote:
> On Thu, 08 Jan 2009 10:43:05 GMT, Alan Cox said:
>
>> If you have the same uid then you can just use ptrace to drive another
>> task with that uid to do the creations for you. Chances are you can also
>> attack shared executable files (eg that uids .bashrc)
>>
>> That to me says controlling network access is only useful as part of a
>> more fine grained and general purpose interface. We already have that
>> interface in the form of things like SELinux. We already have systems
>> actively using it to control stuff like which ports are accessed by some
>> services.
>
> Yes, the network access part *is* something that should be part of a more
> general interface. Having said that, we currently are lacking a way for a
> *general user* program to say "I'm all set up, and would like to disavow any
> other further resource access (except maybe r/o access as "other" to file
> systems)".
>
> It's pretty easy for stuff running as root to play setuid()/capability() games
> to throw away access rights. It's damned hard for mortal users to do it.

Would this be something covered by namespaces? Eg, once you're done
with setup, clone into a new network and UID namespace. Now you have
no network interfaces, so you shouldn't be able to make any new
connections, and you won't be able to access any files except those
with 'other' access rights, right?

2009-01-12 19:29:24

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

> Yes, the network access part *is* something that should be part of a more
> general interface. Having said that, we currently are lacking a way for a
> *general user* program to say "I'm all set up, and would like to disavow any
> other further resource access (except maybe r/o access as "other" to file
> systems)".

seccomp does exactly that. It's quite obscure, but available in most
linux kernels. Basically it blocks everything except
read/write on already open file descriptors.
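
A minimal sketch of what strict seccomp mode does, assuming a kernel that exposes it through prctl(PR_SET_SECCOMP). `seccomp_strict_demo()` is a hypothetical helper; it forks so that only the child is confined, and it reports separately when strict mode cannot be enabled (for example inside a container that already installs a seccomp filter).

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 0 if a child in strict seccomp mode could write() to a pipe it
 * already held and was then killed for attempting getpid(); 1 if strict
 * mode could not be enabled here; -1 on any other outcome.
 */
int seccomp_strict_demo(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            _exit(2);                 /* strict mode not available */
        if (write(p[1], "x", 1) != 1) /* allowed: fd was already open */
            syscall(SYS_exit, 4);
        syscall(SYS_getpid);          /* not allowed: expect SIGKILL */
        syscall(SYS_exit, 3);         /* only reached if not killed */
        _exit(3);                     /* not reached */
    }
    close(p[1]);
    char c = 0;
    ssize_t n = read(p[0], &c, 1);
    close(p[0]);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return 1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL && n == 1 && c == 'x')
        return 0;
    return -1;
}
```

Even a harmless call such as getpid() is fatal in this mode, which is the limitation the replies below revolve around.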

I always thought it would be nice if codecs (which tend
to be full of security holes) ran in such jails by default

-Andi


--
[email protected] -- Speaking for myself only.

2009-01-12 19:48:36

by Rémi Denis-Courmont

Subject: Re: RFC: Network privilege separation.

On Monday 12 January 2009 21:43:33, Andi Kleen wrote:
> > Yes, the network access part *is* something that should be part of a more
> > general interface. Having said that, we currently are lacking a way for
> > a *general user* program to say "I'm all set up, and would like to
> > disavow any other further resource access (except maybe r/o access as
> > "other" to file systems)".
>
> seccomp does exactly that. It's quite obscure, but available in most
> linux kernels. Basically it blocks everything except
> read/write on already open file descriptors.
>
> I always thought it would be nice if codecs (which tend
> to be full of security holes) ran in such jails by default

Yeah, and they are not going to do that, because there is lots of useful
stuff codecs like to do that represents no security issue but is nevertheless
impossible with SECCOMP (according to the documentation).

Expanding the heap, mapping memory. Getting timestamps. Waiting on futexes,
catching signals, polling file descriptors. Seeking, doing vectorized I/O.
Cloning.


Codecs don't like to read/write raw video through a pipe...

--
Rémi Denis-Courmont
http://www.remlab.net/

2009-01-12 20:00:41

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

> Expanding the heap,

That's a problem, agreed. OK, you can just always use very large
bss arrays sized for the worst case.
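
A minimal sketch of the bss-array idea, assuming a single-threaded decoder whose worst-case memory need is known up front. The names `arena`, `ARENA_SIZE` and `arena_alloc()` are hypothetical, and the 1 MiB size is arbitrary.

```c
#include <stddef.h>

/* Worst-case working memory, reserved in .bss when the program is loaded
 * (the size is an arbitrary illustration). */
#define ARENA_SIZE ((size_t)1 << 20)
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/*
 * Bump allocator: hands out 16-byte-aligned chunks from the static arena
 * and never calls brk() or mmap(). Returns NULL when the arena is full.
 * There is no free(); the arena lives as long as the process.
 */
void *arena_alloc(size_t n)
{
    size_t start = (arena_used + 15u) & ~(size_t)15u;
    if (n > ARENA_SIZE - start)
        return NULL;
    arena_used = start + n;
    return &arena[start];
}
```

Because nothing here issues a system call, it would keep working after strict seccomp mode is entered.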

> Getting timestamps.

At least on 64bit that's done in ring 3 only with a vsyscall.

> Waiting on futexes,
> catching signals, polling file descriptors. Seeking, doing vectorized I/O.
> Cloning.

That all can be done by the frontend reading/feeding
data into the pipe. But it shouldn't directly access the user data
to be immune against attacks.

> Codecs don't like to read/write raw video through a pipe...

I don't think that's given. It would need some restructuring,
but I think the end result would be likely worth it.

-Andi

--
[email protected] -- Speaking for myself only.

2009-01-12 20:15:44

by Rémi Denis-Courmont

Subject: Re: RFC: Network privilege separation.

On Monday 12 January 2009 22:14:35, Andi Kleen wrote:
> > Expanding the heap,
>
> That's a problem, agreed. OK, you can just always use very large
> bss arrays sized for the worst case.
>
> > Getting timestamps.
>
> At least on 64bit that's done in ring 3 only with a vsyscall.
>
> > Waiting on futexes,
> > catching signals, polling file descriptors. Seeking, doing vectorized
> > I/O. Cloning.
>
> That all can be done by the frontend reading/feeding
> data into the pipe. But it shouldn't directly access the user data
> to be immune against attacks.

What's the point of writing a parser (that could also have bugs) when the
kernel can do it? One could argue that shared futexes could be dangerous, but
not the rest?

> > Codecs don't like to read/write raw video through a pipe...
>
> I don't think that's given. It would need some restructuring,
> but I think the end result would be likely worth it.

A normal DVD would be over 30 megabytes per second once decoded, just for the
video. And remember vmsplice() is not allowed by SECCOMP. Media players have
assembly-coded memory copy optimizations (like the kernel) for some reason.
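
As a rough check on the order of magnitude, the raw data rate can be computed from the frame geometry. The sketch assumes a PAL DVD frame of 720x576 at 25 frames per second, which the message does not state, and `raw_video_bytes_per_sec()` is a hypothetical helper. The result depends heavily on the output pixel format.

```c
#include <stdint.h>

/* Bytes per second of uncompressed video for the given geometry, frame
 * rate and pixel depth (in bits per pixel). */
uint64_t raw_video_bytes_per_sec(uint32_t width, uint32_t height,
                                 uint32_t fps, uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * fps * bits_per_pixel / 8;
}
```

Under those assumptions, 24-bit RGB output gives 31,104,000 bytes per second, while planar 4:2:0 output at 12 bits per pixel gives 15,552,000.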

--
Rémi Denis-Courmont
http://www.remlab.net/

2009-01-12 20:25:25

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

> What's the point of writing a parser (that could also have bugs) when the

Sorry you lost me. What do you mean by parser here?

> kernel can do it?

And what does it have to do with the kernel?

> A normal DVD would be over 30 megabytes per second once decoded, just for the

On many modern systems 30MB/s copies is nothing ... Also in this
case they tend to be cache hot, which makes them much cheaper.

Yes it would be somewhat slower, but if it avoids a couple of security
updates that would be probably worth it.

-Andi
--
[email protected] -- Speaking for myself only.

2009-01-12 20:27:40

by Evgeniy Polyakov

Subject: Re: RFC: Network privilege separation.

On Mon, Jan 12, 2009 at 10:15:27PM +0200, Rémi Denis-Courmont ([email protected]) wrote:
> > I don't think that's given. It would need some restructuring,
> > but I think the end result would be likely worth it.
>
> A normal DVD would be over 30 megabytes per second once decoded, just for the
> video. And remember vmsplice() is not allowed by SECCOMP. Media players have
> assembly-coded memory copy optimizations (like the kernel) for some reason.

Just a note: memory copy is way faster than 30 MB/s, and very likely the
bottleneck is not the memory copy but (de)compression, since likely all
modern codecs are limited by the CPU and not the memory bandwidth.

--
Evgeniy Polyakov

2009-01-12 20:30:44

by Rémi Denis-Courmont

Subject: Re: RFC: Network privilege separation.

On Monday 12 January 2009 22:39:31, Andi Kleen wrote:
> > What's the point of writing a parser (that could also have bugs) when the
>
> Sorry you lost me. What do you mean with parser here?
>
> > kernel can do it?
>
> And what does it have to do with the kernel?

The parser at the other end of the pipe. The more intricate the over-the-pipe
protocol is, the more likely it is to be buggy and the security scheme to
break.

> > A normal DVD would be over 30 megabytes per second once decoded, just
> > for the
>
> On many modern systems 30MB/s copies is nothing ... Also in this
> case they tend to be cache hot, which makes them much cheaper.

> Yes it would be somewhat slower, but if it avoids a couple of security
> updates that would be probably worth it.

If codecs did not care about performance, they'd be written in some high-level
language that could easily be sandboxed by its own VM.

As the guy who's been dealing with VLC security issues for the past two years,
I have to say, I am in no way interested in SECCOMP as it _currently_ is.

--
Rémi Denis-Courmont
http://www.remlab.net/

2009-01-12 20:41:44

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

On Mon, Jan 12, 2009 at 10:30:25PM +0200, Rémi Denis-Courmont wrote:
> On Monday 12 January 2009 22:39:31, Andi Kleen wrote:
> > > What's the point of writing a parser (that could also have bugs) when the
> >
> > Sorry you lost me. What do you mean with parser here?
> >
> > > kernel can do it?
> >
> > And what does it have to do with the kernel?
>
> The parser at the other end of the pipe. The more intricate the over-the-pipe
> protocol is, the more likely it is to be buggy and the security scheme to
> break.

That would be very little code that would also not
change very often so that it could be probably effectively
audited.

> > Yes it would be somewhat slower, but if it avoids a couple of security
> > updates that would be probably worth it.
>
> If codecs did not care about performance, they'd be written in some high-level
> language that could easily be sandboxed by its own VM.

I don't think using a full JIT is anywhere comparable in
performance impact to adding two cache hot copies to
otherwise fully optimized code.

>
> As the guy who's been dealing with VLC security issues for the past two years,
> I have to say, I am in no way interested in SECCOMP as it _currently_ is.

Fair point, although I'm afraid you didn't do a very good
job explaining your reasons, so it sounds like a
quite arbitrary decision.

-Andi

--
[email protected] -- Speaking for myself only.

2009-01-12 20:47:42

by Rémi Denis-Courmont

Subject: Re: RFC: Network privilege separation.

On Monday 12 January 2009 22:55:47, Andi Kleen wrote:
> Fair point, although I'm afraid you didn't do a very good
> job explaining your reasons, so it sounds like a
> quite arbitrary decision.

Fair enough. It's just way too much interface/adaptation work compared to the
benefit. Especially considering that it would be much easier, and almost as
secure, with a "relaxed" SECCOMP. And on top of that, it's causing
unnecessary overhead (we're also interested in those small Linux-based
handsets that aren't as fast and power-hungry as desktop PCs).

--
Rémi Denis-Courmont
http://www.remlab.net/

2009-01-12 21:35:50

by Andi Kleen

Subject: Re: RFC: Network privilege separation.

On Mon, Jan 12, 2009 at 10:47:21PM +0200, Rémi Denis-Courmont wrote:
> On Monday 12 January 2009 22:55:47, Andi Kleen wrote:
> > Fair point, although I'm afraid you didn't do a very good
> > job explaining your reasons, so it sounds like a
> > quite arbitrary decision.
>
> Fair enough. It's just way too much interface/adaptation work compared to the
> benefit. Especially considering that it would be much easier, and almost as
> secure, with a "relaxed" SECCOMP.

What system calls would you want in a relaxed SECCOMP?

> And on top of that, it's causing
> unnecessary overhead (we're also interested in those small Linux-based

Would be interesting to try that out -- just adding two memcpys to
the existing code and see how much slower it gets. My guess
would be not very, even e.g. on an Atom system (which are really
not all that slow).

Presumably you could always #ifdef it if it's really a problem
on some specific system. That would be needed anyways for
non linux systems.

-Andi

--
[email protected] -- Speaking for myself only.

2009-01-13 08:06:50

by Rémi Denis-Courmont

Subject: Re: RFC: Network privilege separation.


Hello,

On Mon, 12 Jan 2009 22:50:01 +0100, Andi Kleen <[email protected]> wrote:
>> Fair enough. It's just way too much interface/adaptation work compared
>> to the benefit. Especially considering that it would be much easier, and
>> almost as secure, with a "relaxed" SECCOMP.
>
> What system calls would you want in a relaxed SECCOMP?

I already listed them on a high level. At the very very least, brk(),
sbrk(), mmap(), mremap(), getpagesize() and munmap() so that we can re-use
the libc memory allocator. If one wants to limit memory usage, there is
always setrlimit() before enabling SECCOMP.
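
A sketch of that setrlimit() step, using RLIMIT_AS to cap the address space before the sandbox would be entered. The cap of 256 MiB and the helper name `memory_cap_demo()` are hypothetical, and the demo runs in a forked child so the caller keeps its own limits.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CAP_BYTES (256UL << 20)  /* illustrative address-space cap */
#define TOO_BIG   (512UL << 20)  /* a request that cannot fit under it */

/*
 * Returns 0 if, with RLIMIT_AS capped, a small malloc() still works and
 * an oversized one fails.
 */
int memory_cap_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit cap = { CAP_BYTES, CAP_BYTES };
        if (setrlimit(RLIMIT_AS, &cap) < 0)
            _exit(1);
        /* volatile keeps the compiler from optimising the calls away */
        void *volatile small = malloc(1 << 20);
        void *volatile big = malloc(TOO_BIG);
        if (small == NULL)
            _exit(2);
        _exit(big == NULL ? 0 : 3);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In a relaxed sandbox that allows mmap() and brk(), a cap like this would bound how much memory a compromised codec could claim.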

Also readv() and writev() could not hurt; I suppose neither could
vmsplice().

I don't know what the security implications of inter-process futex() are. I
assume it's possible to freeze, and perhaps busy-loop, other tasks that
use the same futexes. That is OK to me. But it's not clear to me if
something really bad, like arbitrary memory access, could be achieved.
Notably, with futex() allowable, we can also grant clone(), gettid() and
then we can run parallelized codecs on SMP.

>> And on top of that, it's causing
>> unnecessary overhead (we're also interested in those small Linux-based
>
> Would be interesting to try that out -- just adding two memcpys to
> the existing code and see how much slower it gets. My guess
> would be not very, even e.g. on an Atom system (which are really
> not all that slow).
>
> Presumably you could always #ifdef it if it's really a problem
> on some specific system. That would be needed anyways for
> non linux systems.

In practice, I suspect the most work would come from reworking the build
system such that codecs are executables rather than libraries. I suspect a
plain fork() with _no_ exec() would leave too many unused anonymous pages
from the main media player process, not to mention information leakage.

I'd like to point out that, from my personal experience, we've had more
problems so far with file format parsers than codecs. I don't know why. It
might be because they are easier for attackers to talk to, as they're one
step closer to the data than codecs. Or it is that the VideoLAN project is
writing overall more buggy code than, say, FFMPEG :-( ?

To sandbox parsers, we might need a whole lot more stuff, especially timers
and events handling stuff.

--
Rémi Denis-Courmont