2005-04-29 23:19:39

by Bodo Eggert

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

Steve French <[email protected]> wrote:

> As we try to gradually obsolete smbfs, this came up with various users
> (there was even a bugzilla bug opened for adding it) who said that they
> need the ability to unmount their own mounts for network filesystems
> without using /etc/fstab. Unfortunately for network filesytsems,
> unlike local filesystems, it is impractical to put every possible mount
> target in /etc/fstab since servers get renamed and the universe of
> possible cifs mount targets for a user is large.
>
> There seemed only three alternatives -
> 1) mimic the smbfs ioctl - as can be seen from smbfs and smbumount
> source this has portability problems because apparently there is no
> guarantee that uid_t is the same size in kernel and in userspace - smbfs
> actually has two ioctls for different sizes of uid field - this seemed
> like a bad idea
> 2) export the uid in /proc/mounts - same problem as above
> 3) call into the kernel to see if current matches the uid of the mounter
> - this has no 16/32/64 bit uid portability issues since the check is
> made in kernel

4) Turn umounting own mounts into a non-privileged operation.

As (AFAI see) the only thing that needs suid privileges is the umount
operation, and it is granted if the user mounted it himself, you can as
well do this simple check in the kernel.
--
Funny quotes:
40. Isn't making a smoking section in a restaurant like making a peeing
section in a swimming pool?


2005-04-30 07:32:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

On Sat, Apr 30, 2005 at 01:18:19AM +0200, Bodo Eggert <[email protected]> wrote:
> 4) Turn umounting own mounts into a non-privileged operation.
>
> As (AFAI see) the only thing that needs suid privileges is the umount
> operation, and it is granted if the user mounted it himself, you can as
> well do this simple check in the kernel.

Except that we don't have the concept of a mount owner at the VFS level
right now, because everyone is adding stupid suid wrapper hacks instead
of trying to fix the problems for real.

2005-04-30 08:16:00

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

> Except that we don't have the concept of a mount owner at the VFS level
> right now, because everyone is adding stupid suid wrapper hacks instead
> of trying to fix the problems for real.

Having a mount owner is not a problem. Having a good policy for
accepting mounts is rather more so, according to some:

http://marc.theaimsgroup.com/?l=linux-kernel&m=107705608603071&w=2

Just a little taste of what that policy would involve:

- global limit on user mounts
- possibly per user limit on mounts
- acceptable mountpoints (unlimited writablity is probably a good minimum)
- acceptable mount options (nosuid, nodev are obviously not)
- filesystems "safe" to mount by users

I'm not against something like that, but I'd like to hear other
people's opinion before trying to push a solution to mainline.

Thanks,
Miklos

2005-04-30 08:29:57

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

On Sat, Apr 30, 2005 at 10:14:07AM +0200, Miklos Szeredi wrote:
> > Except that we don't have the concept of a mount owner at the VFS level
> > right now, because everyone is adding stupid suid wrapper hacks instead
> > of trying to fix the problems for real.
>
> Having a mount owner is not a problem. Having a good policy for
> accepting mounts is rather more so, according to some:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107705608603071&w=2
>
> Just a little taste of what that policy would involve:
>
> - global limit on user mounts

I don't think we need that one.

> - possibly per user limit on mounts

Makes sense as an ulimit, that way the sysadmin can easily disable the
user mount feature aswell.

> - acceptable mountpoints (unlimited writablity is probably a good minimum)

Yupp.

> - acceptable mount options (nosuid, nodev are obviously not)

noexecis a bit too much, so the above look good.

> - filesystems "safe" to mount by users

what filesystem do you think is unsafe?

- virtual filesystems exporting kernel data are obviously safe as
they enforce permissions no matter who mounted them. (actually we'd
need to check for some odd mount options)

- block-based filesystems should be safe as long as the mounter has
access to the underlying block device

- network/userspace filesystems should be fine aswell

2005-04-30 09:24:04

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

> >
> > - global limit on user mounts
>
> I don't think we need that one.

We have that for open files '/proc/sys/fs/file-max'. It limits the
total memory usage of the thing. Which otherwise is hard if you have
a system with lots of users.

> > - possibly per user limit on mounts
>
> Makes sense as an ulimit, that way the sysadmin can easily disable the
> user mount feature aswell.
>
> > - acceptable mountpoints (unlimited writablity is probably a good minimum)
>
> Yupp.
>
> > - acceptable mount options (nosuid, nodev are obviously not)
>
> noexecis a bit too much, so the above look good.
>
> > - filesystems "safe" to mount by users
>
> what filesystem do you think is unsafe?
>
> - virtual filesystems exporting kernel data are obviously safe as
> they enforce permissions no matter who mounted them. (actually we'd
> need to check for some odd mount options)

Maybe sysadmin doesn't want to let users see /sys for example. How
would you disable it if users can mount it themselfes? OK, you can
disable user mounts completely, but that's probably not fine grained
enough for some.

> - block-based filesystems should be safe as long as the mounter has
> access to the underlying block device

Not true if user controls the baking device (e.g. loopback). Most
block based filesystems are probably unsafe at the moment, because not
enough consistency checking is done at runtime. Are things like
non-cyclic directory graphs ensured by all filesystems? My guess is
not. See also Linus' comment about the state of the iso9660
filesystem:

http://lwn.net/Articles/128744/

> - network/userspace filesystems should be fine aswell

They should, but again I wonder if NFS with all it's complexity is
being careful enough with what it accepts from the server.

Smbfs might be close to safe, since it's already available for users
to mount from an arbitrary server.

So safeness is a per-filesystem property, determined, how well it
checks input.

Miklos

2005-04-30 10:59:34

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

> > - network/userspace filesystems should be fine aswell
>
> They should, but again I wonder if NFS with all it's complexity is
> being careful enough with what it accepts from the server.
>
> Smbfs might be close to safe, since it's already available for users
> to mount from an arbitrary server.

I take that back. Any filesystem using page cache and allowing shared
writable mapping is currently unsafe because of OOM deadlock if
mounted from local machine, or simply DoS against client by delaying
writeback.

So other than FUSE, what's left as safe?

Miklos

2005-04-30 13:55:17

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

On Sat, 2005-04-30 at 03:29, Christoph Hellwig wrote:
> >
> > Having a mount owner is not a problem.
Perhaps some day "mount owner" might be more complex than simply the
uid_t of the owner (I don't know if there will be future cases in which
you might want to check the gid_t at mount time or some SELinux specific
security context), but I would prefer that mnt_uid be stored in the
superblock so I could get rid of those few lines of code in cifs, and
that is a fairly non-controversial start. Coming up with the policy
as Miklos and Christoph were suggesting may be doable in small stages.

> Having a good policy for
> > accepting mounts is rather more so, according to some:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=107705608603071&w=2
> >
> > Just a little taste of what that policy would involve:
> >
> > - global limit on user mounts
>
> I don't think we need that one.

agreed

>
> > - possibly per user limit on mounts
>
> Makes sense as an ulimit, that way the sysadmin can easily disable the
> user mount feature aswell.
>

agreed.

> > - acceptable mountpoints (unlimited writablity is probably a good minimum)
>
> Yupp.
Yes, although not sure what unlimited means here since the filesystem
you are mounting will often forbid writes (at the server)

>
> > - acceptable mount options (nosuid, nodev are obviously not)
>
> noexecis a bit too much, so the above look good.

There are cases in which adding noexec might make sense as a system
policy for user mounts, but the typical case in which user mounts are
needed are for home directories over the network or equivalent in which
noexec makes it tough for them to be very useful. nosuid and nodev on
the other hand should be restricted and users are used to this already
since they are the two flags that are added by mount.cifs if a non-root
user mounts and the admin has configured mount.cifs to allow user
mounts, so that would be consistent.

2005-04-30 14:31:02

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

Miklos Szeredi wrote:

>>> - network/userspace filesystems should be fine aswell
>>>
>>>
>>They should, but again I wonder if NFS with all it's complexity is
>>being careful enough with what it accepts from the server.
>>
>>
That is the fun of trying to get our network filesystems up to the 20th
century. There is
at lot more work that has to be done here, but it is gradually
improving. At least for cifs
but probably for NFSv4 (and possibly AFS) it is possible for the client
to validate that the
server is who it says it is, and both NFSv4 (actually the newer NFS RPC)
and CIFS of course
allow packet signing which helps, not sure if AFS allows packet
signing. There does
need to be even more testing in one area though - selective packet
corruption testing
(which can be done by temporarily modifying the server to inject random
invalid packet
information on a subset of fields every thousand packets or so) since
the biggest danger
in network filesystems is the huge variety of servers with different
server bugs that you have
to be able to workaround. Working around server bugs is a huge problem with
the client side of networking code.

>>I take that back. Any filesystem using page cache and allowing shared
>>writable mapping is currently unsafe because of OOM deadlock if
>>mounted from local machine, or simply DoS against client by delaying
>>writeback.
>>
>>So other than FUSE, what's left as safe?
>>
>>Miklos
>>
Don't see how FUSE is that much safer, if you allocate kernel memory at
all you eventually can create DoS,
and you can not do a filesystem without allocating some kernel memory,
but it does not seem that easy to
do intentionally. At least for the CIFS case you can turn off the page
cache for inode data on a per mount
basis (with the forcedirectio mount flag) if you worry about the server
intentionally holding up writes.
Unless the write is past end of file, writes are timed out reasonably
quickly anyway, and end up
killing the session, which depending on the setting of the hard/soft
flag probably should result in a page fault.

There is one remaining issue with mount and umount - the user space
utilities. By the way who maintains
them these days? Although the mount utilities allow filesystem
specific mount and umount helpers
to be placed in /sbin and automatically executed for the matching
filesystem type, there are a few functions
that belong in common code - in a system library which today have to be
implemented in every helper
function and of course are implemented in different ways in different
distros and
different tools with possibility of corruption of the /etc/mtab. It
may be that the file /etc/mtab
does not matter and that it needs to go away and everyone needs to look
at /proc/mounts instead, but
in the meantime /etc/mtab can easily get out of sync with the actual
list of mounts, although that
usuallly does not prevent unmount from working it may be confusing.
The basic problem is that the
"lock the mtab / add a new line to it / unlock" is not available in an
exported function (alternatively if
lock and unlock mtab functions were exported that would help) and
similarly with umount.there
is no "safely remove the matching line from mtab" - Looking at the
mount utils and various mount helper
functions it looks like you can not make any assumptions about the name
of the lock file used to protect
mtab (I am not even sure that you can guarantee that /etc/mtab is a
file, or even a symlink).
the lock file used to lock the mtab

2005-04-30 14:53:24

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

On Sat, Apr 30, 2005 at 08:28:27AM -0500, Steve French wrote:
> Miklos Szeredi wrote:
>
> >>>- network/userspace filesystems should be fine aswell
> >>>
> >>>
> >>They should, but again I wonder if NFS with all it's complexity is
> >>being careful enough with what it accepts from the server.
> >>
> >>
> That is the fun of trying to get our network filesystems up to the
> 20th century. There is at lot more work that has to be done here, but
> it is gradually improving. At least for cifs but probably for NFSv4
> (and possibly AFS) it is possible for the client to validate that the
> server is who it says it is, and both NFSv4 (actually the newer NFS
> RPC) and CIFS of course allow packet signing which helps, not sure if
> AFS allows packet signing.

None of this helps in the situation Miklos is considering, where the
attacker is a user on the client doing the mount. So presumably the
user gets to choose a server under his/her control, and all the
authentication does is prove to the user that s/he got the right server,
which doesn't protect the kernel at all.

The only defense is auditing the client code's handling of data it
receives from the server.

--b.

2005-04-30 15:53:27

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

J. Bruce Fields wrote:

>On Sat, Apr 30, 2005 at 08:28:27AM -0500, Steve French wrote:
>
>
>>Miklos Szeredi wrote:
>>
>>
>>
>>>>>- network/userspace filesystems should be fine aswell
>>>>>
>>>>>
>>>>>
>>>>>
>>>>They should, but again I wonder if NFS with all it's complexity is
>>>>being careful enough with what it accepts from the server.
>>>>
>>>>
>>>>
>>it is possible for the client to validate that the
>>server is who it says it is, and both NFSv4 (actually the newer NFS
>>RPC) and CIFS of course allow packet signing which helps, not sure if
>>AFS allows packet signing.
>>
>>
>
>None of this helps in the situation Miklos is considering, where the
>attacker is a user on the client doing the mount. So presumably the
>user gets to choose a server under his/her control, and all the
>authentication does is prove to the user that s/he got the right server,
>which doesn't protect the kernel at all.
>
>The only defense is auditing the client code's handling of data it
>receives from the server
>
>
I agree that periodic auditing of returned data, and testing with
intentionally corrupted server responses
is necessary and should continue.

But you bring up an interesting point about security policy. For the
case of evil user trying to mount
to evil server (e.g. one under evil user's control), in one sense it is
no different than allowing a user to
mount an evil cdrom or usb storage device with evil contens - a device
which may contain specially
crafted data (file and directory contents and metadata) designed to
crash the system, but there is
a difference - for network filesystems the server also can delay the
responses, throw away
the responses or corrupt the frame headers (this can just as often
happen due to buggy network
hardware and routers too). Obviously we need to continue to audit for
both cases - but it would
not hurt to optionally verify that the server can prove its identity and
prove that it has been properly
added to a security domain that you trust - ie allow an NFSv4 mount and
the CIFS mount helper
to be configured/built so that users could only mount to servers that are:
1) In the clients security domain (ie kerberos realm)
2) In a trusted domain (realm)
IIRC this is already done for the case of some services such as winbind,
and I vaguely remember
IBM OS/2 doing this (verifying the server's identity during mount) when
using Kerberized SMB back in
the early 1990s in the Directory and Security server product.

2005-04-30 16:17:39

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

> Don't see how FUSE is that much safer, if you allocate kernel memory
> at all you eventually can create DoS, and you can not do a
> filesystem without allocating some kernel memory, but it does not
> seem that easy to do intentionally.

Allocating kernel memory is usually not a problem, when it's
associated with some object, whose number is already limited by the
kernel. These are: cache entries (inode, dentry), file pointers
(limited in various ways), or super blocks (should be limited in case
of user mounts).

The big problem is the page cache, because that is not limited. The
user can mmap huge amounts of memory, dirty them, and then when the
machine runs out of memory, and writeback kicks in, it may already be
too late.

This problem can be demonstrated with _any_ network filesystem that
supports shared writable mapping, and is mounted from the local
machine. One exception is CODA, because it uses disk files as file
backing, and so does not have problems with writeback.

FUSE solves the problem by simply not allowing shared writable
mapping. It's a _very_ hard thing to solve otherwise. CIFS, smbfs,
etc, can do the same for unprivileged mounts, or untrusted servers.

> At least for the CIFS case you can turn off the page cache for
> inode data on a per mount basis (with the forcedirectio mount flag)
> if you worry about the server intentionally holding up writes.

That's sounds like a solution to this problem.

> Unless the write is past end of file, writes are timed out
> reasonably quickly anyway, and end up killing the session, which
> depending on the setting of the hard/soft flag probably should
> result in a page fault.

A timeout is also OK, but you should be careful, that the page under
writeback does get freed after the timeout.

Miklos

2005-04-30 16:30:07

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

On Sat, 2005-04-30 at 11:16, Miklos Szeredi wrote:

> The big problem is the page cache, because that is not limited. The
> user can mmap huge amounts of memory, dirty them, and then when the
> machine runs out of memory, and writeback kicks in, it may already be
> too late.

Yes - we have seen the inode caching get too aggressive in testcases
with paired machines each mounting two clients and also running two
servers - in particular running an NFS and CIFS mounts from the same
client to a server running nfsd and Samba, and then doing the reverse
and launching large file copy testcases going both directions at the
same time. To make it nastier they add exports for Samba and NFS of
more than one local filesystem type. Fortunately most of the issues
there have been fixed, but it is an incredibly hard thing to guarantee
that there is enough memory in those cases because so much is taken up
by the page manager doing inode data caching and multimachine deadlock
could occur if the server is blocked on a client doing writepage which
is blocked on the other server which is blocked on the other client ...

2005-04-30 17:24:35

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

> But you bring up an interesting point about security policy. For
> the case of evil user trying to mount to evil server (e.g. one under
> evil user's control), in one sense it is no different than allowing
> a user to mount an evil cdrom or usb storage device with evil
> contens - a device which may contain specially crafted data (file
> and directory contents and metadata) designed to crash the system,
> but there is a difference - for network filesystems the server also
> can delay the responses, throw away the responses or corrupt the
> frame headers (this can just as often happen due to buggy network
> hardware and routers too).

There's another difference. Mounting a cdrom or usb stick needs
_physical_ access to the machine in question. If you have physical
access you don't need to craft special filesystems to crash the
machine, just pull out the plug from the wall.

So network/userspace filesystems which allow the user to mount an
arbitrary server should be _extra_ careful to verify data from the
server. Otherwise they can remotely crash the machine or gain
elevated privileges.

Miklos

2005-04-30 23:57:43

by Bodo Eggert

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

On Sat, 30 Apr 2005, Steve French wrote:

> There is one remaining issue with mount and umount - the user space
> utilities. By the way who maintains
> them these days? Although the mount utilities allow filesystem
> specific mount and umount helpers
> to be placed in /sbin and automatically executed for the matching
> filesystem type, there are a few functions
> that belong in common code - in a system library which today have to be
> implemented in every helper
> function and of course are implemented in different ways in different
> distros and
> different tools with possibility of corruption of the /etc/mtab.

For user mounts, there should be no practical way of maintaining mtab,
especially if the users are using private namespaces (as suggested in
another thread) or if they are supposed to be able to unmount using
a non-suid generic umount.

> It
> may be that the file /etc/mtab
> does not matter and that it needs to go away and everyone needs to look
> at /proc/mounts instead, but
> in the meantime /etc/mtab can easily get out of sync with the actual
> list of mounts, although that
> usuallly does not prevent unmount from working it may be confusing.

The drawback of /proc/mounts is not showing the -oloop information.
Either it's easy to implement showing that extra information, or you'll
need a ~/.etc/mtab

2005-05-11 09:00:32

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] cifs: handle termination of cifs oplockd kernel thread

On Sat, Apr 30, 2005 at 11:22:48AM +0200, Miklos Szeredi wrote:
> > - virtual filesystems exporting kernel data are obviously safe as
> > they enforce permissions no matter who mounted them. (actually we'd
> > need to check for some odd mount options)
>
> Maybe sysadmin doesn't want to let users see /sys for example. How
> would you disable it if users can mount it themselfes? OK, you can
> disable user mounts completely, but that's probably not fine grained
> enough for some.

the sysadmin shouldn't do that. sysfs is needed for proper operation
in a modern system and there's nothing to hid in there.

> > - block-based filesystems should be safe as long as the mounter has
> > access to the underlying block device
>
> Not true if user controls the baking device (e.g. loopback).

I didn't say we should make using the loopback driver a non-privilegued
operation.

> Most
> block based filesystems are probably unsafe at the moment, because not
> enough consistency checking is done at runtime. Are things like
> non-cyclic directory graphs ensured by all filesystems? My guess is
> not. See also Linus' comment about the state of the iso9660
> filesystem:
>
> http://lwn.net/Articles/128744/

It just mean that a) the system admin should be careful what drivers are
loaded and b) we need to audit block based filesystems better.

Note that in many current distributions users can mount iso9660 filesystems
through user mount hacks already. Accessible to everyone and in the global
namespace.