On 1/29/2019 1:44 PM, Eric W. Biederman wrote:
> All,
>
> With the existing mount API it is possible to mount a filesystem
> like:
>
> mount -t ext4 /dev/sda1 -o user_xattr /some/path
> mount -t ext4 /dev/sda1 -o nouser_xattr /some/other/path
>
> And have both mount commands succeed and have two mounts of the same
> filesystem. If the mounter is not attentive or the first mount is added
> earlier it may not be immediately noticed that a mount option that is
> needed for the correct operation or the security of the system is lost.
>
> We have seen this failure mode with both devpts and proc. So it is not
> theoretical, and it has resulted in CVEs.
>
> In some cases the existing mount API (such as a conflict between ro and
> rw) handles this by returning -EBUSY. So we may be able to correct this
> in the existing mount API. But it is always very tricky to get
> adequate testing for a change like that to avoid regressions, so I am
> proposing we change this in the new mount api.
>
> This has been brought up before and I have been told it is technically
> infeasible to make this work. To counter that I have sat down and
> implemented it.
>
> The basic idea is:
> - get a handle to a filesystem
> (passing enough options to uniquely identify the super block).
> Also capture enough state in the file handle to let you know if
> the file system has had its mount options changed between system calls.
> (essentially this is just the fs code that calls sget)
>
> - If the super block has not been configured, allow setting the
> filesystem's options.
>
> - If the super block has already been configured, require reading the
> filesystem's mount options before setting/updating the filesystem's
> mount options.
>
> To complement that I have functionality that:
> - Allows reading a filesystem's current mount options.
> - Allows reading the mount options that are needed to get a handle to
> the filesystem. For most filesystems it is just the block device
> name. For nfs it is essentially all mount options. For btrfs
> it is the block device name, and the "devices=" mount option for
> secondary block device names.
Are you taking the LSM specific mount options into account?
>
> Please find below a tree where all of this is implemented and working.
> Not all filesystems have been converted, but most of the unconverted
> ones are just a couple of minutes' work, as I have converted all of the
> filesystem mount helper routines.
>
> Also please find below an example mount program that takes the same set
> of mount options as mount(8) today and mounts filesystems with the
> proposed new mount api.
> - Without having any filesystem mount knowledge it successfully figures
> out which system calls different mount options need to be applied
> to.
>
> - Without having any filesystem-specific knowledge, in most cases it
> can detect whether a mount option you specify is already set on an
> existing mount. Duplicates that it can detect are ignored. In the
> other cases it fails the mount, as it considers the mount options to
> be different.
>
> - This demonstrates that it is safe to put the detection and remediation
> of multiple mount commands resolving to the same filesystem in user
> space.
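The duplicate-versus-conflict check in the list above can be approximated as follows. This is a hypothetical heuristic, not the logic of the example mount program: it assumes an option and its "no"-prefixed form (user_xattr versus nouser_xattr) conflict, that two `key=value` options with the same key and different values conflict, and that option values contain no commas. `classify_option()` is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum opt_status { OPT_NEW, OPT_DUPLICATE, OPT_CONFLICT };

/* Length of the key in "key=value", or of the whole flag if there is no '='. */
static size_t key_len(const char *opt, size_t len)
{
	const char *eq = memchr(opt, '=', len);
	return eq ? (size_t)(eq - opt) : len;
}

/* Is a exactly "no" followed by b?  (Assumed negation convention.) */
static bool is_negation(const char *a, size_t alen, const char *b, size_t blen)
{
	return alen == blen + 2 && memcmp(a, "no", 2) == 0 &&
	       memcmp(a + 2, b, blen) == 0;
}

/* Compare one requested option with the comma-separated current options. */
static enum opt_status classify_option(const char *current, const char *req)
{
	size_t rlen = strlen(req), rkey = key_len(req, rlen);

	assert(current != NULL && req != NULL);
	for (const char *p = current; *p; ) {
		size_t len = strcspn(p, ",");
		size_t key = key_len(p, len);

		if (len == rlen && memcmp(p, req, len) == 0)
			return OPT_DUPLICATE;          /* identical: safe to ignore */
		if (key == rkey && memcmp(p, req, key) == 0)
			return OPT_CONFLICT;           /* same key, different value */
		if (is_negation(p, len, req, rlen) || is_negation(req, rlen, p, len))
			return OPT_CONFLICT;           /* flag versus its "no" form */
		p += len;
		if (*p == ',')
			p++;
	}
	return OPT_NEW;
}
```

With the options from the ext4 example, `classify_option("rw,user_xattr", "nouser_xattr")` reports a conflict, which is the case the proposal wants to turn into a failed mount rather than a mount that silently keeps the first mount's setting.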
>
> I really don't care whose code gets used as long as it works. I do very
> much care that we don't add a new mount api that has the confusion flaws
> of the existing mount api.
>
> Along the way I have also detected a lot of room for improvement on the
> mount path for filesystems. Those cleanup patches are in my tree below,
> and I will be extracting them and sending them along shortly.
>
> Comments?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git no-losing-mount-options-proof-of-concept
>
>
>
> Eric
>
Casey Schaufler <[email protected]> writes:
> On 1/29/2019 1:44 PM, Eric W. Biederman wrote:
>> All,
>>
>> With the existing mount API it is possible to mount a filesystem
>> like:
>>
>> mount -t ext4 /dev/sda1 -o user_xattr /some/path
>> mount -t ext4 /dev/sda1 -o nouser_xattr /some/other/path
>>
>> And have both mount commands succeed and have two mounts of the same
>> filesystem. If the mounter is not attentive or the first mount is added
>> earlier it may not be immediately noticed that a mount option that is
>> needed for the correct operation or the security of the system is lost.
>>
>> We have seen this failure mode with both devpts and proc. So it is not
>> theoretical, and it has resulted in CVEs.
>>
>> In some cases the existing mount API (such as a conflict between ro and
>> rw) handles this by returning -EBUSY. So we may be able to correct this
>> in the existing mount API. But it is always very tricky to get
>> adequate testing for a change like that to avoid regressions, so I am
>> proposing we change this in the new mount api.
>>
>> This has been brought up before and I have been told it is technically
>> infeasible to make this work. To counter that I have sat down and
>> implemented it.
>>
>> The basic idea is:
>> - get a handle to a filesystem
>> (passing enough options to uniquely identify the super block).
>> Also capture enough state in the file handle to let you know if
>> the file system has had its mount options changed between system calls.
>> (essentially this is just the fs code that calls sget)
>>
>> - If the super block has not been configured, allow setting the
>> filesystem's options.
>>
>> - If the super block has already been configured, require reading the
>> filesystem's mount options before setting/updating the filesystem's
>> mount options.
>>
>> To complement that I have functionality that:
>> - Allows reading a filesystem's current mount options.
>> - Allows reading the mount options that are needed to get a handle to
>> the filesystem. For most filesystems it is just the block device
>> name. For nfs it is essentially all mount options. For btrfs
>> it is the block device name, and the "devices=" mount option for
>> secondary block device names.
>
> Are you taking the LSM specific mount options into account?
In the design yes, and I allow setting them. It appears in the code
to retrieve the mount options I forgot to call security_sb_show_options.

For finding the super block that you are going to mount the LSM mount
options are not relevant. Even nfs will not want to set those early as
they do not help determine the nfs super block. So the only place where
there is anything interesting in my api is in reading back the security
options so they can be compared to the options the mounter is setting.

I will add the missing call to security_sb_show_options, which is enough
to fix selinux. Unfortunately smack does not currently implement
.sb_show_options. Not implementing smack_sb_show_options means
/proc/mounts fails to match /etc/mtab, which is a bug, and it is likely
a real-world bug for the people who use smack and don't want to depend
on /etc/mtab, or are transitioning away from it.

Casey, do you want to implement smack_sb_show_options or should I put it
on my todo list?
Eric
[email protected] (Eric W. Biederman) writes:
> Casey Schaufler <[email protected]> writes:
>> Are you taking the LSM specific mount options into account?
>
> In the design yes, and I allow setting them. It appears in the code
> to retrieve the mount options I forgot to call security_sb_show_options.
>
> For finding the super block that you are going to mount the LSM mount
> options are not relevant. Even nfs will not want to set those early as
> they do not help determine the nfs super block. So the only place where
> there is anything interesting in my api is in reading back the security
> options so they can be compared to the options the mounter is setting.
>
> I will add the missing call to security_sb_show_options, which is enough
> to fix selinux. Unfortunately smack does not currently implement
> .sb_show_options. Not implementing smack_sb_show_options means
> /proc/mounts fails to match /etc/mtab, which is a bug, and it is likely
> a real-world bug for the people who use smack and don't want to depend
> on /etc/mtab, or are transitioning away from it.
>
> Casey, do you want to implement smack_sb_show_options or should I put it
> on my todo list?
Oh. I should add that I am always parsing the LSM mount options out so
that there is not a chance of the individual filesystems implementing
conflicting options even when there are no LSMs active. Without that I
am afraid we run the risk of having LSM mount options in conflict with
ordinary filesystem options at some point, and by the time we discover
it, it would start introducing filesystem regressions.

That does help with stacking, though, as there is no fundamental reason
only one LSM could process mount options.
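Always parsing the LSM options out, as described above, amounts to checking each option against a table of LSM-owned names before a filesystem ever sees it. A minimal sketch: the SELinux and Smack option names in the table are ones those modules accept, but the table itself and `is_lsm_option()` are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Option keys claimed by SELinux and Smack (values follow an '='). */
static const char *const lsm_keys[] = {
	"context", "fscontext", "defcontext", "rootcontext", "seclabel",
	"smackfsdef", "smackfsfloor", "smackfshat", "smackfsroot",
	"smackfstransmute",
};

/* Is this "key" or "key=value" option owned by an LSM? */
static bool is_lsm_option(const char *opt)
{
	size_t klen;

	assert(opt != NULL);
	klen = strcspn(opt, "=");
	for (size_t i = 0; i < sizeof(lsm_keys) / sizeof(lsm_keys[0]); i++)
		if (strlen(lsm_keys[i]) == klen &&
		    strncmp(opt, lsm_keys[i], klen) == 0)
			return true;   /* exact key match, so "contexts=" is not claimed */
	return false;
}
```

Because the lookup happens whether or not an LSM is loaded, a filesystem can never receive an option named `context=`, so a clash would show up as soon as a filesystem tried to define one.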
Eric
On Tue, Jan 29, 2019 at 03:44:22PM -0600, Eric W. Biederman wrote:
> so I am proposing we change this in the new mount api.
Well, this forces me to ask what the new API is? :-)

It seems that David uses fsconfig() and fsinfo() to set and get
mount options, and your patch introduces fsset() and fsoptions().

IMHO differentiating between FS driver and FS instance is a good idea; it
makes things more extensible. The sequence number in the instance is a
good example.

But for me David's fsinfo() seems better than fsoptions() and
fsspecifier(). I'm not sure about "all mount options as one string";
from your example it is pretty obvious how much energy is necessary to
split and join the strings.

It seems more elegant to ask for the Nth option as expected by fsinfo().
It also seems that fsinfo() is able to replace fsname() and fstype().

It would be better to extend David's fsinfo() to work with FS instance
and to return specifiers. And use fsconfig() rather than fsset().
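Fetching options one at a time is where the sequence number in the instance becomes useful: a caller can notice that the instance was remounted part-way through and start again. A hypothetical userspace model; `struct fs_instance`, `iter_begin()` and `iter_next()` are invented names and are not fsinfo() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_OPTS 8

/* Hypothetical filesystem instance: its options plus a change counter. */
struct fs_instance {
	unsigned long seq;             /* incremented on every remount */
	const char *opts[MAX_OPTS];    /* NULL-terminated option list */
};

/* Hypothetical iterator issuing one "give me option N" query per call. */
struct opt_iter {
	const struct fs_instance *fs;
	unsigned long seq;             /* fs->seq when iteration started */
	size_t n;                      /* index of the next option */
};

static void iter_begin(struct opt_iter *it, const struct fs_instance *fs)
{
	assert(it != NULL && fs != NULL);
	it->fs = fs;
	it->seq = fs->seq;
	it->n = 0;
}

/*
 * Returns 1 and sets *opt on success, 0 after the last option, or -ESTALE
 * if the instance changed since iter_begin(); the caller must then restart.
 */
static int iter_next(struct opt_iter *it, const char **opt)
{
	if (it->fs->seq != it->seq)
		return -ESTALE;
	if (it->n >= MAX_OPTS || it->fs->opts[it->n] == NULL)
		return 0;
	*opt = it->fs->opts[it->n++];
	return 1;
}
```

Without such a check, a caller could stitch options from before and after a remount into a set that never existed on the filesystem.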
Karel
--
Karel Zak <[email protected]>
http://karelzak.blogspot.com
[email protected] (Eric W. Biederman) writes:
> [email protected] (Eric W. Biederman) writes:
>
>> Casey Schaufler <[email protected]> writes:
>>> Are you taking the LSM specific mount options into account?
>>
>> In the design yes, and I allow setting them. It appears in the code
>> to retrieve the mount options I forgot to call security_sb_show_options.
>>
>> For finding the super block that you are going to mount the LSM mount
>> options are not relevant. Even nfs will not want to set those early as
>> they do not help determine the nfs super block. So the only place where
>> there is anything interesting in my api is in reading back the security
>> options so they can be compared to the options the mounter is setting.
>>
>> I will add the missing call to security_sb_show_options, which is enough
>> to fix selinux. Unfortunately smack does not currently implement
>> .sb_show_options. Not implementing smack_sb_show_options means
>> /proc/mounts fails to match /etc/mtab, which is a bug, and it is likely
>> a real-world bug for the people who use smack and don't want to depend
>> on /etc/mtab, or are transitioning away from it.
>>
>> Casey, do you want to implement smack_sb_show_options or should I put it
>> on my todo list?
>
> Oh. I should add that I am always parsing the LSM mount options out so
> that there is not a chance of the individual filesystems implementing
> conflicting options even when there are no LSMs active. Without that I
> am afraid we run the risk of having LSM mount options in conflict with
> ordinary filesystem options at some point, and by the time we discover
> it, it would start introducing filesystem regressions.
>
> That does help with stacking, though, as there is no fundamental reason
> only one LSM could process mount options.
Sigh. I just realized that there is a smack variant of the bug I am
working to fix.

smack on remount does not fail if you change the smack mount options.
It just silently ignores the smack mount options, which is exactly the
same poor interaction with userspace that has surprised user space
and caused CVEs.

How much do you think the smack users will care if you start verifying
that smack options present on remount are unchanged from the original
mount?

I suspect the smack userbase is small enough, and the corner case is
crazy enough, that we can fix this poor communication by smack. Otherwise
it looks like there needs to be a new security hook so old and new
remounts can be distinguished by the LSMs, and smack can be fixed in the
new version.
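The stricter behaviour asked about above could look roughly like this: a Smack option given on remount must match the value recorded at mount time, and an omitted option keeps its old value. A hypothetical userspace model; `struct smack_opts` and `check_remount()` are invented for illustration and are not Smack's data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of Smack labels; NULL means "option not given". */
struct smack_opts {
	const char *fsdef, *fsfloor, *fshat, *fsroot, *fstransmute;
};

/* 0 if the remount value is absent or unchanged, -EINVAL otherwise. */
static int check_one(const char *mounted, const char *requested)
{
	if (requested == NULL)
		return 0;                    /* not mentioned: keep the old value */
	if (mounted == NULL || strcmp(mounted, requested) != 0)
		return -EINVAL;              /* changed or newly added: refuse */
	return 0;
}

/* Fail the remount instead of silently ignoring changed Smack options. */
static int check_remount(const struct smack_opts *mounted,
			 const struct smack_opts *requested)
{
	assert(mounted != NULL && requested != NULL);
	if (check_one(mounted->fsdef, requested->fsdef) ||
	    check_one(mounted->fsfloor, requested->fsfloor) ||
	    check_one(mounted->fshat, requested->fshat) ||
	    check_one(mounted->fsroot, requested->fsroot) ||
	    check_one(mounted->fstransmute, requested->fstransmute))
		return -EINVAL;
	return 0;
}
```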
Eric
You need to rebase on linus/master. A bunch of your patches are obsoleted by
Al's security changes there.
David
Karel Zak <[email protected]> wrote:
> It seems more elegant to ask for the Nth option as expected by fsinfo().

More elegant yes, but there's an issue with atomicity[*]. I'm in the
process of switching to something that returns you a single buffer with all
the options in, but each key and each value is preceded by a length count.

The reasons for not using separator characters are:

(1) There's no separator char that cannot validly occur within an option[**].

(2) Makes it possible to return binary values if we need to.

David

[*] Atomic with respect to remount calls, that is.

[**] Oh, and look at cifs where you can *change* the separator char during
option parsing ("sep=<char>").
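A length-prefixed layout of the kind described above needs no reserved separator byte, so commas, a cifs-style custom separator, or binary data all survive. The actual format of the fsinfo() work is not given in this thread; this sketch assumes, purely for illustration, a native-endian uint32_t length before every key and every value, and `put_field()`/`next_field()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one length-prefixed field at off; returns the new offset, or 0 if full. */
static size_t put_field(unsigned char *buf, size_t cap, size_t off,
			const void *data, uint32_t len)
{
	assert(off <= cap);
	if (cap - off < sizeof(len) || cap - off - sizeof(len) < len)
		return 0;
	memcpy(buf + off, &len, sizeof(len));
	memcpy(buf + off + sizeof(len), data, len);
	return off + sizeof(len) + len;
}

/* Return the field at *off and advance past it, or NULL at the end/if truncated. */
static const unsigned char *next_field(const unsigned char *buf, size_t size,
				       size_t *off, uint32_t *len)
{
	const unsigned char *data;

	if (*off > size || size - *off < sizeof(*len))
		return NULL;
	memcpy(len, buf + *off, sizeof(*len));
	if (size - *off - sizeof(*len) < *len)
		return NULL;
	data = buf + *off + sizeof(*len);
	*off += sizeof(*len) + *len;
	return data;
}
```

The comma inside an SELinux context value needs no escaping here, because the reader never searches for a separator; it only follows lengths.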
David Howells <[email protected]> writes:
> You need to rebase on linus/master. A bunch of your patches are obsoleted by
> Al's security changes there.
Before anything is merged, definitely.

Al dealt with mount options from the LSMs in a slightly different way
than I did. At a practical level Al's version of the changes to the LSMs
and mine are you say po-tae-toe I say po-tah-toe differences, so I don't
see that influencing semantics up at the api level.

For purposes of discussing an API (not of merging one) I chose to start
with code I had a reasonable amount of testing against, so that other
people could play with it without expecting trouble.
Eric
David Howells <[email protected]> writes:
> Karel Zak <[email protected]> wrote:
>
>> It seems more elegant to ask for the Nth option as expected by fsinfo().
>
> More elegant yes, but there's an issue with atomicity[*]. I'm in the
> process of switching to something that returns you a single buffer with all
> the options in, but each key and each value is preceded by a length count.
>
> The reasons for not using separator characters are:
>
> (1) There's no separator char that cannot validly occur within an option[**].
*Blink* I had missed the cifs issue. So yes we certainly need
a better way to encode things in the buffer. I just used a single
string as an easy way to place everything in a buffer.
> (2) Makes it possible to return binary values if we need to.
I don't totally disagree with this. But I will point out that
except for coda passing a file descriptor there are no filesystems
that currently take or need binary options.

I suspect that as long as userspace supports /etc/fstab and we in turn
support /proc/mounts there is going to be a lot of pressure to keep
the majority of options so they can be encoded in a string separated by
commas.
> David
>
> [*] Atomic with respect to remount calls, that is.
There are also mount options that depend on each other, and whose order
matters with respect to other mount options; extN's "sb=<NNNN>" is one
example.
> [**] Oh, and look at cifs where you can *change* the separator char during
> option parsing ("sep=<char>").
Karel Zak <[email protected]> writes:
> On Tue, Jan 29, 2019 at 03:44:22PM -0600, Eric W. Biederman wrote:
>> so I am proposing we change this in the new mount api.
>
> Well, this forces me to ask what the new API is? :-)
>
> It seems that David uses fsconfig() and fsinfo() to set and get
> mount options, and your patch introduces fsset() and fsoptions().
>
> IMHO differentiating between FS driver and FS instance is a good idea; it
> makes things more extensible. The sequence number in the instance is a
> good example.
>
> But for me David's fsinfo() seems better than fsoptions() and
> fsspecifier(). I'm not sure about "all mount options as one string";
> from your example it is pretty obvious how much energy is necessary to
> split and join the strings.
>
> It seems more elegant to ask for the Nth option as expected by fsinfo().
> It also seems that fsinfo() is able to replace fsname() and fstype().
>
> It would be better to extend David's fsinfo() to work with FS instance
> and to return specifiers. And use fsconfig() rather than fsset().
As David has pointed out with cifs having a sep= option, we need a better
story for parsing the options in the kernel.

What my branch does is demonstrate there is at least one way we can
avoid mount options being silently different from what userspace
expects.

That means my branch is fine for looking at semantics and possible
system calls, but not much else.

I actually used multiple system calls just so I could avoid dealing
with multiplexor system calls.
Eric
On 1/30/2019 4:47 AM, Eric W. Biederman wrote:
> [email protected] (Eric W. Biederman) writes:
>
>> [email protected] (Eric W. Biederman) writes:
>>
>>> Casey Schaufler <[email protected]> writes:
>>>> Are you taking the LSM specific mount options into account?
>>> In the design yes, and I allow setting them. It appears in the code
>>> to retrieve the mount options I forgot to call security_sb_show_options.
>>>
>>> For finding the super block that you are going to mount the LSM mount
>>> options are not relevant. Even nfs will not want to set those early as
>>> they do not help determine the nfs super block. So the only place where
>>> there is anything interesting in my api is in reading back the security
>>> options so they can be compared to the options the mounter is setting.
>>>
>>> I will add the missing call to security_sb_show_options, which is enough
>>> to fix selinux. Unfortunately smack does not currently implement
>>> .sb_show_options. Not implementing smack_sb_show_options means
>>> /proc/mounts fails to match /etc/mtab, which is a bug, and it is likely
>>> a real-world bug for the people who use smack and don't want to depend
>>> on /etc/mtab, or are transitioning away from it.
>>>
>>> Casey, do you want to implement smack_sb_show_options or should I put it
>>> on my todo list?
>> Oh. I should add that I am always parsing the LSM mount options out so
>> that there is not a chance of the individual filesystems implementing
>> conflicting options even when there are no LSMs active. Without that I
>> am afraid we run the risk of having LSM mount options in conflict with
>> ordinary filesystem options at some point, and by the time we discover
>> it, it would start introducing filesystem regressions.
>>
>> That does help with stacking, though, as there is no fundamental reason
>> only one LSM could process mount options.
> Sigh. I just realized that there is a smack variant of the bug I am
> working to fix.
>
> smack on remount does not fail if you change the smack mount options.
> It just silently ignores the smack mount options, which is exactly the
> same poor interaction with userspace that has surprised user space
> and caused CVEs.
>
> How much do you think the smack users will care if you start verifying
> that smack options present on remount are unchanged from the original
> mount?
I've added the smack-discuss list to the conversation.
> I suspect the smack userbase is small enough, and the corner case is
> crazy enough, that we can fix this poor communication by smack. Otherwise
> it looks like there needs to be a new security hook so old and new
> remounts can be distinguished by the LSMs, and smack can be fixed in the
> new version.
I fear that it may be worse than that. It's not enough to distinguish
a mount from a remount. On remount you need an LSM specific way to
compare mount options. Smack may decide that it's OK to remount a
filesystem with more restrictive smackfshat values, for example. Or,
it may allow smackfsroot=Pop for one and smackfstransmute=Pop on
the other. I'm not sure about the 2nd case, but you should get the idea.
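Casey's point suggests the comparison itself has to be delegated to each LSM rather than done generically. One hypothetical shape is a per-LSM callback with strict equality as the default. `struct lsm_remount_ops`, `remount_compare()` and both policies are invented names; `allow_hat_change()` is only a placeholder for a real "is the new value more restrictive" test, which only Smack itself could define.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-LSM hook: may this option change from old_val to new_val? */
struct lsm_remount_ops {
	bool (*may_change)(const char *key, const char *old_val, const char *new_val);
};

/* Default policy: LSM options must be identical on remount. */
static bool strict_equal(const char *key, const char *old_val, const char *new_val)
{
	(void)key;
	return strcmp(old_val, new_val) == 0;
}

/* Placeholder policy: lets smackfshat change, everything else must match. */
static bool allow_hat_change(const char *key, const char *old_val, const char *new_val)
{
	return strcmp(key, "smackfshat") == 0 || strcmp(old_val, new_val) == 0;
}

/* Ask the LSM's own policy (or the strict default) whether to allow the change. */
static int remount_compare(const struct lsm_remount_ops *ops, const char *key,
			   const char *old_val, const char *new_val)
{
	bool (*cmp)(const char *, const char *, const char *) =
		ops->may_change ? ops->may_change : strict_equal;

	return cmp(key, old_val, new_val) ? 0 : -EBUSY;
}
```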
>
> Eric
>
>
On Wed, Jan 30, 2019 at 01:01:54PM +0000, David Howells wrote:
> Karel Zak <[email protected]> wrote:
>
> > It seems more elegant to ask for the Nth option as expected by fsinfo().
>
> More elegant yes, but there's an issue with atomicity[*]. I'm in the
> process of switching to something that returns you a single buffer with all
> the options in, but each key and each value is preceded by a length count.
Sounds good; for me it is important to avoid all the split/join
operations on the strings.
> The reasons for not using separator characters are:
>
> (1) There's no separator char that cannot validly occur within an option[**].
Yes, it's pretty common for selinux mount options to use "," within
an option, so a mount options string looks like
'context="system_u:object_r:tmp_t:s0:c127,c456",noexec'
and I have doubts all the parsers in userspace are compatible with this
use case...
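The difference between a naive parser and a quote-aware one shows up even when just counting the options in the string above. A minimal sketch; `count_naive()` and `count_quoted()` are hypothetical helpers, and treating double quotes as protecting commas is an assumed convention, not a rule every parser follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count options when every comma is a separator (the naive approach). */
static size_t count_naive(const char *s)
{
	size_t n;

	assert(s != NULL);
	n = *s ? 1 : 0;
	for (; *s; s++)
		if (*s == ',')
			n++;
	return n;
}

/* Count options, treating commas inside "..." as part of the value. */
static size_t count_quoted(const char *s)
{
	size_t n;
	bool in_quotes = false;

	assert(s != NULL);
	n = *s ? 1 : 0;
	for (; *s; s++) {
		if (*s == '"')
			in_quotes = !in_quotes;
		else if (*s == ',' && !in_quotes)
			n++;
	}
	return n;
}
```

For the example string the naive count is 3 (it splits the MLS category list c127,c456) while the quote-aware count is 2.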
> (2) Makes it possible to return binary values if we need to.
Yes.
> [**] Oh, and look at cifs where you can *change* the separator char during
> option parsing ("sep=<char>").
No comment :-)
Karel
--
Karel Zak <[email protected]>
http://karelzak.blogspot.com
On Wed, Jan 30, 2019 at 07:35:39AM -0600, Eric W. Biederman wrote:
> I suspect that as long as userspace supports /etc/fstab and we in turn
> support /proc/mounts there is going to be a lot of pressure to keep
> the majority of options so they can be encoded in a string separated by
> commas.
Well, we're doing many crazy things in userspace ;-) For example we do
not distinguish between VFS flags (MS_*), FS specific mount options
and userspace mount options (loop=) in our config files.

You already need to parse fstab/command line before you can use the
strings for the mount(2) syscall. It's already not straightforward; see
all the code in libmount...
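A simplified picture of the sorting a tool like mount(8) has to do before calling mount(2): some options become MS_* flags, some are consumed in userspace, and the rest are passed to the filesystem as a data string. The tables below are deliberately tiny and `classify()` is a hypothetical helper, not libmount's API; the MS_* constants are the real ones from Linux's <sys/mount.h>.

```c
#include <assert.h>
#include <string.h>
#include <sys/mount.h>

enum opt_kind { OPT_VFS_FLAG, OPT_USERSPACE, OPT_FS_SPECIFIC };

/* A few options that map to MS_* flags for mount(2) (incomplete list). */
static const struct { const char *name; unsigned long flag; } vfs_opts[] = {
	{ "ro", MS_RDONLY }, { "nosuid", MS_NOSUID }, { "nodev", MS_NODEV },
	{ "noexec", MS_NOEXEC }, { "noatime", MS_NOATIME },
};

/* Options consumed by mount(8) itself and never passed to the kernel. */
static const char *const user_opts[] = { "loop", "user", "noauto", "_netdev" };

static enum opt_kind classify(const char *opt, unsigned long *flags)
{
	size_t klen;

	assert(opt != NULL && flags != NULL);
	klen = strcspn(opt, "=");
	for (size_t i = 0; i < sizeof(vfs_opts) / sizeof(vfs_opts[0]); i++)
		if (strcmp(opt, vfs_opts[i].name) == 0) {
			*flags |= vfs_opts[i].flag;    /* goes in mountflags */
			return OPT_VFS_FLAG;
		}
	for (size_t i = 0; i < sizeof(user_opts) / sizeof(user_opts[0]); i++)
		if (strlen(user_opts[i]) == klen &&
		    strncmp(opt, user_opts[i], klen) == 0)
			return OPT_USERSPACE;
	return OPT_FS_SPECIFIC;   /* everything else goes in mount(2)'s data */
}
```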
/proc/mounts is only for backward compatibility; /proc/self/mountinfo
is a better way, because it lets you see the VFS and FS options
separately, etc.
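The split is visible in the line format of /proc/self/mountinfo as documented in proc(5): per-mount VFS options come before the " - " separator and per-superblock options come last. A minimal sketch that extracts both; `struct mi_opts` and `parse_mountinfo()` are hypothetical names, and the fixed buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct mi_opts {
	char vfs[256];     /* per-mount (VFS) options, field 6 */
	char fstype[64];   /* filesystem type, first field after " - " */
	char super[256];   /* per-superblock (FS) options, last field */
};

/* Split one mountinfo line into VFS and superblock options; 0 on success. */
static int parse_mountinfo(const char *line, struct mi_opts *out)
{
	const char *sep;

	assert(line != NULL && out != NULL);
	/* Spaces inside paths are escaped as \040, so " - " only marks the separator. */
	sep = strstr(line, " - ");
	if (!sep)
		return -1;
	/* Skip mount ID, parent ID, major:minor, root and mount point. */
	if (sscanf(line, "%*s %*s %*s %*s %*s %255s", out->vfs) != 1)
		return -1;
	/* After the separator: fstype, mount source (skipped), super options. */
	if (sscanf(sep + 3, "%63s %*s %255s", out->fstype, out->super) != 2)
		return -1;
	return 0;
}
```

With the example line from proc(5), `rw,noatime` comes back as the VFS options and `rw,errors=continue` as the superblock options, which /proc/mounts would have merged into a single field.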
IMHO it would be better not to care about how we use mount option
strings now (in userspace) when you think about the new API design.
Karel
--
Karel Zak <[email protected]>
http://karelzak.blogspot.com