by Stephen Smalley

Subject: Re: [RFC][PATCH 0/8] Mount, FS, Block and Keyrings notifications [ver #2]

On 6/5/19 12:19 AM, Andy Lutomirski wrote:
> On Tue, Jun 4, 2019 at 6:18 PM Stephen Smalley
> <[email protected]> wrote:
>>
>> On Tue, Jun 4, 2019 at 4:58 PM Andy Lutomirski <[email protected]> wrote:
>>>
>>> On Tue, Jun 4, 2019 at 1:39 PM David Howells <[email protected]> wrote:
>>>>
>>>> Andy Lutomirski <[email protected]> wrote:
>>>>
>>>>>> Here's a set of patches to add a general variable-length notification queue
>>>>>> concept and to add sources of events for:
>>>>>
>>>>> I asked before and didn't see a response, so I'll ask again. Why are you
>>>>> paying any attention at all to the creds that generate an event?
>>>>
>>>> Casey responded to you. It's one of his requirements.
>>>>
>>>
>>> It being a "requirement" doesn't make it okay.
>>>
>>>> However, the LSMs (or at least SELinux) ignore f_cred and use current_cred()
>>>> when checking permissions. See selinux_revalidate_file_permission() for
>>>> example - it uses current_cred() not file->f_cred to re-evaluate the perms,
>>>> and the fd might be shared between a number of processes with different creds.
>>>
>>> That's a bug. It's arguably a rather severe bug. If I ever get
>>> around to writing the patch I keep thinking of that will warn if we
>>> use creds from invalid contexts, it will warn.
>>
>>
>> No, not a bug. Working as designed. Initial validation on open, but revalidation upon read/write if something has changed since open (process SID differs from opener, inode SID has changed, policy has changed). Current subject SID should be used for the revalidation. It's a MAC vs DAC difference.
>>
>
> Can you explain how the design is valid, then? Consider nasty cases like this:
>
> $ sudo -u lotsofgarbage 2>/dev/whatever

(sorry for the previous html email; gmail or my inability to properly
use it strikes again!)

Here we have four (or more) opportunities to say no:
1) Upon selinux_inode_permission(), when checking write access to
/dev/whatever in the context of the shell process,
2) Upon selinux_file_open(), when checking and caching the open and
write access for shell to /dev/whatever in the file security struct,
3) Upon selinux_bprm_committing_creds() -> flush_unauthorized_files(),
when revalidating write access to /dev/whatever in the context of sudo,
4) Upon selinux_file_permission() ->
selinux_revalidate_file_permission(), when revalidating write access to
/dev/whatever in the context of sudo.

If any of those fail, then access is denied, so unless both the shell
and sudo are authorized to write to /dev/whatever, it is a no-go. NB
Only the shell context requires open permission here; the sudo context
only needs write.
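
As a rough userspace model of the four checkpoints above (a sketch only, not
kernel code: the toy_ names and the two permission bits are made up for
illustration, and the real hooks consult the loaded policy rather than a
bitmask):

```c
#include <stdbool.h>

/* Toy permission bits; these are NOT SELinux's real access vectors. */
enum { TOY_PERM_OPEN = 1 << 0, TOY_PERM_WRITE = 1 << 1 };

/* Hypothetical stand-in for "what policy grants this subject on /dev/whatever". */
struct toy_subject {
	const char *name;
	unsigned int allowed;
};

static bool toy_has_perms(const struct toy_subject *s, unsigned int requested)
{
	return (s->allowed & requested) == requested;
}

/*
 * Models: shell opens /dev/whatever for write, execs sudo, sudo writes.
 * Returns true only if every checkpoint passes.
 */
static bool toy_redirect_exec_write(const struct toy_subject *shell,
				    const struct toy_subject *sudo)
{
	/* 1) selinux_inode_permission(): write, shell context */
	if (!toy_has_perms(shell, TOY_PERM_WRITE))
		return false;
	/* 2) selinux_file_open(): open + write, shell context */
	if (!toy_has_perms(shell, TOY_PERM_OPEN | TOY_PERM_WRITE))
		return false;
	/* 3) flush_unauthorized_files() at exec: write, sudo context */
	if (!toy_has_perms(sudo, TOY_PERM_WRITE))
		return false;
	/* 4) selinux_revalidate_file_permission() at write(): write, sudo context */
	if (!toy_has_perms(sudo, TOY_PERM_WRITE))
		return false;
	return true;
}
```

Note that only the shell is ever asked for the open bit; sudo is only asked
for write, matching the NB above.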

> It is certainly the case that drivers, fs code, and other core code
> MUST NOT look at current_cred() in the context of syscalls like
> open(). Jann, I, and others have found quite a few rootable bugs of
> this sort. What makes MAC special here?

Do you mean syscalls like write(), not open()? I think your concern is
that they apply some check only during write() and not open() and
therefore are susceptible to the confused deputy scenario above. In
contrast we are validating access at open, transfer/inherit, and use. If
we use file->f_cred instead of current_cred() in
selinux_revalidate_file_permission() and the current process SID differs
from that of the opener, we'll never apply a check for the actual
security context performing the write(), so information can flow in
violation of the MAC policy.
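
To make that concrete, here is a small userspace sketch of the revalidation
decision (assumptions throughout: the toy_ names, the two SIDs and the
one-bit policy table are invented for illustration and are not the SELinux
data structures):

```c
#include <stdbool.h>
#include <stdint.h>

enum { TOY_SID_SHELL = 1, TOY_SID_SUDO = 2, TOY_SID_MAX = 3 };

/* Hypothetical policy: may this SID write to the object? */
static const bool toy_policy_may_write[TOY_SID_MAX] = {
	[TOY_SID_SHELL] = true,
	[TOY_SID_SUDO]  = false,
};

/* State cached at open time (a toy analogue of the file security struct). */
struct toy_file_sec {
	uint32_t opener_sid;
	uint32_t inode_sid_at_open;
	uint32_t policy_seq_at_open;
};

/* Revalidate only if something has changed since open. */
static bool toy_needs_revalidation(const struct toy_file_sec *fsec,
				   uint32_t current_sid, uint32_t inode_sid,
				   uint32_t policy_seq)
{
	return fsec->opener_sid != current_sid ||
	       fsec->inode_sid_at_open != inode_sid ||
	       fsec->policy_seq_at_open != policy_seq;
}

/*
 * use_opener_cred == false models current_cred();
 * use_opener_cred == true models the file->f_cred alternative.
 */
static bool toy_write_allowed(const struct toy_file_sec *fsec,
			      uint32_t current_sid, uint32_t inode_sid,
			      uint32_t policy_seq, bool use_opener_cred)
{
	uint32_t subject;

	if (!toy_needs_revalidation(fsec, current_sid, inode_sid, policy_seq))
		return true;	/* decision cached at open still holds */

	subject = use_opener_cred ? fsec->opener_sid : current_sid;
	if (subject >= TOY_SID_MAX)
		return false;	/* unknown SID: deny */
	return toy_policy_may_write[subject];
}
```

With the opener's SID substituted, a writer that the toy policy forbids is
never itself checked, which is the information flow I am describing.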

> I would believe there are cases where auditing write() callers makes
> some sense, but anyone reading those logs needs to understand that the
> creds are dubious at best.

2019-06-05 14:52:45

Casey Schaufler <[email protected]> wrote:

> Right. You're mixing the kind of things that can generate events,
> and that makes having a single policy difficult.

Whilst that's true, the notifications are clearly marked as to type, so it
should be possible to select different policies for different notification
types.

Question for you: what does the LSM *actually* need? There are a bunch of
things available, some of which may be the same thing:

 (1) The creds of the process that created a watch_queue (ie. opened
     /dev/watch_queue).

 (2) The creds of the process that set a watch (ie. called sb_notify,
     KEYCTL_NOTIFY, ...).

 (3) The creds of the process that tripped the event (which might be the
     system).

 (4) The security attributes of the object on which the watch was set (uid,
     gid, mode, labels).

 (5) The security attributes of the object on which the event was tripped.

 (6) The security attributes of all the objects between the object in (5) and
     the object in (4), assuming we work from (5) towards (4) if the two
     aren't coincident (WATCH_INFO_RECURSIVE).

At the moment, when post_one_notification() wants to write a notification into
a queue, it calls security_post_notification() to ask if it should be allowed
to do so. This is passed (1) and (3) above plus the notification record.
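
As a rough userspace model of that call path (not the code from the patches:
everything prefixed toy_ is hypothetical, and the per-type policy is only one
example of what an LSM might choose to do):

```c
#include <errno.h>
#include <stddef.h>

enum toy_type { TOY_KEY, TOY_DEVICE };

struct toy_cred { unsigned int uid; };
struct toy_notification { enum toy_type type; };

#define TOY_QUEUE_LEN 4

struct toy_queue {
	struct toy_cred owner;	/* (1): creds of whoever created the queue */
	struct toy_notification slots[TOY_QUEUE_LEN];
	size_t count;
};

/*
 * Hypothetical policy keyed on notification type: key events are only
 * delivered between matching uids, device events are always delivered.
 * Returns 0 to allow, -EACCES to deny.
 */
static int toy_security_post_notification(const struct toy_cred *queue_cred,
					   const struct toy_cred *trigger_cred,
					   const struct toy_notification *n)
{
	switch (n->type) {
	case TOY_KEY:
		return queue_cred->uid == trigger_cred->uid ? 0 : -EACCES;
	case TOY_DEVICE:
		return 0;
	default:
		return -EACCES;
	}
}

/* Returns 1 if the record was queued, 0 if it was dropped. */
static int toy_post_one_notification(struct toy_queue *q,
				     const struct toy_cred *trigger_cred, /* (3) */
				     const struct toy_notification *n)
{
	if (toy_security_post_notification(&q->owner, trigger_cred, n) != 0)
		return 0;	/* LSM said no: drop silently */
	if (q->count == TOY_QUEUE_LEN)
		return 0;	/* queue full */
	q->slots[q->count++] = *n;
	return 1;
}
```

Because the record is handed to the hook, the type can be used to pick the
policy, which is the point I made above about notifications being marked as
to type.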

The only problem I really have is that for a destruction message you want to
get the creds of who did the last put on an object and caused it to be
destroyed - I think everything else probably gets the right creds, even if
they aren't even in the same namespaces (mount propagation, yuck).

However, that one is a biggie because close()/exit() must propagate it to
deferred-fput, which must propagate it to af_unix-cleanup, and thence back to
deferred-fput and thence to implicit unmount (dissolve_on_fput()[*]).

[*] Though it should be noted that if this happens, the subtree cannot be
attached to the root of a namespace.

> > In any case, that's what I was referring to when I said I might need to call
> > inode_permission(). But UIDs don't exist for all filesystems, for example,
> > and there are no UIDs on superblocks, mount objects or hardware events.
>
> If you open() or stat() a file on those filesystems the UID
> used in the access control comes from somewhere. Setting a watch
> on things with UIDs should use the access mode on the file,
> just like any other filesystem operation.

Another question for you: Do I need to let the LSM pass judgement on a watch
that a process is trying to set? I think I probably do. This would require
separate hooks for different object types:

	int security_watch_key(struct watch *watch, struct key *key);
	int security_watch_sb(struct watch *watch, struct path *path);
	int security_watch_mount(struct watch *watch, struct path *path);
	int security_watch_devices(struct watch *watch);

so that the LSM can see the object the watch is being placed on (the last has
a global queue, so there is no object).

Further, do I need to put a "void *security" pointer in struct watch and
indicate to the LSM the object being watched? The watch could then be passed
to security_post_notification() instead of the watch queue creds (which I
could then dispense with).

	security_post_notification(const struct watch *watch,
				   const struct cred *trigger_cred,
				   struct watch_notification *n);

Also, should I let the LSM audit/edit the filter set by
IOC_WATCH_QUEUE_SET_FILTER? Userspace can't retrieve the filter, so the LSM
could edit it to exclude certain things. That might be a bit too complicated,
though.

> Things like superblocks are stickier because we don't generally
> think of them as objects. If you can do statfs(), you should be
> able to set a watch on the filesystem metadata.
>
> How would you specify a watch for a hardware event? If you say
> you have to open /dev/mumble to set a watch for mumbles, you're
> good there, too.

That's not how that works at the moment. There's a global watch list for
device events. I've repurposed it to carry any device's events - so it will
carry blockdev events (I/O errors only at the moment) and usb events
(add/remove device, add/remove bus, reset device at the moment).

> > Now, I could see that you ignore UIDs on things like keys and
> > hardware-triggered events, but how does this interact with things like mount
> > watches that see directories that have UIDs?
> >
> > Are you advocating making it such that process B can only see events
> > triggered by process A if they have the same UID, for example?
>
> It's always seemed arbitrary to me that you can't open your process up to
> get signals from other users. What about putting mode bits on your ring
> buffer? By default you could only accept your own events, but you could do a
> rb_chmod(0222) and let all events through.

Ummm... This mechanism is pretty much about events generated by others.
Depending on what you mean by 'you' and 'your own events', it might be considered
that you would know what events you were directly causing and wouldn't need a
notification system for it.

> Subject to LSM addition restrictions, of course. That would require the cred
> of the process that triggered the event or a system cred for "hardware"
> events. If you don't like mode bits you could use an ACL for fine
> granularity or a single "let'em all in" bit for coarse.

I'm not entirely sure how an ACL would help. If someone creates a watch
queue, sets an ACL with only a "let everything in" ACE, we're back to the
situation we're in now.

As I understand it, the issue you have is stopping them from getting events
that they're willing to accept but that you think they shouldn't be allowed
to see.

> I'm not against access, I'm against uncontrolled access in conflict with
> basic system policy.

David