2015-12-03 00:04:39

by Kees Cook

Subject: [PATCH v2] fs: clear file privilege bits when mmap writing

Normally, when a user can modify a file that has setuid or setgid bits,
those bits are cleared when they are not the file owner or a member
of the group. This is enforced when using write and truncate but not
when writing to a shared mmap on the file. This could allow the file
writer to gain privileges by changing a binary without losing the
setuid/setgid/caps bits.

Changing the bits requires holding inode->i_mutex, so it cannot be done
during the page fault (due to mmap_sem being held during the fault).
Instead, clear the bits if PROT_WRITE is being used at mmap time.

Signed-off-by: Kees Cook <[email protected]>
Cc: [email protected]
---
v2:
- move check from page fault to mmap open
---
mm/mmap.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..a27735aabc73 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1340,6 +1340,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			if (locks_verify_locked(file))
 				return -EAGAIN;

+			/*
+			 * If we must remove privs, we do it here since
+			 * doing it during page COW is expensive and
+			 * cannot hold inode->i_mutex.
+			 */
+			if (prot & PROT_WRITE && !IS_NOSEC(inode)) {
+				mutex_lock(&inode->i_mutex);
+				file_remove_privs(file);
+				mutex_unlock(&inode->i_mutex);
+			}
+
 			vm_flags |= VM_SHARED | VM_MAYSHARE;
 			if (!(file->f_mode & FMODE_WRITE))
 				vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
--
1.9.1


--
Kees Cook
Chrome OS & Brillo Security
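
The gap described in the patch can be probed from userspace. The following is a minimal sketch and not part of the patch: the helper name probe_setuid_after_modify() and the scratch path are made up for illustration. It modifies a setuid file once with write(2) and once through a MAP_SHARED mapping, and reports the mode afterwards. The outcome depends on the kernel version and on the caller's privileges (a process with CAP_FSETID keeps the bits in both cases), so no expected result is given.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file with the setuid bit set,
 * modify it either with write(2) or through a MAP_SHARED mapping, and
 * store the resulting mode bits in *mode_out.
 * Returns 0 on success, -1 on any failure.
 */
static int probe_setuid_after_modify(int use_mmap, mode_t *mode_out)
{
	char path[] = "/tmp/suid-probe-XXXXXX";
	struct stat st;
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the open descriptor keeps the file alive */

	if (ftruncate(fd, 4096) < 0 || fchmod(fd, S_ISUID | 0755) < 0)
		goto out;

	if (use_mmap) {
		char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			goto out;
		memcpy(p, "changed", 7);	/* store through the mapping */
		if (msync(p, 4096, MS_SYNC) < 0 || munmap(p, 4096) < 0)
			goto out;
	} else {
		if (write(fd, "changed", 7) != 7)
			goto out;
	}

	if (fstat(fd, &st) < 0)
		goto out;
	*mode_out = st.st_mode & 07777;
	ret = 0;
out:
	close(fd);
	return ret;
}
```

Calling the helper with use_mmap set to 0 and then to 1, and comparing (mode & S_ISUID) between the two runs, shows whether a given kernel treats the two write paths differently.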


2015-12-03 00:18:55

by Andrew Morton

Subject: Re: [PATCH v2] fs: clear file privilege bits when mmap writing

On Wed, 2 Dec 2015 16:03:42 -0800 Kees Cook <[email protected]> wrote:

> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member
> of the group. This is enforced when using write and truncate but not
> when writing to a shared mmap on the file. This could allow the file
> writer to gain privileges by changing a binary without losing the
> setuid/setgid/caps bits.
>
> Changing the bits requires holding inode->i_mutex, so it cannot be done
> during the page fault (due to mmap_sem being held during the fault).
> Instead, clear the bits if PROT_WRITE is being used at mmap time.
>
> ...
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1340,6 +1340,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  			if (locks_verify_locked(file))
>  				return -EAGAIN;
>
> +			/*
> +			 * If we must remove privs, we do it here since
> +			 * doing it during page COW is expensive and
> +			 * cannot hold inode->i_mutex.
> +			 */
> +			if (prot & PROT_WRITE && !IS_NOSEC(inode)) {
> +				mutex_lock(&inode->i_mutex);
> +				file_remove_privs(file);
> +				mutex_unlock(&inode->i_mutex);
> +			}
> +

Still ignoring the file_remove_privs() return value. If this is
deliberate then a description of the reasons should be included?

2015-12-03 16:07:39

by Kees Cook

Subject: Re: [PATCH v2] fs: clear file privilege bits when mmap writing

On Wed, Dec 2, 2015 at 4:18 PM, Andrew Morton <[email protected]> wrote:
> On Wed, 2 Dec 2015 16:03:42 -0800 Kees Cook <[email protected]> wrote:
>
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member
>> of the group. This is enforced when using write and truncate but not
>> when writing to a shared mmap on the file. This could allow the file
>> writer to gain privileges by changing a binary without losing the
>> setuid/setgid/caps bits.
>>
>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> during the page fault (due to mmap_sem being held during the fault).
>> Instead, clear the bits if PROT_WRITE is being used at mmap time.
>>
>> ...
>>
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1340,6 +1340,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>  			if (locks_verify_locked(file))
>>  				return -EAGAIN;
>>
>> +			/*
>> +			 * If we must remove privs, we do it here since
>> +			 * doing it during page COW is expensive and
>> +			 * cannot hold inode->i_mutex.
>> +			 */
>> +			if (prot & PROT_WRITE && !IS_NOSEC(inode)) {
>> +				mutex_lock(&inode->i_mutex);
>> +				file_remove_privs(file);
>> +				mutex_unlock(&inode->i_mutex);
>> +			}
>> +
>
> Still ignoring the file_remove_privs() return value. If this is
> deliberate then a description of the reasons should be included?

Argh, yes, sorry. I will send a v3.

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2015-12-03 18:19:14

by Kees Cook

Subject: Re: [PATCH v2] fs: clear file privilege bits when mmap writing

On Wed, Dec 2, 2015 at 4:18 PM, Andrew Morton <[email protected]> wrote:
> On Wed, 2 Dec 2015 16:03:42 -0800 Kees Cook <[email protected]> wrote:
>
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member
>> of the group. This is enforced when using write and truncate but not
>> when writing to a shared mmap on the file. This could allow the file
>> writer to gain privileges by changing a binary without losing the
>> setuid/setgid/caps bits.
>>
>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> during the page fault (due to mmap_sem being held during the fault).
>> Instead, clear the bits if PROT_WRITE is being used at mmap time.
>>
>> ...
>>
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1340,6 +1340,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>  			if (locks_verify_locked(file))
>>  				return -EAGAIN;
>>
>> +			/*
>> +			 * If we must remove privs, we do it here since
>> +			 * doing it during page COW is expensive and
>> +			 * cannot hold inode->i_mutex.
>> +			 */
>> +			if (prot & PROT_WRITE && !IS_NOSEC(inode)) {
>> +				mutex_lock(&inode->i_mutex);
>> +				file_remove_privs(file);
>> +				mutex_unlock(&inode->i_mutex);
>> +			}
>> +
>
> Still ignoring the file_remove_privs() return value. If this is
> deliberate then a description of the reasons should be included?

Actually, there is a bigger problem:
https://lists.01.org/pipermail/lkp/2015-December/003185.html

[ 37.741286] trinity-c0/742 is trying to acquire lock:
[ 37.741982] (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<811c3b34>] do_mmap+0x544/0x670
[ 37.752562]
[ 37.752562] but task is already holding lock:
[ 37.753442] (&mm->mmap_sem){++++++}, at: [<811c3d70>] SyS_remap_file_pages+0xe0/0x350

Jan, any thoughts on avoiding this?

-Kees

--
Kees Cook
Chrome OS & Brillo Security
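
The report above is a lock-ordering complaint: do_mmap() is reached with mmap_sem already held and then tries to take i_mutex. The sketch below is a toy order tracker, not lockdep itself, and every name in it is hypothetical. It assumes the order already established elsewhere is i_mutex first and mmap_sem second (for instance a write() that page-faults while copying from a user buffer), and shows why recording the reverse order is flagged as a possible ABBA deadlock.

```c
#include <stdbool.h>

/* Illustrative lock classes; the names mirror the report, nothing more. */
enum lock_class { MMAP_SEM, I_MUTEX, NR_CLASSES };

/* seen_before[a][b] is true once some path took b while holding a. */
static bool seen_before[NR_CLASSES][NR_CLASSES];

/*
 * Record that `inner` is acquired while `outer` is held.
 * Returns -1 if the opposite order was recorded earlier (ABBA risk),
 * 0 otherwise.
 */
static int record_order(enum lock_class outer, enum lock_class inner)
{
	if (seen_before[inner][outer])
		return -1;
	seen_before[outer][inner] = true;
	return 0;
}
```

With this model, record_order(I_MUTEX, MMAP_SEM) is accepted, and a later record_order(MMAP_SEM, I_MUTEX), which is what the v2 patch does in do_mmap(), returns -1.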

2015-12-04 01:45:11

by yalin wang

Subject: Re: [PATCH v2] clear file privilege bits when mmap writing


> On Dec 2, 2015, at 16:03, Kees Cook <[email protected]> wrote:
>
> Normally, when a user can modify a file that has setuid or setgid bits,
> those bits are cleared when they are not the file owner or a member
> of the group. This is enforced when using write and truncate but not
> when writing to a shared mmap on the file. This could allow the file
> writer to gain privileges by changing a binary without losing the
> setuid/setgid/caps bits.
>
> Changing the bits requires holding inode->i_mutex, so it cannot be done
> during the page fault (due to mmap_sem being held during the fault).
> Instead, clear the bits if PROT_WRITE is being used at mmap time.
>
> Signed-off-by: Kees Cook <[email protected]>
> Cc: [email protected]
> ---

Does this mean the mprotect() system call also needs this check?
mprotect() can switch a mapping to PROT_WRITE, so a read-only map can
be written to again. That is a security hole here as well.

Thanks

2015-12-07 22:42:28

by Kees Cook

Subject: Re: [PATCH v2] clear file privilege bits when mmap writing

On Thu, Dec 3, 2015 at 5:45 PM, yalin wang <[email protected]> wrote:
>
>> On Dec 2, 2015, at 16:03, Kees Cook <[email protected]> wrote:
>>
>> Normally, when a user can modify a file that has setuid or setgid bits,
>> those bits are cleared when they are not the file owner or a member
>> of the group. This is enforced when using write and truncate but not
>> when writing to a shared mmap on the file. This could allow the file
>> writer to gain privileges by changing a binary without losing the
>> setuid/setgid/caps bits.
>>
>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> during the page fault (due to mmap_sem being held during the fault).
>> Instead, clear the bits if PROT_WRITE is being used at mmap time.
>>
>> Signed-off-by: Kees Cook <[email protected]>
>> Cc: [email protected]
>> ---
>
> Does this mean the mprotect() system call also needs this check?
> mprotect() can switch a mapping to PROT_WRITE, so a read-only map can
> be written to again. That is a security hole here as well.

Yes, good point. This needs to be added. I will send a new patch. Thanks!

-Kees

--
Kees Cook
Chrome OS & Brillo Security
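
The mprotect() path raised above can be shown from userspace. This is a minimal sketch under the assumption of a Linux-like system with a writable /tmp; the helper name write_after_mprotect() is made up. The mapping is created without PROT_WRITE, so a check made only at mmap() time would not fire, yet the file is still modified after the protection is upgraded.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a scratch file MAP_SHARED with PROT_READ only,
 * upgrade the mapping with mprotect(2), then store through it.
 * Returns 0 if the store reached the file, -1 on any failure.
 */
static int write_after_mprotect(void)
{
	char path[] = "/tmp/mprotect-probe-XXXXXX";
	char buf[8] = { 0 };
	char *p = MAP_FAILED;
	int ret = -1;
	int fd = mkstemp(path);	/* mkstemp() opens the file read-write */

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, 4096) < 0)
		goto out;

	/* No PROT_WRITE here: a check at mmap() time sees a read-only map. */
	p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/* Allowed because the descriptor was opened for writing. */
	if (mprotect(p, 4096, PROT_READ | PROT_WRITE) < 0)
		goto out;
	memcpy(p, "changed", 7);
	if (msync(p, 4096, MS_SYNC) < 0)
		goto out;

	/* Read back through the descriptor to confirm the file changed. */
	if (pread(fd, buf, 7, 0) == 7 && memcmp(buf, "changed", 7) == 0)
		ret = 0;
out:
	if (p != MAP_FAILED)
		munmap(p, 4096);
	close(fd);
	return ret;
}
```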

2015-12-08 00:40:20

by Kees Cook

Subject: Re: [PATCH v2] clear file privilege bits when mmap writing

On Mon, Dec 7, 2015 at 2:42 PM, Kees Cook <[email protected]> wrote:
> On Thu, Dec 3, 2015 at 5:45 PM, yalin wang <[email protected]> wrote:
>>
>>> On Dec 2, 2015, at 16:03, Kees Cook <[email protected]> wrote:
>>>
>>> Normally, when a user can modify a file that has setuid or setgid bits,
>>> those bits are cleared when they are not the file owner or a member
>>> of the group. This is enforced when using write and truncate but not
>>> when writing to a shared mmap on the file. This could allow the file
>>> writer to gain privileges by changing a binary without losing the
>>> setuid/setgid/caps bits.
>>>
>>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>>> during the page fault (due to mmap_sem being held during the fault).
>>> Instead, clear the bits if PROT_WRITE is being used at mmap time.
>>>
>>> Signed-off-by: Kees Cook <[email protected]>
>>> Cc: [email protected]
>>> ---
>>
>> Does this mean the mprotect() system call also needs this check?
>> mprotect() can switch a mapping to PROT_WRITE, so a read-only map can
>> be written to again. That is a security hole here as well.
>
> Yes, good point. This needs to be added. I will send a new patch. Thanks!

This continues to look worse and worse.

So... to check this at mprotect time, I have to know it's MAP_SHARED,
but that's in the vma_flags, which I can only see after holding
mmap_sem.

The best I can think of now is to strip the bits at munmap time, since
you can't execute an mmapped file until it closes.

Jan, thoughts on this?

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2015-12-09 08:26:48

by Jan Kara

Subject: Re: [PATCH v2] clear file privilege bits when mmap writing

On Mon 07-12-15 16:40:14, Kees Cook wrote:
> On Mon, Dec 7, 2015 at 2:42 PM, Kees Cook <[email protected]> wrote:
> > On Thu, Dec 3, 2015 at 5:45 PM, yalin wang <[email protected]> wrote:
> >>
> >>> On Dec 2, 2015, at 16:03, Kees Cook <[email protected]> wrote:
> >>>
> >>> Normally, when a user can modify a file that has setuid or setgid bits,
> >>> those bits are cleared when they are not the file owner or a member
> >>> of the group. This is enforced when using write and truncate but not
> >>> when writing to a shared mmap on the file. This could allow the file
> >>> writer to gain privileges by changing a binary without losing the
> >>> setuid/setgid/caps bits.
> >>>
> >>> Changing the bits requires holding inode->i_mutex, so it cannot be done
> >>> during the page fault (due to mmap_sem being held during the fault).
> >>> Instead, clear the bits if PROT_WRITE is being used at mmap time.
> >>>
> >>> Signed-off-by: Kees Cook <[email protected]>
> >>> Cc: [email protected]
> >>> ---
> >>
> >> Does this mean the mprotect() system call also needs this check?
> >> mprotect() can switch a mapping to PROT_WRITE, so a read-only map can
> >> be written to again. That is a security hole here as well.
> >
> > Yes, good point. This needs to be added. I will send a new patch. Thanks!
>
> This continues to look worse and worse.
>
> So... to check this at mprotect time, I have to know it's MAP_SHARED,
> but that's in the vma_flags, which I can only see after holding
> mmap_sem.
>
> The best I can think of now is to strip the bits at munmap time, since
> you can't execute an mmapped file until it closes.
>
> Jan, thoughts on this?

Umm, so we actually refuse to execute a file while someone has it open for
writing (deny_write_access() in do_open_execat()). So dropping the suid /
sgid bits when closing file for writing could be plausible. Grabbing
i_mutex from __fput() context is safe (it gets called from task_work
context when returning to userspace).

That way we could actually remove the checks done for each write. To avoid
unexpected removal of suid/sgid bits when someone just opens & closes the
file, we could mark the file as needing suid/sgid treatment by a flag in
inode->i_flags when file gets written to or mmaped and then check for this
in __fput().

I've added Al Viro to CC just in case he is aware of some issues with
this...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-12-09 22:52:54

by Kees Cook

Subject: Re: [PATCH v2] clear file privilege bits when mmap writing

On Wed, Dec 9, 2015 at 12:26 AM, Jan Kara <[email protected]> wrote:
> On Mon 07-12-15 16:40:14, Kees Cook wrote:
>> On Mon, Dec 7, 2015 at 2:42 PM, Kees Cook <[email protected]> wrote:
>> > On Thu, Dec 3, 2015 at 5:45 PM, yalin wang <[email protected]> wrote:
>> >>
>> >>> On Dec 2, 2015, at 16:03, Kees Cook <[email protected]> wrote:
>> >>>
>> >>> Normally, when a user can modify a file that has setuid or setgid bits,
>> >>> those bits are cleared when they are not the file owner or a member
>> >>> of the group. This is enforced when using write and truncate but not
>> >>> when writing to a shared mmap on the file. This could allow the file
>> >>> writer to gain privileges by changing a binary without losing the
>> >>> setuid/setgid/caps bits.
>> >>>
>> >>> Changing the bits requires holding inode->i_mutex, so it cannot be done
>> >>> during the page fault (due to mmap_sem being held during the fault).
>> >>> Instead, clear the bits if PROT_WRITE is being used at mmap time.
>> >>>
>> >>> Signed-off-by: Kees Cook <[email protected]>
>> >>> Cc: [email protected]
>> >>> ---
>> >>
>> >> Does this mean the mprotect() system call also needs this check?
>> >> mprotect() can switch a mapping to PROT_WRITE, so a read-only map can
>> >> be written to again. That is a security hole here as well.
>> >
>> > Yes, good point. This needs to be added. I will send a new patch. Thanks!
>>
>> This continues to look worse and worse.
>>
>> So... to check this at mprotect time, I have to know it's MAP_SHARED,
>> but that's in the vma_flags, which I can only see after holding
>> mmap_sem.
>>
>> The best I can think of now is to strip the bits at munmap time, since
>> you can't execute an mmapped file until it closes.
>>
>> Jan, thoughts on this?
>
> Umm, so we actually refuse to execute a file while someone has it open for
> writing (deny_write_access() in do_open_execat()). So dropping the suid /
> sgid bits when closing file for writing could be plausible. Grabbing
> i_mutex from __fput() context is safe (it gets called from task_work
> context when returning to userspace).
>
> That way we could actually remove the checks done for each write. To avoid
> unexpected removal of suid/sgid bits when someone just opens & closes the
> file, we could mark the file as needing suid/sgid treatment by a flag in
> inode->i_flags when file gets written to or mmaped and then check for this
> in __fput().

Yeah, this is ultimately where I ended up for the v4 (and fixed up in
v5). I added the flag to file, though, not inode. Sending v5 now...

-Kees

>
> I've added Al Viro to CC just in case he is aware of some issues with
> this...
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR



--
Kees Cook
Chrome OS & Brillo Security