2012-11-20 20:42:45

by Kees Cook

[permalink] [raw]
Subject: [PATCH v2] devtmpfs: mount with noexec and nosuid

Since devtmpfs is writable, make the default noexec,nosuid as well. This
protects from the case of a privileged process having an arbitrary file
write flaw and an argumentless arbitrary execution (i.e. it would lack
the ability to run "mount -o remount,exec,suid /dev").

Cc: [email protected]
Cc: Kay Sievers <[email protected]>
Cc: Roland Eggner <[email protected]>
Signed-off-by: Kees Cook <[email protected]>

---
v2:
- use CONFIG_DEVTMPFS_SAFE to wrap the logic.
---
drivers/base/Kconfig | 12 ++++++++++++
drivers/base/devtmpfs.c | 12 ++++++++++--
2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index b34b5cd..a37fcf2 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -56,6 +56,18 @@ config DEVTMPFS_MOUNT
rescue mode with init=/bin/sh, even when the /dev directory
on the rootfs is completely empty.

+config DEVTMPFS_SAFE
+ bool "Use nosuid,noexec mount options on devtmpfs"
+ depends on DEVTMPFS
+ help
+ This instructs the kernel to include the MS_NOEXEC and
+ MS_NOSUID mount flags when mounting devtmpfs. This prevents
+ certain kinds of code-execution attacks on embedded platforms.
+
+ Notice: If enabled, things like /dev/mem cannot be mmapped
+ with the PROT_EXEC flag. This can break, for example, non-KMS
+ video drivers.
+
config STANDALONE
bool "Select only drivers that don't need compile-time external firmware" if EXPERIMENTAL
default y
diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 147d1a4..bf85fbf 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -340,6 +340,10 @@ static int handle_remove(const char *nodename, struct device *dev)
int devtmpfs_mount(const char *mntdir)
{
int err;
+ int mflags = MS_SILENT;
+#ifdef CONFIG_DEVTMPFS_SAFE
+ mflags |= MS_NOEXEC | MS_NOSUID;
+#endif

if (!mount_dev)
return 0;
@@ -347,7 +351,7 @@ int devtmpfs_mount(const char *mntdir)
if (!thread)
return 0;

- err = sys_mount("devtmpfs", (char *)mntdir, "devtmpfs", MS_SILENT, NULL);
+ err = sys_mount("devtmpfs", (char *)mntdir, "devtmpfs", mflags, NULL);
if (err)
printk(KERN_INFO "devtmpfs: error mounting %i\n", err);
else
@@ -369,10 +373,14 @@ static int devtmpfsd(void *p)
{
char options[] = "mode=0755";
int *err = p;
+ int mflags = MS_SILENT;
+#ifdef CONFIG_DEVTMPFS_SAFE
+ mflags |= MS_NOEXEC | MS_NOSUID;
+#endif
*err = sys_unshare(CLONE_NEWNS);
if (*err)
goto out;
- *err = sys_mount("devtmpfs", "/", "devtmpfs", MS_SILENT, options);
+ *err = sys_mount("devtmpfs", "/", "devtmpfs", mflags, options);
if (*err)
goto out;
sys_chdir("/.."); /* will traverse into overmounted root */
--
1.7.9.5


--
Kees Cook
Chrome OS Security


2012-11-20 20:55:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, Nov 20, 2012 at 12:42:38PM -0800, Kees Cook wrote:
> Since devtmpfs is writable, make the default noexec,nosuid as well. This
> protects from the case of a privileged process having an arbitrary file
> write flaw and an argumentless arbitrary execution (i.e. it would lack
> the ability to run "mount -o remount,exec,suid /dev").
>
> Cc: [email protected]
> Cc: Kay Sievers <[email protected]>
> Cc: Roland Eggner <[email protected]>
> Signed-off-by: Kees Cook <[email protected]>
>
> ---
> v2:
> - use CONFIG_DEVTMPFS_SAFE to wrap the logic.

That's better, thanks.

One tiny bikeshead request though:

> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -340,6 +340,10 @@ static int handle_remove(const char *nodename, struct device *dev)
> int devtmpfs_mount(const char *mntdir)
> {
> int err;
> + int mflags = MS_SILENT;
> +#ifdef CONFIG_DEVTMPFS_SAFE
> + mflags |= MS_NOEXEC | MS_NOSUID;
> +#endif

You duplicate this in two places, which makes the c code harder to read.
How about, at the top of the file, you do:

#ifdef CONFIG_DEVTMPFS_SAFE
#define DEVTMPFS_MFLAGS MS_SILENT | MS_NOEXEC | MS_NOSUID
#else
#define DEVTMPFS_MFLAGS MS_SILENT
#endif

And then just use DEVTMPFS_MFLAGS in both places?

That should make the patch smaller, which is always nice :)

thanks,

greg k-h

2012-11-20 21:08:00

by Alan

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, 20 Nov 2012 12:42:38 -0800
Kees Cook <[email protected]> wrote:

> Since devtmpfs is writable, make the default noexec,nosuid as well. This
> protects from the case of a privileged process having an arbitrary file
> write flaw and an argumentless arbitrary execution (i.e. it would lack
> the ability to run "mount -o remount,exec,suid /dev").

Ok this looks crap on two levels.

1. Why not just have your userspace mount -o remount the file system this
way already in early boot. (and if you trojanned boot that early then any
supposed security gain is already lost)

2. If you want to do this right then you need to work out what you are
trying to prevent. Your devtmpfs can force file permissions on the
underlying device nodes by having its own operation handling for chmod.

At that point you can force permissions on anything that you want to
avoid floating around that filesystem with other rights, while not
touching it on device or directory nodes where the meaning is different.

In its current form however it appears to be a kernel implementation of
"mount is too hard".

Alan

2012-11-20 21:41:57

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, Nov 20, 2012 at 12:54 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Nov 20, 2012 at 12:42:38PM -0800, Kees Cook wrote:
>> Since devtmpfs is writable, make the default noexec,nosuid as well. This
>> protects from the case of a privileged process having an arbitrary file
>> write flaw and an argumentless arbitrary execution (i.e. it would lack
>> the ability to run "mount -o remount,exec,suid /dev").
>>
>> Cc: [email protected]
>> Cc: Kay Sievers <[email protected]>
>> Cc: Roland Eggner <[email protected]>
>> Signed-off-by: Kees Cook <[email protected]>
>>
>> ---
>> v2:
>> - use CONFIG_DEVTMPFS_SAFE to wrap the logic.
>
> That's better, thanks.
>
> One tiny bikeshead request though:
>
>> --- a/drivers/base/devtmpfs.c
>> +++ b/drivers/base/devtmpfs.c
>> @@ -340,6 +340,10 @@ static int handle_remove(const char *nodename, struct device *dev)
>> int devtmpfs_mount(const char *mntdir)
>> {
>> int err;
>> + int mflags = MS_SILENT;
>> +#ifdef CONFIG_DEVTMPFS_SAFE
>> + mflags |= MS_NOEXEC | MS_NOSUID;
>> +#endif
>
> You duplicate this in two places, which makes the c code harder to read.
> How about, at the top of the file, you do:
>
> #ifdef CONFIG_DEVTMPFS_SAFE
> #define DEVTMPFS_MFLAGS MS_SILENT | MS_NOEXEC | MS_NOSUID
> #else
> #define DEVTMPFS_MFLAGS MS_SILENT
> #endif
>
> And then just use DEVTMPFS_MFLAGS in both places?
>
> That should make the patch smaller, which is always nice :)

Excellent idea.

-Kees

--
Kees Cook
Chrome OS Security

2012-11-20 21:50:00

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, Nov 20, 2012 at 1:13 PM, Alan Cox <[email protected]> wrote:
> On Tue, 20 Nov 2012 12:42:38 -0800
> Kees Cook <[email protected]> wrote:
>
>> Since devtmpfs is writable, make the default noexec,nosuid as well. This
>> protects from the case of a privileged process having an arbitrary file
>> write flaw and an argumentless arbitrary execution (i.e. it would lack
>> the ability to run "mount -o remount,exec,suid /dev").
>
> Ok this looks crap on two levels.
>
> 1. Why not just have your userspace mount -o remount the file system this
> way already in early boot. (and if you trojanned boot that early then any
> supposed security gain is already lost)

That's certainly possible, but I am hoping to avoid adding any extra
boot time. The kernel is responsible for this mount, so its flags
should be configurable, resulting in no time penalty anywhere.

> 2. If you want to do this right then you need to work out what you are
> trying to prevent. Your devtmpfs can force file permissions on the
> underlying device nodes by having its own operation handling for chmod.
>
> At that point you can force permissions on anything that you want to
> avoid floating around that filesystem with other rights, while not
> touching it on device or directory nodes where the meaning is different.
>
> In its current form however it appears to be a kernel implementation of
> "mount is too hard".

This change also stops mmap() with PROT_EXEC which a chmod handler
wouldn't be able to do. The noexec and nosuid mount options were
designed for this sort of thing, so I think that's where it should be
handled.

-Kees

--
Kees Cook
Chrome OS Security

2012-11-20 23:48:42

by Alan

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

> > 1. Why not just have your userspace mount -o remount the file system this
> > way already in early boot. (and if you trojanned boot that early then any
> > supposed security gain is already lost)
>
> That's certainly possible, but I am hoping to avoid adding any extra
> boot time. The kernel is responsible for this mount, so its flags
> should be configurable, resulting in no time penalty anywhere.

You just broke my bullshitometer

It's a single syscall from your init binary, its microseconds.

> > 2. If you want to do this right then you need to work out what you are
> > trying to prevent. Your devtmpfs can force file permissions on the
> > underlying device nodes by having its own operation handling for chmod.
> >
> > At that point you can force permissions on anything that you want to
> > avoid floating around that filesystem with other rights, while not
> > touching it on device or directory nodes where the meaning is different.
> >
> > In its current form however it appears to be a kernel implementation of
> > "mount is too hard".
>
> This change also stops mmap() with PROT_EXEC which a chmod handler
> wouldn't be able to do.

You don't want to stop mmap with PROT_EXEC on /dev/mem as that breaks a
load of stuff, you want to stop people adding stuff to that file system
and executing it.

Think about it more carefully - if I've got access to /dev/mem you
already lost.

The patch is as is nonsense. It's doable in all kernels (including
existing legacy ones) as a single syscall in your init code. That's
portable back compatible and works all over the place. Your patch
introduces a pointless config option to make things less compatible than
before. It's bogus.

Alan

2012-11-20 23:53:32

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, Nov 20, 2012 at 3:53 PM, Alan Cox <[email protected]> wrote:
>> > 1. Why not just have your userspace mount -o remount the file system this
>> > way already in early boot. (and if you trojanned boot that early then any
>> > supposed security gain is already lost)
>>
>> That's certainly possible, but I am hoping to avoid adding any extra
>> boot time. The kernel is responsible for this mount, so its flags
>> should be configurable, resulting in no time penalty anywhere.
>
> You just broke my bullshitometer
>
> It's a single syscall from your init binary, its microseconds.

Whatever, I still see it as a needless inefficiency.

>> > 2. If you want to do this right then you need to work out what you are
>> > trying to prevent. Your devtmpfs can force file permissions on the
>> > underlying device nodes by having its own operation handling for chmod.
>> >
>> > At that point you can force permissions on anything that you want to
>> > avoid floating around that filesystem with other rights, while not
>> > touching it on device or directory nodes where the meaning is different.
>> >
>> > In its current form however it appears to be a kernel implementation of
>> > "mount is too hard".
>>
>> This change also stops mmap() with PROT_EXEC which a chmod handler
>> wouldn't be able to do.
>
> You don't want to stop mmap with PROT_EXEC on /dev/mem as that breaks a
> load of stuff, you want to stop people adding stuff to that file system
> and executing it.

Well, initially the latter, yes. But as it turns out, setting noexec
also stops PROT_EXEC on /dev/mem. Since the systems I'm building for
all use KMS, there's no need to execute regions of /dev/mem (e.g. VESA
BIOS init, etc).

-Kees

--
Kees Cook
Chrome OS Security

2012-11-21 00:14:06

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, Nov 20, 2012 at 9:42 PM, Kees Cook <[email protected]> wrote:
> Since devtmpfs is writable, make the default noexec,nosuid as well. This
> protects from the case of a privileged process having an arbitrary file
> write flaw and an argumentless arbitrary execution (i.e. it would lack
> the ability to run "mount -o remount,exec,suid /dev").

This only really applies to systems without an initramfs when the
kernel mounts /dev over the rootfs it has mounted; with an initramfs,
/dev is always mounted by user code.

Just checking, that is the use case you are doing that for?

Kay

2012-11-21 00:18:35

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, Nov 20, 2012 at 4:13 PM, Kay Sievers <[email protected]> wrote:
> On Tue, Nov 20, 2012 at 9:42 PM, Kees Cook <[email protected]> wrote:
>> Since devtmpfs is writable, make the default noexec,nosuid as well. This
>> protects from the case of a privileged process having an arbitrary file
>> write flaw and an argumentless arbitrary execution (i.e. it would lack
>> the ability to run "mount -o remount,exec,suid /dev").
>
> This only really applies to systems without an initramfs when the
> kernel mounts /dev over the rootfs it has mounted; with an initramfs,
> /dev is always mounted by user code.
>
> Just checking, that is the use case you are doing that for?

Correct. We're using this in Chrome OS, which does not use an initramfs.

-Kees

--
Kees Cook
Chrome OS Security

2012-11-21 00:18:56

by Alan

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

> > You just broke my bullshitometer
> >
> > It's a single syscall from your init binary, its microseconds.
>
> Whatever, I still see it as a needless inefficiency.

As opposed to adding permanent kernel code and having to change the
setting by recompilation, and not being able to deploy this with older
kernels.

Right now you can add one call to your user space and it just works, with
old kernels, with new kernels, with Android etc. You think its *more
efficient* to add permanently loaded kernel hacks.

Wrong. Big time wrong. A page is 4096 bytes, your change is probably
about 8. So 1 in 512 builds will see an extra page of kernel space used
assuming an even distribution. For each of those users the moment they've
had a few page faults your solution is *less* efficient, and the moment
they've paged one page because of having less memory your solution has
become vastly less efficient.

So can we put the crap about efficiency away please. This basically reads
like

"Mummy, the mount syscall looks complicated let me hack everyones kernel
with a crass extra Kconfig option because I'm crap"

> > You don't want to stop mmap with PROT_EXEC on /dev/mem as that breaks a
> > load of stuff, you want to stop people adding stuff to that file system
> > and executing it.
>
> Well, initially the latter, yes. But as it turns out, setting noexec
> also stops PROT_EXEC on /dev/mem. Since the systems I'm building for
> all use KMS, there's no need to execute regions of /dev/mem (e.g. VESA
> BIOS init, etc).

I have news for you: this is the upstream kernel, you specific personal
needs should not define what is good design - this isn't GNOME.

If you block exec on anything but devices then all is happy. You could
even block it on anything but specific devices. At that point you could
get rid of the config option and make it more useful.

We don't need extra confusion Kconfig options, we don't need extra kernel
code combinations to fail to maintain. Your patch offers *NO* feature
advantages over the existing kernel.

NAK


Alan

2012-11-21 00:26:48

by Alan

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, 20 Nov 2012 16:18:33 -0800
Kees Cook <[email protected]> wrote:

> On Tue, Nov 20, 2012 at 4:13 PM, Kay Sievers <[email protected]> wrote:
> > On Tue, Nov 20, 2012 at 9:42 PM, Kees Cook <[email protected]> wrote:
> >> Since devtmpfs is writable, make the default noexec,nosuid as well. This
> >> protects from the case of a privileged process having an arbitrary file
> >> write flaw and an argumentless arbitrary execution (i.e. it would lack
> >> the ability to run "mount -o remount,exec,suid /dev").
> >
> > This only really applies to systems without an initramfs when the
> > kernel mounts /dev over the rootfs it has mounted; with an initramfs,
> > /dev is always mounted by user code.
> >
> > Just checking, that is the use case you are doing that for?
>
> Correct. We're using this in Chrome OS, which does not use an initramfs.

But which has a perfectly good init process of its own, so it's just fine.

If you fix your init to do the work then you can deploy it to all your
Google partners and onto existing devices in an update even with old
kernels. Far better security practise.

Alan

2012-11-21 00:41:29

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, Nov 20, 2012 at 4:24 PM, Alan Cox <[email protected]> wrote:
>> > You just broke my bullshitometer
>> >
>> > It's a single syscall from your init binary, its microseconds.
>>
>> Whatever, I still see it as a needless inefficiency.
>
> As opposed to adding permanent kernel code and having to change the
> setting by recompilation, and not being able to deploy this with older
> kernels.

Some organizations only support their single "current" kernel, and
always move forward.

> Right now you can add one call to your user space and it just works, with
> old kernels, with new kernels, with Android etc. You think its *more
> efficient* to add permanently loaded kernel hacks.

I'm not interested in those things. That said, if it genuinely agreed
that this clear lack of security best-practice (W^X) in the kernel
isn't the place to fix it, I'll move on. There are still legacy video
drivers and things that can't operate with this configuration, but the
kernel mounting devtmpfs shouldn't _require_ userspace to clean up
after a buggy kernel.

> Wrong. Big time wrong. A page is 4096 bytes, your change is probably
> about 8. So 1 in 512 builds will see an extra page of kernel space used
> assuming an even distribution. For each of those users the moment they've
> had a few page faults your solution is *less* efficient, and the moment
> they've paged one page because of having less memory your solution has
> become vastly less efficient.

My patch doesn't change the code size there after the most recent
change to use a constant value again, so the only change I can see
would be if someone built with /proc/config.gz which would show the
new config.

> So can we put the crap about efficiency away please. This basically reads
> like
>
> "Mummy, the mount syscall looks complicated let me hack everyones kernel
> with a crass extra Kconfig option because I'm crap"

I think you just switched from calling my code crap to calling me crap.

>> > You don't want to stop mmap with PROT_EXEC on /dev/mem as that breaks a
>> > load of stuff, you want to stop people adding stuff to that file system
>> > and executing it.
>>
>> Well, initially the latter, yes. But as it turns out, setting noexec
>> also stops PROT_EXEC on /dev/mem. Since the systems I'm building for
>> all use KMS, there's no need to execute regions of /dev/mem (e.g. VESA
>> BIOS init, etc).
>
> I have news for you: this is the upstream kernel, you specific personal
> needs should not define what is good design - this isn't GNOME.

Heh, I support your dig at GNOME, but refute this being a "specific
personal need". Best-practices are clear here. The only reason the
kernel hasn't been doing it is fear of breaking legacy video drivers.

> If you block exec on anything but devices then all is happy. You could
> even block it on anything but specific devices. At that point you could
> get rid of the config option and make it more useful.

Security is layers, and is never perfect. We already do a fair bit of
process confinement, but this change seemed to be clear low-hanging
fruit.

> We don't need extra confusion Kconfig options, we don't need extra kernel
> code combinations to fail to maintain. Your patch offers *NO* feature
> advantages over the existing kernel.

I disagree. It allows a system builder to opt-in to correcting a
kernel flaw introduced by the need to support legacy devices in the
case where an initramfs isn't used.

> NAK

I don't understand why you're so violently opposed to letting an
arguably flawed constant get fixed. I'm not even asking for this to be
on by default.

-Kees

--
Kees Cook
Chrome OS Security

2012-11-21 00:55:43

by Alan

[permalink] [raw]
Subject: Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

On Tue, 20 Nov 2012 16:41:27 -0800
Kees Cook <[email protected]> wrote:

> On Tue, Nov 20, 2012 at 4:24 PM, Alan Cox <[email protected]> wrote:
> >> > You just broke my bullshitometer
> >> >
> >> > It's a single syscall from your init binary, its microseconds.
> >>
> >> Whatever, I still see it as a needless inefficiency.
> >
> > As opposed to adding permanent kernel code and having to change the
> > setting by recompilation, and not being able to deploy this with older
> > kernels.
>
> Some organizations only support their single "current" kernel, and
> always move forward.

This is not GNOME. Your specific organisational choices are not the sole
decider.

> I'm not interested in those things.

This is not GNOME.

> that this clear lack of security best-practice (W^X) in the kernel
> isn't the place to fix it, I'll move on. There are still legacy video
> drivers and things that can't operate with this configuration, but the
> kernel mounting devtmpfs shouldn't _require_ userspace to clean up
> after a buggy kernel.

The kernel is not "buggy". It merely has defaults you don't personally
like. They are configurable via userspace. Policy belongs in userspace.

> > Wrong. Big time wrong. A page is 4096 bytes, your change is probably
> > about 8. So 1 in 512 builds will see an extra page of kernel space used
> > assuming an even distribution. For each of those users the moment they've
> > had a few page faults your solution is *less* efficient, and the moment
> > they've paged one page because of having less memory your solution has
> > become vastly less efficient.
>
> My patch doesn't change the code size there after the most recent
> change to use a constant value again, so the only change I can see
> would be if someone built with /proc/config.gz which would show the
> new config.

So in actual fact going by distro choices its worse than I thought 8)

> Heh, I support your dig at GNOME, but refute this being a "specific
> personal need". Best-practices are clear here. The only reason the
> kernel hasn't been doing it is fear of breaking legacy video drivers.

Thats not true. Nor is it a case of "Not doing". This feature has been
supported for over ten years. We also support more flexible alternatives
like using SELinux, Smack or AppArmor for it. Right now I can do no exec
for /dev except for specific files entirely by security rules.

> Security is layers, and is never perfect. We already do a fair bit of
> process confinement, but this change seemed to be clear low-hanging
> fruit.

It's not a bad idea - but its a single userspace syscall in your init
code. Has been for over a decade.

> > We don't need extra confusion Kconfig options, we don't need extra kernel
> > code combinations to fail to maintain. Your patch offers *NO* feature
> > advantages over the existing kernel.
>
> I disagree. It allows a system builder to opt-in to correcting a
> kernel flaw introduced by the need to support legacy devices in the
> case where an initramfs isn't used.

Thats a fiction you invented. It's completely irrelevant to most systems
and its already long supported. I think doing that mount in Chrome is
sensible. I think doing it in userspace as we've supported for over a
decade is rather smarter than kernel hacks. You don't have to pee in
other people's ponds that way or achieve consensus you can just go do
what you need.

> I don't understand why you're so violently opposed to letting an
> arguably flawed constant get fixed. I'm not even asking for this to be
> on by default.

Because its a pointless option. It adds nothing but an extra Kconfig
option for everyone. It's a complete waste. Policy belongs in user space.

Fix your userspace. It can't be *that* hard to add a mount call to a
chromium init. It's hardly rocket science. That gives you what you want,
avoids more configuration options, and doesn't mess up compatibility or
add confusing interactions for those users already using more flexible
security options than your mount.

Alan