2007-08-06 13:54:03

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

Apologies for the resend, but the original sending had the date in the
email header and it caused some of these to bounce...

( Please consider trimming the Cc list if discussing some aspect of this
that doesn't concern everyone.)

When an unprivileged process attempts to modify a file that has the
setuid or setgid bits set, the VFS will attempt to clear these bits. The
VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid
mask, and then call notify_change to clear these bits and set the mode
accordingly.

With a networked filesystem (NFS in particular but most likely others),
the client machine may not have credentials that allow for the clearing
of these bits. In some situations, this can lead to file corruption, or
to an operation failing outright because the setattr fails.

In this situation, we'd like to just leave the handling of this to
the server and ignore these bits. The problem is that by the time
nfs_setattr is called, the VFS has already reinterpreted the ATTR_KILL_*
bits into a mode change. We can't fix this in the filesystems where
this is a problem, as doing so would leave us having to second-guess
what the VFS wants us to do. So we need to change it so that filesystems
have more flexibility in how to interpret the ATTR_KILL_* bits.

The first patch in the following patchset moves this logic into a helper
function, and then only calls this helper function for inodes that do
not have a setattr operation defined. The subsequent patches fix up
individual filesystem setattr functions to call this helper function.

The upshot of this is that with this change, filesystems that define
a setattr inode operation are now responsible for handling the ATTR_KILL
bits as well. They can trivially do so by calling the helper, but they
must do so.

Some of the follow-on patches may not be strictly necessary, but I
decided that it was better to take the conservative approach and call
the helper when I wasn't sure. I've tried to CC the maintainers
for the individual filesystems as well where I could find them,
please let me know if there are others who should be informed.

Comments and suggestions appreciated...

Signed-off-by: Jeff Layton <[email protected]>


2007-08-07 20:49:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

First thanks a lot for doing this work, it's been long needed.

Second please don't send out that many patches. We encourage people
to split things into small patches when the changes are logially
separated. Which these are not - it's a flag day change (which btw
is fine despite the rants soe people spewed in reply to this), so it
should be one single patch. (Or one for all mainline filesystems +
one per fs only in -mm to make Andrew's life a little easier if you
really care.)

2007-08-07 22:13:49

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Tue, 7 Aug 2007 21:49:09 +0100
Christoph Hellwig <[email protected]> wrote:

> First thanks a lot for doing this work, it's been long needed.
>
> Second please don't send out that many patches. We encourage people
> to split things into small patches when the changes are logially
> separated. Which these are not - it's a flag day change (which btw
> is fine despite the rants soe people spewed in reply to this), so it
> should be one single patch. (Or one for all mainline filesystems +
> one per fs only in -mm to make Andrew's life a little easier if you
> really care.)

Thanks. I debated about how best to split these up. A coworker
mentioned that Andrew had tossed him back a single patch that
touched several mainline filesystems and asked him to break it
up. I took that to mean that the patches should generally be split
out, but I guess I took that too far ;-)

--
Jeff Layton <[email protected]>

2007-08-08 00:15:01

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Mon, 6 Aug 2007 09:54:03 -0400
Jeff Layton <[email protected]> wrote:

> Apologies for the resend, but the original sending had the date in the
> email header and it caused some of these to bounce...
>
> ( Please consider trimming the Cc list if discussing some aspect of this
> that doesn't concern everyone.)
>
> When an unprivileged process attempts to modify a file that has the
> setuid or setgid bits set, the VFS will attempt to clear these bits. The
> VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid
> mask, and then call notify_change to clear these bits and set the mode
> accordingly.
>
> With a networked filesystem (NFS in particular but most likely others),
> the client machine may not have credentials that allow for the clearing
> of these bits. In some situations, this can lead to file corruption, or
> to an operation failing outright because the setattr fails.
>
> In this situation, we'd like to just leave the handling of this to
> the server and ignore these bits. The problem is that by the time
> nfs_setattr is called, the VFS has already reinterpreted the ATTR_KILL_*
> bits into a mode change. We can't fix this in the filesystems where
> this is a problem, as doing so would leave us having to second-guess
> what the VFS wants us to do. So we need to change it so that filesystems
> have more flexibility in how to interpret the ATTR_KILL_* bits.
>
> The first patch in the following patchset moves this logic into a helper
> function, and then only calls this helper function for inodes that do
> not have a setattr operation defined. The subsequent patches fix up
> individual filesystem setattr functions to call this helper function.
>
> The upshot of this is that with this change, filesystems that define
> a setattr inode operation are now responsible for handling the ATTR_KILL
> bits as well. They can trivially do so by calling the helper, but they
> must do so.
>
> Some of the follow-on patches may not be strictly necessary, but I
> decided that it was better to take the conservative approach and call
> the helper when I wasn't sure. I've tried to CC the maintainers
> for the individual filesystems as well where I could find them,
> please let me know if there are others who should be informed.
>
> Comments and suggestions appreciated...
>

>From a purely practical standpoint: it's a concern that all filesytems need
patching to continue to correctly function after this change. There might
be filesystems which you missed, and there are out-of-tree filesystems
which won't be updated.

And I think the impact upon the out-of-tree filesystems would be fairly
severe: they quietly and subtly get their secutiry guarantees broken (I
think?)

Is there any way in which we can prevent these problems? Say

- rename something so that unconverted filesystems will reliably fail to
compile?

- leave existing filesystems alone, but add a new
inode_operations.setattr_jeff, which the networked filesytems can
implement, and teach core vfs to call setattr_jeff in preference to
setattr?

Something else?

2007-08-08 00:45:34

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Tue, 2007-08-07 at 17:15 -0700, Andrew Morton wrote:

> Is there any way in which we can prevent these problems? Say

The problem here is that we occasionally DO need to add new flags, and
yes, they MAY be security related. The whole reason why we're now having
to change the semantics of setattr is because somebody tried to hack
their way around the write+suid issue.

I suspect we will see the exact same thing will happen again in a couple
of years with Serge's ATTR_KILL_PRIV flag.

> - rename something so that unconverted filesystems will reliably fail to
> compile?
>
> - leave existing filesystems alone, but add a new
> inode_operations.setattr_jeff, which the networked filesytems can
> implement, and teach core vfs to call setattr_jeff in preference to
> setattr?

If you really need to know that the filesystem is handling the flags,
then how about instead having ->setattr() return something which
indicates which flags it actually handled? That is likely to be a far
more intrusive change, but it is one which is future-proof.

Trond

2007-08-08 00:54:02

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Tue, 07 Aug 2007 20:45:34 -0400
Trond Myklebust <[email protected]> wrote:

> > - rename something so that unconverted filesystems will reliably fail to
> > compile?
> >
> > - leave existing filesystems alone, but add a new
> > inode_operations.setattr_jeff, which the networked filesytems can
> > implement, and teach core vfs to call setattr_jeff in preference to
> > setattr?
>
> If you really need to know that the filesystem is handling the flags,
> then how about instead having ->setattr() return something which
> indicates which flags it actually handled? That is likely to be a far
> more intrusive change, but it is one which is future-proof.

If we change ->setattr so that it will return a positive, non-zero value
which the caller can then check and reliably do printk("that filesystem
needs updating") then that addresses my concern, sure.

2007-08-08 05:14:37

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

> >From a purely practical standpoint: it's a concern that all filesytems need
> patching to continue to correctly function after this change. There might
> be filesystems which you missed, and there are out-of-tree filesystems
> which won't be updated.
>
> And I think the impact upon the out-of-tree filesystems would be fairly
> severe: they quietly and subtly get their secutiry guarantees broken (I
> think?)
>
> Is there any way in which we can prevent these problems? Say
>
> - rename something so that unconverted filesystems will reliably fail to
> compile?

Maybe renaming ATTR_MODE to ATTR_MODE_SET (changing it's value as
well, so that binary stuff breaks visibly as well)?

This would make sense, because we are changing what this attribute
acually means. In the new code attr->ia_mode only contains the
originally set mode, not the ones we've added to change the suid bits.

Miklos

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

2007-08-08 12:54:35

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Tue, 7 Aug 2007 17:15:01 -0700
Andrew Morton <[email protected]> wrote:

> On Mon, 6 Aug 2007 09:54:03 -0400
> Jeff Layton <[email protected]> wrote:
>
> > Apologies for the resend, but the original sending had the date in the
> > email header and it caused some of these to bounce...
> >
> > ( Please consider trimming the Cc list if discussing some aspect of this
> > that doesn't concern everyone.)
> >
> > When an unprivileged process attempts to modify a file that has the
> > setuid or setgid bits set, the VFS will attempt to clear these bits. The
> > VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid
> > mask, and then call notify_change to clear these bits and set the mode
> > accordingly.
> >
> > With a networked filesystem (NFS in particular but most likely others),
> > the client machine may not have credentials that allow for the clearing
> > of these bits. In some situations, this can lead to file corruption, or
> > to an operation failing outright because the setattr fails.
> >
> > In this situation, we'd like to just leave the handling of this to
> > the server and ignore these bits. The problem is that by the time
> > nfs_setattr is called, the VFS has already reinterpreted the ATTR_KILL_*
> > bits into a mode change. We can't fix this in the filesystems where
> > this is a problem, as doing so would leave us having to second-guess
> > what the VFS wants us to do. So we need to change it so that filesystems
> > have more flexibility in how to interpret the ATTR_KILL_* bits.
> >
> > The first patch in the following patchset moves this logic into a helper
> > function, and then only calls this helper function for inodes that do
> > not have a setattr operation defined. The subsequent patches fix up
> > individual filesystem setattr functions to call this helper function.
> >
> > The upshot of this is that with this change, filesystems that define
> > a setattr inode operation are now responsible for handling the ATTR_KILL
> > bits as well. They can trivially do so by calling the helper, but they
> > must do so.
> >
> > Some of the follow-on patches may not be strictly necessary, but I
> > decided that it was better to take the conservative approach and call
> > the helper when I wasn't sure. I've tried to CC the maintainers
> > for the individual filesystems as well where I could find them,
> > please let me know if there are others who should be informed.
> >
> > Comments and suggestions appreciated...
> >
>
> From a purely practical standpoint: it's a concern that all filesytems need
> patching to continue to correctly function after this change. There might
> be filesystems which you missed, and there are out-of-tree filesystems
> which won't be updated.
>
> And I think the impact upon the out-of-tree filesystems would be fairly
> severe: they quietly and subtly get their secutiry guarantees broken (I
> think?)
>

Yep. Any filesystem that declares a setattr op will have to deal with
the ATTR_KILL_S* flags themselves. The breakage will be silent too.

> Is there any way in which we can prevent these problems? Say
>
> - rename something so that unconverted filesystems will reliably fail to
> compile?
>

I suppose we could rename the .setattr inode operation to something
else, but then we'll be stuck with it for at least a while. That seems
sort of kludgey too...

> - leave existing filesystems alone, but add a new
> inode_operations.setattr_jeff, which the networked filesytems can
> implement, and teach core vfs to call setattr_jeff in preference to
> setattr?
>
> Something else?

There's also the approach suggested by Miklos: Add a new inode flag that
tells notify_change not to convert ATTR_KILL_S* flags into a mode
change. Basically, allow filesystems to "opt out" of that behavior.

I'd definitly pick that over a new inode op. That would also allow the
default case be for the VFS to continue handling these flags.
Everything would continue to work but filesystems that need to handle
these flags differently would be able to do so.

Thoughts?

--
Jeff Layton <[email protected]>

2007-08-08 16:48:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Wed, 8 Aug 2007 08:54:35 -0400 Jeff Layton <[email protected]> wrote:

> On Tue, 7 Aug 2007 17:15:01 -0700
> Andrew Morton <akpm-de/[email protected]> wrote:
>
> > On Mon, 6 Aug 2007 09:54:03 -0400
> > Jeff Layton <[email protected]> wrote:
> >
> > Is there any way in which we can prevent these problems? Say
> >
> > - rename something so that unconverted filesystems will reliably fail to
> > compile?
> >
>
> I suppose we could rename the .setattr inode operation to something
> else, but then we'll be stuck with it for at least a while. That seems
> sort of kludgey too...

Sure. We're changing the required behaviour of .setattr. Changing its
name is a fine and reasonably reliable way to communicate that fact.

> > - leave existing filesystems alone, but add a new
> > inode_operations.setattr_jeff, which the networked filesytems can
> > implement, and teach core vfs to call setattr_jeff in preference to
> > setattr?
> >
> > Something else?
>
> There's also the approach suggested by Miklos: Add a new inode flag that
> tells notify_change not to convert ATTR_KILL_S* flags into a mode
> change. Basically, allow filesystems to "opt out" of that behavior.
>
> I'd definitly pick that over a new inode op. That would also allow the
> default case be for the VFS to continue handling these flags.
> Everything would continue to work but filesystems that need to handle
> these flags differently would be able to do so.
>

We should opt for whatever produces the best end state in the kernel tree.
ie: if it takes more work and a larger patch to create a better result,
let's go for the better result. We merge large patches all the time. We
prefer to smash through, get it right whatever the transient cost. But
quietly making out-of-tree filesystems less secure is a pretty high cost.

I'm suspecting that adding more flags and some code to test them purely to
minimise the size of the patch and to retain compatibility with the old
.setattr is not a good tradeoff, given that we'd carry the flags and tests
for evermore.

So I'd suggest s/setattr/something_else/g.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

2007-08-08 20:05:13

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)


On Aug 8 2007 09:48, Andrew Morton wrote:
>> > On Mon, 6 Aug 2007 09:54:03 -0400
>> > Jeff Layton <[email protected]> wrote:
>> >
>> > Is there any way in which we can prevent these problems? Say
>> >
>> > - rename something so that unconverted filesystems will reliably fail to
>> > compile?
>> >
>>
>> I suppose we could rename the .setattr inode operation to something
>> else, but then we'll be stuck with it for at least a while. That seems
>> sort of kludgey too...
>
>Sure. We're changing the required behaviour of .setattr. Changing its
>name is a fine and reasonably reliable way to communicate that fact.

Maybe ->chattr/->chgattr?


Jan
--

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

2007-08-09 11:55:02

by Jeff Layton

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Wed, 8 Aug 2007 22:05:13 +0200 (CEST)
Jan Engelhardt <[email protected]> wrote:

>
> On Aug 8 2007 09:48, Andrew Morton wrote:
> >> > On Mon, 6 Aug 2007 09:54:03 -0400
> >> > Jeff Layton <[email protected]> wrote:
> >> >
> >> > Is there any way in which we can prevent these problems? Say
> >> >
> >> > - rename something so that unconverted filesystems will reliably fail to
> >> > compile?
> >> >
> >>
> >> I suppose we could rename the .setattr inode operation to something
> >> else, but then we'll be stuck with it for at least a while. That seems
> >> sort of kludgey too...
> >
> >Sure. We're changing the required behaviour of .setattr. Changing its
> >name is a fine and reasonably reliable way to communicate that fact.
>
> Maybe ->chattr/->chgattr?
>
>

That seems like a good replacement name. :-)

Now that I think on this further though, maybe Trond's suggestion to
change how the return code works is the best one. That would
(hopefully) catch this problem at runtime, so if someone is using a
precompiled but unconverted module then that would be detected too.

--
Jeff Layton <[email protected]>

2007-08-10 20:47:52

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Tue, 07 Aug 2007 20:45:34 -0400
Trond Myklebust <[email protected]> wrote:
> > - rename something so that unconverted filesystems will reliably fail to
> > compile?
> >
> > - leave existing filesystems alone, but add a new
> > inode_operations.setattr_jeff, which the networked filesytems can
> > implement, and teach core vfs to call setattr_jeff in preference to
> > setattr?
>
> If you really need to know that the filesystem is handling the flags,
> then how about instead having ->setattr() return something which
> indicates which flags it actually handled? That is likely to be a far
> more intrusive change, but it is one which is future-proof.
>

One thing that we could do here is have notify_change check
attr->ia_valid after the setattr operation returns. If either ATTR_KILL_*
bit is set then BUG(). The helper function already clears those bits
so anything using it should automatically be ok. We'd have to fix
up NFS and a few others that don't implement suid/sgid.

This is not as certain as changing the name of the inode operation. It
would only pop when someone is attempting to change a setuid/setgid
file on these filesystems. Still, it should conceivably catch most if
not all offenders. Would that be sufficient to take care of everyone's
concerns?

--
Jeff Layton <[email protected]>

2007-08-11 02:57:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Fri, Aug 10, 2007 at 04:47:52PM -0400, Jeff Layton wrote:
> attr->ia_valid after the setattr operation returns. If either ATTR_KILL_*
> bit is set then BUG(). The helper function already clears those bits
> so anything using it should automatically be ok. We'd have to fix
> up NFS and a few others that don't implement suid/sgid.
>
> This is not as certain as changing the name of the inode operation. It
> would only pop when someone is attempting to change a setuid/setgid
> file on these filesystems. Still, it should conceivably catch most if
> not all offenders. Would that be sufficient to take care of everyone's
> concerns?

I like the idea of checking ia_valid after return a lot. But instead of
going BUG() it should just do the default action, that we can avoid
touching all the filesystem and only need to change those that need
special care. I also have plans to add some new AT_ flags for implementing
some filesystem ioctl in generic code that would benefit greatly from
the ia_valid checkin after return to return ENOTTY fr filesystems not
implementing those ioctls.

2007-08-13 12:01:34

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Sat, 11 Aug 2007 03:57:39 +0100
Christoph Hellwig <[email protected]> wrote:
>
> I like the idea of checking ia_valid after return a lot. But instead of
> going BUG() it should just do the default action, that we can avoid
> touching all the filesystem and only need to change those that need
> special care. I also have plans to add some new AT_ flags for implementing
> some filesystem ioctl in generic code that would benefit greatly from
> the ia_valid checkin after return to return ENOTTY fr filesystems not
> implementing those ioctls.

That sounds good (if I follow your meaning correctly). How about
something like the patch below? If either ATTR_KILL bit is set after
the setattr, try to handle them in the "standard" way with a second
setattr call. It also does a printk in this situation to alert
filesystem developers that they should convert to the "new" scheme,
so they can avoid this second setattr call.

If this idea seems sound then I'll start the grunt work to fix up the
in-tree filesystems so that they don't need the second setattr call.

Signed-off-by: Jeff Layton <[email protected]>

commit 521ada37ab165d9d0a12dfb59a631a2ec58a8f84
Author: Jeff Layton <[email protected]>
Date: Mon Aug 13 07:43:56 2007 -0400

VFS: move ATTR_KILL handling from notify_change into helper function

Separate the handling of the local ia_valid bitmask from the one in
attr->ia_valid. This allows us to hand off the actual handling of the
ATTR_KILL_* flags to the .setattr i_op when one is defined.

notify_change still needs to process those flags for the local ia_valid
variable, since it uses that to decide whether to return early, and to pass
a (hopefully) appropriate bitmask to fsnotify_change.

Also, check the ia_valid after the setattr op returns and see if either
ATTR_KILL_* bit is set. If so, then throw a warning and try to clear the
bits in the "standard" way. This should help us to catch filesystems that
don't handle these bits correctly without breaking them outright.

diff --git a/fs/attr.c b/fs/attr.c
index f8dfc22..7cbb883 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -100,15 +100,53 @@ int inode_setattr(struct inode * inode, struct iattr * attr)
}
EXPORT_SYMBOL(inode_setattr);

+/**
+ * generic_attrkill - helper to convert ATTR_KILL_* bits into mode change
+ * @mode: current mode of inode
+ * @attr: inode attribute changes requested by VFS
+ * Context: anywhere
+ *
+ * This is a helper function to convert ATTR_KILL_SUID and ATTR_KILL_SGID
+ * into a mode change. Filesystems should call this from their setattr
+ * inode op when they want "conventional" setuid-clearing behavior.
+ *
+ * Filesystems that declare a setattr inode operation are now expected to
+ * handle the ATTR_KILL_SUID and ATTR_KILL_SGID bits appropriately. The VFS
+ * no longer automatically converts these bits to a mode change for
+ * inodes that have their own setattr operation.
+ **/
+void generic_attrkill(mode_t mode, struct iattr *attr)
+{
+ if (attr->ia_valid & ATTR_KILL_SUID) {
+ attr->ia_valid &= ~ATTR_KILL_SUID;
+ if (mode & S_ISUID) {
+ if (!(attr->ia_valid & ATTR_MODE)) {
+ attr->ia_valid |= ATTR_MODE;
+ attr->ia_mode = mode;
+ }
+ attr->ia_mode &= ~S_ISUID;
+ }
+ }
+ if (attr->ia_valid & ATTR_KILL_SGID) {
+ attr->ia_valid &= ~ATTR_KILL_SGID;
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+ if (!(attr->ia_valid & ATTR_MODE)) {
+ attr->ia_valid |= ATTR_MODE;
+ attr->ia_mode = mode;
+ }
+ attr->ia_mode &= ~S_ISGID;
+ }
+ }
+}
+EXPORT_SYMBOL(generic_attrkill);
+
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
- mode_t mode;
- int error;
+ int error, once = 0;
struct timespec now;
unsigned int ia_valid = attr->ia_valid;

- mode = inode->i_mode;
now = current_fs_time(inode->i_sb);

attr->ia_ctime = now;
@@ -117,36 +155,48 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
if (!(ia_valid & ATTR_MTIME_SET))
attr->ia_mtime = now;
if (ia_valid & ATTR_KILL_SUID) {
- attr->ia_valid &= ~ATTR_KILL_SUID;
- if (mode & S_ISUID) {
- if (!(ia_valid & ATTR_MODE)) {
- ia_valid = attr->ia_valid |= ATTR_MODE;
- attr->ia_mode = inode->i_mode;
- }
- attr->ia_mode &= ~S_ISUID;
- }
+ ia_valid &= ~ATTR_KILL_SUID;
+ if (inode->i_mode & S_ISUID)
+ ia_valid |= ATTR_MODE;
}
if (ia_valid & ATTR_KILL_SGID) {
- attr->ia_valid &= ~ ATTR_KILL_SGID;
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
- if (!(ia_valid & ATTR_MODE)) {
- ia_valid = attr->ia_valid |= ATTR_MODE;
- attr->ia_mode = inode->i_mode;
- }
- attr->ia_mode &= ~S_ISGID;
- }
+ ia_valid &= ~ATTR_KILL_SGID;
+ if ((inode->i_mode & (S_ISGID | S_IXGRP)) ==
+ (S_ISGID | S_IXGRP))
+ ia_valid |= ATTR_MODE;
}
- if (!attr->ia_valid)
+ if (!ia_valid)
return 0;

if (ia_valid & ATTR_SIZE)
down_write(&dentry->d_inode->i_alloc_sem);

if (inode->i_op && inode->i_op->setattr) {
+try_again:
error = security_inode_setattr(dentry, attr);
if (!error)
error = inode->i_op->setattr(dentry, attr);
+ /*
+ * if ATTR_KILL_SUID or ATTR_KILL_SGID is still set, then
+ * assume that the setattr inode op didn't handle them
+ * correctly. Try to clear these bits the standard way
+ * and throw a warning.
+ */
+ if (!error && !once && unlikely(attr->ia_valid &
+ (ATTR_KILL_SUID|ATTR_KILL_SGID))) {
+ attr->ia_valid &=
+ (ATTR_MODE|ATTR_KILL_SUID|ATTR_KILL_SGID);
+ generic_attrkill(inode->i_mode, attr);
+ once = 1;
+ if (printk_ratelimit())
+ printk(KERN_ERR "%s: %s doesn't seem to "
+ "handle setuid/setgid bit clearing "
+ "correctly.\n"
+ __func__, inode->i_sb->s_type->name);
+ goto try_again;
+ }
} else {
+ generic_attrkill(inode->i_mode, attr);
error = inode_change_ok(inode, attr);
if (!error)
error = security_inode_setattr(dentry, attr);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6bf1395..daac0e5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1561,6 +1561,7 @@ extern int do_remount_sb(struct super_block *sb, int flags,
#ifdef CONFIG_BLOCK
extern sector_t bmap(struct inode *, sector_t);
#endif
+extern void generic_attrkill(mode_t mode, struct iattr *attr);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,

2007-08-13 12:36:16

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

On Mon, 13 Aug 2007 08:01:34 -0400
Jeff Layton <[email protected]> wrote:

> On Sat, 11 Aug 2007 03:57:39 +0100
> Christoph Hellwig <[email protected]> wrote:
> >
> > I like the idea of checking ia_valid after return a lot. But instead of
> > going BUG() it should just do the default action, that we can avoid
> > touching all the filesystem and only need to change those that need
> > special care. I also have plans to add some new AT_ flags for implementing
> > some filesystem ioctl in generic code that would benefit greatly from
> > the ia_valid checkin after return to return ENOTTY fr filesystems not
> > implementing those ioctls.
>
> That sounds good (if I follow your meaning correctly). How about
> something like the patch below? If either ATTR_KILL bit is set after
> the setattr, try to handle them in the "standard" way with a second
> setattr call. It also does a printk in this situation to alert
> filesystem developers that they should convert to the "new" scheme,
> so they can avoid this second setattr call.
>
> If this idea seems sound then I'll start the grunt work to fix up the
> in-tree filesystems so that they don't need the second setattr call.
>

The earlier patch has a misplaced comma and won't build. This one
builds. This should be considered an RFC, as I've not yet done any
real testing of this patch:

Signed-off-by: Jeff Layton <[email protected]>

commit ad00450c4532da32718efe39e27d99060f04e396
Author: Jeff Layton <[email protected]>
Date: Mon Aug 13 08:33:10 2007 -0400

VFS: move ATTR_KILL handling from notify_change into helper function

Separate the handling of the local ia_valid bitmask from the one in
attr->ia_valid. This allows us to hand off the actual handling of the
ATTR_KILL_* flags to the .setattr i_op when one is defined.

notify_change still needs to process those flags for the local ia_valid
variable, since it uses that to decide whether to return early, and to pass
a (hopefully) appropriate bitmask to fsnotify_change.

Also, check the ia_valid after the setattr op returns and see if either
ATTR_KILL_* bit is set. If so, then throw a warning and try to clear the
bits in the "standard" way. This should help us to catch filesystems that
don't handle these bits correctly without breaking them outright.

diff --git a/fs/attr.c b/fs/attr.c
index f8dfc22..9585e45 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -100,15 +100,53 @@ int inode_setattr(struct inode * inode, struct iattr * attr)
}
EXPORT_SYMBOL(inode_setattr);

+/**
+ * generic_attrkill - helper to convert ATTR_KILL_* bits into mode change
+ * @mode: current mode of inode
+ * @attr: inode attribute changes requested by VFS
+ * Context: anywhere
+ *
+ * This is a helper function to convert ATTR_KILL_SUID and ATTR_KILL_SGID
+ * into a mode change. Filesystems should call this from their setattr
+ * inode op when they want "conventional" setuid-clearing behavior.
+ *
+ * Filesystems that declare a setattr inode operation are now expected to
+ * handle the ATTR_KILL_SUID and ATTR_KILL_SGID bits appropriately. The VFS
+ * no longer automatically converts these bits to a mode change for
+ * inodes that have their own setattr operation.
+ **/
+void generic_attrkill(mode_t mode, struct iattr *attr)
+{
+ if (attr->ia_valid & ATTR_KILL_SUID) {
+ attr->ia_valid &= ~ATTR_KILL_SUID;
+ if (mode & S_ISUID) {
+ if (!(attr->ia_valid & ATTR_MODE)) {
+ attr->ia_valid |= ATTR_MODE;
+ attr->ia_mode = mode;
+ }
+ attr->ia_mode &= ~S_ISUID;
+ }
+ }
+ if (attr->ia_valid & ATTR_KILL_SGID) {
+ attr->ia_valid &= ~ATTR_KILL_SGID;
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+ if (!(attr->ia_valid & ATTR_MODE)) {
+ attr->ia_valid |= ATTR_MODE;
+ attr->ia_mode = mode;
+ }
+ attr->ia_mode &= ~S_ISGID;
+ }
+ }
+}
+EXPORT_SYMBOL(generic_attrkill);
+
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
- mode_t mode;
- int error;
+ int error, once = 0;
struct timespec now;
unsigned int ia_valid = attr->ia_valid;

- mode = inode->i_mode;
now = current_fs_time(inode->i_sb);

attr->ia_ctime = now;
@@ -117,36 +155,48 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
if (!(ia_valid & ATTR_MTIME_SET))
attr->ia_mtime = now;
if (ia_valid & ATTR_KILL_SUID) {
- attr->ia_valid &= ~ATTR_KILL_SUID;
- if (mode & S_ISUID) {
- if (!(ia_valid & ATTR_MODE)) {
- ia_valid = attr->ia_valid |= ATTR_MODE;
- attr->ia_mode = inode->i_mode;
- }
- attr->ia_mode &= ~S_ISUID;
- }
+ ia_valid &= ~ATTR_KILL_SUID;
+ if (inode->i_mode & S_ISUID)
+ ia_valid |= ATTR_MODE;
}
if (ia_valid & ATTR_KILL_SGID) {
- attr->ia_valid &= ~ ATTR_KILL_SGID;
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
- if (!(ia_valid & ATTR_MODE)) {
- ia_valid = attr->ia_valid |= ATTR_MODE;
- attr->ia_mode = inode->i_mode;
- }
- attr->ia_mode &= ~S_ISGID;
- }
+ ia_valid &= ~ATTR_KILL_SGID;
+ if ((inode->i_mode & (S_ISGID | S_IXGRP)) ==
+ (S_ISGID | S_IXGRP))
+ ia_valid |= ATTR_MODE;
}
- if (!attr->ia_valid)
+ if (!ia_valid)
return 0;

if (ia_valid & ATTR_SIZE)
down_write(&dentry->d_inode->i_alloc_sem);

if (inode->i_op && inode->i_op->setattr) {
+try_again:
error = security_inode_setattr(dentry, attr);
if (!error)
error = inode->i_op->setattr(dentry, attr);
+ /*
+ * if ATTR_KILL_SUID or ATTR_KILL_SGID is still set, then
+ * assume that the setattr inode op didn't handle them
+ * correctly. Try to clear these bits the standard way
+ * and throw a warning.
+ */
+ if (!error && !once && unlikely(attr->ia_valid &
+ (ATTR_KILL_SUID|ATTR_KILL_SGID))) {
+ attr->ia_valid &=
+ (ATTR_MODE|ATTR_KILL_SUID|ATTR_KILL_SGID);
+ generic_attrkill(inode->i_mode, attr);
+ once = 1;
+ if (printk_ratelimit())
+ printk(KERN_ERR "%s: %s doesn't seem to "
+ "handle setuid/setgid bit clearing "
+ "correctly.\n", __func__,
+ inode->i_sb->s_type->name);
+ goto try_again;
+ }
} else {
+ generic_attrkill(inode->i_mode, attr);
error = inode_change_ok(inode, attr);
if (!error)
error = security_inode_setattr(dentry, attr);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6bf1395..daac0e5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1561,6 +1561,7 @@ extern int do_remount_sb(struct super_block *sb, int flags,
#ifdef CONFIG_BLOCK
extern sector_t bmap(struct inode *, sector_t);
#endif
+extern void generic_attrkill(mode_t mode, struct iattr *attr);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,