Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open file descriptor.
It would be nice to allow users to have permission to see where their data is landing on disk, and there really isn't a good reason to keep them from getting at this information.
Signed-off-by: Mike Waychison <[email protected]>
fs/ioctl.c | 2 --
1 file changed, 2 deletions(-)
Index: linux-2.6.23/fs/ioctl.c
===================================================================
--- linux-2.6.23.orig/fs/ioctl.c 2007-10-09 13:31:38.000000000 -0700
+++ linux-2.6.23/fs/ioctl.c 2007-10-25 15:48:24.000000000 -0700
@@ -56,8 +56,6 @@ static int file_ioctl(struct file *filp,
/* do we support this mess? */
if (!mapping->a_ops->bmap)
return -EINVAL;
- if (!capable(CAP_SYS_RAWIO))
- return -EPERM;
if ((error = get_user(block, p)) != 0)
return error;
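A minimal userspace sketch of the call this patch would open up to unprivileged users. It assumes a Linux system where `<linux/fs.h>` provides `FIBMAP` and `FIGETBSZ`. The helper names `fibmap_block()` and `fs_block_size()` are hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIBMAP, FIGETBSZ */

/*
 * Hypothetical helper: map one logical block of an open file to its
 * on-disk block number (in filesystem-block units). Returns 0 on success
 * (a result of 0 in *physical means "no mapping", e.g. a hole), or a
 * negative errno: -EPERM where the CAP_SYS_RAWIO check is still present,
 * -EINVAL if the filesystem has no ->bmap.
 */
int fibmap_block(int fd, int logical, int *physical)
{
    int blk = logical;          /* FIBMAP reads and overwrites this int */

    if (ioctl(fd, FIBMAP, &blk) < 0)
        return -errno;
    *physical = blk;
    return 0;
}

/* Hypothetical helper: filesystem block size, the unit FIBMAP reports in. */
int fs_block_size(int fd, int *bsz)
{
    if (ioctl(fd, FIGETBSZ, bsz) < 0)
        return -errno;
    return 0;
}
```

A caller opens the file read-only, calls `fs_block_size()` once, then `fibmap_block()` for each logical block index it is interested in. Note that this is one syscall per block.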
--
On Thu, 25 Oct 2007 16:06:40 -0700
Mike Waychison <[email protected]> wrote:
> Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open file descriptor.
>
> It would be nice to allow users to have permission to see where their data is landing on disk, and there really isn't a good reason to keep them from getting at this information.
Historically this was done because people felt it was more secure. It
also allows you to make some deductions about other activities on the
disk, but that's probably only a concern for very, very security-crazed
compartmentalised boxes.
Also, historically at least, FIBMAP could be abused to crash the system.
Now if you can verify that has been fixed I have no problem, but given
that I can find no record of that being fixed it would be wise to audit
it first and review Chris Evans' and others' reports about what occurs when
FIBMAP is passed random block numbers.
FIBMAP has another problem for this general use as well: it takes an int,
but the block number can now be bigger than that for very large files on 32-bit.
Alan
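To put a number on the int limit: `FIBMAP`'s argument is a plain `int`, which is 32 bits on Linux. With 4 KiB blocks, logical indices therefore run out at 2^31 blocks, i.e. files of 8 TiB. The sketch below checks both directions. The function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical check: can every logical block of a file of `size` bytes
 * be named by FIBMAP's int argument, given filesystem block size `bsz`?
 */
int fibmap_can_address(int64_t size, int64_t bsz)
{
    if (size <= 0)
        return 1;                       /* no blocks to name */
    int64_t last = (size - 1) / bsz;    /* index of the final block */
    return last <= INT_MAX;
}

/* The physical block number returned in the same int hits the same ceiling. */
int fibmap_result_fits(uint64_t phys_block)
{
    return phys_block <= (uint64_t)INT_MAX;
}
```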
Alan Cox wrote:
> Also historically at least FIBMAP could be abused to crash the system.
> Now if you can verify that has been fixed I have no problem, but given
> that I can find no record of that being fixed it would be wise to audit
> it first and review Chris Evans and other reports about what occurs when
> FIBMAP is passed random block numbers.
I found Chris's comment about negative block numbers, I'll send a patch
out for that.
You mentioned back in 99 about racing with ftruncate. Is it sufficient
to mutex_lock(i_mutex) and down_read(i_alloc_sem)?
Mike Waychison
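A sketch of the kind of argument check discussed here, with the kernel inputs modelled as plain values. The negative-block rejection follows Chris's report. The "past EOF means no mapping" policy and the name `fibmap_check_block()` are assumptions for illustration, not the actual patch. A comparison against `i_size` is only meaningful if `i_size` cannot change underneath it, which is exactly the ftruncate race asked about above. This sketch does not address that locking question.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical FIBMAP argument check.
 *   block   - the int the user passed to FIBMAP
 *   i_size  - file size in bytes (assumed stable, see the locking question)
 *   blkbits - log2 of the filesystem block size
 * Returns -EINVAL for a negative block, 0 if the block lies at or past EOF
 * (report "no mapping" without calling ->bmap), or 1 if ->bmap may be asked.
 */
int fibmap_check_block(int block, int64_t i_size, unsigned blkbits)
{
    if (block < 0)
        return -EINVAL;
    if (((int64_t)block << blkbits) >= i_size)
        return 0;
    return 1;
}
```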
> I found Chris's comment about negative block numbers, I'll send a patch
> out for that.
>
> You mentioned back in 99 about racing with ftruncate. Is it sufficient
> to mutex_lock(i_mutex) and down_read(i_alloc_sem)?
One for the fs guys. That code has changed far beyond anything I
understand any more 8)
Jason Uhlenkott wrote:
> Additionally, ext3_bmap() has this to say about it:
>
> if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
> /*
> * This is a REALLY heavyweight approach, but the use of
> * bmap on dirty files is expected to be extremely rare:
> * only if we run lilo or swapon on a freshly made file
> * do we expect this to happen.
> *
> * (bmap requires CAP_SYS_RAWIO so this does not
> * represent an unprivileged user DOS attack --- we'd be
> * in trouble if mortal users could trigger this path at
> * will.)
Hmm. I don't know what the right approach to this is. This seems to be
the same situation as the delayed allocation problem, no?
What if we just returned 0? Tools like lilo are already doing sync(),
would that cause the journal to get flushed explicitly anyway?
Mike Waychison
On Fri, Oct 26, 2007 at 01:22:17 +0100, Alan Cox wrote:
> FIBMAP has another problem for this general use as well - it takes an int
> but the block number can now be bigger for very large files on 32bit.
Additionally, ext3_bmap() has this to say about it:
if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
/*
* This is a REALLY heavyweight approach, but the use of
* bmap on dirty files is expected to be extremely rare:
* only if we run lilo or swapon on a freshly made file
* do we expect this to happen.
*
* (bmap requires CAP_SYS_RAWIO so this does not
* represent an unprivileged user DOS attack --- we'd be
* in trouble if mortal users could trigger this path at
* will.)
On Fri, Oct 26, 2007 at 14:59:57 -0700, Mike Waychison wrote:
> Hmm. I don't know what the right approach to this is. This seems to be
> the same situation as the delayed allocation problem, no?
Yup.
> What if we just returned 0? Tools like lilo are already doing sync(),
> would that cause the journal to get flushed explicitly anyway?
Not sure, but I'd be pretty nervous about breaking any existing users
which aren't explicitly syncing.
Are you envisioning users who want to see where their data is landing
for performance reasons? It seems like such users are going to have
sufficiently different desires from existing FIBMAP users (who need to
know where everything is because they intend to fiddle with the raw
device) that a different interface might be warranted.
Jason Uhlenkott wrote:
> On Fri, Oct 26, 2007 at 14:59:57 -0700, Mike Waychison wrote:
>> What if we just returned 0? Tools like lilo are already doing sync(),
>> would that cause the journal to get flushed explicitly anyway?
>
> Not sure, but I'd be pretty nervous about breaking any existing users
> which aren't explicitly syncing.
True. We can probably get away with an implicit flush when
CAP_SYS_RAWIO is set, but that's pretty gross :(
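The "just return 0" idea combined with an implicit flush for CAP_SYS_RAWIO holders can be written as a small decision table. Everything here is hypothetical. `data_unflushed` stands for either case raised in the thread: ext3 journalled data (`EXT3_STATE_JDATA`) or blocks under delayed allocation.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a bmap request. */
enum bmap_action {
    BMAP_LOOKUP,            /* data already allocated: just look it up    */
    BMAP_FLUSH_THEN_LOOKUP, /* privileged caller: pay for the heavy flush */
    BMAP_REPORT_ZERO        /* unprivileged caller: report 0, no flush    */
};

enum bmap_action choose_bmap_action(bool data_unflushed, bool has_cap_sys_rawio)
{
    if (!data_unflushed)
        return BMAP_LOOKUP;
    return has_cap_sys_rawio ? BMAP_FLUSH_THEN_LOOKUP : BMAP_REPORT_ZERO;
}
```

The point of the table is that an unprivileged caller can never reach the expensive flush path, so it cannot be used as a DoS lever. The cost is that such a caller may see 0 for blocks that will exist once the data is written back.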
>
> Are you envisioning users who want to see where their data is landing
> for performance reasons? It seems like such users are going to have
> sufficiently different desires from existing FIBMAP users (who need to
> know where everything is because they intend to fiddle with the raw
> device) that a different interface might be warranted.
A little of both ;)
We could introduce a new API, though either way, the same fundamental
problems apply wrt auditing.
I see three reasons that new APIs are warranted:
a) to deal with block numbers > 2^31 --> FIBMAP64
b) to have a path where no syncing is required due to worries about user
DoS (delayed allocation / data in journal).
c) possibly some way to FIBMAP a range so that userspace doesn't need to
syscall for each block, something like how mincore() does it?
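Point (c) amounts to returning runs rather than single blocks. The sketch below shows only the userspace-side shape of that: it collapses per-block results into extents. It uses 64-bit block numbers in the spirit of (a). The `struct extent` type and `coalesce_extents()` are hypothetical, not an existing kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a run of logically and physically contiguous blocks. */
struct extent {
    int64_t logical;    /* first logical block in the run */
    int64_t physical;   /* first on-disk block in the run */
    int64_t length;     /* number of blocks in the run    */
};

/*
 * map[i] is the physical block for logical block i, with 0 meaning a hole
 * (FIBMAP's convention). Writes at most max_out extents to out and returns
 * how many were written, stopping once out[] is full.
 */
size_t coalesce_extents(const int64_t *map, size_t n,
                        struct extent *out, size_t max_out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (map[i] == 0)
            continue;   /* holes are skipped; the logical check below splits runs around them */
        if (count > 0) {
            struct extent *prev = &out[count - 1];
            if (prev->logical + prev->length == (int64_t)i &&
                prev->physical + prev->length == map[i]) {
                prev->length++;         /* block extends the current run */
                continue;
            }
        }
        if (count == max_out)
            break;
        out[count++] = (struct extent){ (int64_t)i, map[i], 1 };
    }
    return count;
}
```

For a map of `{100, 101, 102, 0, 200, 201}` this yields two extents: logical 0 at physical 100 for 3 blocks, and logical 4 at physical 200 for 2 blocks.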
I have a patchset ready that I'll send out shortly that introduces
FIBMAP64. The last patch in that set drops the CAP_SYS_RAWIO check, but it's
probably not what we want given the DoS case. I'd like to send it out
anyway to get some comments on some of the sanity checks and locking I'm
adding.
Handling (c) above is just extra sugar and isn't something I'm too
worried about implementing.
Mike Waychison
Hi!
> Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open file descriptor.
>
> It would be nice to allow users to have permission to see where their data is landing on disk, and there really isn't a good reason to keep them from getting at this information.
I believe it is to prevent users from intentionally creating extremely
fragmented files...
You can read 60MB in a second, but a fragmented 60MB file could take
10msec * 60MB/4KB = 150 seconds. That's a factor of 150 slowdown...
...but I agree that CAP_SYS_RAWIO may be the wrong capability to cover this.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
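Pavel's back-of-the-envelope estimate, written out. It assumes one full 10 ms seek per 4 KB block, no readahead, and a 60 MB/s sequential rate. The roughly 150 s figure uses decimal units (60,000,000 / 4,000 = 15,000 blocks). With binary units the count is 15,360 blocks, about 154 s. The function names are hypothetical.

```c
#include <stdint.h>

/* Seconds to read `bytes` if every `blksz` block costs one seek of `seek_ms`. */
double seek_bound_read_seconds(int64_t bytes, int64_t blksz, double seek_ms)
{
    return (double)(bytes / blksz) * seek_ms / 1000.0;
}

/* Slowdown relative to streaming the same bytes at `seq_bytes_per_sec`. */
double fragmentation_slowdown(int64_t bytes, int64_t blksz, double seek_ms,
                              double seq_bytes_per_sec)
{
    double sequential = (double)bytes / seq_bytes_per_sec;
    return seek_bound_read_seconds(bytes, blksz, seek_ms) / sequential;
}
```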
Pavel Machek wrote:
> I believe it is to prevent users from intentionally creating extremely
> fragmented files...
>
> You can read 60MB in a second, but fragmented 60MB file could take
> 10msec * 60MB/4KB = 150 seconds. That's factor 150 slowdown...
>
> ...but I agree that SYS_RAWIO may be wrong capability to cover this.
I don't see how restricting FIBMAP use helps prevent fragmentation, since FIBMAP
just allows you to see the damage that has already been done.
You can create nicely fragmented files simply by having multiple threads writing
concurrently to one or more files in the same directory (depending on the file
system, allocation policy, etc).
ric
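To illustrate that FIBMAP only observes fragmentation, the sketch below counts fragments in a per-block map of the kind a FIBMAP loop produces. A new fragment starts whenever a mapped block does not directly follow the previous mapped block on disk. Holes (0) are ignored. The block numbers in the usage note are made up to mimic two concurrent writers interleaving allocations, and `count_fragments()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count physically discontiguous runs in a per-block map.
 * map[i] == 0 is a hole and is skipped (FIBMAP's convention).
 */
size_t count_fragments(const int64_t *map, size_t n)
{
    size_t frags = 0;
    int64_t prev = 0;   /* last mapped physical block seen; 0 = none yet */

    for (size_t i = 0; i < n; i++) {
        if (map[i] == 0)
            continue;
        if (prev == 0 || map[i] != prev + 1)
            frags++;    /* this block does not follow the previous one on disk */
        prev = map[i];
    }
    return frags;
}
```

A file written alone might map to `{100, 101, 102, 103}` (1 fragment). If another writer grabbed every other block, it might instead map to `{100, 102, 104, 106}` (4 fragments). Neither result depends on whether the user was allowed to call FIBMAP.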