2015-12-17 01:42:14

by Qu Wenruo

Subject: Ideas on unified real-ro mount option across all filesystems

Hi,

A recent btrfs patch is going to add a mount option to disable
log replay for btrfs, just like "norecovery" for ext4/xfs.

But in the discussion of the mount option's name and use case, it seems
better to have a unified, fs-independent mount option alias for a real
RO mount.

Reasons:
1) Some filesystems may have already used a [no]"recovery" mount option
In fact, btrfs has already used the "recovery" mount option.
Using a "norecovery" mount option would be quite confusing for btrfs.

2) A more straightforward mount option
Currently, to get a real RO mount on ext4/xfs, the user must use -o
ro,norecovery.
Just ro won't ensure real RO, and norecovery can't be used alone.
If we had a simple alias, it would be much easier for users.
(It may be done just in user-space mount.)

Not to mention that some filesystems (yeah, btrfs again) don't have
"norecovery" but "nologreplay".

3) A lot of users don't even know that mount ro can still modify the device
Yes, I didn't know this until I checked the log replay code of
btrfs.
Adding such a mount option alias may draw some users' attention to this.
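
To make reason 2 concrete, here is a minimal sketch of how a user-space
alias could expand into per-filesystem options. The option names are the
ones mentioned in this mail ("norecovery" for ext4/xfs, "nologreplay" as
the proposed btrfs name); the helper hard_ro_options() and the table are
purely hypothetical and are not an existing mount(8) feature.

```python
# Hypothetical table: filesystem type -> option that skips journal/log replay.
# The names come from this thread; "nologreplay" is only the proposed btrfs
# option name, and nothing here exists in mount(8) today.
NO_REPLAY_OPTION = {
    "ext4": "norecovery",
    "xfs": "norecovery",
    "btrfs": "nologreplay",
}


def hard_ro_options(fstype):
    """Return the -o string that a 'real RO' alias could expand to."""
    try:
        no_replay = NO_REPLAY_OPTION[fstype]
    except KeyError:
        raise ValueError(f"no known no-replay option for {fstype!r}") from None
    # "ro" alone still allows log replay; the second option suppresses it.
    return f"ro,{no_replay}"


if __name__ == "__main__":
    for fs in ("ext4", "xfs", "btrfs"):
        print(fs, "->", "-o", hard_ro_options(fs))
```

Failing loudly for an unknown filesystem type is a deliberate choice in
this sketch: silently falling back to plain "ro" would defeat the purpose
of the alias.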


Any ideas about this?

Thanks,
Qu




2015-12-17 01:58:32

by Qu Wenruo

Subject: Re: Ideas on unified real-ro mount option across all filesystems

And here is the existing discussion on the btrfs mailing list, just for reference:

http://thread.gmane.org/gmane.comp.file-systems.btrfs/51098

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



2015-12-17 03:15:59

by Eric Sandeen

Subject: Re: Ideas on unified real-ro mount option across all filesystems

<xfs list address fixed>

On 12/16/15 7:41 PM, Qu Wenruo wrote:
> Hi,
>
> A recent btrfs patch is going to add a mount option to disable
> log replay for btrfs, just like "norecovery" for ext4/xfs.
>
> But in the discussion of the mount option's name and use case, it seems
> better to have a unified, fs-independent mount option alias for a
> real RO mount.
>
> Reasons:
> 1) Some filesystems may have already used a [no]"recovery" mount option
> In fact, btrfs has already used the "recovery" mount option.
> Using a "norecovery" mount option would be quite confusing for btrfs.

Too bad btrfs picked those semantics when "norecovery" has existed on
other filesystems for quite some time with a different meaning... :(

> 2) A more straightforward mount option
> Currently, to get a real RO mount on ext4/xfs, the user must use -o
> ro,norecovery.
> Just ro won't ensure real RO, and norecovery can't be used alone.
> If we had a simple alias, it would be much easier for users.
> (It may be done just in user-space mount.)

mount(8) simply says:

ro Mount the filesystem read-only.

and mount(2) is no more illustrative:

MS_RDONLY
Mount file system read-only.

kernel code is no help, either:

#define MS_RDONLY 1 /* Mount read-only */

They say nothing about what, exactly, "read-only" means. But since at least
the early ext3 days, it means that you cannot write through the filesystem, not
that the filesystem will leave the block device unmodified when it mounts.

I have always interpreted it as simply "no user changes to the filesystem,"
and that is clearly what the vfs does with the flag...

> Not to mention that some filesystems (yeah, btrfs again) don't have
> "norecovery" but "nologreplay".

well, again, btrfs picked unfortunate semantics, given the precedent set
by other filesystems.

f2fs, ext4, gfs2, nilfs2, and xfs all support "norecovery" - xfs since
forever, ext4 & f2fs since 2009, etc.

> 3) A lot of users don't even know that mount ro can still modify the device
> Yes, I didn't know this until I checked the log replay code of
> btrfs.
> Adding such a mount option alias may draw some users' attention to this.

Given that nothing in the documentation implies that the block device itself
must remain unchanged on a read-only mount, I don't see any problem which
needs fixing. MS_RDONLY rejects user IO; that's all.

If you want to be sure your block device rejects all IO for forensics or
what have you, I'd suggest # blockdev --setro /dev/whatever prior to mount,
and take it out of the filesystem's control. Or better yet, making an
image and not touching the original.
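
As a rough illustration of the blockdev suggestion, the sketch below checks
whether the kernel currently reports a block device as read-only before
mounting it. It assumes the sysfs attribute /sys/class/block/<name>/ro
reflects the flag that blockdev --setro sets; the function name and the
sysfs_root parameter are made up for this example.

```python
import os


def block_device_is_readonly(name, sysfs_root="/sys/class/block"):
    """Return True if the kernel reports block device `name` as read-only.

    `name` is the kernel device name (e.g. "sda5"), not a /dev path.
    Assumption: the sysfs "ro" attribute holds "1" for a read-only device.
    """
    with open(os.path.join(sysfs_root, name, "ro")) as f:
        return f.read().strip() == "1"


if __name__ == "__main__":
    import sys

    # Usage: python3 check_ro.py sda5
    dev = sys.argv[1]
    state = "read-only" if block_device_is_readonly(dev) else "writable"
    print(f"{dev}: {state}")
```

A check like this only tells you what the block layer reports; it does not
replace working on an image instead of the original device.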

-Eric

> Any ideas about this?



_______________________________________________
xfs mailing list
[email protected]
http://oss.sgi.com/mailman/listinfo/xfs

2015-12-17 03:27:12

by Darrick J. Wong

Subject: Re: Ideas on unified real-ro mount option across all filesystems

On Wed, Dec 16, 2015 at 09:15:59PM -0600, Eric Sandeen wrote:
> They say nothing about what, exactly, "read-only" means. But since at least
> the early ext3 days, it means that you cannot write through the filesystem, not
> that the filesystem will leave the block device unmodified when it mounts.
>
> I have always interpreted it as simply "no user changes to the filesystem,"
> and that is clearly what the vfs does with the flag...

That ("-o ro means no user changes") has always been my understanding too. You
/want/ the FS to replay the journal on an RO mount so that regular FS operation
picks up the committed transactions.

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-12-17 14:09:00

by Karel Zak

Subject: Re: Ideas on unified real-ro mount option across all filesystems

On Wed, Dec 16, 2015 at 09:15:59PM -0600, Eric Sandeen wrote:
> I have always interpreted it as simply "no user changes to the filesystem,"
> and that is clearly what the vfs does with the flag...

Yep,

> Given that nothing in the documentation implies that the block device itself
> must remain unchanged on a read-only mount, I don't see any problem which
> needs fixing. MS_RDONLY rejects user IO; that's all.

I agree, it's FS-specific business to interpret 'ro'.

And it's already complicated enough; we have three levels of "read-only":

- read-only device (blockdev --setro ioctl)
- read-only filesystem (mount -o ro)
- read-only VFS node (mount -o remount,ro,bind /src /dst)

and for example in /proc/self/mountinfo we distinguish between FS "ro"
and VFS "ro" flag:

# grep test /proc/self/mountinfo
185 59 8:5 / /mnt/test ro,relatime shared:32 - ext4 /dev/sda5 rw,data=ordered
                       ^^                                     ^^
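
For illustration, a small sketch that splits a mountinfo line into the
per-mount (VFS) options and the per-superblock (FS) options, which are the
two "ro"/"rw" fields marked above. The function name and the returned
dictionary keys are made up for this example; only the field layout of
/proc/self/mountinfo is assumed.

```python
def parse_mountinfo_line(line):
    """Split one /proc/self/mountinfo line into VFS and superblock options."""
    # Everything before " - " describes the mount; everything after it
    # describes the filesystem (type, source, superblock options).
    left, right = line.rstrip("\n").split(" - ", 1)
    left_fields = left.split()
    fstype, source, super_opts = right.split(" ", 2)
    return {
        "mount_point": left_fields[4],
        "vfs_options": left_fields[5].split(","),
        "fstype": fstype,
        "source": source,
        "super_options": super_opts.split(","),
    }


if __name__ == "__main__":
    sample = ("185 59 8:5 / /mnt/test ro,relatime shared:32 "
              "- ext4 /dev/sda5 rw,data=ordered")
    info = parse_mountinfo_line(sample)
    print("VFS ro:", "ro" in info["vfs_options"])
    print("FS ro: ", "ro" in info["super_options"])
```

For the line above this reports a read-only VFS node on top of a
filesystem that is itself still mounted read-write.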


BTW, util-linux 2.27 mount(8) man page:

-r, --read-only
Mount the filesystem read-only. A synonym is -o ro.

Note that, depending on the filesystem type, state and
kernel behavior, the system may still write to the device. For
example, ext3 and ext4 will replay the journal if the
filesystem is dirty. To prevent this kind of write access, you
may want to mount an ext3 or ext4 filesystem with the ro,noload
mount options or set the block device itself to read-only
mode, see the blockdev(8) command.


(maybe we need to copy this note to the "ro" description too and add a hint
about btrfs as well :-)

Karel


--
Karel Zak <[email protected]>
http://karelzak.blogspot.com

2015-12-18 01:29:16

by Qu Wenruo

Subject: Re: Ideas on unified real-ro mount option across all filesystems



Eric Sandeen wrote on 2015/12/16 21:15 -0600:
> well, again, btrfs picked unfortunate semantics, given the precedent set
> by other filesystems.
>
> f2fs, ext4, gfs2, nilfs2, and xfs all support "norecovery" - xfs since
> forever, ext4 & f2fs since 2009, etc.

I understand it's btrfs' fault.
Considering how many filesystems are already using "norecovery", it is
almost a standard.

Not sure if it's possible to change the "recovery" mount option to another
name for btrfs, but it seems using "norecovery" would be the best solution.

>
>> 3) A lot of users don't even know that mount ro can still modify the device
>> Yes, I didn't know this until I checked the log replay code of
>> btrfs.
>> Adding such a mount option alias may draw some users' attention to this.
>
> Given that nothing in the documentation implies that the block device itself
> must remain unchanged on a read-only mount, I don't see any problem which
> needs fixing. MS_RDONLY rejects user IO; that's all.

And thanks to the info provided by Karel, it's clear that at least
mount(8) itself already explains what ro will do and what it won't do.

Thanks,
Qu

2015-12-18 02:01:06

by Christoph Anton Mitterer

Subject: Re: Ideas on unified real-ro mount option across all filesystems

On Fri, 2015-12-18 at 09:29 +0800, Qu Wenruo wrote:
> > Given that nothing in the documentation implies that the block
> > device itself must remain unchanged on a read-only mount, I don't
> > see any problem which needs fixing. MS_RDONLY rejects user IO;
> > that's all.
>
> And thanks to the info provided by Karel, it's clear that at least
> mount(8) itself already explains what ro will do and what it
> won't do.

I wouldn't really agree here. At least not from the non-developer side
(and one should hope filesystems and their manpages aren't only made
for fs developers).

The manpage says:
> ro     Mount the filesystem read-only.
> rw     Mount the filesystem read-write.

IMHO, that leaves it absolutely unclear what this actually means,
especially given that most end-users will probably consider the
filesystem and its device to be basically "the same".

In particular, "the filesystem read-only" does not say which of two
things is meant. It could mean "anything accessed via the filesystem",
where the filesystem is the interface to the data, so that users cannot
change files, their properties, permissions, xattrs, acls, *time, and
of course the contents of any files. Or it could mean "the filesystem
as the full set of bytes that comprise the filesystem", which would
include anything that is hidden from the user (like journals etc.).

In fact, though I know it's not the case in practice, going by the
wording above alone, which uses "filesystem", I'd rather tend to say
that this includes any hidden structures and that "ro" in fact means
that the filesystem driver doesn't change the filesystem (in other
words, no writes to the device from the fs).

Given what the mount option "ro" actually means, I'd rather expect that
the manpage should read:
> ro     Mount the file-hierarchy read-only.


I do not dispute, that it makes sense to have a soft "ro" which just
means that the exported file-hierarchy is read-only.
But similarly it makes sense to have a "hard ro", which means, that the
filesystem doesn't do any writes. That implies of course the soft ro,
but also means, no journal replays, no mount time updates, no rebuilds
in case of multi-device setups.


This may be helpful in several situations (when doing device backups
from mounted filesystems, in any disaster recovery or forensic
situation).
Now people may argue that there's blockdev --setro, which is true of
course,... but a) I know quite a few people whom I'd consider not to be
totally stupid end-users and who'd, by the above documentation of "ro",
assume that there are no changes on the block device, and b) it may be
helpful for the filesystem driver itself to not even try to do writes
(and then error out), but to be in an I-don't-wanna-write mode from the
beginning.


As has been discussed by some people on linux-btrfs, norecovery,
noload, nologreplay or whatever options there are which *currently* make
certain fs types behave like that, i.e. make them "hard ro", are not
that well suited, neither by their name nor by their semantics.
"nologreplay" may be the option that *right now* keeps btrfs from
writing to the device, but in 5 years other features may have been
introduced which would also require people to add the mount option
"nofoobarmagic".
For normal users, who don't follow each commit of development (and who
certainly don't read the manpage again every day, not to mention that
these are often ambiguous or outdated), an option which is defined to
imply everything that's necessary to make the filesystem not touch its
underlying devices seems quite reasonable.

I myself had proposed the name "nodevwrites" (or something like that)
over at linux-btrfs, since that seems the semantics that were desired
(plus it's, AFAICS, not used already).
It could be commonly "reserved" for that purpose, and each filesystem
type that wants to support it could make it imply everything needed to
make mounts of that filesystem truly read-only in the sense of "do not
change a single bit on the device".
Doesn't seem to be a too invasive change, and seems worth it.

And obviously, such an option that means "hard ro" should by
definition include noatime (just in case there are filesystems
left which would update the atimes even when mounted "ro").


Oh, and I'd actually suggest changing the mount(8) manpage to use
"file-hierarchy" instead of "filesystem" and perhaps spending a short
sentence on what this actually means (something like "no file data
changes, no permissions, owners, acls, xattrs, [a|m|c|*]time changes -
but especially any other fs internal changes are still allowed").


Cheers,
Chris.



2015-12-18 02:51:22

by Eric Sandeen

Subject: Re: Ideas on unified real-ro mount option across all filesystems



On 12/17/15 8:01 PM, Christoph Anton Mitterer wrote:
> On Fri, 2015-12-18 at 09:29 +0800, Qu Wenruo wrote:
>>> Given that nothing in the documentation implies that the block
>>> device itself must remain unchanged on a read-only mount, I don't
>>> see any problem which needs fixing. MS_RDONLY rejects user IO;
>>> that's all.
>>
>> And thanks to the info provided by Karel, it's clear that at least
>> mount(8) itself already explains what ro will do and what it
>> won't do.
>
> I wouldn't really agree here. At least not from the non-developer side
> (and one should hope filesystems and their manpages aren't only made
> for fs developers).
>
> The manpage says:
>> ro Mount the filesystem read-only.
>> rw Mount the filesystem read-write.
>
> IMHO, that leaves it absolutely unclear what this actually means,
> especially given that most end-users will probably consider the
> filesystem and its device to be basically "the same".

<lots of words snipped>

Karel pointed out that recent mount(8) says:

> -r, --read-only
> Mount the filesystem read-only. A synonym is -o ro.
>
> Note that, depending on the filesystem type, state and
> kernel behavior, the system may still write to the device. For
> example, ext3 and ext4 will replay the journal if the
> filesystem is dirty. To prevent this kind of write access, you
> may want to mount an ext3 or ext4 filesystem with the ro,noload
> mount options or set the block device itself to read-only
> mode, see the blockdev(8) command.

which should leave nothing to the imagination.

-Eric


2015-12-18 04:20:23

by Christoph Anton Mitterer

Subject: Re: Ideas on unified real-ro mount option across all filesystems

On Thu, 2015-12-17 at 20:51 -0600, Eric Sandeen wrote:
> > -r, --read-only
> >     Mount the filesystem read-only. A synonym is -o ro.
> >
> >     Note that, depending on the filesystem type, state and
> >     kernel behavior, the system may still write to the device. For
> >     example, ext3 and ext4 will replay the journal if the
> >     filesystem is dirty. To prevent this kind of write access, you
> >     may want to mount an ext3 or ext4 filesystem with the ro,noload
> >     mount options or set the block device itself to read-only
> >     mode, see the blockdev(8) command.
>
> which should leave nothing to the imagination.
hmm apparently Debian sid's mount(8) is a bit outdated, but anyway,
this just means that the behaviour of "ro" is not properly (end-user-
friendly) documented. :-)

I still see basically the following left:
a) Could filesystems benefit from knowing that they shouldn't write to
   the device (e.g. by spitting out fewer errors or the like)?
b) Or does each filesystem auto-detect that the blockdev is "ro" and
   handle it accordingly?
c) Are there other cases left where using blockdev --setro may be a
   worse solution than having a dedicated mount option for "hard ro"?


If only (b) applies, then the state as is seems fine, and pointing
people to blockdev --setro would be enough.
But would (b) work in the case of a multi-dev fs, where e.g. some devices
can be ro (perhaps seed devices) while others can be rw?

If (a) applies, then it would IMHO still make sense to have a
dedicated "hard ro" mount option. The above manpage snippet
already names "noload", but this is only for ext*, and as
I've said before, manpages, which can easily be out of date, may not
list all options that are required by then, especially not for each fs.

If (c) applies, then I think for the same reasons, having a
dedicated mount option would be beneficial.

Examples I can think of:
btrfs with multiple devices.
Imagine you have a big box with some 100 disks, and multiple
multi-device btrfs filesystems on these.
One goes bad, and you want to start with recovery or forensics, but
that fs comprises 34 devices (thus it's not easy to do this on some
other node).
Do we really want to let people start calling --setro on these 34
devices (maybe some of them are multipath), perhaps even
accidentally setting the wrong device ro and thereby causing damage to
a live filesystem?
It would be much simpler if one had a mount option, wouldn't it?


Well just some thoughts, though.


Cheers,
Chris.



2015-12-22 01:32:21

by Kai Krakow

Subject: Re: Ideas on unified real-ro mount option across all filesystems

On Fri, 18 Dec 2015 03:01:06 +0100,
Christoph Anton Mitterer <[email protected]> wrote:

> The manpage says:
> > ro     Mount the filesystem read-only.
> > rw     Mount the filesystem read-write.

That means: the filesystem... Not the block device...

Sorry, it's kinda nitpicking. But actually, the file system IS
read-only: You cannot modify files from the user's view.

What you actually want is not modifying the underlying storage which is
the block device and includes stuff like meta and journal data (which
is only indirectly visible to users at best).

You can argue that man pages are not particularly end-user friendly.
But for an admin this makes sense without being an fs developer.

--
Regards,
Kai

Replies to list-only preferred.

2015-12-22 12:41:04

by Austin S Hemmelgarn

Subject: Re: Ideas on unified real-ro mount option across all filesystems

On 2015-12-21 20:32, Kai Krakow wrote:
> On Fri, 18 Dec 2015 03:01:06 +0100,
> Christoph Anton Mitterer <[email protected]> wrote:
>
>> The manpage says:
>>> ro Mount the filesystem read-only.
>>> rw Mount the filesystem read-write.
>
> That means: the filesystem... Not the block device...
No, that means: That particular instantiation of the VFS layer to access
the filesystem.
Not the filesystem (the filesystem is the data and metadata on disk),
not the block device (which is an abstraction used as a container for
the filesystem).
>
> Sorry, it's kinda nitpicking. But actually, the file system IS
> read-only: You cannot modify files from the user's view.
From a non-technical viewpoint, yes, that is correct; until you have
undetected corruption in the journal or log or whatever other structure
is used for consistency, at which point it isn't read-only because the
filesystem just changed by virtue of you mounting it (and even without
that type of corruption, stuff gets changed on a 'read-only' mount
regardless in many filesystems; many of them track when the filesystem
was last mounted, how many times it's been mounted, and other similar
things).
>
> What you actually want is not modifying the underlying storage which is
> the block device and includes stuff like meta and journal data (which
> is only indirectly visible to users at best).
No, the metadata and journal are an integral part of the filesystem
itself. Without those, there is no filesystem. Besides that, the metadata
_is_ directly visible to the user, in the form of directory structure,
stat(), output from lsattr, and even stuff like FIEMAP and filefrag.

The filesystem _is_ the data and metadata on disk, as such, the
filesystem being read-only means that none of that data or metadata
should change.
>
> You can argue that man pages are not particularly end-user friendly.
> But for an admin this makes sense without being an fs developer.
That really depends. I'm not a FS developer, but I still expect when I
see 'read-only' that it means the same as 'immutable for everything
managed by that particular object that has been made read-only, for all
access methods through that object'. And while I bet most
administrators wouldn't use quite the same terminology, I would be
willing to bet that many of them have essentially the same expectation
unless specifically told otherwise on a case-by-case basis.

2015-12-23 23:22:31

by Stewart Smith

Subject: Re: Ideas on unified real-ro mount option across all filesystems

Eric Sandeen <[email protected]> writes:
>> 3) A lot of users don't even know that mount ro can still modify the device
>> Yes, I didn't know this until I checked the log replay code of
>> btrfs.
>> Adding such a mount option alias may draw some users' attention to this.
>
> Given that nothing in the documentation implies that the block device itself
> must remain unchanged on a read-only mount, I don't see any problem which
> needs fixing. MS_RDONLY rejects user IO; that's all.
>
> If you want to be sure your block device rejects all IO for forensics or
> what have you, I'd suggest # blockdev --setro /dev/whatever prior to mount,
> and take it out of the filesystem's control. Or better yet, making an
> image and not touching the original.

What we do for the petitboot bootloader in POWER and OpenPower firmware
(a linux+initramfs that does kexec to boot) is that we use device mapper
to make a snapshot in memory where we run recovery (for some
filesystems, notably XFS is different due to journal not being endian
safe). We also have to have an option *not* to do that, just in case
there's a bug in journal replay... and we're lucky in the fact that we
probably do have enough memory to complete replay, this solution could
be completely impossible on lower memory machines.

As such, I believe we're the only bit of firmware/bootloader ever that
has correctly parsed a journalling filesystem.

--
Stewart Smith



2015-12-26 22:53:59

by Dave Chinner

Subject: Re: Ideas on unified real-ro mount option across all filesystems

On Thu, Dec 24, 2015 at 10:22:31AM +1100, Stewart Smith wrote:
> What we do for the petitboot bootloader in POWER and OpenPower firmware
> (a linux+initramfs that does kexec to boot) is that we use device mapper
> to make a snapshot in memory where we run recovery (for some
> filesystems, notably XFS is different due to journal not being endian
> safe). We also have to have an option *not* to do that, just in case
> there's a bug in journal replay... and we're lucky in the fact that we
> probably do have enough memory to complete replay, this solution could
> be completely impossible on lower memory machines.

Which means the boot loader is going to break horribly when we
change the on-disk format and feature flags the boot loader doesn't
understand get set in the root filesystem. Then the bootloader will
refuse to mount the filesystem and the system won't boot anymore...

IOWs, developers and users can't make a root XFS filesystem with a
new/experimental feature on POWER/OpenPower machines because the
bootloader will refuse to mount it regardless of the clean/dirty
state of the journal....

> As such, I believe we're the only bit of firmware/bootloader ever that
> has correctly parsed a journalling filesystem.

Nope. The Irix bootloader (sash) could do this 20 years ago - there
are even feature mask bits reserved specifically for SASH in the XFS
superblock. However, seeing as the bootloader was always upgraded
during the install of each new Irix release, the bootloader was
always up-to-date with the on-disk features the kernel supported and
so there was never a problem with mismatched feature support.

However, Linux users can upgrade or change the kernel at any time
independently of the bootloader, so it's pretty much guaranteed that
mismatched bootloader/kernel filesystem capabilities will cause
users problems at some point in the not-too-distant future.

Cheers,

Dave.
--
Dave Chinner
[email protected]