2014-05-05 07:14:03

by Nikola Ciprich

Subject: info about filesystem errors in /sys/fs/ext4/... ?

Hello,

I was wondering, is it possible to find out whether a filesystem with
errors is mounted, other than by parsing the kernel log?

Would it be too complicated to add such info to /sys/fs/ext4/.../ or to
some other location? Would such a change make sense to you?

with regards

nik



--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
http://www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: [email protected]
-------------------------------------



2014-05-05 11:59:26

by Theodore Ts'o

Subject: Re: info about filesystem errors in /sys/fs/ext4/... ?

On Mon, May 05, 2014 at 01:03:17PM +0200, Lukáš Czerner wrote:
>
> However, we might want to go a step further, because I do not
> really like the idea of allowing a file system with errors to be
> mounted by default. It does not really make sense to me, and I wonder
> whether anyone actually intends to do it this way.

We would need to make an exception for the root file system, of
course. And I've been receiving patches from folks who want to allow
e2fsck to be able to fix a mounted, read-only /usr partition, since
systemd is forcing folks to have to mount /usr read-only before it
will start, which means /usr needs to be mounted before e2fsck gets
started by systemd.

So that implies that we may need to have an exception for certain
non-root file systems as well, if we want to disallow mounting file
systems that have errors by default. Or perhaps we should only
disallow read/write mounts of file systems that have errors without
some kind of override mount option. Although I'm not sure we could
just start doing that by default without breaking some set of systems
and their current set of init scripts.


As far as determining whether or not a file system has errors, having
a /sys entry makes sense; although userspace could just read this out
of the ext2/3/4 superblock directly.
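
As a rough illustration of reading the flag straight from the
superblock, here is a minimal Python sketch. It assumes the standard
ext2/3/4 on-disk layout (primary superblock at byte offset 1024,
s_magic at offset 0x38, s_state at offset 0x3A, EXT2_ERROR_FS = 0x0002).
The function names are made up for this example, and opening the block
device normally requires root.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes into the device
S_MAGIC_OFFSET = 0x38      # s_magic, little-endian 16-bit
EXT2_SUPER_MAGIC = 0xEF53
EXT2_ERROR_FS = 0x0002     # "errors detected" bit in s_state (offset 0x3A)


def superblock_has_errors(sb: bytes) -> bool:
    """Return True if the EXT2_ERROR_FS bit is set in a raw superblock."""
    # s_magic and s_state are adjacent 16-bit fields, so read both at once.
    magic, state = struct.unpack_from("<HH", sb, S_MAGIC_OFFSET)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    return bool(state & EXT2_ERROR_FS)


def device_superblock_has_errors(path: str) -> bool:
    """Read the primary superblock from a block device or image file."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return superblock_has_errors(dev.read(1024))
```

A monitoring script could call device_superblock_has_errors() on each
ext4 device listed in /proc/mounts and raise an alert when it returns
True.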

I'll note that internally inside Google, we've used multiple
mechanisms over time and for different use cases. In some cases we've
scraped the system log files; in some cases we've used a hack that
redirected ext4_error() information into a custom semi-structured data
stream delivered via a netlink socket; and in some cases we've had a
userspace daemon parse the output of "dumpe2fs -h /dev/XXX".
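
The last of these approaches might look roughly like the sketch below.
It is not the daemon mentioned above, only an illustration: it assumes
that "dumpe2fs -h" prints a "Filesystem state:" line whose value
contains "with errors" when the error flag is set, and the function
names are invented for the example.

```python
import subprocess


def state_line_has_errors(dumpe2fs_header: str) -> bool:
    """Inspect the 'Filesystem state:' line of `dumpe2fs -h` output."""
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            # Assumed values: "clean", "not clean", optionally "... with errors".
            return "with errors" in value
    raise ValueError("no 'Filesystem state' line found")


def dumpe2fs_reports_errors(device: str) -> bool:
    """Run `dumpe2fs -h` on a device and check the reported state."""
    result = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True, text=True, check=True,
    )
    return state_line_has_errors(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible
to test the parser against captured output without touching a real
device.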

Cheers,

- Ted

2014-05-05 19:16:08

by Theodore Ts'o

Subject: Re: info about filesystem errors in /sys/fs/ext4/... ?

For the record, since this was discussed on the ext4 weekly
teleconference...

The reason why I've been hesitant about allowing any file system to be
checked by e2fsck while being mounted read-only is because of the
following failure scenario:

1) The kernel discovers that a file system has been corrupted, so it
marks the file system as being inconsistent and it remounts the file
system read-only.

2) The user runs e2fsck on the file system, while it is still mounted
read-only, and fixes it.

3) The kernel still has cached data structures with incorrect inode
reference counts, etc. So when the user then remounts the file system
read/write, the file system gets corrupted again, and the user suffers
data loss.


This could happen with the root file system as well, of course, but
there is a big, scary message making it clear that you *MUST*
reboot after repairing a corrupted root file system. The real issue
is discouraging users from checking mounted file systems at all. One
approach would be to require a command-line option of the form
--i-know-this-is-dangerous-and-I-could-lose-data, or some such.
Apparently xfs does something like this, with xfs_repair -d ('D' is
for Dangerous).

Another approach which Andreas Dilger suggested, and which we will
likely use, is one where we snapshot the last fsck time from the
superblock when the file system is mounted or remounted read-only.
Then when the user tries to remount the file system read-write, if the
last fsck time has been changed, we reject the r/w remount request.
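
The snapshot-and-compare idea can be modelled in a few lines. The
sketch below is a userspace model in Python, not kernel code and not
the eventual implementation: it assumes the last fsck time is the
32-bit s_lastcheck field at offset 0x40 of the superblock, and the
class and method names are hypothetical.

```python
import struct

S_LASTCHECK_OFFSET = 0x40  # s_lastcheck: time of last fsck, little-endian 32-bit


def last_fsck_time(sb: bytes) -> int:
    """Extract s_lastcheck from a raw superblock."""
    return struct.unpack_from("<I", sb, S_LASTCHECK_OFFSET)[0]


class ReadOnlyMountGuard:
    """Models the check proposed for the read-only -> read-write remount."""

    def __init__(self, sb_at_ro_mount: bytes):
        # Snapshot taken when the file system is mounted or remounted read-only.
        self._snapshot = last_fsck_time(sb_at_ro_mount)

    def may_remount_rw(self, sb_now: bytes) -> bool:
        # A changed fsck time means e2fsck modified the device underneath
        # the cached in-memory state, so the r/w remount is rejected.
        return last_fsck_time(sb_now) == self._snapshot
```

In this model, running e2fsck between the snapshot and the remount
changes s_lastcheck, so may_remount_rw() returns False and the user has
to unmount and mount again (or reboot) instead.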

Regards,

- Ted

2014-05-05 11:03:41

by Lukas Czerner

Subject: Re: info about filesystem errors in /sys/fs/ext4/... ?

On Mon, 5 May 2014, Nikola Ciprich wrote:

> Date: Mon, 5 May 2014 09:08:23 +0200
> From: Nikola Ciprich <[email protected]>
> To: [email protected]
> Subject: info about filesystem errors in /sys/fs/ext4/... ?
>
> Hello,
>
> I was wondering, is it possible to find out whether a filesystem with
> errors is mounted, other than by parsing the kernel log?
>
> Would it be too complicated to add such info to /sys/fs/ext4/.../ or to
> some other location? Would such a change make sense to you?
>
> with regards
>
> nik

Currently I do not think there is a way to check whether a mounted
file system contains errors (that is, whether the EXT2_ERROR_FS flag
is set in the superblock).

You either have to check the logs, or run fsck before mounting the
file system.

It really seems like a good idea to provide a way to inform user
space about this without the need to parse the log. I think that
sysfs is a perfect place for this.

However, we might want to go a step further, because I do not
really like the idea of allowing a file system with errors to be
mounted by default. It does not really make sense to me, and I wonder
whether anyone actually intends to do it this way.

What about having this scenario respect the "errors=" setting? Of
course, it might not make sense to panic when mounting a file system
with errors under the "errors=panic" option; we can just fail the
mount.

Would that help your case?

Thanks!
-Lukas

2014-05-05 14:53:53

by Eric Sandeen

Subject: Re: info about filesystem errors in /sys/fs/ext4/... ?

On 5/5/14, 6:59 AM, Theodore Ts'o wrote:
> On Mon, May 05, 2014 at 01:03:17PM +0200, Lukáš Czerner wrote:
>>
>> However, we might want to go a step further, because I do not
>> really like the idea of allowing a file system with errors to be
>> mounted by default. It does not really make sense to me, and I wonder
>> whether anyone actually intends to do it this way.
>
> We would need to make an exception for the root file system, of
> course. And I've been receiving patches from folks who want to allow
> e2fsck to be able to fix a mounted, read-only /usr partition, since
> systemd is forcing folks to have to mount /usr read-only before it
> will start, which means /usr needs to be mounted before e2fsck gets
> started by systemd.

I hope these patches make it to the list, if you're considering them.

I don't really know why fsck would need to treat filesystems differently
based on where they are mounted; either we can repair a readonly fs
or not, right? And if it's done, then the filesystem needs to be immediately
unmounted & remounted[1], or the system rebooted...

-Eric

[1] and I suppose if it can be unmounted & mounted then there was no
good reason to repair it while mounted RO...

2014-05-05 11:14:27

by Nikola Ciprich

Subject: Re: info about filesystem errors in /sys/fs/ext4/... ?

Hello Lukáš,

> Currently I do not think there is a way to check whether a mounted
> file system contains errors (that is, whether the EXT2_ERROR_FS flag
> is set in the superblock).
>
> You either have to check the logs, or run fsck before mounting the
> file system.
>
> It really seems like a good idea to provide a way to inform user
> space about this without the need to parse the log. I think that
> sysfs is a perfect place for this.
>
> However, we might want to go a step further, because I do not
> really like the idea of allowing a file system with errors to be
> mounted by default. It does not really make sense to me, and I wonder
> whether anyone actually intends to do it this way.
>
> What about having this scenario respect the "errors=" setting? Of
> course, it might not make sense to panic when mounting a file system
> with errors under the "errors=panic" option; we can just fail the
> mount.
>
> Would that help your case?

Yes for some cases, no for others :) For example, I do not want a server
to fail to boot if it could otherwise start (even though it might need
some intervention, at least running fsck). Providing a simple way (like
the sysfs access mentioned above) would make monitoring such issues much
simpler.

Of course, in some cases it is certainly safer to disallow mounting
rather than risk further data corruption...

I'm not sure how big the chance is that mounting a filesystem with
errors can cause more errors; I guess it depends on the nature of the
problem.

cheers!

nik


>
> Thanks!
> -Lukas
>


