2001-07-11 11:36:03

by Pavel Machek

[permalink] [raw]
Subject: Filesystem can be marked clear when it is not

Hi!

Long time ago I noticed that forced reboot from multiuser (/ mounted
rw long time ago) sometimes does not force filesystem check. It
happened again today...

So I remounted r/o and fsck-ed. And hey, they are errors: [zero dtimes
are ok, but bitmap differences really are not].

I *think* that we do not force ordering of "mark filesystem unclean"
and writes to filesystem. And we really *should* force that
ordering... Quick and dirty solution would be to sync just after we
mark filesystem dirty...

Pavel

root@bug:~# fsck -f /dev/hda1
Parallelizing fsck version 1.18 (11-Nov-1999)
e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 480986 has zero dtime. Fix<y>? yes

Deleted inode 481006 has zero dtime. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #2 (37, counted=42).
Fix<y>? yes

Free blocks count wrong (14705, counted=14710).
Fix<y>? yes

Inode bitmap differences: -480986 -481006
Fix<y>? yes

Free inodes count wrong for group #15 (24936, counted=24938).
Fix<y>? yes

Free inodes count wrong (755991, counted=755993).
Fix<y>? yes


/dev/hda1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/hda1: 173863/929856 files (10.6% non-contiguous), 915044/929754
blocks
root@bug:~#


--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]


2001-07-11 18:09:38

by Andreas Dilger

[permalink] [raw]
Subject: Re: Filesystem can be marked clear when it is not

Pavel writes:
> Long time ago I noticed that forced reboot from multiuser (/ mounted
> rw long time ago) sometimes does not force filesystem check. It
> happened again today...
>
> So I remounted r/o and fsck-ed. And hey, they are errors: [zero dtimes
> are ok, but bitmap differences really are not].

It is funny you should mention this, as I made a patch last night for
exactly this problem (Andrew Morton and Ted Ts'o have noticed this as
well, and we discussed it previously on ext2-devel), and I only just
got around to separating out the patch...

> I *think* that we do not force ordering of "mark filesystem unclean"
> and writes to filesystem. And we really *should* force that
> ordering... Quick and dirty solution would be to sync just after we
> mark filesystem dirty...

Basically this is what the patch does. For the (very rare) cases where
we change the VALID or ERROR flags on the filesystem, we write the super
block to disk synchronously. This does not impact performance under normal
operations, only a millisecond at mount/unmount and on error.

Note that I left ext2_panic() alone. It is not clear whether it should
be doing sync I/O in this case, and it is just a cop-out. It is only
called from a couple of impossible-to-reach places that should probably
be eliminated anyways (i.e. NULL pointers that are always set at fs
mount time).

Cheers, Andreas
============================ ext2-commit.diff ==========================
diff -ru linux.orig/fs/ext2/super.c linux/fs/ext2/super.c
--- linux.orig/fs/ext2/super.c Tue May 29 13:13:19 2001
+++ linux/fs/ext2/super.c Wed Jul 11 11:48:40 2001
@@ -28,39 +28,54 @@
#include <asm/uaccess.h>


+static void ext2_sync_super(struct super_block *sb,
+ struct ext2_super_block *es);

static char error_buf[1024];

+/*
+ * Deal with the reporting of failure conditions on a filesystem such as
+ * inconsistencies detected in the on-disk structure or read IO failures.
+ * We set an error flag in the superblock and e2fsck will check this after
+ * each boot.
+ */
void ext2_error (struct super_block * sb, const char * function,
const char * fmt, ...)
{
va_list args;
+ struct ext2_super_block *es = EXT2_SB(sb)->s_es;

if (!(sb->s_flags & MS_RDONLY)) {
sb->u.ext2_sb.s_mount_state |= EXT2_ERROR_FS;
- sb->u.ext2_sb.s_es->s_state =
- cpu_to_le16(le16_to_cpu(sb->u.ext2_sb.s_es->s_state) | EXT2_ERROR_FS);
- mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
- sb->s_dirt = 1;
+ es->s_state =
+ cpu_to_le16(le16_to_cpu(es->s_state) | EXT2_ERROR_FS);
+ ext2_sync_super(sb, es);
}
va_start (args, fmt);
vsprintf (error_buf, fmt, args);
va_end (args);
if (test_opt (sb, ERRORS_PANIC) ||
- (le16_to_cpu(sb->u.ext2_sb.s_es->s_errors) == EXT2_ERRORS_PANIC &&
+ (le16_to_cpu(es->s_errors) == EXT2_ERRORS_PANIC &&
!test_opt (sb, ERRORS_CONT) && !test_opt (sb, ERRORS_RO)))
panic ("EXT2-fs panic (device %s): %s: %s\n",
bdevname(sb->s_dev), function, error_buf);
printk (KERN_CRIT "EXT2-fs error (device %s): %s: %s\n",
bdevname(sb->s_dev), function, error_buf);
if (test_opt (sb, ERRORS_RO) ||
- (le16_to_cpu(sb->u.ext2_sb.s_es->s_errors) == EXT2_ERRORS_RO &&
- !test_opt (sb, ERRORS_CONT) && !test_opt (sb, ERRORS_PANIC))) {
+ (le16_to_cpu(es->s_errors) == EXT2_ERRORS_RO &&
+ !test_opt (sb, ERRORS_CONT))) {
printk ("Remounting filesystem read-only\n");
sb->s_flags |= MS_RDONLY;
}
}

+/* Deal with the reporting of failure conditions while running, such as
+ * inconsistencies in operation or invalid system states where it is not
+ * possible to continue.
+ *
+ * Use ext2_error() for cases of invalid filesystem states, as that allows
+ * the administrator to control the error behaviour to meet their needs.
+ */
NORET_TYPE void ext2_panic (struct super_block * sb, const char * function,
const char * fmt, ...)
{
@@ -124,8 +139,10 @@
int i;

if (!(sb->s_flags & MS_RDONLY)) {
- sb->u.ext2_sb.s_es->s_state = le16_to_cpu(sb->u.ext2_sb.s_mount_state);
- mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
+ struct ext2_super_block *es = EXT2_SB(sb)->s_es;
+
+ es->s_state = le16_to_cpu(EXT2_SB(sb)->s_mount_state);
+ ext2_sync_super(sb, es);
}
db_count = EXT2_SB(sb)->s_gdb_count;
for (i = 0; i < db_count; i++)
@@ -310,8 +327,7 @@
es->s_max_mnt_count = (__s16) cpu_to_le16(EXT2_DFL_MAX_MNT_COUNT);
es->s_mnt_count=cpu_to_le16(le16_to_cpu(es->s_mnt_count) + 1);
es->s_mtime = cpu_to_le32(CURRENT_TIME);
- mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
- sb->s_dirt = 1;
+ ext2_sync_super(sb, es);
if (test_opt (sb, DEBUG))
printk ("[EXT II FS %s, %s, bs=%lu, fs=%lu, gc=%lu, "
"bpg=%lu, ipg=%lu, mo=%04lx]\n",
@@ -663,6 +679,15 @@
sb->s_dirt = 0;
}

+static void ext2_sync_super(struct super_block *sb, struct ext2_super_block *es)
+{
+ es->s_wtime = cpu_to_le32(CURRENT_TIME);
+ mark_buffer_dirty(EXT2_SB(sb)->s_sbh);
+ ll_rw_block(WRITE, 1, &EXT2_SB(sb)->s_sbh);
+ wait_on_buffer(EXT2_SB(sb)->s_sbh);
+ sb->s_dirt = 0;
+}
+
/*
* In the second extended file system, it is not necessary to
* write the super block since we use a mapping of the
@@ -681,13 +706,13 @@
if (!(sb->s_flags & MS_RDONLY)) {
es = sb->u.ext2_sb.s_es;

- ext2_debug ("setting valid to 0\n");
-
if (le16_to_cpu(es->s_state) & EXT2_VALID_FS) {
+ ext2_debug ("setting valid to 0\n");
es->s_state = cpu_to_le16(le16_to_cpu(es->s_state) & ~EXT2_VALID_FS);
es->s_mtime = cpu_to_le32(CURRENT_TIME);
- }
- ext2_commit_super (sb, es);
+ ext2_sync_super(sb, es);
+ } else
+ ext2_commit_super (sb, es);
}
sb->s_dirt = 0;
}
@@ -724,9 +749,7 @@
*/
es->s_state = cpu_to_le16(sb->u.ext2_sb.s_mount_state);
es->s_mtime = cpu_to_le32(CURRENT_TIME);
- mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
- sb->s_dirt = 1;
- ext2_commit_super (sb, es);
+ ext2_sync_super(sb, es);
}
else {
int ret;
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert

2001-07-11 23:53:27

by Daniel Phillips

[permalink] [raw]
Subject: Re: Filesystem can be marked clear when it is not

On Wednesday 11 July 2001 20:06, Andreas Dilger wrote:
> Pavel writes:
> > Long time ago I noticed that forced reboot from multiuser (/
> > mounted rw long time ago) sometimes does not force filesystem
> > check. It happened again today...
> >
> > So I remounted r/o and fsck-ed. And hey, they are errors: [zero
> > dtimes are ok, but bitmap differences really are not].
>
> It is funny you should mention this, as I made a patch last night for
> exactly this problem (Andrew Morton and Ted Ts'o have noticed this as
> well, and we discussed it previously on ext2-devel), and I only just
> got around to separating out the patch...

I weighed in on that one too:

http://marc.theaimsgroup.com/?l=ext2-devel&m=99090670900520&w=2

with a very simple sort-of patch, which I just made into a real patch.
Do we need anything fancier than this:

--- ../uml.2.4.6.clean/fs/ext2/super.c Wed May 16 19:31:27 2001
+++ fs/ext2/super.c Thu Jul 12 00:57:15 2001
@@ -311,6 +311,8 @@
es->s_mnt_count=cpu_to_le16(le16_to_cpu(es->s_mnt_count) + 1);
es->s_mtime = cpu_to_le32(CURRENT_TIME);
mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
+ ll_rw_block(WRITE, 1, &sb->u.ext2_sb.s_sbh);
+ wait_on_buffer(sb->u.ext2_sb.s_sbh);
sb->s_dirt = 1;
if (test_opt (sb, DEBUG))
printk ("[EXT II FS %s, %s, bs=%lu, fs=%lu, gc=%lu, "

--
Daniel

> > I *think* that we do not force ordering of "mark filesystem
> > unclean" and writes to filesystem. And we really *should* force
> > that ordering... Quick and dirty solution would be to sync just
> > after we mark filesystem dirty...
>
> Basically this is what the patch does. For the (very rare) cases
> where we change the VALID or ERROR flags on the filesystem, we write
> the super block to disk synchronously. This does not impact
> performance under normal operations, only a millisecond at
> mount/unmount and on error.
>
> Note that I left ext2_panic() alone. It is not clear whether it
> should be doing sync I/O in this case, and it is just a cop-out. It
> is only called from a couple of impossible-to-reach places that
> should probably be eliminated anyways (i.e. NULL pointers that are
> always set at fs mount time).


> Cheers, Andreas
> ============================ ext2-commit.diff
> ========================== diff -ru linux.orig/fs/ext2/super.c
> linux/fs/ext2/super.c
> --- linux.orig/fs/ext2/super.c Tue May 29 13:13:19 2001
> +++ linux/fs/ext2/super.c Wed Jul 11 11:48:40 2001
> @@ -28,39 +28,54 @@
> #include <asm/uaccess.h>
>
>
> +static void ext2_sync_super(struct super_block *sb,
> + struct ext2_super_block *es);
>
> static char error_buf[1024];
>
> +/*
> + * Deal with the reporting of failure conditions on a filesystem
> such as + * inconsistencies detected in the on-disk structure or read
> IO failures. + * We set an error flag in the superblock and e2fsck
> will check this after + * each boot.
> + */
> void ext2_error (struct super_block * sb, const char * function,
> const char * fmt, ...)
> {
> va_list args;
> + struct ext2_super_block *es = EXT2_SB(sb)->s_es;
>
> if (!(sb->s_flags & MS_RDONLY)) {
> sb->u.ext2_sb.s_mount_state |= EXT2_ERROR_FS;
> - sb->u.ext2_sb.s_es->s_state =
> - cpu_to_le16(le16_to_cpu(sb->u.ext2_sb.s_es->s_state) |
> EXT2_ERROR_FS); - mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
> - sb->s_dirt = 1;
> + es->s_state =
> + cpu_to_le16(le16_to_cpu(es->s_state) | EXT2_ERROR_FS);
> + ext2_sync_super(sb, es);
> }
> va_start (args, fmt);
> vsprintf (error_buf, fmt, args);
> va_end (args);
> if (test_opt (sb, ERRORS_PANIC) ||
> - (le16_to_cpu(sb->u.ext2_sb.s_es->s_errors) == EXT2_ERRORS_PANIC
> && + (le16_to_cpu(es->s_errors) == EXT2_ERRORS_PANIC &&
> !test_opt (sb, ERRORS_CONT) && !test_opt (sb, ERRORS_RO)))
> panic ("EXT2-fs panic (device %s): %s: %s\n",
> bdevname(sb->s_dev), function, error_buf);
> printk (KERN_CRIT "EXT2-fs error (device %s): %s: %s\n",
> bdevname(sb->s_dev), function, error_buf);
> if (test_opt (sb, ERRORS_RO) ||
> - (le16_to_cpu(sb->u.ext2_sb.s_es->s_errors) == EXT2_ERRORS_RO &&
> - !test_opt (sb, ERRORS_CONT) && !test_opt (sb, ERRORS_PANIC)))
> { + (le16_to_cpu(es->s_errors) == EXT2_ERRORS_RO &&
> + !test_opt (sb, ERRORS_CONT))) {
> printk ("Remounting filesystem read-only\n");
> sb->s_flags |= MS_RDONLY;
> }
> }
>
> +/* Deal with the reporting of failure conditions while running, such
> as + * inconsistencies in operation or invalid system states where it
> is not + * possible to continue.
> + *
> + * Use ext2_error() for cases of invalid filesystem states, as that
> allows + * the administrator to control the error behaviour to meet
> their needs. + */
> NORET_TYPE void ext2_panic (struct super_block * sb, const char *
> function, const char * fmt, ...)
> {
> @@ -124,8 +139,10 @@
> int i;
>
> if (!(sb->s_flags & MS_RDONLY)) {
> - sb->u.ext2_sb.s_es->s_state =
> le16_to_cpu(sb->u.ext2_sb.s_mount_state);
> - mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
> + struct ext2_super_block *es = EXT2_SB(sb)->s_es;
> +
> + es->s_state = le16_to_cpu(EXT2_SB(sb)->s_mount_state);
> + ext2_sync_super(sb, es);
> }
> db_count = EXT2_SB(sb)->s_gdb_count;
> for (i = 0; i < db_count; i++)
> @@ -310,8 +327,7 @@
> es->s_max_mnt_count = (__s16) cpu_to_le16(EXT2_DFL_MAX_MNT_COUNT);
> es->s_mnt_count=cpu_to_le16(le16_to_cpu(es->s_mnt_count) + 1);
> es->s_mtime = cpu_to_le32(CURRENT_TIME);
> - mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
> - sb->s_dirt = 1;
> + ext2_sync_super(sb, es);
> if (test_opt (sb, DEBUG))
> printk ("[EXT II FS %s, %s, bs=%lu, fs=%lu, gc=%lu, "
> "bpg=%lu, ipg=%lu, mo=%04lx]\n",
> @@ -663,6 +679,15 @@
> sb->s_dirt = 0;
> }
>
> +static void ext2_sync_super(struct super_block *sb, struct
> ext2_super_block *es) +{
> + es->s_wtime = cpu_to_le32(CURRENT_TIME);
> + mark_buffer_dirty(EXT2_SB(sb)->s_sbh);
> + ll_rw_block(WRITE, 1, &EXT2_SB(sb)->s_sbh);
> + wait_on_buffer(EXT2_SB(sb)->s_sbh);
> + sb->s_dirt = 0;
> +}
> +
> /*
> * In the second extended file system, it is not necessary to
> * write the super block since we use a mapping of the
> @@ -681,13 +706,13 @@
> if (!(sb->s_flags & MS_RDONLY)) {
> es = sb->u.ext2_sb.s_es;
>
> - ext2_debug ("setting valid to 0\n");
> -
> if (le16_to_cpu(es->s_state) & EXT2_VALID_FS) {
> + ext2_debug ("setting valid to 0\n");
> es->s_state = cpu_to_le16(le16_to_cpu(es->s_state) &
> ~EXT2_VALID_FS); es->s_mtime = cpu_to_le32(CURRENT_TIME);
> - }
> - ext2_commit_super (sb, es);
> + ext2_sync_super(sb, es);
> + } else
> + ext2_commit_super (sb, es);
> }
> sb->s_dirt = 0;
> }
> @@ -724,9 +749,7 @@
> */
> es->s_state = cpu_to_le16(sb->u.ext2_sb.s_mount_state);
> es->s_mtime = cpu_to_le32(CURRENT_TIME);
> - mark_buffer_dirty(sb->u.ext2_sb.s_sbh);
> - sb->s_dirt = 1;
> - ext2_commit_super (sb, es);
> + ext2_sync_super(sb, es);
> }
> else {
> int ret;

2001-07-12 16:25:45

by Andreas Dilger

[permalink] [raw]
Subject: Re: [Ext2-devel] Re: Filesystem can be marked clear when it is not

Daniel writes:
> I weighed in on that one too:
>
> http://marc.theaimsgroup.com/?l=ext2-devel&m=99090670900520&w=2
>
> with a very simple sort-of patch, which I just made into a real patch.

Ok, your patch works in this case (it is leaving sb->s_dirt = 1, however, so
the superblock will be written out again shortly). In fact, the whole
thing can be replaced with a call to ext2_write_super() in my code, which
also turns off EXT2_VALID_FS and sets s_mtime, and writes it synchronously
to disk. This means we can remove 2! lines from ext2_setup_super().

It makes me wonder, though, if we clear EXT2_VALID_FS synchronously in
ext2_setup_super() if we also need it in ext2_write_super(). If people
mount their root fs read-write e2fsck will clear EXT2_VALID_FS and we
may never hit ext2_setup_super() again to clear it, so I guess it needs
to stay there.

One of the other changes from my patch is that errors are also written
out synchronously to disk, for the same reason - in case we crash shortly
after having an error, and before dirty buffers are flushed to disk.

Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert