2009-06-03 15:06:25

by Denis Karpov

Subject: [PATCH 0/4] FS: userspace notification of errors


Hello,

these patches are resent (a bit re-worked and separated from other stuff).
The issue was discussed here:
http://marc.info/?l=linux-fsdevel&m=124402900920380&w=2

Summary:

1. A generic mechanism for notifying user space about file system
errors/inconsistencies on a particular partition, using:

- sysfs entry /sys/block/<bdev>/<part>/fs_unclean
- uevent KOBJ_CHANGE, uevent's environment variable FS_UNCLEAN=[0:1]

Userspace might want to monitor these notifications (poll(2) on the sysfs
file or a udevd rule for the uevent) and fix the fs damage.
Filesystem can be marked clean again by writing '0' to the
corresponding 'fs_unclean' sysfs file.

Currently some file systems remount themselves r/o on critical errors
(*FAT; EXT2 depending on 'errors' mount option), but userspace is generally
unaware of such events. This feature will allow user space to become
aware of possible file system problems and do something about them
(e.g. run fsck automatically or with user's consent).
[PATCH 1]

2. Make FAT and EXT2 file systems use the above mechanism to optionally
notify user space about errors. Implemented as 'notify' mount option
(PATCH 3,4).
FAT error reporting facilities had to be re-factored (PATCH 2) in
order to simplify sending error notifications.

Adrian Hunter and Artem Bityutskiy provided input and ideas on implementing
these features.

Denis Karpov.
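
The poll(2)-on-sysfs monitoring mentioned in the cover letter could look
roughly like the following from userspace. This is only a sketch, not part
of the series: the function names are made up for illustration, and the
attribute path is whatever /sys/block/<bdev>/<part>/fs_unclean resolves to
on the target system. It relies on the usual sysfs behaviour that
sysfs_notify() wakes pollers with POLLPRI | POLLERR and that the attribute
must be re-read from offset 0 after each wakeup.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Parse the attribute text ("0\n" or "1\n"); returns 0, 1, or -1 on garbage. */
int parse_fs_unclean(const char *text)
{
	assert(text != NULL);
	if (text[0] == '0')
		return 0;
	if (text[0] == '1')
		return 1;
	return -1;
}

/*
 * Block until the attribute at `path` reads as 1.
 * Returns 1 once the flag is set, -1 on any I/O error.
 * (Hypothetical helper; not provided by the patches.)
 */
int wait_for_fs_unclean(const char *path)
{
	char buf[8];
	struct pollfd pfd;
	ssize_t n;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	for (;;) {
		/* sysfs attributes have to be re-read from offset 0 each time. */
		if (lseek(fd, 0, SEEK_SET) < 0 ||
		    (n = read(fd, buf, sizeof(buf) - 1)) < 0) {
			close(fd);
			return -1;
		}
		buf[n] = '\0';
		if (parse_fs_unclean(buf) == 1) {
			close(fd);
			return 1;
		}

		/* Sleep until the kernel side calls sysfs_notify(). */
		pfd.fd = fd;
		pfd.events = POLLPRI | POLLERR;
		pfd.revents = 0;
		if (poll(&pfd, 1, -1) < 0) {
			close(fd);
			return -1;
		}
	}
}
```

A monitoring daemon would call wait_for_fs_unclean() per partition (or put
several descriptors into one poll set) and then decide whether to run fsck.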


2009-06-03 15:06:07

by Denis Karpov

Subject: [PATCH 3/4] FAT: add 'notify' mount option

Implement FAT fs mount option 'notify'. The effect of this option
is that a notification is sent to userspace on errors that indicate
filesystem damage/inconsistency. The generic filesystem corruption
notification mechanism is used.

Signed-off-by: Denis Karpov <[email protected]>
---
fs/fat/fat.h | 3 ++-
fs/fat/inode.c | 9 ++++++++-
fs/fat/misc.c | 7 +++++++
3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index a811ac0..4b7a394 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -39,7 +39,8 @@ struct fat_mount_options {
nocase:1, /* Does this need case conversion? 0=need case conversion*/
usefree:1, /* Use free_clusters for FAT32 */
tz_utc:1, /* Filesystem timestamps are in UTC */
- rodir:1; /* allow ATTR_RO for directory */
+ rodir:1, /* allow ATTR_RO for directory */
+ err_notify:1; /* Notify userspace on fs errors */
};

#define FAT_HASH_BITS 8
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 2762145..cc299fc 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -835,6 +835,8 @@ static int fat_show_options(struct seq_file *m, struct vfsmount *mnt)
seq_puts(m, ",flush");
if (opts->tz_utc)
seq_puts(m, ",tz=UTC");
+ if (opts->err_notify)
+ seq_puts(m, ",notify");

return 0;
}
@@ -847,7 +849,7 @@ enum {
Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
- Opt_obsolate, Opt_flush, Opt_tz_utc, Opt_rodir, Opt_err,
+ Opt_obsolate, Opt_flush, Opt_tz_utc, Opt_rodir, Opt_err_notify, Opt_err,
};

static const match_table_t fat_tokens = {
@@ -883,6 +885,7 @@ static const match_table_t fat_tokens = {
{Opt_obsolate, "posix"},
{Opt_flush, "flush"},
{Opt_tz_utc, "tz=UTC"},
+ {Opt_err_notify, "notify"},
{Opt_err, NULL},
};
static const match_table_t msdos_tokens = {
@@ -952,6 +955,7 @@ static int parse_options(char *options, int is_vfat, int silent, int *debug,
opts->numtail = 1;
opts->usefree = opts->nocase = 0;
opts->tz_utc = 0;
+ opts->err_notify = 0;
*debug = 0;

if (!options)
@@ -1044,6 +1048,9 @@ static int parse_options(char *options, int is_vfat, int silent, int *debug,
case Opt_tz_utc:
opts->tz_utc = 1;
break;
+ case Opt_err_notify:
+ opts->err_notify = 1;
+ break;

/* msdos specific */
case Opt_dots:
diff --git a/fs/fat/misc.c b/fs/fat/misc.c
index dca1b97..1d6ed41 100644
--- a/fs/fat/misc.c
+++ b/fs/fat/misc.c
@@ -9,6 +9,7 @@
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/buffer_head.h>
+#include <linux/genhd.h>
#include "fat.h"

/*
@@ -20,6 +21,7 @@ void fat_fs_error(struct super_block *s, const char *function,
const char *fmt, ...)
{
va_list args;
+ struct msdos_sb_info *sbi = MSDOS_SB(s);

printk(KERN_ERR "FAT: Filesystem error (dev %s): %s:\n", s->s_id,
function);
@@ -34,6 +36,8 @@ void fat_fs_error(struct super_block *s, const char *function,
s->s_flags |= MS_RDONLY;
printk(KERN_ERR " File system has been set read-only\n");
}
+ if (sbi->options.err_notify)
+ notify_part_fs_unclean(part_to_dev(s->s_bdev->bd_part), 1);
}
EXPORT_SYMBOL_GPL(fat_fs_error);

@@ -45,6 +49,7 @@ void fat_fs_warning(struct super_block *s, const char * function,
const char *fmt, ...)
{
va_list args;
+ struct msdos_sb_info *sbi = MSDOS_SB(s);

printk(KERN_ERR "FAT: Filesystem warning (dev %s): %s:\n", s->s_id,
function);
@@ -54,6 +59,8 @@ void fat_fs_warning(struct super_block *s, const char * function,
vprintk(fmt, args);
printk("\n");
va_end(args);
+ if (sbi->options.err_notify)
+ notify_part_fs_unclean(part_to_dev(s->s_bdev->bd_part), 1);
}
EXPORT_SYMBOL_GPL(fat_fs_warning);

--
1.6.3.1
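
The option handling added by this patch follows the usual pattern: reset the
flag to its default, then set it when the "notify" token is seen. A minimal
userspace model of that flow is sketched below. The demo_* names are
hypothetical, and the hand-rolled comma splitting merely stands in for the
kernel's match_token() machinery.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the relevant bits of struct fat_mount_options. */
struct demo_mount_options {
	unsigned tz_utc:1;
	unsigned err_notify:1;
};

/* Returns 0 on success, -1 on an unrecognized option (cf. -EINVAL). */
int demo_parse_options(const char *options, struct demo_mount_options *opts)
{
	assert(opts != NULL);

	/* Defaults first, as parse_options() does before reading the string. */
	opts->tz_utc = 0;
	opts->err_notify = 0;

	if (!options)
		return 0;

	while (*options) {
		size_t len = strcspn(options, ",");

		if (len == 6 && strncmp(options, "notify", 6) == 0)
			opts->err_notify = 1;
		else if (len == 6 && strncmp(options, "tz=UTC", 6) == 0)
			opts->tz_utc = 1;
		else if (len != 0)
			return -1;

		/* Advance past this token and its separating comma, if any. */
		options += len;
		if (*options == ',')
			options++;
	}
	return 0;
}
```

With this model, "tz=UTC,notify" sets both bits, while a NULL option string
leaves err_notify at 0, which matches the opt-in behaviour of the patch.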

2009-06-03 15:06:39

by Denis Karpov

Subject: [PATCH 1/4] FS: filesystem corruption notification

Add a generic mechanism to notify userspace about possible filesystem
corruption through sysfs entry (/sys/block/<bdev>/<part>/fs_unclean)
and uevent (KOBJ_CHANGE, uevent's environment variable FS_UNCLEAN=[0:1]).

To mark fs clean (e.g. after fs was fixed by userspace):

echo 0 > /sys/block/<bdev>/<part>/fs_unclean
(you will still receive uevent KOBJ_CHANGE with env var FS_UNCLEAN=0)

Signed-off-by: Denis Karpov <[email protected]>
---
fs/partitions/check.c | 42 ++++++++++++++++++++++++++++++++++++++++++
include/linux/genhd.h | 2 ++
2 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 99e33ef..191d89e 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -196,6 +196,24 @@ check_partition(struct gendisk *hd, struct block_device *bdev)
return ERR_PTR(res);
}

+void notify_part_fs_unclean(struct device *dev, uint8_t unclean)
+{
+ char event_string[13];
+ char *envp[] = { event_string, NULL };
+
+ if ((unclean != 0 && unclean != 1) ||
+ unclean == dev_to_part(dev)->fs_unclean)
+ return;
+
+ dev_to_part(dev)->fs_unclean = unclean;
+
+ sysfs_notify(&dev->kobj, NULL, "fs_unclean");
+
+ sprintf(event_string, "FS_UNCLEAN=%u", unclean);
+ kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
+}
+EXPORT_SYMBOL(notify_part_fs_unclean);
+
static ssize_t part_partition_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -246,6 +264,26 @@ ssize_t part_stat_show(struct device *dev,
jiffies_to_msecs(part_stat_read(p, time_in_queue)));
}

+ssize_t part_fs_unclean_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct hd_struct *p = dev_to_part(dev);
+
+ return sprintf(buf, "%d\n", p->fs_unclean);
+}
+
+ssize_t part_fs_unclean_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int i;
+
+ if (count > 0 && sscanf(buf, "%d", &i) > 0)
+ notify_part_fs_unclean(dev, (i == 0) ? 0 : 1);
+
+ return count;
+}
+
#ifdef CONFIG_FAIL_MAKE_REQUEST
ssize_t part_fail_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -273,6 +311,9 @@ static DEVICE_ATTR(partition, S_IRUGO, part_partition_show, NULL);
static DEVICE_ATTR(start, S_IRUGO, part_start_show, NULL);
static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
+static struct device_attribute dev_attr_fs_unclean =
+ __ATTR(fs_unclean, S_IRUGO|S_IWUSR, part_fs_unclean_show,
+ part_fs_unclean_store);
#ifdef CONFIG_FAIL_MAKE_REQUEST
static struct device_attribute dev_attr_fail =
__ATTR(make-it-fail, S_IRUGO|S_IWUSR, part_fail_show, part_fail_store);
@@ -283,6 +324,7 @@ static struct attribute *part_attrs[] = {
&dev_attr_start.attr,
&dev_attr_size.attr,
&dev_attr_stat.attr,
+ &dev_attr_fs_unclean.attr,
#ifdef CONFIG_FAIL_MAKE_REQUEST
&dev_attr_fail.attr,
#endif
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index a1a28ca..2e6d42e 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -98,6 +98,7 @@ struct hd_struct {
#endif
unsigned long stamp;
int in_flight;
+ int fs_unclean;
#ifdef CONFIG_SMP
struct disk_stats *dkstats;
#else
@@ -528,6 +529,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
sector_t len, int flags);
extern void delete_partition(struct gendisk *, int);
extern void printk_all_partitions(void);
+extern void notify_part_fs_unclean(struct device *dev, uint8_t unclean);

extern struct gendisk *alloc_disk_node(int minors, int node_id);
extern struct gendisk *alloc_disk(int minors);
--
1.6.3.1
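
The transition logic in notify_part_fs_unclean() can be modelled in userspace
as below. This is a sketch with hypothetical demo_* names. It shows two
properties of the patch: a notification is produced only when the state
actually changes, and a 13-byte buffer is exactly large enough for
"FS_UNCLEAN=" (11 characters), one digit, and the terminating NUL.

```c
#include <assert.h>
#include <stdio.h>

/* "FS_UNCLEAN=" is 11 characters, plus one digit and the terminating NUL. */
#define EVENT_STRING_LEN 13

/*
 * Userspace model of the state transition.  Returns 1 if an event string
 * was written to `event`, 0 if the request was ignored (value out of range
 * or identical to the current state).
 */
int demo_update_unclean(int *state, unsigned unclean,
			char event[EVENT_STRING_LEN])
{
	assert(state != NULL && event != NULL);

	if (unclean > 1 || (int)unclean == *state)
		return 0;

	*state = (int)unclean;
	snprintf(event, EVENT_STRING_LEN, "FS_UNCLEAN=%u", unclean);
	return 1;
}
```

Because repeated errors on an already-flagged partition are ignored, a
listener sees one event per transition rather than one per error.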

2009-06-03 15:07:15

by Denis Karpov

Subject: [PATCH 4/4] EXT2: add 'notify' mount option

Implement EXT2 fs mount option 'notify'. The effect of this option
is that a notification is sent to userspace on errors that indicate
filesystem damage/inconsistency. The generic filesystem corruption
notification mechanism is used.

Signed-off-by: Denis Karpov <[email protected]>
---
fs/ext2/super.c | 15 ++++++++++++++-
include/linux/ext2_fs.h | 2 +-
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 5c4afe6..04802cd 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -32,6 +32,7 @@
#include <linux/mount.h>
#include <linux/log2.h>
#include <linux/quotaops.h>
+#include <linux/genhd.h>
#include <asm/uaccess.h>
#include "ext2.h"
#include "xattr.h"
@@ -68,6 +69,8 @@ void ext2_error (struct super_block * sb, const char * function,
printk("Remounting filesystem read-only\n");
sb->s_flags |= MS_RDONLY;
}
+ if (test_opt(sb, ERR_NOTIFY))
+ notify_part_fs_unclean(part_to_dev(sb->s_bdev->bd_part), 1);
}

void ext2_warning (struct super_block * sb, const char * function,
@@ -81,6 +84,8 @@ void ext2_warning (struct super_block * sb, const char * function,
vprintk(fmt, args);
printk("\n");
va_end(args);
+ if (test_opt(sb, ERR_NOTIFY))
+ notify_part_fs_unclean(part_to_dev(sb->s_bdev->bd_part), 1);
}

void ext2_update_dynamic_rev(struct super_block *sb)
@@ -289,6 +294,9 @@ static int ext2_show_options(struct seq_file *seq, struct vfsmount *vfs)
if (!test_opt(sb, RESERVATION))
seq_puts(seq, ",noreservation");

+ if (test_opt(sb, ERR_NOTIFY))
+ seq_puts(seq, ",notify");
+
return 0;
}

@@ -391,7 +399,8 @@ enum {
Opt_err_ro, Opt_nouid32, Opt_nocheck, Opt_debug,
Opt_oldalloc, Opt_orlov, Opt_nobh, Opt_user_xattr, Opt_nouser_xattr,
Opt_acl, Opt_noacl, Opt_xip, Opt_ignore, Opt_err, Opt_quota,
- Opt_usrquota, Opt_grpquota, Opt_reservation, Opt_noreservation
+ Opt_usrquota, Opt_grpquota, Opt_reservation, Opt_noreservation,
+ Opt_err_notify,
};

static const match_table_t tokens = {
@@ -425,6 +434,7 @@ static const match_table_t tokens = {
{Opt_usrquota, "usrquota"},
{Opt_reservation, "reservation"},
{Opt_noreservation, "noreservation"},
+ {Opt_err_notify, "notify"},
{Opt_err, NULL}
};

@@ -565,6 +575,9 @@ static int parse_options (char * options,
clear_opt(sbi->s_mount_opt, RESERVATION);
printk("reservations OFF\n");
break;
+ case Opt_err_notify:
+ set_opt(sbi->s_mount_opt, ERR_NOTIFY);
+ break;
case Opt_ignore:
break;
default:
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 121720d..ecec20b 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -347,7 +347,7 @@ struct ext2_inode {
#define EXT2_MOUNT_USRQUOTA 0x020000 /* user quota */
#define EXT2_MOUNT_GRPQUOTA 0x040000 /* group quota */
#define EXT2_MOUNT_RESERVATION 0x080000 /* Preallocation */
-
+#define EXT2_MOUNT_ERR_NOTIFY 0x100000 /* Error notifications */

#define clear_opt(o, opt) o &= ~EXT2_MOUNT_##opt
#define set_opt(o, opt) o |= EXT2_MOUNT_##opt
--
1.6.3.1
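
The mount-flag handling used by the ext2 patch can be illustrated with a
small userspace model. The DEMO_* macros below are hypothetical copies of
the set_opt/clear_opt/test_opt idiom operating on a plain unsigned long
instead of a superblock. The sketch shows that the new 0x100000 bit does not
overlap the neighbouring RESERVATION bit, and that ",notify" should be shown
only when the bit is set.

```c
#include <assert.h>

#define DEMO_MOUNT_RESERVATION 0x080000UL /* mirrors EXT2_MOUNT_RESERVATION */
#define DEMO_MOUNT_ERR_NOTIFY  0x100000UL /* mirrors EXT2_MOUNT_ERR_NOTIFY */

#define demo_set_opt(o, opt)   ((o) |= DEMO_MOUNT_##opt)
#define demo_clear_opt(o, opt) ((o) &= ~DEMO_MOUNT_##opt)
#define demo_test_opt(o, opt)  (((o) & DEMO_MOUNT_##opt) != 0)

/* Returns the option word with the notify bit added. */
unsigned long demo_mount_opts_with_notify(unsigned long mount_opt)
{
	/* The new bit must not overlap the neighbouring flag. */
	assert((DEMO_MOUNT_ERR_NOTIFY & DEMO_MOUNT_RESERVATION) == 0);
	demo_set_opt(mount_opt, ERR_NOTIFY);
	return mount_opt;
}

/* Returns 1 when ",notify" should be appended to the shown options. */
int demo_show_notify(unsigned long mount_opt)
{
	return demo_test_opt(mount_opt, ERR_NOTIFY);
}
```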

2009-06-03 15:06:54

by Denis Karpov

Subject: [PATCH 2/4] FAT: generalize errors and warning printing

Generalize FAT errors and warnings reporting through
fat_fs_error() and fat_fs_warning().

Signed-off-by: Denis Karpov <[email protected]>
---
fs/fat/cache.c | 14 ++++----
fs/fat/dir.c | 11 ++++---
fs/fat/fat.h | 8 ++++-
fs/fat/fatent.c | 15 +++++----
fs/fat/file.c | 6 ++--
fs/fat/inode.c | 81 ++++++++++++++++++++++++++++---------------------
fs/fat/misc.c | 47 +++++++++++++++++++++-------
fs/fat/namei_msdos.c | 6 ++--
fs/fat/namei_vfat.c | 6 ++--
9 files changed, 118 insertions(+), 76 deletions(-)

diff --git a/fs/fat/cache.c b/fs/fat/cache.c
index b426022..b1cf11d 100644
--- a/fs/fat/cache.c
+++ b/fs/fat/cache.c
@@ -241,9 +241,9 @@ int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus)
while (*fclus < cluster) {
/* prevent the infinite loop of cluster chain */
if (*fclus > limit) {
- fat_fs_panic(sb, "%s: detected the cluster chain loop"
- " (i_pos %lld)", __func__,
- MSDOS_I(inode)->i_pos);
+ fat_fs_error(sb, __func__, "detected the cluster "
+ "chain loop (i_pos %lld)",
+ MSDOS_I(inode)->i_pos);
nr = -EIO;
goto out;
}
@@ -252,8 +252,8 @@ int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus)
if (nr < 0)
goto out;
else if (nr == FAT_ENT_FREE) {
- fat_fs_panic(sb, "%s: invalid cluster chain"
- " (i_pos %lld)", __func__,
+ fat_fs_error(sb, __func__, "invalid cluster chain"
+ " (i_pos %lld)",
MSDOS_I(inode)->i_pos);
nr = -EIO;
goto out;
@@ -285,8 +285,8 @@ static int fat_bmap_cluster(struct inode *inode, int cluster)
if (ret < 0)
return ret;
else if (ret == FAT_ENT_EOF) {
- fat_fs_panic(sb, "%s: request beyond EOF (i_pos %lld)",
- __func__, MSDOS_I(inode)->i_pos);
+ fat_fs_error(sb, __func__, "request beyond EOF (i_pos %lld)",
+ MSDOS_I(inode)->i_pos);
return -EIO;
}
return dclus;
diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index 3a7f603..390a984 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -85,8 +85,9 @@ next:

*bh = sb_bread(sb, phys);
if (*bh == NULL) {
- printk(KERN_ERR "FAT: Directory bread(block %llu) failed\n",
- (llu)phys);
+ fat_fs_warning(sb, __func__, "Directory bread(block %llu) "
+ "failed",
+ (llu)phys);
/* skip this block */
*pos = (iblock + 1) << sb->s_blocksize_bits;
goto next;
@@ -1269,8 +1270,8 @@ int fat_add_entries(struct inode *dir, void *slots, int nr_slots,
if (sbi->fat_bits != 32)
goto error;
} else if (MSDOS_I(dir)->i_start == 0) {
- printk(KERN_ERR "FAT: Corrupted directory (i_pos %lld)\n",
- MSDOS_I(dir)->i_pos);
+ fat_fs_warning(sb, __func__, "Corrupted directory (i_pos %lld)",
+ MSDOS_I(dir)->i_pos);
err = -EIO;
goto error;
}
@@ -1334,7 +1335,7 @@ found:
goto error_remove;
}
if (dir->i_size & (sbi->cluster_size - 1)) {
- fat_fs_panic(sb, "Odd directory size");
+ fat_fs_error(sb, __func__, "Odd directory size");
dir->i_size = (dir->i_size + sbi->cluster_size - 1)
& ~((loff_t)sbi->cluster_size - 1);
}
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index ea440d6..a811ac0 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -310,8 +310,12 @@ extern int fat_fill_super(struct super_block *sb, void *data, int silent,
extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
struct inode *i2);
/* fat/misc.c */
-extern void fat_fs_panic(struct super_block *s, const char *fmt, ...)
- __attribute__ ((format (printf, 2, 3))) __cold;
+extern void fat_fs_error(struct super_block *s, const char *function,
+ const char *fmt, ...)
+ __attribute__ ((format (printf, 3, 4))) __cold;
+extern void fat_fs_warning(struct super_block *s, const char *function,
+ const char *fmt, ...)
+ __attribute__ ((format (printf, 3, 4))) __cold;
extern void fat_clusters_flush(struct super_block *sb);
extern int fat_chain_add(struct inode *inode, int new_dclus, int nr_cluster);
extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec *ts,
diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index da6eea4..13f2de9 100644
--- a/fs/fat/fatent.c
+++ b/fs/fat/fatent.c
@@ -93,7 +93,8 @@ static int fat12_ent_bread(struct super_block *sb, struct fat_entry *fatent,
err_brelse:
brelse(bhs[0]);
err:
- printk(KERN_ERR "FAT: FAT read failed (blocknr %llu)\n", (llu)blocknr);
+ fat_fs_warning(sb, __func__, "FAT read failed (blocknr %llu)",
+ (llu)blocknr);
return -EIO;
}

@@ -105,8 +106,9 @@ static int fat_ent_bread(struct super_block *sb, struct fat_entry *fatent,
WARN_ON(blocknr < MSDOS_SB(sb)->fat_start);
fatent->bhs[0] = sb_bread(sb, blocknr);
if (!fatent->bhs[0]) {
- printk(KERN_ERR "FAT: FAT read failed (blocknr %llu)\n",
- (llu)blocknr);
+ fat_fs_warning(sb, __func__, "FAT read failed "
+ "(blocknr %llu)",
+ (llu)blocknr);
return -EIO;
}
fatent->nr_bhs = 1;
@@ -345,7 +347,8 @@ int fat_ent_read(struct inode *inode, struct fat_entry *fatent, int entry)

if (entry < FAT_START_ENT || sbi->max_cluster <= entry) {
fatent_brelse(fatent);
- fat_fs_panic(sb, "invalid access to FAT (entry 0x%08x)", entry);
+ fat_fs_error(sb, __func__, "invalid access to FAT "
+ "(entry 0x%08x)", entry);
return -EIO;
}

@@ -557,8 +560,8 @@ int fat_free_clusters(struct inode *inode, int cluster)
err = cluster;
goto error;
} else if (cluster == FAT_ENT_FREE) {
- fat_fs_panic(sb, "%s: deleting FAT entry beyond EOF",
- __func__);
+ fat_fs_error(sb, __func__, "deleting FAT entry beyond "
+ "EOF");
err = -EIO;
goto error;
}
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 0a7f4a9..df65446 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -213,9 +213,9 @@ static int fat_free(struct inode *inode, int skip)
fatent_brelse(&fatent);
return 0;
} else if (ret == FAT_ENT_FREE) {
- fat_fs_panic(sb,
- "%s: invalid cluster chain (i_pos %lld)",
- __func__, MSDOS_I(inode)->i_pos);
+ fat_fs_error(sb, __func__,
+ "invalid cluster chain (i_pos %lld)",
+ MSDOS_I(inode)->i_pos);
ret = -EIO;
} else if (ret > 0) {
err = fat_ent_write(inode, &fatent, FAT_ENT_EOF, wait);
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 296785a..2762145 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -76,7 +76,8 @@ static inline int __fat_get_block(struct inode *inode, sector_t iblock,
return 0;

if (iblock != MSDOS_I(inode)->mmu_private >> sb->s_blocksize_bits) {
- fat_fs_panic(sb, "corrupted file size (i_pos %lld, %lld)",
+ fat_fs_error(sb, __func__, "corrupted file size "
+ "(i_pos %lld, %lld)",
MSDOS_I(inode)->i_pos, MSDOS_I(inode)->mmu_private);
return -EIO;
}
@@ -579,8 +580,8 @@ retry:

bh = sb_bread(sb, i_pos >> sbi->dir_per_block_bits);
if (!bh) {
- printk(KERN_ERR "FAT: unable to read inode block "
- "for updating (i_pos %lld)\n", i_pos);
+ fat_fs_warning(sb, __func__, "unable to read inode block "
+ "for updating (i_pos %lld)", i_pos);
return -EIO;
}
spin_lock(&sbi->inode_hash_lock);
@@ -1107,9 +1108,8 @@ static int parse_options(char *options, int is_vfat, int silent, int *debug,
/* unknown option */
default:
if (!silent) {
- printk(KERN_ERR
- "FAT: Unrecognized mount option \"%s\" "
- "or missing value\n", p);
+ printk(KERN_ERR "FAT: Unrecognized mount "
+ "option \"%s\" or missing value\n", p);
}
return -EINVAL;
}
@@ -1118,9 +1118,9 @@ static int parse_options(char *options, int is_vfat, int silent, int *debug,
out:
/* UTF-8 doesn't provide FAT semantics */
if (!strcmp(opts->iocharset, "utf8")) {
- printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
- " for FAT filesystems, filesystem will be "
- "case sensitive!\n");
+ printk(KERN_WARNING "FAT: utf8 is not a recommended IO "
+ "charset for FAT filesystems, filesystem will be "
+ "case sensitive!\n");
}

/* If user doesn't specify allow_utime, it's initialized from dmask. */
@@ -1210,20 +1210,22 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
sb_min_blocksize(sb, 512);
bh = sb_bread(sb, 0);
if (bh == NULL) {
- printk(KERN_ERR "FAT: unable to read boot sector\n");
+ fat_fs_warning(sb, __func__, "unable to read boot sector");
goto out_fail;
}

b = (struct fat_boot_sector *) bh->b_data;
if (!b->reserved) {
if (!silent)
- printk(KERN_ERR "FAT: bogus number of reserved sectors\n");
+ fat_fs_warning(sb, __func__, "bogus number of reserved "
+ "sectors");
brelse(bh);
goto out_invalid;
}
if (!b->fats) {
if (!silent)
- printk(KERN_ERR "FAT: bogus number of FAT structure\n");
+ fat_fs_warning(sb, __func__, "bogus number of FAT "
+ "structure");
brelse(bh);
goto out_invalid;
}
@@ -1236,8 +1238,9 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
media = b->media;
if (!fat_valid_media(media)) {
if (!silent)
- printk(KERN_ERR "FAT: invalid media value (0x%02x)\n",
- media);
+ fat_fs_warning(sb, __func__, "invalid media value "
+ "(0x%02x)",
+ media);
brelse(bh);
goto out_invalid;
}
@@ -1246,23 +1249,26 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
|| (logical_sector_size < 512)
|| (logical_sector_size > 4096)) {
if (!silent)
- printk(KERN_ERR "FAT: bogus logical sector size %u\n",
- logical_sector_size);
+ fat_fs_warning(sb, __func__, "bogus logical sector "
+ "size %u",
+ logical_sector_size);
brelse(bh);
goto out_invalid;
}
sbi->sec_per_clus = b->sec_per_clus;
if (!is_power_of_2(sbi->sec_per_clus)) {
if (!silent)
- printk(KERN_ERR "FAT: bogus sectors per cluster %u\n",
- sbi->sec_per_clus);
+ fat_fs_warning(sb, __func__, "bogus sectors per "
+ "cluster %u",
+ sbi->sec_per_clus);
brelse(bh);
goto out_invalid;
}

if (logical_sector_size < sb->s_blocksize) {
- printk(KERN_ERR "FAT: logical sector size too small for device"
- " (logical sector size = %u)\n", logical_sector_size);
+ fat_fs_warning(sb, __func__, "logical sector size too small "
+ "for device"
+ " (logical sector size = %u)", logical_sector_size);
brelse(bh);
goto out_fail;
}
@@ -1270,15 +1276,16 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
brelse(bh);

if (!sb_set_blocksize(sb, logical_sector_size)) {
- printk(KERN_ERR "FAT: unable to set blocksize %u\n",
- logical_sector_size);
+ fat_fs_warning(sb, __func__, "unable to set "
+ "blocksize %u",
+ logical_sector_size);
goto out_fail;
}
bh = sb_bread(sb, 0);
if (bh == NULL) {
- printk(KERN_ERR "FAT: unable to read boot sector"
- " (logical sector size = %lu)\n",
- sb->s_blocksize);
+ fat_fs_warning(sb, __func__, "unable to read "
+ "boot sector (logical sector size = %lu)",
+ sb->s_blocksize);
goto out_fail;
}
b = (struct fat_boot_sector *) bh->b_data;
@@ -1313,8 +1320,9 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,

fsinfo_bh = sb_bread(sb, sbi->fsinfo_sector);
if (fsinfo_bh == NULL) {
- printk(KERN_ERR "FAT: bread failed, FSINFO block"
- " (sector = %lu)\n", sbi->fsinfo_sector);
+ fat_fs_warning(sb, __func__, "bread failed, FSINFO "
+ "block (sector = %lu)",
+ sbi->fsinfo_sector);
brelse(bh);
goto out_fail;
}
@@ -1343,8 +1351,9 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
sbi->dir_entries = get_unaligned_le16(&b->dir_entries);
if (sbi->dir_entries & (sbi->dir_per_block - 1)) {
if (!silent)
- printk(KERN_ERR "FAT: bogus directroy-entries per block"
- " (%u)\n", sbi->dir_entries);
+ fat_fs_warning(sb, __func__, "bogus directory-entries"
+ " per block (%u)",
+ sbi->dir_entries);
brelse(bh);
goto out_invalid;
}
@@ -1366,8 +1375,9 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
total_clusters = min(total_clusters, fat_clusters - FAT_START_ENT);
if (total_clusters > MAX_FAT(sb)) {
if (!silent)
- printk(KERN_ERR "FAT: count of clusters too big (%u)\n",
- total_clusters);
+ fat_fs_warning(sb, __func__, "count of clusters too "
+ "big (%u)",
+ total_clusters);
brelse(bh);
goto out_invalid;
}
@@ -1399,7 +1409,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
sprintf(buf, "cp%d", sbi->options.codepage);
sbi->nls_disk = load_nls(buf);
if (!sbi->nls_disk) {
- printk(KERN_ERR "FAT: codepage %s not found\n", buf);
+ fat_fs_warning(sb, __func__, "codepage %s not found", buf);
goto out_fail;
}

@@ -1407,8 +1417,9 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
if (sbi->options.isvfat) {
sbi->nls_io = load_nls(sbi->options.iocharset);
if (!sbi->nls_io) {
- printk(KERN_ERR "FAT: IO charset %s not found\n",
- sbi->options.iocharset);
+ fat_fs_warning(sb, __func__, "IO charset %s not "
+ "found",
+ sbi->options.iocharset);
goto out_fail;
}
}
@@ -1426,7 +1437,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent,
insert_inode_hash(root_inode);
sb->s_root = d_alloc_root(root_inode);
if (!sb->s_root) {
- printk(KERN_ERR "FAT: get root inode failed\n");
+ fat_fs_warning(sb, __func__, "get root inode failed");
goto out_fail;
}

diff --git a/fs/fat/misc.c b/fs/fat/misc.c
index ac39ebc..dca1b97 100644
--- a/fs/fat/misc.c
+++ b/fs/fat/misc.c
@@ -12,14 +12,17 @@
#include "fat.h"

/*
- * fat_fs_panic reports a severe file system problem and sets the file system
- * read-only. The file system can be made writable again by remounting it.
+ * fat_fs_error reports a file system problem that might indicate fs data
+ * corruption/inconsistency. The file system is remounted read-only; it can
+ * be made writable again by remounting it r/w.
*/
-void fat_fs_panic(struct super_block *s, const char *fmt, ...)
+void fat_fs_error(struct super_block *s, const char *function,
+ const char *fmt, ...)
{
va_list args;

- printk(KERN_ERR "FAT: Filesystem panic (dev %s)\n", s->s_id);
+ printk(KERN_ERR "FAT: Filesystem error (dev %s): %s:\n", s->s_id,
+ function);

printk(KERN_ERR " ");
va_start(args, fmt);
@@ -32,8 +35,27 @@ void fat_fs_panic(struct super_block *s, const char *fmt, ...)
printk(KERN_ERR " File system has been set read-only\n");
}
}
+EXPORT_SYMBOL_GPL(fat_fs_error);

-EXPORT_SYMBOL_GPL(fat_fs_panic);
+/*
+ * fat_fs_warning reports a non-critical file system problem that still
+ * might indicate fs data corruption/inconsistency.
+ */
+void fat_fs_warning(struct super_block *s, const char * function,
+ const char *fmt, ...)
+{
+ va_list args;
+
+ printk(KERN_ERR "FAT: Filesystem warning (dev %s): %s:\n", s->s_id,
+ function);
+
+ printk(KERN_ERR " ");
+ va_start(args, fmt);
+ vprintk(fmt, args);
+ printk("\n");
+ va_end(args);
+}
+EXPORT_SYMBOL_GPL(fat_fs_warning);

/* Flushes the number of free clusters on FAT32 */
/* XXX: Need to write one per FSINFO block. Currently only writes 1 */
@@ -48,15 +70,15 @@ void fat_clusters_flush(struct super_block *sb)

bh = sb_bread(sb, sbi->fsinfo_sector);
if (bh == NULL) {
- printk(KERN_ERR "FAT: bread failed in fat_clusters_flush\n");
+ fat_fs_warning(sb, __func__, "bread failed");
return;
}

fsinfo = (struct fat_boot_fsinfo *)bh->b_data;
/* Sanity check */
if (!IS_FSINFO(fsinfo)) {
- printk(KERN_ERR "FAT: Invalid FSINFO signature: "
- "0x%08x, 0x%08x (sector = %lu)\n",
+ fat_fs_warning(sb, __func__, "Invalid FSINFO signature: "
+ "0x%08x, 0x%08x (sector = %lu)",
le32_to_cpu(fsinfo->signature1),
le32_to_cpu(fsinfo->signature2),
sbi->fsinfo_sector);
@@ -124,10 +146,11 @@ int fat_chain_add(struct inode *inode, int new_dclus, int nr_cluster)
mark_inode_dirty(inode);
}
if (new_fclus != (inode->i_blocks >> (sbi->cluster_bits - 9))) {
- fat_fs_panic(sb, "clusters badly computed (%d != %llu)",
- new_fclus,
- (llu)(inode->i_blocks >> (sbi->cluster_bits - 9)));
- fat_cache_inval_inode(inode);
+ fat_fs_error(sb, __func__, "clusters badly computed "
+ "(%d != %llu)",
+ new_fclus,
+ (llu)(inode->i_blocks >> (sbi->cluster_bits - 9)));
+ fat_cache_inval_inode(inode);
}
inode->i_blocks += nr_cluster << (sbi->cluster_bits - 9);

diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index da3f361..964f378 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -608,9 +608,9 @@ error_inode:
sinfo.bh = NULL;
}
if (corrupt < 0) {
- fat_fs_panic(new_dir->i_sb,
- "%s: Filesystem corrupted (i_pos %lld)",
- __func__, sinfo.i_pos);
+ fat_fs_error(new_dir->i_sb, __func__,
+ "Filesystem corrupted (i_pos %lld)",
+ sinfo.i_pos);
}
goto out;
}
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index a0e00e3..91601d6 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1030,9 +1030,9 @@ error_inode:
sinfo.bh = NULL;
}
if (corrupt < 0) {
- fat_fs_panic(new_dir->i_sb,
- "%s: Filesystem corrupted (i_pos %lld)",
- __func__, sinfo.i_pos);
+ fat_fs_error(new_dir->i_sb, __func__,
+ "Filesystem corrupted (i_pos %lld)",
+ sinfo.i_pos);
}
goto out;
}
--
1.6.3.1
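
The calling convention introduced by this patch (the caller passes __func__
as a separate argument, so the format string moves to position 3 and the
format attribute becomes printf, 3, 4) can be modelled in userspace as
follows. This is a sketch only: struct demo_sb and demo_fs_warning() are
hypothetical, and the message is captured in a buffer on one line rather
than sent through printk() in several pieces.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct super_block used here. */
struct demo_sb {
	const char *s_id;	/* device name, like super_block.s_id */
	char log[256];		/* captures the message instead of printk() */
};

/* Format string is argument 3, variadic arguments start at 4. */
void demo_fs_warning(struct demo_sb *s, const char *function,
		     const char *fmt, ...)
	__attribute__ ((format (printf, 3, 4)));

void demo_fs_warning(struct demo_sb *s, const char *function,
		     const char *fmt, ...)
{
	va_list args;
	size_t used;

	assert(s != NULL && function != NULL && fmt != NULL);

	/* Common prefix: device and reporting function. */
	snprintf(s->log, sizeof(s->log),
		 "FAT: Filesystem warning (dev %s): %s: ", s->s_id, function);
	used = strlen(s->log);

	/* Caller-supplied message, appended after the prefix. */
	va_start(args, fmt);
	vsnprintf(s->log + used, sizeof(s->log) - used, fmt, args);
	va_end(args);
}
```

Since the prefix already names the filesystem and the function, call sites
only need to supply the specific message, without a leading "FAT: " or a
trailing newline.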

2009-06-03 15:38:10

by Eric Sandeen

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

Denis Karpov wrote:
> Hello,
>
> these patches are resent (a bit re-worked and separated from other stuff).
> The issue was discussed here:
> http://marc.info/?l=linux-fsdevel&m=124402900920380&w=2
>
> Summary:
>
> 1. Generic mechanism for notifications of user space about file system's
> errors/inconsistency on a particular partition using:
>
> - sysfs entry /sys/block/<bdev>/<part>/fs_unclean
> - uevent KOBJ_CHANGE, uevent's environment variable FS_UNCLEAN=[0:1]

My first thought here, just at a very high level, is that fs_errors
rather than fs_unclean may be more accurate; at least in my filesystem
developer world, an "unclean" filesystem is one that was not unmounted
cleanly, not one with ... errors. "fs_errors" (or fs_has_errors?)
would also be more in sync with ext3's "errors=" mount options...

> Userspace might want to monitor these notifications (poll(2) on sysfs
> file or udevd's rule for uevent) and fix the fs damage.
> Filesystem can be marked clean again by writing '0' to the
> corresponding 'fs_unclean' sysfs file.

It seems a little odd to me that you can just clear this error condition
without necessarily fixing the actual error, but I don't know how else
it should be done....

For ext2/3/4, the fs is -marked- with errors in the superblock, so when
it mounts with that error flag cleared (by fsck), the mount itself could
clear this error condition perhaps? Maybe it could be the filesystem's
choice whether the error condition is clearable from userspace?

It's also possible that the error was encountered in memory rather than
from on-disk, so it might be nice to differentiate somehow, at least for
filesystems which can do this. I'm thinking here of "I read something
from disk that was supposed to be an inode but it had the wrong magic
number" vs. "I hit a programming error that caused the transaction
subsystem to get into a state where the filesystem had to shut down" -
in the latter case, fsck is not going to resolve it...

Thanks,
-Eric

> Currently some file systems remount themselves r/o on critical errors
> (*FAT; EXT2 depending on 'errors' mount option), userspace is generally
> unaware of such events. This feature will allow user space to become
> aware of possible file system problems and do something about them
> (e.g. run fsck automatically or with user's consent).
> [PATCH 1]
>
> 2. Make FAT and EXT2 file systems use the above mechanism to optionally
> notify user space about errors. Implemented as 'notify' mount option
> (PATCH 3,4).
> FAT error reporting facilities had to be re-factored (PATCH 2) in
> order to simplify sending error notifications.
>
> Adrian Hunter and Artem Bityutskiy provided input and ideas on implementing
> these features.
>
> Denis Karpov.

2009-06-03 18:57:19

by Andrew Morton

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

On Wed, 3 Jun 2009 18:05:14 +0300
Denis Karpov <[email protected]> wrote:

>
> Hello,
>
> these patches are resent (a bit re-worked and separated from other stuff).
> The issue was discussed here:
> http://marc.info/?l=linux-fsdevel&m=124402900920380&w=2
>
> Summary:
>
> 1. Generic mechanism for notifications of user space about file system's
> errors/inconsistency on a particular partition using:
>
> - sysfs entry /sys/block/<bdev>/<part>/fs_unclean
> - uevent KOBJ_CHANGE, uevent's environment variable FS_UNCLEAN=[0:1]
>
> Userspace might want to monitor these notifications (poll(2) on sysfs
> file or udevd's rule for uevent) and fix the fs damage.
> Filesystem can be marked clean again by writing '0' to the
> corresponding 'fs_unclean' sysfs file.
>
> Currently some file systems remount themselves r/o on critical errors
> (*FAT; EXT2 depending on 'errors' mount option), userspace is generally
> unaware of such events. This feature will allow user space to become
> aware of possible file system problems and do something about them
> (e.g. run fsck automatically or with user's consent).
> [PATCH 1]
>
> 2. Make FAT and EXT2 file systems use the above mechanism to optionally
> notify user space about errors. Implemented as 'notify' mount option
> (PATCH 3,4).
> FAT error reporting facilities had to be re-factored (PATCH 2) in
> order to simplify sending error notifications.
>
> Adrian Hunter and Artem Bityutskiy provided input and ideas on implementing
> these features.
>

hm, I'm uncertain on the desirability or otherwise of the overall feature.

Are there users or distros or device manufacturers asking for this?
Where did the requirement come from?

What downstream application will handle the uevent messages? Do you
have some userspace design/plan in mind?

IOW, it would be useful if we were told more about all of this, rather
than just staring at a kernel patch!

One part of the design which you didn't describe, but which I inferred
is that you intend that userspace will see the FS_UNCLEAN=1 messages
and will then poll all the /sys/block/<bdev>/<part>/fs_unclean files to
work out which partition(s) got the error, correct? Please spell all
that out in the changelog.

What use is the FS_UNCLEAN=0 message? I don't get that. Again, please
cover this in the description.

The "unclean" term doesn't seem a good fit. It usually means "has
in-memory data which needs writing back". But here you've redefined
"unclean" to mean "got an IO error" or "detected metadata
inconsistency", or perhaps "dunno, please run fsck to find out". This
all should be spelled out in exacting detail and thought about, please.

2009-06-03 18:58:54

by Andrew Morton

Subject: Re: [PATCH 1/4] FS: filesystem corruption notification

On Wed, 3 Jun 2009 18:05:15 +0300
Denis Karpov <[email protected]> wrote:

> Add a generic mechanism to notify the userspace about possible filesystem
> corruption through sysfs entry (/sys/block/<bdev>/<part>/fs_unclean)
> and uevent (KOBJ_CHANGE, uevent's environment variable FS_UNCLEAN=[0:1]).
>
> To mark fs clean (e.g. after fs was fixed by userspace):
>
> echo 0 > /sys/block/<bdev>/<part>/fs_unclean
> (you will still receive uevent KOBJ_CHANGE with env var FS_UNCLEAN=0)
>
> Signed-off-by: Denis Karpov <[email protected]>
> ---
> fs/partitions/check.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> include/linux/genhd.h | 2 ++
> 2 files changed, 44 insertions(+), 0 deletions(-)
>
> diff --git a/fs/partitions/check.c b/fs/partitions/check.c
> index 99e33ef..191d89e 100644
> --- a/fs/partitions/check.c
> +++ b/fs/partitions/check.c
> @@ -196,6 +196,24 @@ check_partition(struct gendisk *hd, struct block_device *bdev)
> return ERR_PTR(res);
> }
>
> +void notify_part_fs_unclean(struct device *dev, uint8_t unclean)
> +{
> + char event_string[13];
> + char *envp[] = { event_string, NULL };
> +
> + if ((unclean != 0 && unclean != 1) ||
> + unclean == dev_to_part(dev)->fs_unclean)
> + return;
> +
> + dev_to_part(dev)->fs_unclean = unclean;
> +
> + sysfs_notify(&dev->kobj, NULL, "fs_unclean");
> +
> + sprintf(event_string, "FS_UNCLEAN=%u", unclean);
> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(notify_part_fs_unclean);

Please document this function.

That documentation should, amongst other things, explain the semantics
of the `unclean' argument. It can be 0, 1 or "something else". Why?

Also, why is `unclean' a u8 rather than a boring old `int'?
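
To make the question about the `unclean' argument concrete, here is a minimal userspace model of the logic in the quoted hunk. The names part_state and model_notify are made up for illustration, and the events counter merely stands in for the sysfs_notify() and kobject_uevent_env() calls. It is a sketch of the semantics as the patch reads, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a partition's error flag. */
struct part_state {
	int fs_unclean;		/* 0 = no error seen, 1 = error reported */
	unsigned int events;	/* number of notifications emitted so far */
};

/*
 * Mirrors the checks in the quoted notify_part_fs_unclean():
 * values other than 0 or 1 are ignored, and a notification is
 * emitted only when the stored state actually changes.
 * Returns true if a notification was emitted.
 */
static bool model_notify(struct part_state *p, int unclean)
{
	assert(p != NULL);

	if (unclean != 0 && unclean != 1)
		return false;		/* out-of-range value: silently ignored */
	if (unclean == p->fs_unclean)
		return false;		/* no state change: no event */

	p->fs_unclean = unclean;
	p->events++;			/* stands in for sysfs_notify() + uevent */
	return true;
}
```

In this model a second error on an already flagged partition produces no further event, and any value other than 0 or 1 is dropped without feedback to the caller, which is the behaviour the documentation would need to spell out.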

> static ssize_t part_partition_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> {
> @@ -246,6 +264,26 @@ ssize_t part_stat_show(struct device *dev,
> jiffies_to_msecs(part_stat_read(p, time_in_queue)));
> }
>
> +ssize_t part_fs_unclean_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct hd_struct *p = dev_to_part(dev);
> +
> + return sprintf(buf, "%d\n", p->fs_unclean);
> +}
> +
> +ssize_t part_fs_unclean_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int i;
> +
> + if (count > 0 && sscanf(buf, "%d", &i) > 0)

strict_strtoul(), please.

> + notify_part_fs_unclean(dev, (i == 0) ? 0 : 1);
> +
> + return count;
> +}
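
For illustration, the difference between sscanf("%d") and strict parsing can be shown in plain userspace C. The helper below is hypothetical (parse_strict_ulong is not a kernel function); it uses strtoul() with an end-pointer check to approximate what a strict parser does: the whole string must be an unsigned decimal number, optionally followed by a single newline.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace stand-in for strict number parsing (not the kernel helper).
 * Accepts an unsigned decimal number, optionally followed by one
 * newline, and rejects everything else.
 * Returns 0 on success and stores the value in *out, -1 on error.
 */
static int parse_strict_ulong(const char *buf, unsigned long *out)
{
	char *end;
	unsigned long val;

	assert(buf != NULL && out != NULL);

	/* strtoul() would skip leading spaces and accept a sign; refuse both. */
	if (*buf < '0' || *buf > '9')
		return -1;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno == ERANGE)
		return -1;		/* value does not fit in unsigned long */

	if (*end == '\n')
		end++;			/* tolerate a single trailing newline */
	if (*end != '\0')
		return -1;		/* trailing garbage such as "1abc" */

	*out = val;
	return 0;
}
```

With sscanf(buf, "%d", &i), input such as "1abc" is accepted as 1; the sketch above rejects it, so a malformed write can be reported as an error rather than being treated as a valid value.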

2009-06-03 19:00:31

by Andrew Morton

Subject: Re: [PATCH 3/4] FAT: add 'notify' mount option

On Wed, 3 Jun 2009 18:05:17 +0300
Denis Karpov <[email protected]> wrote:

> Implement FAT fs mount option 'notify'. The effect of this option
> is that a notification is sent to userspace on errors that indicate
> filesystem damage/inconsistency. Generic filesystem corruption
> notification mechanism is used.
>
> Signed-off-by: Denis Karpov <[email protected]>
> ---
> fs/fat/fat.h | 3 ++-
> fs/fat/inode.c | 9 ++++++++-
> fs/fat/misc.c | 7 +++++++

fatfs mount options are documented in
Documentation/filesystems/vfat.txt; please document 'notify' there.

2009-06-03 19:01:14

by Andrew Morton

Subject: Re: [PATCH 4/4] EXT2: add 'notify' mount option

On Wed, 3 Jun 2009 18:05:18 +0300
Denis Karpov <[email protected]> wrote:

> Implement EXT2 fs mount option 'notify'. The effect of this option
> is that a notification is sent to userspace on errors that indicate
> filesystem damage/inconsistency. Generic filesystem corruption
> notification mechanism is used.
>
> Signed-off-by: Denis Karpov <[email protected]>
> ---
> fs/ext2/super.c | 15 ++++++++++++++-
> include/linux/ext2_fs.h | 2 +-

Please document the mount option in Documentation/filesystems/ext2.txt.

2009-06-03 22:30:19

by Jan Kara

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

Hi,

> 2. Make FAT and EXT2 file systems use the above mechanism to optionally
> notify user space about errors. Implemented as 'notify' mount option
> (PATCH 3,4).
> FAT error reporting facilities had to be re-factored (PATCH 2) in
> order to simplify sending error notifications.
One question: why would a filesystem / admin not want the notification
to be sent? In other words, what's the point of the mount option?

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-06-04 01:59:53

by Jamie Lokier

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

Andrew Morton wrote:
> The "unclean" term doesn't seem a good fit. It usually means "has
> in-memory data which needs writing back". But here you've redefined
> "unclean" to mean "got an IO error" or "detected metadata
> inconsistency", or perhaps "dunno, please run fsck to find out". This
> all should be spelled out in exacting detail and thought about, please.

I agree. "unclean" (or "dirty") should be reserved for indicating
that the filesystem has been modified, that is, files written to etc.

Use another term like "fault" or "error"?

-- Jamie

2009-06-04 06:01:39

by Artem Bityutskiy

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

Andrew Morton wrote:
> hm, I'm uncertain on the desirability or otherwise of the overall feature.
>
> Are there users or distros or device manufacturers asking for this?
> Where did the requirement come from?
>
> What downstream application will handle the uevent messages? Do you
> have some userspace design/plan in mind?
>
> IOW, it would be useful if we were told more about all of this, rather
> than just staring at a kernel patch!

The original idea came from me, while the whole implementation
and design were done by Denis, so I'll comment on this.

Our use-case is about hand-held devices. We are particularly
working with large FAT volumes on MMC. Do not question please
why it is FAT and not something else :-) Anyway, FAT is very
unreliable, and often hits errors, in which case it simply
switches to read-only mode, and usually prints something to
the printk ring buffer.

When FAT becomes read only out of the blue, the user-space
reaction is very different. Often applications just start
failing, dying, etc. From users' perspective, the hand-held
just becomes weird.

What we want instead is to teach FAT to send user space a
notification. What our user-space people plan to do is to
catch the notification and show a dialog window which says
something like "Please, check your FS, blah blah", and maybe
offer the user to run fsck.vfat, not exactly sure.

> One part of the design which you didn't describe, but which I inferred
> is that you intend that userspace will see the FS_UNCLEAN=1 messages
> and will then poll all the /sys/block/<bdev>/<part>/fs_unclean files to
> work out which partition(s) got the error, correct? Please spell all
> that out in the changelog.

I think this part of the design needs more thought. Not
all FSes have block devices (UBIFS, JFFS2), and some FSes
may (theoretically) span more than one block device (btrfs?).

Probably it is better to go without any sysfs file and
just send udev events.

> What use is the FS_UNCLEAN=0 message? I don't get that. Again, please
> cover this in the description.

Yes, the description should be improved. I think the idea is that
we add a udev rule which invokes a certain user-space script/app
on "FAT became R/O" events.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2009-06-04 06:12:58

by Artem Bityutskiy

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

ext Jan Kara wrote:
> Hi,
>
>> 2. Make FAT and EXT2 file systems use the above mechanism to optionally
>> notify user space about errors. Implemented as 'notify' mount option
>> (PATCH 3,4).
>> FAT error reporting facilities had to be re-factored (PATCH 2) in
>> order to simplify sending error notifications.
> One question: Why would not a filesystem / admin want the notification
> to be sent? Or in other words what's the point of the mount option?

I agree on this. I guess we should instead implement a VFS helper
like 'send_error_notification(sb, error_type)'. Then we can teach
certain FSes to utilize it. We are particularly interested in FAT,
actually.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2009-06-04 12:53:44

by Kay Sievers

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

On Wed, Jun 3, 2009 at 20:56, Andrew Morton <[email protected]> wrote:
> On Wed,  3 Jun 2009 18:05:14 +0300 Denis Karpov <[email protected]> wrote:

> hm, I'm uncertain on the desirability or otherwise of the overall feature.
>
> Are there users or distros or device manufacturers asking for this?
> Where did the requirement come from?

I think we really need something like this to propagate such errors to
userspace. Printing something to the kernel log is not a useful
interface in any way. But I don't think we want it that way, not with
uevents, not with sysfs, and not tied to block devices.

Uevents should not be used for error reporting, unless it is well
defined within the _device_ context, which a filesystem on top of a
blockdev isn't. We could argue to get events for bad blocks of a
device, but I don't think we want filesystem related stuff ever in
device uevents. For the same reason, there should be no unconditional
fs-specific sysfs file below a block device.

Block device interfaces for filesystems can not handle device-less
virtual mounts which are common these days. There is no direct
relation from the device to the filesystem - so this would only work
for simple direct mounts, which isn't sufficient for a higher-level
interface like this.

And I don't think we want several event sources for the same thing,
uevents _and_ pollable sysfs files.

We already raise events on /proc/self/mountinfo when the mount tree
changes, I guess that's where fs specific stuff belongs, and it will
work with all kind of filesystem setups, regardless of the devices
below it. This is also the established interface for flags and options
and the current state of the filesystem, and does not mix filesystem
options into block device interfaces.

/proc/self/mountinfo could also work properly with namespaces which
might have different meaning for a device in a different namespace.

Thanks,
Kay

2009-06-04 14:28:45

by Denis Karpov

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

On Thu, Jun 04, 2009 at 07:57:58AM +0200, Bityutskiy Artem (Nokia-D/Helsinki) wrote:
> Andrew Morton wrote:
> > hm, I'm uncertain on the desirability or otherwise of the overall feature.
> >
> > Are there users or distros or device manufacturers asking for this?
> > Where did the requirement come from?
> >
> > What downstream application will handle the uevent messages? Do you
> > have some userspace design/plan in mind?
> >
> > IOW, it would be useful if we were told more about all of this, rather
> > than just staring at a kernel patch!
>
> As the original idea came from me, while whole implementation
> and design was done by Denis, I'll comment on this.
>
> Our use-case is about hand-held devices. We are particularly
> working with large FAT volumes on MMC. Do not question please
> why it is FAT and not something else :-) Anyway, FAT is very
> unreliable, and often hits errors, in which case it simply
> switches to read-only mode, and usually prints something to
> the printk ring buffer.
>
> When FAT becomes read only out of the blue, the user-space
> reaction is very different. Often applications just start
> failing, dying, etc. From users' perspective, the hand-held
> just becomes weird.
>
> What we want instead is to teach FAT to send the user-space a
> notification. What our user-space people think to do is to
> catch the notification and show a dialog window which tells
> something like "Please, check your FS, blah blah", and may
> be offer the user to run fsck.vfat, not exactly sure.

I can only add that we partially worked around the problem by implementing
'errors=[remount-ro|continue|panic]' for FAT, just as it's done for ext2.
http://marc.info/?t=124395937100042&r=1&w=2
Still, getting a notification would help userspace.

I'll fix everything related to comments on missing documentation and bad
naming ("fs_error" is ok, I suppose?).

Clearing the fs_error attribute from user space is wrong, I agree.
The attribute shall be made read-only and reset at mount time (as we
assume we are starting with a clean^H^H^H^H good filesystem). On the
error event, userspace would be expected to umount the partition,
fsck it and mount it back.

> > One part of the design which you didn't describe, but which I inferred
> > is that you intend that userspace will see the FS_UNCLEAN=1 messages
> > and will then poll all the /sys/block/<bdev>/<part>/fs_unclean files to
> > work out which partition(s) got the error, correct? Please spell all
> > that out in the changelog.
>
> I think this part of the design needs more thought. Not
> all FSes have block devices (UBIFS, JFFS2), and some FSes
> may (theoretically) span more than one block device (btrfs?).

Big thanks to everybody participating in this thread, for reviews and critiques.
Here's a proposal/RFC for another way to implement this feature:

Taking into account Artem's and Kay's comments, indeed, having attributes
like 'fs_error' tied to a block device does not seem right.
What we need is an object/entity that:

- is not associated with a block device
- is not associated with a partition
- is not associated with a filesystem as a general entity
- is uniquely associated with a filesystem's 'instance': a mounted volume
carrying that filesystem
- appears at volume mount time and disappears with volume unmount

Sounds like the "fs" kobject class solves this problem. ext4 is an
example of such a kset and kobjects:

/sys/fs/<kset>/<kobjects_fs_volumes>/<attributes>
(e.g. /sys/fs/ext4/sda1/...)
Currently there are no uevents associated with those kobjects and their
attributes.

Currently only ext4 and fuse register kobjects/ksets in the fs class.
I suggest implementing a corresponding feature for FAT (and any other
filesystem that might need to expose certain
internal data/statistics/parameters/info to userspace).
That's what the fs class was meant for, wasn't it?

/sys/fs/<fs_name>/<volume>/{attributes}
(e.g. /sys/fs/fat/mmcblk0p1/{mount_point,fs_type,fs_error})
kset: fat
kobjects: fat volumes
attributes:
mount_point : <path>, ro
fs_type : <msdos|fat|vfat>, ro
fs_error : <0|1>, ro, when FS is mounted this is set to 0;
upon error this is set to 1, uevent KOBJ_CHANGE is optionally
sent, with following vars:

On fs volume mount/umount: KOBJ_ADD/KOBJ_REMOVE
Env vars:
ACTION=[add|remove]
DEVPATH=/sys/fs/fat/<partition_bdev>
SUBSYSTEM=fs
SEQNUM=<sequence number>
MOUNT_POINT=[path]
FS_TYPE=[msdos|fat|vfat]

On fs error during run-time: KOBJ_CHANGE
ACTION=[change]
DEVPATH=/sys/fs/fat/<partition_bdev>
SUBSYSTEM=fs
SEQNUM=<sequence number>
MOUNT_POINT=[path]
FS_NAME=[msdos|fat|vfat]
FS_ERRORS=1

Whether to have only the sysfs structure for polling, only the uevent
interface, or both is still an open question for me. In the context of these
specific kobjects, the uevents can be specified clearly enough.
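
On the receiving side, a uevent payload is a sequence of NUL-terminated "KEY=value" strings, so picking out a variable takes only a few lines. The following is a minimal sketch: uevent_get is a hypothetical helper, and FS_ERRORS and MOUNT_POINT are the variable names from the proposal above, not something the kernel emits today. Reading the message from a netlink socket is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up KEY in a uevent payload: NUL-terminated "KEY=value" strings
 * packed back to back in buf[0..len).
 * Returns a pointer to the value, or NULL if the key is absent.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen, off = 0;

	assert(buf != NULL && key != NULL);
	klen = strlen(key);

	while (off < len) {
		const char *s = buf + off;
		const char *nul = memchr(s, '\0', len - off);
		size_t slen;

		if (nul == NULL)
			break;		/* unterminated tail: stop scanning */
		slen = (size_t)(nul - s);

		/* Match "KEY=" exactly, so that "FS" does not match "FS_ERRORS". */
		if (slen > klen && s[klen] == '=' && memcmp(s, key, klen) == 0)
			return s + klen + 1;

		off += slen + 1;	/* step over the string and its NUL */
	}
	return NULL;
}
```

A listener would call uevent_get(buf, len, "FS_ERRORS") on each received message and act only when the result is non-NULL.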

Taking the above one step further, this could be done automatically for all
filesystems - registration of a standard kset and kobjects in fs class
under /sys/fs. A filesystem should be able to extend the standard kobjects to
add its specific attributes. Signalling with uevents can be made
optional (as a parameter of an attribute registration, not as a fs mount
option).

Answering Eric's comments, the above design would give enough flexibility
to report different types of errors/events, depending on particular
filesystem's needs.

Denis Karpov

2009-06-04 14:44:47

by Russell Cattelan

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kay Sievers wrote:
> On Wed, Jun 3, 2009 at 20:56, Andrew Morton
> <[email protected]> wrote:
>> On Wed, 3 Jun 2009 18:05:14 +0300 Denis Karpov
>> <[email protected]> wrote:
>
>> hm, I'm uncertain on the desirability or otherwise of the overall
>> feature.
>>
>> Are there users or distros or device manufacturers asking for
>> this? Where did the requirement come from?
>
> I think we really need something like this to propagate such errors
> to userspace. Printing something to the kernel log is not a useful
> interface in any way. But I don't think we want it that way, not
> with uevents, not with sysfs, and not tied to block devices.
>
> Uevents should not be used for error reporting, unless it is well
> defined within the _device_ context, which a filesystem on top of a
> blockdev isn't. We could argue to get events for bad blocks of a
> device, but I don't think we want filesystem related stuff ever in
> device uevents. For the same reason, there should be no
> unconditional fs-specific sysfs file below a block device.
>
> Block device interfaces for filesystems can not handle device-less
> virtual mounts which are common these days. There is no direct
> relation from the device to the filesystem - so this would only
> work for simple direct mounts, which isn't sufficient for a
> higher-level interface like this.
>
> And I don't think we want several event sources for the same thing,
> uevents _and_ pollable sysfs files.
>
> We already raise events on /proc/self/mountinfo when the mount tree
> changes, I guess that's where fs specific stuff belongs, and it
> will work with all kind of filesystem setups, regardless of the
> devices below it. This is also the established interface for flags
> and options and the current state of the filesystem, and does not
> mix filesystem options into block device interfaces.
Given that the /proc/self/mountinfo infrastructure already exists, it
seems like a better place to add fs_fault notifications.

What I'm wondering is whether that is going to be enough. What actions
should be taken next to correct the issue? I can see that in the case of
a hand-held device using FAT on MMC the options are limited, but in
large SAN environments the fault may be due to any number of failures
and thus require very different actions.

Coming up with an interface that has somewhat richer error reporting
would probably be better in the long run. Userspace utils could then
decide whether the error is something they can handle automatically or
should leave alone and let the admin/user handle.

>
> /proc/self/mountinfo could also work properly with namespaces which
> might have different meaning for a device in a different
> namespace.
>
> Thanks, Kay

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFKJ9o+NRmM+OaGhBgRAttkAJ9cTSqDMdjTztLu4UMt4PYpG8vB0gCfYBog
3RpJbi2pE+qRSBCUL/ka8bo=
=Nkoe
-----END PGP SIGNATURE-----

2009-06-05 11:09:10

by Artem Bityutskiy

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

Kay Sievers wrote:
> And I don't think we want several event sources for the same thing,
> uevents _and_ pollable sysfs files.
>
> We already raise events on /proc/self/mountinfo when the mount tree
> changes, I guess that's where fs specific stuff belongs, and it will
> work with all kind of filesystem setups, regardless of the devices
> below it. This is also the established interface for flags and options
> and the current state of the filesystem, and does not mix filesystem
> options into block device interfaces.
>
> /proc/self/mountinfo could also work properly with namespaces which
> might have different meaning for a device in a different namespace.

Well, Denis suggests /sys/fs instead. But how would we pass stuff like
an error code via /proc/self/mountinfo? And what if later someone wants
to provide user space with stuff like a bogus inode number? IMO, /sys/fs
sounds better.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2009-06-05 11:52:01

by Denis Karpov

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

On Fri, Jun 05, 2009 at 01:07:59PM +0200, Bityutskiy Artem (Nokia-D/Helsinki) wrote:
> Kay Sievers wrote:
> > And I don't think we want several event sources for the same thing,
> > uevents _and_ pollable sysfs files.
> >
> > We already raise events on /proc/self/mountinfo when the mount tree
> > changes, I guess that's where fs specific stuff belongs, and it will
> > work with all kind of filesystem setups, regardless of the devices
> > below it. This is also the established interface for flags and options
> > and the current state of the filesystem, and does not mix filesystem
> > options into block device interfaces.
> >
> > /proc/self/mountinfo could also work properly with namespaces which
> > might have different meaning for a device in a different namespace.
>
> Well, Denis suggests /sys/fs instead. But how would we pass stuff like
> error code via /proc/self/mountinfo? And what if later some one wants
> to provide user-space stuff like bogus inode number?

This is doable, e.g. in the form of optional fields "tag[:value]"
(field 7, Documentation/filesystems/proc.txt for mountinfo).
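
As a rough illustration of what userspace would have to do with such a tag, the sketch below scans the optional fields of a single mountinfo line. The helper name mountinfo_tag and the "fs_error" tag are assumptions made for the example; only the field layout (optional fields from field 7 up to the lone "-" separator) follows the documented format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Search the optional fields of one /proc/self/mountinfo line for TAG.
 * Optional fields start at field 7 and end at the lone "-" separator.
 * On a match, copies the value (empty if the tag has none) into val
 * and returns 1; returns 0 if the tag is absent.
 */
static int mountinfo_tag(const char *line, const char *tag,
			 char *val, size_t vlen)
{
	size_t tlen, field = 1;
	const char *p = line;

	assert(line != NULL && tag != NULL && val != NULL && vlen > 0);
	tlen = strlen(tag);

	while (*p != '\0') {
		size_t flen = strcspn(p, " \n");	/* length of this field */

		if (field >= 7) {
			if (flen == 1 && p[0] == '-')
				return 0;	/* separator: optional fields are over */
			if (flen >= tlen && memcmp(p, tag, tlen) == 0 &&
			    (flen == tlen || p[tlen] == ':')) {
				const char *v = (flen == tlen) ? "" : p + tlen + 1;
				size_t n = (flen == tlen) ? 0 : flen - tlen - 1;

				if (n >= vlen)
					n = vlen - 1;	/* truncate to the buffer size */
				memcpy(val, v, n);
				val[n] = '\0';
				return 1;
			}
		}
		p += flen;
		while (*p == ' ' || *p == '\n')
			p++;			/* skip field separators */
		field++;
	}
	return 0;
}
```

Splitting on spaces is sufficient here because mountinfo escapes spaces inside paths as octal sequences.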

Kay, sorry I didn't answer to your email separately. I tried to summarize
and address all the posted comments/critiques in a single email earlier
in this thread.
http://marc.info/?l=linux-kernel&m=124412575828015&w=2

But is using procfs generally a good idea? Over the last several years a lot of
stuff has moved out of procfs into sysfs. Not to forget what procfs was
originally meant for: storing process-related information.

/proc/self/mountinfo solution:
pros:
- existing solution
cons:
- polling only
- dedicated userspace tool to poll/parse/act
- additional parsing overhead and event filtering (mountinfo changes for many
reasons)
- probably this info does not belong to procfs

/sys/fs/<fs>/<volume>/{attributes,..} solution:
pros:
- nice hierarchy reflecting structure of entities in the kernel
- extensible (other errors, conditions, events can be reflected)
- no parsing: dedicated file for each attribute
- uevent interface with existing userspace tool (udev);
(polling is still possible)
- /sys/fs seems to be a perfect fit for the purpose judging by ext4 example
cons:
- uevent interface is unneeded extra(?); can be made optional, per attribute
- ...

Denis

2009-06-05 13:07:22

by Kay Sievers

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

On Fri, Jun 5, 2009 at 13:51, Denis Karpov<[email protected]> wrote:
> This is doable, e.g. in the form of optional fields "tag[:value]"
> (field 7, Documentation/filesystems/proc.txt for mountinfo).

> But is using procfs generally a good idea? Over the last several years a lot of
> stuff has moved out of procfs into sysfs. Not to forget what procfs was
> originally meant for: storing process-related information.

Yeah, but mounted volumes are namespace dependent, and namespaces are
process dependent. So events for your current namespace wouldn't be
too bad here. There might be reasons we don't want the mountinfo file,
but the "use sysfs for new stuff" does not count in this case. :)

> /proc/self/mountinfo solution:
> pros:
> - existing solution
> cons:
> - polling only
> - dedicated userspace tool to poll/parse/act
> - additional parsing overhead and event filtering (mountinfo changes for many
>  reasons)
> - probably this info does not belong to procfs

Userspace polls it today already on most boxes, to find out if and
where something was mounted.

> /sys/fs/<fs>/<volume>/{attributes,..} solution:
> pros:
> - nice hierarchy reflecting structure of entities in the kernel
> - extensible (other errors, conditions, events can be reflected)
> - no parsing: dedicated file for each attribute
> - uevent interface with existing userspace tool (udev);
>  (polling is still possible)

The uevent interface would need a rate limit inside the kernel.
Uevents are very expensive in userspace and you need to make sure,
that such an error reporting can never raise hundreds or thousands of
events, in no situation.
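
A minimal sketch of the kind of limit meant here, written as a plain C model rather than kernel code. The struct and function names are hypothetical, and time is passed in by the caller in arbitrary ticks: at most `burst` events are let through per `interval`, and the rest are counted as missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-window limiter: at most `burst` events per `interval`. */
struct event_limit {
	unsigned long interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* events allowed per window */
	unsigned long begin;	/* start of the current window */
	unsigned int sent;	/* events emitted in the current window */
	unsigned int missed;	/* events suppressed in the current window */
};

/* Returns true if an event may be emitted at time `now`. */
static bool event_allowed(struct event_limit *rl, unsigned long now)
{
	assert(rl != NULL);

	if (now - rl->begin >= rl->interval) {
		rl->begin = now;	/* a new window starts: reset the counters */
		rl->sent = 0;
		rl->missed = 0;
	}
	if (rl->sent < rl->burst) {
		rl->sent++;
		return true;
	}
	rl->missed++;			/* over the limit: suppress this event */
	return false;
}
```

With a limit like this, a filesystem that hits the same error in a tight loop raises only a bounded number of events per window, however often the error path runs.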

> - /sys/fs seems to be a perfect fit for the purpose judging by ext4 example
> cons:
> - uevent interface is unneeded extra(?); can be made optional, per attribute

You cannot pass the mount path with the uevent, like your example
shows; you just don't know that reliably, and there can be many mount
points.

How do you want to name the /sys/fs/ device? By "dev_t st_dev" or the
underlying block device name? How do you identify the mountpoint in
your current namespace, of the device that raised the error? The event
might be for a filesystem you cannot reach at all in your mount tree.

The /sys/fs/ approach sounds very much like an "export known
superblocks in /sys/fs/", something like this could be useful, but we
need to check carefully with other people what are the issues of such
an interface, and if there is something that should not be exported
that way.

How are device-less superblocks like btrfs handled in such an
interface, how is the device named, if it does not have a direct block
device underneath?

In any case, we definitely need something better than dmesg to pass
filesystem errors from the kernel to userspace, so this discussion is
much appreciated.

Thanks,
Kay

2009-06-05 14:59:50

by Jon Masters

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

On Thu, 2009-06-04 at 14:53 +0200, Kay Sievers wrote:
> On Wed, Jun 3, 2009 at 20:56, Andrew Morton <[email protected]> wrote:
> > On Wed, 3 Jun 2009 18:05:14 +0300 Denis Karpov <[email protected]> wrote:
>
> > hm, I'm uncertain on the desirability or otherwise of the
> > overall feature.

I think the idea is a good one. It allows a distribution to take more
proactive measures before services just start dying randomly later. I
could see this nicely wrapped up with some kind of GUI dialog presented
to the desktop user announcing impending doom, too.

> Uevents should not be used for error reporting

Not as currently implemented - but I think the idea of having kernel
events for this kind of thing isn't a bad one, nor is the idea of
listening on a netlink socket for news about them rather than writing
library code to poll whatever procfs file and parse its content.

Jon.

2009-06-09 13:50:04

by Jan Kara

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

> On Fri, Jun 5, 2009 at 13:51, Denis Karpov<[email protected]> wrote:
> > This is doable, e.g. in the form of optional fields "tag[:value]"
> > (field 7, Documentation/filesystems/proc.txt for mountinfo).
>
> > But is using procfs generally a good idea? Over the last several years a lot of
> > stuff has moved out of procfs into sysfs. Not to forget what procfs was
> > originally meant for: storing process-related information.
>
> Yeah, but mounted volumes are namespace dependent, and namespaces are
> process dependent. So events for your current namespace wouldn't be
> too bad here. There might be reasons we don't want the mountinfo file,
> but the "use sysfs for new stuff" does not count in this case. :)
As much as I don't like using /proc/mountinfo for this, it also
has the advantage that it nicely solves the problem with a filesystem
being bind-mounted on several directories... Otherwise you have to
solve the problem which mountpoint of the filesystem should be used
(especially because the real root of the filesystem need not be
accessible from the namespace of the process).

> > - /sys/fs seems to be a perfect fit for the purpose judging by ext4 example
> > cons:
> > - uevent interface is unneeded extra(?); can be made optional, per attribute
>
> You cannot pass the mount path with the uevent, like your example
> shows; you just don't know that reliably, and there can be many mount
> points.
>
> How do you want to name the /sys/fs/ device? By "dev_t st_dev" or the
> underlying block device name? How do you identify the mountpoint in
> your current namespace, of the device that raised the error? The event
> might be for a filesystem you can not reach at all in your mount tree.
>
> The /sys/fs/ approach sounds very much like an "export known
> superblocks in /sys/fs/", something like this could be useful, but we
> need to check carefully with other people what are the issues of such
> an interface, and if there is something that should not be exported
> that way.
>
> How are device-less superblocks like btrfs handled in such an
> interface, how is the device named, if it does not have a direct block
> device underneath?
Generally, it is unclear how the kernel should identify the
filesystem where the problem happened. We could have a kobject for
the superblock, but it's still unclear how to map this to a device /
mountpoints, because that's what userspace ultimately wants to tell the
sysadmin and possibly use for some automated decision.
What currently seems the cleanest solution to me is to add some
"filesystem identifier" field to /proc/self/mountinfo (which could be
UUID, superblock pointer or whatever) and pass this along with the error
message to userspace. Passing could be done either via sysfs (but I
agree it isn't the best fit because a filesystem need not be bound to a
device) or just via generic netlink (which has the disadvantage that you
cannot use the udev framework everyone knows)...

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-06-10 21:03:45

by Pavel Machek

Subject: Re: [PATCH 4/4] EXT2: add 'notify' mount option

On Wed 2009-06-03 18:05:18, Denis Karpov wrote:
> Implement EXT2 fs mount option 'notify'. The effect of this option
> is that a notification is sent to userspace on errors that indicate
> filesystem damage/inconsistency. Generic filesystem corruption
> notification mechanism is used.

By the time you start checking the volume, you may have already damaged
data, right?

(Imagine two inodes pointing to the same block, let's say /etc/shadow and
/tmp/foo. Writes to /tmp/foo will now kill your passwords. fsck would
duplicate the blocks, but I do not think internal checking in ext2
would catch it soon enough).
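
A rough sketch of why this needs a whole-volume scan: finding a block
claimed by two inodes means looking at every inode's block list, not just
the metadata touched by the current operation. The toy model below
(struct toy_inode, NBLOCKS and count_shared_blocks() are all hypothetical,
not ext2 or e2fsck code) marks each block on first use and counts any
later claim on an already-marked block.

```c
#include <stddef.h>
#include <string.h>

#define NBLOCKS 64  /* size of the toy volume (assumption for the sketch) */

struct toy_inode {
    const unsigned int *blocks; /* block numbers this inode points to */
    size_t nblocks;
};

/*
 * Walk every inode's block list and return how many times a block is
 * claimed that some earlier claim already owns.
 */
static size_t count_shared_blocks(const struct toy_inode *inodes, size_t n)
{
    unsigned char claimed[NBLOCKS];
    size_t i, j, dups = 0;

    memset(claimed, 0, sizeof(claimed));
    for (i = 0; i < n; i++) {
        for (j = 0; j < inodes[i].nblocks; j++) {
            unsigned int b = inodes[i].blocks[j];

            if (b >= NBLOCKS)
                continue;   /* out of range: a different kind of error */
            if (claimed[b])
                dups++;     /* second owner of the same block */
            else
                claimed[b] = 1;
        }
    }
    return dups;
}
```

With /etc/shadow owning blocks {10, 11, 12} and /tmp/foo owning {20, 11},
the function reports one shared block, but only because it saw both
inodes in the same pass.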

> Signed-off-by: Denis Karpov <[email protected]>
> ---
> fs/ext2/super.c | 15 ++++++++++++++-
> include/linux/ext2_fs.h | 2 +-
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index 5c4afe6..04802cd 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -32,6 +32,7 @@
> #include <linux/mount.h>
> #include <linux/log2.h>
> #include <linux/quotaops.h>
> +#include <linux/genhd.h>
> #include <asm/uaccess.h>
> #include "ext2.h"
> #include "xattr.h"
> @@ -68,6 +69,8 @@ void ext2_error (struct super_block * sb, const char * function,
> printk("Remounting filesystem read-only\n");
> sb->s_flags |= MS_RDONLY;
> }
> + if (test_opt(sb, ERR_NOTIFY))
> + notify_part_fs_unclean(part_to_dev(sb->s_bdev->bd_part), 1);
> }
>
> void ext2_warning (struct super_block * sb, const char * function,
> @@ -81,6 +84,8 @@ void ext2_warning (struct super_block * sb, const char * function,
> vprintk(fmt, args);
> printk("\n");
> va_end(args);
> + if (test_opt(sb, ERR_NOTIFY))
> + notify_part_fs_unclean(part_to_dev(sb->s_bdev->bd_part), 1);
> }
>
> void ext2_update_dynamic_rev(struct super_block *sb)
> @@ -289,6 +294,9 @@ static int ext2_show_options(struct seq_file *seq, struct vfsmount *vfs)
> if (!test_opt(sb, RESERVATION))
> seq_puts(seq, ",noreservation");
>
> + if (test_opt(sb, ERR_NOTIFY))
> + seq_puts(seq, ",notify");
> +
> return 0;
> }
>
> @@ -391,7 +399,8 @@ enum {
> Opt_err_ro, Opt_nouid32, Opt_nocheck, Opt_debug,
> Opt_oldalloc, Opt_orlov, Opt_nobh, Opt_user_xattr, Opt_nouser_xattr,
> Opt_acl, Opt_noacl, Opt_xip, Opt_ignore, Opt_err, Opt_quota,
> - Opt_usrquota, Opt_grpquota, Opt_reservation, Opt_noreservation
> + Opt_usrquota, Opt_grpquota, Opt_reservation, Opt_noreservation,
> + Opt_err_notify,
> };
>
> static const match_table_t tokens = {
> @@ -425,6 +434,7 @@ static const match_table_t tokens = {
> {Opt_usrquota, "usrquota"},
> {Opt_reservation, "reservation"},
> {Opt_noreservation, "noreservation"},
> + {Opt_err_notify, "notify"},
> {Opt_err, NULL}
> };
>
> @@ -565,6 +575,9 @@ static int parse_options (char * options,
> clear_opt(sbi->s_mount_opt, RESERVATION);
> printk("reservations OFF\n");
> break;
> + case Opt_err_notify:
> + set_opt(sbi->s_mount_opt, ERR_NOTIFY);
> + break;
> case Opt_ignore:
> break;
> default:
> diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
> index 121720d..ecec20b 100644
> --- a/include/linux/ext2_fs.h
> +++ b/include/linux/ext2_fs.h
> @@ -347,7 +347,7 @@ struct ext2_inode {
> #define EXT2_MOUNT_USRQUOTA 0x020000 /* user quota */
> #define EXT2_MOUNT_GRPQUOTA 0x040000 /* group quota */
> #define EXT2_MOUNT_RESERVATION 0x080000 /* Preallocation */
> -
> +#define EXT2_MOUNT_ERR_NOTIFY 0x100000 /* Error notifications */
>
> #define clear_opt(o, opt) o &= ~EXT2_MOUNT_##opt
> #define set_opt(o, opt) o |= EXT2_MOUNT_##opt

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-10 21:04:09

by Pavel Machek

Subject: Re: [PATCH 0/4] FS: userspace notification of errors


> > > One part of the design which you didn't describe, but which I inferred
> > > is that you intend that userspace will see the FS_UNCLEAN=1 messages
> > > and will then poll all the /sys/block/<bdev>/<part>/fs_unclean files to
> > > work out which partition(s) got the error, correct? Please spell all
> > > that out in the changelog.
> >
> > I think this part of the design needs more thought. Not
> > all FSes have block devices (UBIFS, JFFS2), and some FSes
> > may (theoretically) span more than one block device (btrfs?).
>
> Big thanks to everybody participating in this thread, for reviews and critiques.
> Here's a proposal/RFC for another way to implement this feature:
>
> Taking into account Artem's and Kay's comments, indeed, having attributes
> like 'fs_error' tied to a block device does not seem right.
> What we need is an object/entity that:
>
> - is not associated to a block device
> - is not associated to a partition
> - is not associated to a filesystem as a general entity
> - is uniquely associated to a filesystem's 'instance': a mounted volume
> carrying that filesystem
> - appears at volume mount time and disappears with volume unmount

Add ",errors" at the end of the line in /proc/mounts when an error is
detected? (...and make /proc/mounts pollable?)
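
On the userspace side this would be cheap to consume. A minimal sketch,
assuming the flag is a bare "errors" token in the comma-separated options
field (the token name comes from the suggestion above and does not exist
today; has_mount_flag() is a hypothetical helper):

```c
#include <string.h>

/*
 * Return 1 if the comma-separated option string opts contains flag as a
 * complete token, 0 otherwise.
 */
static int has_mount_flag(const char *opts, const char *flag)
{
    size_t flen = strlen(flag);
    const char *p = opts;

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current token */

        if (len == flen && strncmp(p, flag, flen) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++;                        /* step over the separator */
    }
    return 0;
}
```

The whole-token comparison matters: ext2 already has options such as
"errors=remount-ro", and a substring search for "errors" would report
those as a detected error.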

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-10 21:05:28

by Pavel Machek

Subject: Re: [PATCH 0/4] FS: userspace notification of errors

On Thu 2009-06-04 08:57:58, Artem Bityutskiy wrote:
> Andrew Morton wrote:
>> hm, I'm uncertain on the desirability or otherwise of the overall feature.
>>
>> Are there users or distros or device manufacturers asking for this?
>> Where did the requirement come from?
>>
>> What downstream application will handle the uevent messages? Do you
>> have some userspace design/plan in mind?
>>
>> IOW, it would be useful if we were told more about all of this, rather
>> than just staring at a kernel patch!
>
> As the original idea came from me, while whole implementation
> and design was done by Denis, I'll comment on this.
>
> Our use-case is about hand-held devices. We are particularly
> working with large FAT volumes on MMC. Do not question please
> why it is FAT and not something else :-) Anyway, FAT is very
> unreliable, and often hits errors, in which case it simply
> switches to read-only mode, and usually prints something to
> the printk ring buffer.

So fsck the MMC card on card insertion...? fsck.vfat on flash is a
pretty fast operation (as it only needs to read directories +
FATs). Android 1.5 implements this.

...otherwise you will lose data on two files sharing clusters, will
never recover lost clusters, etc.
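
To illustrate what "lost clusters" means, here is a toy sketch, not
fsck.vfat code: follow each file's chain through the FAT, then count
entries that are marked in use but were never reached. The table size,
the end-of-chain value and count_lost_clusters() are assumptions made for
the example; real FAT12/16/32 use different entry widths and markers.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CLUSTERS 16     /* toy FAT size (assumption) */
#define FAT_FREE 0u         /* entry value of an unallocated cluster */
#define FAT_EOC 0xFFFFu     /* end-of-chain marker in this toy model */

/*
 * fat[c] holds the next cluster after c, FAT_EOC, or FAT_FREE.
 * starts[] holds the first cluster of each file found in the directories.
 * Returns the number of allocated clusters that no file chain reaches.
 */
static size_t count_lost_clusters(const unsigned int *fat,
                                  const unsigned int *starts, size_t nfiles)
{
    unsigned char reached[TOY_CLUSTERS];
    size_t i, lost = 0;
    unsigned int c;

    memset(reached, 0, sizeof(reached));
    for (i = 0; i < nfiles; i++) {
        /* Data clusters start at 2; stop on EOC, bad values, or a
         * cluster already visited (cross-link or loop). */
        c = starts[i];
        while (c >= 2 && c < TOY_CLUSTERS && !reached[c]) {
            reached[c] = 1;
            c = fat[c];
        }
    }
    for (c = 2; c < TOY_CLUSTERS; c++)
        if (fat[c] != FAT_FREE && !reached[c])
            lost++;
    return lost;
}
```

The scan needs only the directories (for the start clusters) and the FAT
itself, which is why it is quick on flash.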

Perhaps it would be feasible to permit read-only mount of unclean
VFAT, run fsck in background, and buffer any changes in memory until
fsck finishes?

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html