2021-05-21 09:24:01

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 00/11] File system wide monitoring

Hi,

This series follow up on my previous proposal [1] to support file system
wide monitoring. As suggested by Amir, this proposal drops the ring
buffer in favor of a single slot associated with each mark. This
simplifies a bit the implementation, as you can see in the code.

As a reminder, This proposal is limited to an interface for
administrators to monitor the health of a file system, instead of a
generic inteface for file errors. Therefore, this doesn't solve the
problem of writeback errors or the need to watch a specific subtree.

In comparison to the previous RFC, this implementation also drops the
per-fs data and location, and leave those as future extensions.

* Implementation

The feature is implemented on top of fanotify, as a new type of fanotify
mark, FAN_ERROR, which a file system monitoring tool can register to
receive error notifications. When an error occurs a new notification is
generated, in addition followed by this info field:

- FS generic data: A file system agnostic structure that has a generic
error code and identifies the filesystem. Basically, it let's
userspace know something happened on a monitored filesystem. Since
only the first error is recorded since the last read, this also
includes a counter of errors that happened since the last read.

* Testing

This was tested by watching notifications flowing from an intentionally
corrupted filesystem in different places. In addition, other events
were watched in an attempt to detect regressions.

Is there a specific testsuite for fanotify I should be running?

* Patches

This patchset is divided as follows: Patch 1 through 5 are refactoring
to fsnotify/fanotify in preparation for FS_ERROR/FAN_ERROR; patch 6 and
7 implement the FS_ERROR API for filesystems to report error; patch 8
add support for FAN_ERROR in fanotify; Patch 9 is an example
implementation for ext4; patch 10 and 11 provide a sample userspace code
and documentation.

I also pushed the full series to:

https://gitlab.collabora.com/krisman/linux -b fanotify-notifications-single-slot

[1] https://lwn.net/Articles/854545/

Cc: Darrick J. Wong <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: [email protected]
To: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Gabriel Krisman Bertazi (11):
fanotify: Fold event size calculation to its own function
fanotify: Split fsid check from other fid mode checks
fanotify: Simplify directory sanity check in DFID_NAME mode
fanotify: Expose fanotify_mark
inotify: Don't force FS_IN_IGNORED
fsnotify: Support FS_ERROR event type
fsnotify: Introduce helpers to send error_events
fanotify: Introduce FAN_ERROR event
ext4: Send notifications on error
samples: Add fs error monitoring example
Documentation: Document the FAN_ERROR event

.../admin-guide/filesystem-monitoring.rst | 52 +++++
Documentation/admin-guide/index.rst | 1 +
fs/ext4/super.c | 8 +
fs/notify/fanotify/fanotify.c | 80 ++++++-
fs/notify/fanotify/fanotify.h | 38 +++-
fs/notify/fanotify/fanotify_user.c | 213 ++++++++++++++----
fs/notify/inotify/inotify_user.c | 6 +-
include/linux/fanotify.h | 6 +-
include/linux/fsnotify.h | 13 ++
include/linux/fsnotify_backend.h | 15 +-
include/uapi/linux/fanotify.h | 10 +
samples/Kconfig | 8 +
samples/Makefile | 1 +
samples/fanotify/Makefile | 3 +
samples/fanotify/fs-monitor.c | 91 ++++++++
15 files changed, 485 insertions(+), 60 deletions(-)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
create mode 100644 samples/fanotify/Makefile
create mode 100644 samples/fanotify/fs-monitor.c

--
2.31.0


2021-05-21 09:24:06

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 02/11] fanotify: Split fsid check from other fid mode checks

FAN_ERROR will require fsid, but not necessarily require the filesystem
to expose a file handle. Split those checks into different functions, so
they can be used separately when setting up an event.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Changes since v1:
(Amir)
- Rename fanotify_check_path_fsid -> fanotify_test_fsid
- Use dentry directly instead of path
---
fs/notify/fanotify/fanotify_user.c | 43 ++++++++++++++++++------------
1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 3ccdee3c9f1e..9cc6c8808ed5 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1178,15 +1178,31 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
}

/* Check if filesystem can encode a unique fid */
-static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
+static int fanotify_test_fid(struct dentry *dentry)
+{
+ /*
+ * We need to make sure that the file system supports at least
+ * encoding a file handle so user can use name_to_handle_at() to
+ * compare fid returned with event to the file handle of watched
+ * objects. However, name_to_handle_at() requires that the
+ * filesystem also supports decoding file handles.
+ */
+ if (!dentry->d_sb->s_export_op ||
+ !dentry->d_sb->s_export_op->fh_to_dentry)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static int fanotify_test_fsid(struct dentry *dentry, __kernel_fsid_t *fsid)
{
__kernel_fsid_t root_fsid;
int err;

/*
- * Make sure path is not in filesystem with zero fsid (e.g. tmpfs).
+ * Make sure dentry is not of a filesystem with zero fsid (e.g. tmpfs).
*/
- err = vfs_get_fsid(path->dentry, fsid);
+ err = vfs_get_fsid(dentry, fsid);
if (err)
return err;

@@ -1194,10 +1210,10 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
return -ENODEV;

/*
- * Make sure path is not inside a filesystem subvolume (e.g. btrfs)
+ * Make sure dentry is not of a filesystem subvolume (e.g. btrfs)
* which uses a different fsid than sb root.
*/
- err = vfs_get_fsid(path->dentry->d_sb->s_root, &root_fsid);
+ err = vfs_get_fsid(dentry->d_sb->s_root, &root_fsid);
if (err)
return err;

@@ -1205,17 +1221,6 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
root_fsid.val[1] != fsid->val[1])
return -EXDEV;

- /*
- * We need to make sure that the file system supports at least
- * encoding a file handle so user can use name_to_handle_at() to
- * compare fid returned with event to the file handle of watched
- * objects. However, name_to_handle_at() requires that the
- * filesystem also supports decoding file handles.
- */
- if (!path->dentry->d_sb->s_export_op ||
- !path->dentry->d_sb->s_export_op->fh_to_dentry)
- return -EOPNOTSUPP;
-
return 0;
}

@@ -1362,7 +1367,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
}

if (fid_mode) {
- ret = fanotify_test_fid(&path, &__fsid);
+ ret = fanotify_test_fsid(path.dentry, &__fsid);
+ if (ret)
+ goto path_put_and_out;
+
+ ret = fanotify_test_fid(path.dentry);
if (ret)
goto path_put_and_out;

--
2.31.0

2021-05-21 09:24:06

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 03/11] fanotify: Simplify directory sanity check in DFID_NAME mode

The only fid_mode where the directory inode is reported is
FAN_REPORT_DFID_NAME. So remove the negative logic and make it more
straightforward.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 057abd2cf887..711b36a9483e 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -276,7 +276,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
/* Path type events are only relevant for files and dirs */
if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
return 0;
- } else if (!(fid_mode & FAN_REPORT_FID)) {
+ } else if (fid_mode & FAN_REPORT_DFID_NAME) {
/* Do we have a directory inode to report? */
if (!dir && !(event_mask & FS_ISDIR))
return 0;
--
2.31.0

2021-05-21 09:24:15

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 04/11] fanotify: Expose fanotify_mark

FAN_ERROR will require an error structure to be stored per mark.
Therefore, wrap fsnotify_mark in a fanotify specific structure in
preparation for that.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 4 +++-
fs/notify/fanotify/fanotify.h | 10 ++++++++++
fs/notify/fanotify/fanotify_user.c | 14 +++++++-------
3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 711b36a9483e..34e2ee759b39 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -869,7 +869,9 @@ static void fanotify_freeing_mark(struct fsnotify_mark *mark,

static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
{
- kmem_cache_free(fanotify_mark_cache, fsn_mark);
+ struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);
+
+ kmem_cache_free(fanotify_mark_cache, mark);
}

const struct fsnotify_ops fanotify_fsnotify_ops = {
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 4a5e555dc3d2..a399c5e2615d 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -129,6 +129,16 @@ static inline void fanotify_info_copy_name(struct fanotify_info *info,
name->name);
}

+struct fanotify_mark {
+ struct fsnotify_mark fsn_mark;
+ struct fanotify_error_event *error_event;
+};
+
+static inline struct fanotify_mark *FANOTIFY_MARK(struct fsnotify_mark *mark)
+{
+ return container_of(mark, struct fanotify_mark, fsn_mark);
+}
+
/*
* Common structure for fanotify events. Concrete structs are allocated in
* fanotify_handle_event() and freed when the information is retrieved by
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9cc6c8808ed5..00210535a78e 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -914,7 +914,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
__kernel_fsid_t *fsid)
{
struct ucounts *ucounts = group->fanotify_data.ucounts;
- struct fsnotify_mark *mark;
+ struct fanotify_mark *mark;
int ret;

/*
@@ -926,20 +926,20 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
!inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS))
return ERR_PTR(-ENOSPC);

- mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
+ mark = kmem_cache_zalloc(fanotify_mark_cache, GFP_KERNEL);
if (!mark) {
ret = -ENOMEM;
goto out_dec_ucounts;
}

- fsnotify_init_mark(mark, group);
- ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid);
+ fsnotify_init_mark(&mark->fsn_mark, group);
+ ret = fsnotify_add_mark_locked(&mark->fsn_mark, connp, type, 0, fsid);
if (ret) {
- fsnotify_put_mark(mark);
+ fsnotify_put_mark(&mark->fsn_mark);
goto out_dec_ucounts;
}

- return mark;
+ return &mark->fsn_mark;

out_dec_ucounts:
if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS))
@@ -1477,7 +1477,7 @@ static int __init fanotify_user_setup(void)
BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);

- fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
+ fanotify_mark_cache = KMEM_CACHE(fanotify_mark,
SLAB_PANIC|SLAB_ACCOUNT);
fanotify_fid_event_cachep = KMEM_CACHE(fanotify_fid_event,
SLAB_PANIC);
--
2.31.0

2021-05-21 09:26:17

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 06/11] fsnotify: Support FS_ERROR event type

Expose a new type of fsnotify event for filesystems to report errors for
userspace monitoring tools. fanotify will send this type of
notification for FAN_ERROR events.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v1:
- Overload FS_ERROR with FS_IN_IGNORED
---
include/linux/fsnotify_backend.h | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 1ce66748a2d2..bbef2df3fbc7 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -42,6 +42,12 @@

#define FS_UNMOUNT 0x00002000 /* inode on umount fs */
#define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
+#define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */
+
+/*
+ * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify
+ * which does not support FS_ERROR.
+ */
#define FS_IN_IGNORED 0x00008000 /* last inotify event here */

#define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */
@@ -95,7 +101,8 @@
#define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
FS_EVENTS_POSS_ON_CHILD | \
FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \
- FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED)
+ FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \
+ FS_ERROR)

/* Extra flags that may be reported with event or control handling of events */
#define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \
@@ -248,6 +255,12 @@ enum fsnotify_data_type {
FSNOTIFY_EVENT_NONE,
FSNOTIFY_EVENT_PATH,
FSNOTIFY_EVENT_INODE,
+ FSNOTIFY_EVENT_ERROR,
+};
+
+struct fs_error_report {
+ int error;
+ struct inode *inode;
};

static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
--
2.31.0

2021-05-21 09:26:25

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 07/11] fsnotify: Introduce helpers to send error_events

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v2:
- Use the inode argument (Amir)
---
include/linux/fsnotify.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index f8acddcf54fb..458e4feb5fe1 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -317,4 +317,17 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
fsnotify_dentry(dentry, mask);
}

+static inline void fsnotify_error_event(struct super_block *sb, struct inode *inode,
+ int error)
+{
+ if (sb->s_fsnotify_marks) {
+ struct fs_error_report report = {
+ .error = error,
+ .inode = inode,
+ };
+ fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, NULL, NULL,
+ sb->s_root->d_inode, 0);
+ }
+}
+
#endif /* _LINUX_FS_NOTIFY_H */
--
2.31.0

2021-05-21 09:26:26

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 08/11] fanotify: Introduce FAN_ERROR event

The FAN_ERROR event is used by filesystem wide monitoring tools to
receive notifications of type FS_ERROR_EVENT, emited by filesystems when
a problem is detected. The error notification includes a generic error
descriptor.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Changes since v1:
- Pass dentry to fanotify_check_fsid (Amir)
- FANOTIFY_EVENT_TYPE_ERROR -> FANOTIFY_EVENT_TYPE_FS_ERROR
- Merge previous patch into it
- Use a single slot
---
fs/notify/fanotify/fanotify.c | 74 ++++++++++++++++-
fs/notify/fanotify/fanotify.h | 28 ++++++-
fs/notify/fanotify/fanotify_user.c | 123 ++++++++++++++++++++++++++---
include/linux/fanotify.h | 6 +-
include/uapi/linux/fanotify.h | 10 +++
5 files changed, 225 insertions(+), 16 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 34e2ee759b39..197291a8c41d 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -269,7 +269,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n",
__func__, iter_info->report_mask, event_mask, data, data_type);

- if (!fid_mode) {
+ if (!fid_mode && data_type != FSNOTIFY_EVENT_ERROR) {
/* Do we have path to open a file descriptor? */
if (!path)
return 0;
@@ -657,6 +657,51 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
return event;
}

+static void fanotify_init_error_event(struct fanotify_error_event *event,
+ __kernel_fsid_t fsid,
+ const struct fs_error_report *report)
+{
+ event->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
+ event->err_count = 1;
+ event->fsid = fsid;
+ event->error = report->error;
+ event->ino = (report->inode) ? report->inode->i_ino : 0;
+}
+
+static int fanotify_queue_error_event(struct fsnotify_iter_info *iter_info,
+ struct fsnotify_group *group,
+ __kernel_fsid_t fsid,
+ const struct fs_error_report *report)
+{
+ struct fanotify_mark *mark;
+ int type;
+ int ret = -ENOMEM;
+
+ fsnotify_foreach_obj_type(type) {
+ if (!fsnotify_iter_should_report_type(iter_info, type))
+ continue;
+ mark = FANOTIFY_MARK(iter_info->marks[type]);
+ }
+
+ spin_lock(&mark->fsn_mark.lock);
+ if (mark->error_event) {
+ if (list_empty(&mark->error_event->fae.fse.list)) {
+ fsnotify_get_mark(&mark->fsn_mark);
+ fanotify_init_error_event(mark->error_event, fsid, report);
+ ret = fsnotify_add_event(group, &mark->error_event->fae.fse,
+ NULL, NULL);
+ if (ret)
+ fsnotify_put_mark(&mark->fsn_mark);
+ } else {
+ mark->error_event->err_count++;
+ ret = 0;
+ }
+ }
+ spin_unlock(&mark->fsn_mark.lock);
+
+ return ret;
+}
+
/*
* Get cached fsid of the filesystem containing the object from any connector.
* All connectors are supposed to have the same fsid, but we do not verify that
@@ -738,8 +783,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
+ BUILD_BUG_ON(FAN_ERROR != FS_ERROR);

- BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
+ BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);

mask = fanotify_group_event_mask(group, iter_info, mask, data,
data_type, dir);
@@ -757,13 +803,20 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
return 0;
}

- if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
+ if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS) || fanotify_is_error_event(mask)) {
fsid = fanotify_get_fsid(iter_info);
/* Racing with mark destruction or creation? */
if (!fsid.val[0] && !fsid.val[1])
return 0;
}

+ if (fanotify_is_error_event(mask)) {
+ ret = fanotify_queue_error_event(iter_info, group, fsid, data);
+ if (ret)
+ fsnotify_queue_overflow(group);
+ goto finish;
+ }
+
event = fanotify_alloc_event(group, mask, data, data_type, dir,
file_name, &fsid);
ret = -ENOMEM;
@@ -833,6 +886,17 @@ static void fanotify_free_name_event(struct fanotify_event *event)
kfree(FANOTIFY_NE(event));
}

+static void fanotify_free_error_event(struct fanotify_event *event)
+{
+ /*
+ * Just drop the reference acquired by
+ * fanotify_queue_error_event.
+ *
+ * The actual memory is freed with the mark.
+ */
+ fsnotify_put_mark(&(FANOTIFY_EE(event)->mark->fsn_mark));
+}
+
static void fanotify_free_event(struct fsnotify_event *fsn_event)
{
struct fanotify_event *event;
@@ -855,6 +919,9 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
case FANOTIFY_EVENT_TYPE_OVERFLOW:
kfree(event);
break;
+ case FANOTIFY_EVENT_TYPE_FS_ERROR:
+ fanotify_free_error_event(event);
+ break;
default:
WARN_ON_ONCE(1);
}
@@ -871,6 +938,7 @@ static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
{
struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);

+ kfree(mark->error_event);
kmem_cache_free(fanotify_mark_cache, mark);
}

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index a399c5e2615d..ebe9e593dfbf 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -151,6 +151,7 @@ enum fanotify_event_type {
FANOTIFY_EVENT_TYPE_PATH,
FANOTIFY_EVENT_TYPE_PATH_PERM,
FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
+ FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
__FANOTIFY_EVENT_TYPE_NUM
};

@@ -206,12 +207,31 @@ FANOTIFY_NE(struct fanotify_event *event)
return container_of(event, struct fanotify_name_event, fae);
}

+struct fanotify_error_event {
+ struct fanotify_event fae;
+ __kernel_fsid_t fsid;
+ unsigned long ino;
+ int error;
+ u32 err_count;
+
+ /* Back reference to the mark this error refers to. */
+ struct fanotify_mark *mark;
+};
+
+static inline struct fanotify_error_event *
+FANOTIFY_EE(struct fanotify_event *event)
+{
+ return container_of(event, struct fanotify_error_event, fae);
+}
+
static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
{
if (event->type == FANOTIFY_EVENT_TYPE_FID)
return &FANOTIFY_FE(event)->fsid;
else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
return &FANOTIFY_NE(event)->fsid;
+ else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
+ return &FANOTIFY_EE(event)->fsid;
else
return NULL;
}
@@ -297,6 +317,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
return container_of(fse, struct fanotify_event, fse);
}

+static inline bool fanotify_is_error_event(u32 mask)
+{
+ return mask & FANOTIFY_ERROR_EVENTS;
+}
+
static inline bool fanotify_event_has_path(struct fanotify_event *event)
{
return event->type == FANOTIFY_EVENT_TYPE_PATH ||
@@ -325,7 +350,8 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
*/
static inline bool fanotify_is_hashed_event(u32 mask)
{
- return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW);
+ return (!fanotify_is_perm_event(mask) && !fanotify_is_error_event(mask)
+ && !(mask & FS_Q_OVERFLOW));
}

static inline unsigned int fanotify_event_hash_bucket(
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 00210535a78e..ea9b9f8f7c21 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -106,6 +106,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
#define FANOTIFY_EVENT_ALIGN 4
#define FANOTIFY_INFO_HDR_LEN \
(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
+#define FANOTIFY_INFO_ERROR_LEN \
+ (sizeof(struct fanotify_event_info_error))

static int fanotify_fid_info_len(int fh_len, int name_len)
{
@@ -126,6 +128,9 @@ static size_t fanotify_event_len(struct fanotify_event *event,
int fh_len;
int dot_len = 0;

+ if (fanotify_is_error_event(event->mask))
+ return event_len + FANOTIFY_INFO_ERROR_LEN;
+
if (!fid_mode)
return event_len;

@@ -149,6 +154,30 @@ static size_t fanotify_event_len(struct fanotify_event *event,
return event_len;
}

+static struct fanotify_event *fanotify_dequeue_error_event(struct fsnotify_group *group,
+ struct fanotify_event *event,
+ struct fanotify_error_event *error_event)
+{
+ struct fsnotify_mark *mark = &(FANOTIFY_EE(event)->mark->fsn_mark);
+ /*
+ * In order to avoid missing an error count update, the
+ * queued event is de-queued and duplicated to an
+ * in-stack fanotify_error_event while still inside
+ * mark->lock. Once the event is dequeued, it can be
+ * immediately re-used for a new event.
+ *
+ * The ownership of the mark reference is dropped later
+ * by destroy_event.
+ */
+ spin_lock(&mark->lock);
+ memcpy(error_event, FANOTIFY_EE(event), sizeof(*error_event));
+ fsnotify_init_event(&error_event->fae.fse);
+ fsnotify_remove_queued_event(group, &event->fse);
+ spin_unlock(&mark->lock);
+
+ return &error_event->fae;
+}
+
/*
* Remove an hashed event from merge hash table.
*/
@@ -173,7 +202,8 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
* updated accordingly.
*/
static struct fanotify_event *get_one_event(struct fsnotify_group *group,
- size_t count)
+ size_t count,
+ struct fanotify_error_event *error_event)
{
size_t event_size;
struct fanotify_event *event = NULL;
@@ -197,9 +227,14 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,

/*
* Held the notification_lock the whole time, so this is the
- * same event we peeked above.
+ * same event we peeked above, unless it is copied to
+ * error_event.
*/
- fsnotify_remove_first_event(group);
+ if (fanotify_is_error_event(event->mask))
+ event = fanotify_dequeue_error_event(group, event, error_event);
+ else
+ fsnotify_remove_first_event(group);
+
if (fanotify_is_perm_event(event->mask))
FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
if (fanotify_is_hashed_event(event->mask))
@@ -309,6 +344,30 @@ static int process_access_response(struct fsnotify_group *group,
return -ENOENT;
}

+static size_t copy_error_info_to_user(struct fanotify_event *event,
+ char __user *buf, int count)
+{
+ struct fanotify_event_info_error info;
+ struct fanotify_error_event *fee = FANOTIFY_EE(event);
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
+ info.hdr.pad = 0;
+ info.hdr.len = sizeof(struct fanotify_event_info_error);
+
+ if (WARN_ON(count < info.hdr.len))
+ return -EFAULT;
+
+ info.fsid = fee->fsid;
+ info.error = fee->error;
+ info.inode = fee->ino;
+ info.error_count = fee->err_count;
+
+ if (copy_to_user(buf, &info, sizeof(info)))
+ return -EFAULT;
+
+ return info.hdr.len;
+}
+
static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
int info_type, const char *name, size_t name_len,
char __user *buf, size_t count)
@@ -523,6 +582,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
count -= ret;
}

+ if (fanotify_is_error_event(event->mask)) {
+ ret = copy_error_info_to_user(event, buf, count);
+ if (ret < 0)
+ return ret;
+ buf += ret;
+ count -= ret;
+ }
+
return metadata.event_len;

out_close_fd:
@@ -553,6 +620,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
{
struct fsnotify_group *group;
struct fanotify_event *event;
+ struct fanotify_error_event error_event;
char __user *start;
int ret;
DEFINE_WAIT_FUNC(wait, woken_wake_function);
@@ -569,7 +637,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
* in case there are lots of available events.
*/
cond_resched();
- event = get_one_event(group, count);
+ event = get_one_event(group, count, &error_event);
if (IS_ERR(event)) {
ret = PTR_ERR(event);
break;
@@ -888,16 +956,33 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
flags, umask);
}

-static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
- __u32 mask,
- unsigned int flags)
+static int fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
+ __u32 mask, unsigned int flags,
+ __u32 *modified_mask)
{
+ struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);
+ struct fanotify_error_event *error_event = NULL;
+ bool addition = !(flags & FAN_MARK_IGNORED_MASK);
__u32 oldmask = -1;

+ /* Only pre-alloc error_event if needed. */
+ if (addition && (mask & FAN_ERROR) && !mark->error_event) {
+ error_event = kzalloc(sizeof(*error_event), GFP_KERNEL);
+ if (!error_event)
+ return -ENOMEM;
+ fanotify_init_event(&error_event->fae, 0, FS_ERROR);
+ error_event->mark = mark;
+ }
+
spin_lock(&fsn_mark->lock);
- if (!(flags & FAN_MARK_IGNORED_MASK)) {
+ if (addition) {
oldmask = fsn_mark->mask;
fsn_mark->mask |= mask;
+
+ if (!mark->error_event) {
+ mark->error_event = error_event;
+ error_event = NULL;
+ }
} else {
fsn_mark->ignored_mask |= mask;
if (flags & FAN_MARK_IGNORED_SURV_MODIFY)
@@ -905,7 +990,11 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
}
spin_unlock(&fsn_mark->lock);

- return mask & ~oldmask;
+ kfree(error_event);
+
+ *modified_mask = mask & ~oldmask;
+ return 0;
+
}

static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
@@ -955,6 +1044,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
{
struct fsnotify_mark *fsn_mark;
__u32 added;
+ int ret = 0;

mutex_lock(&group->mark_mutex);
fsn_mark = fsnotify_find_mark(connp, group);
@@ -965,13 +1055,18 @@ static int fanotify_add_mark(struct fsnotify_group *group,
return PTR_ERR(fsn_mark);
}
}
- added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
+ ret = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &added);
+ if (ret)
+ goto out;
+
if (added & ~fsnotify_conn_mask(fsn_mark->connector))
fsnotify_recalc_mask(fsn_mark->connector);
+
+out:
mutex_unlock(&group->mark_mutex);

fsnotify_put_mark(fsn_mark);
- return 0;
+ return ret;
}

static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
@@ -1377,6 +1472,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,

fsid = &__fsid;
}
+ if (mask & FAN_ERROR) {
+ ret = fanotify_test_fsid(path.dentry, &__fsid);
+ if (ret)
+ goto path_put_and_out;
+ fsid = &__fsid;
+ }

/* inode held in place by reference to path; group by fget on fd */
if (mark_type == FAN_MARK_INODE)
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index bad41bcb25df..05c929d588e4 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -81,9 +81,12 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
#define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)

+#define FANOTIFY_ERROR_EVENTS (FAN_ERROR)
+
/* Events that user can request to be notified on */
#define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
- FANOTIFY_INODE_EVENTS)
+ FANOTIFY_INODE_EVENTS | \
+ FANOTIFY_ERROR_EVENTS)

/* Events that require a permission response from user */
#define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
@@ -95,6 +98,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
/* Events that may be reported to user */
#define FANOTIFY_OUTGOING_EVENTS (FANOTIFY_EVENTS | \
FANOTIFY_PERM_EVENTS | \
+ FANOTIFY_ERROR_EVENTS | \
FAN_Q_OVERFLOW | FAN_ONDIR)

#define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index fbf9c5c7dd59..e3920597112f 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -20,6 +20,7 @@
#define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */

#define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
+#define FAN_ERROR 0x00008000 /* Filesystem error */

#define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
#define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
@@ -123,6 +124,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_FID 1
#define FAN_EVENT_INFO_TYPE_DFID_NAME 2
#define FAN_EVENT_INFO_TYPE_DFID 3
+#define FAN_EVENT_INFO_TYPE_ERROR 4

/* Variable length info record following event metadata */
struct fanotify_event_info_header {
@@ -148,6 +150,14 @@ struct fanotify_event_info_fid {
unsigned char handle[0];
};

+struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ int error;
+ __kernel_fsid_t fsid;
+ unsigned long inode;
+ __u32 error_count;
+};
+
struct fanotify_response {
__s32 fd;
__u32 response;
--
2.31.0

2021-05-21 09:26:28

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 09/11] ext4: Send notifications on error

Send a FS_ERROR message via fsnotify to a userspace monitoring tool
whenever a ext4 error condition is triggered. This follows the existing
error conditions in ext4, so it is hooked to the ext4_error* functions.

It also follows the current dmesg reporting in the format. The
filesystem message is composed mostly by the string that would be
otherwise printed in dmesg.

A new ext4 specific record format is exposed in the uapi, such that a
monitoring tool knows what to expect when listening errors of an ext4
filesystem.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/ext4/super.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7dc94f3e18e6..a8c0ac2c3e4c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -46,6 +46,7 @@
#include <linux/part_stat.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
+#include <linux/fsnotify.h>

#include "ext4.h"
#include "ext4_extents.h" /* Needed for trace points definition */
@@ -752,6 +753,8 @@ void __ext4_error(struct super_block *sb, const char *function,
sb->s_id, function, line, current->comm, &vaf);
va_end(args);
}
+ fsnotify_error_event(sb, NULL, error);
+
ext4_handle_error(sb, force_ro, error, 0, block, function, line);
}

@@ -782,6 +785,8 @@ void __ext4_error_inode(struct inode *inode, const char *function,
current->comm, &vaf);
va_end(args);
}
+ fsnotify_error_event(inode->i_sb, inode, error);
+
ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
function, line);
}
@@ -820,6 +825,8 @@ void __ext4_error_file(struct file *file, const char *function,
current->comm, path, &vaf);
va_end(args);
}
+ fsnotify_error_event(inode->i_sb, inode, EFSCORRUPTED);
+
ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
function, line);
}
@@ -887,6 +894,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
sb->s_id, function, line, errstr);
}
+ fsnotify_error_event(sb, sb->s_root->d_inode, errno);

ext4_handle_error(sb, false, -errno, 0, 0, function, line);
}
--
2.31.0

2021-05-21 09:26:29

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 10/11] samples: Add fs error monitoring example

Introduce an example of a FAN_ERROR fanotify user to track filesystem
errors.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
samples/Kconfig | 8 +++
samples/Makefile | 1 +
samples/fanotify/Makefile | 3 ++
samples/fanotify/fs-monitor.c | 91 +++++++++++++++++++++++++++++++++++
4 files changed, 103 insertions(+)
create mode 100644 samples/fanotify/Makefile
create mode 100644 samples/fanotify/fs-monitor.c

diff --git a/samples/Kconfig b/samples/Kconfig
index b5a1a7aa7e23..e421556ec3e5 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -120,6 +120,14 @@ config SAMPLE_CONNECTOR
with it.
See also Documentation/driver-api/connector.rst

+config SAMPLE_FANOTIFY_ERROR
+ bool "Build fanotify error monitoring sample"
+ depends on FANOTIFY
+ help
+ When enabled, this builds an example code that uses the FAN_ERROR
+ fanotify mechanism to monitor filesystem errors.
+ Documentation/admin-guide/filesystem-monitoring.rst
+
config SAMPLE_HIDRAW
bool "hidraw sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index 087e0988ccc5..931a81847c48 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -5,6 +5,7 @@ subdir-$(CONFIG_SAMPLE_AUXDISPLAY) += auxdisplay
subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/
obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
+obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
obj-$(CONFIG_SAMPLE_KDB) += kdb/
diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
new file mode 100644
index 000000000000..b3d5cc826e6f
--- /dev/null
+++ b/samples/fanotify/Makefile
@@ -0,0 +1,3 @@
+userprogs-always-y += fs-monitor
+
+userccflags += -I usr/include
diff --git a/samples/fanotify/fs-monitor.c b/samples/fanotify/fs-monitor.c
new file mode 100644
index 000000000000..49d3e2a872e4
--- /dev/null
+++ b/samples/fanotify/fs-monitor.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021, Collabora Ltd.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <err.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/fanotify.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+
+#ifndef FAN_ERROR
+#define FAN_ERROR 0x00008000
+#define FAN_EVENT_INFO_TYPE_ERROR 5
+
+struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ int error;
+ __kernel_fsid_t fsid;
+ unsigned long inode;
+ __u32 error_count;
+};
+#endif
+
+static void handle_notifications(char *buffer, int len)
+{
+ struct fanotify_event_metadata *metadata;
+ struct fanotify_event_info_error *error;
+
+ for (metadata = (struct fanotify_event_metadata *) buffer;
+ FAN_EVENT_OK(metadata, len); metadata = FAN_EVENT_NEXT(metadata, len)) {
+ if (!(metadata->mask == FAN_ERROR)) {
+ printf("unexpected FAN MARK: %llx\n", metadata->mask);
+ continue;
+ } else if (metadata->fd != FAN_NOFD) {
+ printf("Unexpected fd (!= FAN_NOFD)\n");
+ continue;
+ }
+
+ printf("FAN_ERROR found len=%d\n", metadata->event_len);
+
+ error = (struct fanotify_event_info_error *) (metadata+1);
+ if (error->hdr.info_type == FAN_EVENT_INFO_TYPE_ERROR) {
+ printf("unknown record: %d\n", error->hdr.info_type);
+ continue;
+ }
+
+ printf(" Generic Error Record: len=%d\n", error->hdr.len);
+ printf(" fsid: %llx\n", error->fsid);
+ printf(" error: %d\n", error->error);
+ printf(" inode: %lu\n", error->inode);
+ printf(" error_count: %d\n", error->error_count);
+ }
+}
+
+int main(int argc, char **argv)
+{
+ int fd;
+ char buffer[BUFSIZ];
+
+ if (argc < 2) {
+ printf("Missing path argument\n");
+ return 1;
+ }
+
+ fd = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY);
+ if (fd < 0)
+ errx(1, "fanotify_init");
+
+ if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
+ FAN_ERROR, AT_FDCWD, argv[1])) {
+ errx(1, "fanotify_mark");
+ }
+
+ while (1) {
+ int n = read(fd, buffer, BUFSIZ);
+
+ if (n < 0)
+ errx(1, "read");
+
+ handle_notifications(buffer, n);
+ }
+
+ return 0;
+}
--
2.31.0

2021-05-21 09:26:36

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 11/11] Documentation: Document the FAN_ERROR event

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
.../admin-guide/filesystem-monitoring.rst | 52 +++++++++++++++++++
Documentation/admin-guide/index.rst | 1 +
2 files changed, 53 insertions(+)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst

diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
new file mode 100644
index 000000000000..81e632f8e1de
--- /dev/null
+++ b/Documentation/admin-guide/filesystem-monitoring.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+File system Monitoring with fanotify
+====================================
+
+fanotify supports the FAN_ERROR mark for file system-wide error
+reporting. It is meant to be used by file system health monitoring
+daemons who listen on that interface and take actions (notify sysadmin,
+start recovery) when a file system problem is detected by the kernel.
+
+By design, A FAN_ERROR notification exposes sufficient information for a
+monitoring tool to know a problem in the file system has happened. It
+doesn't necessarily provide a user space application with semantics to
+verify an IO operation was successfully executed. That is outside of
+scope of this feature. Instead, it is only meant as a framework for
+early file system problem detection and reporting recovery tools.
+
+At the time of this writing, the only file system that emits this
+FAN_ERROR notifications is ext4.
+
+A user space example code is provided at ``samples/fanotify/fs-monitor.c``.
+
+Usage
+=====
+
+Notification structure
+======================
+
+A FAN_ERROR Notification has the following format::
+
+ [ Notification Metadata (Mandatory) ]
+ [ Generic Error Record (Mandatory) ]
+
+With the exception of the notification metadata and the generic
+information, all information records are optional. Each record type is
+identified by its unique ``struct fanotify_event_info_header.info_type``.
+
+Generic error Location
+----------------------
+
+The Generic error record provides enough information for a file system
+agnostic tool to learn about a problem in the file system, without
+requiring any details about the problem.::
+
+ struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ int error;
+ __kernel_fsid_t fsid;
+ unsigned long inode;
+ __u32 error_count;
+ };
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index dc00afcabb95..1bedab498104 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
+ filesystem-monitoring
nfs/index
gpio/index
highuid
--
2.31.0

2021-05-21 09:28:07

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 01/11] fanotify: Fold event size calculation to its own function

Every time this function is invoked, it is immediately added to
FAN_EVENT_METADATA_LEN, since there is no need to just calculate the
length of info records. This minor clean up folds the rest of the
calculation into the function, which now operates in terms of events,
returning the size of the entire event, including metadata.

Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v1:
- rebased on top of hashing patches
---
fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++-------------
1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 71fefb30e015..3ccdee3c9f1e 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -117,17 +117,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
}

-static int fanotify_event_info_len(unsigned int fid_mode,
- struct fanotify_event *event)
+static size_t fanotify_event_len(struct fanotify_event *event,
+ unsigned int fid_mode)
{
- struct fanotify_info *info = fanotify_event_info(event);
- int dir_fh_len = fanotify_event_dir_fh_len(event);
- int fh_len = fanotify_event_object_fh_len(event);
- int info_len = 0;
+ size_t event_len = FAN_EVENT_METADATA_LEN;
+ struct fanotify_info *info;
+ int dir_fh_len;
+ int fh_len;
int dot_len = 0;

+ if (!fid_mode)
+ return event_len;
+
+ info = fanotify_event_info(event);
+ dir_fh_len = fanotify_event_dir_fh_len(event);
+ fh_len = fanotify_event_object_fh_len(event);
+
if (dir_fh_len) {
- info_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
+ event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
} else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
/*
* With group flag FAN_REPORT_NAME, if name was not recorded in
@@ -137,9 +144,9 @@ static int fanotify_event_info_len(unsigned int fid_mode,
}

if (fh_len)
- info_len += fanotify_fid_info_len(fh_len, dot_len);
+ event_len += fanotify_fid_info_len(fh_len, dot_len);

- return info_len;
+ return event_len;
}

/*
@@ -168,7 +175,7 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
static struct fanotify_event *get_one_event(struct fsnotify_group *group,
size_t count)
{
- size_t event_size = FAN_EVENT_METADATA_LEN;
+ size_t event_size;
struct fanotify_event *event = NULL;
struct fsnotify_event *fsn_event;
unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
@@ -181,8 +188,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
goto out;

event = FANOTIFY_E(fsn_event);
- if (fid_mode)
- event_size += fanotify_event_info_len(fid_mode, event);
+ event_size = fanotify_event_len(event, fid_mode);

if (event_size > count) {
event = ERR_PTR(-EINVAL);
@@ -412,8 +418,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,

pr_debug("%s: group=%p event=%p\n", __func__, group, event);

- metadata.event_len = FAN_EVENT_METADATA_LEN +
- fanotify_event_info_len(fid_mode, event);
+ metadata.event_len = fanotify_event_len(event, fid_mode);
metadata.metadata_len = FAN_EVENT_METADATA_LEN;
metadata.vers = FANOTIFY_METADATA_VERSION;
metadata.reserved = 0;
--
2.31.0

2021-05-21 09:28:30

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 05/11] inotify: Don't force FS_IN_IGNORED

According to Amir:

"FS_IN_IGNORED is completely internal to inotify and there is no need
to set it in i_fsnotify_mask at all, so if we remove the bit from the
output of inotify_arg_to_mask() no functionality will change and we will
be able to overload the event bit for FS_ERROR."

This is done in preparation to overload FS_ERROR with the notification
mechanism in fanotify.

Suggested-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/inotify/inotify_user.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 98f61b31745a..4d17be6dd58d 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -89,10 +89,10 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
__u32 mask;

/*
- * Everything should accept their own ignored and should receive events
- * when the inode is unmounted. All directories care about children.
+ * Everything should receive events when the inode is unmounted.
+ * All directories care about children.
*/
- mask = (FS_IN_IGNORED | FS_UNMOUNT);
+ mask = (FS_UNMOUNT);
if (S_ISDIR(inode->i_mode))
mask |= FS_EVENT_ON_CHILD;

--
2.31.0

2021-05-21 10:29:38

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 00/11] File system wide monitoring

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Hi,
>
> This series follow up on my previous proposal [1] to support file system
> wide monitoring. As suggested by Amir, this proposal drops the ring
> buffer in favor of a single slot associated with each mark. This
> simplifies a bit the implementation, as you can see in the code.
>
> As a reminder, This proposal is limited to an interface for
> administrators to monitor the health of a file system, instead of a
> generic inteface for file errors. Therefore, this doesn't solve the
> problem of writeback errors or the need to watch a specific subtree.
>
> In comparison to the previous RFC, this implementation also drops the
> per-fs data and location, and leave those as future extensions.
>
> * Implementation
>
> The feature is implemented on top of fanotify, as a new type of fanotify
> mark, FAN_ERROR, which a file system monitoring tool can register to
> receive error notifications. When an error occurs a new notification is
> generated, in addition followed by this info field:
>
> - FS generic data: A file system agnostic structure that has a generic
> error code and identifies the filesystem. Basically, it let's
> userspace know something happened on a monitored filesystem. Since
> only the first error is recorded since the last read, this also
> includes a counter of errors that happened since the last read.
>
> * Testing
>
> This was tested by watching notifications flowing from an intentionally
> corrupted filesystem in different places. In addition, other events
> were watched in an attempt to detect regressions.
>
> Is there a specific testsuite for fanotify I should be running?

LTP is where we maintain the fsnotify regression test.
The inotify* and fanotify* tests specifically.

>
> * Patches
>
> This patchset is divided as follows: Patch 1 through 5 are refactoring
> to fsnotify/fanotify in preparation for FS_ERROR/FAN_ERROR; patch 6 and
> 7 implement the FS_ERROR API for filesystems to report error; patch 8
> add support for FAN_ERROR in fanotify; Patch 9 is an example
> implementation for ext4; patch 10 and 11 provide a sample userspace code
> and documentation.
>
> I also pushed the full series to:
>
> https://gitlab.collabora.com/krisman/linux -b fanotify-notifications-single-slot

All in all the series looks good, give or take some implementation
details.

One general comment about UAPI (CC linux-api) -
I think Darrick has proposed to report ino/gen instead of only ino.
I personally think it would be a shame not to reuse the already existing
FAN_EVENT_INFO_TYPE_FID record format, but I can understand why
you did not want to go there:
1. Not all error reports carry inode information
2. Not all filesystems support file handles
3. Any other reason that I missed?

My proposal is that in cases where group was initialized with
FAN_REPORT_FID (implies fs supports file handles) AND error report
does carry inode information, record fanotify_info in fanotify_error_event
and report FAN_EVENT_INFO_TYPE_FID record in addition to
FAN_EVENT_INFO_TYPE_ERROR record to user.

I am not insisting on this change, but I think it won't add much complexity
to your implementation and it will allow more flexibility to the API going
forward.

However, for the time being, if you want to avoid the UAPI discussion,
I don't mind if you disallow FAN_ERROR mark for group with
FAN_REPORT_FID.

In most likelihood, the tool monitoring filesystem for errors will not care
about other events, so it shouldn't care about FAN_REPORT_FID anyway.
I'd like to hear what other think about this point as well.

Thanks,
Amir.

2021-05-21 10:31:48

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 03/11] fanotify: Simplify directory sanity check in DFID_NAME mode

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> The only fid_mode where the directory inode is reported is
> FAN_REPORT_DFID_NAME. So remove the negative logic and make it more
> straightforward.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/notify/fanotify/fanotify.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 057abd2cf887..711b36a9483e 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -276,7 +276,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
> /* Path type events are only relevant for files and dirs */
> if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
> return 0;
> - } else if (!(fid_mode & FAN_REPORT_FID)) {
> + } else if (fid_mode & FAN_REPORT_DFID_NAME) {

This change is wrong, because it regresses the case of
fid_mode = FAN_REPORT_FID | FAN_REPORT_DFID_NAME
which is a perfectly legal combination.

Thanks,
Amir.

2021-05-21 10:31:58

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 02/11] fanotify: Split fsid check from other fid mode checks

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> FAN_ERROR will require fsid, but not necessarily require the filesystem
> to expose a file handle. Split those checks into different functions, so
> they can be used separately when setting up an event.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> Changes since v1:
> (Amir)
> - Rename fanotify_check_path_fsid -> fanotify_test_fsid
> - Use dentry directly instead of path
> ---
> fs/notify/fanotify/fanotify_user.c | 43 ++++++++++++++++++------------
> 1 file changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 3ccdee3c9f1e..9cc6c8808ed5 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1178,15 +1178,31 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> }
>
> /* Check if filesystem can encode a unique fid */
> -static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
> +static int fanotify_test_fid(struct dentry *dentry)
> +{
> + /*
> + * We need to make sure that the file system supports at least
> + * encoding a file handle so user can use name_to_handle_at() to
> + * compare fid returned with event to the file handle of watched
> + * objects. However, name_to_handle_at() requires that the
> + * filesystem also supports decoding file handles.
> + */
> + if (!dentry->d_sb->s_export_op ||
> + !dentry->d_sb->s_export_op->fh_to_dentry)
> + return -EOPNOTSUPP;
> +
> + return 0;
> +}
> +

Nit: move this helper below test_fsid.
The patch will be smaller and easier to review and more
important, less churn that hides the true git blame on the code above.

Thanks,
Amir.

2021-05-21 10:48:13

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 04/11] fanotify: Expose fanotify_mark

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> FAN_ERROR will require an error structure to be stored per mark.
> Therefore, wrap fsnotify_mark in a fanotify specific structure in
> preparation for that.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/notify/fanotify/fanotify.c | 4 +++-
> fs/notify/fanotify/fanotify.h | 10 ++++++++++
> fs/notify/fanotify/fanotify_user.c | 14 +++++++-------
> 3 files changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 711b36a9483e..34e2ee759b39 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -869,7 +869,9 @@ static void fanotify_freeing_mark(struct fsnotify_mark *mark,
>
> static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
> {
> - kmem_cache_free(fanotify_mark_cache, fsn_mark);
> + struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);
> +
> + kmem_cache_free(fanotify_mark_cache, mark);
> }
>
> const struct fsnotify_ops fanotify_fsnotify_ops = {
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 4a5e555dc3d2..a399c5e2615d 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -129,6 +129,16 @@ static inline void fanotify_info_copy_name(struct fanotify_info *info,
> name->name);
> }
>
> +struct fanotify_mark {
> + struct fsnotify_mark fsn_mark;
> + struct fanotify_error_event *error_event;

I don't think fanotify_error_event is defined at this point in the series.
You can add this field later in the series.

> +};
> +
> +static inline struct fanotify_mark *FANOTIFY_MARK(struct fsnotify_mark *mark)
> +{
> + return container_of(mark, struct fanotify_mark, fsn_mark);
> +}
> +
> /*
> * Common structure for fanotify events. Concrete structs are allocated in
> * fanotify_handle_event() and freed when the information is retrieved by
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9cc6c8808ed5..00210535a78e 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -914,7 +914,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
> __kernel_fsid_t *fsid)
> {
> struct ucounts *ucounts = group->fanotify_data.ucounts;
> - struct fsnotify_mark *mark;
> + struct fanotify_mark *mark;
> int ret;
>
> /*
> @@ -926,20 +926,20 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
> !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS))
> return ERR_PTR(-ENOSPC);
>
> - mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
> + mark = kmem_cache_zalloc(fanotify_mark_cache, GFP_KERNEL);
> if (!mark) {
> ret = -ENOMEM;
> goto out_dec_ucounts;
> }
>
> - fsnotify_init_mark(mark, group);
> - ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid);
> + fsnotify_init_mark(&mark->fsn_mark, group);
> + ret = fsnotify_add_mark_locked(&mark->fsn_mark, connp, type, 0, fsid);
> if (ret) {
> - fsnotify_put_mark(mark);
> + fsnotify_put_mark(&mark->fsn_mark);
> goto out_dec_ucounts;
> }
>
> - return mark;
> + return &mark->fsn_mark;
>
> out_dec_ucounts:
> if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS))
> @@ -1477,7 +1477,7 @@ static int __init fanotify_user_setup(void)
> BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
> BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
>
> - fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
> + fanotify_mark_cache = KMEM_CACHE(fanotify_mark,
> SLAB_PANIC|SLAB_ACCOUNT);

Thinking out loud:

Do we want to increase the size of all fanotify marks or just the size of
sb marks?

In this WIP branch [1], I took the latter approach.

The greater question is, do we want/need to allow setting FAN_ERROR
on inode marks mask at all?

My feeling is that we should not allow that at this time, because the
use case of watching for critical errors on a specific inode is a
non-requirement IMO.

OTOH, if we treat this change as a stepping stone towards adding
writeback error events in the future then we should also think about
whether watching specific files for writeback error in a sensible use case
I don't think that it is, because when application can always keep an open
fd for file of interest and fysnc on any reported writeback error on the
filesystem, as those events are supposed to be rare.

Another point to consider - in future revisions of [1] fanotify sb marks
may grow more fields (e.g. subtree_root, userns), so while adding a single
pointer field to all fanotify inode marks may not be the end of the world,
going forward, we will need to have a separate kmem_cache for sb marks
anyway, so maybe better to split it now already.

Thoughts?

Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_idmapped

2021-05-21 10:48:25

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 05/11] inotify: Don't force FS_IN_IGNORED

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> According to Amir:
>
> "FS_IN_IGNORED is completely internal to inotify and there is no need
> to set it in i_fsnotify_mask at all, so if we remove the bit from the
> output of inotify_arg_to_mask() no functionality will change and we will
> be able to overload the event bit for FS_ERROR."
>
> This is done in preparation to overload FS_ERROR with the notification
> mechanism in fanotify.
>
> Suggested-by: Amir Goldstein <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Reviewed-by: Amir Goldstein <[email protected]>

> ---
> fs/notify/inotify/inotify_user.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> index 98f61b31745a..4d17be6dd58d 100644
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -89,10 +89,10 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
> __u32 mask;
>
> /*
> - * Everything should accept their own ignored and should receive events
> - * when the inode is unmounted. All directories care about children.
> + * Everything should receive events when the inode is unmounted.
> + * All directories care about children.
> */
> - mask = (FS_IN_IGNORED | FS_UNMOUNT);
> + mask = (FS_UNMOUNT);

Nit: can remove ()

Thanks,
Amir.

2021-05-21 10:54:36

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 06/11] fsnotify: Support FS_ERROR event type

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Expose a new type of fsnotify event for filesystems to report errors for
> userspace monitoring tools. fanotify will send this type of
> notification for FAN_ERROR events.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> ---
> Changes since v1:
> - Overload FS_ERROR with FS_IN_IGNORED
> ---
> include/linux/fsnotify_backend.h | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 1ce66748a2d2..bbef2df3fbc7 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -42,6 +42,12 @@
>
> #define FS_UNMOUNT 0x00002000 /* inode on umount fs */
> #define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
> +#define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */
> +
> +/*
> + * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify
> + * which does not support FS_ERROR.
> + */
> #define FS_IN_IGNORED 0x00008000 /* last inotify event here */
>
> #define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */
> @@ -95,7 +101,8 @@
> #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
> FS_EVENTS_POSS_ON_CHILD | \
> FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \
> - FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED)
> + FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \
> + FS_ERROR)
>
> /* Extra flags that may be reported with event or control handling of events */
> #define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \
> @@ -248,6 +255,12 @@ enum fsnotify_data_type {
> FSNOTIFY_EVENT_NONE,
> FSNOTIFY_EVENT_PATH,
> FSNOTIFY_EVENT_INODE,
> + FSNOTIFY_EVENT_ERROR,
> +};
> +
> +struct fs_error_report {
> + int error;
> + struct inode *inode;
> };
>
> static inline struct inode *fsnotify_data_inode(const void *data, int data_type)

Your patch may not use it, but I prefer that fsnotify_data_inode() knows
how to handle FSNOTIFY_EVENT_ERROR type.

You will need it if you choose to implement my proposal of optional
FAN_EVENT_INFO_TYPE_FID record.

Thanks,
Amir.

2021-05-21 11:05:03

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 07/11] fsnotify: Introduce helpers to send error_events

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>

Some maintainers are very strict about empty commit description...

> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> ---
> Changes since v2:
> - Use the inode argument (Amir)
> ---
> include/linux/fsnotify.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index f8acddcf54fb..458e4feb5fe1 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -317,4 +317,17 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
> fsnotify_dentry(dentry, mask);
> }
>
> +static inline void fsnotify_error_event(struct super_block *sb, struct inode *inode,
> + int error)

The _event in the helper name is inconsistent with the rest of the helpers.
I would go with fsnotify_sb_error(), especially if you agree with me about
allowing FAN_ERROR only on sb marks.
I would also consider FAN_FS_ERROR (?) to reduce ambiguity in the future
with FAN_WB_ERROR.

> +{
> + if (sb->s_fsnotify_marks) {
> + struct fs_error_report report = {
> + .error = error,
> + .inode = inode,
> + };
> + fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, NULL, NULL,
> + sb->s_root->d_inode, 0);

This is a bit hacky.
It *may* be acceptable if we allow FAN_ERROR only for sb marks, but if we
allow to set them on inode marks, this is wrong, because it will only report
events for all inodes only to an inode mark that was set on the root inode.

If you want a clean solution:
1) Take this cleanup patch from [1]
"fsnotify: pass arguments of fsnotify() in struct fsnotify_event_info"
2) Add sb field to fsnotify_event_info
3) In fsnotify() do not assume that inode is non-NULL:
- struct super_block *sb;
+ struct inode *inode = event_info->inode;
+ struct super_block *sb = event_info->sb ?: inode->i_sb;

There are too many args to fsnotify() already, so the cleanup patch
is due anyway.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_idmapped

2021-05-21 11:19:35

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 09/11] ext4: Send notifications on error

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Send a FS_ERROR message via fsnotify to a userspace monitoring tool
> whenever a ext4 error condition is triggered. This follows the existing
> error conditions in ext4, so it is hooked to the ext4_error* functions.
>
> It also follows the current dmesg reporting in the format. The
> filesystem message is composed mostly by the string that would be
> otherwise printed in dmesg.
>
> A new ext4 specific record format is exposed in the uapi, such that a
> monitoring tool knows what to expect when listening errors of an ext4
> filesystem.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Looks fine.

Reviewed-by: Amir Goldstein <[email protected]>

> ---
> fs/ext4/super.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 7dc94f3e18e6..a8c0ac2c3e4c 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -46,6 +46,7 @@
> #include <linux/part_stat.h>
> #include <linux/kthread.h>
> #include <linux/freezer.h>
> +#include <linux/fsnotify.h>
>
> #include "ext4.h"
> #include "ext4_extents.h" /* Needed for trace points definition */
> @@ -752,6 +753,8 @@ void __ext4_error(struct super_block *sb, const char *function,
> sb->s_id, function, line, current->comm, &vaf);
> va_end(args);
> }
> + fsnotify_error_event(sb, NULL, error);
> +
> ext4_handle_error(sb, force_ro, error, 0, block, function, line);
> }
>
> @@ -782,6 +785,8 @@ void __ext4_error_inode(struct inode *inode, const char *function,
> current->comm, &vaf);
> va_end(args);
> }
> + fsnotify_error_event(inode->i_sb, inode, error);
> +
> ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
> function, line);
> }
> @@ -820,6 +825,8 @@ void __ext4_error_file(struct file *file, const char *function,
> current->comm, path, &vaf);
> va_end(args);
> }
> + fsnotify_error_event(inode->i_sb, inode, EFSCORRUPTED);
> +
> ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
> function, line);
> }
> @@ -887,6 +894,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
> printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
> sb->s_id, function, line, errstr);
> }
> + fsnotify_error_event(sb, sb->s_root->d_inode, errno);
>
> ext4_handle_error(sb, false, -errno, 0, 0, function, line);
> }
> --
> 2.31.0
>

2021-05-21 11:24:16

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 10/11] samples: Add fs error monitoring example

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Introduce an example of a FAN_ERROR fanotify user to track filesystem
> errors.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> samples/Kconfig | 8 +++
> samples/Makefile | 1 +
> samples/fanotify/Makefile | 3 ++
> samples/fanotify/fs-monitor.c | 91 +++++++++++++++++++++++++++++++++++
> 4 files changed, 103 insertions(+)
> create mode 100644 samples/fanotify/Makefile
> create mode 100644 samples/fanotify/fs-monitor.c
>
> diff --git a/samples/Kconfig b/samples/Kconfig
> index b5a1a7aa7e23..e421556ec3e5 100644
> --- a/samples/Kconfig
> +++ b/samples/Kconfig
> @@ -120,6 +120,14 @@ config SAMPLE_CONNECTOR
> with it.
> See also Documentation/driver-api/connector.rst
>
> +config SAMPLE_FANOTIFY_ERROR
> + bool "Build fanotify error monitoring sample"
> + depends on FANOTIFY
> + help
> + When enabled, this builds an example code that uses the FAN_ERROR
> + fanotify mechanism to monitor filesystem errors.
> + Documentation/admin-guide/filesystem-monitoring.rst
> +
> config SAMPLE_HIDRAW
> bool "hidraw sample"
> depends on CC_CAN_LINK && HEADERS_INSTALL
> diff --git a/samples/Makefile b/samples/Makefile
> index 087e0988ccc5..931a81847c48 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -5,6 +5,7 @@ subdir-$(CONFIG_SAMPLE_AUXDISPLAY) += auxdisplay
> subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
> obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/
> obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
> +obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
> subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
> obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
> obj-$(CONFIG_SAMPLE_KDB) += kdb/
> diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
> new file mode 100644
> index 000000000000..b3d5cc826e6f
> --- /dev/null
> +++ b/samples/fanotify/Makefile
> @@ -0,0 +1,3 @@
> +userprogs-always-y += fs-monitor
> +
> +userccflags += -I usr/include
> diff --git a/samples/fanotify/fs-monitor.c b/samples/fanotify/fs-monitor.c
> new file mode 100644
> index 000000000000..49d3e2a872e4
> --- /dev/null
> +++ b/samples/fanotify/fs-monitor.c
> @@ -0,0 +1,91 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2021, Collabora Ltd.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <err.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <fcntl.h>
> +#include <sys/fanotify.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +
> +#ifndef FAN_ERROR
> +#define FAN_ERROR 0x00008000
> +#define FAN_EVENT_INFO_TYPE_ERROR 5
> +
> +struct fanotify_event_info_error {
> + struct fanotify_event_info_header hdr;
> + int error;
> + __kernel_fsid_t fsid;
> + unsigned long inode;
> + __u32 error_count;
> +};
> +#endif
> +
> +static void handle_notifications(char *buffer, int len)
> +{
> + struct fanotify_event_metadata *metadata;
> + struct fanotify_event_info_error *error;
> +
> + for (metadata = (struct fanotify_event_metadata *) buffer;
> + FAN_EVENT_OK(metadata, len); metadata = FAN_EVENT_NEXT(metadata, len)) {
> + if (!(metadata->mask == FAN_ERROR)) {

Nit: != FAN_ERROR

> + printf("unexpected FAN MARK: %llx\n", metadata->mask);
> + continue;
> + } else if (metadata->fd != FAN_NOFD) {
> + printf("Unexpected fd (!= FAN_NOFD)\n");
> + continue;
> + }
> +
> + printf("FAN_ERROR found len=%d\n", metadata->event_len);
> +
> + error = (struct fanotify_event_info_error *) (metadata+1);
> + if (error->hdr.info_type == FAN_EVENT_INFO_TYPE_ERROR) {

You meant != FAN_EVENT_INFO_TYPE_ERROR ?

> + printf("unknown record: %d\n", error->hdr.info_type);
> + continue;
> + }
> +
> + printf(" Generic Error Record: len=%d\n", error->hdr.len);
> + printf(" fsid: %llx\n", error->fsid);
> + printf(" error: %d\n", error->error);
> + printf(" inode: %lu\n", error->inode);
> + printf(" error_count: %d\n", error->error_count);
> + }
> +}
> +
> +int main(int argc, char **argv)
> +{
> + int fd;
> + char buffer[BUFSIZ];

BUFSIZ not defined?

> +
> + if (argc < 2) {
> + printf("Missing path argument\n");
> + return 1;
> + }
> +
> + fd = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY);
> + if (fd < 0)
> + errx(1, "fanotify_init");
> +
> + if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
> + FAN_ERROR, AT_FDCWD, argv[1])) {
> + errx(1, "fanotify_mark");
> + }
> +
> + while (1) {
> + int n = read(fd, buffer, BUFSIZ);
> +
> + if (n < 0)
> + errx(1, "read");
> +
> + handle_notifications(buffer, n);
> + }
> +
> + return 0;
> +}
> --
> 2.31.0
>

2021-05-21 11:40:20

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 11/11] Documentation: Document the FAN_ERROR event

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> .../admin-guide/filesystem-monitoring.rst | 52 +++++++++++++++++++
> Documentation/admin-guide/index.rst | 1 +
> 2 files changed, 53 insertions(+)
> create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
>
> diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
> new file mode 100644
> index 000000000000..81e632f8e1de
> --- /dev/null
> +++ b/Documentation/admin-guide/filesystem-monitoring.rst
> @@ -0,0 +1,52 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +====================================
> +File system Monitoring with fanotify
> +====================================
> +
> +fanotify supports the FAN_ERROR mark for file system-wide error
> +reporting. It is meant to be used by file system health monitoring
> +daemons who listen on that interface and take actions (notify sysadmin,
> +start recovery) when a file system problem is detected by the kernel.
> +
> +By design, A FAN_ERROR notification exposes sufficient information for a
> +monitoring tool to know a problem in the file system has happened. It
> +doesn't necessarily provide a user space application with semantics to
> +verify an IO operation was successfully executed. That is outside of
> +scope of this feature. Instead, it is only meant as a framework for
> +early file system problem detection and reporting recovery tools.
> +
> +At the time of this writing, the only file system that emits this
> +FAN_ERROR notifications is ext4.
> +
> +A user space example code is provided at ``samples/fanotify/fs-monitor.c``.
> +
> +Usage
> +=====
> +
> +Notification structure
> +======================
> +
> +A FAN_ERROR Notification has the following format::
> +
> + [ Notification Metadata (Mandatory) ]
> + [ Generic Error Record (Mandatory) ]
> +
> +With the exception of the notification metadata and the generic
> +information, all information records are optional. Each record type is
> +identified by its unique ``struct fanotify_event_info_header.info_type``.

Out-dated. Unless you decide to add support for optional FID record.

> +
> +Generic error Location

'Location' seems irrelevant?

> +----------------------
> +
> +The Generic error record provides enough information for a file system
> +agnostic tool to learn about a problem in the file system, without
> +requiring any details about the problem.::
> +
> + struct fanotify_event_info_error {
> + struct fanotify_event_info_header hdr;
> + int error;
> + __kernel_fsid_t fsid;
> + unsigned long inode;
> + __u32 error_count;
> + };


Maybe add some text about the fact the inode info is optional,
purpose of error_count and the fact that the info is related to
the first observed error.

Thanks,
Amir.

2021-05-21 12:18:35

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 08/11] fanotify: Introduce FAN_ERROR event

On Fri, May 21, 2021 at 5:42 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> The FAN_ERROR event is used by filesystem wide monitoring tools to
> receive notifications of type FS_ERROR_EVENT, emited by filesystems when
> a problem is detected. The error notification includes a generic error
> descriptor.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> Changes since v1:
> - Pass dentry to fanotify_check_fsid (Amir)
> - FANOTIFY_EVENT_TYPE_ERROR -> FANOTIFY_EVENT_TYPE_FS_ERROR
> - Merge previous patch into it
> - Use a single slot
> ---
> fs/notify/fanotify/fanotify.c | 74 ++++++++++++++++-
> fs/notify/fanotify/fanotify.h | 28 ++++++-
> fs/notify/fanotify/fanotify_user.c | 123 ++++++++++++++++++++++++++---
> include/linux/fanotify.h | 6 +-
> include/uapi/linux/fanotify.h | 10 +++
> 5 files changed, 225 insertions(+), 16 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 34e2ee759b39..197291a8c41d 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -269,7 +269,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
> pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n",
> __func__, iter_info->report_mask, event_mask, data, data_type);
>
> - if (!fid_mode) {
> + if (!fid_mode && data_type != FSNOTIFY_EVENT_ERROR) {
> /* Do we have path to open a file descriptor? */
> if (!path)
> return 0;
> @@ -657,6 +657,51 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
> return event;
> }
>
> +static void fanotify_init_error_event(struct fanotify_error_event *event,
> + __kernel_fsid_t fsid,
> + const struct fs_error_report *report)
> +{
> + event->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> + event->err_count = 1;
> + event->fsid = fsid;
> + event->error = report->error;
> + event->ino = (report->inode) ? report->inode->i_ino : 0;
> +}
> +
> +static int fanotify_queue_error_event(struct fsnotify_iter_info *iter_info,
> + struct fsnotify_group *group,
> + __kernel_fsid_t fsid,
> + const struct fs_error_report *report)
> +{
> + struct fanotify_mark *mark;
> + int type;
> + int ret = -ENOMEM;
> +
> + fsnotify_foreach_obj_type(type) {
> + if (!fsnotify_iter_should_report_type(iter_info, type))
> + continue;
> + mark = FANOTIFY_MARK(iter_info->marks[type]);
> + }

Two options.
If FAN_ERROR is allowed on inode marks, need to do the below
for every mark found in mark type iterator.
If FAN_ERROR is only allowed in sb marks use
fsnotify_iter_sb_mark(iter_info) instead of mark iterator.

> +
> + spin_lock(&mark->fsn_mark.lock);
> + if (mark->error_event) {
> + if (list_empty(&mark->error_event->fae.fse.list)) {

event list check should be protected with notification_lock.
err_count++ as well.

I suggest that you use the merge() callback of fsnotify_add_event()
to do err_count++ under notification_lock and the insert() callback
to do fsnotify_get_mark() and fanotify_init_error_event() under
notification_lock.

> + fsnotify_get_mark(&mark->fsn_mark);
> + fanotify_init_error_event(mark->error_event, fsid, report);
> + ret = fsnotify_add_event(group, &mark->error_event->fae.fse,
> + NULL, NULL);
> + if (ret)
> + fsnotify_put_mark(&mark->fsn_mark);
> + } else {
> + mark->error_event->err_count++;
> + ret = 0;
> + }
> + }
> + spin_unlock(&mark->fsn_mark.lock);
> +
> + return ret;
> +}
> +
> /*
> * Get cached fsid of the filesystem containing the object from any connector.
> * All connectors are supposed to have the same fsid, but we do not verify that
> @@ -738,8 +783,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
> BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
> BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
> + BUILD_BUG_ON(FAN_ERROR != FS_ERROR);
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
>
> mask = fanotify_group_event_mask(group, iter_info, mask, data,
> data_type, dir);
> @@ -757,13 +803,20 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> return 0;
> }
>
> - if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
> + if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS) || fanotify_is_error_event(mask)) {

Jan likes to keep code inside 80-columns.
Please run checkpatch.pl --max-line-length=80 on your patches.

> fsid = fanotify_get_fsid(iter_info);
> /* Racing with mark destruction or creation? */
> if (!fsid.val[0] && !fsid.val[1])
> return 0;
> }
>
> + if (fanotify_is_error_event(mask)) {
> + ret = fanotify_queue_error_event(iter_info, group, fsid, data);
> + if (ret)
> + fsnotify_queue_overflow(group);

fsnotify_add_event() already takes care of overflow event.

> + goto finish;
> + }
> +
> event = fanotify_alloc_event(group, mask, data, data_type, dir,
> file_name, &fsid);
> ret = -ENOMEM;
> @@ -833,6 +886,17 @@ static void fanotify_free_name_event(struct fanotify_event *event)
> kfree(FANOTIFY_NE(event));
> }
>
> +static void fanotify_free_error_event(struct fanotify_event *event)
> +{
> + /*
> + * Just drop the reference acquired by
> + * fanotify_queue_error_event.
> + *
> + * The actual memory is freed with the mark.
> + */
> + fsnotify_put_mark(&(FANOTIFY_EE(event)->mark->fsn_mark));
> +}
> +
> static void fanotify_free_event(struct fsnotify_event *fsn_event)
> {
> struct fanotify_event *event;
> @@ -855,6 +919,9 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
> case FANOTIFY_EVENT_TYPE_OVERFLOW:
> kfree(event);
> break;
> + case FANOTIFY_EVENT_TYPE_FS_ERROR:
> + fanotify_free_error_event(event);
> + break;
> default:
> WARN_ON_ONCE(1);
> }
> @@ -871,6 +938,7 @@ static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
> {
> struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);
>
> + kfree(mark->error_event);
> kmem_cache_free(fanotify_mark_cache, mark);
> }
>
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index a399c5e2615d..ebe9e593dfbf 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -151,6 +151,7 @@ enum fanotify_event_type {
> FANOTIFY_EVENT_TYPE_PATH,
> FANOTIFY_EVENT_TYPE_PATH_PERM,
> FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
> + FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
> __FANOTIFY_EVENT_TYPE_NUM
> };
>
> @@ -206,12 +207,31 @@ FANOTIFY_NE(struct fanotify_event *event)
> return container_of(event, struct fanotify_name_event, fae);
> }
>
> +struct fanotify_error_event {
> + struct fanotify_event fae;
> + __kernel_fsid_t fsid;
> + unsigned long ino;
> + int error;
> + u32 err_count;
> +
> + /* Back reference to the mark this error refers to. */
> + struct fanotify_mark *mark;
> +};
> +
> +static inline struct fanotify_error_event *
> +FANOTIFY_EE(struct fanotify_event *event)
> +{
> + return container_of(event, struct fanotify_error_event, fae);
> +}
> +
> static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
> {
> if (event->type == FANOTIFY_EVENT_TYPE_FID)
> return &FANOTIFY_FE(event)->fsid;
> else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
> return &FANOTIFY_NE(event)->fsid;
> + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
> + return &FANOTIFY_EE(event)->fsid;
> else
> return NULL;
> }
> @@ -297,6 +317,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
> return container_of(fse, struct fanotify_event, fse);
> }
>
> +static inline bool fanotify_is_error_event(u32 mask)
> +{
> + return mask & FANOTIFY_ERROR_EVENTS;
> +}
> +
> static inline bool fanotify_event_has_path(struct fanotify_event *event)
> {
> return event->type == FANOTIFY_EVENT_TYPE_PATH ||
> @@ -325,7 +350,8 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
> */
> static inline bool fanotify_is_hashed_event(u32 mask)
> {
> - return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW);
> + return (!fanotify_is_perm_event(mask) && !fanotify_is_error_event(mask)
> + && !(mask & FS_Q_OVERFLOW));

Please do us the service off adding helper fanotify_is_overflow_event(mask)

> }
>
> static inline unsigned int fanotify_event_hash_bucket(
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 00210535a78e..ea9b9f8f7c21 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -106,6 +106,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
> #define FANOTIFY_EVENT_ALIGN 4
> #define FANOTIFY_INFO_HDR_LEN \
> (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
> +#define FANOTIFY_INFO_ERROR_LEN \
> + (sizeof(struct fanotify_event_info_error))
>
> static int fanotify_fid_info_len(int fh_len, int name_len)
> {
> @@ -126,6 +128,9 @@ static size_t fanotify_event_len(struct fanotify_event *event,
> int fh_len;
> int dot_len = 0;
>
> + if (fanotify_is_error_event(event->mask))
> + return event_len + FANOTIFY_INFO_ERROR_LEN;
> +
> if (!fid_mode)
> return event_len;
>
> @@ -149,6 +154,30 @@ static size_t fanotify_event_len(struct fanotify_event *event,
> return event_len;
> }
>
> +static struct fanotify_event *fanotify_dequeue_error_event(struct fsnotify_group *group,
> + struct fanotify_event *event,
> + struct fanotify_error_event *error_event)
> +{
> + struct fsnotify_mark *mark = &(FANOTIFY_EE(event)->mark->fsn_mark);
> + /*
> + * In order to avoid missing an error count update, the
> + * queued event is de-queued and duplicated to an
> + * in-stack fanotify_error_event while still inside
> + * mark->lock. Once the event is dequeued, it can be
> + * immediately re-used for a new event.
> + *
> + * The ownership of the mark reference is dropped later
> + * by destroy_event.
> + */
> + spin_lock(&mark->lock);
> + memcpy(error_event, FANOTIFY_EE(event), sizeof(*error_event));
> + fsnotify_init_event(&error_event->fae.fse);
> + fsnotify_remove_queued_event(group, &event->fse);
> + spin_unlock(&mark->lock);
> +
> + return &error_event->fae;
> +}
> +
> /*
> * Remove an hashed event from merge hash table.
> */
> @@ -173,7 +202,8 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
> * updated accordingly.
> */
> static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> - size_t count)
> + size_t count,
> + struct fanotify_error_event *error_event)
> {
> size_t event_size;
> struct fanotify_event *event = NULL;
> @@ -197,9 +227,14 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
>
> /*
> * Held the notification_lock the whole time, so this is the
> - * same event we peeked above.
> + * same event we peeked above, unless it is copied to
> + * error_event.
> */
> - fsnotify_remove_first_event(group);
> + if (fanotify_is_error_event(event->mask))
> + event = fanotify_dequeue_error_event(group, event, error_event);
> + else
> + fsnotify_remove_first_event(group);
> +
> if (fanotify_is_perm_event(event->mask))
> FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
> if (fanotify_is_hashed_event(event->mask))
> @@ -309,6 +344,30 @@ static int process_access_response(struct fsnotify_group *group,
> return -ENOENT;
> }
>
> +static size_t copy_error_info_to_user(struct fanotify_event *event,
> + char __user *buf, int count)
> +{
> + struct fanotify_event_info_error info;
> + struct fanotify_error_event *fee = FANOTIFY_EE(event);
> +
> + info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
> + info.hdr.pad = 0;
> + info.hdr.len = sizeof(struct fanotify_event_info_error);
> +
> + if (WARN_ON(count < info.hdr.len))
> + return -EFAULT;
> +
> + info.fsid = fee->fsid;
> + info.error = fee->error;
> + info.inode = fee->ino;
> + info.error_count = fee->err_count;
> +
> + if (copy_to_user(buf, &info, sizeof(info)))
> + return -EFAULT;
> +
> + return info.hdr.len;
> +}
> +
> static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
> int info_type, const char *name, size_t name_len,
> char __user *buf, size_t count)
> @@ -523,6 +582,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> count -= ret;
> }
>
> + if (fanotify_is_error_event(event->mask)) {
> + ret = copy_error_info_to_user(event, buf, count);
> + if (ret < 0)
> + return ret;
> + buf += ret;
> + count -= ret;
> + }
> +
> return metadata.event_len;
>
> out_close_fd:
> @@ -553,6 +620,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
> {
> struct fsnotify_group *group;
> struct fanotify_event *event;
> + struct fanotify_error_event error_event;
> char __user *start;
> int ret;
> DEFINE_WAIT_FUNC(wait, woken_wake_function);
> @@ -569,7 +637,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
> * in case there are lots of available events.
> */
> cond_resched();
> - event = get_one_event(group, count);
> + event = get_one_event(group, count, &error_event);
> if (IS_ERR(event)) {
> ret = PTR_ERR(event);
> break;
> @@ -888,16 +956,33 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
> flags, umask);
> }
>
> -static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
> - __u32 mask,
> - unsigned int flags)
> +static int fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
> + __u32 mask, unsigned int flags,
> + __u32 *modified_mask)

It really means 'added' not 'modified' just like the local variable name.

> {
> + struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);
> + struct fanotify_error_event *error_event = NULL;
> + bool addition = !(flags & FAN_MARK_IGNORED_MASK);

'addition' is confusing here. I would call it 'ignored',
but do we want to allow FAN_ERROR in ignored mask?
for what case? on inode mark?
If we do not, let's not allow it.

> __u32 oldmask = -1;
>
> + /* Only pre-alloc error_event if needed. */
> + if (addition && (mask & FAN_ERROR) && !mark->error_event) {
> + error_event = kzalloc(sizeof(*error_event), GFP_KERNEL);
> + if (!error_event)
> + return -ENOMEM;
> + fanotify_init_event(&error_event->fae, 0, FS_ERROR);
> + error_event->mark = mark;
> + }
> +
> spin_lock(&fsn_mark->lock);
> - if (!(flags & FAN_MARK_IGNORED_MASK)) {
> + if (addition) {
> oldmask = fsn_mark->mask;
> fsn_mark->mask |= mask;
> +
> + if (!mark->error_event) {
> + mark->error_event = error_event;
> + error_event = NULL;
> + }
> } else {
> fsn_mark->ignored_mask |= mask;
> if (flags & FAN_MARK_IGNORED_SURV_MODIFY)
> @@ -905,7 +990,11 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
> }
> spin_unlock(&fsn_mark->lock);
>
> - return mask & ~oldmask;
> + kfree(error_event);
> +
> + *modified_mask = mask & ~oldmask;
> + return 0;
> +
> }
>
> static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
> @@ -955,6 +1044,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
> {
> struct fsnotify_mark *fsn_mark;
> __u32 added;
> + int ret = 0;
>
> mutex_lock(&group->mark_mutex);
> fsn_mark = fsnotify_find_mark(connp, group);
> @@ -965,13 +1055,18 @@ static int fanotify_add_mark(struct fsnotify_group *group,
> return PTR_ERR(fsn_mark);
> }
> }
> - added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
> + ret = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &added);
> + if (ret)
> + goto out;
> +
> if (added & ~fsnotify_conn_mask(fsn_mark->connector))
> fsnotify_recalc_mask(fsn_mark->connector);
> +
> +out:
> mutex_unlock(&group->mark_mutex);
>
> fsnotify_put_mark(fsn_mark);
> - return 0;
> + return ret;
> }
>
> static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
> @@ -1377,6 +1472,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>
> fsid = &__fsid;
> }
> + if (mask & FAN_ERROR) {
> + ret = fanotify_test_fsid(path.dentry, &__fsid);
> + if (ret)
> + goto path_put_and_out;
> + fsid = &__fsid;
> + }
>
> /* inode held in place by reference to path; group by fget on fd */
> if (mark_type == FAN_MARK_INODE)
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index bad41bcb25df..05c929d588e4 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -81,9 +81,12 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
> #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
> FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
>
> +#define FANOTIFY_ERROR_EVENTS (FAN_ERROR)
> +
> /* Events that user can request to be notified on */
> #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
> - FANOTIFY_INODE_EVENTS)
> + FANOTIFY_INODE_EVENTS | \
> + FANOTIFY_ERROR_EVENTS)
>
> /* Events that require a permission response from user */
> #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
> @@ -95,6 +98,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
> /* Events that may be reported to user */
> #define FANOTIFY_OUTGOING_EVENTS (FANOTIFY_EVENTS | \
> FANOTIFY_PERM_EVENTS | \
> + FANOTIFY_ERROR_EVENTS | \
> FAN_Q_OVERFLOW | FAN_ONDIR)
>
> #define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index fbf9c5c7dd59..e3920597112f 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -20,6 +20,7 @@
> #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */
>
> #define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
> +#define FAN_ERROR 0x00008000 /* Filesystem error */
>
> #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
> #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
> @@ -123,6 +124,7 @@ struct fanotify_event_metadata {
> #define FAN_EVENT_INFO_TYPE_FID 1
> #define FAN_EVENT_INFO_TYPE_DFID_NAME 2
> #define FAN_EVENT_INFO_TYPE_DFID 3
> +#define FAN_EVENT_INFO_TYPE_ERROR 4
>
> /* Variable length info record following event metadata */
> struct fanotify_event_info_header {
> @@ -148,6 +150,14 @@ struct fanotify_event_info_fid {
> unsigned char handle[0];
> };
>
> +struct fanotify_event_info_error {
> + struct fanotify_event_info_header hdr;
> + int error;
> + __kernel_fsid_t fsid;
> + unsigned long inode;

I think for UAPI this should use __u64 and better call this ino and
not inode?

Thanks,
Amir.

2021-05-21 20:19:06

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 08/11] fanotify: Introduce FAN_ERROR event

On Thu, May 20, 2021 at 10:41:31PM -0400, Gabriel Krisman Bertazi wrote:
> The FAN_ERROR event is used by filesystem wide monitoring tools to
> receive notifications of type FS_ERROR_EVENT, emited by filesystems when
> a problem is detected. The error notification includes a generic error
> descriptor.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> Changes since v1:
> - Pass dentry to fanotify_check_fsid (Amir)
> - FANOTIFY_EVENT_TYPE_ERROR -> FANOTIFY_EVENT_TYPE_FS_ERROR
> - Merge previous patch into it
> - Use a single slot
> ---
> fs/notify/fanotify/fanotify.c | 74 ++++++++++++++++-
> fs/notify/fanotify/fanotify.h | 28 ++++++-
> fs/notify/fanotify/fanotify_user.c | 123 ++++++++++++++++++++++++++---
> include/linux/fanotify.h | 6 +-
> include/uapi/linux/fanotify.h | 10 +++
> 5 files changed, 225 insertions(+), 16 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 34e2ee759b39..197291a8c41d 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -269,7 +269,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
> pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n",
> __func__, iter_info->report_mask, event_mask, data, data_type);
>
> - if (!fid_mode) {
> + if (!fid_mode && data_type != FSNOTIFY_EVENT_ERROR) {
> /* Do we have path to open a file descriptor? */
> if (!path)
> return 0;
> @@ -657,6 +657,51 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
> return event;
> }
>
> +static void fanotify_init_error_event(struct fanotify_error_event *event,
> + __kernel_fsid_t fsid,
> + const struct fs_error_report *report)
> +{
> + event->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> + event->err_count = 1;
> + event->fsid = fsid;
> + event->error = report->error;
> + event->ino = (report->inode) ? report->inode->i_ino : 0;
> +}
> +
> +static int fanotify_queue_error_event(struct fsnotify_iter_info *iter_info,
> + struct fsnotify_group *group,
> + __kernel_fsid_t fsid,
> + const struct fs_error_report *report)
> +{
> + struct fanotify_mark *mark;
> + int type;
> + int ret = -ENOMEM;
> +
> + fsnotify_foreach_obj_type(type) {
> + if (!fsnotify_iter_should_report_type(iter_info, type))
> + continue;
> + mark = FANOTIFY_MARK(iter_info->marks[type]);
> + }
> +
> + spin_lock(&mark->fsn_mark.lock);
> + if (mark->error_event) {
> + if (list_empty(&mark->error_event->fae.fse.list)) {
> + fsnotify_get_mark(&mark->fsn_mark);
> + fanotify_init_error_event(mark->error_event, fsid, report);
> + ret = fsnotify_add_event(group, &mark->error_event->fae.fse,
> + NULL, NULL);
> + if (ret)
> + fsnotify_put_mark(&mark->fsn_mark);
> + } else {
> + mark->error_event->err_count++;
> + ret = 0;
> + }
> + }
> + spin_unlock(&mark->fsn_mark.lock);
> +
> + return ret;
> +}
> +
> /*
> * Get cached fsid of the filesystem containing the object from any connector.
> * All connectors are supposed to have the same fsid, but we do not verify that
> @@ -738,8 +783,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
> BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
> BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
> + BUILD_BUG_ON(FAN_ERROR != FS_ERROR);
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
>
> mask = fanotify_group_event_mask(group, iter_info, mask, data,
> data_type, dir);
> @@ -757,13 +803,20 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> return 0;
> }
>
> - if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
> + if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS) || fanotify_is_error_event(mask)) {
> fsid = fanotify_get_fsid(iter_info);
> /* Racing with mark destruction or creation? */
> if (!fsid.val[0] && !fsid.val[1])
> return 0;
> }
>
> + if (fanotify_is_error_event(mask)) {
> + ret = fanotify_queue_error_event(iter_info, group, fsid, data);
> + if (ret)
> + fsnotify_queue_overflow(group);
> + goto finish;
> + }
> +
> event = fanotify_alloc_event(group, mask, data, data_type, dir,
> file_name, &fsid);
> ret = -ENOMEM;
> @@ -833,6 +886,17 @@ static void fanotify_free_name_event(struct fanotify_event *event)
> kfree(FANOTIFY_NE(event));
> }
>
> +static void fanotify_free_error_event(struct fanotify_event *event)
> +{
> + /*
> + * Just drop the reference acquired by
> + * fanotify_queue_error_event.
> + *
> + * The actual memory is freed with the mark.
> + */
> + fsnotify_put_mark(&(FANOTIFY_EE(event)->mark->fsn_mark));
> +}
> +
> static void fanotify_free_event(struct fsnotify_event *fsn_event)
> {
> struct fanotify_event *event;
> @@ -855,6 +919,9 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
> case FANOTIFY_EVENT_TYPE_OVERFLOW:
> kfree(event);
> break;
> + case FANOTIFY_EVENT_TYPE_FS_ERROR:
> + fanotify_free_error_event(event);
> + break;
> default:
> WARN_ON_ONCE(1);
> }
> @@ -871,6 +938,7 @@ static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
> {
> struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);
>
> + kfree(mark->error_event);
> kmem_cache_free(fanotify_mark_cache, mark);
> }
>
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index a399c5e2615d..ebe9e593dfbf 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -151,6 +151,7 @@ enum fanotify_event_type {
> FANOTIFY_EVENT_TYPE_PATH,
> FANOTIFY_EVENT_TYPE_PATH_PERM,
> FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
> + FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
> __FANOTIFY_EVENT_TYPE_NUM
> };
>
> @@ -206,12 +207,31 @@ FANOTIFY_NE(struct fanotify_event *event)
> return container_of(event, struct fanotify_name_event, fae);
> }
>
> +struct fanotify_error_event {
> + struct fanotify_event fae;
> + __kernel_fsid_t fsid;
> + unsigned long ino;
> + int error;
> + u32 err_count;
> +
> + /* Back reference to the mark this error refers to. */
> + struct fanotify_mark *mark;
> +};
> +
> +static inline struct fanotify_error_event *
> +FANOTIFY_EE(struct fanotify_event *event)
> +{
> + return container_of(event, struct fanotify_error_event, fae);
> +}
> +
> static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
> {
> if (event->type == FANOTIFY_EVENT_TYPE_FID)
> return &FANOTIFY_FE(event)->fsid;
> else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
> return &FANOTIFY_NE(event)->fsid;
> + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
> + return &FANOTIFY_EE(event)->fsid;
> else
> return NULL;
> }
> @@ -297,6 +317,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
> return container_of(fse, struct fanotify_event, fse);
> }
>
> +static inline bool fanotify_is_error_event(u32 mask)
> +{
> + return mask & FANOTIFY_ERROR_EVENTS;
> +}
> +
> static inline bool fanotify_event_has_path(struct fanotify_event *event)
> {
> return event->type == FANOTIFY_EVENT_TYPE_PATH ||
> @@ -325,7 +350,8 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
> */
> static inline bool fanotify_is_hashed_event(u32 mask)
> {
> - return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW);
> + return (!fanotify_is_perm_event(mask) && !fanotify_is_error_event(mask)
> + && !(mask & FS_Q_OVERFLOW));
> }
>
> static inline unsigned int fanotify_event_hash_bucket(
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 00210535a78e..ea9b9f8f7c21 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -106,6 +106,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
> #define FANOTIFY_EVENT_ALIGN 4
> #define FANOTIFY_INFO_HDR_LEN \
> (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
> +#define FANOTIFY_INFO_ERROR_LEN \
> + (sizeof(struct fanotify_event_info_error))
>
> static int fanotify_fid_info_len(int fh_len, int name_len)
> {
> @@ -126,6 +128,9 @@ static size_t fanotify_event_len(struct fanotify_event *event,
> int fh_len;
> int dot_len = 0;
>
> + if (fanotify_is_error_event(event->mask))
> + return event_len + FANOTIFY_INFO_ERROR_LEN;
> +
> if (!fid_mode)
> return event_len;
>
> @@ -149,6 +154,30 @@ static size_t fanotify_event_len(struct fanotify_event *event,
> return event_len;
> }
>
> +static struct fanotify_event *fanotify_dequeue_error_event(struct fsnotify_group *group,
> + struct fanotify_event *event,
> + struct fanotify_error_event *error_event)
> +{
> + struct fsnotify_mark *mark = &(FANOTIFY_EE(event)->mark->fsn_mark);
> + /*
> + * In order to avoid missing an error count update, the
> + * queued event is de-queued and duplicated to an
> + * in-stack fanotify_error_event while still inside
> + * mark->lock. Once the event is dequeued, it can be
> + * immediately re-used for a new event.
> + *
> + * The ownership of the mark reference is dropped later
> + * by destroy_event.
> + */
> + spin_lock(&mark->lock);
> + memcpy(error_event, FANOTIFY_EE(event), sizeof(*error_event));
> + fsnotify_init_event(&error_event->fae.fse);
> + fsnotify_remove_queued_event(group, &event->fse);
> + spin_unlock(&mark->lock);
> +
> + return &error_event->fae;
> +}
> +
> /*
> * Remove an hashed event from merge hash table.
> */
> @@ -173,7 +202,8 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
> * updated accordingly.
> */
> static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> - size_t count)
> + size_t count,
> + struct fanotify_error_event *error_event)
> {
> size_t event_size;
> struct fanotify_event *event = NULL;
> @@ -197,9 +227,14 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
>
> /*
> * Held the notification_lock the whole time, so this is the
> - * same event we peeked above.
> + * same event we peeked above, unless it is copied to
> + * error_event.
> */
> - fsnotify_remove_first_event(group);
> + if (fanotify_is_error_event(event->mask))
> + event = fanotify_dequeue_error_event(group, event, error_event);
> + else
> + fsnotify_remove_first_event(group);
> +
> if (fanotify_is_perm_event(event->mask))
> FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
> if (fanotify_is_hashed_event(event->mask))
> @@ -309,6 +344,30 @@ static int process_access_response(struct fsnotify_group *group,
> return -ENOENT;
> }
>
> +static size_t copy_error_info_to_user(struct fanotify_event *event,
> + char __user *buf, int count)
> +{
> + struct fanotify_event_info_error info;
> + struct fanotify_error_event *fee = FANOTIFY_EE(event);
> +
> + info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
> + info.hdr.pad = 0;
> + info.hdr.len = sizeof(struct fanotify_event_info_error);
> +
> + if (WARN_ON(count < info.hdr.len))
> + return -EFAULT;
> +
> + info.fsid = fee->fsid;
> + info.error = fee->error;
> + info.inode = fee->ino;
> + info.error_count = fee->err_count;
> +
> + if (copy_to_user(buf, &info, sizeof(info)))
> + return -EFAULT;
> +
> + return info.hdr.len;
> +}
> +
> static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
> int info_type, const char *name, size_t name_len,
> char __user *buf, size_t count)
> @@ -523,6 +582,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> count -= ret;
> }
>
> + if (fanotify_is_error_event(event->mask)) {
> + ret = copy_error_info_to_user(event, buf, count);
> + if (ret < 0)
> + return ret;
> + buf += ret;
> + count -= ret;
> + }
> +
> return metadata.event_len;
>
> out_close_fd:
> @@ -553,6 +620,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
> {
> struct fsnotify_group *group;
> struct fanotify_event *event;
> + struct fanotify_error_event error_event;
> char __user *start;
> int ret;
> DEFINE_WAIT_FUNC(wait, woken_wake_function);
> @@ -569,7 +637,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
> * in case there are lots of available events.
> */
> cond_resched();
> - event = get_one_event(group, count);
> + event = get_one_event(group, count, &error_event);
> if (IS_ERR(event)) {
> ret = PTR_ERR(event);
> break;
> @@ -888,16 +956,33 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
> flags, umask);
> }
>
> -static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
> - __u32 mask,
> - unsigned int flags)
> +static int fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
> + __u32 mask, unsigned int flags,
> + __u32 *modified_mask)
> {
> + struct fanotify_mark *mark = FANOTIFY_MARK(fsn_mark);
> + struct fanotify_error_event *error_event = NULL;
> + bool addition = !(flags & FAN_MARK_IGNORED_MASK);
> __u32 oldmask = -1;
>
> + /* Only pre-alloc error_event if needed. */
> + if (addition && (mask & FAN_ERROR) && !mark->error_event) {
> + error_event = kzalloc(sizeof(*error_event), GFP_KERNEL);
> + if (!error_event)
> + return -ENOMEM;
> + fanotify_init_event(&error_event->fae, 0, FS_ERROR);
> + error_event->mark = mark;
> + }
> +
> spin_lock(&fsn_mark->lock);
> - if (!(flags & FAN_MARK_IGNORED_MASK)) {
> + if (addition) {
> oldmask = fsn_mark->mask;
> fsn_mark->mask |= mask;
> +
> + if (!mark->error_event) {
> + mark->error_event = error_event;
> + error_event = NULL;
> + }
> } else {
> fsn_mark->ignored_mask |= mask;
> if (flags & FAN_MARK_IGNORED_SURV_MODIFY)
> @@ -905,7 +990,11 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
> }
> spin_unlock(&fsn_mark->lock);
>
> - return mask & ~oldmask;
> + kfree(error_event);
> +
> + *modified_mask = mask & ~oldmask;
> + return 0;
> +
> }
>
> static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
> @@ -955,6 +1044,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
> {
> struct fsnotify_mark *fsn_mark;
> __u32 added;
> + int ret = 0;
>
> mutex_lock(&group->mark_mutex);
> fsn_mark = fsnotify_find_mark(connp, group);
> @@ -965,13 +1055,18 @@ static int fanotify_add_mark(struct fsnotify_group *group,
> return PTR_ERR(fsn_mark);
> }
> }
> - added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
> + ret = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &added);
> + if (ret)
> + goto out;
> +
> if (added & ~fsnotify_conn_mask(fsn_mark->connector))
> fsnotify_recalc_mask(fsn_mark->connector);
> +
> +out:
> mutex_unlock(&group->mark_mutex);
>
> fsnotify_put_mark(fsn_mark);
> - return 0;
> + return ret;
> }
>
> static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
> @@ -1377,6 +1472,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>
> fsid = &__fsid;
> }
> + if (mask & FAN_ERROR) {
> + ret = fanotify_test_fsid(path.dentry, &__fsid);
> + if (ret)
> + goto path_put_and_out;
> + fsid = &__fsid;
> + }
>
> /* inode held in place by reference to path; group by fget on fd */
> if (mark_type == FAN_MARK_INODE)
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index bad41bcb25df..05c929d588e4 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -81,9 +81,12 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
> #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
> FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
>
> +#define FANOTIFY_ERROR_EVENTS (FAN_ERROR)
> +
> /* Events that user can request to be notified on */
> #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
> - FANOTIFY_INODE_EVENTS)
> + FANOTIFY_INODE_EVENTS | \
> + FANOTIFY_ERROR_EVENTS)
>
> /* Events that require a permission response from user */
> #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
> @@ -95,6 +98,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
> /* Events that may be reported to user */
> #define FANOTIFY_OUTGOING_EVENTS (FANOTIFY_EVENTS | \
> FANOTIFY_PERM_EVENTS | \
> + FANOTIFY_ERROR_EVENTS | \
> FAN_Q_OVERFLOW | FAN_ONDIR)
>
> #define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index fbf9c5c7dd59..e3920597112f 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -20,6 +20,7 @@
> #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */
>
> #define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
> +#define FAN_ERROR 0x00008000 /* Filesystem error */
>
> #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
> #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
> @@ -123,6 +124,7 @@ struct fanotify_event_metadata {
> #define FAN_EVENT_INFO_TYPE_FID 1
> #define FAN_EVENT_INFO_TYPE_DFID_NAME 2
> #define FAN_EVENT_INFO_TYPE_DFID 3
> +#define FAN_EVENT_INFO_TYPE_ERROR 4
>
> /* Variable length info record following event metadata */
> struct fanotify_event_info_header {
> @@ -148,6 +150,14 @@ struct fanotify_event_info_fid {
> unsigned char handle[0];
> };
>
> +struct fanotify_event_info_error {
> + struct fanotify_event_info_header hdr;
> + int error;
> + __kernel_fsid_t fsid;
> + unsigned long inode;

This ought to be __u64 (i.e. guaranteed 64-bit quantity) since this
struct is part of the is userspace ABI and you don't want to deal with
i386-on-x64 translation headaches. The same goes for
fanotify_error_event.inode since it feeds this field.

--D

> + __u32 error_count;
> +};
> +
> struct fanotify_response {
> __s32 fd;
> __u32 response;
> --
> 2.31.0
>

2021-05-22 23:32:02

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 00/11] File system wide monitoring

Hi Gabriel,

Quick question; what userspace program are you using to test this
feature? Do you have a custom testing program you are using? If so,
could share it?

Many thanks!!

- Ted

On Thu, May 20, 2021 at 10:41:23PM -0400, Gabriel Krisman Bertazi wrote:
> Hi,
>
> This series follow up on my previous proposal [1] to support file system
> wide monitoring. As suggested by Amir, this proposal drops the ring
> buffer in favor of a single slot associated with each mark. This
> simplifies a bit the implementation, as you can see in the code.
>
> As a reminder, This proposal is limited to an interface for
> administrators to monitor the health of a file system, instead of a
> generic inteface for file errors. Therefore, this doesn't solve the
> problem of writeback errors or the need to watch a specific subtree.
>
> In comparison to the previous RFC, this implementation also drops the
> per-fs data and location, and leave those as future extensions.
>
> * Implementation
>
> The feature is implemented on top of fanotify, as a new type of fanotify
> mark, FAN_ERROR, which a file system monitoring tool can register to
> receive error notifications. When an error occurs a new notification is
> generated, in addition followed by this info field:
>
> - FS generic data: A file system agnostic structure that has a generic
> error code and identifies the filesystem. Basically, it let's
> userspace know something happened on a monitored filesystem. Since
> only the first error is recorded since the last read, this also
> includes a counter of errors that happened since the last read.
>
> * Testing
>
> This was tested by watching notifications flowing from an intentionally
> corrupted filesystem in different places. In addition, other events
> were watched in an attempt to detect regressions.
>
> Is there a specific testsuite for fanotify I should be running?
>
> * Patches
>
> This patchset is divided as follows: Patch 1 through 5 are refactoring
> to fsnotify/fanotify in preparation for FS_ERROR/FAN_ERROR; patch 6 and
> 7 implement the FS_ERROR API for filesystems to report error; patch 8
> add support for FAN_ERROR in fanotify; Patch 9 is an example
> implementation for ext4; patch 10 and 11 provide a sample userspace code
> and documentation.
>
> I also pushed the full series to:
>
> https://gitlab.collabora.com/krisman/linux -b fanotify-notifications-single-slot
>
> [1] https://lwn.net/Articles/854545/
>
> Cc: Darrick J. Wong <[email protected]>
> Cc: Theodore Ts'o <[email protected]>
> Cc: Dave Chinner <[email protected]>
> Cc: [email protected]
> To: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> Gabriel Krisman Bertazi (11):
> fanotify: Fold event size calculation to its own function
> fanotify: Split fsid check from other fid mode checks
> fanotify: Simplify directory sanity check in DFID_NAME mode
> fanotify: Expose fanotify_mark
> inotify: Don't force FS_IN_IGNORED
> fsnotify: Support FS_ERROR event type
> fsnotify: Introduce helpers to send error_events
> fanotify: Introduce FAN_ERROR event
> ext4: Send notifications on error
> samples: Add fs error monitoring example
> Documentation: Document the FAN_ERROR event
>
> .../admin-guide/filesystem-monitoring.rst | 52 +++++
> Documentation/admin-guide/index.rst | 1 +
> fs/ext4/super.c | 8 +
> fs/notify/fanotify/fanotify.c | 80 ++++++-
> fs/notify/fanotify/fanotify.h | 38 +++-
> fs/notify/fanotify/fanotify_user.c | 213 ++++++++++++++----
> fs/notify/inotify/inotify_user.c | 6 +-
> include/linux/fanotify.h | 6 +-
> include/linux/fsnotify.h | 13 ++
> include/linux/fsnotify_backend.h | 15 +-
> include/uapi/linux/fanotify.h | 10 +
> samples/Kconfig | 8 +
> samples/Makefile | 1 +
> samples/fanotify/Makefile | 3 +
> samples/fanotify/fs-monitor.c | 91 ++++++++
> 15 files changed, 485 insertions(+), 60 deletions(-)
> create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
> create mode 100644 samples/fanotify/Makefile
> create mode 100644 samples/fanotify/fs-monitor.c
>
> --
> 2.31.0
>

2021-05-24 03:07:17

by Ian Kent

[permalink] [raw]
Subject: Re: [PATCH 00/11] File system wide monitoring

On Thu, 2021-05-20 at 22:41 -0400, Gabriel Krisman Bertazi wrote:
> Hi,
>
> This series follow up on my previous proposal [1] to support file
> system
> wide monitoring.  As suggested by Amir, this proposal drops the ring
> buffer in favor of a single slot associated with each mark.  This
> simplifies a bit the implementation, as you can see in the code.

I get the need for simplification but I'm wondering where this
will end up.

I also know kernel space to user space error communication has
been a concern for quite a while now.

And, from that, there are a couple of things that occur to me.

One is that the standard errno is often not sufficient to give
sufficiently accurate error reports.

It seems to me that, in the long run, there needs to be a way
for sub-systems to register errors that they will use to report
events (with associated text description) so they can be more
informative. That's probably not as simple as it sounds due to
things like error number clashes, etc. OTOH that mechanism could
be used to avoid using text strings in notifications provided
provided there was a matching user space library, thereby reducing
the size of the event report.

Another aspect, also related to the limitations of error reporting
in general, is the way the information could be used. Again, not a
simple thing to do or grok, but would probably require some way of
grouping errors that are related in a stack like manner for user
space inference engines to analyse. Yes, this is very much out of
scope but is a big picture long term usefulness type of notion.

And I don't know how error storms occurring as a side effect of
some fairly serious problem could be handled ...

So not really related to the current implementation but a comment
to try and get peoples thoughts about where this is heading in
the long run.

Ian
>
> As a reminder, This proposal is limited to an interface for
> administrators to monitor the health of a file system, instead of a
> generic inteface for file errors.  Therefore, this doesn't solve the
> problem of writeback errors or the need to watch a specific subtree.
>
> In comparison to the previous RFC, this implementation also drops the
> per-fs data and location, and leave those as future extensions.
>
> * Implementation
>
> The feature is implemented on top of fanotify, as a new type of
> fanotify
> mark, FAN_ERROR, which a file system monitoring tool can register to
> receive error notifications.  When an error occurs a new notification
> is
> generated, in addition followed by this info field:
>
>  - FS generic data: A file system agnostic structure that has a
> generic
>  error code and identifies the filesystem.  Basically, it let's
>  userspace know something happened on a monitored filesystem.  Since
>  only the first error is recorded since the last read, this also
>  includes a counter of errors that happened since the last read.
>
> * Testing
>
> This was tested by watching notifications flowing from an
> intentionally
> corrupted filesystem in different places.  In addition, other events
> were watched in an attempt to detect regressions.
>
> Is there a specific testsuite for fanotify I should be running?
>
> * Patches
>
> This patchset is divided as follows: Patch 1 through 5 are
> refactoring
> to fsnotify/fanotify in preparation for FS_ERROR/FAN_ERROR; patch 6
> and
> 7 implement the FS_ERROR API for filesystems to report error; patch 8
> add support for FAN_ERROR in fanotify; Patch 9 is an example
> implementation for ext4; patch 10 and 11 provide a sample userspace
> code
> and documentation.
>
> I also pushed the full series to:
>
>   https://gitlab.collabora.com/krisman/linux -b fanotify-
> notifications-single-slot
>
> [1] https://lwn.net/Articles/854545/
>
> Cc: Darrick J. Wong <[email protected]>
> Cc: Theodore Ts'o <[email protected]>
> Cc: Dave Chinner <[email protected]>
> Cc: [email protected]
> To: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> Gabriel Krisman Bertazi (11):
>   fanotify: Fold event size calculation to its own function
>   fanotify: Split fsid check from other fid mode checks
>   fanotify: Simplify directory sanity check in DFID_NAME mode
>   fanotify: Expose fanotify_mark
>   inotify: Don't force FS_IN_IGNORED
>   fsnotify: Support FS_ERROR event type
>   fsnotify: Introduce helpers to send error_events
>   fanotify: Introduce FAN_ERROR event
>   ext4: Send notifications on error
>   samples: Add fs error monitoring example
>   Documentation: Document the FAN_ERROR event
>
>  .../admin-guide/filesystem-monitoring.rst     |  52 +++++
>  Documentation/admin-guide/index.rst           |   1 +
>  fs/ext4/super.c                               |   8 +
>  fs/notify/fanotify/fanotify.c                 |  80 ++++++-
>  fs/notify/fanotify/fanotify.h                 |  38 +++-
>  fs/notify/fanotify/fanotify_user.c            | 213 ++++++++++++++--
> --
>  fs/notify/inotify/inotify_user.c              |   6 +-
>  include/linux/fanotify.h                      |   6 +-
>  include/linux/fsnotify.h                      |  13 ++
>  include/linux/fsnotify_backend.h              |  15 +-
>  include/uapi/linux/fanotify.h                 |  10 +
>  samples/Kconfig                               |   8 +
>  samples/Makefile                              |   1 +
>  samples/fanotify/Makefile                     |   3 +
>  samples/fanotify/fs-monitor.c                 |  91 ++++++++
>  15 files changed, 485 insertions(+), 60 deletions(-)
>  create mode 100644 Documentation/admin-guide/filesystem-
> monitoring.rst
>  create mode 100644 samples/fanotify/Makefile
>  create mode 100644 samples/fanotify/fs-monitor.c
>


2021-05-24 16:08:12

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH 00/11] File system wide monitoring

"Theodore Y. Ts'o" <[email protected]> writes:

> Hi Gabriel,
>
> Quick question; what userspace program are you using to test this
> feature? Do you have a custom testing program you are using? If so,
> could share it?

Hello Ted,

I'm using the program in patch 10, to watch and print notifications ,
along with corrupt filesystems. I trigger operations via command line
and watch the reports flow. I have slightly modified the sample code to
test marks disappearing at inopportune times, but that's trivial to
recreate with the samples code.

I plan to write more automated tests for LTP, once we settle on this
design.

--
Gabriel Krisman Bertazi

2021-05-27 01:21:02

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH 10/11] samples: Add fs error monitoring example

Amir Goldstein <[email protected]> writes:

>> + printf("unexpected FAN MARK: %llx\n", metadata->mask);
>> + continue;
>> + } else if (metadata->fd != FAN_NOFD) {
>> + printf("Unexpected fd (!= FAN_NOFD)\n");
>> + continue;
>> + }
>> +
>> + printf("FAN_ERROR found len=%d\n", metadata->event_len);
>> +
>> + error = (struct fanotify_event_info_error *) (metadata+1);
>> + if (error->hdr.info_type == FAN_EVENT_INFO_TYPE_ERROR) {
>
> You meant != FAN_EVENT_INFO_TYPE_ERROR ?

Ugh. the FAN_EVENT_INFO_TYPE_ERROR definition on this file was not updated,
preventing me from catching this. nice catch.
>
>> + printf("unknown record: %d\n", error->hdr.info_type);
>> + continue;
>> + }
>> +
>> + printf(" Generic Error Record: len=%d\n", error->hdr.len);
>> + printf(" fsid: %llx\n", error->fsid);
>> + printf(" error: %d\n", error->error);
>> + printf(" inode: %lu\n", error->inode);
>> + printf(" error_count: %d\n", error->error_count);
>> + }
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> + int fd;
>> + char buffer[BUFSIZ];
>
> BUFSIZ not defined?

that's from stdio.h.


--
Gabriel Krisman Bertazi