2021-08-04 16:07:29

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 00/23] File system wide monitoring

Hi,

This is the 5th version of the FAN_FS_ERROR patches. This applies
the feedback from last version (thanks Amir, Jan). Biggest changes are
the split up of the FAN_FS_ERROR patch into something more reviewable,
and the removal of the event_info structure due to the perf regression
shown by unixbench.

This was tested with LTP for regressions, and also using the sample on
the last patch, with a corrupted image. I wrote a new ltp test for this
feature which is being reviewed and is available at:

https://gitlab.collabora.com/krisman/ltp -b fan-fs-error

In addition, I wrote a man-page that can be pulled from:

https://gitlab.collabora.com/krisman/man-pages.git -b fan-fs-error

And is being reviewed at the list.

I also pushed this full series to:

https://gitlab.collabora.com/krisman/linux -b fanotify-notifications-single-slot

Thank you

Original cover letter
---------------------
Hi,

This series follow up on my previous proposal [1] to support file system
wide monitoring. As suggested by Amir, this proposal drops the ring
buffer in favor of a single slot associated with each mark. This
simplifies a bit the implementation, as you can see in the code.

As a reminder, This proposal is limited to an interface for
administrators to monitor the health of a file system, instead of a
generic inteface for file errors. Therefore, this doesn't solve the
problem of writeback errors or the need to watch a specific subtree.

In comparison to the previous RFC, this implementation also drops the
per-fs data and location, and leave those as future extensions.

* Implementation

The feature is implemented on top of fanotify, as a new type of fanotify
mark, FAN_ERROR, which a file system monitoring tool can register to
receive error notifications. When an error occurs a new notification is
generated, in addition followed by this info field:

- FS generic data: A file system agnostic structure that has a generic
error code and identifies the filesystem. Basically, it let's
userspace know something happened on a monitored filesystem. Since
only the first error is recorded since the last read, this also
includes a counter of errors that happened since the last read.

* Testing

This was tested by watching notifications flowing from an intentionally
corrupted filesystem in different places. In addition, other events
were watched in an attempt to detect regressions.

Is there a specific testsuite for fanotify I should be running?

* Patches

This patchset is divided as follows: Patch 1 through 5 are refactoring
to fsnotify/fanotify in preparation for FS_ERROR/FAN_ERROR; patch 6 and
7 implement the FS_ERROR API for filesystems to report error; patch 8
add support for FAN_ERROR in fanotify; Patch 9 is an example
implementation for ext4; patch 10 and 11 provide a sample userspace code
and documentation.

I also pushed the full series to:

https://gitlab.collabora.com/krisman/linux -b fanotify-notifications-single-slot

[1] https://lwn.net/Articles/854545/
[2] https://lwn.net/Articles/856916/

Cc: Darrick J. Wong <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: [email protected]
To: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Gabriel Krisman Bertazi (23):
fsnotify: Don't insert unmergeable events in hashtable
fanotify: Fold event size calculation to its own function
fanotify: Split fsid check from other fid mode checks
fsnotify: Reserve mark bits for backends
fanotify: Split superblock marks out to a new cache
inotify: Don't force FS_IN_IGNORED
fsnotify: Add helper to detect overflow_event
fsnotify: Add wrapper around fsnotify_add_event
fsnotify: Support passing argument to insert callback on add_event
fsnotify: Allow events reported with an empty inode
fsnotify: Support FS_ERROR event type
fanotify: Expose helper to estimate file handle encoding length
fanotify: Allow file handle encoding for unhashed events
fanotify: Encode invalid file handler when no inode is provided
fanotify: Require fid_mode for any non-fd event
fanotify: Reserve UAPI bits for FAN_FS_ERROR
fanotify: Preallocate per superblock mark error event
fanotify: Handle FAN_FS_ERROR events
fanotify: Report fid info for file related file system errors
fanotify: Emit generic error info type for error event
ext4: Send notifications on error
samples: Add fs error monitoring example
docs: Document the FAN_FS_ERROR event

.../admin-guide/filesystem-monitoring.rst | 70 +++++
Documentation/admin-guide/index.rst | 1 +
fs/ext4/super.c | 8 +
fs/kernfs/file.c | 6 +-
fs/notify/fanotify/fanotify.c | 186 +++++++++---
fs/notify/fanotify/fanotify.h | 80 +++++-
fs/notify/fanotify/fanotify_user.c | 266 +++++++++++++++---
fs/notify/fsnotify.c | 14 +-
fs/notify/inotify/inotify_fsnotify.c | 2 +-
fs/notify/inotify/inotify_user.c | 6 +-
fs/notify/notification.c | 16 +-
include/linux/fanotify.h | 9 +-
include/linux/fsnotify.h | 20 +-
include/linux/fsnotify_backend.h | 76 ++++-
include/uapi/linux/fanotify.h | 8 +
samples/Kconfig | 9 +
samples/Makefile | 1 +
samples/fanotify/Makefile | 5 +
samples/fanotify/fs-monitor.c | 138 +++++++++
19 files changed, 803 insertions(+), 118 deletions(-)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
create mode 100644 samples/fanotify/Makefile
create mode 100644 samples/fanotify/fs-monitor.c

--
2.32.0


2021-08-04 16:07:40

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 02/23] fanotify: Fold event size calculation to its own function

Every time this function is invoked, it is immediately added to
FAN_EVENT_METADATA_LEN, since there is no need to just calculate the
length of info records. This minor clean up folds the rest of the
calculation into the function, which now operates in terms of events,
returning the size of the entire event, including metadata.

Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v1:
- rebased on top of hashing patches
---
fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++-------------
1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 64864fb40b40..68a53d3534f8 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -117,17 +117,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len)
return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
}

-static int fanotify_event_info_len(unsigned int fid_mode,
- struct fanotify_event *event)
+static size_t fanotify_event_len(struct fanotify_event *event,
+ unsigned int fid_mode)
{
- struct fanotify_info *info = fanotify_event_info(event);
- int dir_fh_len = fanotify_event_dir_fh_len(event);
- int fh_len = fanotify_event_object_fh_len(event);
- int info_len = 0;
+ size_t event_len = FAN_EVENT_METADATA_LEN;
+ struct fanotify_info *info;
+ int dir_fh_len;
+ int fh_len;
int dot_len = 0;

+ if (!fid_mode)
+ return event_len;
+
+ info = fanotify_event_info(event);
+ dir_fh_len = fanotify_event_dir_fh_len(event);
+ fh_len = fanotify_event_object_fh_len(event);
+
if (dir_fh_len) {
- info_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
+ event_len += fanotify_fid_info_len(dir_fh_len, info->name_len);
} else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) {
/*
* With group flag FAN_REPORT_NAME, if name was not recorded in
@@ -137,9 +144,9 @@ static int fanotify_event_info_len(unsigned int fid_mode,
}

if (fh_len)
- info_len += fanotify_fid_info_len(fh_len, dot_len);
+ event_len += fanotify_fid_info_len(fh_len, dot_len);

- return info_len;
+ return event_len;
}

/*
@@ -168,7 +175,7 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
static struct fanotify_event *get_one_event(struct fsnotify_group *group,
size_t count)
{
- size_t event_size = FAN_EVENT_METADATA_LEN;
+ size_t event_size;
struct fanotify_event *event = NULL;
struct fsnotify_event *fsn_event;
unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
@@ -181,8 +188,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
goto out;

event = FANOTIFY_E(fsn_event);
- if (fid_mode)
- event_size += fanotify_event_info_len(fid_mode, event);
+ event_size = fanotify_event_len(event, fid_mode);

if (event_size > count) {
event = ERR_PTR(-EINVAL);
@@ -412,8 +418,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,

pr_debug("%s: group=%p event=%p\n", __func__, group, event);

- metadata.event_len = FAN_EVENT_METADATA_LEN +
- fanotify_event_info_len(fid_mode, event);
+ metadata.event_len = fanotify_event_len(event, fid_mode);
metadata.metadata_len = FAN_EVENT_METADATA_LEN;
metadata.vers = FANOTIFY_METADATA_VERSION;
metadata.reserved = 0;
--
2.32.0

2021-08-04 16:07:47

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 04/23] fsnotify: Reserve mark bits for backends

Split out the final bits of struct fsnotify_mark->flags for use by a
backend.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/fsnotify_backend.h | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 1ce66748a2d2..9d5586445c65 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -363,6 +363,21 @@ struct fsnotify_mark_connector {
struct hlist_head list;
};

+#define FSNOTIFY_MARK_FLAG(flag) \
+static const unsigned int FSNOTIFY_MARK_FLAG_##flag = \
+ (1 << FSN_MARK_FL_BIT_##flag)
+
+enum fsnotify_mark_bits {
+ FSN_MARK_FL_BIT_IGNORED_SURV_MODIFY,
+ FSN_MARK_FL_BIT_ALIVE,
+ FSN_MARK_FL_BIT_ATTACHED,
+ FSN_MARK_PRIVATE_FLAGS,
+};
+
+FSNOTIFY_MARK_FLAG(IGNORED_SURV_MODIFY);
+FSNOTIFY_MARK_FLAG(ALIVE);
+FSNOTIFY_MARK_FLAG(ATTACHED);
+
/*
* A mark is simply an object attached to an in core inode which allows an
* fsnotify listener to indicate they are either no longer interested in events
@@ -398,9 +413,7 @@ struct fsnotify_mark {
struct fsnotify_mark_connector *connector;
/* Events types to ignore [mark->lock, group->mark_mutex] */
__u32 ignored_mask;
-#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x01
-#define FSNOTIFY_MARK_FLAG_ALIVE 0x02
-#define FSNOTIFY_MARK_FLAG_ATTACHED 0x04
+ /* Upper bits [31:PRIVATE_FLAGS] are reserved for backend usage */
unsigned int flags; /* flags [mark->lock] */
};

--
2.32.0

2021-08-04 16:07:50

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 06/23] inotify: Don't force FS_IN_IGNORED

According to Amir:

"FS_IN_IGNORED is completely internal to inotify and there is no need
to set it in i_fsnotify_mask at all, so if we remove the bit from the
output of inotify_arg_to_mask() no functionality will change and we will
be able to overload the event bit for FS_ERROR."

This is done in preparation to overload FS_ERROR with the notification
mechanism in fanotify.

Suggested-by: Amir Goldstein <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/inotify/inotify_user.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 98f61b31745a..4d17be6dd58d 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -89,10 +89,10 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
__u32 mask;

/*
- * Everything should accept their own ignored and should receive events
- * when the inode is unmounted. All directories care about children.
+ * Everything should receive events when the inode is unmounted.
+ * All directories care about children.
*/
- mask = (FS_IN_IGNORED | FS_UNMOUNT);
+ mask = (FS_UNMOUNT);
if (S_ISDIR(inode->i_mode))
mask |= FS_EVENT_ON_CHILD;

--
2.32.0

2021-08-04 16:07:53

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 03/23] fanotify: Split fsid check from other fid mode checks

FAN_FS_ERROR will require fsid, but not necessarily require the
filesystem to expose a file handle. Split those checks into different
functions, so they can be used separately when setting up an event.

While there, update a comment about tmpfs having 0 fsid, which is no
longer true.

Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v2:
- FAN_ERROR -> FAN_FS_ERROR (Amir)
- Update comment (Amir)

Changes since v1:
(Amir)
- Sort hunks to simplify diff.
Changes since RFC:
(Amir)
- Rename fanotify_check_path_fsid -> fanotify_test_fsid.
- Use dentry directly instead of path.
---
fs/notify/fanotify/fanotify_user.c | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 68a53d3534f8..67b18dfe0025 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1192,16 +1192,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
return fd;
}

-/* Check if filesystem can encode a unique fid */
-static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
+static int fanotify_test_fsid(struct dentry *dentry, __kernel_fsid_t *fsid)
{
__kernel_fsid_t root_fsid;
int err;

/*
- * Make sure path is not in filesystem with zero fsid (e.g. tmpfs).
+ * Make sure dentry is not of a filesystem with zero fsid (e.g. fuse).
*/
- err = vfs_get_fsid(path->dentry, fsid);
+ err = vfs_get_fsid(dentry, fsid);
if (err)
return err;

@@ -1209,10 +1208,10 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
return -ENODEV;

/*
- * Make sure path is not inside a filesystem subvolume (e.g. btrfs)
+ * Make sure dentry is not of a filesystem subvolume (e.g. btrfs)
* which uses a different fsid than sb root.
*/
- err = vfs_get_fsid(path->dentry->d_sb->s_root, &root_fsid);
+ err = vfs_get_fsid(dentry->d_sb->s_root, &root_fsid);
if (err)
return err;

@@ -1220,6 +1219,12 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
root_fsid.val[1] != fsid->val[1])
return -EXDEV;

+ return 0;
+}
+
+/* Check if filesystem can encode a unique fid */
+static int fanotify_test_fid(struct dentry *dentry)
+{
/*
* We need to make sure that the file system supports at least
* encoding a file handle so user can use name_to_handle_at() to
@@ -1227,8 +1232,8 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
* objects. However, name_to_handle_at() requires that the
* filesystem also supports decoding file handles.
*/
- if (!path->dentry->d_sb->s_export_op ||
- !path->dentry->d_sb->s_export_op->fh_to_dentry)
+ if (!dentry->d_sb->s_export_op ||
+ !dentry->d_sb->s_export_op->fh_to_dentry)
return -EOPNOTSUPP;

return 0;
@@ -1379,7 +1384,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
}

if (fid_mode) {
- ret = fanotify_test_fid(&path, &__fsid);
+ ret = fanotify_test_fsid(path.dentry, &__fsid);
+ if (ret)
+ goto path_put_and_out;
+
+ ret = fanotify_test_fid(path.dentry);
if (ret)
goto path_put_and_out;

--
2.32.0

2021-08-04 16:07:54

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 05/23] fanotify: Split superblock marks out to a new cache

FAN_FS_ERROR will require an error structure to be stored per mark.
But, since FAN_FS_ERROR doesn't apply to inode/mount marks, it should
suffice to only expose this information for superblock marks. Therefore,
wrap this kind of marks into a container and plumb it for the future.

Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v2:
- Move mark initialization to fanotify_alloc_mark (Amir)

Changes since v1:
- Only extend superblock marks (Amir)
---
fs/notify/fanotify/fanotify.c | 10 ++++++--
fs/notify/fanotify/fanotify.h | 23 ++++++++++++++++++
fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++--
3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 310246f8d3f1..c3eefe3f6494 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -869,9 +869,15 @@ static void fanotify_freeing_mark(struct fsnotify_mark *mark,
dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_MARKS);
}

-static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
+static void fanotify_free_mark(struct fsnotify_mark *mark)
{
- kmem_cache_free(fanotify_mark_cache, fsn_mark);
+ if (mark->flags & FANOTIFY_MARK_FLAG_SB_MARK) {
+ struct fanotify_sb_mark *fa_mark = FANOTIFY_SB_MARK(mark);
+
+ kmem_cache_free(fanotify_sb_mark_cache, fa_mark);
+ } else {
+ kmem_cache_free(fanotify_mark_cache, mark);
+ }
}

const struct fsnotify_ops fanotify_fsnotify_ops = {
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 4a5e555dc3d2..c548b96472dd 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -6,6 +6,7 @@
#include <linux/hashtable.h>

extern struct kmem_cache *fanotify_mark_cache;
+extern struct kmem_cache *fanotify_sb_mark_cache;
extern struct kmem_cache *fanotify_fid_event_cachep;
extern struct kmem_cache *fanotify_path_event_cachep;
extern struct kmem_cache *fanotify_perm_event_cachep;
@@ -129,6 +130,28 @@ static inline void fanotify_info_copy_name(struct fanotify_info *info,
name->name);
}

+#define FANOTIFY_MARK_FLAG(flag) \
+static const unsigned int FANOTIFY_MARK_FLAG_##flag = \
+ (1 << FANOTIFY_MARK_FLAG_BIT_##flag)
+
+enum fanotify_mark_bits {
+ FANOTIFY_MARK_FLAG_BIT_SB_MARK = FSN_MARK_PRIVATE_FLAGS,
+};
+
+FANOTIFY_MARK_FLAG(SB_MARK);
+
+struct fanotify_sb_mark {
+ struct fsnotify_mark fsn_mark;
+};
+
+static inline
+struct fanotify_sb_mark *FANOTIFY_SB_MARK(struct fsnotify_mark *mark)
+{
+ WARN_ON(!(mark->flags & FANOTIFY_MARK_FLAG_SB_MARK));
+
+ return container_of(mark, struct fanotify_sb_mark, fsn_mark);
+}
+
/*
* Common structure for fanotify events. Concrete structs are allocated in
* fanotify_handle_event() and freed when the information is retrieved by
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 67b18dfe0025..0696f2121781 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -99,6 +99,7 @@ struct ctl_table fanotify_table[] = {
extern const struct fsnotify_ops fanotify_fsnotify_ops;

struct kmem_cache *fanotify_mark_cache __read_mostly;
+struct kmem_cache *fanotify_sb_mark_cache __read_mostly;
struct kmem_cache *fanotify_fid_event_cachep __read_mostly;
struct kmem_cache *fanotify_path_event_cachep __read_mostly;
struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
@@ -915,6 +916,38 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
return mask & ~oldmask;
}

+static struct fsnotify_mark *fanotify_alloc_mark(struct fsnotify_group *group,
+ unsigned int type)
+{
+ struct fanotify_sb_mark *sb_mark;
+ struct fsnotify_mark *mark;
+
+ switch (type) {
+ case FSNOTIFY_OBJ_TYPE_SB:
+ sb_mark = kmem_cache_zalloc(fanotify_sb_mark_cache, GFP_KERNEL);
+ if (!sb_mark)
+ return NULL;
+ mark = &sb_mark->fsn_mark;
+ break;
+
+ case FSNOTIFY_OBJ_TYPE_INODE:
+ case FSNOTIFY_OBJ_TYPE_PARENT:
+ case FSNOTIFY_OBJ_TYPE_VFSMOUNT:
+ mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
+ break;
+ default:
+ WARN_ON(1);
+ return NULL;
+ }
+
+ fsnotify_init_mark(mark, group);
+
+ if (type == FSNOTIFY_OBJ_TYPE_SB)
+ mark->flags |= FANOTIFY_MARK_FLAG_SB_MARK;
+
+ return mark;
+}
+
static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
fsnotify_connp_t *connp,
unsigned int type,
@@ -933,13 +966,12 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
!inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS))
return ERR_PTR(-ENOSPC);

- mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
+ mark = fanotify_alloc_mark(group, type);
if (!mark) {
ret = -ENOMEM;
goto out_dec_ucounts;
}

- fsnotify_init_mark(mark, group);
ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid);
if (ret) {
fsnotify_put_mark(mark);
@@ -1497,6 +1529,8 @@ static int __init fanotify_user_setup(void)

fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
SLAB_PANIC|SLAB_ACCOUNT);
+ fanotify_sb_mark_cache = KMEM_CACHE(fanotify_sb_mark,
+ SLAB_PANIC|SLAB_ACCOUNT);
fanotify_fid_event_cachep = KMEM_CACHE(fanotify_fid_event,
SLAB_PANIC);
fanotify_path_event_cachep = KMEM_CACHE(fanotify_path_event,
--
2.32.0

2021-08-04 16:07:59

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 08/23] fsnotify: Add wrapper around fsnotify_add_event

fsnotify_add_event is growing in number of parameters, which is most
case are just passed a NULL pointer. So, split out a new
fsnotify_insert_event function to clean things up for users who don't
need an insert hook.

Suggested-by: Amir Goldstein <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 4 ++--
fs/notify/inotify/inotify_fsnotify.c | 2 +-
fs/notify/notification.c | 12 ++++++------
include/linux/fsnotify_backend.h | 23 ++++++++++++++++-------
4 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index c3eefe3f6494..acf78c0ed219 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -781,8 +781,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
}

fsn_event = &event->fse;
- ret = fsnotify_add_event(group, fsn_event, fanotify_merge,
- fanotify_insert_event);
+ ret = fsnotify_insert_event(group, fsn_event, fanotify_merge,
+ fanotify_insert_event);
if (ret) {
/* Permission events shouldn't be merged */
BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index d1a64daa0171..a96582cbfad1 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -116,7 +116,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
if (len)
strcpy(event->name, name->name);

- ret = fsnotify_add_event(group, fsn_event, inotify_merge, NULL);
+ ret = fsnotify_add_event(group, fsn_event, inotify_merge);
if (ret) {
/* Our event wasn't used in the end. Free it. */
fsnotify_destroy_event(group, fsn_event);
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 32f45543b9c6..44bb10f50715 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -78,12 +78,12 @@ void fsnotify_destroy_event(struct fsnotify_group *group,
* 2 if the event was not queued - either the queue of events has overflown
* or the group is shutting down.
*/
-int fsnotify_add_event(struct fsnotify_group *group,
- struct fsnotify_event *event,
- int (*merge)(struct fsnotify_group *,
- struct fsnotify_event *),
- void (*insert)(struct fsnotify_group *,
- struct fsnotify_event *))
+int fsnotify_insert_event(struct fsnotify_group *group,
+ struct fsnotify_event *event,
+ int (*merge)(struct fsnotify_group *,
+ struct fsnotify_event *),
+ void (*insert)(struct fsnotify_group *,
+ struct fsnotify_event *))
{
int ret = 0;
struct list_head *list = &group->notification_list;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 2b5fb9327a77..cd4ca11f129e 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -495,16 +495,25 @@ extern int fsnotify_fasync(int fd, struct file *file, int on);
extern void fsnotify_destroy_event(struct fsnotify_group *group,
struct fsnotify_event *event);
/* attach the event to the group notification queue */
-extern int fsnotify_add_event(struct fsnotify_group *group,
- struct fsnotify_event *event,
- int (*merge)(struct fsnotify_group *,
- struct fsnotify_event *),
- void (*insert)(struct fsnotify_group *,
- struct fsnotify_event *));
+extern int fsnotify_insert_event(struct fsnotify_group *group,
+ struct fsnotify_event *event,
+ int (*merge)(struct fsnotify_group *,
+ struct fsnotify_event *),
+ void (*insert)(struct fsnotify_group *,
+ struct fsnotify_event *));
+
+static inline int fsnotify_add_event(struct fsnotify_group *group,
+ struct fsnotify_event *event,
+ int (*merge)(struct fsnotify_group *,
+ struct fsnotify_event *))
+{
+ return fsnotify_insert_event(group, event, merge, NULL);
+}
+
/* Queue overflow event to a notification group */
static inline void fsnotify_queue_overflow(struct fsnotify_group *group)
{
- fsnotify_add_event(group, group->overflow_event, NULL, NULL);
+ fsnotify_add_event(group, group->overflow_event, NULL);
}

static inline bool fsnotify_is_overflow_event(u32 mask)
--
2.32.0

2021-08-04 16:08:01

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 07/23] fsnotify: Add helper to detect overflow_event

Similarly to fanotify_is_perm_event and friends, provide a helper
predicate to say whether a mask is of an overflow event.

Suggested-by: Amir Goldstein <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.h | 3 ++-
include/linux/fsnotify_backend.h | 5 +++++
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index c548b96472dd..d4a562c2619f 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -338,7 +338,8 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
*/
static inline bool fanotify_is_hashed_event(u32 mask)
{
- return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW);
+ return !(fanotify_is_perm_event(mask) ||
+ fsnotify_is_overflow_event(mask));
}

static inline unsigned int fanotify_event_hash_bucket(
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 9d5586445c65..2b5fb9327a77 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -507,6 +507,11 @@ static inline void fsnotify_queue_overflow(struct fsnotify_group *group)
fsnotify_add_event(group, group->overflow_event, NULL, NULL);
}

+static inline bool fsnotify_is_overflow_event(u32 mask)
+{
+ return mask & FS_Q_OVERFLOW;
+}
+
static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
{
assert_spin_locked(&group->notification_lock);
--
2.32.0

2021-08-04 16:08:01

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 09/23] fsnotify: Support passing argument to insert callback on add_event

FAN_FS_ERROR requires some initialization to happen from inside the
insert hook. This allows a user of fanotify_add_event to pass an
argument to be sent to the insert callback.

Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 5 +++--
fs/notify/notification.c | 6 ++++--
include/linux/fsnotify_backend.h | 6 ++++--
3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index acf78c0ed219..538d114f439f 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -694,7 +694,8 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info)
* Add an event to hash table for faster merge.
*/
static void fanotify_insert_event(struct fsnotify_group *group,
- struct fsnotify_event *fsn_event)
+ struct fsnotify_event *fsn_event,
+ const void *data)
{
struct fanotify_event *event = FANOTIFY_E(fsn_event);
unsigned int bucket = fanotify_event_hash_bucket(group, event);
@@ -782,7 +783,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,

fsn_event = &event->fse;
ret = fsnotify_insert_event(group, fsn_event, fanotify_merge,
- fanotify_insert_event);
+ fanotify_insert_event, NULL);
if (ret) {
/* Permission events shouldn't be merged */
BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 44bb10f50715..206a17346ca6 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -83,7 +83,9 @@ int fsnotify_insert_event(struct fsnotify_group *group,
int (*merge)(struct fsnotify_group *,
struct fsnotify_event *),
void (*insert)(struct fsnotify_group *,
- struct fsnotify_event *))
+ struct fsnotify_event *,
+ const void *),
+ const void *insert_data)
{
int ret = 0;
struct list_head *list = &group->notification_list;
@@ -121,7 +123,7 @@ int fsnotify_insert_event(struct fsnotify_group *group,
group->q_len++;
list_add_tail(&event->list, list);
if (insert)
- insert(group, event);
+ insert(group, event, insert_data);
spin_unlock(&group->notification_lock);

wake_up(&group->notification_waitq);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index cd4ca11f129e..693fe4cb48cf 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -500,14 +500,16 @@ extern int fsnotify_insert_event(struct fsnotify_group *group,
int (*merge)(struct fsnotify_group *,
struct fsnotify_event *),
void (*insert)(struct fsnotify_group *,
- struct fsnotify_event *));
+ struct fsnotify_event *,
+ const void *),
+ const void *insert_data);

static inline int fsnotify_add_event(struct fsnotify_group *group,
struct fsnotify_event *event,
int (*merge)(struct fsnotify_group *,
struct fsnotify_event *))
{
- return fsnotify_insert_event(group, event, merge, NULL);
+ return fsnotify_insert_event(group, event, merge, NULL, NULL);
}

/* Queue overflow event to a notification group */
--
2.32.0

2021-08-04 16:08:38

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 10/23] fsnotify: Allow events reported with an empty inode

Some file system events (i.e. FS_ERROR) might not be associated with an
inode. For these, it makes sense to associate them directly with the
super block of the file system they apply to. This patch allows the
event to be reported directly against the super block instead of an
inode.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/kernfs/file.c | 6 +++---
fs/notify/fsnotify.c | 14 +++++++++-----
include/linux/fsnotify.h | 8 +++++---
include/linux/fsnotify_backend.h | 10 ++++++----
4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index c75719312147..eaa0de3a6520 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -883,9 +883,9 @@ static void kernfs_notify_workfn(struct work_struct *work)
if (parent) {
p_inode = ilookup(info->sb, kernfs_ino(parent));
if (p_inode) {
- fsnotify(FS_MODIFY | FS_EVENT_ON_CHILD,
- inode, FSNOTIFY_EVENT_INODE,
- p_inode, &name, inode, 0);
+ fsnotify(FS_MODIFY | FS_EVENT_ON_CHILD, inode,
+ FSNOTIFY_EVENT_INODE, NULL, p_inode,
+ &name, inode, 0);
iput(p_inode);
}

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 30d422b8c0fc..36045d5f9d41 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -229,7 +229,8 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
}

notify:
- ret = fsnotify(mask, data, data_type, p_inode, file_name, inode, 0);
+ ret = fsnotify(mask, data, data_type, NULL, p_inode, file_name,
+ inode, 0);

if (file_name)
release_dentry_name_snapshot(&name);
@@ -459,12 +460,13 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
* if both are non-NULL event may be reported to both.
* @cookie: inotify rename cookie
*/
-int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
- const struct qstr *file_name, struct inode *inode, u32 cookie)
+int fsnotify(__u32 mask, const void *data, int data_type,
+ struct super_block *sb, struct inode *dir,
+ const struct qstr *file_name, struct inode *inode,
+ u32 cookie)
{
const struct path *path = fsnotify_data_path(data, data_type);
struct fsnotify_iter_info iter_info = {};
- struct super_block *sb;
struct mount *mnt = NULL;
struct inode *parent = NULL;
int ret = 0;
@@ -483,7 +485,9 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
*/
parent = dir;
}
- sb = inode->i_sb;
+
+ if (!sb)
+ sb = inode->i_sb;

/*
* Optimization: srcu_read_lock() has a memory barrier which can
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index f8acddcf54fb..39c9dbc46d36 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -30,7 +30,8 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask,
struct inode *child,
const struct qstr *name, u32 cookie)
{
- fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie);
+ fsnotify(mask, child, FSNOTIFY_EVENT_INODE, NULL, dir, name, NULL,
+ cookie);
}

static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
@@ -44,7 +45,8 @@ static inline void fsnotify_inode(struct inode *inode, __u32 mask)
if (S_ISDIR(inode->i_mode))
mask |= FS_ISDIR;

- fsnotify(mask, inode, FSNOTIFY_EVENT_INODE, NULL, NULL, inode, 0);
+ fsnotify(mask, inode, FSNOTIFY_EVENT_INODE, NULL, NULL, NULL, inode,
+ 0);
}

/* Notify this dentry's parent about a child's events. */
@@ -68,7 +70,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
return __fsnotify_parent(dentry, mask, data, data_type);

notify_child:
- return fsnotify(mask, data, data_type, NULL, NULL, inode, 0);
+ return fsnotify(mask, data, data_type, NULL, NULL, NULL, inode, 0);
}

/*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 693fe4cb48cf..4a765edd3b2a 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -423,8 +423,9 @@ struct fsnotify_mark {

/* main fsnotify call to send events */
extern int fsnotify(__u32 mask, const void *data, int data_type,
- struct inode *dir, const struct qstr *name,
- struct inode *inode, u32 cookie);
+ struct super_block *sb, struct inode *dir,
+ const struct qstr *name, struct inode *inode,
+ u32 cookie);
extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
int data_type);
extern void __fsnotify_inode_delete(struct inode *inode);
@@ -618,8 +619,9 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
#else

static inline int fsnotify(__u32 mask, const void *data, int data_type,
- struct inode *dir, const struct qstr *name,
- struct inode *inode, u32 cookie)
+ struct super_block *sb, struct inode *dir,
+ const struct qstr *name, struct inode *inode,
+ u32 cookie)
{
return 0;
}
--
2.32.0

2021-08-04 16:08:47

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 13/23] fanotify: Allow file handle encoding for unhashed events

FAN_FS_ERROR will report a file handle, but it is a unhashed event.n
Allow passing a NULL hash to fanotify_encode_fh and avoid calculating
the hash if not needed.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index a015822e29d8..0d6ba218bc01 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -385,8 +385,12 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
fh->type = type;
fh->len = fh_len;

- /* Mix fh into event merge key */
- *hash ^= fanotify_hash_fh(fh);
+ /*
+ * Mix fh into event merge key. Hash might be NULL in case of
+ * unhashed FID events (i.e. FAN_FS_ERROR).
+ */
+ if (hash)
+ *hash ^= fanotify_hash_fh(fh);

return FANOTIFY_FH_HDR_LEN + fh_len;

--
2.32.0

2021-08-04 16:08:47

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 12/23] fanotify: Expose helper to estimate file handle encoding length

Superblock marks needs this when pre-allocating error events.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 18 ------------------
fs/notify/fanotify/fanotify.h | 18 ++++++++++++++++++
2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 538d114f439f..a015822e29d8 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -334,24 +334,6 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
return test_mask & user_mask;
}

-/*
- * Check size needed to encode fanotify_fh.
- *
- * Return size of encoded fh without fanotify_fh header.
- * Return 0 on failure to encode.
- */
-static int fanotify_encode_fh_len(struct inode *inode)
-{
- int dwords = 0;
-
- if (!inode)
- return 0;
-
- exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);
-
- return dwords << 2;
-}
-
/*
* Encode fanotify_fh.
*
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index d4a562c2619f..8061040f0348 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -348,3 +348,21 @@ static inline unsigned int fanotify_event_hash_bucket(
{
return event->hash & FANOTIFY_HTABLE_MASK;
}
+
+/*
+ * Check size needed to encode fanotify_fh.
+ *
+ * Return size of encoded fh without fanotify_fh header.
+ * Return 0 on failure to encode.
+ */
+static inline int fanotify_encode_fh_len(struct inode *inode)
+{
+ int dwords = 0;
+
+ if (!inode)
+ return 0;
+
+ exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);
+
+ return dwords << 2;
+}
--
2.32.0

2021-08-04 16:09:01

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 15/23] fanotify: Require fid_mode for any non-fd event

Like inode events, FAN_FS_ERROR will require fid mode. Therefore,
convert the verification during fanotify_mark(2) to require fid for any
non-fd event. This means fid_mode will not only be required for inode
events, but for any event that doesn't provide a descriptor.

Suggested-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
changes since v5:
- Fix condition to include FANOTIFY_EVENT_FLAGS. (me)
---
fs/notify/fanotify/fanotify_user.c | 14 +++++++-------
include/linux/fanotify.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 0696f2121781..6dad2a00e337 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1382,14 +1382,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
goto fput_and_out;

/*
- * Events with data type inode do not carry enough information to report
- * event->fd, so we do not allow setting a mask for inode events unless
- * group supports reporting fid.
- * inode events are not supported on a mount mark, because they do not
- * carry enough information (i.e. path) to be filtered by mount point.
- */
+ * Events that do not carry enough information to report
+ * event->fd require a group that supports reporting fid. Those
+ * events are not supported on a mount mark, because they do not
+ * carry enough information (i.e. path) to be filtered by mount
+ * point.
+ */
fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
- if (mask & FANOTIFY_INODE_EVENTS &&
+ if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) &&
(!fid_mode || mark_type == FAN_MARK_MOUNT))
goto fput_and_out;

diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index a16dbeced152..c05d45bde8b8 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -81,6 +81,9 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
*/
#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE)

+/* Events that can be reported with event->fd */
+#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS)
+
/* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */
#define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
--
2.32.0

2021-08-04 16:09:09

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

Instead of failing, encode an invalid file handler in fanotify_encode_fh
if no inode is provided. This bogus file handler will be reported by
FAN_FS_ERROR for non-inode errors.

Also adjust the single caller that might rely on failure after passing
an empty inode.

Suggested-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 39 ++++++++++++++++++++---------------
fs/notify/fanotify/fanotify.h | 6 ++++--
2 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 0d6ba218bc01..456c60107d88 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -349,12 +349,6 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
void *buf = fh->buf;
int err;

- fh->type = FILEID_ROOT;
- fh->len = 0;
- fh->flags = 0;
- if (!inode)
- return 0;
-
/*
* !gpf means preallocated variable size fh, but fh_len could
* be zero in that case if encoding fh len failed.
@@ -363,8 +357,9 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4))
goto out_err;

- /* No external buffer in a variable size allocated fh */
- if (gfp && fh_len > FANOTIFY_INLINE_FH_LEN) {
+ fh->flags = 0;
+ /* No external buffer in a variable size allocated fh or null fh */
+ if (inode && gfp && fh_len > FANOTIFY_INLINE_FH_LEN) {
/* Treat failure to allocate fh as failure to encode fh */
err = -ENOMEM;
ext_buf = kmalloc(fh_len, gfp);
@@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
}

- dwords = fh_len >> 2;
- type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
- err = -EINVAL;
- if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
- goto out_err;
-
- fh->type = type;
- fh->len = fh_len;
+ if (inode) {
+ dwords = fh_len >> 2;
+ type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
+ err = -EINVAL;
+ if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
+ goto out_err;
+ fh->type = type;
+ fh->len = fh_len;
+ } else {
+ /*
+ * Invalid FHs are used on FAN_FS_ERROR for errors not
+ * linked to any inode. Caller needs to guarantee the fh
+ * has at least FANOTIFY_NULL_FH_LEN bytes of space.
+ */
+ fh->type = FILEID_INVALID;
+ fh->len = FANOTIFY_NULL_FH_LEN;
+ memset(buf, 0, FANOTIFY_NULL_FH_LEN);
+ }

/*
* Mix fh into event merge key. Hash might be NULL in case of
@@ -511,7 +516,7 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
struct fanotify_info *info;
struct fanotify_fh *dfh, *ffh;
unsigned int dir_fh_len = fanotify_encode_fh_len(id);
- unsigned int child_fh_len = fanotify_encode_fh_len(child);
+ unsigned int child_fh_len = child ? fanotify_encode_fh_len(child) : 0;
unsigned int size;

size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len;
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 8061040f0348..aa555975c0f8 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -349,18 +349,20 @@ static inline unsigned int fanotify_event_hash_bucket(
return event->hash & FANOTIFY_HTABLE_MASK;
}

+#define FANOTIFY_NULL_FH_LEN 8
+
/*
* Check size needed to encode fanotify_fh.
*
* Return size of encoded fh without fanotify_fh header.
- * Return 0 on failure to encode.
+ * For a NULL inode, return the size of a NULL FH.
*/
static inline int fanotify_encode_fh_len(struct inode *inode)
{
int dwords = 0;

if (!inode)
- return 0;
+ return FANOTIFY_NULL_FH_LEN;

exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);

--
2.32.0

2021-08-04 16:09:12

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 11/23] fsnotify: Support FS_ERROR event type

Expose a new type of fsnotify event for filesystems to report errors for
userspace monitoring tools. fanotify will send this type of
notification for FAN_FS_ERROR events. This also introduce a helper for
generating the new event.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v3:
- Squash patch ("fsnotify: Introduce helpers to send error_events")
- Drop reviewed-bys!

Changes since v2:
- FAN_ERROR->FAN_FS_ERROR (Amir)

Changes since v1:
- Overload FS_ERROR with FS_IN_IGNORED
- Implement support for this type on fsnotify_data_inode (Amir)
---
include/linux/fsnotify.h | 12 ++++++++++++
include/linux/fsnotify_backend.h | 17 ++++++++++++++++-
2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 39c9dbc46d36..d3f0f5ed8e9e 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -319,4 +319,16 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
fsnotify_dentry(dentry, mask);
}

+static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode,
+ int error)
+{
+ struct fs_error_report report = {
+ .error = error,
+ .inode = inode,
+ };
+
+ return fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, sb,
+ NULL, NULL, NULL, 0);
+}
+
#endif /* _LINUX_FS_NOTIFY_H */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 4a765edd3b2a..9010e3505f9c 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -42,6 +42,12 @@

#define FS_UNMOUNT 0x00002000 /* inode on umount fs */
#define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
+#define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */
+
+/*
+ * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify
+ * which does not support FS_ERROR.
+ */
#define FS_IN_IGNORED 0x00008000 /* last inotify event here */

#define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */
@@ -95,7 +101,8 @@
#define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
FS_EVENTS_POSS_ON_CHILD | \
FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \
- FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED)
+ FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \
+ FS_ERROR)

/* Extra flags that may be reported with event or control handling of events */
#define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \
@@ -248,6 +255,12 @@ enum fsnotify_data_type {
FSNOTIFY_EVENT_NONE,
FSNOTIFY_EVENT_PATH,
FSNOTIFY_EVENT_INODE,
+ FSNOTIFY_EVENT_ERROR,
+};
+
+struct fs_error_report {
+ int error;
+ struct inode *inode;
};

static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
@@ -257,6 +270,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
return (struct inode *)data;
case FSNOTIFY_EVENT_PATH:
return d_inode(((const struct path *)data)->dentry);
+ case FSNOTIFY_EVENT_ERROR:
+ return ((struct fs_error_report *)data)->inode;
default:
return NULL;
}
--
2.32.0

2021-08-04 16:09:54

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 16/23] fanotify: Reserve UAPI bits for FAN_FS_ERROR

FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which
a mechanism to report file system wide problems via fanotify. This
commit preallocate userspace visible bits to match the FS_ERROR event.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 1 +
include/uapi/linux/fanotify.h | 1 +
2 files changed, 2 insertions(+)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 456c60107d88..ca74199f11e2 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -733,6 +733,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
+ BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);

BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);

diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index fbf9c5c7dd59..16402037fc7a 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -20,6 +20,7 @@
#define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */

#define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
+#define FAN_FS_ERROR 0x00008000 /* Filesystem error */

#define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
#define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
--
2.32.0

2021-08-04 16:11:15

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 17/23] fanotify: Preallocate per superblock mark error event

Error reporting needs to be done in an atomic context. This patch
introduces a single error slot for superblock marks that report the
FAN_FS_ERROR event, to be used during event submission.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 22 +++++++++++++++++
fs/notify/fanotify/fanotify.h | 13 ++++++++++
fs/notify/fanotify/fanotify_user.c | 39 +++++++++++++++++++++++++++++-
3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index ca74199f11e2..0678d35432a7 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -828,6 +828,24 @@ static void fanotify_free_name_event(struct fanotify_event *event)
kfree(FANOTIFY_NE(event));
}

+static void fanotify_free_error_event(struct fanotify_event *event)
+{
+ struct fanotify_error_event *fee;
+
+ if (!event)
+ return;
+
+ fee = FANOTIFY_EE(event);
+ /*
+ * If this is an active error event, disassociate it from the
+ * mark prior to removal.
+ */
+ if (fee->sb_mark->fee_slot == fee)
+ fee->sb_mark->fee_slot = NULL;
+
+ kfree(fee);
+}
+
static void fanotify_free_event(struct fsnotify_event *fsn_event)
{
struct fanotify_event *event;
@@ -850,6 +868,9 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
case FANOTIFY_EVENT_TYPE_OVERFLOW:
kfree(event);
break;
+ case FANOTIFY_EVENT_TYPE_FS_ERROR:
+ fanotify_free_error_event(event);
+ break;
default:
WARN_ON_ONCE(1);
}
@@ -867,6 +888,7 @@ static void fanotify_free_mark(struct fsnotify_mark *mark)
if (mark->flags & FANOTIFY_MARK_FLAG_SB_MARK) {
struct fanotify_sb_mark *fa_mark = FANOTIFY_SB_MARK(mark);

+ fanotify_free_error_event(&fa_mark->fee_slot->fae);
kmem_cache_free(fanotify_sb_mark_cache, fa_mark);
} else {
kmem_cache_free(fanotify_mark_cache, mark);
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index aa555975c0f8..206dc6cfd671 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -142,6 +142,7 @@ FANOTIFY_MARK_FLAG(SB_MARK);

struct fanotify_sb_mark {
struct fsnotify_mark fsn_mark;
+ struct fanotify_error_event *fee_slot;
};

static inline
@@ -164,6 +165,7 @@ enum fanotify_event_type {
FANOTIFY_EVENT_TYPE_PATH,
FANOTIFY_EVENT_TYPE_PATH_PERM,
FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
+ FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
__FANOTIFY_EVENT_TYPE_NUM
};

@@ -219,6 +221,17 @@ FANOTIFY_NE(struct fanotify_event *event)
return container_of(event, struct fanotify_name_event, fae);
}

+struct fanotify_error_event {
+ struct fanotify_event fae;
+ struct fanotify_sb_mark *sb_mark; /* Back reference to the mark. */
+};
+
+static inline struct fanotify_error_event *
+FANOTIFY_EE(struct fanotify_event *event)
+{
+ return container_of(event, struct fanotify_error_event, fae);
+}
+
static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
{
if (event->type == FANOTIFY_EVENT_TYPE_FID)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 6dad2a00e337..76c1c805af3d 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -167,6 +167,22 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
hlist_del_init(&event->merge_list);
}

+static struct fanotify_error_event *fanotify_alloc_error_event(
+ struct fanotify_sb_mark *sb_mark)
+
+{
+ struct fanotify_error_event *fee;
+
+ fee = kzalloc(sizeof(*fee), GFP_KERNEL_ACCOUNT);
+ if (!fee)
+ return NULL;
+
+ fanotify_init_event(&fee->fae, 0, FS_ERROR);
+ fee->sb_mark = sb_mark;
+
+ return fee;
+}
+
/*
* Get an fanotify notification event if one exists and is small
* enough to fit in "count". Return an error pointer if the count
@@ -994,6 +1010,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
{
struct fsnotify_mark *fsn_mark;
__u32 added;
+ int ret = 0;

mutex_lock(&group->mark_mutex);
fsn_mark = fsnotify_find_mark(connp, group);
@@ -1004,13 +1021,33 @@ static int fanotify_add_mark(struct fsnotify_group *group,
return PTR_ERR(fsn_mark);
}
}
+
+ /*
+ * Error events are allocated per super-block mark only if
+ * strictly needed (i.e. FAN_FS_ERROR was requested).
+ */
+ if (type == FSNOTIFY_OBJ_TYPE_SB && !(flags & FAN_MARK_IGNORED_MASK) &&
+ (mask & FAN_FS_ERROR)) {
+ struct fanotify_sb_mark *sb_mark = FANOTIFY_SB_MARK(fsn_mark);
+
+ if (!sb_mark->fee_slot) {
+ sb_mark->fee_slot = fanotify_alloc_error_event(sb_mark);
+ if (!sb_mark->fee_slot) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+ }
+
added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
if (added & ~fsnotify_conn_mask(fsn_mark->connector))
fsnotify_recalc_mask(fsn_mark->connector);
+
+out:
mutex_unlock(&group->mark_mutex);

fsnotify_put_mark(fsn_mark);
- return 0;
+ return ret;
}

static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
--
2.32.0

2021-08-04 16:12:43

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 20/23] fanotify: Emit generic error info type for error event

The Error info type is a record sent to users on FAN_FS_ERROR events
documenting the type of error. It also carries an error count,
documenting how many errors were observed since the last reporting.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify_user.c | 35 ++++++++++++++++++++++++++++++
include/uapi/linux/fanotify.h | 7 ++++++
2 files changed, 42 insertions(+)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 74def82630bb..ea22586c39e6 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -107,6 +107,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
#define FANOTIFY_EVENT_ALIGN 4
#define FANOTIFY_INFO_HDR_LEN \
(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
+#define FANOTIFY_INFO_ERROR_LEN \
+ (sizeof(struct fanotify_event_info_error))

static int fanotify_fid_info_len(int fh_len, int name_len)
{
@@ -130,6 +132,9 @@ static size_t fanotify_event_len(struct fanotify_event *event,
if (!fid_mode)
return event_len;

+ if (fanotify_is_error_event(event->mask))
+ event_len += FANOTIFY_INFO_ERROR_LEN;
+
info = fanotify_event_info(event);
dir_fh_len = fanotify_event_dir_fh_len(event);
fh_len = fanotify_event_object_fh_len(event);
@@ -395,6 +400,28 @@ static int process_access_response(struct fsnotify_group *group,
return -ENOENT;
}

+static size_t copy_error_info_to_user(struct fanotify_event *event,
+ char __user *buf, int count)
+{
+ struct fanotify_event_info_error info;
+ struct fanotify_error_event *fee = FANOTIFY_EE(event);
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
+ info.hdr.pad = 0;
+ info.hdr.len = FANOTIFY_INFO_ERROR_LEN;
+
+ if (WARN_ON(count < info.hdr.len))
+ return -EFAULT;
+
+ info.error = fee->error;
+ info.error_count = fee->err_count;
+
+ if (copy_to_user(buf, &info, sizeof(info)))
+ return -EFAULT;
+
+ return info.hdr.len;
+}
+
static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
int info_type, const char *name, size_t name_len,
char __user *buf, size_t count)
@@ -553,6 +580,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
if (f)
fd_install(fd, f);

+ if (fanotify_is_error_event(event->mask)) {
+ ret = copy_error_info_to_user(event, buf, count);
+ if (ret < 0)
+ goto out_close_fd;
+ buf += ret;
+ count -= ret;
+ }
+
/* Event info records order is: dir fid + name, child fid */
if (fanotify_event_dir_fh_len(event)) {
info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME :
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 16402037fc7a..80040a92e9d9 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -124,6 +124,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_FID 1
#define FAN_EVENT_INFO_TYPE_DFID_NAME 2
#define FAN_EVENT_INFO_TYPE_DFID 3
+#define FAN_EVENT_INFO_TYPE_ERROR 4

/* Variable length info record following event metadata */
struct fanotify_event_info_header {
@@ -149,6 +150,12 @@ struct fanotify_event_info_fid {
unsigned char handle[0];
};

+struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ __s32 error;
+ __u32 error_count;
+};
+
struct fanotify_response {
__s32 fd;
__u32 response;
--
2.32.0

2021-08-04 16:12:43

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 19/23] fanotify: Report fid info for file related file system errors

Plumb the pieces to add a FID report to error records. Since all error
event memory must be pre-allocated, we estimate a file handler size and
if it is insuficient, we report an invalid FID and increase the
prediction for the next error slot allocation.

For errors that don't expose a file handler report it with an invalid
FID.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/notify/fanotify/fanotify.c | 27 +++++++++++++++++++++++++++
fs/notify/fanotify/fanotify.h | 12 ++++++++++++
fs/notify/fanotify/fanotify_user.c | 19 ++++++++++++++++++-
3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 4e9e271a4394..2fd30b5eb9d3 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -705,7 +705,9 @@ static void fanotify_insert_error_event(struct fsnotify_group *group,
{
const struct fs_error_report *report = (struct fs_error_report *) data;
struct fanotify_event *fae = FANOTIFY_E(event);
+ struct inode *inode = report->inode;
struct fanotify_error_event *fee;
+ int fh_len;

/* This might be an unexpected type of event (i.e. overflow). */
if (!fanotify_is_error_event(fae->mask))
@@ -715,6 +717,31 @@ static void fanotify_insert_error_event(struct fsnotify_group *group,
fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
fee->error = report->error;
fee->err_count = 1;
+ fee->fsid = fee->sb_mark->fsn_mark.connector->fsid;
+
+ /*
+ * Error reporting needs to happen in atomic context. If this
+ * inode's file handler length is more than we initially
+ * predicted, there is nothing better we can do than report the
+ * error with a bad file handler.
+ */
+ fh_len = fanotify_encode_fh_len(inode);
+ if (fh_len > fee->sb_mark->pred_fh_len) {
+ pr_warn_ratelimited(
+ "FH overflows error event. Drop inode information.\n");
+ /*
+ * Update the handler size prediction for the next error
+ * event allocation. This reduces the chance of another
+ * overflow.
+ */
+ fee->sb_mark->pred_fh_len = fh_len;
+
+ /* For the current error, ignore the inode information. */
+ inode = NULL;
+ fh_len = fanotify_encode_fh_len(NULL);
+ }
+
+ fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0);
}

/*
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 8929ea50f96f..e4e7968b70ee 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -142,6 +142,7 @@ FANOTIFY_MARK_FLAG(SB_MARK);

struct fanotify_sb_mark {
struct fsnotify_mark fsn_mark;
+ size_t pred_fh_len;
struct fanotify_error_event *fee_slot;
};

@@ -227,6 +228,13 @@ struct fanotify_error_event {
u32 err_count; /* Suppressed errors count */

struct fanotify_sb_mark *sb_mark; /* Back reference to the mark. */
+
+ __kernel_fsid_t fsid; /* FSID this error refers to. */
+ /*
+ * object_fh is followed by a variable sized buffer, so it must
+ * be the last element of this structure.
+ */
+ struct fanotify_fh object_fh;
};

static inline struct fanotify_error_event *
@@ -241,6 +249,8 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
return &FANOTIFY_FE(event)->fsid;
else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
return &FANOTIFY_NE(event)->fsid;
+ else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
+ return &FANOTIFY_EE(event)->fsid;
else
return NULL;
}
@@ -252,6 +262,8 @@ static inline struct fanotify_fh *fanotify_event_object_fh(
return &FANOTIFY_FE(event)->object_fh;
else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
return fanotify_info_file_fh(&FANOTIFY_NE(event)->info);
+ else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
+ return &FANOTIFY_EE(event)->object_fh;
else
return NULL;
}
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index e7fe6bc61b6f..74def82630bb 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -172,8 +172,25 @@ static struct fanotify_error_event *fanotify_alloc_error_event(

{
struct fanotify_error_event *fee;
+ struct super_block *sb;

- fee = kzalloc(sizeof(*fee), GFP_KERNEL_ACCOUNT);
+ if (!sb_mark->pred_fh_len) {
+ /*
+ * The handler size is initially predicted to be the
+ * same size as the root inode file handler. It can be
+ * increased later if a larger file handler is found.
+ */
+ sb = container_of(sb_mark->fsn_mark.connector->obj,
+ struct super_block, s_fsnotify_marks);
+ sb_mark->pred_fh_len =
+ fanotify_encode_fh_len(sb->s_root->d_inode);
+ }
+
+ /* Guarantee there is always space to report an invalid FID. */
+ if (sb_mark->pred_fh_len < FANOTIFY_NULL_FH_LEN)
+ sb_mark->pred_fh_len = FANOTIFY_NULL_FH_LEN;
+
+ fee = kzalloc(sizeof(*fee) + sb_mark->pred_fh_len, GFP_KERNEL_ACCOUNT);
if (!fee)
return NULL;

--
2.32.0

2021-08-04 16:12:43

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 18/23] fanotify: Handle FAN_FS_ERROR events

Wire up FAN_FS_ERROR in the fanotify_mark syscall. The event can only
be requested for the entire filesystem, thus it requires the
FAN_MARK_FILESYSTEM.

FAN_FS_ERROR has to be handled slightly differently from other events
because it needs to be submitted in an atomic context, using
preallocated memory. This patch implements the submission path by only
storing the first error event that happened in the slot (userspace
resets the slot by reading the event).

Extra error events happening when the slot is occupied are merged to the
original report, and the only information keep for these extra errors is
an accumulator counting the number of events, which is part of the
record reported back to userspace.

Reporting only the first event should be fine, since when a FS error
happens, a cascade of error usually follows, but the most meaningful
information is (usually) on the first erro.

The event dequeueing is also a bit special to avoid losing events. Since
event merging only happens while the event is queued, there is a window
between when an error event is dequeued (notification_lock is dropped)
until it is reset (.free_event()) where the slot is full, but no merges
can happen.

The proposed solution is to replace the event in the slot with a new
structure, prior to dropping the lock. This way, if a new event arrives
in the time between the event was dequeued and the time it resets, the
new errors will still be logged and merged in the new slot.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v4:
- Split parts to earlier patches (amir)
- Simplify fanotify entry replacement
- Update handle size prediction on overflow
Changes since v3:
- Convert WARN_ON to pr_warn (amir)
- Remove unecessary READ/WRITE_ONCE (amir)
- Alloc with GFP_KERNEL_ACCOUNT(amir)
- Simplify flags on mark allocation (amir)
- Avoid atomic set of error_count (amir)
- Simplify rules when merging error_event (amir)
- Allocate new error_event on get_one_event (amir)
- Report superblock error with invalid FH (amir,jan)

Changes since v2:
- Support and equire FID mode (amir)
- Goto error path instead of early return (amir)
- Simplify get_one_event (me)
- Base merging on error_count
- drop fanotify_queue_error_event

Changes since v1:
- Pass dentry to fanotify_check_fsid (Amir)
- FANOTIFY_EVENT_TYPE_ERROR -> FANOTIFY_EVENT_TYPE_FS_ERROR
- Merge previous patch into it
- Use a single slot
- Move fanotify_mark.error_event definition to this commit
- Rename FAN_ERROR -> FAN_FS_ERROR
- Restrict FAN_FS_ERROR to FAN_MARK_FILESYSTEM
---
fs/notify/fanotify/fanotify.c | 50 +++++++++++++++++++++++-
fs/notify/fanotify/fanotify.h | 9 +++++
fs/notify/fanotify/fanotify_user.c | 63 +++++++++++++++++++++++++++++-
include/linux/fanotify.h | 6 ++-
4 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 0678d35432a7..4e9e271a4394 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -681,6 +681,42 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info)
return fsid;
}

+static int fanotify_merge_error_event(struct fsnotify_group *group,
+ struct fsnotify_event *event)
+{
+ struct fanotify_event *fae = FANOTIFY_E(event);
+ struct fanotify_error_event *fee = FANOTIFY_EE(fae);
+
+ /*
+ * When err_count > 0, the reporting slot is full. Just account
+ * the additional error and abort the insertion.
+ */
+ if (fee->err_count) {
+ fee->err_count++;
+ return 1;
+ }
+
+ return 0;
+}
+
+static void fanotify_insert_error_event(struct fsnotify_group *group,
+ struct fsnotify_event *event,
+ const void *data)
+{
+ const struct fs_error_report *report = (struct fs_error_report *) data;
+ struct fanotify_event *fae = FANOTIFY_E(event);
+ struct fanotify_error_event *fee;
+
+ /* This might be an unexpected type of event (i.e. overflow). */
+ if (!fanotify_is_error_event(fae->mask))
+ return;
+
+ fee = FANOTIFY_EE(fae);
+ fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
+ fee->error = report->error;
+ fee->err_count = 1;
+}
+
/*
* Add an event to hash table for faster merge.
*/
@@ -735,7 +771,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);

- BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
+ BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);

mask = fanotify_group_event_mask(group, iter_info, mask, data,
data_type, dir);
@@ -760,6 +796,18 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
return 0;
}

+ if (fanotify_is_error_event(mask)) {
+ struct fanotify_sb_mark *sb_mark =
+ FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
+
+ ret = fsnotify_insert_event(group,
+ &sb_mark->fee_slot->fae.fse,
+ fanotify_merge_error_event,
+ fanotify_insert_error_event,
+ data);
+ goto finish;
+ }
+
event = fanotify_alloc_event(group, mask, data, data_type, dir,
file_name, &fsid);
ret = -ENOMEM;
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 206dc6cfd671..8929ea50f96f 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -223,6 +223,9 @@ FANOTIFY_NE(struct fanotify_event *event)

struct fanotify_error_event {
struct fanotify_event fae;
+ s32 error; /* Error reported by the Filesystem. */
+ u32 err_count; /* Suppressed errors count */
+
struct fanotify_sb_mark *sb_mark; /* Back reference to the mark. */
};

@@ -323,6 +326,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
return container_of(fse, struct fanotify_event, fse);
}

+static inline bool fanotify_is_error_event(u32 mask)
+{
+ return mask & FAN_FS_ERROR;
+}
+
static inline bool fanotify_event_has_path(struct fanotify_event *event)
{
return event->type == FANOTIFY_EVENT_TYPE_PATH ||
@@ -352,6 +360,7 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
static inline bool fanotify_is_hashed_event(u32 mask)
{
return !(fanotify_is_perm_event(mask) ||
+ fanotify_is_error_event(mask) ||
fsnotify_is_overflow_event(mask));
}

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 76c1c805af3d..e7fe6bc61b6f 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -183,6 +183,45 @@ static struct fanotify_error_event *fanotify_alloc_error_event(
return fee;
}

+/*
+ * Replace a mark's error event with a new structure in preparation for
+ * it to be dequeued. This is a bit annoying since we need to drop the
+ * lock, so another thread might just steal the event from us.
+ */
+static int fanotify_replace_fs_error_event(struct fsnotify_group *group,
+ struct fanotify_event *fae)
+{
+ struct fanotify_error_event *new, *fee = FANOTIFY_EE(fae);
+ struct fanotify_sb_mark *sb_mark = fee->sb_mark;
+ struct fsnotify_event *fse;
+
+ pr_debug("%s: event=%p\n", __func__, fae);
+
+ assert_spin_locked(&group->notification_lock);
+
+ spin_unlock(&group->notification_lock);
+ new = fanotify_alloc_error_event(sb_mark);
+ spin_lock(&group->notification_lock);
+
+ if (!new)
+ return -ENOMEM;
+
+ /*
+ * Since we temporarily dropped the notification_lock, the event
+ * might have been taken from under us and reported by another
+ * reader. If that is the case, don't play games, just retry.
+ */
+ fse = fsnotify_peek_first_event(group);
+ if (fse != &fae->fse) {
+ kfree(new);
+ return -EAGAIN;
+ }
+
+ sb_mark->fee_slot = new;
+
+ return 0;
+}
+
/*
* Get an fanotify notification event if one exists and is small
* enough to fit in "count". Return an error pointer if the count
@@ -196,6 +235,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
struct fanotify_event *event = NULL;
struct fsnotify_event *fsn_event;
unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
+ int ret;

pr_debug("%s: group=%p count=%zd\n", __func__, group, count);

@@ -212,9 +252,21 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
goto out;
}

+ if (fanotify_is_error_event(event->mask)) {
+ /*
+ * Replace the error event ahead of dequeueing so we
+ * don't need to handle a incorrectly dequeued event.
+ */
+ ret = fanotify_replace_fs_error_event(group, event);
+ if (ret) {
+ event = ERR_PTR(ret);
+ goto out;
+ }
+ }
+
/*
- * Held the notification_lock the whole time, so this is the
- * same event we peeked above.
+ * Even though we might have temporarily dropped the lock, this
+ * is guaranteed to be the same event we peeked above.
*/
fsnotify_remove_first_event(group);
if (fanotify_is_perm_event(event->mask))
@@ -596,6 +648,8 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
event = get_one_event(group, count);
if (IS_ERR(event)) {
ret = PTR_ERR(event);
+ if (ret == -EAGAIN)
+ continue;
break;
}

@@ -1464,6 +1518,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
fsid = &__fsid;
}

+ if (mask & FAN_FS_ERROR && mark_type != FAN_MARK_FILESYSTEM) {
+ ret = -EINVAL;
+ goto path_put_and_out;
+ }
+
/* inode held in place by reference to path; group by fget on fd */
if (mark_type == FAN_MARK_INODE)
inode = path.dentry->d_inode;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index c05d45bde8b8..c4d49308b2d0 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -88,9 +88,13 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
#define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)

+/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */
+#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR)
+
/* Events that user can request to be notified on */
#define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
- FANOTIFY_INODE_EVENTS)
+ FANOTIFY_INODE_EVENTS | \
+ FANOTIFY_ERROR_EVENTS)

/* Events that require a permission response from user */
#define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
--
2.32.0

2021-08-04 16:12:52

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 21/23] ext4: Send notifications on error

Send a FS_ERROR message via fsnotify to a userspace monitoring tool
whenever a ext4 error condition is triggered. This follows the existing
error conditions in ext4, so it is hooked to the ext4_error* functions.

It also follows the current dmesg reporting in the format. The
filesystem message is composed mostly by the string that would be
otherwise printed in dmesg.

A new ext4 specific record format is exposed in the uapi, such that a
monitoring tool knows what to expect when listening errors of an ext4
filesystem.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
---
fs/ext4/super.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dfa09a277b56..b9ecd43678d7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -46,6 +46,7 @@
#include <linux/part_stat.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
+#include <linux/fsnotify.h>

#include "ext4.h"
#include "ext4_extents.h" /* Needed for trace points definition */
@@ -762,6 +763,8 @@ void __ext4_error(struct super_block *sb, const char *function,
sb->s_id, function, line, current->comm, &vaf);
va_end(args);
}
+ fsnotify_sb_error(sb, NULL, error);
+
ext4_handle_error(sb, force_ro, error, 0, block, function, line);
}

@@ -792,6 +795,8 @@ void __ext4_error_inode(struct inode *inode, const char *function,
current->comm, &vaf);
va_end(args);
}
+ fsnotify_sb_error(inode->i_sb, inode, error);
+
ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
function, line);
}
@@ -830,6 +835,8 @@ void __ext4_error_file(struct file *file, const char *function,
current->comm, path, &vaf);
va_end(args);
}
+ fsnotify_sb_error(inode->i_sb, inode, EFSCORRUPTED);
+
ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
function, line);
}
@@ -897,6 +904,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
sb->s_id, function, line, errstr);
}
+ fsnotify_sb_error(sb, sb->s_root->d_inode, errno);

ext4_handle_error(sb, false, -errno, 0, 0, function, line);
}
--
2.32.0

2021-08-04 16:13:04

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 22/23] samples: Add fs error monitoring example

Introduce an example of a FAN_FS_ERROR fanotify user to track filesystem
errors.

Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v4:
- Protect file_handle defines with ifdef guards

Changes since v1:
- minor fixes
---
samples/Kconfig | 9 +++
samples/Makefile | 1 +
samples/fanotify/Makefile | 5 ++
samples/fanotify/fs-monitor.c | 138 ++++++++++++++++++++++++++++++++++
4 files changed, 153 insertions(+)
create mode 100644 samples/fanotify/Makefile
create mode 100644 samples/fanotify/fs-monitor.c

diff --git a/samples/Kconfig b/samples/Kconfig
index b0503ef058d3..88353b8eac0b 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -120,6 +120,15 @@ config SAMPLE_CONNECTOR
with it.
See also Documentation/driver-api/connector.rst

+config SAMPLE_FANOTIFY_ERROR
+ bool "Build fanotify error monitoring sample"
+ depends on FANOTIFY
+ help
+ When enabled, this builds an example code that uses the
+ FAN_FS_ERROR fanotify mechanism to monitor filesystem
+ errors.
+ See also Documentation/admin-guide/filesystem-monitoring.rst.
+
config SAMPLE_HIDRAW
bool "hidraw sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index 087e0988ccc5..931a81847c48 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -5,6 +5,7 @@ subdir-$(CONFIG_SAMPLE_AUXDISPLAY) += auxdisplay
subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/
obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
+obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
obj-$(CONFIG_SAMPLE_KDB) += kdb/
diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
new file mode 100644
index 000000000000..e20db1bdde3b
--- /dev/null
+++ b/samples/fanotify/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+userprogs-always-y += fs-monitor
+
+userccflags += -I usr/include -Wall
+
diff --git a/samples/fanotify/fs-monitor.c b/samples/fanotify/fs-monitor.c
new file mode 100644
index 000000000000..ad6605b337ea
--- /dev/null
+++ b/samples/fanotify/fs-monitor.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021, Collabora Ltd.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <err.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/fanotify.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/types.h>
+
+#ifndef FAN_FS_ERROR
+#define FAN_FS_ERROR 0x00008000
+#define FAN_EVENT_INFO_TYPE_ERROR 4
+
+struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ __s32 error;
+ __u32 error_count;
+};
+#endif
+
+#ifndef FILEID_INO32_GEN
+#define FILEID_INO32_GEN 1
+#endif
+
+#ifndef FILEID_INVALID
+#define FILEID_INVALID 0xff
+#endif
+
+static void print_fh(struct file_handle *fh)
+{
+ int i;
+ uint32_t *h = (uint32_t *) fh->f_handle;
+
+ printf("\tfh: ");
+ for (i = 0; i < fh->handle_bytes; i++)
+ printf("%hhx", fh->f_handle[i]);
+ printf("\n");
+
+ printf("\tdecoded fh: ");
+ if (fh->handle_type == FILEID_INO32_GEN)
+ printf("inode=%u gen=%u\n", h[0], h[1]);
+ else if (fh->handle_type == FILEID_INVALID && !h[0] && !h[1])
+ printf("Type %d (Superblock error)\n", fh->handle_type);
+ else
+ printf("Type %d (Unknown)\n", fh->handle_type);
+
+}
+
+static void handle_notifications(char *buffer, int len)
+{
+ struct fanotify_event_metadata *metadata;
+ struct fanotify_event_info_error *error;
+ struct fanotify_event_info_fid *fid;
+ char *next;
+
+ for (metadata = (struct fanotify_event_metadata *) buffer;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+ next = (char *)metadata + metadata->event_len;
+ if (metadata->mask != FAN_FS_ERROR) {
+ printf("unexpected FAN MARK: %llx\n", metadata->mask);
+ goto next_event;
+ } else if (metadata->fd != FAN_NOFD) {
+ printf("Unexpected fd (!= FAN_NOFD)\n");
+ goto next_event;
+ }
+
+ printf("FAN_FS_ERROR found len=%d\n", metadata->event_len);
+
+ error = (struct fanotify_event_info_error *) (metadata+1);
+ if (error->hdr.info_type != FAN_EVENT_INFO_TYPE_ERROR) {
+ printf("unknown record: %d (Expecting TYPE_ERROR)\n",
+ error->hdr.info_type);
+ goto next_event;
+ }
+
+ printf("\tGeneric Error Record: len=%d\n", error->hdr.len);
+ printf("\terror: %d\n", error->error);
+ printf("\terror_count: %d\n", error->error_count);
+
+ fid = (struct fanotify_event_info_fid *) (error + 1);
+ if ((char *) fid >= next) {
+ printf("Event doesn't have FID\n");
+ goto next_event;
+ }
+ printf("FID record found\n");
+
+ if (fid->hdr.info_type != FAN_EVENT_INFO_TYPE_FID) {
+ printf("unknown record: %d (Expecting TYPE_FID)\n",
+ fid->hdr.info_type);
+ goto next_event;
+ }
+ printf("\tfsid: %x%x\n", fid->fsid.val[0], fid->fsid.val[1]);
+ print_fh((struct file_handle *) &fid->handle);
+
+next_event:
+ printf("---\n\n");
+ }
+}
+
+int main(int argc, char **argv)
+{
+ int fd;
+
+ char buffer[BUFSIZ];
+
+ if (argc < 2) {
+ printf("Missing path argument\n");
+ return 1;
+ }
+
+ fd = fanotify_init(FAN_CLASS_NOTIF|FAN_REPORT_FID, O_RDONLY);
+ if (fd < 0)
+ errx(1, "fanotify_init");
+
+ if (fanotify_mark(fd, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
+ FAN_FS_ERROR, AT_FDCWD, argv[1])) {
+ errx(1, "fanotify_mark");
+ }
+
+ while (1) {
+ int n = read(fd, buffer, BUFSIZ);
+
+ if (n < 0)
+ errx(1, "read");
+
+ handle_notifications(buffer, n);
+ }
+
+ return 0;
+}
--
2.32.0

2021-08-04 16:13:29

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v5 23/23] docs: Document the FAN_FS_ERROR event

Document the FAN_FS_ERROR event for user administrators and user space
developers.

Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes Since v4:
- Update documentation about reporting non-file error.
Changes Since v3:
- Move FAN_FS_ERROR notification into a subsection of the file.
Changes Since v2:
- NTR
Changes since v1:
- Drop references to location record
- Explain that the inode field is optional
- Explain we are reporting only the first error
---
.../admin-guide/filesystem-monitoring.rst | 70 +++++++++++++++++++
Documentation/admin-guide/index.rst | 1 +
2 files changed, 71 insertions(+)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst

diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
new file mode 100644
index 000000000000..d03a2e54ae2a
--- /dev/null
+++ b/Documentation/admin-guide/filesystem-monitoring.rst
@@ -0,0 +1,70 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+File system Monitoring with fanotify
+====================================
+
+File system Error Reporting
+===========================
+
+fanotify supports the FAN_FS_ERROR mark for file system-wide error
+reporting. It is meant to be used by file system health monitoring
+daemons who listen on that interface and take actions (notify sysadmin,
+start recovery) when a file system problem is detected by the kernel.
+
+By design, A FAN_FS_ERROR notification exposes sufficient information for a
+monitoring tool to know a problem in the file system has happened. It
+doesn't necessarily provide a user space application with semantics to
+verify an IO operation was successfully executed. That is outside of
+scope of this feature. Instead, it is only meant as a framework for
+early file system problem detection and reporting recovery tools.
+
+When a file system operation fails, it is common for dozens of kernel
+errors to cascade after the initial failure, hiding the original failure
+log, which is usually the most useful debug data to troubleshoot the
+problem. For this reason, FAN_FS_ERROR only reports the first error that
+occurred since the last notification, and it simply counts addition
+errors. This ensures that the most important piece of error information
+is never lost.
+
+FAN_FS_ERROR requires the fanotify group to be setup with the
+FAN_REPORT_FID flag.
+
+At the time of this writing, the only file system that emits FAN_FS_ERROR
+notifications is Ext4.
+
+A user space example code is provided at ``samples/fanotify/fs-monitor.c``.
+
+A FAN_FS_ERROR Notification has the following format::
+
+ [ Notification Metadata (Mandatory) ]
+ [ Generic Error Record (Mandatory) ]
+ [ FID record (Mandatory) ]
+
+Generic error record
+--------------------
+
+The generic error record provides enough information for a file system
+agnostic tool to learn about a problem in the file system, without
+providing any additional details about the problem. This record is
+identified by ``struct fanotify_event_info_header.info_type`` being set
+to FAN_EVENT_INFO_TYPE_ERROR.
+
+ struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ __s32 error;
+ __u32 error_count;
+ };
+
+The `error` field identifies the type of error. `error_count` count
+tracks the number of errors that occurred and were suppressed to
+preserve the original error, since the last notification.
+
+FID record
+----------
+
+The FID record can be used to uniquely identify the inode that triggered
+the error through the combination of fsid and file handler. A
+filesystem specific handler can use that information to attempt a
+recovery procedure. Errors that are not related to an inode are
+reported with an empty file handler, with type FILEID_INVALID.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index dc00afcabb95..1bedab498104 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
+ filesystem-monitoring
nfs/index
gpio/index
highuid
--
2.32.0

2021-08-05 09:15:25

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v5 04/23] fsnotify: Reserve mark bits for backends

Hi Gabriel,

I love your patch! Yet something to improve:

[auto build test ERROR on ext3/fsnotify]
[also build test ERROR on ext4/dev ext3/for_next linus/master v5.14-rc4 next-20210804]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210805-001201
base: https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
config: powerpc-ppc64_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/e5294af07c565333b5efb87d6b431d7d601a1043
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Gabriel-Krisman-Bertazi/File-system-wide-monitoring/20210805-001201
git checkout e5294af07c565333b5efb87d6b431d7d601a1043
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from include/linux/fsnotify.h:15,
from arch/powerpc/platforms/cell/spufs/inode.c:15:
include/linux/fsnotify_backend.h:367:27: error: 'FSNOTIFY_MARK_FLAG_ATTACHED' defined but not used [-Werror=unused-const-variable=]
367 | static const unsigned int FSNOTIFY_MARK_FLAG_##flag = \
| ^~~~~~~~~~~~~~~~~~~
include/linux/fsnotify_backend.h:379:1: note: in expansion of macro 'FSNOTIFY_MARK_FLAG'
379 | FSNOTIFY_MARK_FLAG(ATTACHED);
| ^~~~~~~~~~~~~~~~~~
include/linux/fsnotify_backend.h:367:27: error: 'FSNOTIFY_MARK_FLAG_ALIVE' defined but not used [-Werror=unused-const-variable=]
367 | static const unsigned int FSNOTIFY_MARK_FLAG_##flag = \
| ^~~~~~~~~~~~~~~~~~~
include/linux/fsnotify_backend.h:378:1: note: in expansion of macro 'FSNOTIFY_MARK_FLAG'
378 | FSNOTIFY_MARK_FLAG(ALIVE);
| ^~~~~~~~~~~~~~~~~~
>> include/linux/fsnotify_backend.h:367:27: error: 'FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY' defined but not used [-Werror=unused-const-variable=]
367 | static const unsigned int FSNOTIFY_MARK_FLAG_##flag = \
| ^~~~~~~~~~~~~~~~~~~
include/linux/fsnotify_backend.h:377:1: note: in expansion of macro 'FSNOTIFY_MARK_FLAG'
377 | FSNOTIFY_MARK_FLAG(IGNORED_SURV_MODIFY);
| ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors


vim +/FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY +367 include/linux/fsnotify_backend.h

365
366 #define FSNOTIFY_MARK_FLAG(flag) \
> 367 static const unsigned int FSNOTIFY_MARK_FLAG_##flag = \
368 (1 << FSN_MARK_FL_BIT_##flag)
369

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (3.26 kB)
.config.gz (26.18 kB)
Download all attachments

2021-08-05 09:19:46

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 04/23] fsnotify: Reserve mark bits for backends

On Wed 04-08-21 12:05:53, Gabriel Krisman Bertazi wrote:
> Split out the final bits of struct fsnotify_mark->flags for use by a
> backend.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> include/linux/fsnotify_backend.h | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 1ce66748a2d2..9d5586445c65 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -363,6 +363,21 @@ struct fsnotify_mark_connector {
> struct hlist_head list;
> };
>
> +#define FSNOTIFY_MARK_FLAG(flag) \
> +static const unsigned int FSNOTIFY_MARK_FLAG_##flag = \
> + (1 << FSN_MARK_FL_BIT_##flag)

Static variable declaration in a header file makes me a bit uneasy. I know
it is const so a compiler should optimize this to a constant but still
there will likely be some side-effects (see the 0-day warning).

Honestly, given these are just three flags I'd just don't overengineer this
and have:

#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY \
(1 << FSN_MARK_FL_BIT_IGNORED_SURV_MODIFY)
...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 09:25:51

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 05/23] fanotify: Split superblock marks out to a new cache

On Wed 04-08-21 12:05:54, Gabriel Krisman Bertazi wrote:
> FAN_FS_ERROR will require an error structure to be stored per mark.
> But, since FAN_FS_ERROR doesn't apply to inode/mount marks, it should
> suffice to only expose this information for superblock marks. Therefore,
> wrap this kind of marks into a container and plumb it for the future.
>
> Reviewed-by: Amir Goldstein <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Looks mostly good, just one nit below:

> @@ -915,6 +916,38 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
> return mask & ~oldmask;
> }
>
> +static struct fsnotify_mark *fanotify_alloc_mark(struct fsnotify_group *group,
> + unsigned int type)
> +{
> + struct fanotify_sb_mark *sb_mark;
> + struct fsnotify_mark *mark;
> +
> + switch (type) {
> + case FSNOTIFY_OBJ_TYPE_SB:
> + sb_mark = kmem_cache_zalloc(fanotify_sb_mark_cache, GFP_KERNEL);
> + if (!sb_mark)
> + return NULL;
> + mark = &sb_mark->fsn_mark;
> + break;
> +
> + case FSNOTIFY_OBJ_TYPE_INODE:
> + case FSNOTIFY_OBJ_TYPE_PARENT:
> + case FSNOTIFY_OBJ_TYPE_VFSMOUNT:
> + mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
> + break;

It is odd that sb marks are allocated with zalloc while other marks with
alloc. Why is that? It is errorprone to have this different among mark
types as somebody may mistakenly assume a mark is zeroed when it actually is
not. So please either use kmem_cache_alloc() for sb mark as well and zero
out by hand what you need, or do a cleanup patch that uses zalloc across
all of dnotify, inotify, fanotify (I can see kernel/audit_* users already
use zalloc) and drop zeroing from fsnotify_init_mark(). Thanks!

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 09:29:09

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 08/23] fsnotify: Add wrapper around fsnotify_add_event

On Wed 04-08-21 12:05:57, Gabriel Krisman Bertazi wrote:
> fsnotify_add_event is growing in number of parameters, which is most
^^ in most
cases...

> case are just passed a NULL pointer. So, split out a new
> fsnotify_insert_event function to clean things up for users who don't
> need an insert hook.
>
> Suggested-by: Amir Goldstein <[email protected]>
> Reviewed-by: Amir Goldstein <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Otherwise looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/notify/fanotify/fanotify.c | 4 ++--
> fs/notify/inotify/inotify_fsnotify.c | 2 +-
> fs/notify/notification.c | 12 ++++++------
> include/linux/fsnotify_backend.h | 23 ++++++++++++++++-------
> 4 files changed, 25 insertions(+), 16 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index c3eefe3f6494..acf78c0ed219 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -781,8 +781,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> }
>
> fsn_event = &event->fse;
> - ret = fsnotify_add_event(group, fsn_event, fanotify_merge,
> - fanotify_insert_event);
> + ret = fsnotify_insert_event(group, fsn_event, fanotify_merge,
> + fanotify_insert_event);
> if (ret) {
> /* Permission events shouldn't be merged */
> BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
> diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
> index d1a64daa0171..a96582cbfad1 100644
> --- a/fs/notify/inotify/inotify_fsnotify.c
> +++ b/fs/notify/inotify/inotify_fsnotify.c
> @@ -116,7 +116,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
> if (len)
> strcpy(event->name, name->name);
>
> - ret = fsnotify_add_event(group, fsn_event, inotify_merge, NULL);
> + ret = fsnotify_add_event(group, fsn_event, inotify_merge);
> if (ret) {
> /* Our event wasn't used in the end. Free it. */
> fsnotify_destroy_event(group, fsn_event);
> diff --git a/fs/notify/notification.c b/fs/notify/notification.c
> index 32f45543b9c6..44bb10f50715 100644
> --- a/fs/notify/notification.c
> +++ b/fs/notify/notification.c
> @@ -78,12 +78,12 @@ void fsnotify_destroy_event(struct fsnotify_group *group,
> * 2 if the event was not queued - either the queue of events has overflown
> * or the group is shutting down.
> */
> -int fsnotify_add_event(struct fsnotify_group *group,
> - struct fsnotify_event *event,
> - int (*merge)(struct fsnotify_group *,
> - struct fsnotify_event *),
> - void (*insert)(struct fsnotify_group *,
> - struct fsnotify_event *))
> +int fsnotify_insert_event(struct fsnotify_group *group,
> + struct fsnotify_event *event,
> + int (*merge)(struct fsnotify_group *,
> + struct fsnotify_event *),
> + void (*insert)(struct fsnotify_group *,
> + struct fsnotify_event *))
> {
> int ret = 0;
> struct list_head *list = &group->notification_list;
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 2b5fb9327a77..cd4ca11f129e 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -495,16 +495,25 @@ extern int fsnotify_fasync(int fd, struct file *file, int on);
> extern void fsnotify_destroy_event(struct fsnotify_group *group,
> struct fsnotify_event *event);
> /* attach the event to the group notification queue */
> -extern int fsnotify_add_event(struct fsnotify_group *group,
> - struct fsnotify_event *event,
> - int (*merge)(struct fsnotify_group *,
> - struct fsnotify_event *),
> - void (*insert)(struct fsnotify_group *,
> - struct fsnotify_event *));
> +extern int fsnotify_insert_event(struct fsnotify_group *group,
> + struct fsnotify_event *event,
> + int (*merge)(struct fsnotify_group *,
> + struct fsnotify_event *),
> + void (*insert)(struct fsnotify_group *,
> + struct fsnotify_event *));
> +
> +static inline int fsnotify_add_event(struct fsnotify_group *group,
> + struct fsnotify_event *event,
> + int (*merge)(struct fsnotify_group *,
> + struct fsnotify_event *))
> +{
> + return fsnotify_insert_event(group, event, merge, NULL);
> +}
> +
> /* Queue overflow event to a notification group */
> static inline void fsnotify_queue_overflow(struct fsnotify_group *group)
> {
> - fsnotify_add_event(group, group->overflow_event, NULL, NULL);
> + fsnotify_add_event(group, group->overflow_event, NULL);
> }
>
> static inline bool fsnotify_is_overflow_event(u32 mask)
> --
> 2.32.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 09:41:13

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 13/23] fanotify: Allow file handle encoding for unhashed events

On Wed 04-08-21 12:06:02, Gabriel Krisman Bertazi wrote:
> FAN_FS_ERROR will report a file handle, but it is a unhashed event.n
^^ an ^
spurious 'n'.

> Allow passing a NULL hash to fanotify_encode_fh and avoid calculating
> the hash if not needed.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Otherwise looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/notify/fanotify/fanotify.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index a015822e29d8..0d6ba218bc01 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -385,8 +385,12 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> fh->type = type;
> fh->len = fh_len;
>
> - /* Mix fh into event merge key */
> - *hash ^= fanotify_hash_fh(fh);
> + /*
> + * Mix fh into event merge key. Hash might be NULL in case of
> + * unhashed FID events (i.e. FAN_FS_ERROR).
> + */
> + if (hash)
> + *hash ^= fanotify_hash_fh(fh);
>
> return FANOTIFY_FH_HDR_LEN + fh_len;
>
> --
> 2.32.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 09:57:24

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

On Wed 04-08-21 12:06:03, Gabriel Krisman Bertazi wrote:
> Instead of failing, encode an invalid file handler in fanotify_encode_fh
> if no inode is provided. This bogus file handler will be reported by
> FAN_FS_ERROR for non-inode errors.
>
> Also adjust the single caller that might rely on failure after passing
> an empty inode.

It is not 'file handler' but rather 'file handle' - several times in the
changelog and in subject :).

> Suggested-by: Amir Goldstein <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/notify/fanotify/fanotify.c | 39 ++++++++++++++++++++---------------
> fs/notify/fanotify/fanotify.h | 6 ++++--
> 2 files changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 0d6ba218bc01..456c60107d88 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -349,12 +349,6 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> void *buf = fh->buf;
> int err;
>
> - fh->type = FILEID_ROOT;
> - fh->len = 0;
> - fh->flags = 0;
> - if (!inode)
> - return 0;
> -

I'd keep the fh->flags initialization here. Otherwise it will not be
initialized on some error returns.

> @@ -363,8 +357,9 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4))
> goto out_err;
>
> - /* No external buffer in a variable size allocated fh */
> - if (gfp && fh_len > FANOTIFY_INLINE_FH_LEN) {
> + fh->flags = 0;
> + /* No external buffer in a variable size allocated fh or null fh */
> + if (inode && gfp && fh_len > FANOTIFY_INLINE_FH_LEN) {
> /* Treat failure to allocate fh as failure to encode fh */
> err = -ENOMEM;
> ext_buf = kmalloc(fh_len, gfp);
> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
> }
>
> - dwords = fh_len >> 2;
> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> - err = -EINVAL;
> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> - goto out_err;
> -
> - fh->type = type;
> - fh->len = fh_len;
> + if (inode) {
> + dwords = fh_len >> 2;
> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> + err = -EINVAL;
> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> + goto out_err;
> + fh->type = type;
> + fh->len = fh_len;
> + } else {
> + /*
> + * Invalid FHs are used on FAN_FS_ERROR for errors not
> + * linked to any inode. Caller needs to guarantee the fh
> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
> + */
> + fh->type = FILEID_INVALID;
> + fh->len = FANOTIFY_NULL_FH_LEN;
> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
> + }

Maybe it will become clearer later during the series but why do you set
fh->len to FANOTIFY_NULL_FH_LEN and not 0?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 10:26:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 10/23] fsnotify: Allow events reported with an empty inode

On Wed 04-08-21 12:05:59, Gabriel Krisman Bertazi wrote:
> Some file system events (i.e. FS_ERROR) might not be associated with an
> inode. For these, it makes sense to associate them directly with the
> super block of the file system they apply to. This patch allows the
> event to be reported directly against the super block instead of an
> inode.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

There's a comment before fsnotify() declaration that states that 'either
@dir or @inode must be non-NULL' - in this patch it would be good time to
update that comment.

> @@ -459,12 +460,13 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
> * if both are non-NULL event may be reported to both.
> * @cookie: inotify rename cookie
> */
> -int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
> - const struct qstr *file_name, struct inode *inode, u32 cookie)
> +int fsnotify(__u32 mask, const void *data, int data_type,
> + struct super_block *sb, struct inode *dir,
> + const struct qstr *file_name, struct inode *inode,
> + u32 cookie)
> {

Two notes as ideas for consideration:

1) We could derive 'sb' from 'data'. I.e., have a helper like
fsnotify_data_sb(data, data_type). For FSNOTIFY_EVENT_PATH and
FSNOTIFY_EVENT_INODE this is easy to provide, for FSNOTIFY_EVENT_ERROR we
would have to add sb pointer to the structure but I guess that's easy. That
way we'd avoid the mostly NULL 'sb' argument. What do you guys think?

2) AFAICS 'inode' can be always derived from 'data' as well. So maybe we
can drop it Amir?

That being said I can live with the code as is in this patch as well
(although with a bit of "ugh, irk ;)" so I want to discuss these ideas).

Honza

> const struct path *path = fsnotify_data_path(data, data_type);
> struct fsnotify_iter_info iter_info = {};
> - struct super_block *sb;
> struct mount *mnt = NULL;
> struct inode *parent = NULL;
> int ret = 0;
> @@ -483,7 +485,9 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
> */
> parent = dir;
> }
> - sb = inode->i_sb;
> +
> + if (!sb)
> + sb = inode->i_sb;
>
> /*
> * Optimization: srcu_read_lock() has a memory barrier which can
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index f8acddcf54fb..39c9dbc46d36 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -30,7 +30,8 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask,
> struct inode *child,
> const struct qstr *name, u32 cookie)
> {
> - fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie);
> + fsnotify(mask, child, FSNOTIFY_EVENT_INODE, NULL, dir, name, NULL,
> + cookie);
> }
>
> static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
> @@ -44,7 +45,8 @@ static inline void fsnotify_inode(struct inode *inode, __u32 mask)
> if (S_ISDIR(inode->i_mode))
> mask |= FS_ISDIR;
>
> - fsnotify(mask, inode, FSNOTIFY_EVENT_INODE, NULL, NULL, inode, 0);
> + fsnotify(mask, inode, FSNOTIFY_EVENT_INODE, NULL, NULL, NULL, inode,
> + 0);
> }
>
> /* Notify this dentry's parent about a child's events. */
> @@ -68,7 +70,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
> return __fsnotify_parent(dentry, mask, data, data_type);
>
> notify_child:
> - return fsnotify(mask, data, data_type, NULL, NULL, inode, 0);
> + return fsnotify(mask, data, data_type, NULL, NULL, NULL, inode, 0);
> }
>
> /*
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 693fe4cb48cf..4a765edd3b2a 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -423,8 +423,9 @@ struct fsnotify_mark {
>
> /* main fsnotify call to send events */
> extern int fsnotify(__u32 mask, const void *data, int data_type,
> - struct inode *dir, const struct qstr *name,
> - struct inode *inode, u32 cookie);
> + struct super_block *sb, struct inode *dir,
> + const struct qstr *name, struct inode *inode,
> + u32 cookie);
> extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> int data_type);
> extern void __fsnotify_inode_delete(struct inode *inode);
> @@ -618,8 +619,9 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
> #else
>
> static inline int fsnotify(__u32 mask, const void *data, int data_type,
> - struct inode *dir, const struct qstr *name,
> - struct inode *inode, u32 cookie)
> + struct super_block *sb, struct inode *dir,
> + const struct qstr *name, struct inode *inode,
> + u32 cookie)
> {
> return 0;
> }
> --
> 2.32.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 10:31:25

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 15/23] fanotify: Require fid_mode for any non-fd event

On Wed 04-08-21 12:06:04, Gabriel Krisman Bertazi wrote:
> Like inode events, FAN_FS_ERROR will require fid mode. Therefore,
> convert the verification during fanotify_mark(2) to require fid for any
> non-fd event. This means fid_mode will not only be required for inode
> events, but for any event that doesn't provide a descriptor.
>
> Suggested-by: Amir Goldstein <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Looks good. One style nit below, otherwise feel free to add:

Reviewed-by: Jan Kara <[email protected]>


> /*
> - * Events with data type inode do not carry enough information to report
> - * event->fd, so we do not allow setting a mask for inode events unless
> - * group supports reporting fid.
> - * inode events are not supported on a mount mark, because they do not
> - * carry enough information (i.e. path) to be filtered by mount point.
> - */
> + * Events that do not carry enough information to report
> + * event->fd require a group that supports reporting fid. Those
> + * events are not supported on a mount mark, because they do not
> + * carry enough information (i.e. path) to be filtered by mount
> + * point.
> + */
^ I think these should be one column more to the right to align
with the opening /*.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 10:34:39

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 16/23] fanotify: Reserve UAPI bits for FAN_FS_ERROR

On Wed 04-08-21 12:06:05, Gabriel Krisman Bertazi wrote:
> FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which
> a mechanism to report file system wide problems via fanotify. This
> commit preallocate userspace visible bits to match the FS_ERROR event.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/notify/fanotify/fanotify.c | 1 +
> include/uapi/linux/fanotify.h | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 456c60107d88..ca74199f11e2 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -733,6 +733,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
> BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
> BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
> + BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
>
> BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
>
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index fbf9c5c7dd59..16402037fc7a 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -20,6 +20,7 @@
> #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */
>
> #define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
> +#define FAN_FS_ERROR 0x00008000 /* Filesystem error */
>
> #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
> #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
> --
> 2.32.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 11:29:54

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 19/23] fanotify: Report fid info for file related file system errors

On Wed 04-08-21 12:06:08, Gabriel Krisman Bertazi wrote:
> Plumb the pieces to add a FID report to error records. Since all error
> event memory must be pre-allocated, we estimate a file handler size and
> if it is insuficient, we report an invalid FID and increase the
> prediction for the next error slot allocation.

Hum, cannot we just preallocate MAX_HANDLE_SZ? It is wasteful but then I
don't expect error marks would be that frequent for this to really
matter and the code would be simpler.

Honza

>
> For errors that don't expose a file handler report it with an invalid
> FID.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/notify/fanotify/fanotify.c | 27 +++++++++++++++++++++++++++
> fs/notify/fanotify/fanotify.h | 12 ++++++++++++
> fs/notify/fanotify/fanotify_user.c | 19 ++++++++++++++++++-
> 3 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 4e9e271a4394..2fd30b5eb9d3 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -705,7 +705,9 @@ static void fanotify_insert_error_event(struct fsnotify_group *group,
> {
> const struct fs_error_report *report = (struct fs_error_report *) data;
> struct fanotify_event *fae = FANOTIFY_E(event);
> + struct inode *inode = report->inode;
> struct fanotify_error_event *fee;
> + int fh_len;
>
> /* This might be an unexpected type of event (i.e. overflow). */
> if (!fanotify_is_error_event(fae->mask))
> @@ -715,6 +717,31 @@ static void fanotify_insert_error_event(struct fsnotify_group *group,
> fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> fee->error = report->error;
> fee->err_count = 1;
> + fee->fsid = fee->sb_mark->fsn_mark.connector->fsid;
> +
> + /*
> + * Error reporting needs to happen in atomic context. If this
> + * inode's file handler length is more than we initially
> + * predicted, there is nothing better we can do than report the
> + * error with a bad file handler.
> + */
> + fh_len = fanotify_encode_fh_len(inode);
> + if (fh_len > fee->sb_mark->pred_fh_len) {
> + pr_warn_ratelimited(
> + "FH overflows error event. Drop inode information.\n");
> + /*
> + * Update the handler size prediction for the next error
> + * event allocation. This reduces the chance of another
> + * overflow.
> + */
> + fee->sb_mark->pred_fh_len = fh_len;
> +
> + /* For the current error, ignore the inode information. */
> + inode = NULL;
> + fh_len = fanotify_encode_fh_len(NULL);
> + }
> +
> + fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0);
> }
>
> /*
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 8929ea50f96f..e4e7968b70ee 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -142,6 +142,7 @@ FANOTIFY_MARK_FLAG(SB_MARK);
>
> struct fanotify_sb_mark {
> struct fsnotify_mark fsn_mark;
> + size_t pred_fh_len;
> struct fanotify_error_event *fee_slot;
> };
>
> @@ -227,6 +228,13 @@ struct fanotify_error_event {
> u32 err_count; /* Suppressed errors count */
>
> struct fanotify_sb_mark *sb_mark; /* Back reference to the mark. */
> +
> + __kernel_fsid_t fsid; /* FSID this error refers to. */
> + /*
> + * object_fh is followed by a variable sized buffer, so it must
> + * be the last element of this structure.
> + */
> + struct fanotify_fh object_fh;
> };
>
> static inline struct fanotify_error_event *
> @@ -241,6 +249,8 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
> return &FANOTIFY_FE(event)->fsid;
> else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
> return &FANOTIFY_NE(event)->fsid;
> + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
> + return &FANOTIFY_EE(event)->fsid;
> else
> return NULL;
> }
> @@ -252,6 +262,8 @@ static inline struct fanotify_fh *fanotify_event_object_fh(
> return &FANOTIFY_FE(event)->object_fh;
> else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME)
> return fanotify_info_file_fh(&FANOTIFY_NE(event)->info);
> + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
> + return &FANOTIFY_EE(event)->object_fh;
> else
> return NULL;
> }
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index e7fe6bc61b6f..74def82630bb 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -172,8 +172,25 @@ static struct fanotify_error_event *fanotify_alloc_error_event(
>
> {
> struct fanotify_error_event *fee;
> + struct super_block *sb;
>
> - fee = kzalloc(sizeof(*fee), GFP_KERNEL_ACCOUNT);
> + if (!sb_mark->pred_fh_len) {
> + /*
> + * The handler size is initially predicted to be the
> + * same size as the root inode file handler. It can be
> + * increased later if a larger file handler is found.
> + */
> + sb = container_of(sb_mark->fsn_mark.connector->obj,
> + struct super_block, s_fsnotify_marks);
> + sb_mark->pred_fh_len =
> + fanotify_encode_fh_len(sb->s_root->d_inode);
> + }
> +
> + /* Guarantee there is always space to report an invalid FID. */
> + if (sb_mark->pred_fh_len < FANOTIFY_NULL_FH_LEN)
> + sb_mark->pred_fh_len = FANOTIFY_NULL_FH_LEN;
> +
> + fee = kzalloc(sizeof(*fee) + sb_mark->pred_fh_len, GFP_KERNEL_ACCOUNT);
> if (!fee)
> return NULL;
>
> --
> 2.32.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 13:39:56

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 19/23] fanotify: Report fid info for file related file system errors

On Wed 04-08-21 12:06:08, Gabriel Krisman Bertazi wrote:
> Plumb the pieces to add a FID report to error records. Since all error
> event memory must be pre-allocated, we estimate a file handler size and
> if it is insuficient, we report an invalid FID and increase the
> prediction for the next error slot allocation.
>
> For errors that don't expose a file handler report it with an invalid
> FID.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

...

> @@ -715,6 +717,31 @@ static void fanotify_insert_error_event(struct fsnotify_group *group,
> fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> fee->error = report->error;
> fee->err_count = 1;
> + fee->fsid = fee->sb_mark->fsn_mark.connector->fsid;
> +
> + /*
> + * Error reporting needs to happen in atomic context. If this
> + * inode's file handler length is more than we initially
> + * predicted, there is nothing better we can do than report the
> + * error with a bad file handler.
> + */
> + fh_len = fanotify_encode_fh_len(inode);
> + if (fh_len > fee->sb_mark->pred_fh_len) {
> + pr_warn_ratelimited(
> + "FH overflows error event. Drop inode information.\n");
> + /*
> + * Update the handler size prediction for the next error
> + * event allocation. This reduces the chance of another
> + * overflow.
> + */
> + fee->sb_mark->pred_fh_len = fh_len;
> +
> + /* For the current error, ignore the inode information. */
> + inode = NULL;
> + fh_len = fanotify_encode_fh_len(NULL);
> + }
> +
> + fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0);
> }

I don't think calling fanotify_encode_fh() from under notification_lock is
a good practice. It calls into a filesystem and god knows which locks it may
need to take to get necessary info... See my reply to patch 18 how we could
handle that better.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 13:40:51

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 18/23] fanotify: Handle FAN_FS_ERROR events

On Wed 04-08-21 12:06:07, Gabriel Krisman Bertazi wrote:
> Wire up FAN_FS_ERROR in the fanotify_mark syscall. The event can only
> be requested for the entire filesystem, thus it requires the
> FAN_MARK_FILESYSTEM.
>
> FAN_FS_ERROR has to be handled slightly differently from other events
> because it needs to be submitted in an atomic context, using
> preallocated memory. This patch implements the submission path by only
> storing the first error event that happened in the slot (userspace
> resets the slot by reading the event).
>
> Extra error events happening when the slot is occupied are merged to the
> original report, and the only information keep for these extra errors is
> an accumulator counting the number of events, which is part of the
> record reported back to userspace.
>
> Reporting only the first event should be fine, since when a FS error
> happens, a cascade of error usually follows, but the most meaningful
> information is (usually) on the first erro.
>
> The event dequeueing is also a bit special to avoid losing events. Since
> event merging only happens while the event is queued, there is a window
> between when an error event is dequeued (notification_lock is dropped)
> until it is reset (.free_event()) where the slot is full, but no merges
> can happen.
>
> The proposed solution is to replace the event in the slot with a new
> structure, prior to dropping the lock. This way, if a new event arrives
> in the time between the event was dequeued and the time it resets, the
> new errors will still be logged and merged in the new slot.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

The splitting of the patches really helped. Now I think I can grok much
more details than before :) Thanks! Some comments below.

> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 0678d35432a7..4e9e271a4394 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -681,6 +681,42 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info)
> return fsid;
> }
>
> +static int fanotify_merge_error_event(struct fsnotify_group *group,
> + struct fsnotify_event *event)
> +{
> + struct fanotify_event *fae = FANOTIFY_E(event);
> + struct fanotify_error_event *fee = FANOTIFY_EE(fae);
> +
> + /*
> + * When err_count > 0, the reporting slot is full. Just account
> + * the additional error and abort the insertion.
> + */
> + if (fee->err_count) {
> + fee->err_count++;
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> +static void fanotify_insert_error_event(struct fsnotify_group *group,
> + struct fsnotify_event *event,
> + const void *data)
> +{
> + const struct fs_error_report *report = (struct fs_error_report *) data;
> + struct fanotify_event *fae = FANOTIFY_E(event);
> + struct fanotify_error_event *fee;
> +
> + /* This might be an unexpected type of event (i.e. overflow). */
> + if (!fanotify_is_error_event(fae->mask))
> + return;
> +
> + fee = FANOTIFY_EE(fae);
> + fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> + fee->error = report->error;
> + fee->err_count = 1;
> +}
> +
> /*
> * Add an event to hash table for faster merge.
> */
> @@ -735,7 +771,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
> BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
>
> mask = fanotify_group_event_mask(group, iter_info, mask, data,
> data_type, dir);
> @@ -760,6 +796,18 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> return 0;
> }
>
> + if (fanotify_is_error_event(mask)) {
> + struct fanotify_sb_mark *sb_mark =
> + FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
> +
> + ret = fsnotify_insert_event(group,
> + &sb_mark->fee_slot->fae.fse,
> + fanotify_merge_error_event,
> + fanotify_insert_error_event,
> + data);
> + goto finish;
> + }

Hum, seeing this and how you had to extend fsnotify_add_event() to
accommodate this use, cannot we instead have something like:

if (fanotify_is_error_event(mask)) {
struct fanotify_sb_mark *sb_mark =
FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
struct fanotify_error_event *event = &sb_mark->fee_slot;
bool queue = false;

spin_lock(&group->notification_lock);
/* Not yet queued? */
if (!event->err_count) {
fee->error = report->error;
queue = true;
}
event->err_count++;
spin_unlock(&group->notification_lock);
if (queue) {
... fill in other error info in 'event' such as fhandle
fsnotify_add_event(group, &event->fae.fse, NULL);
}
}

It would be IMHO simpler to follow what's going on and we don't have to
touch fsnotify_add_event(). I do recognize that due to races it may happen
that some racing fsnotify(FAN_FS_ERROR) call returns before the event is
actually visible in the event queue. It don't think it really matters but
if we wanted to be more careful, we would need to preformat fhandle into a
local buffer and only copy it into the event under notification_lock when
we see the event is unused.

> +/*
> + * Replace a mark's error event with a new structure in preparation for
> + * it to be dequeued. This is a bit annoying since we need to drop the
> + * lock, so another thread might just steal the event from us.
> + */
> +static int fanotify_replace_fs_error_event(struct fsnotify_group *group,
> + struct fanotify_event *fae)
> +{
> + struct fanotify_error_event *new, *fee = FANOTIFY_EE(fae);
> + struct fanotify_sb_mark *sb_mark = fee->sb_mark;
> + struct fsnotify_event *fse;
> +
> + pr_debug("%s: event=%p\n", __func__, fae);
> +
> + assert_spin_locked(&group->notification_lock);
> +
> + spin_unlock(&group->notification_lock);
> + new = fanotify_alloc_error_event(sb_mark);
> + spin_lock(&group->notification_lock);
> +
> + if (!new)
> + return -ENOMEM;
> +
> + /*
> + * Since we temporarily dropped the notification_lock, the event
> + * might have been taken from under us and reported by another
> + * reader. If that is the case, don't play games, just retry.
> + */
> + fse = fsnotify_peek_first_event(group);
> + if (fse != &fae->fse) {
> + kfree(new);
> + return -EAGAIN;
> + }
> +
> + sb_mark->fee_slot = new;
> +
> + return 0;
> +}
> +
> /*
> * Get an fanotify notification event if one exists and is small
> * enough to fit in "count". Return an error pointer if the count
> @@ -212,9 +252,21 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> goto out;
> }
>
> + if (fanotify_is_error_event(event->mask)) {
> + /*
> + * Replace the error event ahead of dequeueing so we
> + * don't need to handle a incorrectly dequeued event.
> + */
> + ret = fanotify_replace_fs_error_event(group, event);
> + if (ret) {
> + event = ERR_PTR(ret);
> + goto out;
> + }
> + }
> +

The replacing, retry, and all is hairy. Cannot we just keep the same event
attached to the sb mark and copy-out to on-stack buffer under
notification_lock in get_one_event()? The event is big (due to fhandle) but
fanotify_read() is not called from a deep call chain so we should have
enough space on stack for that.

> /*
> - * Held the notification_lock the whole time, so this is the
> - * same event we peeked above.
> + * Even though we might have temporarily dropped the lock, this
> + * is guaranteed to be the same event we peeked above.
> */
> fsnotify_remove_first_event(group);
> if (fanotify_is_perm_event(event->mask))
> @@ -596,6 +648,8 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
> event = get_one_event(group, count);
> if (IS_ERR(event)) {
> ret = PTR_ERR(event);
> + if (ret == -EAGAIN)
> + continue;
> break;
> }
>

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-05 14:46:51

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v5 18/23] fanotify: Handle FAN_FS_ERROR events

On Thu, Aug 5, 2021 at 3:15 PM Jan Kara <[email protected]> wrote:
>
> On Wed 04-08-21 12:06:07, Gabriel Krisman Bertazi wrote:
> > Wire up FAN_FS_ERROR in the fanotify_mark syscall. The event can only
> > be requested for the entire filesystem, thus it requires the
> > FAN_MARK_FILESYSTEM.
> >
> > FAN_FS_ERROR has to be handled slightly differently from other events
> > because it needs to be submitted in an atomic context, using
> > preallocated memory. This patch implements the submission path by only
> > storing the first error event that happened in the slot (userspace
> > resets the slot by reading the event).
> >
> > Extra error events happening when the slot is occupied are merged to the
> > original report, and the only information keep for these extra errors is
> > an accumulator counting the number of events, which is part of the
> > record reported back to userspace.
> >
> > Reporting only the first event should be fine, since when a FS error
> > happens, a cascade of error usually follows, but the most meaningful
> > information is (usually) on the first erro.
> >
> > The event dequeueing is also a bit special to avoid losing events. Since
> > event merging only happens while the event is queued, there is a window
> > between when an error event is dequeued (notification_lock is dropped)
> > until it is reset (.free_event()) where the slot is full, but no merges
> > can happen.
> >
> > The proposed solution is to replace the event in the slot with a new
> > structure, prior to dropping the lock. This way, if a new event arrives
> > in the time between the event was dequeued and the time it resets, the
> > new errors will still be logged and merged in the new slot.
> >
> > Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> The splitting of the patches really helped. Now I think I can grok much
> more details than before :) Thanks! Some comments below.
>
> > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > index 0678d35432a7..4e9e271a4394 100644
> > --- a/fs/notify/fanotify/fanotify.c
> > +++ b/fs/notify/fanotify/fanotify.c
> > @@ -681,6 +681,42 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info)
> > return fsid;
> > }
> >
> > +static int fanotify_merge_error_event(struct fsnotify_group *group,
> > + struct fsnotify_event *event)
> > +{
> > + struct fanotify_event *fae = FANOTIFY_E(event);
> > + struct fanotify_error_event *fee = FANOTIFY_EE(fae);
> > +
> > + /*
> > + * When err_count > 0, the reporting slot is full. Just account
> > + * the additional error and abort the insertion.
> > + */
> > + if (fee->err_count) {
> > + fee->err_count++;
> > + return 1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void fanotify_insert_error_event(struct fsnotify_group *group,
> > + struct fsnotify_event *event,
> > + const void *data)
> > +{
> > + const struct fs_error_report *report = (struct fs_error_report *) data;
> > + struct fanotify_event *fae = FANOTIFY_E(event);
> > + struct fanotify_error_event *fee;
> > +
> > + /* This might be an unexpected type of event (i.e. overflow). */
> > + if (!fanotify_is_error_event(fae->mask))
> > + return;
> > +
> > + fee = FANOTIFY_EE(fae);
> > + fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> > + fee->error = report->error;
> > + fee->err_count = 1;
> > +}
> > +
> > /*
> > * Add an event to hash table for faster merge.
> > */
> > @@ -735,7 +771,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> > BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
> > BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
> >
> > - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> > + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
> >
> > mask = fanotify_group_event_mask(group, iter_info, mask, data,
> > data_type, dir);
> > @@ -760,6 +796,18 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> > return 0;
> > }
> >
> > + if (fanotify_is_error_event(mask)) {
> > + struct fanotify_sb_mark *sb_mark =
> > + FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
> > +
> > + ret = fsnotify_insert_event(group,
> > + &sb_mark->fee_slot->fae.fse,
> > + fanotify_merge_error_event,
> > + fanotify_insert_error_event,
> > + data);
> > + goto finish;
> > + }
>
> Hum, seeing this and how you had to extend fsnotify_add_event() to
> accommodate this use, cannot we instead have something like:
>
> if (fanotify_is_error_event(mask)) {
> struct fanotify_sb_mark *sb_mark =
> FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
> struct fanotify_error_event *event = &sb_mark->fee_slot;
> bool queue = false;
>
> spin_lock(&group->notification_lock);
> /* Not yet queued? */
> if (!event->err_count) {
> fee->error = report->error;
> queue = true;
> }
> event->err_count++;
> spin_unlock(&group->notification_lock);
> if (queue) {
> ... fill in other error info in 'event' such as fhandle
> fsnotify_add_event(group, &event->fae.fse, NULL);
> }
> }
>
> It would be IMHO simpler to follow what's going on and we don't have to
> touch fsnotify_add_event(). I do recognize that due to races it may happen
> that some racing fsnotify(FAN_FS_ERROR) call returns before the event is
> actually visible in the event queue. It don't think it really matters but
> if we wanted to be more careful, we would need to preformat fhandle into a
> local buffer and only copy it into the event under notification_lock when
> we see the event is unused.
>
> > +/*
> > + * Replace a mark's error event with a new structure in preparation for
> > + * it to be dequeued. This is a bit annoying since we need to drop the
> > + * lock, so another thread might just steal the event from us.
> > + */
> > +static int fanotify_replace_fs_error_event(struct fsnotify_group *group,
> > + struct fanotify_event *fae)
> > +{
> > + struct fanotify_error_event *new, *fee = FANOTIFY_EE(fae);
> > + struct fanotify_sb_mark *sb_mark = fee->sb_mark;
> > + struct fsnotify_event *fse;
> > +
> > + pr_debug("%s: event=%p\n", __func__, fae);
> > +
> > + assert_spin_locked(&group->notification_lock);
> > +
> > + spin_unlock(&group->notification_lock);
> > + new = fanotify_alloc_error_event(sb_mark);
> > + spin_lock(&group->notification_lock);
> > +
> > + if (!new)
> > + return -ENOMEM;
> > +
> > + /*
> > + * Since we temporarily dropped the notification_lock, the event
> > + * might have been taken from under us and reported by another
> > + * reader. If that is the case, don't play games, just retry.
> > + */
> > + fse = fsnotify_peek_first_event(group);
> > + if (fse != &fae->fse) {
> > + kfree(new);
> > + return -EAGAIN;
> > + }
> > +
> > + sb_mark->fee_slot = new;
> > +
> > + return 0;
> > +}
> > +
> > /*
> > * Get an fanotify notification event if one exists and is small
> > * enough to fit in "count". Return an error pointer if the count
> > @@ -212,9 +252,21 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> > goto out;
> > }
> >
> > + if (fanotify_is_error_event(event->mask)) {
> > + /*
> > + * Replace the error event ahead of dequeueing so we
> > + * don't need to handle a incorrectly dequeued event.
> > + */
> > + ret = fanotify_replace_fs_error_event(group, event);
> > + if (ret) {
> > + event = ERR_PTR(ret);
> > + goto out;
> > + }
> > + }
> > +
>
> The replacing, retry, and all is hairy. Cannot we just keep the same event
> attached to the sb mark and copy-out to on-stack buffer under
> notification_lock in get_one_event()? The event is big (due to fhandle) but
> fanotify_read() is not called from a deep call chain so we should have
> enough space on stack for that.
>

For the record, this was one of the first implementations from Gabriel.
When I proposed the double buffer implementation it was either that
or go back to copy to stack.

Given the complications, I agree that going back to copy to stack
is preferred.

Thanks,
Amir.

2021-08-05 14:49:10

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v5 10/23] fsnotify: Allow events reported with an empty inode

On Thu, Aug 5, 2021 at 1:24 PM Jan Kara <[email protected]> wrote:
>
> On Wed 04-08-21 12:05:59, Gabriel Krisman Bertazi wrote:
> > Some file system events (i.e. FS_ERROR) might not be associated with an
> > inode. For these, it makes sense to associate them directly with the
> > super block of the file system they apply to. This patch allows the
> > event to be reported directly against the super block instead of an
> > inode.
> >
> > Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> There's a comment before fsnotify() declaration that states that 'either
> @dir or @inode must be non-NULL' - in this patch it would be good time to
> update that comment.
>
> > @@ -459,12 +460,13 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
> > * if both are non-NULL event may be reported to both.
> > * @cookie: inotify rename cookie
> > */
> > -int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
> > - const struct qstr *file_name, struct inode *inode, u32 cookie)
> > +int fsnotify(__u32 mask, const void *data, int data_type,
> > + struct super_block *sb, struct inode *dir,
> > + const struct qstr *file_name, struct inode *inode,
> > + u32 cookie)
> > {
>
> Two notes as ideas for consideration:
>
> 1) We could derive 'sb' from 'data'. I.e., have a helper like
> fsnotify_data_sb(data, data_type). For FSNOTIFY_EVENT_PATH and
> FSNOTIFY_EVENT_INODE this is easy to provide, for FSNOTIFY_EVENT_ERROR we
> would have to add sb pointer to the structure but I guess that's easy. That
> way we'd avoid the mostly NULL 'sb' argument. What do you guys think?

I think that's a great and simple idea that escaped me.

>
> 2) AFAICS 'inode' can be always derived from 'data' as well. So maybe we
> can drop it Amir?

If only we could. The reason that we pass the allegedly redundant inode
argument is because there are two different distinguished inode
arguments:

1. The inode event happened on, which can be referenced from data
2. Inode that may be marked, which is passed in the inode argument

Particularly, dirent events carry the inode of the child as data, but
intentionally pass NULL inode arguments, because mark on inode
itself should not be getting e.g. FAN_DELETE event, but
audit_mark_handle_event() uses the child inode data.

If we wanted to, we could pass report_mask arg to fsnotify()
instead of inode arg and then fsnotify() will build iter_info
accordingly, but that sounds very complicated and doesn't gain
much.

There could be a simpler solution that I am missing...

Thanks,
Amir.

2021-08-05 15:57:43

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 10/23] fsnotify: Allow events reported with an empty inode

On Thu 05-08-21 17:14:26, Amir Goldstein wrote:
> > 2) AFAICS 'inode' can be always derived from 'data' as well. So maybe we
> > can drop it Amir?
>
> If only we could. The reason that we pass the allegedly redundant inode
> argument is because there are two different distinguished inode
> arguments:
>
> 1. The inode event happened on, which can be referenced from data
> 2. Inode that may be marked, which is passed in the inode argument
>
> Particularly, dirent events carry the inode of the child as data, but
> intentionally pass NULL inode arguments, because mark on inode
> itself should not be getting e.g. FAN_DELETE event, but
> audit_mark_handle_event() uses the child inode data.

I see, thanks for explanation. I forgot that NULL 'inode' argument from
fsnotify_name() is actually needed for this to work.

> If we wanted to, we could pass report_mask arg to fsnotify()
> instead of inode arg and then fsnotify() will build iter_info
> accordingly, but that sounds very complicated and doesn't gain
> much.

Yeah. I'll think a bit more if we could simplify this but now I don't see
anything obvious.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-10 04:37:26

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v5 18/23] fanotify: Handle FAN_FS_ERROR events

Jan Kara <[email protected]> writes:
>> @@ -760,6 +796,18 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>> return 0;
>> }
>>
>> + if (fanotify_is_error_event(mask)) {
>> + struct fanotify_sb_mark *sb_mark =
>> + FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
>> +
>> + ret = fsnotify_insert_event(group,
>> + &sb_mark->fee_slot->fae.fse,
>> + fanotify_merge_error_event,
>> + fanotify_insert_error_event,
>> + data);
>> + goto finish;
>> + }
>
> Hum, seeing this and how you had to extend fsnotify_add_event() to
> accommodate this use, cannot we instead have something like:
>
> if (fanotify_is_error_event(mask)) {
> struct fanotify_sb_mark *sb_mark =
> FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
> struct fanotify_error_event *event = &sb_mark->fee_slot;
> bool queue = false;
>
> spin_lock(&group->notification_lock);
> /* Not yet queued? */
> if (!event->err_count) {
> fee->error = report->error;
> queue = true;
> }
> event->err_count++;
> spin_unlock(&group->notification_lock);
> if (queue) {
> ... fill in other error info in 'event' such as fhandle
> fsnotify_add_event(group, &event->fae.fse, NULL);
> }
> }
>
> It would be IMHO simpler to follow what's going on and we don't have to
> touch fsnotify_add_event(). I do recognize that due to races it may happen
> that some racing fsnotify(FAN_FS_ERROR) call returns before the event is
> actually visible in the event queue. It don't think it really matters but
> if we wanted to be more careful, we would need to preformat fhandle into a
> local buffer and only copy it into the event under notification_lock when
> we see the event is unused.

Hi Jan,

This is actually similar to my first implementation too (like what
Amir said about the hunk below). It is a shame, cause I really like
the current version better, but the point about not doing the FH
encoding under the notification_lock makes a lot of sense. I will
revert to the previous approach.

>> +/*
>> + * Replace a mark's error event with a new structure in preparation for
>> + * it to be dequeued. This is a bit annoying since we need to drop the
>> + * lock, so another thread might just steal the event from us.
>> + */
>> +static int fanotify_replace_fs_error_event(struct fsnotify_group *group,
>> + struct fanotify_event *fae)
>> +{
>> + struct fanotify_error_event *new, *fee = FANOTIFY_EE(fae);
>> + struct fanotify_sb_mark *sb_mark = fee->sb_mark;
>> + struct fsnotify_event *fse;
>> +
>> + pr_debug("%s: event=%p\n", __func__, fae);
>> +
>> + assert_spin_locked(&group->notification_lock);
>> +
>> + spin_unlock(&group->notification_lock);
>> + new = fanotify_alloc_error_event(sb_mark);
>> + spin_lock(&group->notification_lock);
>> +
>> + if (!new)
>> + return -ENOMEM;
>> +
>> + /*
>> + * Since we temporarily dropped the notification_lock, the event
>> + * might have been taken from under us and reported by another
>> + * reader. If that is the case, don't play games, just retry.
>> + */
>> + fse = fsnotify_peek_first_event(group);
>> + if (fse != &fae->fse) {
>> + kfree(new);
>> + return -EAGAIN;
>> + }
>> +
>> + sb_mark->fee_slot = new;
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * Get an fanotify notification event if one exists and is small
>> * enough to fit in "count". Return an error pointer if the count
>> @@ -212,9 +252,21 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
>> goto out;
>> }
>>
>> + if (fanotify_is_error_event(event->mask)) {
>> + /*
>> + * Replace the error event ahead of dequeueing so we
>> + * don't need to handle a incorrectly dequeued event.
>> + */
>> + ret = fanotify_replace_fs_error_event(group, event);
>> + if (ret) {
>> + event = ERR_PTR(ret);
>> + goto out;
>> + }
>> + }
>> +
> The replacing, retry, and all is hairy. Cannot we just keep the same event
> attached to the sb mark and copy-out to on-stack buffer under
> notification_lock in get_one_event()? The event is big (due to fhandle) but
> fanotify_read() is not called from a deep call chain so we should have
> enough space on stack for that.



--
Gabriel Krisman Bertazi

2021-08-11 21:12:42

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

Jan Kara <[email protected]> writes:

> On Wed 04-08-21 12:06:03, Gabriel Krisman Bertazi wrote:
>> Instead of failing, encode an invalid file handler in fanotify_encode_fh
>> if no inode is provided. This bogus file handler will be reported by
>> FAN_FS_ERROR for non-inode errors.
>>
>> Also adjust the single caller that might rely on failure after passing
>> an empty inode.
>
> It is not 'file handler' but rather 'file handle' - several times in the
> changelog and in subject :).
>
>> Suggested-by: Amir Goldstein <[email protected]>
>> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>> ---
>> fs/notify/fanotify/fanotify.c | 39 ++++++++++++++++++++---------------
>> fs/notify/fanotify/fanotify.h | 6 ++++--
>> 2 files changed, 26 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
>> index 0d6ba218bc01..456c60107d88 100644
>> --- a/fs/notify/fanotify/fanotify.c
>> +++ b/fs/notify/fanotify/fanotify.c
>> @@ -349,12 +349,6 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
>> void *buf = fh->buf;
>> int err;
>>
>> - fh->type = FILEID_ROOT;
>> - fh->len = 0;
>> - fh->flags = 0;
>> - if (!inode)
>> - return 0;
>> -
>
> I'd keep the fh->flags initialization here. Otherwise it will not be
> initialized on some error returns.
>
>> @@ -363,8 +357,9 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
>> if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4))
>> goto out_err;
>>
>> - /* No external buffer in a variable size allocated fh */
>> - if (gfp && fh_len > FANOTIFY_INLINE_FH_LEN) {
>> + fh->flags = 0;
>> + /* No external buffer in a variable size allocated fh or null fh */
>> + if (inode && gfp && fh_len > FANOTIFY_INLINE_FH_LEN) {
>> /* Treat failure to allocate fh as failure to encode fh */
>> err = -ENOMEM;
>> ext_buf = kmalloc(fh_len, gfp);
>> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
>> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
>> }
>>
>> - dwords = fh_len >> 2;
>> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
>> - err = -EINVAL;
>> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
>> - goto out_err;
>> -
>> - fh->type = type;
>> - fh->len = fh_len;
>> + if (inode) {
>> + dwords = fh_len >> 2;
>> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
>> + err = -EINVAL;
>> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
>> + goto out_err;
>> + fh->type = type;
>> + fh->len = fh_len;
>> + } else {
>> + /*
>> + * Invalid FHs are used on FAN_FS_ERROR for errors not
>> + * linked to any inode. Caller needs to guarantee the fh
>> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
>> + */
>> + fh->type = FILEID_INVALID;
>> + fh->len = FANOTIFY_NULL_FH_LEN;
>> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
>> + }
>
> Maybe it will become clearer later during the series but why do you set
> fh->len to FANOTIFY_NULL_FH_LEN and not 0?

Jan,

That is how we encode a NULL file handle (i.e. superblock error). Amir
suggested it would be an invalid FILEID_INVALID, with a zeroed handle of
size 8. I will improve the comment on the next iteration.

--
Gabriel Krisman Bertazi

2021-08-12 15:13:44

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

On Wed 11-08-21 17:12:05, Gabriel Krisman Bertazi wrote:
> Jan Kara <[email protected]> writes:
> >> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> >> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
> >> }
> >>
> >> - dwords = fh_len >> 2;
> >> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> >> - err = -EINVAL;
> >> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> >> - goto out_err;
> >> -
> >> - fh->type = type;
> >> - fh->len = fh_len;
> >> + if (inode) {
> >> + dwords = fh_len >> 2;
> >> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> >> + err = -EINVAL;
> >> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> >> + goto out_err;
> >> + fh->type = type;
> >> + fh->len = fh_len;
> >> + } else {
> >> + /*
> >> + * Invalid FHs are used on FAN_FS_ERROR for errors not
> >> + * linked to any inode. Caller needs to guarantee the fh
> >> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
> >> + */
> >> + fh->type = FILEID_INVALID;
> >> + fh->len = FANOTIFY_NULL_FH_LEN;
> >> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
> >> + }
> >
> > Maybe it will become clearer later during the series but why do you set
> > fh->len to FANOTIFY_NULL_FH_LEN and not 0?
>
> Jan,
>
> That is how we encode a NULL file handle (i.e. superblock error). Amir
> suggested it would be an invalid FILEID_INVALID, with a zeroed handle of
> size 8. I will improve the comment on the next iteration.

Thanks for info. Then I have a question for Amir I guess :) Amir, what's
the advantage of zeroed handle of size 8 instead of just 0 length file
handle?

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-12 15:30:49

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

Jan Kara <[email protected]> writes:

> On Wed 11-08-21 17:12:05, Gabriel Krisman Bertazi wrote:
>> Jan Kara <[email protected]> writes:
>> >> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
>> >> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
>> >> }
>> >>
>> >> - dwords = fh_len >> 2;
>> >> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
>> >> - err = -EINVAL;
>> >> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
>> >> - goto out_err;
>> >> -
>> >> - fh->type = type;
>> >> - fh->len = fh_len;
>> >> + if (inode) {
>> >> + dwords = fh_len >> 2;
>> >> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
>> >> + err = -EINVAL;
>> >> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
>> >> + goto out_err;
>> >> + fh->type = type;
>> >> + fh->len = fh_len;
>> >> + } else {
>> >> + /*
>> >> + * Invalid FHs are used on FAN_FS_ERROR for errors not
>> >> + * linked to any inode. Caller needs to guarantee the fh
>> >> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
>> >> + */
>> >> + fh->type = FILEID_INVALID;
>> >> + fh->len = FANOTIFY_NULL_FH_LEN;
>> >> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
>> >> + }
>> >
>> > Maybe it will become clearer later during the series but why do you set
>> > fh->len to FANOTIFY_NULL_FH_LEN and not 0?
>>
>> Jan,
>>
>> That is how we encode a NULL file handle (i.e. superblock error). Amir
>> suggested it would be an invalid FILEID_INVALID, with a zeroed handle of
>> size 8. I will improve the comment on the next iteration.
>
> Thanks for info. Then I have a question for Amir I guess :) Amir, what's
> the advantage of zeroed handle of size 8 instead of just 0 length file
> handle?

Jan,

Looking back at the email from Amir, I realize I misunderstood his
original suggestion. Amir suggested it be FILEID_INVALID with 0-len OR
FILEID_INO32_GEN with zeroed fields. I mixed the two suggestions.

The advantage of doing FILEID_INO32_GEN with zeroed field is to avoid
special casing the test program. But I don't have a good reason to use
FILEID_INVALID with a len > 0.

I'm sending a v6 with everything, including this, addressed. testcase
and man pages will be updated as well.

--
Gabriel Krisman Bertazi

2021-08-12 15:31:50

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

On Thu, Aug 12, 2021 at 5:20 PM Jan Kara <[email protected]> wrote:
>
> On Wed 11-08-21 17:12:05, Gabriel Krisman Bertazi wrote:
> > Jan Kara <[email protected]> writes:
> > >> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> > >> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
> > >> }
> > >>
> > >> - dwords = fh_len >> 2;
> > >> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> > >> - err = -EINVAL;
> > >> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> > >> - goto out_err;
> > >> -
> > >> - fh->type = type;
> > >> - fh->len = fh_len;
> > >> + if (inode) {
> > >> + dwords = fh_len >> 2;
> > >> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> > >> + err = -EINVAL;
> > >> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> > >> + goto out_err;
> > >> + fh->type = type;
> > >> + fh->len = fh_len;
> > >> + } else {
> > >> + /*
> > >> + * Invalid FHs are used on FAN_FS_ERROR for errors not
> > >> + * linked to any inode. Caller needs to guarantee the fh
> > >> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
> > >> + */
> > >> + fh->type = FILEID_INVALID;
> > >> + fh->len = FANOTIFY_NULL_FH_LEN;
> > >> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
> > >> + }
> > >
> > > Maybe it will become clearer later during the series but why do you set
> > > fh->len to FANOTIFY_NULL_FH_LEN and not 0?
> >
> > Jan,
> >
> > That is how we encode a NULL file handle (i.e. superblock error). Amir
> > suggested it would be an invalid FILEID_INVALID, with a zeroed handle of
> > size 8. I will improve the comment on the next iteration.
>
> Thanks for info. Then I have a question for Amir I guess :) Amir, what's
> the advantage of zeroed handle of size 8 instead of just 0 length file
> handle?

With current code, zero fh->len means we are not reporting an FID info
record (e.g. due to encode error), see copy_info_records_to_user().

This is because fh->len plays a dual role for indicating the length of the
file handle and the existence of FID info.

I figured that keeping a positive length for the special NULL_FH is an
easy way to workaround this ambiguity and keep the code simpler.
We don't really need to pay any cost for keeping the 8 bytes zero buffer.

Thanks,
Amir.

2021-08-12 15:52:05

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

On Thu, Aug 12, 2021 at 6:14 PM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Jan Kara <[email protected]> writes:
>
> > On Wed 11-08-21 17:12:05, Gabriel Krisman Bertazi wrote:
> >> Jan Kara <[email protected]> writes:
> >> >> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> >> >> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
> >> >> }
> >> >>
> >> >> - dwords = fh_len >> 2;
> >> >> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> >> >> - err = -EINVAL;
> >> >> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> >> >> - goto out_err;
> >> >> -
> >> >> - fh->type = type;
> >> >> - fh->len = fh_len;
> >> >> + if (inode) {
> >> >> + dwords = fh_len >> 2;
> >> >> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> >> >> + err = -EINVAL;
> >> >> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> >> >> + goto out_err;
> >> >> + fh->type = type;
> >> >> + fh->len = fh_len;
> >> >> + } else {
> >> >> + /*
> >> >> + * Invalid FHs are used on FAN_FS_ERROR for errors not
> >> >> + * linked to any inode. Caller needs to guarantee the fh
> >> >> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
> >> >> + */
> >> >> + fh->type = FILEID_INVALID;
> >> >> + fh->len = FANOTIFY_NULL_FH_LEN;
> >> >> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
> >> >> + }
> >> >
> >> > Maybe it will become clearer later during the series but why do you set
> >> > fh->len to FANOTIFY_NULL_FH_LEN and not 0?
> >>
> >> Jan,
> >>
> >> That is how we encode a NULL file handle (i.e. superblock error). Amir
> >> suggested it would be an invalid FILEID_INVALID, with a zeroed handle of
> >> size 8. I will improve the comment on the next iteration.
> >
> > Thanks for info. Then I have a question for Amir I guess :) Amir, what's
> > the advantage of zeroed handle of size 8 instead of just 0 length file
> > handle?
>
> Jan,
>
> Looking back at the email from Amir, I realize I misunderstood his
> original suggestion. Amir suggested it be FILEID_INVALID with 0-len OR
> FILEID_INO32_GEN with zeroed fields. I mixed the two suggestions.
>

That was from a discussion about UAPI and agree it doesn't make much sense
to report non-zero handle_bytes and invalid handle_type.

But specifically, Jan's question above was directly referring to the internal
representation of the event.

Since you are going to allocate a file handle buffer of size MAX_HANDLE_SZ
(right?), then there is no caveat in declaring the NULL_FH with positive size
internally, which I think, simplifies the implementation.
NULL_FH internal length could be 4 instead of 8 though.

You will not have to special case fanotify_fid_info_len() and the info record
hdr.len will include a 4 bytes zero padding after the fid info record
with NULL_FH.

You will only special case this line in copy_fid_info_to_user():
handle.handle_type = fh->type;
if (fh->type != FILEID_INVALID)
handle.handle_bytes = fh_len;

Thanks,
Amir.

2021-08-13 12:13:20

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

On Thu 12-08-21 18:17:10, Amir Goldstein wrote:
> On Thu, Aug 12, 2021 at 5:20 PM Jan Kara <[email protected]> wrote:
> >
> > On Wed 11-08-21 17:12:05, Gabriel Krisman Bertazi wrote:
> > > Jan Kara <[email protected]> writes:
> > > >> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> > > >> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
> > > >> }
> > > >>
> > > >> - dwords = fh_len >> 2;
> > > >> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> > > >> - err = -EINVAL;
> > > >> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> > > >> - goto out_err;
> > > >> -
> > > >> - fh->type = type;
> > > >> - fh->len = fh_len;
> > > >> + if (inode) {
> > > >> + dwords = fh_len >> 2;
> > > >> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> > > >> + err = -EINVAL;
> > > >> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> > > >> + goto out_err;
> > > >> + fh->type = type;
> > > >> + fh->len = fh_len;
> > > >> + } else {
> > > >> + /*
> > > >> + * Invalid FHs are used on FAN_FS_ERROR for errors not
> > > >> + * linked to any inode. Caller needs to guarantee the fh
> > > >> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
> > > >> + */
> > > >> + fh->type = FILEID_INVALID;
> > > >> + fh->len = FANOTIFY_NULL_FH_LEN;
> > > >> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
> > > >> + }
> > > >
> > > > Maybe it will become clearer later during the series but why do you set
> > > > fh->len to FANOTIFY_NULL_FH_LEN and not 0?
> > >
> > > Jan,
> > >
> > > That is how we encode a NULL file handle (i.e. superblock error). Amir
> > > suggested it would be an invalid FILEID_INVALID, with a zeroed handle of
> > > size 8. I will improve the comment on the next iteration.
> >
> > Thanks for info. Then I have a question for Amir I guess :) Amir, what's
> > the advantage of zeroed handle of size 8 instead of just 0 length file
> > handle?
>
> With current code, zero fh->len means we are not reporting an FID info
> record (e.g. due to encode error), see copy_info_records_to_user().
>
> This is because fh->len plays a dual role for indicating the length of the
> file handle and the existence of FID info.

I see, thanks for info.

> I figured that keeping a positive length for the special NULL_FH is an
> easy way to workaround this ambiguity and keep the code simpler.
> We don't really need to pay any cost for keeping the 8 bytes zero buffer.

There are two separate questions:
1) How do we internally propagate the information that we don't have
file_handle to report but we do want fsid reported.
2) What do we report to userspace in file_handle.

For 2) I think we should report fsid + FILEID_INVALID, 0-length filehandle.
Currently the non-zero lenght FILEID_INVALID filehandle was propagating to
userspace and IMO that's confusing. For 1), whatever is the simplest to
propagate the information "we want only fsid reported" internally is fine
by me.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-13 17:26:44

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v5 14/23] fanotify: Encode invalid file handler when no inode is provided

On Fri, Aug 13, 2021 at 3:09 PM Jan Kara <[email protected]> wrote:
>
> On Thu 12-08-21 18:17:10, Amir Goldstein wrote:
> > On Thu, Aug 12, 2021 at 5:20 PM Jan Kara <[email protected]> wrote:
> > >
> > > On Wed 11-08-21 17:12:05, Gabriel Krisman Bertazi wrote:
> > > > Jan Kara <[email protected]> writes:
> > > > >> @@ -376,14 +371,24 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
> > > > >> fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF;
> > > > >> }
> > > > >>
> > > > >> - dwords = fh_len >> 2;
> > > > >> - type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> > > > >> - err = -EINVAL;
> > > > >> - if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> > > > >> - goto out_err;
> > > > >> -
> > > > >> - fh->type = type;
> > > > >> - fh->len = fh_len;
> > > > >> + if (inode) {
> > > > >> + dwords = fh_len >> 2;
> > > > >> + type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
> > > > >> + err = -EINVAL;
> > > > >> + if (!type || type == FILEID_INVALID || fh_len != dwords << 2)
> > > > >> + goto out_err;
> > > > >> + fh->type = type;
> > > > >> + fh->len = fh_len;
> > > > >> + } else {
> > > > >> + /*
> > > > >> + * Invalid FHs are used on FAN_FS_ERROR for errors not
> > > > >> + * linked to any inode. Caller needs to guarantee the fh
> > > > >> + * has at least FANOTIFY_NULL_FH_LEN bytes of space.
> > > > >> + */
> > > > >> + fh->type = FILEID_INVALID;
> > > > >> + fh->len = FANOTIFY_NULL_FH_LEN;
> > > > >> + memset(buf, 0, FANOTIFY_NULL_FH_LEN);
> > > > >> + }
> > > > >
> > > > > Maybe it will become clearer later during the series but why do you set
> > > > > fh->len to FANOTIFY_NULL_FH_LEN and not 0?
> > > >
> > > > Jan,
> > > >
> > > > That is how we encode a NULL file handle (i.e. superblock error). Amir
> > > > suggested it would be an invalid FILEID_INVALID, with a zeroed handle of
> > > > size 8. I will improve the comment on the next iteration.
> > >
> > > Thanks for info. Then I have a question for Amir I guess :) Amir, what's
> > > the advantage of zeroed handle of size 8 instead of just 0 length file
> > > handle?
> >
> > With current code, zero fh->len means we are not reporting an FID info
> > record (e.g. due to encode error), see copy_info_records_to_user().
> >
> > This is because fh->len plays a dual role for indicating the length of the
> > file handle and the existence of FID info.
>
> I see, thanks for info.
>
> > I figured that keeping a positive length for the special NULL_FH is an
> > easy way to workaround this ambiguity and keep the code simpler.
> > We don't really need to pay any cost for keeping the 8 bytes zero buffer.
>
> There are two separate questions:
> 1) How do we internally propagate the information that we don't have
> file_handle to report but we do want fsid reported.
> 2) What do we report to userspace in file_handle.
>
> For 2) I think we should report fsid + FILEID_INVALID, 0-length filehandle.
> Currently the non-zero lenght FILEID_INVALID filehandle was propagating to
> userspace and IMO that's confusing.

Agree. That was implemented in v6.

> For 1), whatever is the simplest to
> propagate the information "we want only fsid reported" internally is fine
> by me.
>

Ok. I think it would be fair (based on v6 patches) to call the fh->len = 4
option "simple".

But following my "simple" suggestion as is, v6 has a side effect of
adding 4 bytes of zero padding after the event fid info record.

Can you please confirm that this side effect is fine by you as well.
It wouldn't be too hard to special case FILEID_INVALID and also
truncate those 4 bytes of zero padding, but I really don't think it is
worth the effort.

Thanks,
Amir.