2021-10-25 20:10:45

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 00/31] file system-wide error monitoring

Hi,

This is the 9th version of this patch series. Thank you, Amir, Jan and
Ted, for the feedback in the previous versions.

The main difference in this version is that the pool is no longer
resizeable nor limited in number of marks, even though we only
pre-allocate 32 slots. In addition, ext4 was modified to always return
non-zero errno, and the documentation was fixed accordingly (No longer
suggests we return EXT4_ERR* values.

I also droped the Reviewed-by tags from the ext4 patch, due to the
changes above.

Please let me know what you think.

This was tested with LTP for regressions and also using the sample code
on the last patch, with a corrupted image. I wrote a new ltp test for
this feature which is being reviewed and is available at:

https://gitlab.collabora.com/krisman/ltp -b fan-fs-error

In addition, I wrote a man-page that can be pulled from:

https://gitlab.collabora.com/krisman/man-pages.git -b fan-fs-error

And is being reviewed at the list.

I also pushed this full series to:

https://gitlab.collabora.com/krisman/linux -b fanotify-notifications-v8

Thank you

Cc: Darrick J. Wong <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: [email protected]
To: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Amir Goldstein (3):
fsnotify: pass data_type to fsnotify_name()
fsnotify: pass dentry instead of inode data
fsnotify: clarify contract for create event hooks

Gabriel Krisman Bertazi (28):
fsnotify: Don't insert unmergeable events in hashtable
fanotify: Fold event size calculation to its own function
fanotify: Split fsid check from other fid mode checks
inotify: Don't force FS_IN_IGNORED
fsnotify: Add helper to detect overflow_event
fsnotify: Add wrapper around fsnotify_add_event
fsnotify: Retrieve super block from the data field
fsnotify: Protect fsnotify_handle_inode_event from no-inode events
fsnotify: Pass group argument to free_event
fanotify: Support null inode event in fanotify_dfid_inode
fanotify: Allow file handle encoding for unhashed events
fanotify: Encode empty file handle when no inode is provided
fanotify: Require fid_mode for any non-fd event
fsnotify: Support FS_ERROR event type
fanotify: Reserve UAPI bits for FAN_FS_ERROR
fanotify: Pre-allocate pool of error events
fanotify: Support enqueueing of error events
fanotify: Support merging of error events
fanotify: Wrap object_fh inline space in a creator macro
fanotify: Add helpers to decide whether to report FID/DFID
fanotify: Report fid entry even for zero-length file_handle
fanotify: WARN_ON against too large file handles
fanotify: Report fid info for file related file system errors
fanotify: Emit generic error info for error event
fanotify: Allow users to request FAN_FS_ERROR events
ext4: Send notifications on error
samples: Add fs error monitoring example
docs: Document the FAN_FS_ERROR event

.../admin-guide/filesystem-monitoring.rst | 74 +++++++++
Documentation/admin-guide/index.rst | 1 +
fs/ext4/super.c | 8 +
fs/nfsd/filecache.c | 3 +
fs/notify/fanotify/fanotify.c | 117 +++++++++++--
fs/notify/fanotify/fanotify.h | 54 +++++-
fs/notify/fanotify/fanotify_user.c | 156 +++++++++++++-----
fs/notify/fsnotify.c | 10 +-
fs/notify/group.c | 2 +-
fs/notify/inotify/inotify_fsnotify.c | 5 +-
fs/notify/inotify/inotify_user.c | 6 +-
fs/notify/notification.c | 14 +-
include/linux/fanotify.h | 9 +-
include/linux/fsnotify.h | 58 +++++--
include/linux/fsnotify_backend.h | 96 ++++++++++-
include/uapi/linux/fanotify.h | 8 +
kernel/audit_fsnotify.c | 3 +-
kernel/audit_watch.c | 3 +-
samples/Kconfig | 9 +
samples/Makefile | 1 +
samples/fanotify/Makefile | 5 +
samples/fanotify/fs-monitor.c | 142 ++++++++++++++++
22 files changed, 685 insertions(+), 99 deletions(-)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
create mode 100644 samples/fanotify/Makefile
create mode 100644 samples/fanotify/fs-monitor.c

--
2.33.0


2021-10-25 20:11:13

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 17/31] fsnotify: Support FS_ERROR event type

Expose a new type of fsnotify event for filesystems to report errors for
userspace monitoring tools. fanotify will send this type of
notification for FAN_FS_ERROR events. This also introduce a helper for
generating the new event.

Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v6:
- Add fsnotify_data_error_report

Changes since v5:
- pass sb inside data field (jan)
Changes since v3:
- Squash patch ("fsnotify: Introduce helpers to send error_events")
- Drop reviewed-bys!

Changes since v2:
- FAN_ERROR->FAN_FS_ERROR (Amir)

Changes since v1:
- Overload FS_ERROR with FS_IN_IGNORED
- Implement support for this type on fsnotify_data_inode (Amir)
---
include/linux/fsnotify.h | 13 +++++++++++++
include/linux/fsnotify_backend.h | 32 +++++++++++++++++++++++++++++++-
2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 1e5f7435a4b5..787545e87eeb 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -339,4 +339,17 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
fsnotify_dentry(dentry, mask);
}

+static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode,
+ int error)
+{
+ struct fs_error_report report = {
+ .error = error,
+ .inode = inode,
+ .sb = sb,
+ };
+
+ return fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR,
+ NULL, NULL, NULL, 0);
+}
+
#endif /* _LINUX_FS_NOTIFY_H */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 3a7c31436182..00dbaafbcf95 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -42,6 +42,12 @@

#define FS_UNMOUNT 0x00002000 /* inode on umount fs */
#define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */
+#define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */
+
+/*
+ * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify
+ * which does not support FS_ERROR.
+ */
#define FS_IN_IGNORED 0x00008000 /* last inotify event here */

#define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */
@@ -95,7 +101,8 @@
#define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
FS_EVENTS_POSS_ON_CHILD | \
FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \
- FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED)
+ FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \
+ FS_ERROR)

/* Extra flags that may be reported with event or control handling of events */
#define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \
@@ -250,6 +257,13 @@ enum fsnotify_data_type {
FSNOTIFY_EVENT_PATH,
FSNOTIFY_EVENT_INODE,
FSNOTIFY_EVENT_DENTRY,
+ FSNOTIFY_EVENT_ERROR,
+};
+
+struct fs_error_report {
+ int error;
+ struct inode *inode;
+ struct super_block *sb;
};

static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
@@ -261,6 +275,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
return d_inode(data);
case FSNOTIFY_EVENT_PATH:
return d_inode(((const struct path *)data)->dentry);
+ case FSNOTIFY_EVENT_ERROR:
+ return ((struct fs_error_report *)data)->inode;
default:
return NULL;
}
@@ -300,6 +316,20 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
return ((struct dentry *)data)->d_sb;
case FSNOTIFY_EVENT_PATH:
return ((const struct path *)data)->dentry->d_sb;
+ case FSNOTIFY_EVENT_ERROR:
+ return ((struct fs_error_report *) data)->sb;
+ default:
+ return NULL;
+ }
+}
+
+static inline struct fs_error_report *fsnotify_data_error_report(
+ const void *data,
+ int data_type)
+{
+ switch (data_type) {
+ case FSNOTIFY_EVENT_ERROR:
+ return (struct fs_error_report *) data;
default:
return NULL;
}
--
2.33.0

2021-10-25 20:11:13

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 20/31] fanotify: Support enqueueing of error events

Once an error event is triggered, enqueue it in the notification group,
similarly to what is done for other events. FAN_FS_ERROR is not
handled specially, since the memory is now handled by a preallocated
mempool.

For now, make the event unhashed. A future patch implements merging of
this kind of event.

Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v7:
- WARN_ON -> WARN_ON_ONCE (Amir)
---
fs/notify/fanotify/fanotify.c | 35 +++++++++++++++++++++++++++++++++++
fs/notify/fanotify/fanotify.h | 6 ++++++
2 files changed, 41 insertions(+)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 01d68dfc74aa..1f195c95dfcd 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -574,6 +574,27 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
return &fne->fae;
}

+static struct fanotify_event *fanotify_alloc_error_event(
+ struct fsnotify_group *group,
+ __kernel_fsid_t *fsid,
+ const void *data, int data_type)
+{
+ struct fs_error_report *report =
+ fsnotify_data_error_report(data, data_type);
+ struct fanotify_error_event *fee;
+
+ if (WARN_ON_ONCE(!report))
+ return NULL;
+
+ fee = mempool_alloc(&group->fanotify_data.error_events_pool, GFP_NOFS);
+ if (!fee)
+ return NULL;
+
+ fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
+
+ return &fee->fae;
+}
+
static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
u32 mask, const void *data,
int data_type, struct inode *dir,
@@ -641,6 +662,9 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,

if (fanotify_is_perm_event(mask)) {
event = fanotify_alloc_perm_event(path, gfp);
+ } else if (fanotify_is_error_event(mask)) {
+ event = fanotify_alloc_error_event(group, fsid, data,
+ data_type);
} else if (name_event && (file_name || child)) {
event = fanotify_alloc_name_event(id, fsid, file_name, child,
&hash, gfp);
@@ -850,6 +874,14 @@ static void fanotify_free_name_event(struct fanotify_event *event)
kfree(FANOTIFY_NE(event));
}

+static void fanotify_free_error_event(struct fsnotify_group *group,
+ struct fanotify_event *event)
+{
+ struct fanotify_error_event *fee = FANOTIFY_EE(event);
+
+ mempool_free(fee, &group->fanotify_data.error_events_pool);
+}
+
static void fanotify_free_event(struct fsnotify_group *group,
struct fsnotify_event *fsn_event)
{
@@ -873,6 +905,9 @@ static void fanotify_free_event(struct fsnotify_group *group,
case FANOTIFY_EVENT_TYPE_OVERFLOW:
kfree(event);
break;
+ case FANOTIFY_EVENT_TYPE_FS_ERROR:
+ fanotify_free_error_event(group, event);
+ break;
default:
WARN_ON_ONCE(1);
}
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index a577e87fac2b..ebef952481fa 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -298,6 +298,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
return container_of(fse, struct fanotify_event, fse);
}

+static inline bool fanotify_is_error_event(u32 mask)
+{
+ return mask & FAN_FS_ERROR;
+}
+
static inline bool fanotify_event_has_path(struct fanotify_event *event)
{
return event->type == FANOTIFY_EVENT_TYPE_PATH ||
@@ -327,6 +332,7 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
static inline bool fanotify_is_hashed_event(u32 mask)
{
return !(fanotify_is_perm_event(mask) ||
+ fanotify_is_error_event(mask) ||
fsnotify_is_overflow_event(mask));
}

--
2.33.0

2021-10-25 20:11:26

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 24/31] fanotify: Report fid entry even for zero-length file_handle

Non-inode errors will reported with an empty file_handle. In
preparation for that, allow some events to print the FID record even if
there isn't any file_handle encoded

Even though FILEID_ROOT is used internally, make zero-length file
handles be reported as FILEID_INVALID.

Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v8:
- Move fanotify_event_has_object_fh check here (jan)
---
fs/notify/fanotify/fanotify.h | 3 +++
fs/notify/fanotify/fanotify_user.c | 8 +++++---
2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 80af269eebb8..f51ab6e556e8 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -266,6 +266,9 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event)

static inline bool fanotify_event_has_object_fh(struct fanotify_event *event)
{
+ /* For error events, even zeroed fh are reported. */
+ if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR)
+ return true;
return fanotify_event_object_fh_len(event) > 0;
}

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index a9b5c36ee49e..ff4a7373f7a5 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -339,9 +339,6 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
pr_debug("%s: fh_len=%zu name_len=%zu, info_len=%zu, count=%zu\n",
__func__, fh_len, name_len, info_len, count);

- if (!fh_len)
- return 0;
-
if (WARN_ON_ONCE(len < sizeof(info) || len > count))
return -EFAULT;

@@ -376,6 +373,11 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,

handle.handle_type = fh->type;
handle.handle_bytes = fh_len;
+
+ /* Mangle handle_type for bad file_handle */
+ if (!fh_len)
+ handle.handle_type = FILEID_INVALID;
+
if (copy_to_user(buf, &handle, sizeof(handle)))
return -EFAULT;

--
2.33.0

2021-10-25 20:11:30

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 31/31] docs: Document the FAN_FS_ERROR event

Document the FAN_FS_ERROR event for user administrators and user space
developers.

Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes Since v8:
- Replace fs-error specific errno bits with generic errno. (Jan)
- Explain event order guarantees and point to example parser (Jan)
Changes Since v7:
- Update semantics
Changes Since v6:
- English fixes (jan)
- Proper document error field (jan)
Changes Since v4:
- Update documentation about reporting non-file error.
Changes Since v3:
- Move FAN_FS_ERROR notification into a subsection of the file.
Changes Since v2:
- NTR
Changes since v1:
- Drop references to location record
- Explain that the inode field is optional
- Explain we are reporting only the first error
---
.../admin-guide/filesystem-monitoring.rst | 74 +++++++++++++++++++
Documentation/admin-guide/index.rst | 1 +
2 files changed, 75 insertions(+)
create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst

diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
new file mode 100644
index 000000000000..5a3c84e60095
--- /dev/null
+++ b/Documentation/admin-guide/filesystem-monitoring.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+File system Monitoring with fanotify
+====================================
+
+File system Error Reporting
+===========================
+
+Fanotify supports the FAN_FS_ERROR event type for file system-wide error
+reporting. It is meant to be used by file system health monitoring
+daemons, which listen for these events and take actions (notify
+sysadmin, start recovery) when a file system problem is detected.
+
+By design, a FAN_FS_ERROR notification exposes sufficient information
+for a monitoring tool to know a problem in the file system has happened.
+It doesn't necessarily provide a user space application with semantics
+to verify an IO operation was successfully executed. That is out of
+scope for this feature. Instead, it is only meant as a framework for
+early file system problem detection and reporting recovery tools.
+
+When a file system operation fails, it is common for dozens of kernel
+errors to cascade after the initial failure, hiding the original failure
+log, which is usually the most useful debug data to troubleshoot the
+problem. For this reason, FAN_FS_ERROR tries to report only the first
+error that occurred for a file system since the last notification, and
+it simply counts additional errors. This ensures that the most
+important pieces of information are never lost.
+
+FAN_FS_ERROR requires the fanotify group to be setup with the
+FAN_REPORT_FID flag.
+
+At the time of this writing, the only file system that emits FAN_FS_ERROR
+notifications is Ext4.
+
+A FAN_FS_ERROR Notification has the following format::
+
+ [ Notification Metadata (Mandatory) ]
+ [ Generic Error Record (Mandatory) ]
+ [ FID record (Mandatory) ]
+
+The order of records is not guaranteed, and new records might be added
+in the future. Therefore, applications must not rely on the order and
+must be prepared to skip over unknown records. Please refer to
+``samples/fanotify/fs-monitor.c`` for an example parser.
+
+Generic error record
+--------------------
+
+The generic error record provides enough information for a file system
+agnostic tool to learn about a problem in the file system, without
+providing any additional details about the problem. This record is
+identified by ``struct fanotify_event_info_header.info_type`` being set
+to FAN_EVENT_INFO_TYPE_ERROR.
+
+ struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ __s32 error;
+ __u32 error_count;
+ };
+
+The `error` field identifies the type of error using errno values.
+`error_count` tracks the number of errors that occurred and were
+suppressed to preserve the original error information, since the last
+notification.
+
+FID record
+----------
+
+The FID record can be used to uniquely identify the inode that triggered
+the error through the combination of fsid and file handle. A file system
+specific application can use that information to attempt a recovery
+procedure. Errors that are not related to an inode are reported with an
+empty file handle of type FILEID_INVALID.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index dc00afcabb95..1bedab498104 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
+ filesystem-monitoring
nfs/index
gpio/index
highuid
--
2.33.0

2021-10-25 20:11:33

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 27/31] fanotify: Emit generic error info for error event

The error info is a record sent to users on FAN_FS_ERROR events
documenting the type of error. It also carries an error count,
documenting how many errors were observed since the last reporting.

Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v6:
- Rebase on top of pidfd patches
Changes since v5:
- Move error code here
---
fs/notify/fanotify/fanotify.c | 1 +
fs/notify/fanotify/fanotify.h | 1 +
fs/notify/fanotify/fanotify_user.c | 35 ++++++++++++++++++++++++++++++
include/uapi/linux/fanotify.h | 7 ++++++
4 files changed, 44 insertions(+)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 465f07e70e6d..af61425e6e3b 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -621,6 +621,7 @@ static struct fanotify_event *fanotify_alloc_error_event(
return NULL;

fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
+ fee->error = report->error;
fee->err_count = 1;
fee->fsid = *fsid;

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index edd7587adcc5..d25f500bf7e7 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -205,6 +205,7 @@ FANOTIFY_NE(struct fanotify_event *event)

struct fanotify_error_event {
struct fanotify_event fae;
+ s32 error; /* Error reported by the Filesystem. */
u32 err_count; /* Suppressed errors count */

__kernel_fsid_t fsid; /* FSID this error refers to. */
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index ff4a7373f7a5..97ec5b005b50 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -115,6 +115,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
#define FANOTIFY_PIDFD_INFO_HDR_LEN \
sizeof(struct fanotify_event_info_pidfd)
+#define FANOTIFY_ERROR_INFO_LEN \
+ (sizeof(struct fanotify_event_info_error))

static int fanotify_fid_info_len(int fh_len, int name_len)
{
@@ -139,6 +141,9 @@ static size_t fanotify_event_len(unsigned int info_mode,
if (!info_mode)
return event_len;

+ if (fanotify_is_error_event(event->mask))
+ event_len += FANOTIFY_ERROR_INFO_LEN;
+
info = fanotify_event_info(event);

if (fanotify_event_has_dir_fh(event)) {
@@ -324,6 +329,28 @@ static int process_access_response(struct fsnotify_group *group,
return -ENOENT;
}

+static size_t copy_error_info_to_user(struct fanotify_event *event,
+ char __user *buf, int count)
+{
+ struct fanotify_event_info_error info;
+ struct fanotify_error_event *fee = FANOTIFY_EE(event);
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR;
+ info.hdr.pad = 0;
+ info.hdr.len = FANOTIFY_ERROR_INFO_LEN;
+
+ if (WARN_ON(count < info.hdr.len))
+ return -EFAULT;
+
+ info.error = fee->error;
+ info.error_count = fee->err_count;
+
+ if (copy_to_user(buf, &info, sizeof(info)))
+ return -EFAULT;
+
+ return info.hdr.len;
+}
+
static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
int info_type, const char *name,
size_t name_len,
@@ -530,6 +557,14 @@ static int copy_info_records_to_user(struct fanotify_event *event,
total_bytes += ret;
}

+ if (fanotify_is_error_event(event->mask)) {
+ ret = copy_error_info_to_user(event, buf, count);
+ if (ret < 0)
+ return ret;
+ buf += ret;
+ count -= ret;
+ }
+
return total_bytes;
}

diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 2990731ddc8b..bd1932c2074d 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -126,6 +126,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_DFID_NAME 2
#define FAN_EVENT_INFO_TYPE_DFID 3
#define FAN_EVENT_INFO_TYPE_PIDFD 4
+#define FAN_EVENT_INFO_TYPE_ERROR 5

/* Variable length info record following event metadata */
struct fanotify_event_info_header {
@@ -160,6 +161,12 @@ struct fanotify_event_info_pidfd {
__s32 pidfd;
};

+struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ __s32 error;
+ __u32 error_count;
+};
+
struct fanotify_response {
__s32 fd;
__u32 response;
--
2.33.0

2021-10-25 20:12:40

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 22/31] fanotify: Wrap object_fh inline space in a creator macro

fanotify_error_event would duplicate this sequence of declarations that
already exist elsewhere with a slight different size. Create a helper
macro to avoid code duplication.

Suggested-by: Jan Kara <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v8:
- Drop comment (Amir)
- Pass fanotify_fh object name to constructor (Jan)
---
fs/notify/fanotify/fanotify.h | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 2b032b79d5b0..3510d06654ed 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -171,12 +171,18 @@ static inline void fanotify_init_event(struct fanotify_event *event,
event->pid = NULL;
}

+#define FANOTIFY_INLINE_FH(name, size) \
+struct { \
+ struct fanotify_fh (name); \
+ /* Space for object_fh.buf[] - access with fanotify_fh_buf() */ \
+ unsigned char _inline_fh_buf[(size)]; \
+}
+
struct fanotify_fid_event {
struct fanotify_event fae;
__kernel_fsid_t fsid;
- struct fanotify_fh object_fh;
- /* Reserve space in object_fh.buf[] - access with fanotify_fh_buf() */
- unsigned char _inline_fh_buf[FANOTIFY_INLINE_FH_LEN];
+
+ FANOTIFY_INLINE_FH(object_fh, FANOTIFY_INLINE_FH_LEN);
};

static inline struct fanotify_fid_event *
--
2.33.0

2021-10-25 20:13:37

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH v9 19/31] fanotify: Pre-allocate pool of error events

Pre-allocate slots for file system errors to have greater chances of
succeeding, since error events can happen in GFP_NOFS context. This
patch introduces a group-wide mempool of error events, shared by all
FAN_FS_ERROR marks in this group.

For now, just allocate 128 positions. A future patch allows this
array to be dynamically resized when a new mark is added.

Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---
Changes since v8:
- FANOTIFY_DEFAULT_MAX_FEE_POOL -> FANOTIFY_DEFAULT_FEE_POOL_SIZE
- Reduce limit to 32. (Jan)
Changes since v7:
- Expand limit to 128. (Amir)
---
fs/notify/fanotify/fanotify.c | 3 +++
fs/notify/fanotify/fanotify.h | 11 +++++++++++
fs/notify/fanotify/fanotify_user.c | 26 +++++++++++++++++++++++++-
include/linux/fsnotify_backend.h | 2 ++
4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 8f152445d75c..01d68dfc74aa 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -819,6 +819,9 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
if (group->fanotify_data.ucounts)
dec_ucount(group->fanotify_data.ucounts,
UCOUNT_FANOTIFY_GROUPS);
+
+ if (mempool_initialized(&group->fanotify_data.error_events_pool))
+ mempool_exit(&group->fanotify_data.error_events_pool);
}

static void fanotify_free_path_event(struct fanotify_event *event)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index c42cf8fd7d79..a577e87fac2b 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -141,6 +141,7 @@ enum fanotify_event_type {
FANOTIFY_EVENT_TYPE_PATH,
FANOTIFY_EVENT_TYPE_PATH_PERM,
FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
+ FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
__FANOTIFY_EVENT_TYPE_NUM
};

@@ -196,6 +197,16 @@ FANOTIFY_NE(struct fanotify_event *event)
return container_of(event, struct fanotify_name_event, fae);
}

+struct fanotify_error_event {
+ struct fanotify_event fae;
+};
+
+static inline struct fanotify_error_event *
+FANOTIFY_EE(struct fanotify_event *event)
+{
+ return container_of(event, struct fanotify_error_event, fae);
+}
+
static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
{
if (event->type == FANOTIFY_EVENT_TYPE_FID)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 66ee3c2805c7..2f4182b754b2 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -30,6 +30,7 @@
#define FANOTIFY_DEFAULT_MAX_EVENTS 16384
#define FANOTIFY_OLD_DEFAULT_MAX_MARKS 8192
#define FANOTIFY_DEFAULT_MAX_GROUPS 128
+#define FANOTIFY_DEFAULT_FEE_POOL_SIZE 32

/*
* Legacy fanotify marks limits (8192) is per group and we introduced a tunable
@@ -1054,6 +1055,15 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
return ERR_PTR(ret);
}

+static int fanotify_group_init_error_pool(struct fsnotify_group *group)
+{
+ if (mempool_initialized(&group->fanotify_data.error_events_pool))
+ return 0;
+
+ return mempool_init_kmalloc_pool(&group->fanotify_data.error_events_pool,
+ FANOTIFY_DEFAULT_FEE_POOL_SIZE,
+ sizeof(struct fanotify_error_event));
+}

static int fanotify_add_mark(struct fsnotify_group *group,
fsnotify_connp_t *connp, unsigned int type,
@@ -1062,6 +1072,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
{
struct fsnotify_mark *fsn_mark;
__u32 added;
+ int ret = 0;

mutex_lock(&group->mark_mutex);
fsn_mark = fsnotify_find_mark(connp, group);
@@ -1072,13 +1083,26 @@ static int fanotify_add_mark(struct fsnotify_group *group,
return PTR_ERR(fsn_mark);
}
}
+
+ /*
+ * Error events are pre-allocated per group, only if strictly
+ * needed (i.e. FAN_FS_ERROR was requested).
+ */
+ if (!(flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) {
+ ret = fanotify_group_init_error_pool(group);
+ if (ret)
+ goto out;
+ }
+
added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
if (added & ~fsnotify_conn_mask(fsn_mark->connector))
fsnotify_recalc_mask(fsn_mark->connector);
+
+out:
mutex_unlock(&group->mark_mutex);

fsnotify_put_mark(fsn_mark);
- return 0;
+ return ret;
}

static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 00dbaafbcf95..51ef2b079bfa 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -19,6 +19,7 @@
#include <linux/atomic.h>
#include <linux/user_namespace.h>
#include <linux/refcount.h>
+#include <linux/mempool.h>

/*
* IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily
@@ -246,6 +247,7 @@ struct fsnotify_group {
int flags; /* flags from fanotify_init() */
int f_flags; /* event_f_flags from fanotify_init() */
struct ucounts *ucounts;
+ mempool_t error_events_pool;
} fanotify_data;
#endif /* CONFIG_FANOTIFY */
};
--
2.33.0

2021-10-26 07:26:39

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v9 19/31] fanotify: Pre-allocate pool of error events

On Mon, Oct 25, 2021 at 10:30 PM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Pre-allocate slots for file system errors to have greater chances of
> succeeding, since error events can happen in GFP_NOFS context. This
> patch introduces a group-wide mempool of error events, shared by all
> FAN_FS_ERROR marks in this group.
>
> For now, just allocate 128 positions. A future patch allows this
> array to be dynamically resized when a new mark is added.

This phrase is out dated.
No need to re-post entire v10 series just for that.
You may suggest the re-phrase here.

Thanks,
Amir.

>
> Reviewed-by: Amir Goldstein <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> ---
> Changes since v8:
> - FANOTIFY_DEFAULT_MAX_FEE_POOL -> FANOTIFY_DEFAULT_FEE_POOL_SIZE
> - Reduce limit to 32. (Jan)
> Changes since v7:
> - Expand limit to 128. (Amir)
> ---
> fs/notify/fanotify/fanotify.c | 3 +++
> fs/notify/fanotify/fanotify.h | 11 +++++++++++
> fs/notify/fanotify/fanotify_user.c | 26 +++++++++++++++++++++++++-
> include/linux/fsnotify_backend.h | 2 ++
> 4 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 8f152445d75c..01d68dfc74aa 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -819,6 +819,9 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
> if (group->fanotify_data.ucounts)
> dec_ucount(group->fanotify_data.ucounts,
> UCOUNT_FANOTIFY_GROUPS);
> +
> + if (mempool_initialized(&group->fanotify_data.error_events_pool))
> + mempool_exit(&group->fanotify_data.error_events_pool);
> }
>
> static void fanotify_free_path_event(struct fanotify_event *event)
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index c42cf8fd7d79..a577e87fac2b 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -141,6 +141,7 @@ enum fanotify_event_type {
> FANOTIFY_EVENT_TYPE_PATH,
> FANOTIFY_EVENT_TYPE_PATH_PERM,
> FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
> + FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
> __FANOTIFY_EVENT_TYPE_NUM
> };
>
> @@ -196,6 +197,16 @@ FANOTIFY_NE(struct fanotify_event *event)
> return container_of(event, struct fanotify_name_event, fae);
> }
>
> +struct fanotify_error_event {
> + struct fanotify_event fae;
> +};
> +
> +static inline struct fanotify_error_event *
> +FANOTIFY_EE(struct fanotify_event *event)
> +{
> + return container_of(event, struct fanotify_error_event, fae);
> +}
> +
> static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event)
> {
> if (event->type == FANOTIFY_EVENT_TYPE_FID)
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 66ee3c2805c7..2f4182b754b2 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -30,6 +30,7 @@
> #define FANOTIFY_DEFAULT_MAX_EVENTS 16384
> #define FANOTIFY_OLD_DEFAULT_MAX_MARKS 8192
> #define FANOTIFY_DEFAULT_MAX_GROUPS 128
> +#define FANOTIFY_DEFAULT_FEE_POOL_SIZE 32
>
> /*
> * Legacy fanotify marks limits (8192) is per group and we introduced a tunable
> @@ -1054,6 +1055,15 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
> return ERR_PTR(ret);
> }
>
> +static int fanotify_group_init_error_pool(struct fsnotify_group *group)
> +{
> + if (mempool_initialized(&group->fanotify_data.error_events_pool))
> + return 0;
> +
> + return mempool_init_kmalloc_pool(&group->fanotify_data.error_events_pool,
> + FANOTIFY_DEFAULT_FEE_POOL_SIZE,
> + sizeof(struct fanotify_error_event));
> +}
>
> static int fanotify_add_mark(struct fsnotify_group *group,
> fsnotify_connp_t *connp, unsigned int type,
> @@ -1062,6 +1072,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
> {
> struct fsnotify_mark *fsn_mark;
> __u32 added;
> + int ret = 0;
>
> mutex_lock(&group->mark_mutex);
> fsn_mark = fsnotify_find_mark(connp, group);
> @@ -1072,13 +1083,26 @@ static int fanotify_add_mark(struct fsnotify_group *group,
> return PTR_ERR(fsn_mark);
> }
> }
> +
> + /*
> + * Error events are pre-allocated per group, only if strictly
> + * needed (i.e. FAN_FS_ERROR was requested).
> + */
> + if (!(flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) {
> + ret = fanotify_group_init_error_pool(group);
> + if (ret)
> + goto out;
> + }
> +
> added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
> if (added & ~fsnotify_conn_mask(fsn_mark->connector))
> fsnotify_recalc_mask(fsn_mark->connector);
> +
> +out:
> mutex_unlock(&group->mark_mutex);
>
> fsnotify_put_mark(fsn_mark);
> - return 0;
> + return ret;
> }
>
> static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 00dbaafbcf95..51ef2b079bfa 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -19,6 +19,7 @@
> #include <linux/atomic.h>
> #include <linux/user_namespace.h>
> #include <linux/refcount.h>
> +#include <linux/mempool.h>
>
> /*
> * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily
> @@ -246,6 +247,7 @@ struct fsnotify_group {
> int flags; /* flags from fanotify_init() */
> int f_flags; /* event_f_flags from fanotify_init() */
> struct ucounts *ucounts;
> + mempool_t error_events_pool;
> } fanotify_data;
> #endif /* CONFIG_FANOTIFY */
> };
> --
> 2.33.0
>

2021-10-26 10:18:27

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v9 24/31] fanotify: Report fid entry even for zero-length file_handle

On Mon, Oct 25, 2021 at 10:30 PM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Non-inode errors will reported with an empty file_handle. In
> preparation for that, allow some events to print the FID record even if
> there isn't any file_handle encoded
>
> Even though FILEID_ROOT is used internally, make zero-length file
> handles be reported as FILEID_INVALID.
>
> Reviewed-by: Amir Goldstein <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> ---
> Changes since v8:
> - Move fanotify_event_has_object_fh check here (jan)

Logically, this move is wrong, because after this patch,
copy_fid_info_to_user() can theoretically be called with NULL fh in the
existing construct of:
if (fanotify_event_has_object_fh(event)) {
...
ret = copy_fid_info_to_user(fanotify_event_fsid(event),

fanotify_event_object_fh(event),

The thing that prevents this case in effect is that FAN_FS_ERROR
is not yet wired, but I am not sure if leaving this theoretic bisect
issue is a good idea.

Anyway, that's a very minor theoretic issue and I am sure Jan can
decide whether to deal with it and how (no need to post v10 IMO).

Thanks,
Amir.

2021-10-26 10:18:40

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v9 00/31] file system-wide error monitoring

On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Hi,
>
> This is the 9th version of this patch series. Thank you, Amir, Jan and
> Ted, for the feedback in the previous versions.
>
> The main difference in this version is that the pool is no longer
> resizeable nor limited in number of marks, even though we only
> pre-allocate 32 slots. In addition, ext4 was modified to always return
> non-zero errno, and the documentation was fixed accordingly (No longer
> suggests we return EXT4_ERR* values.
>
> I also droped the Reviewed-by tags from the ext4 patch, due to the
> changes above.
>
> Please let me know what you think.
>

All good on my end.
I've made a couple of minor comments that
could be addressed on commit if no other issues are found.

Thanks,
Amir.

2021-10-26 14:31:11

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v9 24/31] fanotify: Report fid entry even for zero-length file_handle

On Tue 26-10-21 12:09:19, Amir Goldstein wrote:
> On Mon, Oct 25, 2021 at 10:30 PM Gabriel Krisman Bertazi
> <[email protected]> wrote:
> >
> > Non-inode errors will reported with an empty file_handle. In
> > preparation for that, allow some events to print the FID record even if
> > there isn't any file_handle encoded
> >
> > Even though FILEID_ROOT is used internally, make zero-length file
> > handles be reported as FILEID_INVALID.
> >
> > Reviewed-by: Amir Goldstein <[email protected]>
> > Reviewed-by: Jan Kara <[email protected]>
> > Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> >
> > ---
> > Changes since v8:
> > - Move fanotify_event_has_object_fh check here (jan)
>
> Logically, this move is wrong, because after this patch,
> copy_fid_info_to_user() can theoretically be called with NULL fh in the
> existing construct of:
> if (fanotify_event_has_object_fh(event)) {
> ...
> ret = copy_fid_info_to_user(fanotify_event_fsid(event),
>
> fanotify_event_object_fh(event),
>
> The thing that prevents this case in effect is that FAN_FS_ERROR
> is not yet wired, but I am not sure if leaving this theoretic bisect
> issue is a good idea.
>
> Anyway, that's a very minor theoretic issue and I am sure Jan can
> decide whether to deal with it and how (no need to post v10 IMO).

Hum, correct. I guess I'll just fold this patch into patch 26. Logically
they are very close anyway.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-10-26 14:34:58

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v9 27/31] fanotify: Emit generic error info for error event

On Mon 25-10-21 16:27:42, Gabriel Krisman Bertazi wrote:
> The error info is a record sent to users on FAN_FS_ERROR events
> documenting the type of error. It also carries an error count,
> documenting how many errors were observed since the last reporting.
>
> Reviewed-by: Amir Goldstein <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

...

> @@ -530,6 +557,14 @@ static int copy_info_records_to_user(struct fanotify_event *event,
> total_bytes += ret;
> }
>
> + if (fanotify_is_error_event(event->mask)) {
> + ret = copy_error_info_to_user(event, buf, count);
> + if (ret < 0)
> + return ret;
> + buf += ret;
> + count -= ret;
> + }
> +
> return total_bytes;
> }

This is currently harmless but we should add

total_bytes += ret;

here as well.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-10-26 14:37:01

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v9 31/31] docs: Document the FAN_FS_ERROR event

On Mon 25-10-21 16:27:46, Gabriel Krisman Bertazi wrote:
> Document the FAN_FS_ERROR event for user administrators and user space
> developers.
>
> Reviewed-by: Amir Goldstein <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

>
> ---
> Changes Since v8:
> - Replace fs-error specific errno bits with generic errno. (Jan)
> - Explain event order guarantees and point to example parser (Jan)
> Changes Since v7:
> - Update semantics
> Changes Since v6:
> - English fixes (jan)
> - Proper document error field (jan)
> Changes Since v4:
> - Update documentation about reporting non-file error.
> Changes Since v3:
> - Move FAN_FS_ERROR notification into a subsection of the file.
> Changes Since v2:
> - NTR
> Changes since v1:
> - Drop references to location record
> - Explain that the inode field is optional
> - Explain we are reporting only the first error
> ---
> .../admin-guide/filesystem-monitoring.rst | 74 +++++++++++++++++++
> Documentation/admin-guide/index.rst | 1 +
> 2 files changed, 75 insertions(+)
> create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
>
> diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
> new file mode 100644
> index 000000000000..5a3c84e60095
> --- /dev/null
> +++ b/Documentation/admin-guide/filesystem-monitoring.rst
> @@ -0,0 +1,74 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +====================================
> +File system Monitoring with fanotify
> +====================================
> +
> +File system Error Reporting
> +===========================
> +
> +Fanotify supports the FAN_FS_ERROR event type for file system-wide error
> +reporting. It is meant to be used by file system health monitoring
> +daemons, which listen for these events and take actions (notify
> +sysadmin, start recovery) when a file system problem is detected.
> +
> +By design, a FAN_FS_ERROR notification exposes sufficient information
> +for a monitoring tool to know a problem in the file system has happened.
> +It doesn't necessarily provide a user space application with semantics
> +to verify an IO operation was successfully executed. That is out of
> +scope for this feature. Instead, it is only meant as a framework for
> +early file system problem detection and reporting recovery tools.
> +
> +When a file system operation fails, it is common for dozens of kernel
> +errors to cascade after the initial failure, hiding the original failure
> +log, which is usually the most useful debug data to troubleshoot the
> +problem. For this reason, FAN_FS_ERROR tries to report only the first
> +error that occurred for a file system since the last notification, and
> +it simply counts additional errors. This ensures that the most
> +important pieces of information are never lost.
> +
> +FAN_FS_ERROR requires the fanotify group to be setup with the
> +FAN_REPORT_FID flag.
> +
> +At the time of this writing, the only file system that emits FAN_FS_ERROR
> +notifications is Ext4.
> +
> +A FAN_FS_ERROR Notification has the following format::
> +
> + [ Notification Metadata (Mandatory) ]
> + [ Generic Error Record (Mandatory) ]
> + [ FID record (Mandatory) ]
> +
> +The order of records is not guaranteed, and new records might be added
> +in the future. Therefore, applications must not rely on the order and
> +must be prepared to skip over unknown records. Please refer to
> +``samples/fanotify/fs-monitor.c`` for an example parser.
> +
> +Generic error record
> +--------------------
> +
> +The generic error record provides enough information for a file system
> +agnostic tool to learn about a problem in the file system, without
> +providing any additional details about the problem. This record is
> +identified by ``struct fanotify_event_info_header.info_type`` being set
> +to FAN_EVENT_INFO_TYPE_ERROR.
> +
> + struct fanotify_event_info_error {
> + struct fanotify_event_info_header hdr;
> + __s32 error;
> + __u32 error_count;
> + };
> +
> +The `error` field identifies the type of error using errno values.
> +`error_count` tracks the number of errors that occurred and were
> +suppressed to preserve the original error information, since the last
> +notification.
> +
> +FID record
> +----------
> +
> +The FID record can be used to uniquely identify the inode that triggered
> +the error through the combination of fsid and file handle. A file system
> +specific application can use that information to attempt a recovery
> +procedure. Errors that are not related to an inode are reported with an
> +empty file handle of type FILEID_INVALID.
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index dc00afcabb95..1bedab498104 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
> edid
> efi-stub
> ext4
> + filesystem-monitoring
> nfs/index
> gpio/index
> highuid
> --
> 2.33.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-10-27 21:22:40

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v9 00/31] file system-wide error monitoring

On Tue 26-10-21 12:12:38, Amir Goldstein wrote:
> On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
> <[email protected]> wrote:
> >
> > Hi,
> >
> > This is the 9th version of this patch series. Thank you, Amir, Jan and
> > Ted, for the feedback in the previous versions.
> >
> > The main difference in this version is that the pool is no longer
> > resizeable nor limited in number of marks, even though we only
> > pre-allocate 32 slots. In addition, ext4 was modified to always return
> > non-zero errno, and the documentation was fixed accordingly (No longer
> > suggests we return EXT4_ERR* values.
> >
> > I also droped the Reviewed-by tags from the ext4 patch, due to the
> > changes above.
> >
> > Please let me know what you think.
> >
>
> All good on my end.
> I've made a couple of minor comments that
> could be addressed on commit if no other issues are found.

All good on my end as well. I've applied all the minor updates, tested the
result and pushed it out to fsnotify branch of my tree. WRT to your new
FS_ERROR LTP tests, I've noticed that the testcases 1 and 3 from test
fanotify20 fail for me. After a bit of debugging this seems to be a bug in
ext4 where it calls ext4_abort() with EXT4_ERR_ESHUTDOWN instead of plain
ESHUTDOWN. Not sure if you have that fixed or how come the tests passed for
you. After fixing that ext4 bug everything passes for me.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-10-27 21:24:30

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v9 00/31] file system-wide error monitoring

On Wed, Oct 27, 2021 at 2:22 PM Jan Kara <[email protected]> wrote:
>
> On Tue 26-10-21 12:12:38, Amir Goldstein wrote:
> > On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
> > <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > This is the 9th version of this patch series. Thank you, Amir, Jan and
> > > Ted, for the feedback in the previous versions.
> > >
> > > The main difference in this version is that the pool is no longer
> > > resizeable nor limited in number of marks, even though we only
> > > pre-allocate 32 slots. In addition, ext4 was modified to always return
> > > non-zero errno, and the documentation was fixed accordingly (No longer
> > > suggests we return EXT4_ERR* values.
> > >
> > > I also droped the Reviewed-by tags from the ext4 patch, due to the
> > > changes above.
> > >
> > > Please let me know what you think.
> > >
> >
> > All good on my end.
> > I've made a couple of minor comments that
> > could be addressed on commit if no other issues are found.
>
> All good on my end as well. I've applied all the minor updates, tested the
> result and pushed it out to fsnotify branch of my tree. WRT to your new
> FS_ERROR LTP tests, I've noticed that the testcases 1 and 3 from test
> fanotify20 fail for me. After a bit of debugging this seems to be a bug in
> ext4 where it calls ext4_abort() with EXT4_ERR_ESHUTDOWN instead of plain
> ESHUTDOWN. Not sure if you have that fixed or how come the tests passed for
> you. After fixing that ext4 bug everything passes for me.
>

Gabriel mentioned that bug in the cover letter of the LTP series :-)
https://lore.kernel.org/linux-ext4/[email protected]/T/#u

Thanks,
Amir.

2021-10-27 21:27:26

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v9 00/31] file system-wide error monitoring

Amir Goldstein <[email protected]> writes:

> On Wed, Oct 27, 2021 at 2:22 PM Jan Kara <[email protected]> wrote:
>>
>> On Tue 26-10-21 12:12:38, Amir Goldstein wrote:
>> > On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
>> > <[email protected]> wrote:
>> > >
>> > > Hi,
>> > >
>> > > This is the 9th version of this patch series. Thank you, Amir, Jan and
>> > > Ted, for the feedback in the previous versions.
>> > >
>> > > The main difference in this version is that the pool is no longer
>> > > resizeable nor limited in number of marks, even though we only
>> > > pre-allocate 32 slots. In addition, ext4 was modified to always return
>> > > non-zero errno, and the documentation was fixed accordingly (No longer
>> > > suggests we return EXT4_ERR* values.
>> > >
>> > > I also droped the Reviewed-by tags from the ext4 patch, due to the
>> > > changes above.
>> > >
>> > > Please let me know what you think.
>> > >
>> >
>> > All good on my end.
>> > I've made a couple of minor comments that
>> > could be addressed on commit if no other issues are found.
>>
>> All good on my end as well. I've applied all the minor updates, tested the
>> result and pushed it out to fsnotify branch of my tree. WRT to your new
>> FS_ERROR LTP tests, I've noticed that the testcases 1 and 3 from test
>> fanotify20 fail for me. After a bit of debugging this seems to be a bug in
>> ext4 where it calls ext4_abort() with EXT4_ERR_ESHUTDOWN instead of plain
>> ESHUTDOWN. Not sure if you have that fixed or how come the tests passed for
>> you. After fixing that ext4 bug everything passes for me.
>>
>
> Gabriel mentioned that bug in the cover letter of the LTP series :-)
> https://lore.kernel.org/linux-ext4/[email protected]/T/#u

Yes :)

Also, thank you both for the extensive review and ideas during the
development of this series. It was really appreciated!

I'm sending out the new version for tests + man pages today.

--
Gabriel Krisman Bertazi

2021-10-28 05:56:38

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v9 00/31] file system-wide error monitoring

> Also, thank you both for the extensive review and ideas during the
> development of this series. It was really appreciated!
>

Thank you for your appreciated effort!
It was a wild journey through some interesting experiments, but
you survived it well ;-)

Would you be interested in pursuing FAN_WB_ERROR after a due rest
and after all the dust on FAN_FS_ERROR has settled?

FAN_WB_ERROR can use the same info record and same internal
error event struct as FAN_FS_ERROR.

A call to fsnotify_wb_error(sb, inode, err) can be placed inside
mapping_set_error() and maybe for other sporadic callers of errseq_set().

For wb error, we can consider storing a snapshot of errseq of the sb/inode
in the sb/inode mark and compute error_count from the errseq diff
instead of counting it when merging events. This will keep a more accurate
report even when error reports are dropped due to allocation failure or event
queue overflow.

I have a feeling that the Postgres project would find this
functionality useful (?).

Thanks,
Amir.

2021-10-29 22:24:43

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v9 00/31] file system-wide error monitoring

Amir Goldstein <[email protected]> writes:

>> Also, thank you both for the extensive review and ideas during the
>> development of this series. It was really appreciated!
>>
>
> Thank you for your appreciated effort!
> It was a wild journey through some interesting experiments, but
> you survived it well ;-)
>
> Would you be interested in pursuing FAN_WB_ERROR after a due rest
> and after all the dust on FAN_FS_ERROR has settled?

I think it would make sense for me to continue working on it, yes. But,
before that, I think I still have some support to add to FAN_FS_ERROR,
like a detailed, fs-specific, info record, and an error location info
record, which has a use-case in Google Cloud environments. I have to
discuss priorities internally, but we (collabora) do have an interest in
supporting WB_ERROR too.

For the detailed error report, fanotify could have a new info record
that carries a structure sent out by the file system. fanotify could
handle the lifetime of this object, by keeping a larger mempool, or
delegate its allocation/destruction to the filesystem.

Like I proposed in an earlier version of FAN_FS_ERROR, the format could
be as simple as:

struct fanotify_error_data_info {
struct fanotify_event_info_header hdr;
char data[];
}

I think xfs, at least, would be able to make good use of this record with
xfs_scrub, as the xfs maintainers mentioned.

--
Gabriel Krisman Bertazi

2021-10-30 06:54:04

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v9 00/31] file system-wide error monitoring

On Sat, Oct 30, 2021 at 1:24 AM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Amir Goldstein <[email protected]> writes:
>
> >> Also, thank you both for the extensive review and ideas during the
> >> development of this series. It was really appreciated!
> >>
> >
> > Thank you for your appreciated effort!
> > It was a wild journey through some interesting experiments, but
> > you survived it well ;-)
> >
> > Would you be interested in pursuing FAN_WB_ERROR after a due rest
> > and after all the dust on FAN_FS_ERROR has settled?
>
> I think it would make sense for me to continue working on it, yes. But,
> before that, I think I still have some support to add to FAN_FS_ERROR,
> like a detailed, fs-specific, info record, and an error location info
> record, which has a use-case in Google Cloud environments. I have to
> discuss priorities internally, but we (collabora) do have an interest in
> supporting WB_ERROR too.
>
> For the detailed error report, fanotify could have a new info record
> that carries a structure sent out by the file system. fanotify could
> handle the lifetime of this object, by keeping a larger mempool, or
> delegate its allocation/destruction to the filesystem.
>

Before you try anything radical, please check the size of prospect
fs-specific data.
My hunch says that in most cases fs-specific data could fit cozy along side the
file handle within MAX_HANDLE_SZ and if this is true, then we do not need to
worry about extreme cases right now.
If there comes a time when we have a justified case of a filesystem that needs
to report much bigger fs-specific data, we can consider it then.
Until that time, we simply drop the over sized fs-specific data same as we do
if filesystem passed in a file handle larger than MAX_HANDLE_SZ.

> Like I proposed in an earlier version of FAN_FS_ERROR, the format could
> be as simple as:
>
> struct fanotify_error_data_info {
> struct fanotify_event_info_header hdr;
> char data[];
> }
>

We can add char data[] field to the end of struct fanotify_event_info_error.
It does not change the layout nor size of the structure and the info record
is variable size per definition anyway.

I know Jan didn't like this so much at the time and contemplated a
separate info record for filename, but eventually, fanotify_event_info_fid
also has an optional name following the unsigned char handle[].

> I think xfs, at least, would be able to make good use of this record with
> xfs_scrub, as the xfs maintainers mentioned.

I am not sure if that was the final conclusion.
xfs_scrub is proactive and should have no problem reporting its own findings,
but I have no objections to fs-specific details in FS_ERROR event.

Thanks,
Amir.