2019-06-28 15:51:01

by David Howells

Subject: [PATCH 0/6] Mount and superblock notifications [ver #5]


Here's a set of patches that adds VFS-related watches to the general
notification system, providing sources of events for:

(1) Mount topology events, such as mounting, unmounting, mount expiry and
mount reconfiguration.

(2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
errors (not complete yet).

One of the reasons for this is to remove the need for processes to
repeatedly and regularly scan /proc/mounts, which has proven to be a
system performance problem. To further aid this, the fsinfo() syscall,
on which this patch series depends, provides a way to access superblock and
mount information in binary form without the need to parse /proc/mounts.

LSM hooks are provided that allow an LSM to rule on whether or not a watch
may be set. Each of these hooks takes a different "watched object"
parameter, so they're not really shareable. The LSM should use current's
credentials. [Wanted by SELinux & Smack]

Watches are created with:

watch_mount(AT_FDCWD, "/", 0, fd, 0x03);
watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x04);

where in both cases, fd indicates the queue and the number after it is a
tag between 0 and 255.

Further things that could be considered:

(1) Adding a global superblock event queue.

(2) Propagating watches to child superblocks over automounts.


The patches can be found here also:

http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications

Changes:

ver #5:

(*) The superblock watch and mount watch parts are split out into this set
from the core branch (notifications-core) as they depend on fsinfo().

David
---
David Howells (6):
security: Add hooks to rule on setting a superblock or mount watch
Adjust watch_queue documentation to mention mount and superblock watches.
vfs: Add a mount-notification facility
vfs: Add superblock notifications
fsinfo: Export superblock notification counter
Add sample notification program


Documentation/watch_queue.rst | 20 +++
arch/alpha/kernel/syscalls/syscall.tbl | 2
arch/arm/tools/syscall.tbl | 2
arch/arm64/include/asm/unistd.h | 2
arch/ia64/kernel/syscalls/syscall.tbl | 2
arch/m68k/kernel/syscalls/syscall.tbl | 2
arch/microblaze/kernel/syscalls/syscall.tbl | 2
arch/mips/kernel/syscalls/syscall_n32.tbl | 2
arch/mips/kernel/syscalls/syscall_n64.tbl | 2
arch/mips/kernel/syscalls/syscall_o32.tbl | 2
arch/parisc/kernel/syscalls/syscall.tbl | 2
arch/powerpc/kernel/syscalls/syscall.tbl | 2
arch/s390/kernel/syscalls/syscall.tbl | 2
arch/sh/kernel/syscalls/syscall.tbl | 2
arch/sparc/kernel/syscalls/syscall.tbl | 2
arch/x86/entry/syscalls/syscall_32.tbl | 2
arch/x86/entry/syscalls/syscall_64.tbl | 2
arch/xtensa/kernel/syscalls/syscall.tbl | 2
drivers/misc/Kconfig | 5 -
fs/Kconfig | 21 +++
fs/Makefile | 1
fs/fsinfo.c | 12 ++
fs/mount.h | 33 +++--
fs/mount_notify.c | 188 +++++++++++++++++++++++++++
fs/namespace.c | 16 ++
fs/super.c | 126 ++++++++++++++++++
include/linux/dcache.h | 1
include/linux/fs.h | 78 +++++++++++
include/linux/lsm_hooks.h | 16 ++
include/linux/security.h | 10 +
include/linux/syscalls.h | 4 +
include/uapi/asm-generic/unistd.h | 6 +
include/uapi/linux/fsinfo.h | 10 +
include/uapi/linux/watch_queue.h | 61 +++++++++
kernel/sys_ni.c | 2
samples/vfs/test-fsinfo.c | 13 ++
samples/watch_queue/watch_test.c | 76 +++++++++++
security/security.c | 10 +
38 files changed, 722 insertions(+), 21 deletions(-)
create mode 100644 fs/mount_notify.c


2019-06-28 15:51:08

by David Howells

Subject: [PATCH 1/6] security: Add hooks to rule on setting a superblock or mount watch [ver #5]

Add security hooks that will allow an LSM to rule on whether or not a watch
may be set on a mount or on a superblock. More than one hook is required
as the watches watch different types of object.

Signed-off-by: David Howells <[email protected]>
cc: Casey Schaufler <[email protected]>
cc: Stephen Smalley <[email protected]>
cc: [email protected]
---

include/linux/lsm_hooks.h | 16 ++++++++++++++++
include/linux/security.h | 10 ++++++++++
security/security.c | 10 ++++++++++
3 files changed, 36 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 5fe387d35990..3a4d7a260572 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1433,6 +1433,18 @@
* from devices (as a global set).
* @watch: The watch object
*
+ * @watch_mount:
+ * Check to see if a process is allowed to watch for mount topology change
+ * notifications on a mount subtree.
+ * @watch: The watch object
+ * @path: The root of the subtree to watch.
+ *
+ * @watch_sb:
+ * Check to see if a process is allowed to watch for event notifications
+ * from a superblock.
+ * @watch: The watch object
+ * @sb: The superblock to watch.
+ *
* @post_notification:
* Check to see if a watch notification can be posted to a particular
* queue.
@@ -1721,6 +1733,8 @@ union security_list_options {
#ifdef CONFIG_WATCH_QUEUE
int (*watch_key)(struct watch *watch, struct key *key);
int (*watch_devices)(struct watch *watch);
+ int (*watch_mount)(struct watch *watch, struct path *path);
+ int (*watch_sb)(struct watch *watch, struct super_block *sb);
int (*post_notification)(const struct cred *w_cred,
const struct cred *cred,
struct watch_notification *n);
@@ -2007,6 +2021,8 @@ struct security_hook_heads {
#ifdef CONFIG_WATCH_QUEUE
struct hlist_head watch_key;
struct hlist_head watch_devices;
+ struct hlist_head watch_mount;
+ struct hlist_head watch_sb;
struct hlist_head post_notification;
#endif /* CONFIG_WATCH_QUEUE */
#ifdef CONFIG_SECURITY_NETWORK
diff --git a/include/linux/security.h b/include/linux/security.h
index 8a9645472232..74ec6d41eca5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -401,6 +401,8 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
#ifdef CONFIG_WATCH_QUEUE
int security_watch_key(struct watch *watch, struct key *key);
int security_watch_devices(struct watch *watch);
+int security_watch_mount(struct watch *watch, struct path *path);
+int security_watch_sb(struct watch *watch, struct super_block *sb);
int security_post_notification(const struct cred *w_cred,
const struct cred *cred,
struct watch_notification *n);
@@ -1233,6 +1235,14 @@ static inline int security_watch_devices(struct watch *watch)
{
return 0;
}
+static inline int security_watch_mount(struct watch *watch, struct path *path)
+{
+ return 0;
+}
+static inline int security_watch_sb(struct watch *watch, struct super_block *sb)
+{
+ return 0;
+}
static inline int security_post_notification(const struct cred *w_cred,
const struct cred *cred,
struct watch_notification *n)
diff --git a/security/security.c b/security/security.c
index 1390fb1203e4..37fec6cec905 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1940,6 +1940,16 @@ int security_watch_devices(struct watch *watch)
return call_int_hook(watch_devices, 0, watch);
}

+int security_watch_mount(struct watch *watch, struct path *path)
+{
+ return call_int_hook(watch_mount, 0, watch, path);
+}
+
+int security_watch_sb(struct watch *watch, struct super_block *sb)
+{
+ return call_int_hook(watch_sb, 0, watch, sb);
+}
+
int security_post_notification(const struct cred *w_cred,
const struct cred *cred,
struct watch_notification *n)

2019-06-28 15:51:24

by David Howells

Subject: [PATCH 4/6] vfs: Add superblock notifications [ver #5]

Add a superblock event notification facility whereby notifications about
superblock events, such as I/O errors (EIO), quota limits being hit
(EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
process asynchronously. Note that this does not cover vfsmount topology
changes. watch_mount() is used for that.

Firstly, an event queue needs to be created:

fd = open("/dev/watch_queue", O_RDWR);
ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a watch can be set up to report notifications via that queue:

struct watch_notification_filter filter = {
.nr_filters = 1,
.filters = {
[0] = {
.type = WATCH_TYPE_SB_NOTIFY,
.subtype_filter[0] = UINT_MAX,
},
},
};
ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
watch_sb(AT_FDCWD, "/home/dhowells", 0, fd, 0x03);

In this case, it would let me monitor the superblock holding my own homedir
for events. After setting the watch, records will be placed into the queue
when, for example, a superblock switches between read-write and read-only.
Records are of the following format:

struct superblock_notification {
struct watch_notification watch;
__u64 sb_id;
} *n;

Where:

n->watch.type will be WATCH_TYPE_SB_NOTIFY.

n->watch.subtype will indicate the type of event, such as
NOTIFY_SUPERBLOCK_READONLY.

n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
record.

n->watch.info & WATCH_INFO_ID will be the fifth argument to
watch_sb(), shifted.

n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
R/O, and being cleared otherwise.

n->sb_id will be the ID of the superblock, as can be retrieved with
the fsinfo() syscall, as part of the fsinfo_sb_notifications
attribute in the watch_id field.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype. Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <[email protected]>
---

arch/alpha/kernel/syscalls/syscall.tbl | 1
arch/arm/tools/syscall.tbl | 1
arch/arm64/include/asm/unistd.h | 2
arch/ia64/kernel/syscalls/syscall.tbl | 1
arch/m68k/kernel/syscalls/syscall.tbl | 1
arch/microblaze/kernel/syscalls/syscall.tbl | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 1
arch/mips/kernel/syscalls/syscall_n64.tbl | 1
arch/mips/kernel/syscalls/syscall_o32.tbl | 1
arch/parisc/kernel/syscalls/syscall.tbl | 1
arch/powerpc/kernel/syscalls/syscall.tbl | 1
arch/s390/kernel/syscalls/syscall.tbl | 1
arch/sh/kernel/syscalls/syscall.tbl | 1
arch/sparc/kernel/syscalls/syscall.tbl | 1
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
arch/xtensa/kernel/syscalls/syscall.tbl | 1
fs/Kconfig | 12 +++
fs/super.c | 125 +++++++++++++++++++++++++++
include/linux/fs.h | 77 +++++++++++++++++
include/linux/syscalls.h | 2
include/uapi/asm-generic/unistd.h | 4 +
include/uapi/linux/watch_queue.h | 31 ++++++-
kernel/sys_ni.c | 1
24 files changed, 267 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index fbf0d0f5cfb3..2fa4a8008892 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -476,3 +476,4 @@
544 common fsinfo sys_fsinfo
545 common watch_devices sys_watch_devices
546 common watch_mount sys_watch_mount
+547 common watch_sb sys_watch_sb
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index a15324ed6419..29d110112053 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -450,3 +450,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index d04eb26cfaeb..24480c2d95da 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

-#define __NR_compat_syscalls 436
+#define __NR_compat_syscalls 437
#endif

#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 2e7becfa2f56..43d789bebdc5 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -357,3 +357,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 3431e8df17f5..3cc310a4aca2 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index fbe3c932c3d8..63ec96cf2856 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index e2f6e92ed8c5..fa3f3973e46d 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -375,3 +375,4 @@
434 n32 fsinfo sys_fsinfo
435 n32 watch_devices sys_watch_devices
436 n32 watch_mount sys_watch_mount
+437 n32 watch_sb sys_watch_sb
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index bdd1f98f3515..e4bb2b7fb1fe 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -351,3 +351,4 @@
434 n64 fsinfo sys_fsinfo
435 n64 watch_devices sys_watch_devices
436 n64 watch_mount sys_watch_mount
+437 n64 watch_sb sys_watch_sb
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index ff992a6fdd95..0ac3fce74d0b 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -424,3 +424,4 @@
434 o32 fsinfo sys_fsinfo
435 o32 watch_devices sys_watch_devices
436 o32 watch_mount sys_watch_mount
+437 o32 watch_sb sys_watch_sb
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 11ae6854d49c..cc841a941ebd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -433,3 +433,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 7bc79d837385..7116d18f5189 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -518,3 +518,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index e2f8785d1c4a..1048060ea28d 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
434 common fsinfo sys_fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb sys_watch_sb
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index d94d71558742..d9dcab80b9b4 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 9f7fa4f381cc..f5b052a7bd32 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -482,3 +482,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ea34893de5b9..151459569d8e 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -441,3 +441,4 @@
434 i386 fsinfo sys_fsinfo __ia32_sys_fsinfo
435 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices
436 i386 watch_mount sys_watch_mount __ia32_sys_watch_mount
+437 i386 watch_sb sys_watch_sb __ia32_sys_watch_sb
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b6f3fdbee456..cd4c854607ba 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -358,6 +358,7 @@
434 common fsinfo __x64_sys_fsinfo
435 common watch_devices __x64_sys_watch_devices
436 common watch_mount __x64_sys_watch_mount
+437 common watch_sb __x64_sys_watch_sb

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 570b23dc5582..7d07362460ba 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -407,3 +407,4 @@
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
436 common watch_mount sys_watch_mount
+437 common watch_sb sys_watch_sb
diff --git a/fs/Kconfig b/fs/Kconfig
index a26bbe27a791..fc0fa4b35f3c 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -130,6 +130,18 @@ config MOUNT_NOTIFICATIONS
device to handle the notification buffer and provides the
mount_notify() system call to enable/disable watchpoints.

+config SB_NOTIFICATIONS
+ bool "Superblock event notifications"
+ select WATCH_QUEUE
+ help
+ This option provides support for receiving superblock event
+ notifications. This makes use of the /dev/watch_queue misc device to
+ handle the notification buffer and provides the watch_sb() system
+ call to enable/disable watches.
+
+ Events can include things like changing between R/W and R/O, EIO
+ generation, ENOSPC generation and EDQUOT generation.
+
source "fs/quota/Kconfig"

source "fs/autofs/Kconfig"
diff --git a/fs/super.c b/fs/super.c
index c04f9481a708..9f631cd4f93b 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -36,6 +36,8 @@
#include <linux/lockdep.h>
#include <linux/user_namespace.h>
#include <linux/fs_context.h>
+#include <linux/syscalls.h>
+#include <linux/namei.h>
#include <uapi/linux/mount.h>
#include "internal.h"

@@ -350,6 +352,10 @@ void deactivate_locked_super(struct super_block *s)
{
struct file_system_type *fs = s->s_type;
if (atomic_dec_and_test(&s->s_active)) {
+#ifdef CONFIG_SB_NOTIFICATIONS
+ if (s->s_watchers)
+ remove_watch_list(s->s_watchers);
+#endif
cleancache_invalidate_fs(s);
unregister_shrinker(&s->s_shrink);
fs->kill_sb(s);
@@ -1022,6 +1028,8 @@ int reconfigure_super(struct fs_context *fc)
/* Needs to be ordered wrt mnt_is_readonly() */
smp_wmb();
sb->s_readonly_remount = 0;
+ notify_sb(sb, NOTIFY_SUPERBLOCK_READONLY,
+ remount_ro ? NOTIFY_SUPERBLOCK_IS_NOW_RO : 0);

/*
* Some filesystems modify their metadata via some other path than the
@@ -1825,3 +1833,120 @@ int thaw_super(struct super_block *sb)
return thaw_super_locked(sb);
}
EXPORT_SYMBOL(thaw_super);
+
+#ifdef CONFIG_SB_NOTIFICATIONS
+/*
+ * Post superblock notifications.
+ */
+void post_sb_notification(struct super_block *s, struct superblock_notification *n)
+{
+ post_watch_notification(s->s_watchers, &n->watch, current_cred(),
+ s->s_unique_id);
+}
+
+/**
+ * sys_watch_sb - Watch for superblock events.
+ * @dfd: Base directory to pathwalk from or fd referring to superblock.
+ * @filename: Path to superblock to place the watch upon
+ * @at_flags: Pathwalk control flags
+ * @watch_fd: The watch queue to send notifications to.
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch)
+ */
+SYSCALL_DEFINE5(watch_sb,
+ int, dfd,
+ const char __user *, filename,
+ unsigned int, at_flags,
+ int, watch_fd,
+ int, watch_id)
+{
+ struct watch_queue *wqueue;
+ struct super_block *s;
+ struct watch_list *wlist = NULL;
+ struct watch *watch = NULL;
+ struct path path;
+ unsigned int lookup_flags =
+ LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+ int ret;
+
+ if (watch_id < -1 || watch_id > 0xff)
+ return -EINVAL;
+ if ((at_flags & ~(AT_NO_AUTOMOUNT | AT_EMPTY_PATH)) != 0)
+ return -EINVAL;
+ if (at_flags & AT_NO_AUTOMOUNT)
+ lookup_flags &= ~LOOKUP_AUTOMOUNT;
+ if (at_flags & AT_EMPTY_PATH)
+ lookup_flags |= LOOKUP_EMPTY;
+
+ ret = user_path_at(dfd, filename, lookup_flags, &path);
+ if (ret)
+ return ret;
+
+ ret = inode_permission(path.dentry->d_inode, MAY_EXEC);
+ if (ret)
+ goto err_path;
+
+ wqueue = get_watch_queue(watch_fd);
+ if (IS_ERR(wqueue))
+ goto err_path;
+
+ s = path.dentry->d_sb;
+ if (watch_id >= 0) {
+ ret = -ENOMEM;
+ if (!s->s_watchers) {
+ wlist = kzalloc(sizeof(*wlist), GFP_KERNEL);
+ if (!wlist)
+ goto err_wqueue;
+ init_watch_list(wlist, NULL);
+ }
+
+ watch = kzalloc(sizeof(*watch), GFP_KERNEL);
+ if (!watch)
+ goto err_wlist;
+
+ init_watch(watch, wqueue);
+ watch->id = s->s_unique_id;
+ watch->private = s;
+ watch->info_id = (u32)watch_id << 24;
+
+ ret = security_watch_sb(watch, s);
+ if (ret < 0)
+ goto err_watch;
+
+ down_write(&s->s_umount);
+ ret = -EIO;
+ if (atomic_read(&s->s_active)) {
+ if (!s->s_watchers) {
+ s->s_watchers = wlist;
+ wlist = NULL;
+ }
+
+ ret = add_watch_to_object(watch, s->s_watchers);
+ if (ret == 0) {
+ spin_lock(&sb_lock);
+ s->s_count++;
+ spin_unlock(&sb_lock);
+ watch = NULL;
+ }
+ }
+ up_write(&s->s_umount);
+ } else {
+ ret = -EBADSLT;
+ if (READ_ONCE(s->s_watchers)) {
+ down_write(&s->s_umount);
+ ret = remove_watch_from_object(s->s_watchers, wqueue,
+ s->s_unique_id, false);
+ up_write(&s->s_umount);
+ }
+ }
+
+err_watch:
+ kfree(watch);
+err_wlist:
+ kfree(wlist);
+err_wqueue:
+ put_watch_queue(wqueue);
+err_path:
+ path_put(&path);
+ return ret;
+}
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 61098cded376..42adb7a391a9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -40,6 +40,7 @@
#include <linux/fs_types.h>
#include <linux/build_bug.h>
#include <linux/stddef.h>
+#include <linux/watch_queue.h>

#include <asm/byteorder.h>
#include <uapi/linux/fs.h>
@@ -1530,6 +1531,10 @@ struct super_block {

/* Superblock event notifications */
u64 s_unique_id;
+
+#ifdef CONFIG_SB_NOTIFICATIONS
+ struct watch_list *s_watchers;
+#endif
} __randomize_layout;

/* Helper functions so that in most cases filesystems will
@@ -3554,4 +3559,76 @@ static inline struct sock *io_uring_get_socket(struct file *file)
}
#endif

+extern void post_sb_notification(struct super_block *, struct superblock_notification *);
+
+/**
+ * notify_sb: Post simple superblock notification.
+ * @s: The superblock the notification is about.
+ * @subtype: The type of notification.
+ * @info: WATCH_INFO_FLAG_* flags to be set in the record.
+ */
+static inline void notify_sb(struct super_block *s,
+ enum superblock_notification_type subtype,
+ u32 info)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+ if (unlikely(s->s_watchers)) {
+ struct superblock_notification n = {
+ .watch.type = WATCH_TYPE_SB_NOTIFY,
+ .watch.subtype = subtype,
+ .watch.info = watch_sizeof(n) | info,
+ .sb_id = s->s_unique_id,
+ };
+
+ post_sb_notification(s, &n);
+ }
+
+#endif
+}
+
+/**
+ * notify_sb_error: Post superblock error notification.
+ * @s: The superblock the notification is about.
+ * @error: The error number to be recorded.
+ */
+static inline int notify_sb_error(struct super_block *s, int error)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+ if (unlikely(s->s_watchers)) {
+ struct superblock_error_notification n = {
+ .s.watch.type = WATCH_TYPE_SB_NOTIFY,
+ .s.watch.subtype = NOTIFY_SUPERBLOCK_ERROR,
+ .s.watch.info = watch_sizeof(n),
+ .s.sb_id = s->s_unique_id,
+ .error_number = error,
+ .error_cookie = 0,
+ };
+
+ post_sb_notification(s, &n.s);
+ }
+#endif
+ return error;
+}
+
+/**
+ * notify_sb_EDQUOT: Post superblock quota overrun notification.
+ * @s: The superblock the notification is about.
+ */
+static inline int notify_sb_EDQUOT(struct super_block *s)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+ if (unlikely(s->s_watchers)) {
+ struct superblock_notification n = {
+ .watch.type = WATCH_TYPE_SB_NOTIFY,
+ .watch.subtype = NOTIFY_SUPERBLOCK_EDQUOT,
+ .watch.info = watch_sizeof(n),
+ .sb_id = s->s_unique_id,
+ };
+
+ post_sb_notification(s, &n);
+ }
+#endif
+ return -EDQUOT;
+}
+
#endif /* _LINUX_FS_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 8b0ab1594a62..d27173aa22fe 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1004,6 +1004,8 @@ asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags);
asmlinkage long sys_watch_mount(int dfd, const char __user *path,
unsigned int at_flags, int watch_fd, int watch_id);
+asmlinkage long sys_watch_sb(int dfd, const char __user *path,
+ unsigned int at_flags, int watch_fd, int watch_id);

/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 85977cfa853d..f74e6fb3c314 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -850,9 +850,11 @@ __SYSCALL(__NR_fsinfo, sys_fsinfo)
__SYSCALL(__NR_watch_devices, sys_watch_devices)
#define __NR_watch_mount 436
__SYSCALL(__NR_watch_mount, sys_watch_mount)
+#define __NR_watch_sb 437
+__SYSCALL(__NR_watch_sb, sys_watch_sb)

#undef __NR_syscalls
-#define __NR_syscalls 437
+#define __NR_syscalls 438

/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 1dce57287ded..c8f0adefd8de 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -14,7 +14,8 @@ enum watch_notification_type {
WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */
WATCH_TYPE_USB_NOTIFY = 3, /* USB subsystem event notification */
WATCH_TYPE_MOUNT_NOTIFY = 4, /* Mount topology change notification */
- WATCH_TYPE___NR = 5
+ WATCH_TYPE_SB_NOTIFY = 5, /* Superblock event notification */
+ WATCH_TYPE___NR = 6
};

enum watch_meta_notification_subtype {
@@ -197,4 +198,32 @@ struct mount_notification {
__u32 changed_mount; /* The mount that got changed */
};

+/*
+ * Type of superblock notification.
+ */
+enum superblock_notification_type {
+ NOTIFY_SUPERBLOCK_READONLY = 0, /* Filesystem toggled between R/O and R/W */
+ NOTIFY_SUPERBLOCK_ERROR = 1, /* Error in filesystem or blockdev */
+ NOTIFY_SUPERBLOCK_EDQUOT = 2, /* EDQUOT notification */
+ NOTIFY_SUPERBLOCK_NETWORK = 3, /* Network status change */
+};
+
+#define NOTIFY_SUPERBLOCK_IS_NOW_RO WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
+
+/*
+ * Superblock notification record.
+ * - watch.type = WATCH_TYPE_SB_NOTIFY
+ * - watch.subtype = enum superblock_notification_type
+ */
+struct superblock_notification {
+ struct watch_notification watch; /* WATCH_TYPE_SB_NOTIFY */
+ __u64 sb_id; /* 64-bit superblock ID [fsinfo_ids::f_sb_id] */
+};
+
+struct superblock_error_notification {
+ struct superblock_notification s; /* subtype = NOTIFY_SUPERBLOCK_ERROR */
+ __u32 error_number;
+ __u32 error_cookie;
+};
+
#endif /* _UAPI_LINUX_WATCH_QUEUE_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3755d0e5d748..4d559ab64de4 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -54,6 +54,7 @@ COND_SYSCALL(io_uring_register);
COND_SYSCALL(fsinfo);
COND_SYSCALL(watch_devices);
COND_SYSCALL(watch_mount);
+COND_SYSCALL(watch_sb);

/* fs/xattr.c */


2019-06-28 15:51:25

by David Howells

Subject: [PATCH 2/6] Adjust watch_queue documentation to mention mount and superblock watches. [ver #5]

Signed-off-by: David Howells <[email protected]>
---

Documentation/watch_queue.rst | 20 +++++++++++++++++++-
drivers/misc/Kconfig | 5 +++--
2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 4087a8e670a8..1bec2018d549 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -13,6 +13,10 @@ receive notifications from the kernel. This can be used in conjunction with::

* USB subsystem event notifications

+ * Mount topology change notifications
+
+ * Superblock event notifications
+

The notifications buffers can be enabled by:

@@ -324,6 +328,19 @@ Any particular buffer can be fed from multiple sources. Sources include:
for buses and devices. Watchpoints of this type are set on the global
device watch list.

+ * WATCH_TYPE_MOUNT_NOTIFY
+
+ Notifications of this type indicate mount tree topology changes and mount
+ attribute changes. A watch can be set on a particular file or directory
+ and notifications from the path subtree rooted at that point will be
+ intercepted.
+
+ * WATCH_TYPE_SB_NOTIFY
+
+ Notifications of this type indicate superblock events, such as quota limits
+ being hit, I/O errors being produced or network server loss/reconnection.
+ Watches of this type are set directly on superblocks.
+

Event Filtering
===============
@@ -365,7 +382,8 @@ Where:
(watch.info & info_mask) == info_filter

This could be used, for example, to ignore events that are not exactly on
- the watched point in a mount tree.
+ the watched point in a mount tree by specifying that NOTIFY_MOUNT_IN_SUBTREE
+ must be 0.

* ``subtype_filter`` is a bitmask indicating the subtypes that are of
interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index e53f88783fe7..8b13103b17c0 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -11,8 +11,9 @@ config WATCH_QUEUE
help
This is a general notification queue for the kernel to pass events to
userspace through a mmap()'able ring buffer. It can be used in
- conjunction with watches for key/keyring change notifications and device
- notifications.
+ conjunction with watches for key/keyring change notifications, device
+ notifications, mount topology change notifications, and superblock
+ change notifications.

Note that in theory this should work fine with NOMMU, but I'm not
sure how to make that work.

2019-06-28 15:51:35

by David Howells

Subject: [PATCH 6/6] Add sample notification program [ver #5]

This needs to be linked with -lkeyutils.

It is run like:

./watch_test

and watches "/" for mount changes and the current session keyring for key
changes:

# keyctl add user a a @s
1035096409
# keyctl unlink 1035096409 @s
# mount -t tmpfs none /mnt/nfsv3tcp/
# umount /mnt/nfsv3tcp

producing:

# ./watch_test
ptrs h=4 t=2 m=20003
NOTIFY[00000004-00000002] ty=0003 sy=0002 i=01000010
KEY 2ffc2e5d change=2[linked] aux=1035096409
ptrs h=6 t=4 m=20003
NOTIFY[00000006-00000004] ty=0003 sy=0003 i=01000010
KEY 2ffc2e5d change=3[unlinked] aux=1035096409
ptrs h=8 t=6 m=20003
NOTIFY[00000008-00000006] ty=0001 sy=0000 i=02000010
MOUNT 00000013 change=0[new_mount] aux=168
ptrs h=a t=8 m=20003
NOTIFY[0000000a-00000008] ty=0001 sy=0001 i=02000010
MOUNT 00000013 change=1[unmount] aux=168

Other events may be produced, such as with a failing disk:

ptrs h=5 t=2 m=6000004
NOTIFY[00000005-00000002] ty=0004 sy=0006 i=04000018
BLOCK 00800050 e=6[critical medium] s=5be8

This corresponds to:

print_req_error: critical medium error, dev sdf, sector 23528 flags 0

in dmesg.

Signed-off-by: David Howells <[email protected]>
---

samples/watch_queue/watch_test.c | 76 ++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)

diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c
index f792c13614f4..0018ecac188a 100644
--- a/samples/watch_queue/watch_test.c
+++ b/samples/watch_queue/watch_test.c
@@ -30,6 +30,12 @@
#ifndef __NR_watch_devices
#define __NR_watch_devices -1
#endif
+#ifndef __NR_watch_mount
+#define __NR_watch_mount -1
+#endif
+#ifndef __NR_watch_sb
+#define __NR_watch_sb -1
+#endif

#define BUF_SIZE 4

@@ -61,6 +67,47 @@ static void saw_key_change(struct watch_notification *n)
k->key_id, n->subtype, key_subtypes[n->subtype], k->aux);
}

+static const char *mount_subtypes[256] = {
+ [NOTIFY_MOUNT_NEW_MOUNT] = "new_mount",
+ [NOTIFY_MOUNT_UNMOUNT] = "unmount",
+ [NOTIFY_MOUNT_EXPIRY] = "expiry",
+ [NOTIFY_MOUNT_READONLY] = "readonly",
+ [NOTIFY_MOUNT_SETATTR] = "setattr",
+ [NOTIFY_MOUNT_MOVE_FROM] = "move_from",
+ [NOTIFY_MOUNT_MOVE_TO] = "move_to",
+};
+
+static void saw_mount_change(struct watch_notification *n)
+{
+ struct mount_notification *m = (struct mount_notification *)n;
+ unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+ if (len != sizeof(struct mount_notification) / WATCH_LENGTH_GRANULARITY)
+ return;
+
+ printf("MOUNT %08x change=%u[%s] aux=%u\n",
+ m->triggered_on, n->subtype, mount_subtypes[n->subtype], m->changed_mount);
+}
+
+static const char *super_subtypes[256] = {
+ [NOTIFY_SUPERBLOCK_READONLY] = "readonly",
+ [NOTIFY_SUPERBLOCK_ERROR] = "error",
+ [NOTIFY_SUPERBLOCK_EDQUOT] = "edquot",
+ [NOTIFY_SUPERBLOCK_NETWORK] = "network",
+};
+
+static void saw_super_change(struct watch_notification *n)
+{
+ struct superblock_notification *s = (struct superblock_notification *)n;
+ unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+ if (len < sizeof(struct superblock_notification) / WATCH_LENGTH_GRANULARITY)
+ return;
+
+ printf("SUPER %08llx change=%u[%s]\n",
+ s->sb_id, n->subtype, super_subtypes[n->subtype]);
+}
+
static const char *block_subtypes[256] = {
[NOTIFY_BLOCK_ERROR_TIMEOUT] = "timeout",
[NOTIFY_BLOCK_ERROR_NO_SPACE] = "critical space allocation",
@@ -159,6 +206,12 @@ static int consumer(int fd, struct watch_queue_buffer *buf)
case WATCH_TYPE_USB_NOTIFY:
saw_usb_event(n);
break;
+ case WATCH_TYPE_MOUNT_NOTIFY:
+ saw_mount_change(n);
+ break;
+ case WATCH_TYPE_SB_NOTIFY:
+ saw_super_change(n);
+ break;
}

tail += (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
@@ -186,6 +239,19 @@ static struct watch_notification_filter filter = {
.type = WATCH_TYPE_USB_NOTIFY,
.subtype_filter[0] = UINT_MAX,
},
+ [3] = {
+ .type = WATCH_TYPE_MOUNT_NOTIFY,
+ // Reject move-from notifications
+ .subtype_filter[0] = UINT_MAX & ~(1 << NOTIFY_MOUNT_MOVE_FROM),
+ },
+ [4] = {
+ .type = WATCH_TYPE_SB_NOTIFY,
+ // Only accept notification of changes to R/O state
+ .subtype_filter[0] = (1 << NOTIFY_SUPERBLOCK_READONLY),
+ // Only accept notifications of change-to-R/O
+ .info_mask = WATCH_INFO_FLAG_0,
+ .info_filter = WATCH_INFO_FLAG_0,
+ },
},
};

@@ -229,5 +295,15 @@ int main(int argc, char **argv)
exit(1);
}

+ if (syscall(__NR_watch_mount, AT_FDCWD, "/", 0, fd, 0x02) == -1) {
+ perror("watch_mount");
+ exit(1);
+ }
+
+ if (syscall(__NR_watch_sb, AT_FDCWD, "/mnt", 0, fd, 0x03) == -1) {
+ perror("watch_sb");
+ exit(1);
+ }
+
return consumer(fd, buf);
}

2019-06-28 15:52:34

by David Howells

Subject: [PATCH 3/6] vfs: Add a mount-notification facility [ver #5]

Add a mount notification facility whereby notifications about changes in
mount topology and configuration can be received. Note that this only
covers vfsmount topology changes and not superblock events. A separate
facility will be added for that.

Firstly, an event queue needs to be created:

fd = open("/dev/watch_queue", O_RDWR);
ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a notification can be set up to report notifications via that queue:

struct watch_notification_filter filter = {
.nr_filters = 1,
.filters = {
[0] = {
.type = WATCH_TYPE_MOUNT_NOTIFY,
.subtype_filter[0] = UINT_MAX,
},
},
};
ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
watch_mount(AT_FDCWD, "/", 0, fd, 0x02);

In this case, it would let me monitor the mount topology subtree rooted at
"/" for events. Mount notifications propagate up the tree towards the
root, so a watch will catch all of the events happening in the subtree
rooted at the watch.
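The subtype_filter word in the filter above can be read as a bitmask indexed by subtype, which is how the sample program in this series uses it to reject move-from events. A minimal sketch of that reading follows; the subtype values are copied from enum mount_notification_subtype, while filter_all_except() and subtype_accepted() are hypothetical helpers and the bit-per-subtype interpretation is an assumption drawn from the sample, not from the kernel's filter code.

```c
#include <limits.h>
#include <stdbool.h>

/* Values as defined by enum mount_notification_subtype in this patch. */
enum {
	NOTIFY_MOUNT_NEW_MOUNT = 0,
	NOTIFY_MOUNT_UNMOUNT   = 1,
	NOTIFY_MOUNT_EXPIRY    = 2,
	NOTIFY_MOUNT_READONLY  = 3,
	NOTIFY_MOUNT_SETATTR   = 4,
	NOTIFY_MOUNT_MOVE_FROM = 5,
	NOTIFY_MOUNT_MOVE_TO   = 6,
};

/* Build a filter word that accepts every subtype except one. */
static unsigned int filter_all_except(unsigned int subtype)
{
	return UINT_MAX & ~(1u << subtype);
}

/* Assumed semantics: bit N of the word set means subtype N is accepted. */
static bool subtype_accepted(unsigned int filter_word, unsigned int subtype)
{
	return subtype < 32 && (filter_word & (1u << subtype)) != 0;
}
```

With UINT_MAX, as in the example above, every subtype below 32 passes; masking out a single bit drops just that event class before it reaches the queue.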

After setting the watch, records will be placed into the queue when, for
example, a superblock switches between read-write and read-only. Records
are of the following format:

struct mount_notification {
struct watch_notification watch;
__u32 triggered_on;
__u32 changed_mount;
} *n;

Where:

n->watch.type will be WATCH_TYPE_MOUNT_NOTIFY.

n->watch.subtype will indicate the type of event, such as
NOTIFY_MOUNT_NEW_MOUNT.

n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
record.

n->watch.info & WATCH_INFO_ID will be the fifth argument to
watch_mount(), shifted.

n->watch.info & NOTIFY_MOUNT_IN_SUBTREE if true indicates that the
notification was generated in the mount subtree rooted at the watch,
and not actually in the watch itself.

n->watch.info & NOTIFY_MOUNT_IS_RECURSIVE if true indicates that
the notification was generated by an event (e.g. SETATTR) that was
applied recursively. The notification is only generated for the
object that initially triggered it.

n->watch.info & NOTIFY_MOUNT_IS_NOW_RO will be used for
NOTIFY_MOUNT_READONLY, being set if the mount becomes R/O and
being cleared otherwise, and for NOTIFY_MOUNT_NEW_MOUNT and
NOTIFY_MOUNT_SETATTR, being set if the mount concerned is R/O.

n->watch.info & NOTIFY_MOUNT_IS_SUBMOUNT if true indicates that the
NOTIFY_MOUNT_NEW_MOUNT notification is in response to a mount
performed by the kernel (e.g. an automount).

n->triggered_on indicates the ID of the mount on which the watch
was installed.

n->changed_mount indicates the ID of the mount that was affected.
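The flag tests listed above might be gathered into one helper on the consumer side, roughly as follows. The NOTIFY_MOUNT_* names and their mapping onto WATCH_INFO_FLAG_0..3 come from the patch; the numeric values given for WATCH_INFO_FLAG_0..3 are assumptions, and struct mount_event_flags and classify_mount_event() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; the patch only names WATCH_INFO_FLAG_0..3. */
#define WATCH_INFO_FLAG_0 0x00010000u
#define WATCH_INFO_FLAG_1 0x00020000u
#define WATCH_INFO_FLAG_2 0x00040000u
#define WATCH_INFO_FLAG_3 0x00080000u

/* Mapping taken from the patch's uapi header. */
#define NOTIFY_MOUNT_IN_SUBTREE   WATCH_INFO_FLAG_0
#define NOTIFY_MOUNT_IS_RECURSIVE WATCH_INFO_FLAG_1
#define NOTIFY_MOUNT_IS_NOW_RO    WATCH_INFO_FLAG_2
#define NOTIFY_MOUNT_IS_SUBMOUNT  WATCH_INFO_FLAG_3

/* Subtype value from enum mount_notification_subtype. */
#define NOTIFY_MOUNT_NEW_MOUNT 0

/* Hypothetical decoded view of the type-specific flags. */
struct mount_event_flags {
	bool in_subtree; /* event happened below the watched point */
	bool recursive;  /* change was applied recursively */
	bool now_ro;     /* mount is (now) read-only */
	bool submount;   /* new mount was made by the kernel */
};

static struct mount_event_flags classify_mount_event(unsigned int subtype,
						     uint32_t info)
{
	struct mount_event_flags f;

	f.in_subtree = (info & NOTIFY_MOUNT_IN_SUBTREE) != 0;
	f.recursive = (info & NOTIFY_MOUNT_IS_RECURSIVE) != 0;
	f.now_ro = (info & NOTIFY_MOUNT_IS_NOW_RO) != 0;
	/* The submount flag is only described for new-mount events. */
	f.submount = subtype == NOTIFY_MOUNT_NEW_MOUNT &&
		     (info & NOTIFY_MOUNT_IS_SUBMOUNT) != 0;
	return f;
}
```

A consumer would pass n->watch.subtype and n->watch.info from a record whose type is WATCH_TYPE_MOUNT_NOTIFY.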

The mount IDs can be retrieved with the fsinfo() syscall, using the
fsinfo_mount_info and fsinfo_mount_child attributes. There are change
notification counters there too for when a buffer overrun occurs, thereby
allowing the mount tree to be quickly rescanned.
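One way the change counters could be used after an overrun is to keep the last child list that was read and compare it with a fresh one. The sketch below shows only that comparison and does not call fsinfo() itself; struct mount_child mirrors the mnt_id and change_counter fields of struct fsinfo_mount_child (the 32-bit width of mnt_id is an assumption), and changed_mounts() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct fsinfo_mount_child (field widths assumed). */
struct mount_child {
	uint32_t mnt_id;
	uint32_t change_counter;
};

/*
 * Compare a cached snapshot with a fresh one and store the IDs of mounts
 * that are new or whose change counter moved.  Returns the number of IDs
 * written to out, which is capped at out_max.
 */
static size_t changed_mounts(const struct mount_child *old, size_t n_old,
			     const struct mount_child *cur, size_t n_cur,
			     uint32_t *out, size_t out_max)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_cur && n_out < out_max; i++) {
		int unchanged = 0;

		for (size_t j = 0; j < n_old; j++) {
			if (old[j].mnt_id == cur[i].mnt_id) {
				unchanged = old[j].change_counter ==
					    cur[i].change_counter;
				break;
			}
		}
		if (!unchanged)
			out[n_out++] = cur[i].mnt_id;
	}
	return n_out;
}
```

Only the mounts reported here would need to be descended into again. Mounts that have disappeared are not reported by this helper and would need the reverse comparison.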

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype. Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <[email protected]>
---

arch/alpha/kernel/syscalls/syscall.tbl | 1
arch/arm/tools/syscall.tbl | 1
arch/arm64/include/asm/unistd.h | 2
arch/ia64/kernel/syscalls/syscall.tbl | 1
arch/m68k/kernel/syscalls/syscall.tbl | 1
arch/microblaze/kernel/syscalls/syscall.tbl | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 1
arch/mips/kernel/syscalls/syscall_n64.tbl | 1
arch/mips/kernel/syscalls/syscall_o32.tbl | 1
arch/parisc/kernel/syscalls/syscall.tbl | 1
arch/powerpc/kernel/syscalls/syscall.tbl | 1
arch/s390/kernel/syscalls/syscall.tbl | 1
arch/sh/kernel/syscalls/syscall.tbl | 1
arch/sparc/kernel/syscalls/syscall.tbl | 1
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
arch/xtensa/kernel/syscalls/syscall.tbl | 1
fs/Kconfig | 9 +
fs/Makefile | 1
fs/mount.h | 33 +++--
fs/mount_notify.c | 188 +++++++++++++++++++++++++++
fs/namespace.c | 16 ++
include/linux/dcache.h | 1
include/linux/syscalls.h | 2
include/uapi/asm-generic/unistd.h | 4 -
include/uapi/linux/watch_queue.h | 32 ++++-
kernel/sys_ni.c | 1
27 files changed, 287 insertions(+), 18 deletions(-)
create mode 100644 fs/mount_notify.c

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 1aee39ab62ac..fbf0d0f5cfb3 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -475,3 +475,4 @@
543 common fspick sys_fspick
544 common fsinfo sys_fsinfo
545 common watch_devices sys_watch_devices
+546 common watch_mount sys_watch_mount
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 35e4557af12d..a15324ed6419 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -449,3 +449,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index e8f7d95a1481..d04eb26cfaeb 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

-#define __NR_compat_syscalls 435
+#define __NR_compat_syscalls 436
#endif

#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 796e60d26d47..2e7becfa2f56 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -356,3 +356,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 64ac06b4ac16..3431e8df17f5 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -435,3 +435,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index cfba0cdbdf26..fbe3c932c3d8 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 23a9ccb23113..e2f6e92ed8c5 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -374,3 +374,4 @@
433 n32 fspick sys_fspick
434 n32 fsinfo sys_fsinfo
435 n32 watch_devices sys_watch_devices
+436 n32 watch_mount sys_watch_mount
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 43e25257fa13..bdd1f98f3515 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -350,3 +350,4 @@
433 n64 fspick sys_fspick
434 n64 fsinfo sys_fsinfo
435 n64 watch_devices sys_watch_devices
+436 n64 watch_mount sys_watch_mount
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index f3e66772e497..ff992a6fdd95 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -423,3 +423,4 @@
433 o32 fspick sys_fspick
434 o32 fsinfo sys_fsinfo
435 o32 watch_devices sys_watch_devices
+436 o32 watch_mount sys_watch_mount
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index e3237dac3acb..11ae6854d49c 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -432,3 +432,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 11e9bcf7cc83..7bc79d837385 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -517,3 +517,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index e7daacbe2d68..e2f8785d1c4a 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -438,3 +438,4 @@
433 common fspick sys_fspick sys_fspick
434 common fsinfo sys_fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount sys_watch_mount
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 6ae830c9c13a..d94d71558742 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -438,3 +438,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 860b2bd72a48..9f7fa4f381cc 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -481,3 +481,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 9ee8a11a9148..ea34893de5b9 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -440,3 +440,4 @@
433 i386 fspick sys_fspick __ia32_sys_fspick
434 i386 fsinfo sys_fsinfo __ia32_sys_fsinfo
435 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices
+436 i386 watch_mount sys_watch_mount __ia32_sys_watch_mount
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 344ffc3a98be..b6f3fdbee456 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -357,6 +357,7 @@
433 common fspick __x64_sys_fspick
434 common fsinfo __x64_sys_fsinfo
435 common watch_devices __x64_sys_watch_devices
+436 common watch_mount __x64_sys_watch_mount

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 941dae94159b..570b23dc5582 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -406,3 +406,4 @@
433 common fspick sys_fspick
434 common fsinfo sys_fsinfo
435 common watch_devices sys_watch_devices
+436 common watch_mount sys_watch_mount
diff --git a/fs/Kconfig b/fs/Kconfig
index 9e7d2f2c0111..a26bbe27a791 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -121,6 +121,15 @@ source "fs/crypto/Kconfig"

source "fs/notify/Kconfig"

+config MOUNT_NOTIFICATIONS
+ bool "Mount topology change notifications"
+ select WATCH_QUEUE
+ help
+ This option provides support for getting change notifications on the
+ mount tree topology. This makes use of the /dev/watch_queue misc
+ device to handle the notification buffer and provides the
+ watch_mount() system call to enable/disable watchpoints.
+
source "fs/quota/Kconfig"

source "fs/autofs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 26eaeae4b9a1..c6a71daf2464 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -131,3 +131,4 @@ obj-$(CONFIG_F2FS_FS) += f2fs/
obj-$(CONFIG_CEPH_FS) += ceph/
obj-$(CONFIG_PSTORE) += pstore/
obj-$(CONFIG_EFIVAR_FS) += efivarfs/
+obj-$(CONFIG_MOUNT_NOTIFICATIONS) += mount_notify.o
diff --git a/fs/mount.h b/fs/mount.h
index 65cb51f47c8c..4711e7d603a9 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -4,6 +4,7 @@
#include <linux/poll.h>
#include <linux/ns_common.h>
#include <linux/fs_pin.h>
+#include <linux/watch_queue.h>

struct mnt_namespace {
atomic_t count;
@@ -67,10 +68,14 @@ struct mount {
int mnt_id; /* mount identifier */
int mnt_group_id; /* peer group identifier */
int mnt_expiry_mark; /* true if marked for expiry */
+ int mnt_nr_watchers; /* The number of subtree watches tracking this */
struct hlist_head mnt_pins;
struct fs_pin mnt_umount;
struct dentry *mnt_ex_mountpoint;
atomic_t mnt_change_counter; /* Number of changed applied */
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+ struct watch_list *mnt_watchers; /* Watches on dentries within this mount */
+#endif
} __randomize_layout;

#define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
@@ -153,18 +158,8 @@ static inline bool is_anon_ns(struct mnt_namespace *ns)
return ns->seq == 0;
}

-/*
- * Type of mount topology change notification.
- */
-enum mount_notification_subtype {
- NOTIFY_MOUNT_NEW_MOUNT = 0, /* New mount added */
- NOTIFY_MOUNT_UNMOUNT = 1, /* Mount removed manually */
- NOTIFY_MOUNT_EXPIRY = 2, /* Automount expired */
- NOTIFY_MOUNT_READONLY = 3, /* Mount R/O state changed */
- NOTIFY_MOUNT_SETATTR = 4, /* Mount attributes changed */
- NOTIFY_MOUNT_MOVE_FROM = 5, /* Mount moved from here */
- NOTIFY_MOUNT_MOVE_TO = 6, /* Mount moved to here (compare op_id) */
-};
+extern void post_mount_notification(struct mount *changed,
+ struct mount_notification *notify);

static inline void notify_mount(struct mount *changed,
struct mount *aux,
@@ -172,4 +167,18 @@ static inline void notify_mount(struct mount *changed,
u32 info_flags)
{
atomic_inc(&changed->mnt_change_counter);
+
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+ {
+ struct mount_notification n = {
+ .watch.type = WATCH_TYPE_MOUNT_NOTIFY,
+ .watch.subtype = subtype,
+ .watch.info = info_flags | watch_sizeof(n),
+ .triggered_on = changed->mnt_id,
+ .changed_mount = aux ? aux->mnt_id : 0,
+ };
+
+ post_mount_notification(changed, &n);
+ }
+#endif
}
diff --git a/fs/mount_notify.c b/fs/mount_notify.c
new file mode 100644
index 000000000000..a8d6187c6262
--- /dev/null
+++ b/fs/mount_notify.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Provide mount topology/attribute change notifications.
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/syscalls.h>
+#include <linux/slab.h>
+#include <linux/security.h>
+#include "mount.h"
+
+/*
+ * Post mount notifications to all watches going rootwards along the tree.
+ *
+ * Must be called with the mount_lock held.
+ */
+void post_mount_notification(struct mount *changed,
+ struct mount_notification *notify)
+{
+ const struct cred *cred = current_cred();
+ struct path cursor;
+ struct mount *mnt;
+ unsigned seq;
+
+ seq = 0;
+ rcu_read_lock();
+restart:
+ cursor.mnt = &changed->mnt;
+ cursor.dentry = changed->mnt.mnt_root;
+ mnt = real_mount(cursor.mnt);
+ notify->watch.info &= ~NOTIFY_MOUNT_IN_SUBTREE;
+
+ read_seqbegin_or_lock(&rename_lock, &seq);
+ for (;;) {
+ if (mnt->mnt_watchers &&
+ !hlist_empty(&mnt->mnt_watchers->watchers)) {
+ if (cursor.dentry->d_flags & DCACHE_MOUNT_WATCH)
+ post_watch_notification(mnt->mnt_watchers,
+ &notify->watch, cred,
+ (unsigned long)cursor.dentry);
+ } else {
+ cursor.dentry = mnt->mnt.mnt_root;
+ }
+ notify->watch.info |= NOTIFY_MOUNT_IN_SUBTREE;
+
+ if (cursor.dentry == cursor.mnt->mnt_root ||
+ IS_ROOT(cursor.dentry)) {
+ struct mount *parent = READ_ONCE(mnt->mnt_parent);
+
+ /* Escaped? */
+ if (cursor.dentry != cursor.mnt->mnt_root)
+ break;
+
+ /* Global root? */
+ if (mnt == parent)
+ break;
+
+ cursor.dentry = READ_ONCE(mnt->mnt_mountpoint);
+ mnt = parent;
+ cursor.mnt = &mnt->mnt;
+ } else {
+ cursor.dentry = cursor.dentry->d_parent;
+ }
+ }
+
+ if (need_seqretry(&rename_lock, seq)) {
+ seq = 1;
+ goto restart;
+ }
+
+ done_seqretry(&rename_lock, seq);
+ rcu_read_unlock();
+}
+
+static void release_mount_watch(struct watch *watch)
+{
+ struct dentry *dentry = (struct dentry *)(unsigned long)watch->id;
+
+ dput(dentry);
+}
+
+/**
+ * sys_watch_mount - Watch for mount topology/attribute changes
+ * @dfd: Base directory to pathwalk from or fd referring to mount.
+ * @filename: Path to mount to place the watch upon
+ * @at_flags: Pathwalk control flags
+ * @watch_fd: The watch queue to send notifications to.
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch)
+ */
+SYSCALL_DEFINE5(watch_mount,
+ int, dfd,
+ const char __user *, filename,
+ unsigned int, at_flags,
+ int, watch_fd,
+ int, watch_id)
+{
+ struct watch_queue *wqueue;
+ struct watch_list *wlist = NULL;
+ struct watch *watch;
+ struct mount *m;
+ struct path path;
+ unsigned int lookup_flags =
+ LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+ int ret;
+
+ if (watch_id < -1 || watch_id > 0xff)
+ return -EINVAL;
+ if ((at_flags & ~(AT_NO_AUTOMOUNT | AT_EMPTY_PATH)) != 0)
+ return -EINVAL;
+ if (at_flags & AT_NO_AUTOMOUNT)
+ lookup_flags &= ~LOOKUP_AUTOMOUNT;
+ if (at_flags & AT_EMPTY_PATH)
+ lookup_flags |= LOOKUP_EMPTY;
+
+ ret = user_path_at(dfd, filename, lookup_flags, &path);
+ if (ret)
+ return ret;
+
+ ret = inode_permission(path.dentry->d_inode, MAY_EXEC);
+ if (ret)
+ goto err_path;
+
+ wqueue = get_watch_queue(watch_fd);
+ if (IS_ERR(wqueue))
+ goto err_path;
+
+ m = real_mount(path.mnt);
+
+ if (watch_id >= 0) {
+ ret = -ENOMEM;
+ if (!m->mnt_watchers) {
+ wlist = kzalloc(sizeof(*wlist), GFP_KERNEL);
+ if (!wlist)
+ goto err_wqueue;
+ init_watch_list(wlist, release_mount_watch);
+ }
+
+ watch = kzalloc(sizeof(*watch), GFP_KERNEL);
+ if (!watch)
+ goto err_wlist;
+
+ init_watch(watch, wqueue);
+ watch->id = (unsigned long)path.dentry;
+ watch->info_id = (u32)watch_id << 24;
+
+ ret = security_watch_mount(watch, &path);
+ if (ret < 0)
+ goto err_watch;
+
+ down_write(&m->mnt.mnt_sb->s_umount);
+ if (!m->mnt_watchers) {
+ m->mnt_watchers = wlist;
+ wlist = NULL;
+ }
+
+ ret = add_watch_to_object(watch, m->mnt_watchers);
+ if (ret == 0) {
+ spin_lock(&path.dentry->d_lock);
+ path.dentry->d_flags |= DCACHE_MOUNT_WATCH;
+ spin_unlock(&path.dentry->d_lock);
+ dget(path.dentry);
+ watch = NULL;
+ }
+ up_write(&m->mnt.mnt_sb->s_umount);
+ } else {
+ ret = -EBADSLT;
+ if (m->mnt_watchers) {
+ down_write(&m->mnt.mnt_sb->s_umount);
+ ret = remove_watch_from_object(m->mnt_watchers, wqueue,
+ (unsigned long)path.dentry,
+ false);
+ up_write(&m->mnt.mnt_sb->s_umount);
+ }
+ }
+
+err_watch:
+ kfree(watch);
+err_wlist:
+ kfree(wlist);
+err_wqueue:
+ put_watch_queue(wqueue);
+err_path:
+ path_put(&path);
+ return ret;
+}
diff --git a/fs/namespace.c b/fs/namespace.c
index 925602b8c329..71cbd192a306 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -183,6 +183,10 @@ unsigned int mnt_get_count(struct mount *mnt)
static void drop_mountpoint(struct fs_pin *p)
{
struct mount *m = container_of(p, struct mount, mnt_umount);
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+ if (m->mnt_watchers)
+ remove_watch_list(m->mnt_watchers);
+#endif
dput(m->mnt_ex_mountpoint);
pin_remove(p);
mntput(&m->mnt);
@@ -515,7 +519,8 @@ static int mnt_make_readonly(struct mount *mnt)
mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
unlock_mount_hash();
if (ret == 0)
- notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, 0x10000);
+ notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY,
+ NOTIFY_MOUNT_IS_NOW_RO);
return ret;
}

@@ -2113,7 +2118,11 @@ static int attach_recursive_mnt(struct mount *source_mnt,
list_del_init(&source_mnt->mnt_ns->list);
}
mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
- notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_NEW_MOUNT, 0);
+ notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_NEW_MOUNT,
+ (source_mnt->mnt.mnt_sb->s_flags & SB_RDONLY ?
+ NOTIFY_MOUNT_IS_NOW_RO : 0) |
+ (source_mnt->mnt.mnt_sb->s_flags & SB_SUBMOUNT ?
+ NOTIFY_MOUNT_IS_SUBMOUNT : 0));
commit_tree(source_mnt);
}

@@ -2490,7 +2499,8 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
mnt->mnt.mnt_flags = mnt_flags;
touch_mnt_namespace(mnt->mnt_ns);
unlock_mount_hash();
- notify_mount(mnt, NULL, NOTIFY_MOUNT_SETATTR, 0);
+ notify_mount(mnt, NULL, NOTIFY_MOUNT_SETATTR,
+ (mnt_flags & SB_RDONLY ? NOTIFY_MOUNT_IS_NOW_RO : 0));
}

/*
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index f14e587c5d5d..a9e5b0070d6d 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -217,6 +217,7 @@ struct dentry_operations {
#define DCACHE_PAR_LOOKUP 0x10000000 /* being looked up (with parent locked shared) */
#define DCACHE_DENTRY_CURSOR 0x20000000
#define DCACHE_NORCU 0x40000000 /* No RCU delay for freeing */
+#define DCACHE_MOUNT_WATCH 0x80000000 /* There's a mount watch here */

extern seqlock_t rename_lock;

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 52cc2dd6d5aa..8b0ab1594a62 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1002,6 +1002,8 @@ asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
struct fsinfo_params __user *params,
void __user *buffer, size_t buf_size);
asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags);
+asmlinkage long sys_watch_mount(int dfd, const char __user *path,
+ unsigned int at_flags, int watch_fd, int watch_id);

/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8ed4e1c73f6a..85977cfa853d 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -848,9 +848,11 @@ __SYSCALL(__NR_fspick, sys_fspick)
__SYSCALL(__NR_fsinfo, sys_fsinfo)
#define __NR_watch_devices 435
__SYSCALL(__NR_watch_devices, sys_watch_devices)
+#define __NR_watch_mount 436
+__SYSCALL(__NR_watch_mount, sys_watch_mount)

#undef __NR_syscalls
-#define __NR_syscalls 436
+#define __NR_syscalls 437

/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 7e695ac43104..1dce57287ded 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -13,7 +13,8 @@ enum watch_notification_type {
WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */
WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */
WATCH_TYPE_USB_NOTIFY = 3, /* USB subsystem event notification */
- WATCH_TYPE___NR = 4
+ WATCH_TYPE_MOUNT_NOTIFY = 4, /* Mount topology change notification */
+ WATCH_TYPE___NR = 5
};

enum watch_meta_notification_subtype {
@@ -167,4 +168,33 @@ struct usb_notification {

#define USB_NOTIFICATION_MAX_NAME_LEN 63

+/*
+ * Type of mount topology change notification.
+ */
+enum mount_notification_subtype {
+ NOTIFY_MOUNT_NEW_MOUNT = 0, /* New mount added */
+ NOTIFY_MOUNT_UNMOUNT = 1, /* Mount removed manually */
+ NOTIFY_MOUNT_EXPIRY = 2, /* Automount expired */
+ NOTIFY_MOUNT_READONLY = 3, /* Mount R/O state changed */
+ NOTIFY_MOUNT_SETATTR = 4, /* Mount attributes changed */
+ NOTIFY_MOUNT_MOVE_FROM = 5, /* Mount moved from here */
+ NOTIFY_MOUNT_MOVE_TO = 6, /* Mount moved to here (compare op_id) */
+};
+
+#define NOTIFY_MOUNT_IN_SUBTREE WATCH_INFO_FLAG_0 /* Event not actually at watched dentry */
+#define NOTIFY_MOUNT_IS_RECURSIVE WATCH_INFO_FLAG_1 /* Change applied recursively */
+#define NOTIFY_MOUNT_IS_NOW_RO WATCH_INFO_FLAG_2 /* Mount changed to R/O */
+#define NOTIFY_MOUNT_IS_SUBMOUNT WATCH_INFO_FLAG_3 /* New mount is submount */
+
+/*
+ * Mount topology/configuration change notification record.
+ * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
+ * - watch.subtype = enum mount_notification_subtype
+ */
+struct mount_notification {
+ struct watch_notification watch; /* WATCH_TYPE_MOUNT_NOTIFY */
+ __u32 triggered_on; /* The mount that the notify was on */
+ __u32 changed_mount; /* The mount that got changed */
+};
+
#endif /* _UAPI_LINUX_WATCH_QUEUE_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 9d583aae405f..3755d0e5d748 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -53,6 +53,7 @@ COND_SYSCALL(io_uring_enter);
COND_SYSCALL(io_uring_register);
COND_SYSCALL(fsinfo);
COND_SYSCALL(watch_devices);
+COND_SYSCALL(watch_mount);

/* fs/xattr.c */


2019-06-28 15:52:42

by David Howells

Subject: [PATCH 5/6] fsinfo: Export superblock notification counter [ver #5]

Provide an fsinfo attribute to export the superblock notification counter
so that it can be polled in the case of a notification buffer overrun.
This is accessed with:

struct fsinfo_params params = {
.request = FSINFO_ATTR_SB_NOTIFICATIONS,
};

and returns a structure that looks like:

struct fsinfo_sb_notifications {
__u64 watch_id;
__u32 notify_counter;
__u32 __reserved[1];
};

Where watch_id is a number uniquely identifying the superblock in
notification records and notify_counter is incremented for each
superblock notification posted.
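Because notify_counter is a 32-bit value, a poller comparing two readings probably wants wraparound-safe arithmetic. A minimal sketch, assuming the caller remembers the counter value from its previous fsinfo() call and counts the superblock records it has consumed since then (both helper names are hypothetical):

```c
#include <stdint.h>

/* Number of notifications posted between two readings of the counter. */
static uint32_t sb_notifications_since(uint32_t last_seen, uint32_t now)
{
	/* The result is reduced modulo 2^32, so a wrapped counter is fine. */
	return now - last_seen;
}

/* Non-zero if fewer records were received than the counter says were posted. */
static int sb_events_were_lost(uint32_t last_seen, uint32_t now,
			       uint32_t received)
{
	return sb_notifications_since(last_seen, now) != received;
}
```

Note that a filter installed on the queue can also drop records, so a mismatch here is a hint to re-read the superblock state rather than proof of an overrun.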

Signed-off-by: David Howells <[email protected]>
---

fs/fsinfo.c | 12 ++++++++++++
fs/super.c | 1 +
include/linux/fs.h | 1 +
include/uapi/linux/fsinfo.h | 10 ++++++++++
include/uapi/linux/watch_queue.h | 2 +-
samples/vfs/test-fsinfo.c | 13 +++++++++++++
6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index 758d1cbf8eba..a328f659ecb3 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -321,6 +321,16 @@ void fsinfo_note_sb_params(struct fsinfo_kparams *params, unsigned int s_flags)
}
EXPORT_SYMBOL(fsinfo_note_sb_params);

+static int fsinfo_generic_sb_notifications(struct path *path,
+ struct fsinfo_sb_notifications *p)
+{
+ struct super_block *sb = path->dentry->d_sb;
+
+ p->watch_id = sb->s_unique_id;
+ p->notify_counter = atomic_read(&sb->s_notify_counter);
+ return sizeof(*p);
+}
+
static int fsinfo_generic_parameters(struct path *path,
struct fsinfo_kparams *params)
{
@@ -357,6 +367,7 @@ int generic_fsinfo(struct path *path, struct fsinfo_kparams *params)
case _genp(MOUNT_DEVNAME, mount_devname);
case _genp(MOUNT_CHILDREN, mount_children);
case _genp(MOUNT_SUBMOUNT, mount_submount);
+ case _gen(SB_NOTIFICATIONS, sb_notifications);
default:
return -EOPNOTSUPP;
}
@@ -645,6 +656,7 @@ static const struct fsinfo_attr_info fsinfo_buffer_info[FSINFO_ATTR__NR] = {
FSINFO_STRING (MOUNT_DEVNAME),
FSINFO_STRUCT_ARRAY (MOUNT_CHILDREN, mount_child),
FSINFO_STRING_N (MOUNT_SUBMOUNT),
+ FSINFO_STRUCT (SB_NOTIFICATIONS, sb_notifications),
};

/**
diff --git a/fs/super.c b/fs/super.c
index 9f631cd4f93b..b338d2c6aca4 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1840,6 +1840,7 @@ EXPORT_SYMBOL(thaw_super);
*/
void post_sb_notification(struct super_block *s, struct superblock_notification *n)
{
+ atomic_inc(&s->s_notify_counter);
post_watch_notification(s->s_watchers, &n->watch, current_cred(),
s->s_unique_id);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 42adb7a391a9..25586732b127 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1535,6 +1535,7 @@ struct super_block {
#ifdef CONFIG_SB_NOTIFICATIONS
struct watch_list *s_watchers;
#endif
+ atomic_t s_notify_counter;
} __randomize_layout;

/* Helper functions so that in most cases filesystems will
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 401ad9625c11..b9b3026a40a1 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -39,6 +39,7 @@ enum fsinfo_attribute {
FSINFO_ATTR_MOUNT_DEVNAME = 21, /* Mount object device name (string) */
FSINFO_ATTR_MOUNT_CHILDREN = 22, /* Submount list (array) */
FSINFO_ATTR_MOUNT_SUBMOUNT = 23, /* Relative path of Nth submount (string) */
+ FSINFO_ATTR_SB_NOTIFICATIONS = 24, /* sb_notify() information */
FSINFO_ATTR__NR
};

@@ -316,4 +317,13 @@ struct fsinfo_mount_child {
__u32 change_counter; /* Number of changes applied to mount. */
};

+/*
+ * Information struct for fsinfo(FSINFO_ATTR_SB_NOTIFICATIONS).
+ */
+struct fsinfo_sb_notifications {
+ __u64 watch_id; /* Watch ID for superblock. */
+ __u32 notify_counter; /* Number of notifications. */
+ __u32 __reserved[1];
+};
+
#endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index c8f0adefd8de..11d1d24b83cb 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -217,7 +217,7 @@ enum superblock_notification_type {
*/
struct superblock_notification {
struct watch_notification watch; /* WATCH_TYPE_SB_NOTIFY */
- __u64 sb_id; /* 64-bit superblock ID [fsinfo_ids::f_sb_id] */
+ __u64 sb_id; /* 64-bit superblock ID [fsinfo_sb_notifications::watch_id] */
};

struct superblock_error_notification {
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 28c9f3cd2c8c..6cac56bbfe4f 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -90,6 +90,7 @@ static const struct fsinfo_attr_info fsinfo_buffer_info[FSINFO_ATTR__NR] = {
FSINFO_STRING (MOUNT_DEVNAME, mount_devname),
FSINFO_STRUCT_ARRAY (MOUNT_CHILDREN, mount_child),
FSINFO_STRING_N (MOUNT_SUBMOUNT, mount_submount),
+ FSINFO_STRUCT (SB_NOTIFICATIONS, sb_notifications),
};

#define FSINFO_NAME(X,Y) [FSINFO_ATTR_##X] = #Y
@@ -118,6 +119,7 @@ static const char *fsinfo_attr_names[FSINFO_ATTR__NR] = {
FSINFO_NAME (MOUNT_DEVNAME, mount_devname),
FSINFO_NAME (MOUNT_CHILDREN, mount_children),
FSINFO_NAME (MOUNT_SUBMOUNT, mount_submount),
+ FSINFO_NAME (SB_NOTIFICATIONS, sb_notifications),
};

union reply {
@@ -133,6 +135,7 @@ union reply {
struct fsinfo_server_address srv_addr;
struct fsinfo_mount_info mount_info;
struct fsinfo_mount_child mount_children[1];
+ struct fsinfo_sb_notifications sb_notifications;
};

static void dump_hex(unsigned int *data, int from, int to)
@@ -384,6 +387,15 @@ static void dump_attr_MOUNT_CHILDREN(union reply *r, int size)
printf("\t[%u] %8x %8x\n", i++, f->mnt_id, f->change_counter);
}

+static void dump_attr_SB_NOTIFICATIONS(union reply *r, int size)
+{
+ struct fsinfo_sb_notifications *f = &r->sb_notifications;
+
+ printf("\n");
+ printf("\twatch_id: %llx\n", (unsigned long long)f->watch_id);
+ printf("\tnotifs : %llx\n", (unsigned long long)f->notify_counter);
+}
+
/*
*
*/
@@ -402,6 +414,7 @@ static const dumper_t fsinfo_attr_dumper[FSINFO_ATTR__NR] = {
FSINFO_DUMPER(SERVER_ADDRESS),
FSINFO_DUMPER(MOUNT_INFO),
FSINFO_DUMPER(MOUNT_CHILDREN),
+ FSINFO_DUMPER(SB_NOTIFICATIONS),
};

static void dump_fsinfo(enum fsinfo_attribute attr,
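
As a rough illustration of how a consumer might use the new attribute, the sketch below compares two snapshots of the counter to see how many notifications were posted in between. It is not taken from the patch: struct example_sb_notifications is a local mirror of the fsinfo_sb_notifications layout shown above, and notifs_between() and counter_demo() are hypothetical helpers. It assumes the counter advances once per notification and that both snapshots refer to the same watch_id.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct fsinfo_sb_notifications from the diff above
 * (hypothetical name; the real struct comes from the uapi header). */
struct example_sb_notifications {
	uint64_t watch_id;		/* Watch ID for superblock */
	uint32_t notify_counter;	/* Number of notifications */
	uint32_t reserved[1];
};

/* Notifications posted between two snapshots of the same superblock.
 * Unsigned 32-bit subtraction gives the right answer across wraparound. */
static uint32_t notifs_between(const struct example_sb_notifications *before,
			       const struct example_sb_notifications *after)
{
	return (uint32_t)(after->notify_counter - before->notify_counter);
}

/* Small self-check with made-up values. */
void counter_demo(void)
{
	struct example_sb_notifications a = { .watch_id = 0x10, .notify_counter = 0xfffffffeu };
	struct example_sb_notifications b = { .watch_id = 0x10, .notify_counter = 1 };

	/* 0xfffffffe -> 0xffffffff -> 0 -> 1 is three steps. */
	assert(notifs_between(&a, &b) == 3);
}
```

A caller could compare that delta with the number of records it actually read from the queue to spot lost notifications, though whether the counter is intended for that purpose is an assumption here.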

2019-07-01 03:21:33

by Randy Dunlap

Subject: Re: [PATCH 2/6] Adjust watch_queue documentation to mention mount and superblock watches. [ver #5]

Hi David,

On 6/28/19 8:50 AM, David Howells wrote:
> Signed-off-by: David Howells <[email protected]>
> ---
>
> Documentation/watch_queue.rst | 20 +++++++++++++++++++-
> drivers/misc/Kconfig | 5 +++--
> 2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
> index 4087a8e670a8..1bec2018d549 100644
> --- a/Documentation/watch_queue.rst
> +++ b/Documentation/watch_queue.rst
> @@ -13,6 +13,10 @@ receive notifications from the kernel. This can be used in conjunction with::
>
> * USB subsystem event notifications
>
> + * Mount topology change notifications
> +
> + * Superblock event notifications
> +
>
> The notifications buffers can be enabled by:
>
> @@ -324,6 +328,19 @@ Any particular buffer can be fed from multiple sources. Sources include:
> for buses and devices. Watchpoints of this type are set on the global
> device watch list.
>
> + * WATCH_TYPE_MOUNT_NOTIFY
> +
> + Notifications of this type indicate mount tree topology changes and mount
> + attribute changes. A watch can be set on a particular file or directory
> + and notifications from the path subtree rooted at that point will be
> + intercepted.
> +
> + * WATCH_TYPE_SB_NOTIFY
> +
> + Notifications of this type indicate superblock events, such as quota limits
> + being hit, I/O errors being produced or network server loss/reconnection.
> + Watches of this type are set directly on superblocks.
> +
>
> Event Filtering
> ===============
> @@ -365,7 +382,8 @@ Where:
> (watch.info & info_mask) == info_filter
>
> This could be used, for example, to ignore events that are not exactly on
> - the watched point in a mount tree.
> + the watched point in a mount tree by specifying NOTIFY_MOUNT_IN_SUBTREE
> + must be 0.

I'm having a little trouble parsing that sentence.
Could you clarify it or maybe rewrite/modify it?
Thanks.

>
> * ``subtype_filter`` is a bitmask indicating the subtypes that are of
> interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to



--
~Randy

2019-07-01 08:53:29

by David Howells

Subject: Re: [PATCH 2/6] Adjust watch_queue documentation to mention mount and superblock watches. [ver #5]

Randy Dunlap <[email protected]> wrote:

> I'm having a little trouble parsing that sentence.
> Could you clarify it or maybe rewrite/modify it?
> Thanks.

How about:

* ``info_filter`` and ``info_mask`` act as a filter on the info field of the
notification record. The notification is only written into the buffer if::

(watch.info & info_mask) == info_filter

This could be used, for example, to ignore events that are not exactly on
the watched point in a mount tree by specifying that NOTIFY_MOUNT_IN_SUBTREE
must not be set, e.g.::

{
.type = WATCH_TYPE_MOUNT_NOTIFY,
.info_filter = 0,
.info_mask = NOTIFY_MOUNT_IN_SUBTREE,
.subtype_filter = ...,
}

as an event would only pass this filter if::

(watch.info & NOTIFY_MOUNT_IN_SUBTREE) == 0
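
To make the test concrete, here is a small userspace sketch of the same comparison. It is only an illustration: EXAMPLE_NOTIFY_MOUNT_IN_SUBTREE is a made-up bit value standing in for the real flag defined by the patch set, and info_passes() and filter_demo() are hypothetical helpers rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up bit value for illustration; the real NOTIFY_MOUNT_IN_SUBTREE
 * is defined by the patch set and may differ. */
#define EXAMPLE_NOTIFY_MOUNT_IN_SUBTREE 0x00010000u

/* The documented test: (watch.info & info_mask) == info_filter */
static bool info_passes(uint32_t info, uint32_t info_mask, uint32_t info_filter)
{
	return (info & info_mask) == info_filter;
}

/* Small self-check using the filter from the example above:
 * info_filter = 0, info_mask = the IN_SUBTREE bit. */
void filter_demo(void)
{
	const uint32_t mask = EXAMPLE_NOTIFY_MOUNT_IN_SUBTREE;
	const uint32_t filter = 0;

	/* Event on the watched point itself: flag clear, so it passes. */
	assert(info_passes(0x3, mask, filter));

	/* Event deeper in the subtree: flag set, so it is dropped. */
	assert(!info_passes(0x3 | EXAMPLE_NOTIFY_MOUNT_IN_SUBTREE, mask, filter));
}
```

Bits outside info_mask (0x3 above is arbitrary) do not affect the outcome, which is why only the flag named in the mask decides whether the record is written.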

David

2019-07-01 15:36:24

by Randy Dunlap

Subject: Re: [PATCH 2/6] Adjust watch_queue documentation to mention mount and superblock watches. [ver #5]

On 7/1/19 1:52 AM, David Howells wrote:
> Randy Dunlap <[email protected]> wrote:
>
>> I'm having a little trouble parsing that sentence.
>> Could you clarify it or maybe rewrite/modify it?
>> Thanks.
>
> How about:
>
> * ``info_filter`` and ``info_mask`` act as a filter on the info field of the
> notification record. The notification is only written into the buffer if::
>
> (watch.info & info_mask) == info_filter
>
> This could be used, for example, to ignore events that are not exactly on
> the watched point in a mount tree by specifying that NOTIFY_MOUNT_IN_SUBTREE
> must not be set, e.g.::
>
> {
> .type = WATCH_TYPE_MOUNT_NOTIFY,
> .info_filter = 0,
> .info_mask = NOTIFY_MOUNT_IN_SUBTREE,
> .subtype_filter = ...,
> }
>
> as an event would only pass this filter if::
>
> (watch.info & NOTIFY_MOUNT_IN_SUBTREE) == 0
>
> David
>

Yes, better. Thanks.

--
~Randy

2019-07-12 20:13:39

by James Morris

Subject: Re: [PATCH 1/6] security: Add hooks to rule on setting a superblock or mount watch [ver #5]

On Fri, 28 Jun 2019, David Howells wrote:

> Add security hooks that will allow an LSM to rule on whether or not a watch
> may be set on a mount or on a superblock. More than one hook is required
> as the watches watch different types of object.
>
> Signed-off-by: David Howells <[email protected]>
> cc: Casey Schaufler <[email protected]>
> cc: Stephen Smalley <[email protected]>
> cc: [email protected]
> ---
>
> include/linux/lsm_hooks.h | 16 ++++++++++++++++
> include/linux/security.h | 10 ++++++++++
> security/security.c | 10 ++++++++++
> 3 files changed, 36 insertions(+)


Acked-by: James Morris <[email protected]>


--
James Morris
<[email protected]>