Hi Linus,
If you could consider pulling this - or would you prefer it to go through
Al? It adds a couple of VFS-related event sources for the general
notification mechanism:
(1) Mount topology events, such as mounting, unmounting, mount expiry,
mount reconfiguration.
(2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
errors (not complete yet).
WHY
===
(1) Mount notifications.
This one is wanted to avoid repeated trawling of /proc/mounts or
similar to work out changes to the mount object attributes and mount
topology. I'm told that the proc file holding the namespace_sem is a
point of contention, especially as the process of generating the text
descriptions of the mounts/superblocks can be quite involved.
Whilst you can use poll() on /proc/mounts, it doesn't give you any
clues as to what changed. The notification generated here directly
indicates the mounts involved in any particular event and gives an
idea of what the change was.
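To illustrate the cost being described, here is a minimal sketch of what a poll()-based watcher has to do today: poll() only reports that something changed, so the whole table is re-read and diffed against the previous snapshot. The helper names (parse_mounts, diff_mounts, watch_mounts) are made up for this example, and keying the table by mount point is a simplification that ignores stacked mounts.

```python
import select


def parse_mounts(text):
    """Map mount point -> (device, fstype, options) from /proc/mounts-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # Simplification: a later entry for the same mount point replaces
        # an earlier one, so stacked mounts are not represented.
        table[mount_point] = (device, fstype, options)
    return table


def diff_mounts(old, new):
    """Return (added, removed, changed) mount points between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(m for m in set(old) & set(new) if old[m] != new[m])
    return added, removed, changed


def watch_mounts(path="/proc/self/mounts"):
    """Yield a diff each time poll() signals that the mount table changed."""
    with open(path) as f:
        old = parse_mounts(f.read())
        poller = select.poll()
        # The proc mounts file reports changes as POLLERR | POLLPRI.
        poller.register(f, select.POLLERR | select.POLLPRI)
        while True:
            poller.poll()  # blocks; says only that *something* changed
            f.seek(0)
            new = parse_mounts(f.read())  # full re-read on every wakeup
            yield diff_mounts(old, new)
            old = new
```

Every wakeup pays for regenerating and reparsing the full text, which is the repeated trawling the notifications are meant to avoid.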
This is combined with a new fsinfo() system call that allows, amongst
other things, the ability to retrieve in one go an { id,
change_counter } tuple from all the children of a specified mount,
allowing buffer overruns to be dealt with quickly.
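As a rough sketch of how such tuples could be used after an overrun: the watcher keeps a cache of { id, change_counter } per child mount and compares it with a fresh list, so only the mounts whose counters moved need a closer look. How fsinfo() actually returns this data is not shown here; the dict representation and the function name find_stale_mounts are assumptions made for illustration.

```python
def find_stale_mounts(cached, current):
    """Compare two {mount_id: change_counter} maps.

    'cached' is what the watcher last saw; 'current' is a fresh listing
    (assumed to come from a single fsinfo()-style query, modelled here
    as a plain dict). Returns (changed, appeared, vanished) id lists.
    """
    changed = sorted(i for i in current if i in cached and cached[i] != current[i])
    appeared = sorted(i for i in current if i not in cached)
    vanished = sorted(i for i in cached if i not in current)
    return changed, appeared, vanished
```

For example, with a cache of {1: 5, 2: 7, 3: 1} and a fresh listing of {1: 5, 2: 9, 4: 0}, mount 2 is reported as changed, mount 4 as new and mount 3 as gone, while mount 1 needs no further work.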
This can be used by systemd to improve efficiency:
https://lore.kernel.org/linux-fsdevel/[email protected]/
And it's not just Red Hat that's potentially interested in this:
https://lore.kernel.org/linux-fsdevel/[email protected]/
Also, this can be used to improve management of containers by allowing
watches to be set in foreign mount namespaces, such as are in a
container.
(2) Superblock notifications.
This one is provided to allow systemd or the desktop to more easily
detect events such as I/O errors and EDQUOT/ENOSPC. This would be of
interest to Postgres:
https://lore.kernel.org/linux-fsdevel/[email protected]/
But could also be used to indicate to systemd when a superblock has
had its configuration changed.
Thanks,
David
---
The following changes since commit 694435dbde3d1da79aafaf4cd680802f9eb229b7:
smack: Implement the watch_key and post_notification hooks (2020-03-19 17:31:09 +0000)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-fs-20200330
for you to fetch changes up to 8dbf1aa122da5bbb4ede0f363a8a18dfc723be33:
watch_queue: sample: Display superblock notifications (2020-03-19 17:31:09 +0000)
----------------------------------------------------------------
Filesystem notifications
----------------------------------------------------------------
David Howells (6):
watch_queue: Add security hooks to rule on setting mount and sb watches
watch_queue: Implement mount topology and attribute change notifications
watch_queue: sample: Display mount tree change notifications
watch_queue: Introduce a non-repeating system-unique superblock ID
watch_queue: Add superblock notifications
watch_queue: sample: Display superblock notifications
Documentation/watch_queue.rst | 24 ++-
arch/alpha/kernel/syscalls/syscall.tbl | 2 +
arch/arm/tools/syscall.tbl | 2 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 4 +
arch/ia64/kernel/syscalls/syscall.tbl | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 2 +
arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
arch/parisc/kernel/syscalls/syscall.tbl | 2 +
arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
arch/s390/kernel/syscalls/syscall.tbl | 2 +
arch/sh/kernel/syscalls/syscall.tbl | 2 +
arch/sparc/kernel/syscalls/syscall.tbl | 2 +
arch/x86/entry/syscalls/syscall_32.tbl | 2 +
arch/x86/entry/syscalls/syscall_64.tbl | 2 +
arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
fs/Kconfig | 21 +++
fs/Makefile | 1 +
fs/internal.h | 1 +
fs/mount.h | 21 +++
fs/mount_notify.c | 228 ++++++++++++++++++++++++++++
fs/namespace.c | 22 +++
fs/super.c | 205 +++++++++++++++++++++++++
include/linux/dcache.h | 1 +
include/linux/fs.h | 62 ++++++++
include/linux/lsm_hooks.h | 24 +++
include/linux/security.h | 16 ++
include/linux/syscalls.h | 4 +
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/watch_queue.h | 65 +++++++-
kernel/sys_ni.c | 6 +
samples/watch_queue/watch_test.c | 81 +++++++++-
security/security.c | 14 ++
36 files changed, 835 insertions(+), 5 deletions(-)
create mode 100644 fs/mount_notify.c
On Mon, Mar 30, 2020 at 7:37 AM David Howells <[email protected]> wrote:
>
> If you could consider pulling this - or would you prefer it to go through
> Al? It adds a couple of VFS-related event sources for the general
> notification mechanism:
My issue with these remains the same as it was last time, so I'll just
quote what I said back then:
"So I no longer hate the implementation, but I do want to see the
actual user space users come out of the woodwork and try this out for
their use cases.
I'd hate to see a new event queue interface that people then can't
really use due to it not fulfilling their needs, or can't use for some
other reason."
I want to see somebody step up enough to say "yes, I actually use
this, and have the patches for the user space side, and it helps my
load by 3000%, and here are the numbers, and the event overflow case
isn't an issue because Y"
Or whatever. It doesn't have to be performance, but the separate
discussion I've seen has been about that being the reason for it.
I just don't want it to be a _hypothetical_ reason. I want it to be a
tested reason where people said "yeah, this is easy to use and
actually fixes the problems".
Because if what happens is that when the events overflow, and maybe
people fall back on the old model (or whatever) then that probably
just means that you do better up until a point where you start doing
_worse_ than we used to.
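The overflow concern can be sketched with a toy model. This is not the watch_queue implementation; it is a hypothetical bounded queue (BoundedEventQueue, consume and the callbacks are all invented names) that drops events once full and records only the fact that something was lost, which forces the consumer back onto a full rescan.

```python
from collections import deque


class BoundedEventQueue:
    """Toy fixed-capacity event queue that remembers whether it overflowed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def post(self, event):
        if len(self.events) >= self.capacity:
            self.overflowed = True  # the event is dropped; only the loss is noted
            return False
        self.events.append(event)
        return True

    def drain(self):
        """Return (events, lost) and reset the queue."""
        events, lost = list(self.events), self.overflowed
        self.events.clear()
        self.overflowed = False
        return events, lost


def consume(queue, apply_event, full_rescan):
    """Apply queued events, or fall back to the old model if any were lost."""
    events, lost = queue.drain()
    if lost:
        full_rescan()  # queued events cannot be trusted to be complete
        return "rescan"
    for ev in events:
        apply_event(ev)
    return "incremental"
```

Under a burst larger than the capacity, every drain ends in a rescan, so the consumer pays for the queue and for the old model at the same time.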
Or people find out that they needed more information anyway, and the
event model doesn't work when you restart your special server because
you've lost the original state. Or any other number of "cool feature,
but I can't really use it".
IOW, I really want to know that yes, the design is what people will
then use and it actually fixes real-world issues.
And it needs to be interesting and pressing enough that those people
actually at least do a working prototype on top of a patch-set that
hasn't made it into the kernel yet.
Now, I realize that other projects won't _upstream_ their support
before the kernel has the infrastructure, so I'm not looking for
_that_ kind of "yeah, look, project XYZ already does this and Red Hat
ships it". No, I'm looking for those outside developers who say more
than "this is a pet peeve of mine with the existing interface". I want
to see some actual use - even if it's just in a development
environment - that shows that it's (a) sufficient and (b) actually
fixes problems.
Linus
Hi,
On 2020-04-04 14:13:03 -0700, Linus Torvalds wrote:
> And it needs to be interesting and pressing enough that those people
> actually at least do a working prototype on top of a patch-set that
> hasn't made it into the kernel yet.
>
> Now, I realize that other projects won't _upstream_ their support
> before the kernel has the infrastructure, so I'm not looking for
> _that_ kind of "yeah, look, project XYZ already does this and Red Hat
> ships it". No, I'm looking for those outside developers who say more
> than "this is a pet peeve of mine with the existing interface". I want
> to see some actual use - even if it's just in a development
> environment - that shows that it's (a) sufficient and (b) actually
> fixes problems.
FWIW, postgres remains interested in using the per-superblock events.
On 2020-03-30 15:36:54 +0100, David Howells wrote:
> (2) Superblock notifications.
>
> This one is provided to allow systemd or the desktop to more easily
> detect events such as I/O errors and EDQUOT/ENOSPC. This would be of
> interest to Postgres:
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/
>
> But could also be used to indicate to systemd when a superblock has
> had its configuration changed.
What prevents me from coming up with a prototype is that the error
handling pieces aren't complete, as far as I can tell:
On 2020-03-30 15:36:54 +0100, David Howells wrote:
> (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
> errors (not complete yet).
There's afaict no notify_sb_error() callers, making it hard for me to
actually test anything.
The important issue for us is I/O errors, but EDQUOT/ENOSPC could also
be useful (but is not urgent).
Greetings,
Andres Freund