2022-06-28 10:46:19

by Guowei Du

[permalink] [raw]
Subject: [PATCH 6/6] fanotify: add current_user_instances node

From: duguowei <[email protected]>

Add a node of sysctl, which is current_user_instances.
It shows current initialized group counts of system.

Signed-off-by: duguowei <[email protected]>
---
fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index c2255b440df9..39674fbffc4f 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -51,6 +51,8 @@

/* configurable via /proc/sys/fs/fanotify/ */
static int fanotify_max_queued_events __read_mostly;
+/* current initialized group count */
+static int fanotify_user_instances __read_mostly;

#ifdef CONFIG_SYSCTL

@@ -86,6 +88,14 @@ static struct ctl_table fanotify_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO
},
+ {
+ .procname = "current_user_instances",
+ .data = &fanotify_user_instances,
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO
+ },
{ }
};

@@ -905,6 +915,8 @@ static int fanotify_release(struct inode *ignored, struct file *file)
/* matches the fanotify_init->fsnotify_alloc_group */
fsnotify_destroy_group(group);

+ fanotify_user_instances--;
+
return 0;
}

@@ -1459,6 +1471,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
if (fd < 0)
goto out_destroy_group;

+ fanotify_user_instances++;
+
return fd;

out_destroy_group:
--
2.36.1


2022-06-28 11:04:04

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 6/6] fanotify: add current_user_instances node

On Tue 28-06-22 18:14:13, Guowei Du wrote:
> From: duguowei <[email protected]>
>
> Add a node of sysctl, which is current_user_instances.
> It shows current initialized group counts of system.
>
> Signed-off-by: duguowei <[email protected]>

Hum, I'm not sure about a wider context here but the changelog is certainly
missing a motivation of this change - why do you need this counter? In
particular because we already do maintain (and limit) the number of
fanotify groups each user has allocated in a particular namespace...

Honza

> ---
> fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index c2255b440df9..39674fbffc4f 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -51,6 +51,8 @@
>
> /* configurable via /proc/sys/fs/fanotify/ */
> static int fanotify_max_queued_events __read_mostly;
> +/* current initialized group count */
> +static int fanotify_user_instances __read_mostly;
>
> #ifdef CONFIG_SYSCTL
>
> @@ -86,6 +88,14 @@ static struct ctl_table fanotify_table[] = {
> .proc_handler = proc_dointvec_minmax,
> .extra1 = SYSCTL_ZERO
> },
> + {
> + .procname = "current_user_instances",
> + .data = &fanotify_user_instances,
> + .maxlen = sizeof(int),
> + .mode = 0444,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ZERO
> + },
> { }
> };
>
> @@ -905,6 +915,8 @@ static int fanotify_release(struct inode *ignored, struct file *file)
> /* matches the fanotify_init->fsnotify_alloc_group */
> fsnotify_destroy_group(group);
>
> + fanotify_user_instances--;
> +
> return 0;
> }
>
> @@ -1459,6 +1471,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> if (fd < 0)
> goto out_destroy_group;
>
> + fanotify_user_instances++;
> +
> return fd;
>
> out_destroy_group:
> --
> 2.36.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-06-28 11:16:39

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH 6/6] fanotify: add current_user_instances node

On Tue, Jun 28, 2022 at 12:45:28PM +0200, Jan Kara wrote:
> On Tue 28-06-22 18:14:13, Guowei Du wrote:
> > From: duguowei <[email protected]>
> >
> > Add a node of sysctl, which is current_user_instances.
> > It shows current initialized group counts of system.
> >
> > Signed-off-by: duguowei <[email protected]>
>
> Hum, I'm not sure about a wider context here but the changelog is certainly
> missing a motivation of this change - why do you need this counter? In
> particular because we already do maintain (and limit) the number of
> fanotify groups each user has allocated in a particular namespace...

Yeah, that's pretty strange as there's
/proc/sys/user/max_fanotify_groups
/proc/sys/user/max_fanotify_marks
it could be to have a ro counter that allows to display the current
number of groups? But that seems strange as we don't expose that
information anywhere for similar things. What would this be used for
even?

2022-06-28 12:31:11

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 6/6] fanotify: add current_user_instances node

On Tue, Jun 28, 2022 at 2:50 PM guowei du <[email protected]> wrote:
>
> hi, Mr Kara, Mr Brauner,
>
> I want to know how many fanotify readers are monitoring the fs event.
> If userspace daemons monitoring all file system events are too many, maybe there will be an impact on performance.
>

I want something else which is more than just the number of groups.

I want to provide the admin the option to enumerate over all groups and
list their marks and blocked events.

This would be similar to listing all the fdinfo of anon_inode:[fanotify] fds
of processes that initialised fanotify groups.

This enumeration could be done for example in /sys/fs/fanotify/groups/

My main incentive is not only the enumeration.
My main incentive is to provide an administrative interface to
check for any fs operations that are currently blocked by a rogue
fanotify permission events reader and an easy way for administrators
to kill those rogue processes (i.e. buggy anti-malware).

This interface is inspired by the ability to enumerate and abort
fuse connections for rogue fuse servers.

I want to do that for the existing permission events as a prerequisite
to adding new blocking events to be used for implementation of
hierarchical storage managers, similar the Windows ProjFs [1].
This was allegedly the intended use case for group class
FAN_CLASS_PRE_CONTENT (see man page).

Do you want to implement the first step of enumerating fdinfo
of all groups via /sys/fs/fanotify/groups/?

Jan,

If you have objections to any of the ideas above please shout.
I was going to prepare a roadmap for blocking events and post it
for comments, but this patch triggered a heads up.

Thanks,
Amir.

[1] https://docs.microsoft.com/en-us/windows/win32/projfs/projected-file-system

2022-06-28 13:49:55

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 6/6] fanotify: add current_user_instances node

On Tue 28-06-22 15:29:08, Amir Goldstein wrote:
> On Tue, Jun 28, 2022 at 2:50 PM guowei du <[email protected]> wrote:
> >
> > hi, Mr Kara, Mr Brauner,
> >
> > I want to know how many fanotify readers are monitoring the fs event.
> > If userspace daemons monitoring all file system events are too many, maybe there will be an impact on performance.
>
> I want something else which is more than just the number of groups.
>
> I want to provide the admin the option to enumerate over all groups and
> list their marks and blocked events.

Listing all groups and marks makes sense to me. Often enough I was
extracting this information from a crashdump :).

Dumping of events may be a bit more challenging (especially as we'd need to
format the events which has some non-trivial implications) so I'm not 100%
convinced about that. I agree it might be useful but I'd have to see the
implementation...

> This would be similar to listing all the fdinfo of anon_inode:[fanotify] fds
> of processes that initialised fanotify groups.
>
> This enumeration could be done for example in /sys/fs/fanotify/groups/
>
> My main incentive is not only the enumeration.
> My main incentive is to provide an administrative interface to
> check for any fs operations that are currently blocked by a rogue
> fanotify permission events reader and an easy way for administrators
> to kill those rogue processes (i.e. buggy anti-malware).
>
> This interface is inspired by the ability to enumerate and abort
> fuse connections for rogue fuse servers.
>
> I want to do that for the existing permission events as a prerequisite
> to adding new blocking events to be used for implementation of
> hierarchical storage managers, similar the Windows ProjFs [1].
> This was allegedly the intended use case for group class
> FAN_CLASS_PRE_CONTENT (see man page).

Yes, that was the original intent of FAN_CLASS_PRE_CONTENT AFAIK.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-06-28 14:20:00

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 6/6] fanotify: add current_user_instances node

On Tue, Jun 28, 2022 at 3:56 PM Jan Kara <[email protected]> wrote:
>
> On Tue 28-06-22 15:29:08, Amir Goldstein wrote:
> > On Tue, Jun 28, 2022 at 2:50 PM guowei du <[email protected]> wrote:
> > >
> > > hi, Mr Kara, Mr Brauner,
> > >
> > > I want to know how many fanotify readers are monitoring the fs event.
> > > If userspace daemons monitoring all file system events are too many, maybe there will be an impact on performance.
> >
> > I want something else which is more than just the number of groups.
> >
> > I want to provide the admin the option to enumerate over all groups and
> > list their marks and blocked events.
>
> Listing all groups and marks makes sense to me. Often enough I was
> extracting this information from a crashdump :).
>
> Dumping of events may be a bit more challenging (especially as we'd need to
> format the events which has some non-trivial implications) so I'm not 100%
> convinced about that. I agree it might be useful but I'd have to see the
> implementation...
>

I don't really care about the events.
I would like to list the tasks that are blocked on permission events
and the fanotify reader process that blocks them, so that it could be killed.

Technically, it is enough to list the blocked task pids in fanotify_fdinfo().
But it is also low hanging to print the number of queued events
in fanotify_fdinfo() and inotify_fdinfo().

Thanks,
Amir.

2022-06-28 14:39:11

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH 6/6] fanotify: add current_user_instances node

On Tue, Jun 28, 2022 at 04:55:25PM +0300, Amir Goldstein wrote:
> On Tue, Jun 28, 2022 at 3:56 PM Jan Kara <[email protected]> wrote:
> >
> > On Tue 28-06-22 15:29:08, Amir Goldstein wrote:
> > > On Tue, Jun 28, 2022 at 2:50 PM guowei du <[email protected]> wrote:
> > > >
> > > > hi, Mr Kara, Mr Brauner,
> > > >
> > > > I want to know how many fanotify readers are monitoring the fs event.
> > > > If userspace daemons monitoring all file system events are too many, maybe there will be an impact on performance.
> > >
> > > I want something else which is more than just the number of groups.
> > >
> > > I want to provide the admin the option to enumerate over all groups and
> > > list their marks and blocked events.
> >
> > Listing all groups and marks makes sense to me. Often enough I was
> > extracting this information from a crashdump :).
> >
> > Dumping of events may be a bit more challenging (especially as we'd need to
> > format the events which has some non-trivial implications) so I'm not 100%
> > convinced about that. I agree it might be useful but I'd have to see the
> > implementation...
> >
>
> I don't really care about the events.
> I would like to list the tasks that are blocked on permission events
> and the fanotify reader process that blocks them, so that it could be killed.
>
> Technically, it is enough to list the blocked task pids in fanotify_fdinfo().
> But it is also low hanging to print the number of queued events
> in fanotify_fdinfo() and inotify_fdinfo().

That's always going to be racy, right? You might list the blocked tasks
but it's impossible for userspace to ensure that the pids it parses
still refer to the same processes by the time it tries to kill them.

You would need an interface that allows you to kill specific blocked
tasks or at least all blocked tasks. You could just make this an - ahem
- ioctl on a suitable fanotify fd and somehow ensure that the task is
actually the one you want to kill?

If you can avoid adding a whole new /sys/kernel/fanotify/ interface
that'd be quite nice for userspace, I think.

2022-06-28 16:03:10

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 6/6] fanotify: add current_user_instances node

On Tue, Jun 28, 2022 at 5:25 PM Christian Brauner <[email protected]> wrote:
>
> On Tue, Jun 28, 2022 at 04:55:25PM +0300, Amir Goldstein wrote:
> > On Tue, Jun 28, 2022 at 3:56 PM Jan Kara <[email protected]> wrote:
> > >
> > > On Tue 28-06-22 15:29:08, Amir Goldstein wrote:
> > > > On Tue, Jun 28, 2022 at 2:50 PM guowei du <[email protected]> wrote:
> > > > >
> > > > > hi, Mr Kara, Mr Brauner,
> > > > >
> > > > > I want to know how many fanotify readers are monitoring the fs event.
> > > > > If userspace daemons monitoring all file system events are too many, maybe there will be an impact on performance.
> > > >
> > > > I want something else which is more than just the number of groups.
> > > >
> > > > I want to provide the admin the option to enumerate over all groups and
> > > > list their marks and blocked events.
> > >
> > > Listing all groups and marks makes sense to me. Often enough I was
> > > extracting this information from a crashdump :).
> > >
> > > Dumping of events may be a bit more challenging (especially as we'd need to
> > > format the events which has some non-trivial implications) so I'm not 100%
> > > convinced about that. I agree it might be useful but I'd have to see the
> > > implementation...
> > >
> >
> > I don't really care about the events.
> > I would like to list the tasks that are blocked on permission events
> > and the fanotify reader process that blocks them, so that it could be killed.
> >
> > Technically, it is enough to list the blocked task pids in fanotify_fdinfo().
> > But it is also low hanging to print the number of queued events
> > in fanotify_fdinfo() and inotify_fdinfo().
>
> That's always going to be racy, right? You might list the blocked tasks
> but it's impossible for userspace to ensure that the pids it parses
> still refer to the same processes by the time it tries to kill them.
>
> You would need an interface that allows you to kill specific blocked
> tasks or at least all blocked tasks. You could just make this an - ahem
> - ioctl on a suitable fanotify fd and somehow ensure that the task is
> actually the one you want to kill?

I don't want to kill the blocked tasks
I want to kill the permission event reader process that is blocking them
or abort the blocking group without terminating the process in some
technique similar to fuse connection abort.

It is an emergency button for admin when all users get blocked
from accessing files.

The problem with mandatory locks IMO was not the fact that they
could be used to DoS other users, but the fact that there was no
escape door for admin override.

Windows servers have mandatory file locks, but they also have
an escape door for admin override:
https://www.technipages.com/windows-how-to-release-file-lock.

fanotify could be used to DoS users and admin has no
good tools to cope with that now.

>
> If you can avoid adding a whole new /sys/kernel/fanotify/ interface
> that'd be quite nice for userspace, I think.

On the contrary. I think that user will like enumerating the groups
in /sys/kernel/fanotify/ better then enumerating all fds of all procs
looking for fanotify fds - the lsof method is not efficient and not
scalable when you have many thousands of tasks and just one blocker.

w.r.t races, it is possible that /sys/kernel/fanotify/ could be used
to acquire some sort of fanotify fd clones that can only be used for
ioctls and not for read/write.

An ioctl can return the number of blocked tasks and possibly
their pidfd's for further inspection.

And of course an ABORT or SHUTDOWN ioctl to cancel all
blocked permission events and stop queueing events.

The same fd clone could also be acquired by opening
/proc/<fanotify_proc>/fd/<fanotify_fd>
to perform ABORT in case killing the process does not
work because the process itself is blocked on IO.

Current fanotify is not immune against this sort of deadlocks
similar deadlocks are described in FUSE documentation
in the section explaining about connection abort:
https://www.kernel.org/doc/html/latest/filesystems/fuse.html#aborting-a-filesystem-connection

Thanks,
Amir.