2017-03-23 11:56:57

by Jeff Layton

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Thu, 2017-03-23 at 07:46 -0400, Amir Goldstein wrote:
> On Thu, Mar 23, 2017 at 4:13 AM, Marko Rauhamaa
> <[email protected]> wrote:
> > Amir Goldstein <[email protected]>:
> >
> > > On Wed, Mar 22, 2017 at 3:31 PM, Al Viro <[email protected]> wrote:
> > > > On Wed, Mar 22, 2017 at 02:20:15PM -0400, Amir Goldstein wrote:
> > > >
> > > > > Well, the behavior was changed in kernel 4.7 (and stable kernels) by
> > > > > commit by Al Viro:
> > > > > fac7d19 fix EOPENSTALE bug in do_last()
> > > > >
> > > > > Since that commit userspace will be able to see this error in
> > > > > fanotify events.
> > > >
> > > > Unless *notify somehow uses do_last() directly, that commit should
> > > > have no effect on it (and it definitely has no effect on
> > > > dentry_open() callers)...
> > >
> > > Right. I'm being silly :/
> > >
> > > Back to Redhat I guess...
> >
> > I will gladly take the issue to RedHat. However, the discussion so far
> > confuses me a bit. To confirm, is there a consensus here that EOPENSTALE
> > should never leak to userspace (through fanotify read anyway)?
> >
> > If EOPENSTALE *is* a valid possible return from fanotify read, this is
> > my bug and not RedHat's. In that case, what is the correct recovery?
> >
> > As for reproduction, I don't yet have one. At the moment, I just need an
> > authoritative user-space API clarification.
> >
>
> There is a consensus that the commit I pointed to has nothing to do with
> this alleged error leak.
>
> It certainly looks like it wasn't the intention that this error will end up in
> userspace, but I can't say that it is bad that it got there in this case.
> The error is quite useful for you to understand what happened.
>
> On a second look, I think that beyond pointing to an irrelevant commit
> my analysis still looks correct. I think upstream kernel *can* deliver
> this error on fanotify event and all those callers I mentioned that call
> dentry_open() directly without checking EOPENSTALE later on.
>
> IIUC, the only way to get out of a stale dentry situation is that some thread
> will lookup that path and revalidate the dentry.
>
> But if file has become stale between the time that the event was queued
> and the time the event is read and no process has tried looking it up since
> then the event read will have -EOPENSTALE for metadata->fd.
>
> It's probably much harder to hit this in other cases I mentioned, but
> seems quite plausible with fanotify because events are often read some
> time after they happened.
>
> With overlayfs readdir for example, lower dir will be revalidated on
> open of overlay dir, so process would have to wait some time
> before open and readdir to make this case likely.
>
> I have been wrong once. Could be wrong again.
> Anyone?

It was definitely not the intention to leak this error code to
userland. EOPENSTALE is not a POSIX sanctioned error code, so
applications generally don't know anything about it and will be
confused.

I haven't looked closely at this particular problem, but IIRC we
usually just translate EOPENSTALE to ESTALE, and that may be all that
needs to be done here. If this happened in the RHEL kernel, then please
do open a bug with Red Hat and we'll get it straightened out.
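
Until such a translation exists in the kernel, a watcher could apply the
same mapping on its own side. The following is only a minimal userspace
sketch, not the kernel fix: the names LOCAL_EOPENSTALE and
normalize_read_errno() are hypothetical, and the value 518 is assumed to
be the internal EOPENSTALE number, which userspace headers do not define.

```c
#include <errno.h>

/*
 * Assumption: 518 is the kernel-internal EOPENSTALE value. It is not
 * exported to userspace, so a local name is introduced for it here.
 */
#define LOCAL_EOPENSTALE 518

/* Map the leaked internal code to ESTALE; pass every other code through. */
int normalize_read_errno(int err)
{
    return err == LOCAL_EOPENSTALE ? ESTALE : err;
}
```

A caller would pass the errno it saw after a failed read(2) and act on
the returned value instead.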

That said, you should take heed that all of the [fa|i|d]notify APIs do
not extend beyond the local machine when you use them on network
filesystems. If you're expecting to get notification of events that are
occurring on other clients, you're going to be disappointed here.
--
Jeff Layton <[email protected]>


2017-03-23 13:16:28

by Marko Rauhamaa

Subject: Re: fanotify read returns with errno == EOPENSTALE

Jeff Layton <[email protected]>:

> It was definitely not the intention to leak this error code to
> userland. EOPENSTALE is not a POSIX sanctioned error code, so
> applications generally don't know anything about it and will be
> confused.

Got it. I will try to work on a reproduction and make a proper bug
report.

> I haven't looked closely at this particular problem, but IIRC we
> usually just translate EOPENSTALE to ESTALE, and that may be all that
> needs to be done here. If this happened in the RHEL kernel, then
> please do open a bug with Red Hat and we'll get it straightened out.

ESTALE has not been mentioned as a possible error code from an fanotify
read. Most importantly, since read fails, I suppose there is no recovery
but you must close the fanotify fd and call fanotify_init() again. Or
should I just ignore it and read on? If so, why bother returning the
error from the kernel in the first place?

> That said, you should take heed that all of the [fa|i|d]notify APIs do
> not extend beyond the local machine when you use them on network
> filesystems. If you're expecting to get notification of events that
> are occurring on other clients, you're going to be disappointed here.

That certainly is disappointing. However, there is a certain level of
coherency one would expect, namely:

* An NFS4 client opening a file should be subject to an OPEN_PERM check
on that client (if the client is monitoring the mount point).

* An NFS4 client opening a file should be subject to an OPEN_PERM check
on the server (if the server is monitoring the mount point).

* An fanotify read should not fail mysteriously. Rather, a read on
metadata->fd should be the one failing.


Marko

2017-03-23 13:47:09

by Amir Goldstein

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Thu, Mar 23, 2017 at 8:43 AM, Marko Rauhamaa
<[email protected]> wrote:
> Jeff Layton <[email protected]>:
>
>> It was definitely not the intention to leak this error code to
>> userland. EOPENSTALE is not a POSIX sanctioned error code, so
>> applications generally don't know anything about it and will be
>> confused.
>
> Got it. I will try to work on a reproduction and make a proper bug
> report.
>

Try this:

- watch a single file for permissions events (so you will only have
one event in the queue)
- open file from client to generate single event (don't read event yet)
- remove file from server (to make it stale)
- read event (with stale file)
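
The first step above (watching a single file for permission events) could
look roughly like the sketch below. It assumes a kernel built with
CONFIG_FANOTIFY_ACCESS_PERMISSIONS and a caller with CAP_SYS_ADMIN; the
function name watch_file_open_perm() is hypothetical and not taken from
any existing program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/fanotify.h>
#include <unistd.h>

/*
 * Create a fanotify group and mark one file for FAN_OPEN_PERM events.
 * Returns the fanotify fd, or -1 with errno set.
 */
int watch_file_open_perm(const char *path)
{
    int fan_fd;

    if (path == NULL) {
        errno = EINVAL;
        return -1;
    }

    /* Permission events need a content (or pre-content) class group. */
    fan_fd = fanotify_init(FAN_CLASS_CONTENT | FAN_CLOEXEC, O_RDONLY);
    if (fan_fd < 0)
        return -1;

    /* No FAN_MARK_MOUNT: only the inode behind 'path' is marked. */
    if (fanotify_mark(fan_fd, FAN_MARK_ADD, FAN_OPEN_PERM,
                      AT_FDCWD, path) < 0) {
        int saved = errno;

        close(fan_fd);
        errno = saved;
        return -1;
    }
    return fan_fd;
}
```

The remaining steps are then manual: open the file on the client, remove
it on the server, and only afterwards read(2) from the returned fd.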

>> I haven't looked closely at this particular problem, but IIRC we
>> usually just translate EOPENSTALE to ESTALE, and that may be all that
>> needs to be done here. If this happened in the RHEL kernel, then
>> please do open a bug with Red Hat and we'll get it straightened out.
>
> ESTALE has not been mentioned as a possible error code from an fanotify
> read. Most importantly, since read fails, I suppose there is no recovery
> but you must close the fanotify fd and call fanotify_init() again. Or
> should I just ignore it and read on? If so, why bother returning the
> error from the kernel in the first place?
>

Oh my. I completely misread your report before.
I thought you were trying to read from the event->fd.
Now I understand that you mean a read from the fanotify fd.
That will definitely return the error, but only in the special case
where the open error happened on the first event being read into the
buffer. If the error happens after some events have already been added
to the buffer, the fanotify process will not know about it: a regular
event will be silently dropped and a permission event will be denied.

2017-04-19 13:46:17

by Marko Rauhamaa

Subject: Re: fanotify read returns with errno == EOPENSTALE

Amir Goldstein <[email protected]>:

> On Thu, Mar 23, 2017 at 8:43 AM, Marko Rauhamaa
> <[email protected]> wrote:
>> Jeff Layton <[email protected]>:
>>
>>> It was definitely not the intention to leak this error code to
>>> userland. EOPENSTALE is not a POSIX sanctioned error code, so
>>> applications generally don't know anything about it and will be
>>> confused.
>>
>> Got it. I will try to work on a reproduction and make a proper bug
>> report.
>
> Try this:
>
> - watch a single file for permissions events (so you will only have
> one event in the queue)
> - open file from client to generate single event (don't read event yet)
> - remove file from server (to make it stale)
> - read event (with stale file)

I did that and reproduced the problem on a recent development kernel.
Happens every time.

Just take the example program listed under "man fanotify" ("fantest")
and follow these steps:

==============================================================
NFS Server    NFS Client(1)     NFS Client(2)
==============================================================
# echo foo >/nfsshare/bar.txt
              # cat /nfsshare/bar.txt
              foo
                                # ./fantest /nfsshare
                                Press enter key to terminate.
                                Listening for events.
# rm -f /nfsshare/bar.txt
              # cat /nfsshare/bar.txt
                                read: Unknown error 518
              cat: /nfsshare/bar.txt: Operation not permitted
==============================================================

where NFS Client (1) and (2) are two terminal sessions on a single NFS
Client machine.

So what do we conclude? Is this a kernel bug or works as designed?

> Oh my. I completely misread your report before. I thought you were
> trying to read from the event->fd. Now I understand that you mean read
> from fanotify fd. That will definitely return the error, but only in
> the special case where open error happened on the first event being
> read to the buffer. If error happens after adding some events to the
> buffer, fanotify process will not know about this. Regular event will
> be silently dropped and permission event will be denied.
>
> [...]
>
> You do NOT need to call fanotify_init() again, the next read will read
> the next event.

It does appear that reading the fanotify fd again does the trick.
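
The recovery described here could be captured in a small decision helper.
This is only a sketch under assumptions: the enum, the function name
classify_fanotify_read() and the local constant LOCAL_EOPENSTALE (518,
the number printed by fantest) are hypothetical names introduced for
illustration.

```c
#include <errno.h>
#include <sys/types.h>

/* Assumption: 518 is the kernel-internal EOPENSTALE value. */
#define LOCAL_EOPENSTALE 518

enum read_action {
    READ_HANDLE_EVENTS, /* buffer holds events: walk them */
    READ_RETRY_NOW,     /* read again right away */
    READ_WAIT,          /* queue empty: wait for readiness */
    READ_FATAL          /* give up on this fanotify fd */
};

/* Decide what to do after read(2) on the fanotify fd returned 'nread'. */
enum read_action classify_fanotify_read(ssize_t nread, int err)
{
    if (nread >= 0)
        return READ_HANDLE_EVENTS;

    switch (err) {
    case EINTR:
    case LOCAL_EOPENSTALE:
        /* The stale event was consumed; the next read sees the next one. */
        return READ_RETRY_NOW;
    case EAGAIN:
        return READ_WAIT;
    default:
        return READ_FATAL;
    }
}
```

The point of the sketch is that error 518 is grouped with EINTR rather
than with the fatal errors, so the fanotify fd is kept open.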

However, the client gets an EPERM instead of ENOENT, which is a bit
weird.

> The fix seems trivial and I can post it once you have the test:
> - return EAGAIN for read in case of a single event in queue without fd
> so apps getting the read error will have a good idea what to do
> - in case of non single event, maybe copy event with error on event->fd
> to the buffer for specific errors that make sense to report (EMFILE)
> so a watcher checks the values of negative event->fd can maybe do
> something about it (e.g. provide a smaller buffer).

EAGAIN would be perfect for me since I'm using fanotify in a nonblocking
mode. It might be a bit surprising in the blocking case.


Marko

--
+358 44 990 4795
Skype: marko.rauhamaa_f-secure

2017-04-20 11:06:13

by Amir Goldstein

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Wed, Apr 19, 2017 at 4:46 PM, Marko Rauhamaa
<[email protected]> wrote:
> Amir Goldstein <[email protected]>:
>
>> On Thu, Mar 23, 2017 at 8:43 AM, Marko Rauhamaa
>> <[email protected]> wrote:
>>> Jeff Layton <[email protected]>:
>>>
>>>> It was definitely not the intention to leak this error code to
>>>> userland. EOPENSTALE is not a POSIX sanctioned error code, so
>>>> applications generally don't know anything about it and will be
>>>> confused.
>>>
>>> Got it. I will try to work on a reproduction and make a proper bug
>>> report.
>>
>> Try this:
>>
>> - watch a single file for permissions events (so you will only have
>> one event in the queue)
>> - open file from client to generate single event (don't read event yet)
>> - remove file from server (to make it stale)
>> - read event (with stale file)
>
> I did that and reproduced the problem on a recent development kernel.
> Happens every time.
>
> Just take the example program listed under "man fanotify" ("fantest")
> and follow these steps:
>
> ==============================================================
> NFS Server    NFS Client(1)     NFS Client(2)
> ==============================================================
> # echo foo >/nfsshare/bar.txt
>               # cat /nfsshare/bar.txt
>               foo
>                                 # ./fantest /nfsshare
>                                 Press enter key to terminate.
>                                 Listening for events.
> # rm -f /nfsshare/bar.txt
>               # cat /nfsshare/bar.txt
>                                 read: Unknown error 518
>               cat: /nfsshare/bar.txt: Operation not permitted
> ==============================================================
>
> where NFS Client (1) and (2) are two terminal sessions on a single NFS
> Client machine.
>

Thanks for the reproducer.
I'll try it myself when I get to it.

> So what do we conclude? Is this a kernel bug or works as designed?
>

Exposing EOPENSTALE to userspace is definitely a kernel bug.


>> Oh my. I completely misread your report before. I thought you were
>> trying to read from the event->fd. Now I understand that you mean read
>> from fanotify fd. That will definitely return the error, but only in
>> the special case where open error happened on the first event being
>> read to the buffer. If error happens after adding some events to the
>> buffer, fanotify process will not know about this. Regular event will
>> be silently dropped and permission event will be denied.
>>
>> [...]
>>
>> You do NOT need to call fanotify_init() again, the next read will read
>> the next event.
>
> It does appear that reading the fanotify fd again does the trick.
>
> However, the client gets an EPERM instead of ENOENT, which is a bit
> weird.
>

Why would the client get ENOENT? That EOPENSTALE event is already
consumed, the client reads the next event in the queue.

>> The fix seems trivial and I can post it once you have the test:
>> - return EAGAIN for read in case of a single event in queue without fd
>> so apps getting the read error will have a good idea what to do
>> - in case of non single event, maybe copy event with error on event->fd
>> to the buffer for specific errors that make sense to report (EMFILE)
>> so a watcher checks the values of negative event->fd can maybe do
>> something about it (e.g. provide a smaller buffer).
>
> EAGAIN would be perfect for me since I'm using fanotify in a nonblocking
> mode. It might be a bit surprising in the blocking case.
>
>

Can you please try this patch?
Can you please try it in both blocking and non-blocking mode?
Can you please also add the non-empty queue case to the reproducer:
- Add another mark on another mount without PERM events in the mask
- Populate the other mount with some files
- Before reading from nfsshare, read from the other mount to fill the
event queue, e.g.:
# cat /tmp/foo* /nfsshare/bar.txt /tmp/bar*

This should result (depending on the number of files) in at least
2 buffer reads: the first with the /tmp/foo* file accesses and the
last with the /tmp/bar* file accesses.


diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 2b37f27..5b14890 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -295,6 +295,17 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 		}
 
 		ret = copy_event_to_user(group, kevent, buf);
+		if (unlikely(ret == -EOPENSTALE)) {
+			/*
+			 * We cannot report events with stale fd so drop it.
+			 * Setting ret/mask to 0 will continue the event loop
+			 * and do the right thing if there are no more events
+			 * to read (i.e. return bytes read, -EAGAIN or wait).
+			 */
+			kevent->mask = 0;
+			ret = 0;
+		}
+
 		/*
 		 * Permission events get queued to wait for response. Other
 		 * events can be destroyed now.
---

2017-04-20 11:33:06

by Amir Goldstein

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Thu, Apr 20, 2017 at 2:06 PM, Amir Goldstein <[email protected]> wrote:
> On Wed, Apr 19, 2017 at 4:46 PM, Marko Rauhamaa
> <[email protected]> wrote:
>> Amir Goldstein <[email protected]>:
>>
>>> On Thu, Mar 23, 2017 at 8:43 AM, Marko Rauhamaa
>>> <[email protected]> wrote:
>>>> Jeff Layton <[email protected]>:
>>>>
>>>>> It was definitely not the intention to leak this error code to
>>>>> userland. EOPENSTALE is not a POSIX sanctioned error code, so
>>>>> applications generally don't know anything about it and will be
>>>>> confused.
>>>>
>>>> Got it. I will try to work on a reproduction and make a proper bug
>>>> report.
>>>
>>> Try this:
>>>
>>> - watch a single file for permissions events (so you will only have
>>> one event in the queue)
>>> - open file from client to generate single event (don't read event yet)
>>> - remove file from server (to make it stale)
>>> - read event (with stale file)
>>
>> I did that and reproduced the problem on a recent development kernel.
>> Happens every time.
>>
>> Just take the example program listed under "man fanotify" ("fantest")
>> and follow these steps:
>>
>> ==============================================================
>> NFS Server    NFS Client(1)     NFS Client(2)
>> ==============================================================
>> # echo foo >/nfsshare/bar.txt
>>               # cat /nfsshare/bar.txt
>>               foo
>>                                 # ./fantest /nfsshare
>>                                 Press enter key to terminate.
>>                                 Listening for events.
>> # rm -f /nfsshare/bar.txt
>>               # cat /nfsshare/bar.txt
>>                                 read: Unknown error 518
>>               cat: /nfsshare/bar.txt: Operation not permitted
>> ==============================================================
>>
>> where NFS Client (1) and (2) are two terminal sessions on a single NFS
>> Client machine.
>>
>
> Thanks for the reproducer.
> I'll try it myself when I get to it.
>
>> So what do we conclude? Is this a kernel bug or works as designed?
>>
>
> Exposing EOPENSTALE to userspace is definitely a kernel bug.
>
>
>>> Oh my. I completely misread your report before. I thought you were
>>> trying to read from the event->fd. Now I understand that you mean read
>>> from fanotify fd. That will definitely return the error, but only in
>>> the special case where open error happened on the first event being
>>> read to the buffer. If error happens after adding some events to the
>>> buffer, fanotify process will not know about this. Regular event will
>>> be silently dropped and permission event will be denied.
>>>
>>> [...]
>>>
>>> You do NOT need to call fanotify_init() again, the next read will read
>>> the next event.
>>
>> It does appear that reading the fanotify fd again does the trick.
>>
>> However, the client gets an EPERM instead of ENOENT, which is a bit
>> weird.
>>
>
> Why would the client get ENOENT? That EOPENSTALE event is already
> consumed, the client reads the next event in the queue.

Sorry, I keep confusing your references to a read of the file with a
read of the fanotify fd. When the kernel fails to get a response from
the fanotify daemon, it will deny access to the file. That's the default.
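
For comparison, this is roughly how a watcher answers a permission event
that it did receive: it writes a struct fanotify_response to the fanotify
fd. In the stale case the watcher never sees the event, so it never gets
to send such an answer and the opener sees the kernel's denial instead.
The helper names make_response() and send_response() are hypothetical;
the struct and the FAN_ALLOW/FAN_DENY constants come from
<sys/fanotify.h>.

```c
#include <stdbool.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Build the answer for the event whose metadata carried 'event_fd'. */
struct fanotify_response make_response(int event_fd, bool allow)
{
    struct fanotify_response r = {
        .fd = event_fd,
        .response = allow ? FAN_ALLOW : FAN_DENY,
    };

    return r;
}

/* Send the answer on the fanotify fd. Returns 0 on success, -1 on error. */
int send_response(int fan_fd, int event_fd, bool allow)
{
    struct fanotify_response r = make_response(event_fd, allow);
    ssize_t n = write(fan_fd, &r, sizeof r);

    return n == (ssize_t)sizeof r ? 0 : -1;
}
```

After answering, the watcher is still expected to close(2) the event fd.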

>
>>> The fix seems trivial and I can post it once you have the test:
>>> - return EAGAIN for read in case of a single event in queue without fd
>>> so apps getting the read error will have a good idea what to do
>>> - in case of non single event, maybe copy event with error on event->fd
>>> to the buffer for specific errors that make sense to report (EMFILE)
>>> so a watcher checks the values of negative event->fd can maybe do
>>> something about it (e.g. provide a smaller buffer).
>>
>> EAGAIN would be perfect for me since I'm using fanotify in a nonblocking
>> mode. It might be a bit surprising in the blocking case.
>>
>>
>
> Can you please try this patch?
> Can you please try it with blocking and non-blocking
> Can you please try to add to reproducer the non empty queue case:
> - Add another mark on another mount without PERM events in the mask
> - Populate other mount with some files
> - Before reading from nfsshare, read from other mount to fill the
> event queue, e.g.:
> # cat /tmp/foo* /nfsshare/bar.txt /tmp/bar*
>
> This should result (depending on number of files) with
> >= 2 buffer reads - first with /tmp/foo* files access
> last with /tmp/bar* files access
>
>

Sorry, I messed up the previous patch. Please try this one:

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 2b37f27..7864354 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -295,6 +295,16 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 		}
 
 		ret = copy_event_to_user(group, kevent, buf);
+		if (unlikely(ret == -EOPENSTALE)) {
+			/*
+			 * We cannot report events with stale fd so drop it.
+			 * Setting ret to 0 will continue the event loop and
+			 * do the right thing if there are no more events to
+			 * read (i.e. return bytes read, -EAGAIN or wait).
+			 */
+			ret = 0;
+		}
+
 		/*
 		 * Permission events get queued to wait for response. Other
 		 * events can be destroyed now.
@@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 				break;
 		} else {
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
-			if (ret < 0) {
+			if (ret <= 0) {
 				FANOTIFY_PE(kevent)->response = FAN_DENY;
 				wake_up(&group->fanotify_data.access_waitq);
 				break;

2017-04-20 12:43:28

by Marko Rauhamaa

Subject: Re: fanotify read returns with errno == EOPENSTALE

Amir Goldstein <[email protected]>:

> Sorry I messed up the previous patch. please try this one:

I will try it.

> + * do the right thing if there are no more events to
> + * read (i.e. return bytes read, -EAGAIN or wait).

EAGAIN is the right thing to do when FAN_NONBLOCK has been specified.
Without FAN_NONBLOCK, EAGAIN is bound to confuse the application. That
could be documented, of course.

More importantly, does EAGAIN here still guarantee EPOLLET semantics of
epoll(7)? IOW, if I get EAGAIN, I shouldn't have to try read(2)ing the
fanotify fd again before calling epoll_wait(2).


Marko

2017-04-20 13:34:56

by Amir Goldstein

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Thu, Apr 20, 2017 at 3:43 PM, Marko Rauhamaa
<[email protected]> wrote:
> Amir Goldstein <[email protected]>:
>
>> Sorry I messed up the previous patch. please try this one:
>
> I will try it.
>
>> + * do the right thing if there are no more events to
>> + * read (i.e. return bytes read, -EAGAIN or wait).
>
> EAGAIN is the right thing to do when FAN_NONBLOCK has been specified.
> Without FAN_NONBLOCK, EAGAIN is bound to confuse the application. That
> could be documented, of course.
>

My comment says "do the right thing ... -EAGAIN or wait", meaning that it
depends on FAN_NONBLOCK. The same code that checks for FAN_NONBLOCK will
take care of that. My patch only takes care of dropping the stale event
and continuing to the next event. If there is no next event, the code
will "do the right thing".

> More importantly, does EAGAIN here still guarantee EPOLLET semantics of
> epoll(7)? IOW, if I get EAGAIN, I shouldn't have to try read(2)ing the
> fanotify fd again before calling epoll_wait(2).
>

Yes, if you get EAGAIN it means there are no more events in the queue,
so you shouldn't have to try reading again.
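
With edge-triggered epoll the usual pattern is to keep reading until
EAGAIN and only then go back to epoll_wait(2). A minimal sketch of such a
drain loop follows; it works on any non-blocking fd, and the names
set_nonblocking(), drain_until_eagain() and LOCAL_EOPENSTALE are
hypothetical. Treating error 518 as "retry" is an assumption meant for
kernels without the fix.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 518 is the kernel-internal EOPENSTALE value. */
#define LOCAL_EOPENSTALE 518

/* Put 'fd' into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read from a non-blocking fd until it reports EAGAIN (or end of file).
 * Returns the total number of bytes read, or -1 with errno set.
 */
ssize_t drain_until_eagain(int fd, char *buf, size_t buflen)
{
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, buflen);

        if (n > 0) {
            total += n; /* a real watcher would walk the events here */
            continue;
        }
        if (n == 0)
            return total; /* end of file, e.g. a closed pipe */
        if (errno == EINTR || errno == LOCAL_EOPENSTALE)
            continue;     /* interrupted, or a stale event was consumed */
        if (errno == EAGAIN)
            return total; /* queue empty: safe to epoll_wait() again */
        return -1;
    }
}
```

Only the EAGAIN return tells the caller that the queue is empty, which is
the property the EPOLLET question above depends on.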

Amir.

2017-04-20 14:47:49

by Jan Kara

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Thu 20-04-17 14:33:04, Amir Goldstein wrote:
>
> Sorry I messed up the previous patch. please try this one:
>
> diff --git a/fs/notify/fanotify/fanotify_user.c
> b/fs/notify/fanotify/fanotify_user.c
> index 2b37f27..7864354 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -295,6 +295,16 @@ static ssize_t fanotify_read(struct file *file,
> char __user *buf,
> }
>
> ret = copy_event_to_user(group, kevent, buf);
> + if (unlikely(ret == -EOPENSTALE)) {
> + /*
> + * We cannot report events with stale fd so drop it.
> + * Setting ret to 0 will continue the event loop and
> + * do the right thing if there are no more events to
> + * read (i.e. return bytes read, -EAGAIN or wait).
> + */
> + ret = 0;
> + }
> +
> /*
> * Permission events get queued to wait for response. Other
> * events can be destroyed now.
> @@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file,
> char __user *buf,
> break;
> } else {
> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> - if (ret < 0) {
> + if (ret <= 0) {
> FANOTIFY_PE(kevent)->response = FAN_DENY;
> wake_up(&group->fanotify_data.access_waitq);
> break;

I don't think you want to break out of the reading loop when ret == 0 and
the code might be more readable as:

		if (!(kevent->mask & FAN_ALL_PERM_EVENTS)) {
			fsnotify_destroy_event(group, kevent);
		} else {
#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
			if (ret <= 0) {
				FANOTIFY_PE(kevent)->response = FAN_DENY;
				wake_up(&group->fanotify_data.access_waitq);
			} else {
				spin_lock(&group->notification_lock);
				list_add_tail(&kevent->list,
					&group->fanotify_data.access_list);
				spin_unlock(&group->notification_lock);
			}
#endif
		}
		if (ret < 0)
			break;

Hmm?
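
The difference between breaking on ret <= 0 and breaking only on ret < 0
can be shown with a small userspace model of the read loop. This is a
simplified simulation under stated assumptions, not kernel code: the
types and sim_read() are hypothetical, the queue is a plain array, only
the non-blocking path is modelled, and 518 stands in for EOPENSTALE.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 518 is the kernel-internal EOPENSTALE value. */
#define SIM_EOPENSTALE 518

struct sim_event {
    long copy_result; /* bytes copied (> 0) or a negative errno */
    bool is_perm;     /* true for a permission event */
};

struct sim_outcome {
    long ret;         /* what read(2) would report (negative errno on error) */
    size_t consumed;  /* events taken off the queue */
};

/* Model one non-blocking read over the queue q[0..n). */
struct sim_outcome sim_read(const struct sim_event *q, size_t n,
                            bool break_on_zero)
{
    long copied = 0, ret = 0;
    size_t i = 0;

    for (;;) {
        const struct sim_event *ev;

        if (i == n) {           /* queue is empty */
            ret = -EAGAIN;
            break;
        }
        ev = &q[i++];
        ret = ev->copy_result;
        if (ret == -SIM_EOPENSTALE)
            ret = 0;            /* drop the stale event */
        if (ret < 0)
            break;
        if (ret == 0 && ev->is_perm && break_on_zero)
            break;              /* early exit on a dropped event */
        copied += ret;
    }
    if (copied > 0 && ret != -EFAULT)
        ret = copied;
    return (struct sim_outcome){ ret, i };
}
```

For a queue holding a stale permission event followed by a good one (an
arbitrary 24 bytes in this model), break_on_zero = true returns 0 with
one event still queued, while break_on_zero = false returns 24. A lone
stale event with break_on_zero = false yields -EAGAIN.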

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-04-20 15:06:20

by Amir Goldstein

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Thu, Apr 20, 2017 at 5:20 PM, Jan Kara <[email protected]> wrote:
> On Thu 20-04-17 14:33:04, Amir Goldstein wrote:
>>
>> Sorry I messed up the previous patch. please try this one:
>>
>> diff --git a/fs/notify/fanotify/fanotify_user.c
>> b/fs/notify/fanotify/fanotify_user.c
>> index 2b37f27..7864354 100644
>> --- a/fs/notify/fanotify/fanotify_user.c
>> +++ b/fs/notify/fanotify/fanotify_user.c
>> @@ -295,6 +295,16 @@ static ssize_t fanotify_read(struct file *file,
>> char __user *buf,
>> }
>>
>> ret = copy_event_to_user(group, kevent, buf);
>> + if (unlikely(ret == -EOPENSTALE)) {
>> + /*
>> + * We cannot report events with stale fd so drop it.
>> + * Setting ret to 0 will continue the event loop and
>> + * do the right thing if there are no more events to
>> + * read (i.e. return bytes read, -EAGAIN or wait).
>> + */
>> + ret = 0;
>> + }
>> +
>> /*
>> * Permission events get queued to wait for response. Other
>> * events can be destroyed now.
>> @@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file,
>> char __user *buf,
>> break;
>> } else {
>> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>> - if (ret < 0) {
>> + if (ret <= 0) {
>> FANOTIFY_PE(kevent)->response = FAN_DENY;
>> wake_up(&group->fanotify_data.access_waitq);
>> break;
>
> I don't think you want to break out of the reading loop when ret == 0 and
> the code might be more readable as:
>
> if (!(kevent->mask & FAN_ALL_PERM_EVENTS)) {
> fsnotify_destroy_event(group, kevent);
> } else {
> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> if (ret <= 0) {
> FANOTIFY_PE(kevent)->response = FAN_DENY;
> wake_up(&group->fanotify_data.access_waitq);
> } else {
> spin_lock(&group->notification_lock);
> list_add_tail(&kevent->list,
> &group->fanotify_data.access_list);
> spin_unlock(&group->notification_lock);
> }
> #endif
> }
> if (ret < 0)
> break;
>
> Hmm?
>

Right, I missed that.
Thanks.

2017-04-21 13:13:47

by Marko Rauhamaa

Subject: Re: fanotify read returns with errno == EOPENSTALE

Amir Goldstein <[email protected]>:

> On Thu, Apr 20, 2017 at 3:43 PM, Marko Rauhamaa
> <[email protected]> wrote:
>> Amir Goldstein <[email protected]>:
>>
>>> Sorry I messed up the previous patch. please try this one:
>>
>> I will try it.

Tried it. Superficially, it seems to be working, but...

>> More importantly, does EAGAIN here still guarantee EPOLLET semantics
>> of epoll(7)? IOW, if I get EAGAIN, I shouldn't have to try read(2)ing
>> the fanotify fd again before calling epoll_wait(2).
>
> Yes, if you get EAGAIN it means there are no more events in the queue,
> so shouldn't have to try read again.

I ran into a system hang with our real product that suggests there might
be a problem left. It would be explained if your patch generated an
EAGAIN while there were other events waiting in the queue.

I will have to investigate further.


Marko

--
+358 44 990 4795
Skype: marko.rauhamaa_f-secure

2017-04-22 07:22:22

by Amir Goldstein

Subject: Re: fanotify read returns with errno == EOPENSTALE

On Thu, Apr 20, 2017 at 6:06 PM, Amir Goldstein <[email protected]> wrote:
> On Thu, Apr 20, 2017 at 5:20 PM, Jan Kara <[email protected]> wrote:
>> On Thu 20-04-17 14:33:04, Amir Goldstein wrote:
>>>
>>> Sorry I messed up the previous patch. please try this one:
>>>
>>> diff --git a/fs/notify/fanotify/fanotify_user.c
>>> b/fs/notify/fanotify/fanotify_user.c
>>> index 2b37f27..7864354 100644
>>> --- a/fs/notify/fanotify/fanotify_user.c
>>> +++ b/fs/notify/fanotify/fanotify_user.c
>>> @@ -295,6 +295,16 @@ static ssize_t fanotify_read(struct file *file,
>>> char __user *buf,
>>> }
>>>
>>> ret = copy_event_to_user(group, kevent, buf);
>>> + if (unlikely(ret == -EOPENSTALE)) {
>>> + /*
>>> + * We cannot report events with stale fd so drop it.
>>> + * Setting ret to 0 will continue the event loop and
>>> + * do the right thing if there are no more events to
>>> + * read (i.e. return bytes read, -EAGAIN or wait).
>>> + */
>>> + ret = 0;
>>> + }
>>> +
>>> /*
>>> * Permission events get queued to wait for response. Other
>>> * events can be destroyed now.
>>> @@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file,
>>> char __user *buf,
>>> break;
>>> } else {
>>> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>>> - if (ret < 0) {
>>> + if (ret <= 0) {
>>> FANOTIFY_PE(kevent)->response = FAN_DENY;
>>> wake_up(&group->fanotify_data.access_waitq);
>>> break;
>>
>> I don't think you want to break out of the reading loop when ret == 0 and
>> the code might be more readable as:
>>
>> if (!(kevent->mask & FAN_ALL_PERM_EVENTS)) {
>> fsnotify_destroy_event(group, kevent);
>> } else {
>> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>> if (ret <= 0) {
>> FANOTIFY_PE(kevent)->response = FAN_DENY;
>> wake_up(&group->fanotify_data.access_waitq);
>> } else {
>> spin_lock(&group->notification_lock);
>> list_add_tail(&kevent->list,
>> &group->fanotify_data.access_list);
>> spin_unlock(&group->notification_lock);
>> }
>> #endif
>> }
>> if (ret < 0)
>> break;
>>
>> Hmm?
>>
>

On Fri, Apr 21, 2017 at 5:27 PM, Marko Rauhamaa
<[email protected]> wrote:
> Amir Goldstein <[email protected]>:
>
>> Did you notice Jan's comments on my patch? I had a bug that broke out
>> of the loop. Without his corrections read will return even if there
>> are more events in the queue.
>
> Yes, I now tried Jan's fix, and it did the trick.
>
> It will now take months or years before distros have a proper fix. In
> the interim, I will absorb EOPENSTALE and schedule a reread in that
> situation.
>
> Thank you both for your attention.
>
>

Marko,

Were you able to verify that both blocking and non-blocking mode work correctly?
May I add your Tested-by and Reported-by tags?

Thanks!
Amir.

2017-04-24 07:40:56

by Marko Rauhamaa

Subject: Re: fanotify read returns with errno == EOPENSTALE

Amir Goldstein <[email protected]>:

> Were you able to verify that both blocking and non-blocking mode work
> correctly? May I add your Tested-by and Reported-by tags?

I have only verified the nonblocking case in my tests (which are by no
means extensive). The reported issue can no longer be reproduced with
the patched kernel, and the regular OPEN_PERM function is still
operational.

Feel free to add me to the tags.


Marko