2011-08-19 23:10:25

by Sylvain Rochet

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

Hi,

On Tue, Oct 19, 2010 at 12:35:40AM +0200, Sylvain Rochet wrote:
>
> ... upgraded to 2.6.33.5, then 2.6.33.7, finally to 2.6.35.7, and I
> always end up with the same ending, it seems inotify can miss some VFS
> events from time to time.

I finally find out why.

The NFS server does not always know the name of the modified file, if
the modified inode was cleared from the VFS cache fsnotify does not know
as well the filename then inotify child events on directories are
silently tossed.

Easy way to reproduce:

Add a few printk debug (here it only works if /data is the NFS export):

--- begin//fs/nfsd/vfs.c 2011-07-22 04:17:23.000000000 +0200
+++ linux-3.0/fs/nfsd/vfs.c 2011-07-30 03:18:17.837560809 +0200
@@ -975,6 +975,8 @@
inode = dentry->d_inode;
exp = fhp->fh_export;

+ printk("nfsd write inode=%ld name=%s\n", inode->i_ino, dentry->d_name.name);
+
/*
* Request sync writes if
* - the sync export option has been set, or

diff -Nru begin//include/linux/fsnotify.h linux-3.0/include/linux/fsnotify.h
--- begin//include/linux/fsnotify.h 2011-07-22 04:17:23.000000000 +0200
+++ linux-3.0/include/linux/fsnotify.h 2011-07-30 03:07:00.330239062 +0200
@@ -216,8 +232,15 @@
mask |= FS_ISDIR;

if (!(file->f_mode & FMODE_NONOTIFY)) {
+ if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") )
+ printk("fsnotify modify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name);
fsnotify_parent(path, NULL, mask);
fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
+ } else {
+ if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") )
+ printk("fsnotify modify-nonotify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name);
}
}


On the NFS client, open a fd and send some data:

# exec 1> test
# ls -la
#

On the NFS server, check the kern log:

Aug 20 00:57:44 inotifydebug kernel: nfsd write inode=13 name=test
Aug 20 00:57:44 inotifydebug kernel: fsnotify modify inode=13 name=test

Everything goes well.

Now, clear the VFS cache on the NFS server:

# echo 3 > /proc/sys/vm/drop_caches

On the NFS client, send some data to the fd:

# ls -la
#

On the NFS server, check the kern log:

Aug 20 00:58:56 inotifydebug kernel: nfsd write inode=13 name=
Aug 20 00:58:56 inotifydebug kernel: fsnotify modify inode=13 name=

The filename is lost, fsnotify does not know the filename anymore,
therefore inotify cannot send event about a modified file in a watched
directory.

End of the story.

I guess this is almost impossible to fix this fsnotify bug, this is due
by the fact that NFS use inode as file identifiers, so in some case this
is impossible to know the modified filepath, and therefore impossible to
match the file event to the directory watch.

Kind regards,
Sylvain


Attachments:
(No filename) (2.88 kB)
signature.asc (198.00 B)
Digital signature
Download all attachments

2011-08-21 23:08:04

by NeilBrown

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Sun, 21 Aug 2011 21:20:58 +0100 Jamie Lokier <[email protected]> wrote:

> J. Bruce Fields wrote:
> > On Sat, Aug 20, 2011 at 04:03:35AM +0100, Jamie Lokier wrote:
> > > Well you still have your sense of humour...
> > >
> > > I've never understood why you think it's about the file manager /
> > > desktop, or why you so strongly dislike the feature. It originated
> > > there historically, but that is not it's primary use.
> > >
> > > The implementation, sure, but you seem to dislike the very *principle*
> > > of subscribing to changes.
> > >
> > > Every interesting use of inotify that I've seen is for some kind of
> > > cache support, to eliminate the majority of stat() calls, to remove
> > > disk I/O (no stat means no inode), to ensure correctness (st_mtime is
> > > coarse and unreliable),
> >
> > It seems rather fragile as an mtime replacement unless it's also got
> > some sort of logging built in at a pretty low level so that you don't
> > lose events while you're not listening.
>
> It mainly serves as an accelerator for existing stat/mtime checks,
> though it does improve change detection in the last second or so since
> a previous change, which with mtime you have to make pessimistic or
> sometimes-incorrect assumptions.
>
> Quite a few programs use inotify now because it saves a little power,
> and is a bit more responsive than, say, polling config files with stat().
>
> For reliable filesystem tracking across times when not listening,
> especially if you don't trust the clock to have no backward steps (and
> you should not), a lazy change count file attribute would do. It's
> been discussed but never implemented.
>
> > And of course events have to be defined very carefully to avoid problems
> > such as this one.
>
> This thread has revealed quite a big hole, I agree. Apps cannot even
> use their normal filesystem-type whitelisting to catch this. This is bad!
>
> It is not the first hole that was found in inotify/dnotify, but it's
> the first one I'm aware of that wasn't pointed out long ago and
> then quietly ignored :-/
>
> > > and to avoid having to modify every
> > > application which might affect any file from which cached items are
> > > derived to explicitly notify all the applications which might use any
> > > of those files.
> > >
> > > You like high performance, reliable and correct behaviour, and high
> > > scalability. So I have never understood why you dislike the
> > > change-subscription principle so strongly, because it is a natural
> > > ally to those properties.
> >
> > I don't think we've seen a design that does all of that yet.
>
> Designs get discussed from time to time, over the decades.
>
> I think one of the reasons it doesn't go further is Al's well-known
> objection -- why put the effort in if you know it will be rejected.
> And a widespread view that it's just unimportant GUI file manager fluff.
> The latter also means dependability issues have tended not to be taken
> seriously.

I know you weren't asking for design suggestions, but somehow I just couldn't
help myself :-)

The (or "a") problem with {d,i,fa}notify is that it makes a core assumption
that is flawed. i.e. that a file is in some directory. It might be nice if
that were a reliable fact but thanks to our founding fathers, it is not.
If a file only ever had one name - never more nor less - and could not have
that name changed while it were open, then quite a lot of things would be a
lot easier. And probably a lot of things would be a lot harder. But we don't
live in that world (others do - I think you know where it is).

So we must drop this assumption.

Getting notification on an fd when the opened file changes makes perfect
sense. Some /proc and /sys files already provide this functionality and we
can expect that more will. Adding that to regular filesystems may not be out
of the question. This would be useful, but of limited use. You could find
out when a given file changed - either an mtime-like change or a ctime-like
change. By monitoring a directory you could find out when a name was added
or removed. But to find out when "any file in a directory changes" you would
need to open and monitor every file, which is expensive.

The other ("another") problem is the lack of recursion. You can find out
when a file in a directory changes, but not a file in a directory tree. This
significantly reduces the value. We really want to know about directory
trees. However a "directory tree" - much like "all the files in a directory"
isn't really a very well defined concept - at least from the perspective of
providing notifications. You cannot easily answer "is this file in that
tree?" or "which tree(s) is this file in?".

However there are well defined sets of files such that we could reliably
generate notifications if any file in the set were changed, or if a file were
added-to or remove-from the set. We should be looking for these sorts of
sets and seeing which are useful.

e.g.
- all files in a given filesystem. Generating notification for any change
in a given filesystem is a well defined task. It might generate too much
noise, but it would still have a place.

- all files with a given uid (or gid).

- all directories. or all regular files

- all setuid, setgid, or world-writable files

Each of these are strongly defined and we can map from file to set quite
easily. We could obviously intersect the sets to, so I could get events when
any directory owned by me on a particular filesystem was changed. It would
even be reasonable for the events to contain a newly opened fd from which I
can extract dev/inode info and possibly extract a path name.

However this still might not be fine-grained enough. While a "directory
tree" is not really a well defined concept, it is in my mind. e.g. it
seems reasonable to want to find out about all changes in $HOME/.config

I can see two approaches to this - though there might be others. All of
them must in some way create a strong concept of a directory tree.

One is to use bind mounts. i.e. I effectively do
mount --bind $HOME/.config $HOME/.config
and ask for events from the newly created vfsmnt.
This will not catch changes made through file descriptors that were opened
before I did the mount, or through hard links from some other directory
tree. But for a particular use-case that might not be a problem.

The other requires support from the filesystem and so cannot be provided
universally. It could possibly be imposed generically for filesystems that
support extended attributes .... but I feel dirty even suggesting that (Dobby
must now go and iron his hands!)

The filesystem could support the concept of a 'directory tree' much like
BTRFS allows subvolumes which a like independent filesystems within the one
big filesystem. However for this purpose the 'directory tree' would be a
very light weight concept (it wouldn't need its own inode number space).

For example, each inode could store an extra number which is the inode number
of the root of its directory tree. This would be inherited from parent
during create. Renaming or linking a file would fail if the target had a
different directory tree number. (renaming a file with only one link might
succeed and change the directory tree number). An empty directory could be
told to become a root somehow.

Then you would have a strong concept of a directory tree that could be used
for notifications.
Obviously this approach could not be used to solve any immediate problems.
But if new filesystems started supporting light-weight-directory-trees as a
well defined set of files, then in 5-10 years we might have a nice working
solution.


[of course then you need to layer any design you come up with on NFS ... but
that can probably be done in user-space with libraries and daemons].

NeilBrown

2011-08-20 03:03:37

by Jamie Lokier

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

Al Viro wrote:
> On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote:
>
> > Possible solution:
>
> > Then this can be solved, in principle (if there's no better way), by
> > watching a "virtual directory" that gets all events for when the
> > access doesn't have a parent directory. There needs to be some way to
> > watch it, and some way to get the appropriate file from the event (as
> > there is no real directory. Or maybe there could be a virtual
> > filesystem (like /proc, /sys etc.) containing a magic directory that
> > receives these inode-only events, such that lookups in that directory
> > yield the affected file. Exactly as if the directory contains a hard
> > link to every file, perhaps a text encoding of the handles passed
> > through sys_open_by_handle_at.
>
> There is a better way - stop using idiotify... It has always been a
> mistake, driven down our throats by filemangler and desktop crowd.

Well you still have your sense of humour...

I've never understood why you think it's about the file manager /
desktop, or why you so strongly dislike the feature. It originated
there historically, but that is not it's primary use.

The implementation, sure, but you seem to dislike the very *principle*
of subscribing to changes.

Every interesting use of inotify that I've seen is for some kind of
cache support, to eliminate the majority of stat() calls, to remove
disk I/O (no stat means no inode), to ensure correctness (st_mtime is
coarse and unreliable), and to avoid having to modify every
application which might affect any file from which cached items are
derived to explicitly notify all the applications which might use any
of those files.

You like high performance, reliable and correct behaviour, and high
scalability. So I have never understood why you dislike the
change-subscription principle so strongly, because it is a natural
ally to those properties.

All the best,
-- Jamie

2011-08-20 01:29:45

by Al Viro

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote:

> Possible solution:

> Then this can be solved, in principle (if there's no better way), by
> watching a "virtual directory" that gets all events for when the
> access doesn't have a parent directory. There needs to be some way to
> watch it, and some way to get the appropriate file from the event (as
> there is no real directory. Or maybe there could be a virtual
> filesystem (like /proc, /sys etc.) containing a magic directory that
> receives these inode-only events, such that lookups in that directory
> yield the affected file. Exactly as if the directory contains a hard
> link to every file, perhaps a text encoding of the handles passed
> through sys_open_by_handle_at.

There is a better way - stop using idiotify... It has always been a
mistake, driven down our throats by filemangler and desktop crowd.
Broken in many, _many_ respects... Deprecate that crap, remove it
completely in a couple of revisions, let these clowns cope. Yeah,
yeah, I know... Not going to happen ;-/ One can dream, though...

2011-08-22 17:22:47

by J. Bruce Fields

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Mon, Aug 22, 2011 at 09:07:51AM +1000, NeilBrown wrote:
> One is to use bind mounts. i.e. I effectively do
> mount --bind $HOME/.config $HOME/.config
> and ask for events from the newly created vfsmnt.
> This will not catch changes made through file descriptors that were opened
> before I did the mount, or through hard links from some other directory
> tree. But for a particular use-case that might not be a problem.

I'm missing what the extra vfsmount gets you here. The problems seem
just the same as if you don't have one.

Oh, wait, I see, it's that the file descriptors are associated with
vfsmounts, not just dentries. Hm.

--b.

2011-08-25 21:47:08

by Sylvain Rochet

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

Hi,

On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote:
>
> Can it also be bypassed with sys_open_by_handle_at?

Just checked that and... as planned it can be bypassed with
name_to_handle_at/open_by_handle_at as well.

(I added my quick'n'dirty test code, maybe someone will be interested by
a working fhandle example on Linux 3.0.x).

Sylvain


Attachments:
(No filename) (0.00 B)
signature.asc (198.00 B)
Digital signature
Download all attachments

2011-08-21 20:21:03

by Jamie Lokier

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

J. Bruce Fields wrote:
> On Sat, Aug 20, 2011 at 04:03:35AM +0100, Jamie Lokier wrote:
> > Well you still have your sense of humour...
> >
> > I've never understood why you think it's about the file manager /
> > desktop, or why you so strongly dislike the feature. It originated
> > there historically, but that is not it's primary use.
> >
> > The implementation, sure, but you seem to dislike the very *principle*
> > of subscribing to changes.
> >
> > Every interesting use of inotify that I've seen is for some kind of
> > cache support, to eliminate the majority of stat() calls, to remove
> > disk I/O (no stat means no inode), to ensure correctness (st_mtime is
> > coarse and unreliable),
>
> It seems rather fragile as an mtime replacement unless it's also got
> some sort of logging built in at a pretty low level so that you don't
> lose events while you're not listening.

It mainly serves as an accelerator for existing stat/mtime checks,
though it does improve change detection in the last second or so since
a previous change, which with mtime you have to make pessimistic or
sometimes-incorrect assumptions.

Quite a few programs use inotify now because it saves a little power,
and is a bit more responsive than, say, polling config files with stat().

For reliable filesystem tracking across times when not listening,
especially if you don't trust the clock to have no backward steps (and
you should not), a lazy change count file attribute would do. It's
been discussed but never implemented.

> And of course events have to be defined very carefully to avoid problems
> such as this one.

This thread has revealed quite a big hole, I agree. Apps cannot even
use their normal filesystem-type whitelisting to catch this. This is bad!

It is not the first hole that was found in inotify/dnotify, but it's
the first one I'm aware of that wasn't pointed out long ago and
then quietly ignored :-/

> > and to avoid having to modify every
> > application which might affect any file from which cached items are
> > derived to explicitly notify all the applications which might use any
> > of those files.
> >
> > You like high performance, reliable and correct behaviour, and high
> > scalability. So I have never understood why you dislike the
> > change-subscription principle so strongly, because it is a natural
> > ally to those properties.
>
> I don't think we've seen a design that does all of that yet.

Designs get discussed from time to time, over the decades.

I think one of the reasons it doesn't go further is Al's well-known
objection -- why put the effort in if you know it will be rejected.
And a widespread view that it's just unimportant GUI file manager fluff.
The latter also means dependability issues have tended not to be taken
seriously.

-- Jamie

2011-08-21 17:07:18

by J. Bruce Fields

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Sat, Aug 20, 2011 at 04:03:35AM +0100, Jamie Lokier wrote:
> Well you still have your sense of humour...
>
> I've never understood why you think it's about the file manager /
> desktop, or why you so strongly dislike the feature. It originated
> there historically, but that is not it's primary use.
>
> The implementation, sure, but you seem to dislike the very *principle*
> of subscribing to changes.
>
> Every interesting use of inotify that I've seen is for some kind of
> cache support, to eliminate the majority of stat() calls, to remove
> disk I/O (no stat means no inode), to ensure correctness (st_mtime is
> coarse and unreliable),

It seems rather fragile as an mtime replacement unless it's also got
some sort of logging built in at a pretty low level so that you don't
lose events while you're not listening.

And of course events have to be defined very carefully to avoid problems
such as this one.

> and to avoid having to modify every
> application which might affect any file from which cached items are
> derived to explicitly notify all the applications which might use any
> of those files.
>
> You like high performance, reliable and correct behaviour, and high
> scalability. So I have never understood why you dislike the
> change-subscription principle so strongly, because it is a natural
> ally to those properties.

I don't think we've seen a design that does all of that yet.

--b.

2011-08-22 23:22:02

by NeilBrown

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Mon, 22 Aug 2011 13:22:39 -0400 "J. Bruce Fields" <[email protected]>
wrote:

> On Mon, Aug 22, 2011 at 09:07:51AM +1000, NeilBrown wrote:
> > One is to use bind mounts. i.e. I effectively do
> > mount --bind $HOME/.config $HOME/.config
> > and ask for events from the newly created vfsmnt.
> > This will not catch changes made through file descriptors that were opened
> > before I did the mount, or through hard links from some other directory
> > tree. But for a particular use-case that might not be a problem.
>
> I'm missing what the extra vfsmount gets you here. The problems seem
> just the same as if you don't have one.
>
> Oh, wait, I see, it's that the file descriptors are associated with
> vfsmounts, not just dentries. Hm.

"Hm" might be right. writes have a 'struct file', but mkdir and rename etc
just get an inode.
So to be able to notify based on vfsmnt, mkdirat - for examine - would need
to pass the path to vfs_mkdir rather than path.dentry->d_inode, and vfs_mkdir
would have to pass that path to fsnotify_mkdir. So more intrusive that I
imagined, but still quite do-able.... if it were thought to be useful.

NeilBrown


2011-08-19 23:37:58

by Jamie Lokier

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

Sylvain Rochet wrote:
> Hi,
>
> On Tue, Oct 19, 2010 at 12:35:40AM +0200, Sylvain Rochet wrote:
> >
> > ... upgraded to 2.6.33.5, then 2.6.33.7, finally to 2.6.35.7, and I
> > always end up with the same ending, it seems inotify can miss some VFS
> > events from time to time.
>
> I finally find out why.
>
> The NFS server does not always know the name of the modified file, if
> the modified inode was cleared from the VFS cache fsnotify does not know
> as well the filename then inotify child events on directories are
> silently tossed.
>
> Easy way to reproduce:
>
> Add a few printk debug (here it only works if /data is the NFS export):
>
> --- begin//fs/nfsd/vfs.c 2011-07-22 04:17:23.000000000 +0200
> +++ linux-3.0/fs/nfsd/vfs.c 2011-07-30 03:18:17.837560809 +0200
> @@ -975,6 +975,8 @@
> inode = dentry->d_inode;
> exp = fhp->fh_export;
>
> + printk("nfsd write inode=%ld name=%s\n", inode->i_ino, dentry->d_name.name);
> +
> /*
> * Request sync writes if
> * - the sync export option has been set, or
>
> diff -Nru begin//include/linux/fsnotify.h linux-3.0/include/linux/fsnotify.h
> --- begin//include/linux/fsnotify.h 2011-07-22 04:17:23.000000000 +0200
> +++ linux-3.0/include/linux/fsnotify.h 2011-07-30 03:07:00.330239062 +0200
> @@ -216,8 +232,15 @@
> mask |= FS_ISDIR;
>
> if (!(file->f_mode & FMODE_NONOTIFY)) {
> + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") )
> + printk("fsnotify modify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name);
> fsnotify_parent(path, NULL, mask);
> fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
> + } else {
> + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") )
> + printk("fsnotify modify-nonotify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name);
> }
> }
>
>
> On the NFS client, open a fd and send some data:
>
> # exec 1> test
> # ls -la
> #
>
> On the NFS server, check the kern log:
>
> Aug 20 00:57:44 inotifydebug kernel: nfsd write inode=13 name=test
> Aug 20 00:57:44 inotifydebug kernel: fsnotify modify inode=13 name=test
>
> Everything goes well.
>
> Now, clear the VFS cache on the NFS server:
>
> # echo 3 > /proc/sys/vm/drop_caches
>
> On the NFS client, send some data to the fd:
>
> # ls -la
> #
>
> On the NFS server, check the kern log:
>
> Aug 20 00:58:56 inotifydebug kernel: nfsd write inode=13 name=
> Aug 20 00:58:56 inotifydebug kernel: fsnotify modify inode=13 name=
>
> The filename is lost, fsnotify does not know the filename anymore,
> therefore inotify cannot send event about a modified file in a watched
> directory.
>
> End of the story.
>
> I guess this is almost impossible to fix this fsnotify bug, this is due
> by the fact that NFS use inode as file identifiers, so in some case this
> is impossible to know the modified filepath, and therefore impossible to
> match the file event to the directory watch.

Oh dear, that's a security hole, if something is using inotify/dnotify
to watch and assumes that file contents (on the same machine,
i.e. server in this case) do not change if there's no event received.

It also breaks cache applications which make the same assumption. Is
a solution to open inotify watches on every file individually? If so
that seems quite severe.

I do quite like the idea of using it to break past fanotify security
restrictions though ;-)

Can it also be bypassed with sys_open_by_handle_at?


Possible solution:

One way to look at this as like NFS having a secret hard link to the
file, which does not show up in st_nlink.

Hard links are already a bit tricky with fsnotify and directory
watches. You can monitor a directory, but a file in it can change
contents through another path.

However, you can track changes of hard-linked files accurately by
either putting a watch directly on all files whose st_nlink >= 2,
and/or making sure you have watches on enough distinct directories
that they contain st_nlink entries for the same file between them,
because at least one of those directories will get an event. This is
quite practical: You watch the files directly, until such time as you
have found all its links (if you ever do), then you can remove the
direct file watches.

That gives me an idea to help with the NFS no-name watching:

It looks like when a file is referenced by inode without a path, the
problem is there's no path, so no directory inode to receive the
event?

Then this can be solved, in principle (if there's no better way), by
watching a "virtual directory" that gets all events for when the
access doesn't have a parent directory. There needs to be some way to
watch it, and some way to get the appropriate file from the event (as
there is no real directory. Or maybe there could be a virtual
filesystem (like /proc, /sys etc.) containing a magic directory that
receives these inode-only events, such that lookups in that directory
yield the affected file. Exactly as if the directory contains a hard
link to every file, perhaps a text encoding of the handles passed
through sys_open_by_handle_at.

-- Jamie

2011-08-20 00:47:36

by Sylvain Rochet

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

Hi Jamie,


On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote:
>
> Oh dear, that's a security hole, if something is using inotify/dnotify
> to watch and assumes that file contents (on the same machine,
> i.e. server in this case) do not change if there's no event received.
>
> It also breaks cache applications which make the same assumption.
>
> I do quite like the idea of using it to break past fanotify security
> restrictions though ;-)

It also probably means that fanotify misses some events when a filesystem
is modified over NFS. If fanotify is used the way it is designed, i.e.
with an antiviruse software, this may be an interesting way to skip the
antiviruse check.

Here we go:

NFS server, run the fanotify example tool:

~/fanotify-example# ./fanotify -m /data/

NFS client, open a fd then do some I/O:

# exec 1> test
# ls -la
#

NFS server log:

/data/test: pid=1235 modify close(writable)

NFS server, cache clearing:

# echo 3 > /proc/sys/vm/drop_caches

NFS client, more I/O:

# ls -la

NFS server log:

/data: pid=1234 modify close(writable)

We receive an event... which is obviously wrong. This is even worse than
no event at all, we receive an event about the wrong inode, the parent
inode of the modified file actually.


> Is a solution to open inotify watches on every file individually? If
> so that seems quite severe.

This is what I am going to do, at least temporarily, I only need to
watch about a million file (and slowly counting).

The startup time to watch an entire filesystem using inotify already
require a full filesystem walk, watching all files and directories
instead of directories only will not change much because most of the
time is spent waiting I/O operations. This may however require a lot
more memory both on kernel side and userland side.


> Can it also be bypassed with sys_open_by_handle_at?

No clue, this should be checked but there is no evident reason that it
cannot be bypassed this way as well.


> Possible solution:
>
> One way to look at this as like NFS having a secret hard link to the
> file, which does not show up in st_nlink.
>
> Hard links are already a bit tricky with fsnotify and directory
> watches. You can monitor a directory, but a file in it can change
> contents through another path.
>
> However, you can track changes of hard-linked files accurately by
> either putting a watch directly on all files whose st_nlink >= 2,
> and/or making sure you have watches on enough distinct directories
> that they contain st_nlink entries for the same file between them,
> because at least one of those directories will get an event. This is
> quite practical: You watch the files directly, until such time as you
> have found all its links (if you ever do), then you can remove the
> direct file watches.

Yup, I agree.


> That gives me an idea to help with the NFS no-name watching:
>
> It looks like when a file is referenced by inode without a path, the
> problem is there's no path, so no directory inode to receive the
> event?

No filepath and no filename at all actually. There is no way to find the
(or "a" if the file is linked to more than one directory) filename other
than walking among all the directory tree to find where the inode is
linked. We need a directory entry (dirent) to send an event about a
modified file inside a watched directory.


> Then this can be solved, in principle (if there's no better way), by
> watching a "virtual directory" that gets all events for when the
> access doesn't have a parent directory. There needs to be some way to
> watch it, and some way to get the appropriate file from the event (as
> there is no real directory. Or maybe there could be a virtual
> filesystem (like /proc, /sys etc.) containing a magic directory that
> receives these inode-only events, such that lookups in that directory
> yield the affected file. Exactly as if the directory contains a hard
> link to every file, perhaps a text encoding of the handles passed
> through sys_open_by_handle_at.

By doing that, we'll only get the inode nb as we cannot fetch the filename.


Sylvain


Attachments:
(No filename) (4.03 kB)
signature.asc (198.00 B)
Digital signature
Download all attachments

2011-08-20 03:21:08

by Jamie Lokier

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

Sylvain Rochet wrote:
> Hi Jamie,
>
>
> On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote:
> >
> > Oh dear, that's a security hole, if something is using inotify/dnotify
> > to watch and assumes that file contents (on the same machine,
> > i.e. server in this case) do not change if there's no event received.
> >
> > It also breaks cache applications which make the same assumption.
> >
> > I do quite like the idea of using it to break past fanotify security
> > restrictions though ;-)
>
> It also probably means that fanotify misses some events when a filesystem
> is modified over NFS. If fanotify is used the way it is designed, i.e.
> with an antiviruse software, this may be an interesting way to skip the
> antiviruse check.
>
> Here we go:
>
> NFS server, run the fanotify example tool:
>
> ~/fanotify-example# ./fanotify -m /data/
>
> NFS client, open a fd then do some I/O:
>
> # exec 1> test
> # ls -la
> #
>
> NFS server log:
>
> /data/test: pid=1235 modify close(writable)
>
> NFS server, cache clearing:
>
> # echo 3 > /proc/sys/vm/drop_caches
>
> NFS client, more I/O:
>
> # ls -la
>
> NFS server log:
>
> /data: pid=1234 modify close(writable)
>
> We receive an event... which is obviously wrong. This is even worse than
> no event at all, we receive an event about the wrong inode, the parent
> inode of the modified file actually.

That sounds like a proper bug, maybe it can be fixed at least?

> > Is a solution to open inotify watches on every file individually? If
> > so that seems quite severe.
>
> This is what I am going to do, at least temporarily, I only need to
> watch about a million file (and slowly counting).
>
> The startup time to watch an entire filesystem using inotify already
> require a full filesystem walk, watching all files and directories
> instead of directories only will not change much because most of the
> time is spent waiting I/O operations. This may however require a lot
> more memory both on kernel side and userland side.

Watching an entire filesystem entails reading all the directories, but
you don't have to fetch the inodes of files. But still, it's very
slow (takes about 15 minutes on my /home from cold cache, just to read
the million or so directories).

There was some work on propagating events upwards so that efficient
recursive watches could be established, in the context of fanotify but
it would make sense to be available to all fsnotify users. I wonder
how that went.

> > Then this can be solved, in principle (if there's no better way), by
> > watching a "virtual directory" that gets all events for when the
> > access doesn't have a parent directory. There needs to be some way to
> > watch it, and some way to get the appropriate file from the event (as
> > there is no real directory. Or maybe there could be a virtual
> > filesystem (like /proc, /sys etc.) containing a magic directory that
> > receives these inode-only events, such that lookups in that directory
> > yield the affected file. Exactly as if the directory contains a hard
> > link to every file, perhaps a text encoding of the handles passed
> > through sys_open_by_handle_at.
>
> By doing that, we'll only get the inode nb as we cannot fetch the filename.

Yes... That's ok if it's one we are tracking inode->multiple-paths in
userspace anyway (for hard links). But it's quite demanding if we
hoped to avoid fetching and storing that in userspace for
st_nlink == 1 files.

In that case it is still better to get a notification "something
unknown on this FS has changed", rather than no notification.
Userspace would react by flushing all of its cached knowledge of
things under directory watches that don't have direct watches. But at
least that's reliable and correct behaviour, and if it happens often,
userspace heuristics can react by watching priority inodes more directly.

If that's the common case, then these nameless, pathless events could
just trigger a simple event with catch-all IN_NO_PATH flag set,
referring to the filesytem but no more detail than that. inotify
would accept that flag when adding a watch, ignore the inode given but
remember the filesystem, and send all events with no path to the
watch(es) created with that flag on that filesystem. It's a flag
because the event type is still useful.

All the best,
-- Jamie

2011-08-21 22:29:21

by NeilBrown

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Sat, 20 Aug 2011 01:03:44 +0200 Sylvain Rochet <[email protected]>
wrote:

> Hi,
>
> On Tue, Oct 19, 2010 at 12:35:40AM +0200, Sylvain Rochet wrote:
> >
> > ... upgraded to 2.6.33.5, then 2.6.33.7, finally to 2.6.35.7, and I
> > always end up with the same ending, it seems inotify can miss some VFS
> > events from time to time.
>
> I finally find out why.
>
> The NFS server does not always know the name of the modified file, if
> the modified inode was cleared from the VFS cache fsnotify does not know
> as well the filename then inotify child events on directories are
> silently tossed.

This may not be helpful but if you export the filesystem with the
"subtree_check" option then the NFS server will always know a name for any
file that it modifies.

This is not without cost though, which is why it is no longer the default.

If the client opens a file, the server moves it to a new directory and then
the server cache is flushed, then the client will get an ESTALE error. This
is rarely a problem in practice but when it is it can be quite annoying...

NeilBrown



>
> Easy way to reproduce:
>
> Add a few printk debug (here it only works if /data is the NFS export):
>
> --- begin//fs/nfsd/vfs.c 2011-07-22 04:17:23.000000000 +0200
> +++ linux-3.0/fs/nfsd/vfs.c 2011-07-30 03:18:17.837560809 +0200
> @@ -975,6 +975,8 @@
> inode = dentry->d_inode;
> exp = fhp->fh_export;
>
> + printk("nfsd write inode=%ld name=%s\n", inode->i_ino, dentry->d_name.name);
> +
> /*
> * Request sync writes if
> * - the sync export option has been set, or
>
> diff -Nru begin//include/linux/fsnotify.h linux-3.0/include/linux/fsnotify.h
> --- begin//include/linux/fsnotify.h 2011-07-22 04:17:23.000000000 +0200
> +++ linux-3.0/include/linux/fsnotify.h 2011-07-30 03:07:00.330239062 +0200
> @@ -216,8 +232,15 @@
> mask |= FS_ISDIR;
>
> if (!(file->f_mode & FMODE_NONOTIFY)) {
> + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") )
> + printk("fsnotify modify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name);
> fsnotify_parent(path, NULL, mask);
> fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
> + } else {
> + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") )
> + printk("fsnotify modify-nonotify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name);
> }
> }
>
>
> On the NFS client, open a fd and send some data:
>
> # exec 1> test
> # ls -la
> #
>
> On the NFS server, check the kern log:
>
> Aug 20 00:57:44 inotifydebug kernel: nfsd write inode=13 name=test
> Aug 20 00:57:44 inotifydebug kernel: fsnotify modify inode=13 name=test
>
> Everything goes well.
>
> Now, clear the VFS cache on the NFS server:
>
> # echo 3 > /proc/sys/vm/drop_caches
>
> On the NFS client, send some data to the fd:
>
> # ls -la
> #
>
> On the NFS server, check the kern log:
>
> Aug 20 00:58:56 inotifydebug kernel: nfsd write inode=13 name=
> Aug 20 00:58:56 inotifydebug kernel: fsnotify modify inode=13 name=
>
> The filename is lost, fsnotify does not know the filename anymore,
> therefore inotify cannot send event about a modified file in a watched
> directory.
>
> End of the story.
>
> I guess this is almost impossible to fix this fsnotify bug, this is due
> by the fact that NFS use inode as file identifiers, so in some case this
> is impossible to know the modified filepath, and therefore impossible to
> match the file event to the directory watch.
>
> Kind regards,
> Sylvain


2011-08-20 01:43:11

by Randy Dunlap

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Sat, 20 Aug 2011 02:29:43 +0100 Al Viro wrote:

> On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote:
>
> > Possible solution:
>
> > Then this can be solved, in principle (if there's no better way), by
> > watching a "virtual directory" that gets all events for when the
> > access doesn't have a parent directory. There needs to be some way to
> > watch it, and some way to get the appropriate file from the event (as
> > there is no real directory. Or maybe there could be a virtual
> > filesystem (like /proc, /sys etc.) containing a magic directory that
> > receives these inode-only events, such that lookups in that directory
> > yield the affected file. Exactly as if the directory contains a hard
> > link to every file, perhaps a text encoding of the handles passed
> > through sys_open_by_handle_at.
>
> There is a better way - stop using idiotify... It has always been a
> mistake, driven down our throats by filemangler and desktop crowd.
> Broken in many, _many_ respects... Deprecate that crap, remove it
> completely in a couple of revisions, let these clowns cope. Yeah,
> yeah, I know... Not going to happen ;-/ One can dream, though...

What do you suggest as a functioning alternative to inotify?

---
~Randy

2011-08-20 02:01:47

by Al Viro

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing

On Fri, Aug 19, 2011 at 06:43:11PM -0700, Randy Dunlap wrote:

> What do you suggest as a functioning alternative to inotify?

Depends on what particular (ab)use do you have in mind...