2004-03-24 14:15:29

by Rüdiger Klaehn

Subject: [RFC,PATCH] dnotify: enhance or replace?

Hi all,

I have been working on a dnotify enhancement to let it work recursively
and to store information about what exactly has changed.

My current code can be found here:
<http://www.lambda-computing.com/~rudi/dnotify/>

From reading the list, I got the impression that there is a general
consensus that the current dnotify mechanism is less than optimal, and
that something should be done about it. Is that correct?

My current implementation enhances the dnotify mechanism, but is
backwards compatible with the old mechanism. This is obviously the least
intrusive approach, but it is also less than optimal. For example it
still requires an open file handle to watch for changes in a tree, so it
will create problems when unmounting a device.

In an offline discussion, the question came up whether it would not be
better to replace dnotify with a completely new mechanism like e.g. a
special netlink socket. Since most userspace programs (e.g. KDE and
gnome) do not use dnotify directly, but through the fam daemon, the
required changes in user space applications would not be that great.

So what is your take on this? Enhance or replace?

best regards,

Rüdiger

p.s.: I cc'ed everybody who I think might be interested in a dnotify
enhancement/replacement.


2004-03-24 15:40:28

by Alexander Larsson

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

On Wed, 2004-03-24 at 15:17, Rüdiger Klaehn wrote:
> Hi all,
>
> I have been working on a dnotify enhancement to let it work recursively
> and to store information about what exactly has changed.
>
> My current code can be found here:
> <http://www.lambda-computing.com/~rudi/dnotify/>
>
> From reading the list, I got the impression that there is a general
> consensus that the current dnotify mechanism is less than optimal, and
> that something should be done about it. Is that correct?
>
> My current implementation enhances the dnotify mechanism, but is
> backwards compatible with the old mechanism. This is obviously the least
> intrusive approach, but it is also less than optimal. For example it
> still requires an open file handle to watch for changes in a tree, so it
> will create problems when unmounting a device.
>
> In an offline discussion, the question came up whether it would not be
> better to replace dnotify with a completely new mechanism like e.g. a
> special netlink socket. Since most userspace programs (e.g. KDE and
> gnome) do not use dnotify directly, but through the fam daemon, the
> required changes in user space applications would not be that great.
>
> So what is your take on this? Enhance or replace?
>
> best regards,
>
> Rüdiger
>
> p.s.: I cc'ed everybody who I think might be interested in a dnotify
> enhancement/replacement.

I think everyone agrees that dnotify is a POS that needs replacement,
however coming up with a good new API and implementation seems to be
hard (or at least uninteresting to kernel developers).

I for sure would welcome a sane file change notification API, i.e. one
that doesn't require the use of signals. However, I don't really care
about recursive monitors, and I'm actually unsure if you really want the
DN_EXTENDED functionality in the kernel. It seems like a great way to
make the kernel use a lot of unswappable memory, unless you limit the
event queues, and if you do that you need to stat all files in userspace
anyway so you can correctly handle queue overflows.
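
To make the overflow argument concrete, here is a minimal userspace sketch of a size-limited event queue. It is not code from the patch under discussion; `BoundedEventQueue` and `rescan` are hypothetical names chosen for illustration. When the queue is full, further events are dropped and an overflow flag is set, and the consumer then has to fall back to stat-ing the directory entries itself.

```python
import os
from collections import deque


class BoundedEventQueue:
    """Fixed-capacity event queue that records overflow instead of growing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def push(self, event):
        # When full, drop the event and remember that something was lost.
        if len(self.events) >= self.capacity:
            self.overflowed = True
            return False
        self.events.append(event)
        return True

    def drain(self):
        """Return (events, overflowed) and reset the queue."""
        events, overflowed = list(self.events), self.overflowed
        self.events.clear()
        self.overflowed = False
        return events, overflowed


def rescan(directory):
    """Fallback after an overflow: stat every entry to rebuild the state."""
    return {
        entry.name: entry.stat(follow_symlinks=False).st_mtime
        for entry in os.scandir(directory)
    }
```

A consumer would call `drain()` regularly and call `rescan()` only when the returned flag is set, which is exactly the extra userspace work referred to above.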

I think the most important properties for a good dnotify replacement are:

* Don't use signals or any other global resource that makes it
impossible to use the API in a library that's supposed to be used by all
sorts of applications.

* Get sane semantics, i.e. if a hardlink changes, notify a file change in
all directories the file is in. (This is hard though, it needs backlinks
from the inodes to the directories, at least for the directories with a
monitor, something I guess we don't have today.)

* Some way to get an event when the last open fd to the file is closed
after a file change. This means you won't get hundreds of write events
for a single file change. (Of course, you won't catch writes to e.g.
logs which aren't closed, so this has to be optional. But for a desktop,
this is often what you want.)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
He's a time-tossed guerilla cowboy who knows the secret of the alien invasion.
She's a cosmopolitan mutant mercenary living on borrowed time. They fight
crime!

2004-03-24 16:13:34

by Rüdiger Klaehn

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

Alexander Larsson wrote:

[snip]
> I think everyone agrees that dnotify is a POS that needs replacement,
> however coming up with a good new API and implementation seems to be
> hard (or at least uninteresting to kernel developers).
>
I want something like this, so I am willing to spend some time
implementing it.

> I for sure would welcome a sane file change notification API, i.e. one
> that doesn't require the use of signals. However, I don't really care
> about recursive monitors, and I'm actually unsure if you really want the
> DN_EXTENDED functionality in the kernel. It seems like a great way to
> make the kernel use a lot of unswappable memory, unless you limit the
> event queues, and if you do that you need to stat all files in userspace
> anyway so you can correctly handle queue overflows.
>
About recursive notification:

Some way to watch for changes on a whole file system is a must.
Otherwise there is really no need to replace dnotify. When I start up
KDE it watches for 256 different directories in my /home directory. It
would probably watch even more directories if it could. With recursive
watching it would only need to watch two or three directories recursively.

About the buffer memory usage problem:

I have been testing the current approach for a few days continuously
now, and I don't get event buffer overflows even if I watch for all
events on "/". Of course the event buffer size should be limited. The
current implementation uses 10*4096 bytes, but in most cases starting
with a single 4096 byte page should be enough.

Note that the most common events (read and write) are quite small.
Currently they are 32 bytes, but it would not be that hard to get them
even smaller if necessary. This is quite good compared to libraries like
dazuko that report the complete path for each change.
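
For a sense of scale, the following sketch works out how many fixed 32-byte records fit into the buffer sizes mentioned above. The field layout is purely hypothetical (the thread does not give one), and `EV_WRITE` is a made-up placeholder rather than a real kernel constant; only the 32-byte record size and the 4096-byte page come from the text.

```python
import struct

# Hypothetical 32-byte record: event mask, flags, inode, offset, length.
EVENT = struct.Struct("<IIQQQ")
PAGE_SIZE = 4096
EV_WRITE = 1  # placeholder event code, not a real DN_* value


def events_per_buffer(pages):
    """Number of fixed-size events that fit in `pages` pages."""
    return (pages * PAGE_SIZE) // EVENT.size


def pack_write(inode, offset, length):
    """Encode one write event as a 32-byte record."""
    return EVENT.pack(EV_WRITE, 0, inode, offset, length)
```

With these numbers a single page holds 128 events and the 10-page buffer holds 1280.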

Extended information about the type of change has been requested by many
people, and it is necessary for many applications. People have been
writing ugly syscall table hacks for this, so they must be really
desperate to get this information...

It should be optional though.

> I think the most important properties for a good dnotify replacement are:
>
> * Don't use signals or any other global resource that makes it
> impossible to use the API in a library that's supposed to be used by all
> sorts of applications.
>
Agreed.

> * Get sane semantics. i.e. if a hardlink changes notify a file change in
> all directories the file is in. (This is hard though, it needs backlinks
> from the inodes to the directories, at least for the directories with a
> monitor, something I guess we don't have today.)
>
This would require large changes, and I think figuring out all aliases
to a path might as well be done in userspace. You don't gain much by
putting this in the kernel, and it requires a lot of complexity.

> * Some way to get an event when the last open fd to the file is closed
> after a file change. This means you won't get hundreds of write events
> for a single file change. (Of course, you won't catch writes to e.g.
> logs which aren't closed, so this has to be optional. But for a desktop,
> this is often what you want.)
>
Should be no problem to add this with the current approach. But it is
not that bad if you are getting hundreds of write events for a single
file. They are just 32 bytes, so you can just throw them away in the
userspace if you are not interested in them.

2004-03-24 16:31:11

by John McCutchan

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

On Wed, 2004-03-24 at 10:40, Alexander Larsson wrote:
>
> I think everyone agrees that dnotify is a POS that needs replacement,
> however coming up with a good new API and implementation seems to be
> hard (or at least uninteresting to kernel developers).
>
> I for sure would welcome a sane file change notification API, i.e. one
> that doesn't require the use of signals. However, I don't really care
> about recursive monitors, and I'm actually unsure if you really want the
> DN_EXTENDED functionality in the kernel. It seems like a great way to
> make the kernel use a lot of unswappable memory, unless you limit the
> event queues, and if you do that you need to stat all files in userspace
> anyway so you can correctly handle queue overflows.
>
> I think the most important properties for a good dnotify replacement are:
>
> * Don't use signals or any other global resource that makes it
> impossible to use the API in a library that's supposed to be used by all
> sorts of applications.
>
> * Get sane semantics. i.e. if a hardlink changes notify a file change in
> all directories the file is in. (This is hard though, it needs backlinks
> from the inodes to the directories, at least for the directories with a
> monitor, something I guess we don't have today.)
>
> * Some way to get an event when the last open fd to the file is closed
> after a file change. This means you won't get hundreds of write events
> for a single file change. (Of course, you won't catch writes to e.g.
> logs which aren't closed, so this has to be optional. But for a desktop,
> this is often what you want.)

Maybe adding a rate limiter on these write events would be a better
idea; live updates are useful for the desktop. Also, with a netlink
socket I think the overhead of many events would drop significantly.

There are also a couple of other items I think need to be on the list of
features:

* Some way to not have an open file descriptor for each directory you
are monitoring. This causes so many problems when unmounting, and this
is really the most noticeable problem for the user.

* Better event vocabulary: we should fire events for all VFS ops. I
think right now it is limited to delete, create and written-to. It would be
good to tell the listener exactly what happened: moved, renamed, etc.

John

2004-03-24 16:52:18

by Rüdiger Klaehn

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

John McCutchan wrote:

[snip]
> Maybe adding a rate limiter on these write events would be a better
> idea, live updates are useful for the desktop. Also with a netlink
> socket I think the overhead of many events would drop significantly.
>
You could always merge read/write events if you get too many of them.
E.g. write [10,11] + write [11,12] => write [10,12]. But I never had
event buffer overflows in my tests. And a buffer of a few kbytes per
file system for fam won't be that bad, so I am not sure whether it is
necessary to do something as complicated as rate limiting or merging.
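
The merging rule in the example above (write [10,11] + write [11,12] => write [10,12]) can be sketched in a few lines. This is only an illustration, assuming all ranges belong to the same file and that ranges which touch or overlap may be combined; `merge_writes` is a hypothetical helper, not part of the patch.

```python
def merge_writes(ranges):
    """Coalesce overlapping or touching (start, end) write ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Note that sorting discards the original order of the events, which is acceptable only if the listener cares about which byte ranges changed and not about the sequence of writes.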

> Also a couple other items I think need to be on the list of features,
>
> * Some way to not have an open file descriptor for each directory you
> are monitoring. This causes so many problems when unmounting, and this
> is really the most noticeable problem for the user.
>
You can monitor a whole tree with a single file descriptor. But you need
at least one open fd per file system, so it would indeed be a problem
when unmounting.

> * Better event vocabulary, we should fire events for all VFS ops. I
> think right now it is limited to delete,create,written to. It would be
> good to tell the listener exactly what happened, moved,renamed, etc.
>
I had this for a short time, but I threw it away since I wanted to
concentrate on the event dispatch infrastructure first. It would not be
a big problem to add this again.

2004-03-24 19:47:04

by John McCutchan

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

One of the big requirements for a dnotify replacement is this:

* Some way to not have an open file descriptor for each directory you
are monitoring. This causes so many problems when unmounting, and this
is really the most noticeable problem for the user.

I wanted to get some possible ideas from kernel hackers about how this
could be done. Inode numbers are not unique, but is there any way to get
a unique identifier on a file without using an open file? I have come
up with a few ideas. I don't think they are very good, but here
goes:

- When the user passes an fd to the kernel to watch, the kernel takes over
this fd, making it invalid in user space (I know this is a terrible hack).
Then, when a volume is unmounted, the kernel can walk the list of
open fds used for notification and close them before attempting to
unmount.

- The user passes a path to the kernel, the kernel does some work so
that it can track anything to do with that path, and again when
an unmount is called the kernel cleans up anything used for
notification.

Both of these ideas are similar, does anyone have a better idea?

John

2004-03-24 20:00:28

by Paul Rolland

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?


Hello,

> could be done. Inode numbers are not unique, but is there any
> way to get
> a unique identifier on a file without using an open file? I have come
I wonder if adding to the inode number something like the device
id is not enough to create some "unique key"...
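
As a rough userspace illustration of that idea (a sketch only, `file_key` is a made-up helper name): the pair of device id and inode number reported by stat identifies a file on a mounted filesystem at a given moment, independent of the path used to reach it.

```python
import os


def file_key(path):
    """Return (device id, inode number) for path without following symlinks."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)
```

Two hard links to the same file yield the same key. The key is not a permanent identifier, though: an inode number can be reused once the file has been deleted, so this only helps while the file exists.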

> up with a few ideas. I don't think they are very good, but here
> goes,
>
> - When user passes fd to kernel to watch, the kernel takes over this
> fd, making it invalid in user space ( I know this is a
> terrible hack)
> then when a volume is unmounted, the kernel can walk the list of
> open fd's used for notification and close them, before
> attempting to
> unmount.
No, this is bad, because it would require using a dedicated fd for
dnotify. If I open a file/directory and give it to dnotify, I don't
want to have to re-open it to use it, read it, or anything else...

> - The user passes a path to the kernel, the kernel does some work so
> that it can track anything to do with that path, and again when
> an unmount is called the kernel cleans up anything used for
> notification.
That sounds much better to me...

Regards,
Paul

Paul Rolland, rol(at)as2917.net
ex-AS2917 Network administrator and Peering Coordinator

--

Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur

"Some people dream of success... while others wake up and work hard at it"


2004-03-24 20:07:29

by Al Viro

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

On Wed, Mar 24, 2004 at 02:53:52PM -0500, John McCutchan wrote:
> - When user passes fd to kernel to watch, the kernel takes over this
> fd, making it invalid in user space ( I know this is a terrible hack)
> then when a volume is unmounted, the kernel can walk the list of
> open fd's used for notification and close them, before attempting to
> unmount.

And if umount fails? BTW, _which_ umount? The sucker can be present
in more than one place in more than one namespace.

> - The user passes a path to the kernel, the kernel does some work so
> that it can track anything to do with that path, and again when
> an unmount is called the kernel cleans up anything used for
> notification.

Ditto.

> Both of these ideas are similar, does anyone have a better idea?

"Doctor, It Hurts When I Do It"

Seriously, dnotify sucks in a lot of ways, starting with the basic
premise - that userland can do notification-based maintaining of directory
tree image. It's racy by definition, so any attempts to use it for
"security improvements" are scam. Which leaves us with file manglers
and their ilk.

Note that any attempts to trace "aliases" in userland are hopelessly racy;
that mounting/unmounting doesn't even show on the radar; that different
users can see different parts of tree or, while we are at it, completely
different trees; that this crap is a DDoS on a server that exports any
sort of network filesystem to many clients - *especially* if you want
notifications on the entire tree.

IOW, idea is fundamentally flawed and IMO the real fix is to try and figure
out a decent UI that would provide what file managers are really used for.

2004-03-26 09:51:02

by Rüdiger Klaehn

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

Alexander Larsson wrote:

[snip]
>>About recursive notification:
>>
>>Some way to watch for changes on a whole file system is a must.
>>Otherwise there is really no need to replace dnotify. When I start up
>>KDE it watches for 256 different directories in my /home directory. It
>>would probably watch even more directories if it could. With recursive
>>watching it would only need to watch two or three directories recursively.
>
>
> I see no reasons for a file manager to monitor all of $HOME, unless
> you're showing all of it. Getting all the events for $HOME will just
> cause things to be slower.
>
I don't know why KDE does this. Probably because of the .desktop files.
But I can see plenty of reasons to monitor large trees. For example to
keep some kind of metadata consistent with data.

What is everybody's obsession with file managers anyway? They are the
most *boring* application of an enhanced file change mechanism, and
certainly not the most demanding.

[snip]
>>This would require large changes, and I think figuring out all aliases
>>to a path might as well be done in userspace. You don't gain much by
>>putting this in the kernel, and it requires a lot of complexity.
>
>
> There is no sane way to do that in userspace. How would you find all the
> aliases? traverse the whole filesystem?
>
This would indeed be an expensive operation. You would need a mapping
from paths to inode numbers for the files you want to watch. If you just
want to watch for changes for a single file or a small subdirectory it
is not that bad though.

Want me to code this into my userspace tool to see what I mean? I am
100% certain it can be done as long as you have inode numbers or
something else to uniquely and persistently identify a file (as opposed
to the name of the file). Unfortunately inode numbers are not unique and
persistent on all file systems, but for e.g. ext3 it should be no problem.
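
A minimal sketch of such a path-to-inode mapping, assuming inode numbers are stable for the filesystem being scanned (as on ext3). `find_aliases` is a hypothetical name, and this is not the userspace tool mentioned above. It walks a tree once and groups paths by (device, inode); any group with more than one path is a set of hard-link aliases.

```python
import os
from collections import defaultdict


def find_aliases(root):
    """Map (st_dev, st_ino) to every path under root that names that file."""
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            by_inode[(st.st_dev, st.st_ino)].append(path)
    # Keep only files reachable through more than one path.
    return {key: sorted(paths) for key, paths in by_inode.items() if len(paths) > 1}
```

The cost is one lstat per file, which is why this is cheap for a small subdirectory and expensive for a whole filesystem. The result is also only a snapshot: links created during or after the walk are not in it.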

> What if a new alias was created?
>
You would get an event. And the new alias would point to the same file.
No problem there.

>
>>>* Some way to get an event when the last open fd to the file is closed
>>>after a file change. This means you won't get hundreds of write events
>>>for a single file change. (Of course, you won't catch writes to e.g.
>>>logs which aren't closed, so this has to be optional. But for a desktop,
>>>this is often what you want.)
>>>
>>
>>Should be no problem to add this with the current approach. But it is
>>not that bad if you are getting hundreds of write events for a single
>>file. They are just 32 bytes, so you can just throw them away in the
>>userspace if you are not interested in them.
>
>
> If you get a changed event for every write() to a file downloading to
> the desktop then fam and the filemanager will use 100% cpu while
> downloading. Been there, done that, added the rate limiter. However the
> rate limiter doesn't fix the cpu use from just getting and ignoring all
> the events.
>
Strange. I have my userspace tool running on the first console. It is
watching for everything on "/" (throwing away most of it immediately). I
am downloading various stuff with mldonkey, playing an mp3 file with
xmms, and running a KDE session and mozilla.

My CPU load averages 5% (Athlon 2500+ with 512 MB RAM).

If the file manager insists on redrawing itself each time it gets a
write event, you would get 100% cpu. But such a braindead implementation
should not be blamed on the notification mechanism.

2004-03-26 09:57:16

by Rüdiger Klaehn

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

Alexander Larsson wrote:
> On Thu, 2004-03-25 at 12:45, Rüdiger Klaehn wrote:
>
>>I don't know why KDE does this. Probably because of the .desktop files.
>> But I can see plenty of reasons to monitor large trees. For example to
>>keep some kind of metadata consistent with data.
>>
>>What is everybody's obsession with file managers anyway? They are the
>>most *boring* application of an enhanced file change mechanism, and
>>certainly not the most demanding.
>
>
> I happen to be interested in file managers because I maintain one.
>
OK, no offense. I just wanted to make it clear that file managers are
not the most demanding use case.

[snip]
>>Want me to code this into my userspace tool to see what I mean? I am
>>100% certain it can be done as long as you have inode numbers or
>>something else to uniquely and persistently identify a file (as opposed
>>to the name of the file). Unfortunately inode numbers are not unique and
>>persistent on all file systems, but for e.g. ext3 it should be no problem.
>
>
> I'm sure you can look at all the files in the filesystem for a inode,
> but I hardly think such an expensive operation is useful in practice.
> Also, such a sweep over the filesystem would be racy and to fix that
> you'd have to monitor the whole filesystem to catch additions while
> scanning. And even this is racy, because of possible queue overflows.
>
You would get notified if the queue overflows. And as long as you empty
it often enough, it should not happen. Events are really small, so for a
reasonably sized queue you would have to try really hard to get it to
overflow.

> This gets even more complicated with recursive monitors. If you
> recursively monitor all of "/dir" you have to look for aliases to all
> the inodes inside /dir in the rest of the filesystem, and whenever new
> files show up in /dir you'd have to look for aliases to them too.
>
> Also, unless you also monitor mounts and unmounts that sort of changes
> won't be detected. (Not to mention that mounts can now be per-process.)
>
I guess I should just implement it. All this inode stuff would just be
necessary if you want to be really sure you detect every change, even
the ones over hard links. For a file manager it would IMHO be OK just to
ignore hard links.

>
>> > What if a new alias was created?
>>
>>You would get an event. And the new alias would point to the same file.
>>No problem there.
>
>
> You'd only get an event if you were monitoring the directory where the
> new alias appeared. This means you have to constantly monitor the whole
> filesystem to reliably get told about them. Also, of course, in case of
> a queue overflow while monitoring the whole system you could miss the
> creation of an alias.
>
Yes of course. If you want to catch hard links, you need to watch the
root, and it is more expensive than just ignoring hard links like the
current dnotify does. But it *is* possible, and it is not prohibitively
expensive.

>
>>If the file manager insists on redrawing itself each time it gets a
>>write event, you would get 100% cpu. But such a braindead implementation
>>should not be blamed on the notification mechanism.
>
>
> If you're monitoring a file due to it being visible (say e.g. the file
> size) on the screen, you *will* be updating and redrawing it all the
> time, unless you rate-limit your changes.
>
Yes, but this rate limiting could as well be done in the file manager.

> Being able to get an event
> when the file is closed would make this a lot better, because then you
> can rate-limit heavily, but still get the final result instantly when
> the download is finished.
>
You can have that if you want.
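
The combination discussed here, heavy rate limiting in the file manager plus one final update when the file is closed, could look roughly like the sketch below. It is an assumption-laden illustration: `CoalescingNotifier` and its methods are hypothetical, timestamps are passed in explicitly, and a "close" event is assumed to be delivered by the notification mechanism.

```python
class CoalescingNotifier:
    """Forward at most one refresh per interval, plus one on close."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_emit = None
        self.pending = False

    def on_write(self, now):
        """Return True if the UI should refresh for this write event."""
        if self.last_emit is None or now - self.last_emit >= self.min_interval:
            self.last_emit = now
            self.pending = False
            return True
        self.pending = True  # suppressed; remember there is unseen data
        return False

    def on_close(self, now):
        """Return True if a final refresh is needed after the close event."""
        if self.pending:
            self.pending = False
            self.last_emit = now
            return True
        return False
```

With a one-second interval, a burst of writes during a download triggers one redraw per second at most, and the close event still produces the final size immediately.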

2004-03-26 10:10:15

by Rüdiger Klaehn

Subject: Re: [RFC,PATCH] dnotify: enhance or replace?

Al Viro wrote:

[snip]

> "Doctor, It Hurts When I Do It"
>
> Seriously, dnotify sucks in a lot of ways, starting with the basic
> premise - that userland can do notification-based maintaining of directory
> tree image. It's racy by definition, so any attempts to use it for
> "security improvements" are scam. Which leaves us with file manglers
> and their ilk.
>
I thought about this some more. If you watch for e.g. all writes on the
root of a file system you get a complete, correctly ordered log of all
file writes on that filesystem. So you can find out whether a certain
file has been changed or not. That could be relevant security information.

You would get changes to the file pointed to by the path /etc/shadow,
even if the file has been changed by a hard link from /tmp/bla.

I am assuming here that there is a way, like inode numbers, to uniquely
and persistently identify a file. If something like this does
not exist, you are out of luck.

> Note that any attempts to trace "aliases" in userland are hopelessly racy;

You don't trace aliases in userland. All the relevant information is
logged in kernel space. The only thing you do in userspace is to convert
this information into a user readable form. You can take as long as you
want for that.

Btw: why did you put aliases in quotes? Is aliases not the correct term
when referring to multiple paths pointing to the same file?

> that mounting/unmounting doesn't even show on the radar;

There is an event for mounting and unmounting.

> that different users can see different parts of tree or, while we are
> at it, completely different trees;
>
That is why the paths returned by the mechanism are relative to the
directory from which you watch.

> that this crap is a DDoS on a server that exports any
> sort of network filesystem to many clients - *especially* if you want
> notifications on the entire tree.
>
If you have 100 clients, and each client wants its own notification for
/home, you would indeed have a problem. But if a single process like fam
watches for changes in /home on behalf of all 100 clients, it would be
no problem.

> IOW, idea is fundamentally flawed and IMO the real fix is to try and figure
> out a decent UI that would provide what file managers are really used for.
>
File managers are just one application of an enhanced file change
notification mechanism. There are many much more interesting
applications. For file managers the current dnotify mechanism is OK.