2013-03-10 04:51:33

by Simeon Bird

Subject: Fwd: [Nepomuk] Better support for (desktop) file search / indexing applications

Sent again in plain-text - apologies.

---------- Forwarded message ----------
From: Simeon Bird <[email protected]>
Date: 9 March 2013 23:49
Subject: Re: [Nepomuk] Better support for (desktop) file search /
indexing applications
To: Tvrtko Ursulin <[email protected]>
Cc: Martin Steigerwald <[email protected]>, Jan Kara <[email protected]>,
Robert Love <[email protected]>, [email protected], Eric
Paris <[email protected]>, Nepomuk Mailing List <[email protected]>,
Linux Filesystem Development Mailinglist
<[email protected]>, Eric Paris <[email protected]>,
John McCutchan <[email protected]>


On 12 November 2012 04:10, Tvrtko Ursulin <[email protected]> wrote:
>
> On Saturday 10 November 2012 17:53:45 Martin Steigerwald wrote:
> > Still fanotify needs root access and thus this would need a daemon running
> > as root and some policy kit stuff to access it and in case of mount point
> > watches robust and secure code so that each user may only see his/her own
> > results.
>
> Perhaps then also extend fanotify to support user watches; off the top of my
> head I can't think of a reason it would be very difficult to implement. But it
> has been a few years since I actively worked with that code.
>
> Since you are not the only group having issues with the fanotify feature set, I
> can see this mini-project (together with the extensions from my previous reply)
> being useful. It is also better to evolve it than to neglect it due to a few
> shortcomings, only for someone to come up with something completely new in a
> few years, leaving us with yet another notification system.
>
> Tvrtko


Hi,

We (nepomuk) recently looked at using fanotify, and indeed we would
need user watches, support for moves and recursive directory watches
(we need to support the case where /home is not a separate filesystem)
before it would be useful to us. If you are interested in adding
these, we would be delighted to use nepomuk as a test-case for them.

We were wondering also if it would be possible to extend inotify a
little? Our wishlist is:

1) Recursive folder watches
2) When a file moves, some way to get the destination without watching
the directory it moved to, so moves can be tracked without watching
every file on the system.

I understand that there are reasons of security and performance why
you cannot implement 1), but is 2) possible? Maybe by extending
IN_MOVED_TO, or adding a new event type?

2) is actually in some ways the more severe problem for us. As well as
being an indexer, nepomuk is a system that allows you to store file
metadata such as ratings. When users move the files, they want the
metadata to move too, so we need to track where the file moved, and
thus at the moment we recursively watch everything. This is
particularly problematic with removable media; because a lot of people
will plug in an external drive and then move files onto it, we have to
watch every drive as soon as it is plugged in. If we were able to get
the destination of move events without watching the destination
directory, we could watch only those directories that contain
interesting metadata, which would make things a lot easier.

inotify move tracking would also be useful for other things - e.g., a
text editor could use inotify to see if a file it has open has moved
and offer to re-open the file in its new location, which is impossible
at the moment unless the destination directory is also watched.
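
For context, the cookie-based pairing that inotify offers today can be
sketched as below. This is a minimal Python sketch, not nepomuk code:
parse_events, pair_moves and make_event are hypothetical helper names,
and the buffer layout assumes the struct inotify_event fields (wd, mask,
cookie, len, name) in native byte order. It shows why both directories
have to be watched: an IN_MOVED_FROM whose matching IN_MOVED_TO never
arrives carries no destination at all.

```python
import struct

IN_MOVED_FROM = 0x00000040  # values from <sys/inotify.h>
IN_MOVED_TO = 0x00000080

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
_HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Yield (wd, mask, cookie, name) tuples from a raw inotify read buffer."""
    offset = 0
    while offset + _HEADER.size <= len(buf):
        wd, mask, cookie, name_len = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        # The name field is NUL-padded up to name_len bytes.
        name = buf[offset:offset + name_len].split(b"\0", 1)[0].decode()
        offset += name_len
        yield wd, mask, cookie, name


def pair_moves(events):
    """Match IN_MOVED_FROM and IN_MOVED_TO events that share a cookie.

    Returns (moves, unmatched_sources). An unmatched source means the file
    left a watched directory for one that is not watched (or was renamed
    onto another filesystem), so its destination is unknown.
    """
    pending = {}
    moves = []
    for wd, mask, cookie, name in events:
        if mask & IN_MOVED_FROM:
            pending[cookie] = (wd, name)
        elif mask & IN_MOVED_TO:
            source = pending.pop(cookie, None)
            if source is not None:
                moves.append((source, (wd, name)))
            # A MOVED_TO without a known source is ignored in this sketch.
    return moves, list(pending.values())


def make_event(wd, mask, cookie, name):
    """Build one raw event record (for the example only; padding is arbitrary)."""
    raw = name.encode() + b"\0"
    raw += b"\0" * (-len(raw) % 4)
    return _HEADER.pack(wd, mask, cookie, len(raw)) + raw
```

Feeding it a MOVED_FROM/MOVED_TO pair with the same cookie yields one
complete move; a lone MOVED_FROM ends up in the unmatched list, which is
exactly the case we cannot resolve today.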

The lack of recursive watches is a real problem because we tend to run
out of watches, so it would also really help if the default limit were
a bit higher - most people seem to have > 8000 folders, but I suspect
far fewer have > 32000 (probably excepting those who are indexing
kernel source trees: I have 21000, and half of that is KDE source).
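
To make the numbers concrete, here is a rough Python sketch that counts
the directories under a tree (one watch per directory) and compares the
total with the per-user limit. The function names are hypothetical; the
limit is read from /proc/sys/fs/inotify/max_user_watches, and the sketch
simply reports None if that file cannot be read.

```python
import os

LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root plus every real directory below it (symlinks are skipped)."""
    total = 1  # the root itself needs a watch
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            # os.walk lists symlinks to directories but does not descend
            # into them, so leave them out of the count as well.
            if not os.path.islink(os.path.join(dirpath, name)):
                total += 1
    return total


def read_watch_limit(path=LIMIT_PATH):
    """Return the per-user watch limit, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def watches_needed_report(root):
    """Return (needed, limit, fits) for watching every directory under root."""
    needed = count_directories(root)
    limit = read_watch_limit()
    fits = limit is None or needed <= limit
    return needed, limit, fits
```

Running watches_needed_report on a home directory gives a quick idea of
how far a given machine is from the limit.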

Would any of this be possible? If you happen to know of a better way
to track moves using existing tools, that would be even better.

Thanks,
Simeon


2013-03-10 12:06:28

by Lijo Antony

Subject: Re: Fwd: [Nepomuk] Better support for (desktop) file search / indexing applications

On 03/10/2013 08:51 AM, Simeon Bird wrote:
>
> Hi,
>
> We (nepomuk) recently looked at using fanotify, and indeed we would
> need user watches, support for moves and recursive directory watches
> (we need to support the case where /home is not a separate filesystem)
> before it would be useful to us. If you are interested in adding
> these, we would be delighted to use nepomuk as a test-case for them.
>
> We were wondering also if it would be possible to extend inotify a
> little? Our wishlist is:
>
> 1) Recursive folder watches
> 2) When a file moves, some way to get the destination without watching
> the directory it moved to, so moves can be tracked without watching
> every file on the system.

I am also interested in these features. As of now, my workarounds are:

1) When the limit is reached, ask the user to increase the limit and
restart the application.

2) When a top-level directory move is detected, do a file system search
based on inode to find out the new location. Very slow and not foolproof.
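
A rough sketch of the inode search in 2), in Python rather than the
application's own code. find_by_inode is a hypothetical name; it assumes
the (st_dev, st_ino) pair was recorded before the move. It walks the
whole tree, which is why it is slow, and it cannot help when the entry
moved to another filesystem (the inode number changes) or when the
number has been reused after a delete.

```python
import os


def find_by_inode(root, dev, ino):
    """Return every path under root whose (st_dev, st_ino) matches."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished while we were scanning
            if st.st_dev == dev and st.st_ino == ino:
                matches.append(path)
    return matches
```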

I would also like to hear of any other solutions for these problems. I
have yet to look into fanotify.

-lijo


2013-03-12 02:55:58

by Andreas Dilger

Subject: Re: [Nepomuk] Better support for (desktop) file search / indexing applications

On 2013-03-10, at 6:06, Lijo Antony <[email protected]> wrote:
> On 03/10/2013 08:51 AM, Simeon Bird wrote:
>>
>> We (nepomuk) recently looked at using fanotify, and indeed we would
>> need user watches, support for moves and recursive directory watches
>> (we need to support the case where /home is not a separate filesystem)
>> before it would be useful to us. If you are interested in adding
>> these, we would be delighted to use nepomuk as a test-case for them.
>>
>> We were wondering also if it would be possible to extend inotify a
>> little? Our wishlist is:
>>
>> 1) Recursive folder watches
>> 2) When a file moves, some way to get the destination without watching
>> the directory it moved to, so moves can be tracked without watching
>> every file on the system.
>
> I am also interested in these features. As of now, my workarounds are:
>
> 1) When the limit is reached, ask the user to increase the limit and
> restart the application.
>
> 2) When a top-level directory move is detected, do a file system search
> based on inode to find out the new location. Very slow and not foolproof.

For Lustre, we implemented something similar to inotify, with some
improvements that are possible because we limit the backend
filesystems that it runs on (ext4 and ZFS currently, but it would also
be possible on Btrfs).

For #1 (event recording) we have a persistent transactional ChangeLog
that is updated atomically with the metadata operation (create, rename,
unlink, etc). This allows external applications to be notified of
changes in the whole filesystem, even if modifications were made while
the watcher was not running (to some limited extent). It is possible to
limit the types of events that are recorded in the ChangeLog, but not
necessarily by pathname yet. This is used for HSM and remote filesystem
replication today.
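
The general idea of such a log can be illustrated with a toy in-memory
model in Python. This is not Lustre's ChangeLog or its API: the class,
method names and record layout are all hypothetical, and persistence and
atomicity with the metadata operation are not modelled. It only shows
event-type filtering and a consumer catching up from the last record
index it saw.

```python
class ChangeLog:
    """Toy append-only change log with event-type filtering."""

    def __init__(self, recorded_types):
        self.recorded_types = set(recorded_types)
        self.records = []  # list of (index, event_type, fid, name)

    def record(self, event_type, fid, name):
        """Append a record and return its index, or None if filtered out."""
        if event_type not in self.recorded_types:
            return None
        index = len(self.records) + 1  # indices start at 1 and only grow
        self.records.append((index, event_type, fid, name))
        return index

    def read_since(self, last_seen):
        """Return all records newer than the index a consumer last handled."""
        return [rec for rec in self.records if rec[0] > last_seen]
```

A watcher that remembers the last index it processed can stop, restart
later, and call read_since() to pick up whatever happened in between.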

For #2, we have a function "fid2path" that will generate, in O(1), each
pathname of a file given the FID (essentially the inode number). This is
possible because each inode keeps an xattr ("link") that is updated for
each link or rename of the inode with the parent directory FID and
directory entry name.

The "link" xattr is relatively low cost, since the inode needs to be
updated for each link/rename/unlink anyway (nlinks and ctime), and in
the overwhelmingly common case of a single link on a file there is only
a single entry in the xattr, so it can fit inside the inode.

From the list of links, we can walk the namespace in reverse order with
fid2path() to generate all of the pathnames of an inode. Something
similar would also allow you to find the target directory of a renamed
file without having to watch all of the directories.
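
The reverse walk can be sketched with a small in-memory model in Python.
This is an illustration only, not the Lustre implementation: the dict
stands in for the per-inode "link" xattr, and ROOT_FID, fid2paths and
rename are hypothetical names chosen for the example.

```python
ROOT_FID = 1  # hypothetical identifier for the filesystem root


def fid2paths(links, fid, root=ROOT_FID):
    """Return every pathname of `fid` by walking parent links upward.

    `links` maps fid -> list of (parent_fid, entry_name) pairs, one pair
    per hard link, standing in for the "link" xattr.
    """
    if fid == root:
        return ["/"]
    paths = []
    for parent_fid, name in links.get(fid, []):
        # Resolve the parent directory first, then append this entry name.
        for parent_path in fid2paths(links, parent_fid, root):
            paths.append(parent_path.rstrip("/") + "/" + name)
    return sorted(paths)


def rename(links, fid, old, new):
    """Replace one (parent_fid, name) entry, as a rename would update it."""
    entries = links[fid]
    entries[entries.index(old)] = new
```

After a rename updates the entry for a file, calling fid2paths() on the
same FID returns the new location directly, with no watch on the
destination directory.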

Cheers, Andreas
