2004-09-01 20:27:39

by Jamie Lokier

Subject: Re: silent semantic changes with reiser4

Tonnerre wrote:
> How horrible!
> Do it in userland, really.

I'm amazed that after all this discussion where all the realistic
implementations are done in userland with kernel support for calling
out to it, there are people who think the kernel is supposed to decode
MP3 files or whatever.

Nobody is advocating that!

> If I get the time, I'll write you a small daemon based on libmagic
> which stores the file attributes in xattrs, or if they're not
> supported, in some MacOS/Xish per-directory files. Even a file manager
> ("finder") can do that, there's not even the need for a daemon.

How are you going to do the part where the xattr changes when the file
is modified?

(For example, if I edit an HTML file which is encoded in iso-8859-1,
change it to utf-8 and indicate that in a META element, and save it
under the same name, the full content-type should change from
"text/html; charset=iso-8859-1" to "text/html; charset=utf-8".)

I don't see how you can do that without kernel support.

Don't say dnotify or inotify, because neither would work.

-- Jamie


2004-09-02 10:56:48

by Alan

Subject: Re: silent semantic changes with reiser4

On Mer, 2004-09-01 at 21:16, Jamie Lokier wrote:
> (For example, if I edit an HTML file which is encoded in iso-8859-1,
> change it to utf-8 and indicate that in a META element, and save it
> under the same name, the full content-type should change from
> "text/html; charset=iso-8859-1" to "text/html; charset=utf-8".)
>
> I don't see how you can do that without kernel support.
>
> Don't say dnotify or inotify, because neither would work.

inotify done right is useful here as well as in a lot of other desktop
cases where dnotify doesn't really scale. It's enough to let me

- Find the new file
- Virus scan it
- Classify its possible type hierarchies
- Index it

Alan

2004-09-02 18:16:43

by Christer Weinigel

Subject: Re: silent semantic changes with reiser4

Jamie Lokier writes:

> (For example, if I edit an HTML file which is encoded in iso-8859-1,
> change it to utf-8 and indicate that in a META element, and save it
> under the same name, the full content-type should change from
> "text/html; charset=iso-8859-1" to "text/html; charset=utf-8".)
>
> I don't see how you can do that without kernel support.

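    import os
    import shelve

    # Persistent per-user cache mapping path -> (mtime, charset).
    charset_cache = shelve.open(os.path.join(os.environ['HOME'], '.charset_cache'))

    def get_charset(path):
        file_mtime = os.stat(path).st_mtime
        cache_tuple = charset_cache.get(path, (None, None))
        if file_mtime != cache_tuple[0]:
            # Missing or stale entry: recompute lazily and store the result.
            # figure_out_charset() is the content sniffer, not shown here.
            cache_tuple = (file_mtime, figure_out_charset(path))
            charset_cache[path] = cache_tuple
        return cache_tuple[1]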

This code is guaranteed to always give you an up to date charset for a
file, provided that the mtime is guaranteed to change every time the
file changes.

If the file can change during the call to get_charset you'll have to
lock the file while you are working with it or handle it some other
way (comparing mtime before and after and retrying if they differ),
but that is true even if you do it all in the kernel. This has the
added advantage that the cache is updated lazily, so the charset is
never calculated unless someone really needs it.
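
As a rough illustration of the "compare mtime before and after and
retry" variant, here is a minimal sketch. The names get_charset_stable,
compute and max_tries are made up for this example; compute stands in
for figure_out_charset, and the retry limit is an arbitrary assumption.

```python
import os

def get_charset_stable(path, compute, max_tries=5):
    """Return (mtime_ns, charset), retrying if the file changes meanwhile.

    `compute` is a hypothetical stand-in for figure_out_charset(path).
    """
    for _ in range(max_tries):
        before = os.stat(path).st_mtime_ns
        charset = compute(path)
        after = os.stat(path).st_mtime_ns
        if before == after:
            # No modification was observed while compute() ran.
            return before, charset
    raise RuntimeError("file kept changing: %s" % path)
```

This only narrows the window. It still relies on the mtime changing on
every write, which is the same assumption the cache itself makes.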

If the disk gets full, it is time to shrink the cache and preferably
stale entries are evicted first. If that's not good enough, a
notifier such as dnotify or inotify can be used to invalidate the
cache immediately after the file has changed.
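
Evicting stale entries first could look roughly like the sketch below.
It assumes the cache is a mapping of path -> (mtime, charset) as in the
code above; the function name evict_stale is invented for illustration.

```python
import os

def evict_stale(cache):
    """Drop entries whose file is gone or whose mtime no longer matches.

    `cache` is any mutable mapping of path -> (mtime, charset).
    Returns the number of entries removed.
    """
    removed = 0
    for path in list(cache):
        cached_mtime = cache[path][0]
        try:
            current_mtime = os.stat(path).st_mtime
        except OSError:
            current_mtime = None  # file was deleted or is unreadable
        if current_mtime != cached_mtime:
            del cache[path]
            removed += 1
    return removed
```

A notifier would call the same deletion for a single path as soon as
the change event arrives, instead of scanning the whole cache.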

If this gets popular, the cache management can be moved to a daemon
(naturally with all the security aspects in mind).

> Don't say dnotify or inotify, because neither would work.

Why not? The approach above works on any filesystem, even without
dnotify or inotify but will be more efficient with them.

/Christer

--
"Just how much can I get away with and still go to heaven?"

Freelance consultant specializing in device driver programming for Linux
Christer Weinigel http://www.weinigel.se

2004-09-02 20:03:59

by Jamie Lokier

Subject: Re: silent semantic changes with reiser4

Christer Weinigel wrote:
> Jamie Lokier writes:
>
> > (For example, if I edit an HTML file which is encoded in iso-8859-1,
> > change it to utf-8 and indicate that in a META element, and save it
> > under the same name, the full content-type should change from
> > "text/html; charset=iso-8859-1" to "text/html; charset=utf-8".)
> >
> > I don't see how you can do that without kernel support.
>
> [code which runs in the program which needs to read the content-type]
>
> This code is guaranteed to always give you an up to date charset for a
> file, provided that the mtime is guaranteed to change every time the
> file changes.

...which it isn't, but ignoring that.

Yes we all know that code will work, turning a blind eye to occasional
mtime failures.

What I objected to isn't that, but Tonnerre's idea of a daemon (i.e. in
the background, that's what daemons do) updating xattrs. Your code
will work. Tonnerre's has a serious race condition.

> > Don't say dnotify or inotify, because neither would work.
>
> Why not? The approach above works on any filesystem, even without
> dnotify or inotify but will be more efficient with them.

The above approach is in fact better with dnotify/inotify, when they
are available, because then it doesn't have mtime problems.

My objection is that dnotify & inotify don't fix the problems of a
"daemon" (i.e. in the background) updating an xattr, because the
daemon might not get scheduled between the write and the moment the
file needs to be used.

That said, yours has practical problems too. The intended purpose (*)
of storing content-type in an xattr was so that programs which need to
read content without understanding it may just do so (e.g. mail agents,
httpd), and are distinct from the programs which compute and set it
(e.g. programs which understand specific content formats).

(*) I don't necessarily agree with that purpose.

-- Jamie

2004-09-02 20:30:09

by Jamie Lokier

Subject: Re: silent semantic changes with reiser4

Alan Cox wrote:
> On Mer, 2004-09-01 at 21:16, Jamie Lokier wrote:
> > (For example, if I edit an HTML file which is encoded in iso-8859-1,
> > change it to utf-8 and indicate that in a META element, and save it
> > under the same name, the full content-type should change from
> > "text/html; charset=iso-8859-1" to "text/html; charset=utf-8".)
> >
> > I don't see how you can do that without kernel support.
> >
> > Don't say dnotify or inotify, because neither would work.
>
> inotify done right is useful here as well as in a lot of other desktop
> cases where dnotify doesn't really scale. It's enough to let me
>
> - Find the new file
> - Virus scan it
> - Classify its possible type hierarchies
> - Index it

Indeed it does, but it fails for the example I was commenting on, to
which you replied.

1. The file /var/www/site/index.html is written in vi.
2. "The daemon" (that's what I objected to) receives the inotify event,
but blocks waiting for the scheduler. However...
3. Seeing that vi is now finished, I phone person on other side
of world and say the updated file is now available.
4. Person fetches http://site/index.html
...oh! They receive the wrong content-type (bollocks!)
5. Eventually after paging & scheduling deems it appropriate,
"The daemon" gets to run, looks at file, updates content-type.

In other words I was criticising Tonnerre's idea of a daemon which
updates the xattrs by looking at file contents... by definition, a
daemon runs in the background, and the whole point of it updating a
content-type xattr was obviously so that programs like httpd could
just _use_ that xattr.

That doesn't work, even if that daemon uses inotify.
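
The ordering problem in the steps above can be shown with a small,
deterministic sketch. Everything in it is a stand-in: plain dicts play
the filesystem and the xattr store, a list plays the notification
queue, and serve/daemon_step are invented names, not real httpd or
inotify interfaces.

```python
def serve(xattrs, path):
    """What httpd does: trust the stored xattr, never look at the content."""
    return xattrs[path]

def daemon_step(fs, xattrs, pending):
    """What the daemon does once it finally gets scheduled."""
    while pending:
        path = pending.pop(0)
        xattrs[path] = "text/html; charset=" + fs[path]

fs = {"index.html": "iso-8859-1"}        # file content, reduced to its charset
xattrs = {"index.html": "text/html; charset=iso-8859-1"}
pending = []                             # queued change notifications

fs["index.html"] = "utf-8"               # 1. file saved in vi
pending.append("index.html")             # 2. notification queued, daemon waits
stale = serve(xattrs, "index.html")      # 4. the request arrives first
daemon_step(fs, xattrs, pending)         # 5. daemon runs at last
fresh = serve(xattrs, "index.html")
```

Here stale still carries the iso-8859-1 content-type while fresh
carries utf-8. Nothing in the notification mechanism forces step 5 to
happen before step 4.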

Basically, anytime where you want an ordering guarantee that something
will be recalculated in the interval after a file is modified and
before the calculated result is used, you have to be very careful
about exactly which code is using dnotify or inotify to achieve that.
Single threaded programs can use it easily, but a multi-threaded file
server has to be very careful about ordering if it's to avoid glitches.

This is exactly the problem I have faced in a multi-threaded HTTP
server which uses dnotify to detect changes to prerequisite files and
also changes to the path walked and permissions on it, and thus
invalidate cached generated pages, thus giving strong cache
guarantees. Performance is great, if you ignore the race conditions...

(One of the more popular small web servers, lighttpd (PHP people seem
to like it), also uses dnotify. Apparently it provides a big
performance boost, just by avoiding stat() calls. So I'm not alone in
using dnotify for such things.)

-- Jamie