2004-09-02 22:43:29

by Horst H. von Brand

Subject: Re: silent semantic changes with reiser4

Lee Revell <[email protected]> said:
> To: Pavel Machek <[email protected]>
> Cc: Spam <[email protected]>, Horst von Brand <[email protected]>,
> Jamie Lokier <[email protected]>, David Masover <[email protected]>,
> Chris Wedgwood <[email protected]>, [email protected],
> Linus Torvalds <[email protected]>, Christoph Hellwig <[email protected]>,
> Hans Reiser <[email protected]>, [email protected],
> linux-kernel <[email protected]>,
> Alexander Lyamin aka FLX <[email protected]>,
> ReiserFS List <[email protected]>
> X-Mailer: Ximian Evolution 1.4.6
> Date: Thu, 02 Sep 2004 16:01:17 -0400
>
> On Thu, 2004-09-02 at 15:49, Pavel Machek wrote:

[...]

> > You really need archive support in find. At the very least you need
> > option "enter archives" vs. "do not enter archives". Entering archives
> > automagically is seriously wrong.

I have used find(1) for quite some time now, and have never (or very
rarely) missed this.

> But is it efficient to make every application that reads files have to
> know how to get inside a tar file, just to read its contents?

Totally ridiculous, especially if you factor in .gz, .bz2, .zip, .a,
.whatever.new.format.they.come.up.with.tomorrow. But then again, this would
presumably reside in a (shared) library, so it isn't so bad...

> That
> seems like a massive duplication of effort.

Right. tar, gzip, bunzip2, et al. are already around.

> Better to have the contents
> accessible via a separate stream, in the same namespace. Fix it once in
> the kernel vs. fix it in umpteen apps.

Dead wrong. It is better to fix it in userspace (via a library, if
required; could call random unpacking etc programs at will, even be
configured on a user-by-user basis through ~/.wacky-file-handling or
environment) than force this junk into the kernel. Kernel code is _always_
resident, extremely security critical, and hard to get right. Besides, not
everybody will want to carry this around, and so it will forever stay a
"weird Linux kernel configuration only" feature, i.e., useless in practice.

> The key point here is that the expressive power of the system is greatly
> reduced by having a fragmented namespace. Of course there are any
> number of ways for an app to find out what is in a tar file, but
> exporting all of that information in a unified namespace is nontrivial
> and much more interesting.

I don't see how it is "nontrivial": "tar tf some.tar" is quite enough to
find out what is inside, in a customary format.
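
A minimal sketch of that userspace route, assuming Python's standard
tarfile module stands in for the "tar tf" call. The names list_members
and build_demo_tar are made up for this illustration, and the archive is
built in memory only so that the sketch runs without files on disk:

```python
import io
import tarfile


def list_members(fileobj):
    """Return the member names of a tar archive, roughly what `tar tf` prints."""
    # Mode "r:*" asks tarfile to detect gzip/bzip2/xz compression by itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return tar.getnames()


def build_demo_tar():
    """Build a small uncompressed tar archive in memory (hypothetical contents)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [
            ("docs/readme.txt", b"hello\n"),
            ("src/main.c", b"int main(void) { return 0; }\n"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for member in list_members(build_demo_tar()):
        print(member)
```

Nothing here needs kernel support: the format knowledge lives in a
library that any program can load or ignore.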

Placing random junk in the kernel doesn't magically make it fast, right, or
useful. Quite a lot of work is required for that, much more than for
getting the same in userland.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513


2004-09-02 23:28:33

by Jamie Lokier

Subject: Re: silent semantic changes with reiser4

Horst von Brand wrote:
> > > You really need archive support in find. At the very least you need
> > > option "enter archives" vs. "do not enter archives". Entering archives
> > > automagically is seriously wrong.
>
> I have used find(1) for quite some time now, and have never (or very
> rarely) missed this.

I've occasionally had the need to search all files on my system for
the one file which contains a particular phrase -- all I remember is
the phrase.

Just doing "grep -R" was a tedious job: at least half an hour.

Sometimes, I want to search all source files on my system for a
particular word, for example to search for uses of a particular system
call or library function.

That would require something that could search through all the .tar.gz
files and .zip files (nested if necessary) as well as plain files. It
would take so long -- hours at least, maybe more than a day -- that
I've never bothered doing such a thing.
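
A rough sketch of what such a searcher could look like in userspace,
assuming Python's standard tarfile and zipfile modules. The function
names and the "outer!inner" path notation are inventions of this sketch,
and it does nothing about the speed problem: every byte is still read
and decompressed.

```python
import io
import tarfile
import zipfile


def search_blob(name, data, needle, hits):
    """Append `name` to `hits` if `data` contains `needle` (both bytes).

    Tar archives (optionally gzip/bzip2/xz compressed) and zip archives
    are entered recursively instead of being searched as raw bytes.
    """
    buf = io.BytesIO(data)
    try:
        with tarfile.open(fileobj=buf, mode="r:*") as tar:
            for member in tar:
                if member.isfile():
                    inner = tar.extractfile(member).read()
                    search_blob(f"{name}!{member.name}", inner, needle, hits)
        return
    except tarfile.TarError:
        pass  # not a tar archive; try the next format

    buf.seek(0)
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    inner = zf.read(info)
                    search_blob(f"{name}!{info.filename}", inner, needle, hits)
        return

    if needle in data:
        hits.append(name)


def build_demo():
    """Build a .tar.gz in memory that contains a zip file and a plain file."""
    zbuf = io.BytesIO()
    with zipfile.ZipFile(zbuf, "w") as zf:
        zf.writestr("notes.txt", "call fork() here\n")
    tbuf = io.BytesIO()
    with tarfile.open(fileobj=tbuf, mode="w:gz") as tar:
        for member, data in [
            ("pkg/inner.zip", zbuf.getvalue()),
            ("pkg/other.txt", b"nothing\n"),
        ]:
            info = tarfile.TarInfo(member)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return tbuf.getvalue()


if __name__ == "__main__":
    found = []
    search_blob("demo.tar.gz", build_demo(), b"fork", found)
    print(found)
```

Whole members are read into memory here for brevity; a serious tool
would stream them and would need limits on nesting depth and size.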

"find" that entered archives really wouldn't help (although sometimes
"locate" that entered archives would be nice).

In other words, I'd use that capability if it was magically fast, but
as we expect it to be insanely slow (just grepping gigabytes is slow)
that makes it not so useful.

However, if we ever see that search engine index thing happen, it
would be a most excellent capability if it searched inside archive
files too. I would definitely use that. Not often, but occasionally I would.

-- Jamie

2004-09-02 23:58:05

by Alan

Subject: Re: silent semantic changes with reiser4

On Gwe, 2004-09-03 at 00:23, Jamie Lokier wrote:
> However, if we ever see that search engine index thing happen, it
> would be a most excellent capability if it searched inside archive
> files too. I would definitely use that. Not often, but occasionally I would.

That's an indexer decision; the search backend (which is the performance
and complexity critical part) doesn't give a damn.

2004-09-03 02:14:43

by Spam

Subject: Re: silent semantic changes with reiser4




> On Gwe, 2004-09-03 at 00:23, Jamie Lokier wrote:
>> However, if we ever see that search engine index thing happen, it
>> would be a most excellent capability if it searched inside archive
>> files too. I would definitely use that. Not often, but occasionally I would.

> That's an indexer decision; the search backend (which is the performance
> and complexity critical part) doesn't give a damn.

I am just speaking generally now, but it seems to me that there have
been many suggestions of userland solutions, such as shared libraries
and so forth, just to say there is no need for file streams and
plugins. Many of these ideas do exist in one way or another, but
none is truly system-wide and as application-independent as
file streams+plugins would be. Would it not be much less effort to
implement these in a good way than to reinvent lots of new
stuff in userland that wouldn't be system-wide anyway?

~S



2004-09-03 13:52:31

by John Stoffel

Subject: Re: silent semantic changes with reiser4

>>>>> "Jamie" == Jamie Lokier <[email protected]> writes:

Jamie> Horst von Brand wrote:
>> > > You really need archive support in find. At the very least you need
>> > > option "enter archives" vs. "do not enter archives". Entering archives
>> > > automagically is seriously wrong.
>>
>> I have used find(1) for quite some time now, and have never (or very
>> rarely) missed this.

Jamie> I've occasionally had the need to search all files on my system
Jamie> for the one file which contains a particular phrase -- all I
Jamie> remember is the phrase.

So use glimpse and run an indexer once a night to pick up changes.

Jamie> Just doing "grep -R" was a tedious job: at least half an hour.

Ugh, esp since you're not caching any results of all that work either,
when you need to repeat 10 minutes down the line...

Jamie> Sometimes, I want to search all source files on my system for a
Jamie> particular word, for example to search for uses of a particular
Jamie> system call or library function.

I think we want to pull this back a bit more and define this better.
Instead of 'system' we should be thinking 'namespace' or 'hierarchy'.

Jamie> That would require something that could search through all the
Jamie> .tar.gz files and .zip files (nested if necessary) as well as
Jamie> plain files. It would take so long -- hours at least, maybe
Jamie> more than a day -- that I've never bothered doing such a thing.

Jamie> In other words, I'd use that capability if it was magically
Jamie> fast, but as we expect it to be insanely slow (just grepping
Jamie> gigabytes is slow) that makes it not so useful.

Jamie> However, if we ever see that search engine index thing happen,
Jamie> it would be a most excellent capability if it searched inside
Jamie> archive files too. I would definitely use that. Not often,
Jamie> but occasionally I would.

Is the overhead of pre-indexing all files and their contents worth it
though? And would the use of inotify allow us to efficiently update
the index in a fairly quick time?
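
To make the trade-off concrete, here is a small sketch of an
incrementally updated word index. It is only an assumption-laden
stand-in: Python's standard library has no inotify binding, so changes
are detected by comparing each file's modification time and size on
every pass, which is exactly the tree walk that inotify would avoid.
The class name WordIndex and its methods are invented for this sketch.

```python
import os
import re


class WordIndex:
    """Toy inverted index: word -> set of paths containing it."""

    def __init__(self):
        self.postings = {}  # word -> set of file paths
        self.seen = {}      # path -> (mtime_ns, size) when last indexed

    def _drop(self, path):
        """Forget every posting that points at `path`."""
        for paths in self.postings.values():
            paths.discard(path)

    def update(self, root):
        """Re-read only new or changed files; return how many were re-read."""
        reread = 0
        current = set()
        for dirpath, _dirs, files in os.walk(root):
            for fname in files:
                path = os.path.join(dirpath, fname)
                st = os.stat(path)
                stamp = (st.st_mtime_ns, st.st_size)
                current.add(path)
                if self.seen.get(path) == stamp:
                    continue  # unchanged since the last pass
                self._drop(path)
                with open(path, "r", errors="replace") as fh:
                    words = set(re.findall(r"\w+", fh.read().lower()))
                for word in words:
                    self.postings.setdefault(word, set()).add(path)
                self.seen[path] = stamp
                reread += 1
        # Remove files that have disappeared since the last pass.
        for gone in set(self.seen) - current:
            self._drop(gone)
            del self.seen[gone]
        return reread

    def search(self, word):
        """Return the sorted paths whose contents include `word`."""
        return sorted(self.postings.get(word.lower(), ()))
```

A nightly job would call update() on the tree and answer queries with
search(); the second and later passes only pay for the stat() calls plus
whatever actually changed.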

All the various arguments have been that we need a filesystem which
magically indexes files, does queries and does it all atomically and
without any thought on the part of the developer/user. It could
happen, but then we'd have such a *slow* system in general, just to
make 1% of the usage simpler (for some measure of simple) that it
would be a waste.

Providing a simple, consistent set of semantics and syntax allows you
to build on top of it the layer(s) you need to provide the guarantees
you need for your application.

I don't expect that Oracle has the same needs as does my mail spool,
or as an image store. Sure, Oracle could do both of them, but are
the overheads of Oracle worth the slowdown in the common case?

John