2004-11-22 13:55:10

by Amit Gud

Subject: file as a directory

Hi people,

A straightforward question: wouldn't a "file as a directory" mechanism
be more logical in the VFS itself, rather than having each fs (like
reiser4) implement it separately? My vision is to provide archive-file
(.tar, .tar.gz, ...) support in the VFS itself, transparent of course
to any fs and any user-land application. There are many archive FSs
around, but how feasible would it be to implement archive-file support
in the VFS at the dentry level? I'd be happy to share my proposal.

AG
--
May the source be with you.


2004-11-22 14:41:52

by Al Viro

Subject: Re: file as a directory

On Mon, Nov 22, 2004 at 07:24:36PM +0530, Amit Gud wrote:
> I'd be happy to share my proposal.

So why don't you post your patches for review?

2004-11-22 14:50:56

by Martin Waitz

Subject: Re: file as a directory

hoi :)

On Mon, Nov 22, 2004 at 07:24:36PM +0530, Amit Gud wrote:
> A straight forward question. Wouldn't adding a "file as a directory"
> mechanism more logical in VFS itself, rather than having each fs (like
> reiser4) to implement it seperately?

Wouldn't it be better to implement such things in a library?
Use gnome-vfs, or try to get a VFS layer into libc.
That way you can even support different and old kernels and all
filesystems.

The kernel already provides all the methods that are necessary to do that.
So there is no need to implement it in the kernel.

--
Martin Waitz



2004-11-22 15:00:05

by Helge Hafting

Subject: Re: file as a directory

Amit Gud wrote:

>Hi people,
>
> A straight forward question. Wouldn't adding a "file as a directory"
>mechanism more logical in VFS itself, rather than having each fs (like
>reiser4) to implement it seperately? My vision is to give archive-file
>
>
Such support may happen for a few filesystems - people who
want this will then use those filesystems. Those who don't
like the idea will use others.

>(.tar, .tar.gz, ...) support in the VFS itself, and of course
>transparent to any fs and any user-land application. There are many
>archive FSs around, but how feasible would it be to implement the
>archive file support in the VFS at dentry-level? I'd be happy to share
>my proposal.
>
>
>
You won't get .tar or .tar.gz support in the VFS, for a few simple reasons:
1. .tar and .tar.gz are complicated formats, and are therefore better
left to userland. You can get some of the same effect by using a shared
library that redefines fopen() and fread() though. It'll work fine for
the vast majority of apps, which happen to use the C library.

It is hard to make a guaranteed bug-free decompressor that
is efficient and works with a finite amount of memory. The kernel
needs all that - userland doesn't.

2. Both the .tar and .gz file formats may improve with time. Getting a new
version of tar or gunzip is easy enough - getting another compression
algorithm into the kernel won't be that easy.

3. Writing into a tar.gz file is surprisingly difficult from the kernel
side.
Userland may create a new temp file when you add to a .tar.gz.
Userland may assume that other processes aren't reading or writing
the .tar.gz as it is updated. The kernel has no such luxuries.
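
As a rough illustration of the userland luxury in point 3, here is a minimal
C sketch of the temp-file-and-rename pattern. The function name and the
".XXXXXX" suffix are made up for this example; it is not taken from tar or
gzip, which may well do things differently.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Replace the file at `path` with `len` bytes from `data` by writing a
 * temporary file next to it and renaming that file over the original.
 * Returns 0 on success, -1 on failure.
 * Hypothetical helper for illustration only.
 */
int replace_file_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    const char *p = data;
    size_t left = len;
    int n, fd;

    assert(path != NULL && data != NULL);

    /* Keep the temp file in the same directory so rename() stays on one fs. */
    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is out. */
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Flush to disk before the new name becomes visible. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Readers see either the old file or the new one, never a mix. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Note that this only works because userland is free to burn a second copy of
the file and to ignore concurrent writers, which is exactly what the kernel
cannot assume.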

I recommend looking at archived threads about file as directory,
you'll find many more arguments. Currently there is one kind
of support for archive files - loop mounts over files containing
filesystem images. These are not compressed though.

Helge Hafting

2004-11-22 17:26:03

by Martin Waitz

Subject: Re: file as a directory

hoi :)

On Mon, Nov 22, 2004 at 08:34:02AM -0700, Zan Lynx wrote:
> > The kernel already provides all methods that are neccessary to do that.
> > So there is no need to implement it in the kernel.
>
> There are already several things in filesystems that don't strictly
> belong inside the kernel. A filesystem could be implemented quite well
> as a user-space daemon that sat on top of the block device and
> communicated via sockets or shared memory just like an X server.

this is quite different.
As you need to enforce security policies when accessing the block
device, you have to move the filesystem into its own daemon.
You cannot do it in a library.
It is irrelevant to the application whether the fs resides in a
separate daemon or in the kernel itself.

But support of different views on files is something different.
You can do that in a library, you only need an interface that is
capable of storing your data. The kernel already provides that
interface.

--
Martin Waitz



2004-11-22 17:34:59

by Tomas Carnecky

Subject: Re: file as a directory

Helge Hafting wrote:
> I recommend looking at archived threads about file as directory,
> you'll find many more arguments. Currently there is one kind
> of support for archive files - loop mounts over files containing
> filesystem images. These are not compressed though.

Isn't reiserfs trying to implement such things? I've read that in some
future version of reiserfs one will be able to open /etc/passwd/[username]
and get the information about [username], like UID, GID, home
directory etc.

Still true? And when can we expect such a version of reiserfs?

tom

2004-11-22 18:00:50

by Valdis Klētnieks

Subject: Re: file as a directory

On Mon, 22 Nov 2004 19:24:36 +0530, Amit Gud said:

> A straight forward question. Wouldn't adding a "file as a directory"
> mechanism more logical in VFS itself,

There was quite the flame-fest on lkml a while back regarding
how the semantics of "file as a directory" should operate. There are
a number of really nasty corner cases that you need to deal with.

Go back and re-read the whole flame-fest, understand all the points
raised, and let us know when you have a workable proposal.

(Hint - "file as directory" broke a number of programs that didn't
expect that a file *could* be a directory, when run on a reiser4
filesystem...)



2004-11-22 18:20:13

by Jan Engelhardt

Subject: Re: file as a directory

>> A straight forward question. Wouldn't adding a "file as a directory"
>> mechanism more logical in VFS itself, rather than having each fs (like
>> reiser4) to implement it seperately?
>
>wouldn't it be better if such things would be implemented in a library?
>use gnome-vfs, or try to get a vfs layer into libc.
>That way you can even support different and old kernels and all
>filesystems.
>
>The kernel already provides all methods that are neccessary to do that.
>So there is no need to implement it in the kernel.


*cough* FUSE... *cough*




Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-22 18:26:28

by Jan Engelhardt

Subject: Re: file as a directory

>Go back and re-read the whole flame-fest, understand all the points
>raised, and let us know when you have a workable proposal.
>
>(Hint - "file as directory" broke a number of programs that didn't
>expect that a file *could* be a directory, when run on a reiser4
>filesystem...)

So let's keep it to reiser4 so as not to wildly break programs running on other
filesystems used in "stable kernels". I'm saying that everybody who runs an R4
FS knows that apps might break, and thus is responsible for making them ready
for reiser4. (Or for asking the program's maintainer.)



Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-22 18:51:41

by Hans Reiser

Subject: Re: file as a directory

Tomas Carnecky wrote:

> Helge Hafting wrote:
>
>> I recommend looking at archived threads about file as directory,
>> you'll find many more arguments. Currently there is one kind
>> of support for archive files - loop mounts over files containing
>> filesystem images. These are not compressed though.
>
>
> Isn't reiserfs trying to implement such things? I've read that in some
> next version of reiserfs one will be able to open
> /etc/passwd/[username] and get the informations about [username],
> like UID, GID, home directory etc.


>
> Still true? And when can we except such a version of reiserfs?
>
> tom
>
>
It was more that we said we would like to implement the functionality
necessary for doing that (e.g. inheritance), not that we would
specifically do /etc/passwd. And that functionality will trickle in
over time.

Hans

2004-11-22 18:56:00

by Hans Reiser

Subject: Re: file as a directory

Valdis Klētnieks wrote:

>
>(Hint - "file as directory" broke a number of programs that didn't
>expect that a file *could* be a directory, when run on a reiser4
>filesystem...)
>
>
It broke extraordinarily few.

2004-11-23 05:01:01

by Jan Engelhardt

Subject: Re: file as a directory

>>(Hint - "file as directory" broke a number of programs that didn't
>>expect that a file *could* be a directory, when run on a reiser4
>>filesystem...)
>
>It broke extraordinarily few.

(The fewer the better.)

That's good news, and frankly, I did not expect anything else. That's because
either programs definitely know that "it" is a file/directory because they just
mkdir'ed or so, or they implement correct error checks, e.g. the user just
created a directory and we check back (i.e. race protection).

What I am worried about is the opendir() libc call, which AFAIK does this:
fd = open("directory", myflags | O_DIRECTORY)

OTOH, I'm not worried, because it should be the user's duty to check whether
directory really is one or not. Anything else is sloppy programming.
(Exception: taking argv[xx] from the user)
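
To make the O_DIRECTORY behaviour concrete, here is a small userspace sketch
(the helper name is invented for this example). On an ordinary filesystem,
open() with O_DIRECTORY fails with ENOTDIR when the path names a regular
file; what a file-as-directory filesystem would return here is exactly the
open question in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open `path` with O_DIRECTORY, as opendir() is described above.
 * Returns 0 if the kernel accepted it as a directory, otherwise the errno
 * value reported by open() (ENOTDIR for a regular file).
 * Hypothetical helper for illustration only.
 */
int open_as_directory(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return errno;

    close(fd);
    return 0;
}
```

A caller that gets ENOTDIR back knows it was handed a plain file, which is
the error check referred to above.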


Cheers,
Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-22 16:30:25

by Zan Lynx

Subject: Re: file as a directory

On Mon, 2004-11-22 at 15:37 +0100, Martin Waitz wrote:
> hoi :)
>
> On Mon, Nov 22, 2004 at 07:24:36PM +0530, Amit Gud wrote:
> > A straight forward question. Wouldn't adding a "file as a directory"
> > mechanism more logical in VFS itself, rather than having each fs (like
> > reiser4) to implement it seperately?
>
> wouldn't it be better if such things would be implemented in a library?
> use gnome-vfs, or try to get a vfs layer into libc.
> That way you can even support different and old kernels and all
> filesystems.
>
> The kernel already provides all methods that are neccessary to do that.
> So there is no need to implement it in the kernel.

There are already several things in filesystems that don't strictly
belong inside the kernel. A filesystem could be implemented quite well
as a user-space daemon that sat on top of the block device and
communicated via sockets or shared memory just like an X server.

So, the argument that because it could be implemented in userspace that
it should be implemented in userspace is not automatically true.
--
Zan Lynx <[email protected]>



2004-11-23 06:25:43

by Amit Gud

Subject: Re: file as a directory

On Mon, 22 Nov 2004 16:04:28 +0100, Helge Hafting <[email protected]> wrote:

> You won't get .tar or .tar.gz support in the VFS, for a few simple reasons:
> 1. .tar and .tar.gz are complicated formats, and are therefore better
> left to userland.

Agreed that .tar.gz is a complicated format, but zlib is already in
the kernel. It _should_ simplify inflating and deflating files. And
compared to the .gz format, .tar is much simpler, I guess.
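
To give a feel for how simple the uncompressed format is, here is a userspace
sketch that validates one 512-byte POSIX ustar header and extracts the member
size. The field offsets are those of the ustar header; the function names are
invented for this example, and this is not kernel code or part of any
proposal.

```c
#include <assert.h>
#include <stddef.h>

#define TAR_BLOCK 512

/* Offsets and lengths of the ustar header fields used below. */
enum {
    TAR_NAME_OFF   = 0,   TAR_NAME_LEN   = 100,
    TAR_SIZE_OFF   = 124, TAR_SIZE_LEN   = 12,
    TAR_CHKSUM_OFF = 148, TAR_CHKSUM_LEN = 8
};

/* Parse an octal ASCII field terminated by NUL or space; -1 on a bad digit. */
long long tar_parse_octal(const unsigned char *field, size_t len)
{
    long long value = 0;
    size_t i = 0;

    assert(field != NULL);

    while (i < len && field[i] == ' ')  /* tolerate leading space padding */
        i++;
    for (; i < len && field[i] != '\0' && field[i] != ' '; i++) {
        if (field[i] < '0' || field[i] > '7')
            return -1;
        value = value * 8 + (field[i] - '0');
    }
    return value;
}

/* Sum of all header bytes, with the checksum field itself counted as spaces. */
unsigned tar_header_checksum(const unsigned char *hdr)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < TAR_BLOCK; i++) {
        int in_chksum = i >= (size_t)TAR_CHKSUM_OFF &&
                        i < (size_t)(TAR_CHKSUM_OFF + TAR_CHKSUM_LEN);
        sum += in_chksum ? (unsigned)' ' : hdr[i];
    }
    return sum;
}

/* Returns the member size in bytes, or -1 if the header does not check out. */
long long tar_member_size(const unsigned char *hdr)
{
    long long stored = tar_parse_octal(hdr + TAR_CHKSUM_OFF, TAR_CHKSUM_LEN);

    if (stored < 0 || (unsigned)stored != tar_header_checksum(hdr))
        return -1;
    return tar_parse_octal(hdr + TAR_SIZE_OFF, TAR_SIZE_LEN);
}
```

Reading is the easy half, of course; this says nothing about long-name
extensions, sparse files, or writing.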

>
> It is hard to make a guaranteed bug-free decompressor that
> is efficient and works with a finite amount of memory. The kernel
> needs all that - userland doesn't.

I think the finite amount of memory is the real concern, not the rest
... if we could rely on zlib.

> 2. Both .tar and .gz file formats may improve with time. Getting a new
> version of tar og gunzip is easy enough - getting another compression
> algorithm into the kernel won't be that easy.

Doesn't zlib in the kernel get updated as the formats change? If not,
.tar formats would be worth trying first as a proof of concept.

AG
--
May the source be with you.

2004-11-23 09:11:32

by Dirk Steinberg

Subject: Re: file as a directory

On Monday 22 November 2004 19:52, Hans Reiser wrote:
> Valdis Klētnieks wrote:
> >(Hint - "file as directory" broke a number of programs that didn't
> >expect that a file *could* be a directory, when run on a reiser4
> >filesystem...)
>
> It broke extraordinarily few.

I know from personal experience that it *does* break Acrobat Reader,
which, unfortunately, is binary-only and also a program that I
use quite often. For me this means I cannot use reiser4 (as root fs anyway)
unless metas are disabled.

How about making metas a mount option? Right now disabling metas
requires patching the source.

/Dirk Steinberg

2004-11-23 09:37:53

by Markus Tornqvist

Subject: Re: file as a directory

On Tue, Nov 23, 2004 at 10:11:21AM +0100, Dirk Steinberg wrote:
>
>How about making metas a mount option? Right now disabling metas
>requires patching the source.

Isn't there -o nopseudo already?

--
mjt

2004-11-23 09:47:31

by Amit Gud

Subject: Re: file as a directory

On Mon, 22 Nov 2004 20:05:25 +0100 (MET), Jan Engelhardt
<[email protected]> wrote:
> >>(Hint - "file as directory" broke a number of programs that didn't
> >>expect that a file *could* be a directory, when run on a reiser4
> >>filesystem...)
> >
> >It broke extraordinarily few.
>
> (The fewer the better.)
>
> That's good news, and frankly, I did not expect anything else. That's because
> either programs definitely know that "it" is a file/directory because they just
> mkdir'ed or so, or they implement correct error checks, e.g. the user just
> created a directory and we check back (i.e. race protection).
>

Correct me if I'm wrong, but as far as I know (at least in how I've
implemented it), whether a file should be treated as a directory or as
a file depends on the context (how the file is accessed) in user space,
and this context is reflected in kernel space in the flags of
struct nameidata. So ...

----
/* Check whether the archive is an intermediate path component,
   or the last component followed by a slash. */
flags = (nd->flags & LOOKUP_CONTINUE) || (nd->flags & LOOKUP_DIRECTORY);
if (flags)
	/* directory */
else
	/* file */

----
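
For readers without the kernel tree at hand, the same rule can be modelled in
userspace. This is only a sketch under stated assumptions: archives are
recognised purely by a ".tar" suffix, only the first such component is
considered, and the function name is invented. A '/' right after the archive
name covers both cases from the snippet above: more components follow, or the
path ends in a slash. It does not model an explicit O_DIRECTORY request.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if the first ".tar" component of `path` is used as a directory
 * (it is followed by '/'), and 0 if the path merely names the archive file
 * or contains no ".tar" component at all.
 * Hypothetical helper for illustration only.
 */
int tar_component_used_as_dir(const char *path)
{
    const char *p = path;

    assert(path != NULL);

    while (*p != '\0') {
        const char *end;

        while (*p == '/')               /* skip separators */
            p++;
        end = p;
        while (*end != '\0' && *end != '/')
            end++;                      /* find the end of this component */

        if (end - p > 4 && memcmp(end - 4, ".tar", 4) == 0)
            return *end == '/';

        p = end;
    }
    return 0;
}
```

So "/home/amit/src.tar" stays a file, while "/home/amit/src.tar/" and
"/home/amit/src.tar/README" treat it as a directory.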

> What I am worried about is the opendir() libc call, which AFAIK does this:
> fd = open("directory", myflags | O_DIRECTORY)
>

No more worries! Am I missing something?

AG
--
May the source be with you.

2004-11-23 14:01:54

by Jan Engelhardt

Subject: Re: file as a directory

>Correct me if I'm wrong, but the best way I know whether a file should
>be treated as directory or as a file (atleast how I've implemented it)
>depends upon the context (how the file is accessed) in the user-space
>and this context is reflected in the kernel space in the flags of the
>struct nameidata. So ...

And there I see a problem! The open() call (kernel: sys_open) allows
opening both files and directories in its standard operation.
There is the O_DIRECTORY user-space flag, but it only says "it must be a
directory". So there's nothing available to say "must be a file".
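
Lacking such a flag, the usual userspace workaround is to open first and
check the type afterwards with fstat(). A minimal sketch follows; the helper
name and the choice of EINVAL for "neither file nor directory" are
assumptions made for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open `path` read-only and insist that it is a regular file.
 * Returns a file descriptor, or -1 with errno set: EISDIR for a directory,
 * EINVAL (our choice) for any other non-regular object.
 * Hypothetical helper for illustration only.
 */
int open_regular_file(const char *path)
{
    struct stat st;
    int fd, saved;

    assert(path != NULL);

    /* O_NONBLOCK keeps open() from hanging if `path` is a FIFO. */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* Check the descriptor, not the path, so there is no rename race. */
    if (fstat(fd, &st) != 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    if (!S_ISREG(st.st_mode)) {
        close(fd);
        errno = S_ISDIR(st.st_mode) ? EISDIR : EINVAL;
        return -1;
    }
    return fd;
}
```

This relies on the object reporting exactly one type; an object that is both
a file and a directory is precisely where the check stops being meaningful.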

Hell will freeze over if a reiser4 "object" can be ANY type, blockdev,
chardev, symlink, <think something up>.


Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-23 14:18:29

by Amit Gud

Subject: Re: file as a directory

On Tue, 23 Nov 2004 15:00:37 +0100 (MET), Jan Engelhardt
<[email protected]> wrote:
> >Correct me if I'm wrong, but the best way I know whether a file should
> >be treated as directory or as a file (atleast how I've implemented it)
> >depends upon the context (how the file is accessed) in the user-space
> >and this context is reflected in the kernel space in the flags of the
> >struct nameidata. So ...
>
> And there I see a problem! The open() call (kernel: sys_open) allows to open
> both files and directories in the standard operation.
> There is the O_DIRECTORY user-space flag, but which only says "it must be a
> directory". So there's something missing to say "must be a file".
>
> Hell will freeze over if a reiser4 "object" can be ANY type, blockdev,
> chardev, symlink, <think something up>.
>

Of course, I check beforehand whether the file is an archive (.tar, for now).
Then, if the appropriate flag is set, I treat it as a directory;
otherwise I leave it alone. Likewise, if the tar format looks as expected,
I support it; otherwise I leave it alone.


AG
--
May the source be with you.

2004-11-23 19:11:20

by Hans Reiser

Subject: Re: file as a directory

Dirk Steinberg wrote:

>How about making metas a mount option?
>
That was always the intention. I thought it got implemented, sigh, my
guys need smaller todo lists....

>Right now disabling metas
>requires patching the source.
>
>/Dirk Steinberg
>
>
You mean, enabling it requires changing a #define (if you are using the
latest). We changed that after bugs were found in the implementation
that could cause crashes.

Hans

2004-11-24 09:19:14

by Peter Foldiak

Subject: Re: file as a directory

On Mon, 2004-11-22 at 18:48, Hans Reiser wrote:
> Tomas Carnecky wrote:
> > Helge Hafting wrote:
> >> I recommend looking at archived threads about file as directory,
> >> you'll find many more arguments. Currently there is one kind
> >> of support for archive files - loop mounts over files containing
> >> filesystem images. These are not compressed though.
> >
> > Isn't reiserfs trying to implement such things? I've read that in some
> > next version of reiserfs one will be able to open
> > /etc/passwd/[username] and get the informations about [username],
> > like UID, GID, home directory etc.
> > Still true? And when can we except such a version of reiserfs?
> > tom
> >
> It was more that we said we would like to implement the functionality
> necessary for doing that (e.g. inheritance), not that we would
> specifically do /etc/passwd. And that functionality will trickle in
> over time.
>
> Hans


I think something like

/etc/passwd/[username]

would be a really nice extension. The idea is more general, it would
unify the namespace for file selection and part-of-file selection.

So if you have a file named "/home/peter/book", you should be able to
look at its Introduction as "/home/peter/book/Introduction" or chapter
3, paragraph 2 as
/home/peter/book/chapter[3]/paragraph[2]

(this may not be the ideal syntax, but something like this should be
good.)

In this case you could use
/home/peter/book/chapter[3]/paragraph[2]
as a "real" file, read it, even edit it in a text editor. When you later
look at the whole book as /home/peter/book , you should see your
changes.

The idea is related to the W3C "XPath"
http://www.w3.org/TR/xpath
http://www.w3schools.com/xpath/
XPath only applies within XML files, but it is mostly a superset of
current Unix file naming, so it would be natural to unify the two
namespaces. The idea is that you would use the same syntax for selecting
file and part-of-file in a way that the user doesn't even need to know
at which level they "cross over" to selecting within the file.
In the above example, it may be that each chapter is stored in a
separate "real" file. We don't even need to know. In a sense, the whole
concept of a "file" becomes a bit blurred. (What we need to think about
is which fragments to store metadata for, and which fragments should
simply inherit their metadata from the parent.)
Hans' tree-based file system is really ideal for this kind of thing.
I am not sure how closely we would want to follow the XPath syntax (it
has some nice ideas), but it should then be easy to write a plugin
doing XPath really efficiently on top of this. (And it could be made to
work for non-XML files as well.)

I would really like to implement this for the next version of Hans' file
system.
Peter

2004-11-24 10:32:55

by Helge Hafting

Subject: Re: file as a directory

Amit Gud wrote:

>On Mon, 22 Nov 2004 16:04:28 +0100, Helge Hafting <[email protected]> wrote:
>
>
>
>>You won't get .tar or .tar.gz support in the VFS, for a few simple reasons:
>>1. .tar and .tar.gz are complicated formats, and are therefore better
>> left to userland.
>>
>>
>
>Agreed that .tar.gz is a complicated format, but zlib is already in
>the kernel. It _should_ simplify inflate and deflate of files. And as
>compared to .gz format, .tar is much simpler, I guess.
>
>
>
>> It is hard to make a guaranteed bug-free decompressor that
>> is efficient and works with a finite amount of memory. The kernel
>> needs all that - userland doesn't.
>>
>>
>
>I think, finite amount of memory is the concern of worry, not the rest
>... if we could rely on zlib.
>
>
>
>>2. Both .tar and .gz file formats may improve with time. Getting a new
>> version of tar og gunzip is easy enough - getting another compression
>> algorithm into the kernel won't be that easy.
>>
>>
>
>Doesn't zlib in the kernel gets updated as the formats change? If not,
>.tar formats would be worth trying first as proof of concept.
>
This is not so easy, as you have to audit the new version for
correctness. It is not the end of the world if tar or gzip
occasionally crashes on some corner case. The kernel
must not do that though.

And then there are the much more complicated issues of
writing into such an archive. You skipped that part, or
are you looking for a read-only solution?

Helge Hafting

2004-11-24 11:07:26

by Amit Gud

Subject: Re: file as a directory

On Wed, 24 Nov 2004 11:32:13 +0100, Helge Hafting <[email protected]> wrote:
> Amit Gud wrote:
>
>
>
> >On Mon, 22 Nov 2004 16:04:28 +0100, Helge Hafting <[email protected]> wrote:
> >
> >
> >
> >>You won't get .tar or .tar.gz support in the VFS, for a few simple reasons:
> >>1. .tar and .tar.gz are complicated formats, and are therefore better
> >> left to userland.
> >>
> >>
> >
> >Agreed that .tar.gz is a complicated format, but zlib is already in
> >the kernel. It _should_ simplify inflate and deflate of files. And as
> >compared to .gz format, .tar is much simpler, I guess.
> >
> >
> >
> >> It is hard to make a guaranteed bug-free decompressor that
> >> is efficient and works with a finite amount of memory. The kernel
> >> needs all that - userland doesn't.
> >>
> >>
> >
> >I think, finite amount of memory is the concern of worry, not the rest
> >... if we could rely on zlib.
> >
> >
> >
> >>2. Both .tar and .gz file formats may improve with time. Getting a new
> >> version of tar og gunzip is easy enough - getting another compression
> >> algorithm into the kernel won't be that easy.
> >>
> >>
> >
> >Doesn't zlib in the kernel gets updated as the formats change? If not,
> >.tar formats would be worth trying first as proof of concept.
> >
> This is not so easy, as you have to audit the new version for
> correctness. It is not the end of the world if tar or gzip
> occationally crashes on some corner case. The kernel
> must not do that though.
>

Yes, that's what I said in my last post...if the archive looks improper,
forget it.

> And then there is the much more complicated issues when
> writing into such an archive. You skipped that part, or
> are you looking for a read-only solution only?
>

I'm coming up with something soon, along with the proof of
concept....to wrap up all scenarios....need some time ;)

AG

2004-11-24 14:16:15

by Jan Engelhardt

Subject: Re: file as a directory

>I think something like
>
>/etc/passwd/[username]
>
>would be a really nice extension. The idea is more general, it would
>unify the namespace for file selection and part-of-file selection.

Yeah, and where will you do that? (Possible answers are: kernel space, user
space).

I honestly vote against *these* kinds of plugins (i.e. reading .tar files,
/etc, and such). For one, it would have to be done in kernel space, which means
the module code cannot be swapped out. Debugging is more complex, and segfaults
will kill the machine -- thus it's more open to blackhat hackers.

Also, the module code would simply be a reinvention of the wheel; it's
all been written before.




Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-24 15:04:30

by Paolo Ciarrocchi

Subject: Re: file as a directory

On 24 Nov 2004 09:16:03 +0000, Peter Foldiak
<[email protected]> wrote:
[...]
> I would really like to implement this for the next version of Hans' file
> system.

I don't understand how you want to use XPath for non-XML files.
I agree with you that the idea behind XPath is cool, but I fail to
understand how it can be applied to anything but XML.

--
Paolo
Picasa users groups: http://www.picasa-users.tk
join the blog group: http://groups-beta.google.com/group/blog-users

2004-11-24 16:14:23

by Christian Mayrhuber

Subject: Re: file as a directory

On Wednesday 24 November 2004 16:02, Paolo Ciarrocchi wrote:
> On 24 Nov 2004 09:16:03 +0000, Peter Foldiak
> <[email protected]> wrote:
> [...]
> > I would really like to implement this for the next version of Hans' file
> > system.
>
> I don't undersand how you want to use Xpath for not XML file.
> I agree with you that the idea behind Xpath is cool but I fail to
> unserstand how it can be applied to anything but XML
>
> --
> Paolo
> Picasa users groups: http://www.picasa-users.tk
> join the blog group: http://groups-beta.google.com/group/blog-users
>

Apache Cocoon uses so-called generators to parse non-XML formats and produce an
XML representation thereof. This XML can be addressed by XPath.
To store modifications back, this XML needs to be serialized to the original
format. That's **very** fat and slow.

Maybe an automount with a special fuse filesystem could accomplish this.

For example:
# cd /etc/passwd/..metas/contents/

automounts /etc/passwd as "fuse-xpath-passwd" fs
to /etc/passwd/..metas/contents/

Doing 'cat /etc/passwd/..metas/contents/shell[username = "joe"]' could work
then.

Reiser4 would need a content-mount plugin that automounts
the respective file by means of a per-file configurable
mount command. Something like
# cat /etc/passwd/..metas/plugins/content-mount
-t fuse-xpath-passwd -o ro

fuse-xpath-* filesystems would have to be written. These could be designed
similarly to the Apache Cocoon approach of generators/serializers to
work with an intermediate XML representation of the file interior.

All the stuff besides mounting the fuse-xpath-fs's would happen in userspace.
I don't think that anyone can guarantee posix fs semantics by this approach.

--
lg, Chris

2004-11-24 18:49:24

by Peter Foldiak

Subject: Re: file as a directory

On Wed, 2004-11-24 at 15:02, Paolo Ciarrocchi wrote:
> On 24 Nov 2004 09:16:03 +0000, Peter Foldiak
> <[email protected]> wrote:
> [...]
> > I would really like to implement this for the next version of Hans' file
> > system.
>
> I don't undersand how you want to use Xpath for not XML file.
> I agree with you that the idea behind Xpath is cool but I fail to
> unserstand how it can be applied to anything but XML

My message was mainly about XML, for which it is easy.
For non-XML, you need some other way of knowing the file format. The
example that originally came up in this thread was

/etc/passwd/[username]

In this case, the passwd file has a known format.
Other file types, like LaTeX, HTML, JPEG also have (at least partially)
known formats. Some selection should be possible even for unknown
formats (e.g. byte range, line range). There could also be some way of
specifying a new format but I don't know how to do this well. You could
give names (like filenames) to parts of files.
But I think the first step would be to concentrate on XML, and worry
about the rest later.

Peter
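
To show what "known format" buys you for /etc/passwd/[username], here is a
userspace sketch that selects one part of a passwd-format buffer by name: the
login shell of a given user. It assumes the usual seven colon-separated
fields per line; the function name is invented, and nothing here reflects how
a reiser4 plugin would actually be written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the entry for `user` in `text` (passwd-format data held in memory)
 * and copy its shell, the seventh colon-separated field, into `out`.
 * Returns 0 on success, -1 if the user is missing, the line has too few
 * fields, or `out` is too small.
 * Hypothetical helper for illustration only.
 */
int passwd_shell_of(const char *text, const char *user,
                    char *out, size_t outlen)
{
    const char *line = text;
    size_t ulen;

    assert(text != NULL && user != NULL && out != NULL);
    ulen = strlen(user);

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t linelen = eol ? (size_t)(eol - line) : strlen(line);

        /* The name must match exactly and be followed by ':'. */
        if (linelen > ulen && strncmp(line, user, ulen) == 0 &&
            line[ulen] == ':') {
            const char *p = line;
            const char *end = line + linelen;
            int colons = 0;
            size_t n;

            /* Skip six colons to land on the start of the shell field. */
            while (p < end && colons < 6) {
                if (*p++ == ':')
                    colons++;
            }
            if (colons < 6)
                return -1;

            n = (size_t)(end - p);
            if (n + 1 > outlen)
                return -1;
            memcpy(out, p, n);
            out[n] = '\0';
            return 0;
        }
        line = eol ? eol + 1 : line + linelen;
    }
    return -1;
}
```

The hard part discussed in this thread is not the parsing but exposing the
result under a pathname with sane permissions, locking, and write-back.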

2004-11-26 18:59:39

by Hans Reiser

Subject: Re: file as a directory

Peter Foldiak wrote:

>
>The problem with the
>cat /etc/passwd/..metas/contents/shell[username = "joe"]
>syntax is that it doesn't really achieve namespace unification.
>
>
>
>
>
>
For the case Peter cites, yes, it does add clutter to the pathname to
say "..metas" (actually, it is "...." now in the current reiser4, not
"..metas"). This is because you aren't looking for metafile
information, you are looking for a subset and describing the subset, and
that just requires a file-directory plugin that can handle the name of
that subset and parse the file to find it.

2004-11-26 19:00:48

by Hans Reiser

Subject: Re: file as a directory

Peter Foldiak wrote:

>On Wed, 2004-11-24 at 15:02, Paolo Ciarrocchi wrote:
>
>
>>On 24 Nov 2004 09:16:03 +0000, Peter Foldiak
>><[email protected]> wrote:
>>[...]
>>
>>
>>>I would really like to implement this for the next version of Hans' file
>>>system.
>>>
>>>
>>I don't undersand how you want to use Xpath for not XML file.
>>I agree with you that the idea behind Xpath is cool but I fail to
>>unserstand how it can be applied to anything but XML
>>
>>
>
>My message was mainly about XML, for which it is easy.
>For non-XML, you need some other way of knowing the file format. The
>example that originally came up in this thread was
>
>/etc/passwd/[username]
>
>In this case, the passwd file has a known format.
>Other file types, like LaTex, html, jpeg also have (at least partially)
>known formats. Some selection should be possible even for unknown
>formats (e.g. byte range, line-range). There could also be some way of
>specifying a new format but I don't know how to do this well. You could
>give names (like filenames) to parts of files.
>But I think the first step would be to concentrate on XML, and worry
>about the rest later. Peter
>
>
>
>
>
I think Peter is right. It would be nice to have an interpreter for
each of the common file formats, and XML is just the biggest one.

2004-11-26 21:05:36

by Bodo Eggert

Subject: Re: file as a directory

Martin Waitz wrote:
> On Mon, Nov 22, 2004 at 08:34:02AM -0700, Zan Lynx wrote:

>> There are already several things in filesystems that don't strictly
>> belong inside the kernel. A filesystem could be implemented quite well
>> as a user-space daemon that sat on top of the block device and
>> communicated via sockets or shared memory just like an X server.
>
> this is quite different.
> As you need to enforce security policies when accessing the block
> device, you have to move the filesystem into its own daemon.
> You cannot do it in a library.

Only the mapping from block-nr to uid/gid/perms needs to be in the kernel.
The rest can be done in userspace, but it would be ugly as hell.

> It is irrelevant for the application weather the fs resides in a
> separate daemon or in the kernel itself.

ACK.

> But support of different views on files is something different.
> You can do that in a library, you only need an interface that is
> capable of storing your data. The kernel already provides that
> interface.

If you want to allow users to set their default shell using some extension,
a simple userspace library will not do the job. You'll need a central
authority that is able to synchronize the access to the file, prevent
unauthorized modifications, do the caching etc.

I think the special file handlers should generally be daemons, but the
access should be controlled by kernel hooks, maybe something like
automounting a userspace filesystem. Simple meta-filesystems, e.g. those
that could be "emulated" using mount -oloop, may reside in the kernel
space, more complicated but common ones may be stored in the kernel
(placed on the initramfs?) to accelerate starting the helper daemon and
uncommon ones would be registered at runtime (maybe user-specific).

Having functions in the kernel to support those filesystems
will help, e.g. a tgz helper daemon would need to allocate temporary
storage for accelerating access (e.g. file tree, cache) as well as a means
to reassemble the tgz for operations on the whole file after write
operations occurred. Meta-filesystems on fs without these features can
(of course) be done, but I think they would be very slow or show
inconsistencies in certain situations.
--
Bug? That's not a bug, that's a feature.

2004-11-26 21:18:32

by Christian Mayrhuber

Subject: Re: file as a directory

On Friday 26 November 2004 19:19, Hans Reiser wrote:
> For the case Peter cites, yes, it does add clutter to the pathname to
> say "..metas" (actually, it is "...." now in the current reiser4, not
> "..metas"). This is because you aren't looking for metafile
> information, you are looking for a subset and describing the subset, and
> that just requires a file-directory plugin that can handle the name of
> that subset and parse the file to find it.
>

Regarding namespace unification + XPath:
For files: cat /etc/passwd/[. = "joe"] should work like in XPath.
But what to do with directories?
Would 'cat /etc/[. = "passwd"]' output the contents of the passwd file
or does it mean to output the file '[. = "passwd"]'?
If the first is the case then you have to prohibit filenames looking
like '[foo bar]'.

If the shells didn't want * for themselves, I'd suggest something like
cat /etc/*[. = "passwd"]
This means: list all contents and show the ones where filename = "passwd".

For the contents of /etc/passwd the following could become possible:
'cat /etc/passwd/*[. = "joe"]'
'cat /etc/passwd/*[@shell = "/bin/tcsh"]'
The XPath could behave as if it were applied to the following XML:
<entries>
<root passwd="x" shell="/bin/sh" .... />
...
<joe passwd="x" shell="/bin/tcsh" uid="500" gid="500" .... />
</entries>
The cat commands above would return the line of joe's entry:
joe:x:500:500:joe:/home/joe:/bin/tcsh
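
The selection above can be tried out in userspace. What follows is only a
minimal Python sketch, assuming the XML layout shown above. The attribute
names gecos and home, root's uid/gid/home values and the helper
select_lines are made up for illustration, since the "...." leaves them open.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of /etc/passwd; attributes other than passwd, shell,
# uid and gid are assumptions standing in for the elided "....".
PASSWD_XML = """
<entries>
  <root passwd="x" uid="0" gid="0" gecos="root" home="/root" shell="/bin/sh"/>
  <joe passwd="x" uid="500" gid="500" gecos="joe" home="/home/joe" shell="/bin/tcsh"/>
</entries>
"""

# Field order of a passwd line after the user name.
FIELDS = ("passwd", "uid", "gid", "gecos", "home", "shell")


def select_lines(xml_text, path):
    """Return the passwd lines for all entries matched by an XPath-like path."""
    entries = ET.fromstring(xml_text)
    return [
        ":".join([entry.tag] + [entry.get(field, "") for field in FIELDS])
        for entry in entries.findall(path)
    ]


if __name__ == "__main__":
    # ElementTree supports only a subset of XPath; attribute predicates work.
    for line in select_lines(PASSWD_XML, "*[@shell='/bin/tcsh']"):
        print(line)
```

ElementTree's `[.='text']` predicate matches element text rather than the
tag name, so selecting by user name is written as a plain "joe" path here.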

To change all tcsh entries to bash:
echo -n "/bin/bash" > /etc/passwd/*[@shell = "/bin/tcsh"]/@shell

I hope I'm not offending anyone, but my impression now is that
XPath stuff fits better into some shell providing
an XPath view of the filesystem than into the kernel.

--------------------------------------------------------------------

What about mapping the contents of files into "pure" posix namespace?
XML is basically a tree, too.
Notes:
1) "...." below is the entry to reiser4 namespace.
2) # denotes a shell command
For example:

# cd /etc/passwd/
# ls -a *
. .. .... joe root
# cd joe
# ls
gid home passwd shell uid
# cat shell
/bin/tcsh
# cd ../....
# ls
plugins

I guess an implementation in reiser4 would require some
mime-type/file extension dispatcher plus a special
directory plugin for each mime-type.
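
As a rough illustration of what such a directory plugin would have to
compute for passwd-style files, here is a Python sketch that turns the file
contents into a tree and answers ls/cat on it. All names (passwd_tree,
lookup, ls, cat, the gecos field) are hypothetical, the "...." entry and the
mime-type dispatcher are left out, and it includes a gecos entry that the
listing above does not show.

```python
# Field names for the columns after the user name (an assumption).
FIELDS = ("passwd", "uid", "gid", "gecos", "home", "shell")


def passwd_tree(text):
    """Map passwd contents to {user: {field: value}}."""
    tree = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        user, *values = line.split(":")
        tree[user] = dict(zip(FIELDS, values))
    return tree


def lookup(tree, path):
    """Walk a /-separated path down the tree."""
    node = tree
    for part in path.strip("/").split("/"):
        if part:
            node = node[part]
    return node


def ls(tree, path="/"):
    """List the entries of a 'directory' node in sorted order."""
    node = lookup(tree, path)
    if not isinstance(node, dict):
        raise NotADirectoryError(path)
    return sorted(node)


def cat(tree, path):
    """Return the contents of a leaf node."""
    node = lookup(tree, path)
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node


SAMPLE = "root:x:0:0:root:/root:/bin/sh\njoe:x:500:500:joe:/home/joe:/bin/tcsh\n"

if __name__ == "__main__":
    tree = passwd_tree(SAMPLE)
    print(ls(tree))
    print(cat(tree, "/joe/shell"))
```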

--
lg, Chris

2004-11-26 22:55:32

by Peter Foldiak

Subject: Re: file as a directory

On Wed, 2004-11-24 at 16:11, Christian Mayrhuber wrote:
> On Wednesday 24 November 2004 16:02, Paolo Ciarrocchi wrote:
> > On 24 Nov 2004 09:16:03 +0000, Peter Foldiak
> > <[email protected]> wrote:
> > [...]
> > > I would really like to implement this for the next version of Hans' file
> > > system.
> >
> > I don't understand how you want to use XPath for non-XML files.
> > I agree with you that the idea behind XPath is cool but I fail to
> > understand how it can be applied to anything but XML
> > Paolo
>
> Apache Cocoon uses so called generators to parse non-XML formats and produce a
> XML representation thereof. This XML can be addressed by XPath.
> To store modifications back this XML needs to be serialized to the original
> format. That's **very** fat and slow.
>
> Maybe an automount with a special fuse filesystem could accomplish this.
>
> For example:
> # cd /etc/passwd/..metas/contents/
>
> automounts /etc/passwd as "fuse-xpath-passwd" fs
> to /etc/passwd/..metas/contents/
>
> Doing 'cat /etc/passwd/..metas/contents/shell[username = "joe"]' could work
> then.
>
> Reiser4 would need a content-mount plugin that automounts
> the respective file by means of a per file configureable
> mount command. Something like
> # cat /etc/passwd/..metas/plugins/content-mount
> -t fuse-xpath-passwd -o ro
>
> fuse-xpath-* filesystems would have to be written. These could be designed
> similar to the apache cocoon approach of generators/serializers to
> work with an intermediate XML representation of the file interior.
>
> All the stuff besides mounting the fuse-xpath-fs's would happen in userspace.
> I don't think that anyone can guarantee posix fs semantics by this approach.

The problem with the
cat /etc/passwd/..metas/contents/shell[username = "joe"]
syntax is that it doesn't really achieve namespace unification.
As far as I understand the benefits of a unified namespace are due to
the user and the applications not having to know the details of what
they are dealing with. So, for instance, the nice thing about the
unification of files and devices in Unix is that an application (most
often) can treat a device in the same way as a file (or a pipe, etc.).
This is what gives it real flexibility.
The above syntax assumes you know exactly where the file "ends", and
where the parts of the file "begin" (indicated by the ..metas in your
path). Couldn't we get rid of ..metas from the path?

Also, what I am suggesting is not just to be able to select inside XML
files but also to extend XPath-like selection ABOVE the file level too,
to be used as if the whole file system was like a single big virtual XML
file.
Peter

2004-11-26 23:49:50

by Pavel Machek

Subject: Re: file as a directory

Hi!

> Such support may happen for a few fs'es - people who
> want this will then use those fses. Those who don't
> like the ideas will use others.
>
> >(.tar, .tar.gz, ...) support in the VFS itself, and of course
> >transparent to any fs and any user-land application. There are many
> >archive FSs around, but how feasible would it be to implement the
> >archive file support in the VFS at dentry-level? I'd be happy to share
> >my proposal.
> >
> >
> >
> You won't get .tar or .tar.gz support in the VFS, for a few simple reasons:
> 1. .tar and .tar.gz are complicated formats, and are therefore better
> left to userland. You can get some of the same effect by using a shared
> library that redefines fopen() and fread() though. It'll work fine for
> the vast majority of apps that happens to use the C library.

It is not the same effect -- with a shared library you get no caching. And
that hurts a lot.

> It is hard to make a guaranteed bug-free decompressor that
> is efficient and works with a finite amount of memory. The kernel
> needs all that - userland doesn't.

If you have a bug in the decompressor, you are screwed anyway, because you
get a remote user exploit when mozilla gets the file from the
web. Oops. [Ok, at least you do not get a remote root exploit, but...]

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 07:02:00

by Hans Reiser

Subject: Re: file as a directory

Peter Foldiak wrote:

>
>
>I would really like to implement this for the next version of Hans' file
>system.
> Peter
>
>
>
>
>
It would be cool.

2004-11-27 11:09:20

by Peter Foldiak

Subject: Re: file as a directory

On Fri, 2004-11-26 at 21:13, Christian Mayrhuber wrote:
> Regarding namespace unification + XPath:
> For files: cat /etc/passwd/[. = "joe"] should work like in XPath.

I don't understand this. Why would you need the "."? And why the /
between passwd and [ ?
I would prefer (as was suggested earlier in the thread) that the syntax
for the entry for joe in the passwd file should be

/etc/passwd/joe

and then if you want to select joe's full name, it should be

/etc/passwd/joe/fullname

and it would result in e.g. "Joe Smith" and

/etc/passwd/joe/shell

would be the shell joe uses.

Of course something would have to tell the system that the passwd file
has one line for each user, and that fields within a line are separated
by ":"s and what the names of the fields are (e.g. the first field is
called "user"). And when you select inside the /etc/passwd file (or this
type of file) then the default interpretation of joe is that "user"
should be used to select the line.

So by default, /etc/passwd/joe should be equivalent to /etc/passwd[user
= "joe"]

But you should be able to select based on fullname too:

/etc/passwd[fullname = "Joe Smith"]

and

/etc/passwd[shell = "/bin/bash"]/user

should give you the user names of all users whose shell is /bin/bash,
right?
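
To make the intended semantics concrete, here is a small Python sketch of
the selection rules just described. It only handles the part of the path
after /etc/passwd. The field names user, fullname and shell are the ones
used above; password, uid, gid and home, as well as the function name
select, are assumptions made for the example.

```python
import re

# Column names for a passwd line; only user, fullname and shell come from
# the text above, the rest are assumed.
FIELDS = ("user", "password", "uid", "gid", "fullname", "home", "shell")

# Either [field = "value"] or /name, optionally followed by /part.
SELECTOR = re.compile(
    r'^(?:\[\s*(?P<field>\w+)\s*=\s*"(?P<value>[^"]*)"\s*\]|/(?P<name>[^/\[\]]+))?'
    r'(?:/(?P<part>\w+))?$'
)


def select(passwd_text, selector):
    """Return matching lines, or one field of each matching line."""
    m = SELECTOR.match(selector)
    if m is None:
        raise ValueError(f"cannot parse selector: {selector!r}")
    field, value, part = m["field"], m["value"], m["part"]
    if m["name"] is not None:
        # Default interpretation: a bare name selects on the "user" field.
        field, value = "user", m["name"]
    if part is not None and part not in FIELDS:
        raise ValueError(f"unknown field: {part!r}")
    results = []
    for line in passwd_text.splitlines():
        if not line:
            continue
        record = dict(zip(FIELDS, line.split(":")))
        if field is not None and record.get(field) != value:
            continue
        results.append(record[part] if part else line)
    return results


SAMPLE = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "joe:x:500:500:Joe Smith:/home/joe:/bin/bash\n"
    "ann:x:501:501:Ann Lee:/home/ann:/bin/tcsh\n"
)

if __name__ == "__main__":
    print(select(SAMPLE, "/joe/fullname"))
    print(select(SAMPLE, '[shell = "/bin/bash"]/user'))
```

Note that a selector like "/shell" is ambiguous in this sketch (user named
"shell" or field "shell"?); it is resolved in favour of the user name.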

> But what to do with directories?
> Would 'cat /etc/[. = "passwd"]' output the contents of the passwd file
> or does it mean to output the file '[. = "passwd"]'?

I don't really see the point of this . = "passwd" syntax.

> If the first is the case then you have to prohibit filenames looking
> like '[foo bar]'.

I think so.

> If the shells wouldn't like * for themself, I'd suggest something like
> cat /etc/*[. = "passwd"]
> This means: list all contents and show the ones where filename = "passwd".

but isn't that just /etc/passwd ? why complicate it?

> For the contents of /etc/passwd the following could become possible:
> 'cat /etc/passwd/*[. = "joe"]
> 'cat /etc/passwd/*[@shell = "/bin/tcsh"]

no, I think it should be

cat /etc/passwd[shell = "/bin/tcsh"]

why the *? and why the @? an attribute? why not simply a part of the
line with / ?

> To change all tcsh entries to bash:
> echo -n "/bin/bash" > /etc/passwd/*[@shell = "/bin/tcsh"]/@shell

I would say

echo -n "/bin/bash" > /etc/passwd[shell = "/bin/tcsh"]/shell


> What about mapping the contents of files into "pure" posix namespace?

Yes, that is exactly what I am suggesting, except that I would like to
extend the POSIX syntax by stealing some useful syntactic bits from
XPath.

> XML is basically a tree, too.
> Notes:
> 1) "...." below is the entry to reiser4 namespace.
> 2) # denotes a shell command
> For example:
>
> # cd /etc/passwd/
> # ls -a *
> . .. .... joe root
> # cd joe
> # ls
> gid home passwd shell uid

yes, but where is the username? that would be the first one listed here,
right?

> # cat shell
> /bin/tcsh
> # cd ../....
> # ls
> plugins
>
> I guess an implementation in reiser4 would require some
> mime-type/file extension dispatcher plus a special
> directory plugin for each mime-type.

2004-11-27 12:49:59

by Markus Tornqvist

Subject: Re: file as a directory

On Fri, Nov 26, 2004 at 10:19:57AM -0800, Hans Reiser wrote:

>For the case Peter cites, yes, it does add clutter to the pathname to
>say "..metas" (actually, it is "...." now in the current reiser4, not
>"..metas"). This is because you aren't looking for metafile

"...." sounds like something that could be an alias for ../..
so not much better than reserving the word "metas" from the namespace.

I guess I'll still go with ..metas here, as it's the best compromise
shown. Or maybe even ..meta (as there is no need for the plural imo)

Just re-opening a damned useless, old, tired and daft can of worms :P

--
mjt

2004-11-27 13:14:10

by Christian Mayrhuber

Subject: Re: file as a directory

On Saturday 27 November 2004 12:09, Peter Foldiak wrote:
> On Fri, 2004-11-26 at 21:13, Christian Mayrhuber wrote:
> > Regarding namespace unification + XPath:
> > For files: cat /etc/passwd/[. = "joe"] should work like in XPath.
>
> I don't understand this. Why would you need the "."? And why the /
> between passwd and [ ?
Yes, I was confused by /etc/passwd/[username] in an earlier email.
I think we both mean basically the same.

> /etc/passwd/joe/shell
>
> whould be the shell joe uses.
Yes.

> So by default, /etc/passwd/joe should be equivalent to /etc/passwd[user
> = "joe"]
Yes.
/etc/passwd/joe/shell would be equivalent to
/etc/passwd[shell = "/bin/bash"]/joe/shell if joe has bash as shell, right?

>
> But you should be able to select based on fullname too:
>
> /etc/passwd[fullname = "Joe Smith"]
Ok.
This means that any XPath like expression will need to return a directory
entry representing a restricted view on the /etc/passwd contents.
The result of 'ls /etc/passwd[fullname = "Joe Smith"]' would be something like
'drwxr-xr-x 2 root root 48 2004-11-27 13:48 joe', right?

>
> and
>
> /etc/passwd[shell = "/bin/bash"]/user
>
> should give you the user names of all users whose shell is /bin/bash,
> right?
I'm confused again.
I expected 'ls /etc/passwd[shell = "/bin/bash"]/user' to give you the
passwd entries of "user" and 'ls /etc/passwd[shell = "/bin/bash"]/' to
give the users that have a bash shell.


> > # cd /etc/passwd/
> > # ls -a *
> > . .. .... joe root
> > # cd joe
> > # ls
> > gid home passwd shell uid
>
> yes, but where is the username? that would be the first one listed here,
> right?
joe is the username and a directory. There is no username entry in the
joe directory, because the username is already in the directory.
You can rename a user by renaming joe. 'mv joe newname'.

--
lg, Chris

2004-11-28 18:49:04

by Helge Hafting

Subject: Re: file as a directory

On Fri, Nov 26, 2004 at 12:09:37AM +0100, Pavel Machek wrote:
> Hi!
> > >
> > You won't get .tar or .tar.gz support in the VFS, for a few simple reasons:
> > 1. .tar and .tar.gz are complicated formats, and are therefore better
> > left to userland. You can get some of the same effect by using a shared
> > library that redefines fopen() and fread() though. It'll work fine for
> > the vast majority of apps that happens to use the C library.
>
> It is not same effect -- with shared library you get no caching. And
> that hurts a lot.
>
The compressed file is still cached, and the library can cache
file contents in a shared mapping. It does not have to
be a per-process thing.
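
A rough sketch of that idea in Python rather than as a C library: the
decompressed contents are written once to a cache file and every caller maps
that file, so the pages are shared through the page cache. The function
name cached_view and the cache-key scheme are assumptions for illustration;
sharing only works between processes that can read the same cache directory.

```python
import gzip
import hashlib
import mmap
import os
import shutil
import tempfile


def cached_view(gz_path, cache_dir):
    """Return a read-only mapping of the decompressed contents of gz_path."""
    st = os.stat(gz_path)
    # Key the cache entry on path, mtime and size so stale entries are not reused.
    ident = f"{os.path.abspath(gz_path)}:{st.st_mtime_ns}:{st.st_size}"
    cache_path = os.path.join(cache_dir, hashlib.sha256(ident.encode()).hexdigest())
    if not os.path.exists(cache_path):
        # Decompress to a temporary name, then rename atomically into place.
        tmp_path = f"{cache_path}.{os.getpid()}.tmp"
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.replace(tmp_path, cache_path)
    with open(cache_path, "rb") as cached:
        # Every process mapping this file shares the same page-cache pages.
        # Note: mmap refuses to map an empty file.
        return mmap.mmap(cached.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "demo.gz")
        with gzip.open(archive, "wb") as out:
            out.write(b"hello from the cache\n")
        view = cached_view(archive, workdir)
        print(view[:].decode(), end="")
        view.close()
```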

> > It is hard to make a guaranteed bug-free decompressor that
> > is efficient and works with a finite amount of memory. The kernel
> > needs all that - userland doesn't.
>
> If you have bug in decompressor, you are screwed, anyway, because you
> get remote user exploit when mozilla gets the file from
> web. Oops. [Ok, you at least do not get remote root exploit, but...]
>
I don't worry about mozilla exploits - you get those from
nasty webpages as well. I worry about a decompressor
crash (or random memory overwrite). A userland implementation
will crash that particular userland process, with no ill effects on
the rest of the system.

A kernelside crash is much worse - it can hang the kernel and/or
mess up any process. As for exploits - an in-kernel exploit
is even worse than a root exploit. There are plenty
of things even root can't do - at least not in
straightforward ways. The kernel has no limitations
whatsoever for what may go wrong.

Helge Hafting

2004-11-28 19:03:08

by Pavel Machek

Subject: Re: file as a directory

Hi!

> > > You won't get .tar or .tar.gz support in the VFS, for a few simple reasons:
> > > 1. .tar and .tar.gz are complicated formats, and are therefore better
> > > left to userland. You can get some of the same effect by using a shared
> > > library that redefines fopen() and fread() though. It'll work fine for
> > > the vast majority of apps that happens to use the C library.
> >
> > It is not same effect -- with shared library you get no caching. And
> > that hurts a lot.
> >
> The compressed file is still cached, and the library can cache
> file contents in a shared mapping. It does not have to
> be a per-process thing.

Okay, that way you can get it per-user but not system-global...

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-29 15:45:22

by Hans Reiser

Subject: Re: file as a directory

Markus Törnqvist wrote:

>On Fri, Nov 26, 2004 at 10:19:57AM -0800, Hans Reiser wrote:
>
>
>
>>For the case Peter cites, yes, it does add clutter to the pathname to
>>say "..metas" (actually, it is "...." now in the current reiser4, not
>>"..metas"). This is because you aren't looking for metafile
>>
>>
>
>"...." sounds like something that could be an alias for ../..
>so not much better than reserving the word "metas" from the namespace.
>
>I guess I'll still go with ..metas here, as it's the best compromise
>shown. Or maybe even ..meta (as there is no need for the plural imo)
>
>Just re-opening a damned useless, old, tired and daft can of worms :P
>
>
>
I agree that ..metas is much less likely to cause a namespace collision,
but I also think that if we called it "john" it would not be a major
problem, and since the issue is causing us political problems in getting
into the kernel, "...." is more PR right (as it does not slight Finnish
women named meta by suggesting they are too obscure to count), and so
"...." wins. "...." also has the advantage that it is elegant in
extending the Unix convention, in that we already have a ".." and a "."
and hidden files that start with ".".

2004-11-29 21:22:23

by Horst H. von Brand

Subject: Re: file as a directory

Christian Mayrhuber <[email protected]> said:
> On Saturday 27 November 2004 12:09, Peter Foldiak wrote:
> > On Fri, 2004-11-26 at 21:13, Christian Mayrhuber wrote:
> > > Regarding namespace unification + XPath:
> > > For files: cat /etc/passwd/[. = "joe"] should work like in XPath.
> >
> > I don't understand this. Why would you need the "."? And why the /
> > between passwd and [ ?
> Yes, I was confused by /etc/passwd/[username] in an earlier email.
> I think we both mean basically the same.

Now think about files with other formats, for instance the (in)famous
sendmail.cf, or less structured stuff like you find in /etc/init.d/, or
just Postgres databases (with fun stuff like permissions on records and
fields)... or just people groping in /etc/passwd wanting to find the whole
entry (not just one field), or perhaps look at the 15th character of the
entry for John Doe.

This way lies utter madness (what format description should be applied to
what file this time around?). Plus shove all this garbage into the kernel?!
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2004-11-29 23:03:49

by Peter Foldiak

Subject: Re: file as a directory

Horst von Brand wrote:

>Now think about files with other formats, for instance the (in)famous
>sendmail.cf, or less structured stuff like you find in /etc/init.d/, or
>just Postgres databases (with fun stuff like permissions on records and
>fields)... or just people groping in /etc/passwd wanting to find the whole
>entry (not just one field), or perhaps look at the 15th character of the
>entry for John Doe.
>
>This way lies utter madness (what format description should be applied to
>what file this time around?). Plus shove all this garbage into the kernel?!
>
>

I was suggesting this idea mainly for XML files, where the tags define
the parts clearly.
In addition, I was suggesting that some of the XPath syntax (normally
used for within-XML selection) could be extended above the file level
into the file system.
The problems you mention are all related to non-XML file format issues,
which was only a minor comment in parenthesis in my original mail. I am
happy to do it only for XML to begin with (and if possible later see if
it can be done for SOME non-XML formats). Peter

2004-11-29 23:13:09

by Peter Foldiak

Subject: Re: file as a directory

Horst von Brand wrote:

>... or just people groping in /etc/passwd wanting to find the whole
>entry (not just one field),
>
the whole entry for joe would be
/etc/passwd/joe

> or perhaps look at the 15th character of the
>entry for John Doe.
>
>
something like
/etc/passwd[fullname = "John Doe"]/character[15]
? what is so mad about that? Peter

2004-11-29 23:36:40

by Kevin Fox

Subject: Re: file as a directory

Heh. So, you can have a filename that can contain XPath looking junk.
Now, what happens when you have an XML file that points to another XML
file using XPath? How do you separate the file name XPath from the XML
XPath?

On Mon, 2004-11-29 at 22:59 +0000, Peter Foldiak wrote:
> Horst von Brand wrote:
>
> >Now think about files with other formats, for instance the (in)famous
> >sendmail.cf, or less structured stuff like you find in /etc/init.d/, or
> >just Postgres databases (with fun stuff like permissions on records and
> >fields)... or just people groping in /etc/passwd wanting to find the whole
> >entry (not just one field), or perhaps look at the 15th character of the
> >entry for John Doe.
> >
> >This way lies utter madness (what format description should be applied to
> >what file this time around?). Plus shove all this garbage into the kernel?!
> >
> >
>
> I was suggesting this idea mainly for XML files, where the tags define
> the parts clearly.
> In addition, I was suggesting that some of the XPath syntax (normally
> used for within-XML selection) could be extended above the file level
> into the file system.
> The problems you mention are all related to non-XML file format issues,
> which was only a minor comment in parenthesis in my original mail. I am
> happy to do it only for XML to begin with (and if possible later see if
> it can be done for SOME non-XML formats). Peter

2004-11-30 08:58:05

by Peter Foldiak

Subject: Re: file as a directory

On Mon, 2004-11-29 at 23:35, Kevin Fox wrote:
> Heh. So, you can have a filename that can contain XPath looking junk.
> Now, what happens when you have an XML file that points to another XML
> file using XPath? How do you separate the file name XPath from the XML
> XPath?

My suggestion was simply about unifying the namespace for selection in
the file system and selection within XML files using a syntax related to
(but not necessarily identical with) XPath.
I was not suggesting you should do anything special with the content of
the XML files, even if the XML file contains an XPath reference.
(The latter could be interesting to think about as a separate issue
later, but it is certainly not part of my simpler suggestion.)
Peter

2004-11-30 14:53:41

by Horst H. von Brand

Subject: Re: file as a directory

Peter Foldiak <[email protected]> said:
> Horst von Brand wrote:
> >Now think about files with other formats, for instance the (in)famous
> >sendmail.cf, or less structured stuff like you find in /etc/init.d/, or
> >just Postgres databases (with fun stuff like permissions on records and
> >fields)... or just people groping in /etc/passwd wanting to find the whole
> >entry (not just one field), or perhaps look at the 15th character of the
> >entry for John Doe.

> >This way lies utter madness (what format description should be applied to
> >what file this time around?). Plus shove all this garbage into the kernel?!

> I was suggesting this idea mainly for XML files, where the tags define
> the parts clearly.

Use an XML parsing library then.

> In addition, I was suggesting that some of the XPath syntax (normally
> used for within-XML selection) could be extended above the file level
> into the file system.

Urgh.

> The problems you mention are all related to non-XML file format issues,

Most (say 99,95%) files aren't XML; and if they are, the requisite parsing
is probably on hand already, so...

> which was only a minor comment in parenthesis in my original mail. I am
> happy to do it only for XML to begin with (and if possible later see if
> it can be done for SOME non-XML formats).

Please don't.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2004-11-30 15:32:47

by Peter Foldiak

Subject: Re: file as a directory

On Tue, 2004-11-30 at 14:51, Horst von Brand wrote:
> > I was suggesting this idea mainly for XML files, where the tags define
> > the parts clearly.
>
> Use a XML parsing library then.

But namespace unification is important, and to unify the namespace, you
have to use the same syntax. I guess you disagree with me on that. (If
not, how would you do it?) Peter


2004-11-30 16:05:49

by Martin Waitz

Subject: Re: file as a directory

hoi :)

On Fri, Nov 26, 2004 at 10:13:57PM +0100, Christian Mayrhuber wrote:
> Regarding namespace unification + XPath:
> For files: cat /etc/passwd/[. = "joe"] should work like in XPath.
> But what to do with directories?
> Would 'cat /etc/[. = "passwd"]' output the contents of the passwd file
> or does it mean to output the file '[. = "passwd"]'?
> If the first is the case then you have to prohibit filenames looking
> like '[foo bar]'.

perhaps we should create an XML/XPath shell and a replacement for the
textutils package instead of implementing all these utilities inside the
kernel.

Then convert /etc/passwd to /etc/passwd.xml and all is well.

--
Martin Waitz



2004-11-30 16:33:15

by Kevin Fox

Subject: Re: file as a directory

On Tue, 2004-11-30 at 08:54 +0000, Peter Foldiak wrote:
> On Mon, 2004-11-29 at 23:35, Kevin Fox wrote:
> > Heh. So, you can have a filename that can contain XPath looking junk.
> > Now, what happens when you have an XML file that points to another XML
> > file using XPath? How do you separate the file name XPath from the XML
> > XPath?
>
> My suggestion was simply about unifying the namespace for selection in
> the file system and selection within XML files using a syntax related to
> (but not necessarily identical with) XPath.

So long as it's different enough that current XPath
implementations can separate the XPath from the filename, it
should be fine.

Really simple example,
file#foo

Is #foo handled by XPath, or passed to the file system?

> I was not suggesting you should do anything special with the content of
> the XML files, even if the XML file contains an XPath reference.
> (The latter could be interesting to think about as a separate issue
> later, but it is certainly not part of my simpler suggestion.)
> Peter
>

What I was suggesting is that if you're not careful, it may not be
possible for current libraries to tell the difference between their
XPath-like stuff and whatever is done in the file system.
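
The "file#foo" case can be shown with existing tools. In this Python
sketch, the URI splitting in urllib.parse stands in for a library that
treats "#foo" as a pointer into the document (that stand-in, and the helper
name interpretations, are assumptions), while the file system happily
accepts the whole string as a file name.

```python
import os
import tempfile
from urllib.parse import urlsplit


def interpretations(reference):
    """Show how the same string reads as a file name and as a URI reference."""
    parts = urlsplit(reference)
    return {
        "as_filename": reference,                # the file system takes it whole
        "as_uri": (parts.path, parts.fragment),  # a URI library splits at '#'
    }


if __name__ == "__main__":
    print(interpretations("file#foo"))
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "file#foo")
        with open(path, "w") as handle:
            handle.write("data\n")
        # '#' is an ordinary character in a file name.
        print(os.listdir(workdir))
```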

Kevin

2004-11-30 16:37:12

by Horst H. von Brand

Subject: Re: file as a directory

Peter Foldiak <[email protected]> said:
> On Tue, 2004-11-30 at 14:51, Horst von Brand wrote:
> > > I was suggesting this idea mainly for XML files, where the tags define
> > > the parts clearly.

> > Use a XML parsing library then.

> But namespace unification is important,

Why? Directories are directories, files are files, file contents is file
contents. Mixing them up is a bad idea. Sure, you could build a filesystem
of sorts (perhaps more in the vein of persistent programming, or even data
base systems) where there simply is no distinction (because there are no
differences to show), but that is something different.

> and to unify the namespace, you
> have to use the same syntax. I guess you disagree with me on that. (If
> not, how would you do it?)

I'd go one level up: Eliminate the distinctions that bother you, not try to
patch over them.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2004-11-30 16:45:57

by Jan Engelhardt

Subject: Re: file as a directory

>> [quote]
>[and another one]

Do you really want to put an XML parser and all the associated code
(read: McDonalds*) into the kernel?



Jan Engelhardt
--
* Referring to "Supersize me"

2004-11-30 17:08:50

by Hans Reiser

Subject: Re: file as a directory

Horst von Brand wrote:

>Peter Foldiak <[email protected]> said:
>
>
>>On Tue, 2004-11-30 at 14:51, Horst von Brand wrote:
>>
>>
>>>>I was suggesting this idea mainly for XML files, where the tags define
>>>>the parts clearly.
>>>>
>>>>
>
>
>
>>>Use a XML parsing library then.
>>>
>>>
>
>
>
>>But namespace unification is important,
>>
>>
>
>Why? Directories are directories, files are files, file contents is file
>contents. Mixing them up is a bad idea. Sure, you could build a filesystem
>of sorts (perhaps more in the vein of persistent programming, or even data
>base systems) where there simply is no distinction (because there are no
>differences to show), but that is something different.
>
>
This is kind of like explaining to people around the office that they
could ever possibly need a disk drive of more than 10mb back in 1982 or
so. I could not convince them then, Peter, you cannot convince this guy
now, just spend the time coding it instead. Peter, you expect people to
understand the value of features they have never used. Works for some
of them. Only some of them.

>
>
>> and to unify the namespace, you
>>have to use the same syntax. I guess you disagree with me on that. (If
>>not, how would you do it?)
>>
>>
>
>I'd go one level up: Eliminate the distinctions that bother you, not try to
>patch over them.
>
>
Are you saying you'd rewrite xml to put separate objects in separate
files?

2004-11-30 17:10:09

by Peter Foldiak

Subject: Re: file as a directory

On Tue, 2004-11-30 at 16:31, Horst von Brand wrote:
> > But namespace unification is important,
>
> Why? Directories are directories, files are files, file contents is file
> contents. Mixing them up is a bad idea.

I disagree, I think it is a good idea.
Why is namespace unification important? Because you can use the same
tools on everything. Previously, each tool could handle one namespace.

A very simple example would be:
I want to count the words in the Appendix of my book.
If I can't select the appendix, my "wc" tool is useless (or very
difficult to use). On the other hand if I can say

wc ~/book/Appendix

it's fine. Hans Reiser would say that "namespaces are the roads and
waterways of the operating system" and "the value of an operating system
is proportional to the number of connections you can make". I think he
is right in that. And the authors of Unix knew it too, when they used
the same namespace for devices and files. They didn't say "files are
files and devices are devices". They said the difference should not
matter to the applications.
But there is still namespace fragmentation even in Unix, and this is
just one of them.
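
For comparison, this is roughly what the wc example takes today without a
unified namespace: a format-specific wrapper around the tool. A minimal
Python sketch, assuming the book is an XML file with an Appendix element
directly under the root; the sample document and the name wc_words are made
up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical book; only the element name "Appendix" comes from the text above.
BOOK = """
<book>
  <Chapter>One two three.</Chapter>
  <Appendix>Four five <b>six</b> seven.</Appendix>
</book>
"""


def wc_words(xml_text, path):
    """Count whitespace-separated words in the part of the document at path."""
    root = ET.fromstring(xml_text)
    node = root if path in ("", "/") else root.find(path.strip("/"))
    if node is None:
        raise FileNotFoundError(path)
    # itertext() walks the text of the element and all of its descendants.
    return len("".join(node.itertext()).split())


if __name__ == "__main__":
    print(wc_words(BOOK, "/Appendix"))
```

With the proposed namespace, the path handling would move out of each such
wrapper and plain wc would do.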

> Sure, you could build a filesystem
> of sorts (perhaps more in the vein of persistent programming, or even data
> base systems) where there simply is no distinction (because there are no
> differences to show), but that is something different.
>
> > and to unify the namespace, you
> > have to use the same syntax. I guess you disagree with me on that. (If
> > not, how would you do it?)
>
> I'd go one level up: Eliminate the distinctions that bother you, not try to
> patch over them.

But that is my point too. Peter

2004-11-30 17:24:56

by Giovanni A. Orlando

Subject: Re: file as a directory

Peter Foldiak wrote:

>On Tue, 2004-11-30 at 16:31, Horst von Brand wrote:
>
>
>>>But namespace unification is important,
>>>
>>>
>>Why? Directories are directories, files are files, file contents is file
>>contents. Mixing them up is a bad idea.
>>
>>
>
>I disagree, I think it is a good idea.
>
>
Hi,

Please remember DOS.

In DOS, a directory is a file with a special attribute: D_DIR.

In Unix, it is basically the same.

There is nothing bad about that. The attribute specifies that a file with the
directory attribute may include additional files or directories.

Thanks,
Giovanni.

>Why is namespace unification important? Because you can use the same
>tools on everything. Previously, each tool could handle one namespace.
>
>A very simple example would be:
>I want to count the words in the Appendix of my book.
>If I can't select the appendix, my "wc" tool is useless (or very
>difficult to use). On the other hand if I can say
>
>wc ~/book/Appendix
>
>it's fine. Hans Reiser would say that "namespaces are the roads and
>waterways of the operating system" and "the value of an operating system
>is proportional to the number of connections you can make". I think he
>is right in that. And the authors of Unix knew it too, when they used
>the same namespace for devices and files. They didn't say "files are
>files and devices are devices". They said the difference should not
>matter to the applications.
>But there is still namespace fragmentation even in Unix, and this is
>just one of them.
>
>
>
>> Sure, you could build a filesystem
>>of sorts (perhaps more in the vein of persistent programming, or even data
>>base systems) where there simply is no distinction (because there are no
>>differences to show), but that is something different.
>>
>>
>>
>>> and to unify the namespace, you
>>>have to use the same syntax. I guess you disagree with me on that. (If
>>>not, how would you do it?)
>>>
>>>
>>I'd go one level up: Eliminate the distinctions that bother you, not try to
>>patch over them.
>>
>>
>
>But that is my point too. Peter
>
>
>



2004-11-30 17:49:45

by Jan Engelhardt

[permalink] [raw]
Subject: Re: file as a directory

>> >> [quote]
>> >[and another one]
>>
>> Do you really want to put an XML parser and all the associated code
>> (read: McDonalds*) into the kernel?
>
>AND assume the XML file is correct...

AND use newer compilers that supposedly create bigger code.

(I am not saying that they do, nor that they do not; I just read
about the backwards compat on kerneltraffic.)

Clearly, this is either a userspace job (-> FUSE, hint, hint) or nothing at
all. What if this madness were applied to any file{,-type} in the FS? You could
open() something and it would always succeed (i.e. open(...) = ESUCCESS).

Having something like /etc/passwd/daemon spitting out the GECOS for user
"daemon" would make systems more open for stack smash attacks. Instead of
trying to squash a lot of shellcode just to read <your favorite file>, you just
need to use <your favorite file>/<your favorite entry> and *poof*. Even worse
when it's done with O_RDWR / O_WRONLY.
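
For scale, here is roughly what resolving <file>/<entry> would have to do for
/etc/passwd, written as a userspace sketch in C. It assumes the usual
colon-separated passwd(5) layout, in which the fifth field (index 4) is the
GECOS field. The function name passwd_field is made up for this illustration
and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy colon-separated field `index` (0-based) of one
 * passwd-style line into `out`. Returns 0 on success, -1 if the field is
 * missing or does not fit. Index 4 is the GECOS field in passwd(5). */
int passwd_field(const char *line, int index, char *out, size_t outlen)
{
    const char *start = line;

    assert(line != NULL && out != NULL);

    /* Skip `index` colons to reach the start of the wanted field. */
    for (int i = 0; i < index; i++) {
        start = strchr(start, ':');
        if (start == NULL)
            return -1;
        start++;
    }

    /* The field ends at the next colon, newline, or end of string. */
    size_t len = strcspn(start, ":\n");
    if (len + 1 > outlen)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Even this toy version has to decide what to do with malformed lines, which is
the kind of parsing policy the kernel would otherwise inherit.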

Hell, I do not even want to imagine

    unlink("unlinkthisfile");
    if (stat("unlinkthisfile", &sb) == 0) {
        BUG(); <--
    }

"<--" coming true just because that's the logic of some extension module.


Jan Engelhardt
--
ENOSPC

2004-11-30 17:49:53

by Jesse Pollard

[permalink] [raw]
Subject: Re: file as a directory

On Tuesday 30 November 2004 10:42, Jan Engelhardt wrote:
> >> [quote]
> >
> >[and another one]
>
> Do you really want to put an XML parser and all the associated code
> (read: McDonalds*) into the kernel?

AND assume the XML file is correct...

2004-11-30 18:06:31

by Horst H. von Brand

[permalink] [raw]
Subject: Re: file as a directory

Peter Foldiak <[email protected]> said:
> On Tue, 2004-11-30 at 16:31, Horst von Brand wrote:
> > > But namespace unification is important,

> > Why? Directories are directories, files are files, file contents is file
> > contents. Mixing them up is a bad idea.

> I disagree, I think it is a good idea.

Looks that way ;-)

> Why is namespace unification important? Because you can use the same
> tools on everything. Previously, each tool could handle one namespace.

Right.

> A very simple example would be:
> I want to count the words in the Appendix of my book.

I can do it... each chapter is a separate file ;-)

> If I can't select the appendix, my "wc" tool is useless (or very
> difficult to use). On the other hand if I can say
>
> wc ~/book/Appendix

> it's fine.

Exactly what I'd do.

> Hans Reiser would say that "namespaces are the roads and
> waterways of the operating system" and "the value of an operating system
> is proportional to the number of connections you can make". I think he
> is right in that.

I happen to agree.

> And the authors of Unix knew it too, when they used
> the same namespace for devices and files. They didn't say "files are
> files and devices are devices". They said the difference should not
> matter to the applications.

Right. And network connections, and pipes (connecting programs) are files
too.

> But there is still namespace fragmentation even in Unix, and this is
> just one of them.

You are right.

The next question is "Is the fragmentation required?", and then "How would
an OS without this fragmentation work?". Answer to the first question is
"Not really, but...", answer to the second one is "It has been tried, got
nowhere" (The various IBM minicomputer OSes had no filesystem, just a
(relational) database for everything; didn't catch on. Multiple people have
worked on "persistent programming" (data floats around, programs munge it
and then go away), even APL and similar languages worked on that principle;
none ever got anywhere near "popular". There must be other examples
around). You can't just go around pretending an element in an array is the
same as a device or a full database, recursively. That way lies madness,
recursion _has_ to stop somewhere.

> > Sure, you could build a filesystem
> > of sorts (perhaps more in the vein of persistent programming, or even data
> > base systems) where there simply is no distinction (because there are no
> > differences to show), but that is something different.

> > > and to unify the namespace, you
> > > have to use the same syntax. I guess you disagree with me on that. (If
> > > not, how would you do it?)
> >
> > I'd go one level up: Eliminate the distinctions that bother you, not try to
> > patch over them.
>
> But that is my point too.

But the result _can't_ live inside the Unix worldview, it is quite at odds
with it on a fundamental level. I.e., build an environment that works that
way, don't go around trying to pretend things are what they aren't.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2004-11-30 18:26:25

by Amit Gud

[permalink] [raw]
Subject: Re: file as a directory

I cannot imagine viewing the entire filesystem as a single huge XML file.

My suggestion is to add a framework, an infrastructure, in the VFS
wherein a simple plugin can be written to poke into the file as if it
were a directory. So with that framework in place, I can write a
plugin for archive support (treating the .tar files as directories),
Peter could write a plugin for poking into /etc/passwd (treating it as
a directory), and Jon Doe could write a plugin for sendmail.cf

like:
--
struct file_operations ops = {
        .read = tar_readdir,
        .readdir = tar_readdir,
        ......
};

register_file_type("tar", &ops);
--

How good would this be?
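
A rough userspace model of what such a registration table might look like.
Everything in it is hypothetical: register_file_type is the name used in the
proposal above, while struct file_type_ops, lookup_file_type, and the
fixed-size table are assumptions made for this sketch. None of it is an
existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-type handlers; not struct file_operations. */
struct file_type_ops {
    /* List the entries "inside" the file, as readdir would for a directory. */
    int (*list)(const char *path);
    /* Read one named member of the file. */
    int (*read_member)(const char *path, const char *member);
};

#define MAX_FILE_TYPES 16

static struct {
    const char *name;
    const struct file_type_ops *ops;
} file_types[MAX_FILE_TYPES];
static size_t n_file_types;

/* Register handlers under a type name; -1 if the table is full or the
 * name is already taken. The name string must outlive the registration. */
int register_file_type(const char *name, const struct file_type_ops *ops)
{
    assert(name != NULL && ops != NULL);
    if (n_file_types == MAX_FILE_TYPES)
        return -1;
    for (size_t i = 0; i < n_file_types; i++)
        if (strcmp(file_types[i].name, name) == 0)
            return -1;
    file_types[n_file_types].name = name;
    file_types[n_file_types].ops = ops;
    n_file_types++;
    return 0;
}

/* Find the handlers for a type name, or NULL to fall back to default ops. */
const struct file_type_ops *lookup_file_type(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n_file_types; i++)
        if (strcmp(file_types[i].name, name) == 0)
            return file_types[i].ops;
    return NULL;
}
```

The sketch leaves open how a given file gets associated with a type name in
the first place.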

AG
--
May the source be with you.

2004-11-30 18:43:21

by Jan Engelhardt

[permalink] [raw]
Subject: Re: file as a directory

>My suggestion is to add a framework, an infrastructure, in the VFS
>wherein a simple plugin can be written to poke into the file as if it
>were a directory. So with that framework in place, I can write a
>plugin for archive support (treating the .tar files as directories),
>Peter could write a plugin for poking into /etc/passwd (treating it as
>a directory), and Jon Doe could write a plugin for sendmail.cf

That's something I could live with, but how do you want to tag a file being
"tar" so that tar_ops is used instead of the "default file" ops?

You could not do so without an extra function, and once you use that extra
function to tag a certain file being "tar" -- you know that extensions are
kinda "worthless", and, especially, unreliable -- you could also have used tar
-tvf.
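
One way to tag a file without trusting its extension is to look at its
content. A minimal sketch, assuming POSIX ustar or GNU tar archives, both of
which put the bytes "ustar" at offset 257 of each 512-byte header. The
function name looks_like_ustar is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK_SIZE  512
#define USTAR_MAGIC_OFF 257   /* offset of the magic field in a ustar header */

/* Return 1 if the first block looks like a POSIX ustar or GNU tar header.
 * Both variants start the magic field with the five bytes "ustar".
 * Old V7 tar archives carry no magic and are not detected. */
int looks_like_ustar(const unsigned char *block, size_t len)
{
    assert(block != NULL);
    if (len < TAR_BLOCK_SIZE)
        return 0;
    return memcmp(block + USTAR_MAGIC_OFF, "ustar", 5) == 0;
}
```

This still needs someone to read the first block of every candidate file,
which is the "extra function" mentioned above in another form.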

Did I mention tar is not the perfect format? That is because it lacks an
index, and letting the kernel wade through a GB-sized tar file just to perform
a readdir (let alone reading the last file in it) would be a hell of
skipping. Keeping a non-persistent index in memory may solve the problem, but
hey, I also do not want to spend too much memory just for a single tar file.
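
To show what the skipping amounts to, here is a sketch of building such a
non-persistent index over an archive that is already in memory. It assumes
the classic header layout (100-byte name at offset 0, 12-byte octal size at
offset 124, member data padded to whole 512-byte blocks) and ignores the
ustar prefix field, long-name extensions, and base-256 sizes. The names
tar_entry and tar_build_index are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK 512

struct tar_entry {
    char   name[101];   /* header name field is 100 bytes, maybe unterminated */
    size_t offset;      /* where the member's data starts in the archive */
    size_t size;        /* member size in bytes */
};

/* Parse an octal field; stops at the first non-octal byte (NUL or space). */
static size_t tar_octal(const unsigned char *p, size_t n)
{
    size_t v = 0;
    for (size_t i = 0; i < n && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (size_t)(p[i] - '0');
    return v;
}

/* Walk the headers and record up to `max` entries; returns how many were
 * found. Only header blocks are examined, member data is skipped over. */
size_t tar_build_index(const unsigned char *buf, size_t len,
                       struct tar_entry *out, size_t max)
{
    size_t pos = 0, count = 0;

    assert(buf != NULL && out != NULL);
    while (pos + TAR_BLOCK <= len && count < max) {
        const unsigned char *hdr = buf + pos;
        if (hdr[0] == '\0')          /* a zero block marks end of archive */
            break;

        memcpy(out[count].name, hdr, 100);
        out[count].name[100] = '\0';
        out[count].size = tar_octal(hdr + 124, 12);
        out[count].offset = pos + TAR_BLOCK;

        /* Data is padded up to a whole number of 512-byte blocks. */
        size_t padded = (out[count].size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
        pos += TAR_BLOCK + padded;
        count++;
    }
    return count;
}
```

The index costs one struct per member, so the memory concern scales with the
number of members rather than with the archive size.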

>struct file_operations ops = {
> .read = tar_readdir,
> .readdir = tar_readdir,
> ......
>};
>
>register_file_type("tar", &ops);




Jan Engelhardt
--
ENOSPC

2004-12-01 02:45:52

by Scott Young

[permalink] [raw]
Subject: Re: file as a directory

On Tue, 30 Nov 2004 19:39:15 +0100 (MET), Jan Engelhardt
<[email protected]> wrote:
> >My suggestion is to add a framework, an infrastructure, in the VFS
> >wherein a simple plugin can be written to poke into the file as if it
> >were a directory. So with that framework in place, I can write a
> >plugin for archive support (treating the .tar files as directories),
> >Peter could write a plugin for poking into /etc/passwd (treating it as
> >a directory), and Jon Doe could write a plugin for sendmail.cf

The biggest problem I see with adding the complicated stuff to VFS is
the bloat and risk to system stability. However, some things cannot
be done in userspace, such as good caching. How is one userspace
library supposed to keep a transparent cache of, for example, an index
for a tar file, not clutter up the on-disk representation of the
cache, effectively manage space utilization, and be able to
efficiently detect changes to files in order to invalidate the cache?
This would become orders of magnitude easier if a ubiquitous
filesystem interface were in use. However, the only ubiquitous
filesystem interface is VFS, which shouldn't have to take all the code
bloat.
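
For comparison, about the best a userspace library can do on its own for
change detection is to remember a few stat() fields and compare them on every
access. A minimal sketch of that check follows; struct cache_stamp and both
function names are hypothetical, and the approach can miss a change that
keeps the size and lands within the same timestamp second.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record stored next to a cached index. */
struct cache_stamp {
    dev_t  dev;
    ino_t  ino;
    off_t  size;
    time_t mtime;
};

/* Capture the identifying fields of a stat result. */
void stamp_from_stat(const struct stat *st, struct cache_stamp *out)
{
    assert(st != NULL && out != NULL);
    out->dev = st->st_dev;
    out->ino = st->st_ino;
    out->size = st->st_size;
    out->mtime = st->st_mtime;
}

/* Return 1 if the cached index should be thrown away, 0 otherwise. */
int stamp_is_stale(const struct cache_stamp *cached, const struct stat *now)
{
    assert(cached != NULL && now != NULL);
    return cached->dev != now->st_dev ||
           cached->ino != now->st_ino ||
           cached->size != now->st_size ||
           cached->mtime != now->st_mtime;
}
```

A caller would fill the struct stat with stat() or fstat() before each use of
the cache, which is exactly the per-access cost being complained about.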

Maybe something crazy could work. Let's take some concepts from the
Aspect Oriented Programming paradigm. Whenever a program is loaded
into memory, calls in the program to the vfs interface are modified to
instead call new userspace functions that have all of the desired
functionality, and those userspace functions eventually call the real
system functions. The kernel wouldn't have to take the bloat, plus it
would be able to do things the userspace libraries wouldn't be able to
do efficiently. It's the best of both worlds, with a little insanity
thrown in (It'd be neat to see the loader bootstrap its own code to
weave in the caching of the pre-woven binaries).


> That's something I could live with, but how do you want to tag a file being
> "tar" so that tar_ops is used instead of the "default file" ops?
>
> You could not do so without an extra function, and once you use that extra
> function to tag a certain file being "tar" -- you know that extensions are
> kinda "worthless", and, especially, unreliable -- you could also have used tar
> -tvf.
>
> Did I mention tar is not the perfect format? That is because it lacks an
> index, and letting the kernel wade through a GB-sized tar file just to perform
> a readdir (let alone reading the last file in it) would be a hell of
> skipping. Keeping a non-persistent index in memory may solve the problem, but
> hey, I also do not want to spend too much memory just for a single tar file.

It would also be nice to have an interface which can build, maintain,
and cache on the disk a persistent index into a tar file on the disk,
and then be able to delete this index when space is running low.
Plus, this index could be generated by streaming the file through
memory, so you don't need to consume too much memory for a single
file.


> >struct file_operations ops = {
> > .read = tar_readdir,
> > .readdir = tar_readdir,
> > ......
> >};
> >
> >register_file_type("tar", &ops);
>
> Jan Engelhardt
> --
> ENOSPC

2005-05-10 09:39:52

by Peter Foldiak

[permalink] [raw]
Subject: Re: file as a directory

Back in November 2004, I suggested on the linux-kernel and reiserfs
lists that the Reiser4 architecture could allow us to abolish the
unnatural naming distinction between directories/files/parts-of-file
(i.e. to unify naming within-file-system and within-file naming) in an
efficient way.
I suggested that one way of doing that would be to extend XPath-like
selection syntax above the (XML) file level.
(See the archive of the discussion starting at
http://www.ussg.iu.edu/hypermail/linux/kernel/0411.3/0044.html
Wed Nov 24 2004 - 04:21:13 EST.)

ITworld now has an interesting article by Sean McGrath on a very similar
idea, mentioning the XML OASIS Open Document Format. What do you think?

Peter Foldiak

Here it is:

--

ITworld

http://www.itworld.com/AppDev/1246/nls_ebizbooks050510/

Books/chapters and directories/files - dichotomies considered harmful
ITworld.com, Ebusiness in the Enterprise 5/9/05

Sean McGrath, ITworld.com

The distinction between a full book and a mere chapter of a book, is a
source of endless fascination for incurable information modellers like
me.

Obviously, at the logical level, the distinction is driven by the
content itself. A book is a complete unit of stuff. A chapter, is a
sub-division within the complete book. At the physical level, however,
technology starts to influence the book/chapter distinction. A chapter
boundary, for Microsoft Word users or Open Office users, is likely to be
influenced by how big the underlying file gets. Large files take longer
to load and get increasingly slower to work with in typical word
processing environments. Our decisions about where to draw the chapter
boundaries are influenced to some extent by technology limitations.

If the physical constraints are not allowed to dictate the boundaries
for chapters, then we can end up resorting to file naming conventions to
split the content into manageable chunks e.g. chapter1_a, chapter1_b and
so on. We might then decide to keep things clean by introducing a
subdirectory for each chapter, putting the sub-chapters tidily away in
their own little compartments.

All is well with the world. Or is it? This is where things get
interesting from an information management perspective. A full unit of
work - a book - has now been split into bits that are navigable through
a directory structure and bits that are navigable through an
application. The result? You can use off-the-shelf tools to navigate
your way through the directories. You can see the overall structure of
the book by simply looking at the directory structure as a hierarchy.
You can see that chapter 1 has a number of sub-chapters. However, that
is as far as you can go. To dig any further into the structure of
chapter 1, section A, you need to launch the editing application.

What a pity.

Why is it, that we have this hard and fast dichotomy between directory
structure and file structure? Why is it that file system exploring
utilities need to stop in their tracks when they hit things called
'files'?

As you have probably noticed, this artificial split can be breached in
certain circumstances, at least to some extent. Graphics file formats
are a good example. Many file system exploring tools know about, say,
JPEG files and can display thumbnails of their contents.

That is a start in the right direction but I think it needs to go a lot
further if the artificial directory/file distinction is to be
eradicated.

Let us go back to the book example. Let us use Microsoft's OLE
technology as an analogy. With OLE you can embed one thing in another.
So for example, you can embed an Excel spreadsheet into a Word document
file. Now, in your head, take that further. Imagine a world in which the
file system explorer is the top level application. It manages a single,
humungous file on the disk into which you embed documents, spreadsheets,
databases etc. Each thing you embed into the explorer can itself embed
other things to any depth required.

In such a world, directories/files have merged into one abstraction. The
book author does not have to introduce artificial segmentation of the
book into separate entities. In such a world, filenames become something
of an oddity. What do you need filenames for? You would only really need
a filename at the point where you decided to exchange information
between systems A and B.

Moreover, once the package of data is pasted into System B's file system
explorer at some suitable point, the filename would be thrown away.

Sounds interesting wouldn't you say? So why don't we have systems that
work like that? There are, as ever, many reasons. One reason which was
an issue some years ago, is ceasing to be an issue very quickly now.
Obviously, in order to show the structure of a "file" a file system
explorer needs to look inside the file format. If the file format is
proprietary, then we can do nothing.

Enter XML-based file formats like the OASIS Open Document Format[1]. The
day is coming when file system explorers will be able to do for office
documents, what they currently do for JPEGs. That is a start in the
right direction. Eventually, I hope we will see the directory/file
distinction begin to melt away.

Technologies/applications that never quite made it to the mainstream
such as OpenDoc[2] and FrameMaker[3] with its powerful Book/Chapter
model, may yet have a second coming.

[1] http://www.oasis-open.org/committees/office/charter.php
[2] http://www.webopedia.com/TERM/O/OpenDoc.html
[3] http://www.adobe.com/products/framemaker/main.html

Sean McGrath is CTO of Propylon. He is an internationally acknowledged
authority on XML and related standards. He served as an invited expert
to the W3C's Expert Group that defined XML in 1998. He is the author of
three books on markup languages published by Prentice Hall. Visit his
site at: http://seanmcgrath.blogspot.com.


2005-05-10 14:53:51

by Hans Reiser

[permalink] [raw]
Subject: Re: file as a directory

I agree with the below in that sometimes you want to see a collection of
stuff as one file, and sometimes you want to see it as a tree, and that
file format browsers can be integrated into file system browsers to look
seamless to users.

A quibble: A name is just a means to select a file; he is completely
wrong to think that file browsers will eliminate filenames.

Hans

Peter Foldiak wrote:

>Back in November 2004, I suggested on the linux-kernel and reiserfs
>lists that the Reiser4 architecture could allow us to abolish the
>unnatural naming distinction between directories/files/parts-of-file
>(i.e. to unify naming within-file-system and within-file naming) in an
>efficient way.
>I suggested that one way of doing that would be to extend XPath-like
>selection syntax above the (XML) file level.
>(See the archive of the discussion starting at
>http://www.ussg.iu.edu/hypermail/linux/kernel/0411.3/0044.html
>Wed Nov 24 2004 - 04:21:13 EST.)
>
>ITworld now has an interesting article by Sean McGrath on a very similar
>idea, mentioning the XML OASIS Open Document Format. What do you think?
>
> Peter Foldiak
>
>Here it is:
>
>--
>
>ITworld
>
>http://www.itworld.com/AppDev/1246/nls_ebizbooks050510/
>
>Books/chapters and directories/files - dichotomies considered harmful
>ITworld.com, Ebusiness in the Enterprise 5/9/05
>
>Sean McGrath, ITworld.com
>
>The distinction between a full book and a mere chapter of a book, is a
>source of endless fascination for incurable information modellers like
>me.
>
>Obviously, at the logical level, the distinction is driven by the
>content itself. A book is a complete unit of stuff. A chapter, is a
>sub-division within the complete book. At the physical level, however,
>technology starts to influence the book/chapter distinction. A chapter
>boundary, for Microsoft Word users or Open Office users, is likely to be
>influenced by how big the underlying file gets. Large files take longer
>to load and get increasingly slower to work with in typical word
>processing environments. Our decisions about where to draw the chapter
>boundaries are influenced to some extent by technology limitations.
>
>If the physical constraints are not allowed to dictate the boundaries
>for chapters, then we can end up resorting to file naming conventions to
>split the content into manageable chunks e.g. chapter1_a, chapter1_b and
>so on. We might then decide to keep things clean by introducing a
>subdirectory for each chapter, putting the sub-chapters tidily away in
>their own little compartments.
>
>All is well with the world. Or is it? This is where things get
>interesting from an information management perspective. A full unit of
>work - a book - has now been split into bits that are navigable through
>a directory structure and bits that are navigable through an
>application. The result? You can use off-the-shelf tools to navigate
>your way through the directories. You can see the overall structure of
>the book by simply looking at the directory structure as a hierarchy.
>You can see that chapter 1 has a number of sub-chapters. However, that
>is as far as you can go. To dig any further into the structure of
>chapter 1, section A, you need to launch the editing application.
>
>What a pity.
>
>Why is it, that we have this hard and fast dichotomy between directory
>structure and file structure? Why is it that file system exploring
>utilities need to stop in their tracks when they hit things called
>'files'?
>
>As you have probably noticed, this artificial split can be breached in
>certain circumstances, at least to some extent. Graphics file formats
>are a good example. Many file system exploring tools know about, say,
>JPEG files and can display thumbnails of their contents.
>
>That is a start in the right direction but I think it needs to go a lot
>further if the artificial directory/file distinction is to be
>eradicated.
>
>Let us go back to the book example. Let us use Microsoft's OLE
>technology as an analogy. With OLE you can embed one thing in another.
>So for example, you can embed an Excel spreadsheet into a Word document
>file. Now, in your head, take that further. Imagine a world in which the
>file system explorer is the top level application. It manages a single,
>humungous file on the disk into which you embed documents, spreadsheets,
>databases etc. Each thing you embed into the explorer can itself embed
>other things to any depth required.
>
>In such a world, directories/files have merged into one abstraction. The
>book author does not have to introduce artificial segmentation of the
>book into separate entities. In such a world, filenames become something
>of an oddity. What do you need filenames for? You would only really need
>a filename at the point where you decided to exchange information
>between systems A and B.
>
>Moreover, once the package of data is pasted into System B's file system
>explorer at some suitable point, the filename would be thrown away.
>
>Sounds interesting wouldn't you say? So why don't we have systems that
>work like that? There are, as ever, many reasons. One reason which was
>an issue some years ago, is ceasing to be an issue very quickly now.
>Obviously, in order to show the structure of a "file" a file system
>explorer needs to look inside the file format. If the file format is
>proprietary, then we can do nothing.
>
>Enter XML-based file formats like the OASIS Open Document Format[1]. The
>day is coming when file system explorers will be able to do for office
>documents, what they currently do for JPEGs. That is a start in the
>right direction. Eventually, I hope we will see the directory/file
>distinction begin to melt away.
>
>Technologies/applications that never quite made it to the mainstream
>such as OpenDoc[2] and FrameMaker[3] with its powerful Book/Chapter
>model, may yet have a second coming.
>
>[1] http://www.oasis-open.org/committees/office/charter.php
>[2] http://www.webopedia.com/TERM/O/OpenDoc.html
>[3] http://www.adobe.com/products/framemaker/main.html
>
>Sean McGrath is CTO of Propylon. He is an internationally acknowledged
>authority on XML and related standards. He served as an invited expert
>to the W3C's Expert Group that defined XML in 1998. He is the author of
>three books on markup languages published by Prentice Hall. Visit his
>site at: http://seanmcgrath.blogspot.com.
>
>
>
>
>
>

2005-05-10 15:20:50

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: file as a directory

On Tue, 10 May 2005 10:39:23 BST, Peter Foldiak said:
> Back in November 2004, I suggested on the linux-kernel and reiserfs
> lists that the Reiser4 architecture could allow us to abolish the
> unnatural naming distinction between directories/files/parts-of-file
> (i.e. to unify naming within-file-system and within-file naming) in an
> efficient way.
> I suggested that one way of doing that would be to extend XPath-like
> selection syntax above the (XML) file level.

I believe the consensus was that this needs to happen at the VFS layer, not
the FS level. The next step would be designing an API for this - what would
the VFS present to userspace, and in what way, and how would backward
compatibility be maintained?



2005-05-10 15:32:44

by Peter Foldiak

[permalink] [raw]
Subject: Re: file as a directory

On Tue, 2005-05-10 at 15:53, Hans Reiser wrote:
> I agree with the below in that sometimes you want to see a collection of
> stuff as one file, and sometimes you want to see it as a tree, and that
> file format browsers can be integrated into file system browsers to look
> seamless to users.
>
> A quibble: A name is just a means to select a file; he is completely
> wrong to think that file browsers will eliminate filenames.

Yes, even if you think of the whole file system as a single "file", you
need a way to select the bit you need, and you will use names for that
(and whether you call that a filename, a file-part name or an object
name doesn't really matter).

It is interesting that both he and I gave the example of a book and
chapters, which is essentially a linear sequence, and the issue was just
the selection of a part of that sequence. It would also be interesting
to think about how you could map an arbitrary data structure more
complicated than a linear sequence (an "object") to disk. This brings up
issues of serialization and object databases....

2005-05-10 15:40:10

by Peter Foldiak

[permalink] [raw]
Subject: Re: file as a directory

On Tue, 2005-05-10 at 16:14, [email protected] wrote:
> On Tue, 10 May 2005 10:39:23 BST, Peter Foldiak said:
> > Back in November 2004, I suggested on the linux-kernel and reiserfs
> > lists that the Reiser4 architecture could allow us to abolish the
> > unnatural naming distinction between directories/files/parts-of-file
> > (i.e. to unify naming within-file-system and within-file naming) in an
> > efficient way.
> > I suggested that one way of doing that would be to extend XPath-like
> > selection syntax above the (XML) file level.
>
> I believe the consensus was that this needs to happen at the VFS layer, not
> the FS level. The next step would be designing an API for this - what would
> the VFS present to userspace, and in what way, and how would backward
> compatibility be maintained?

But can it be done efficiently above the file system level??

As far as I understand, Reiser4 has this nice tree structure, which
means that the part of file selection could be done with almost no extra
effort, you just attach additional names to inside nodes of the tree, so
the same tree can be used to store the whole object, and part of the
same tree can be used to select the object part. Right?
If you do this above the file system level, I don't think it would have
such an efficient implementation. Or would it?

Peter
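
The naming half of this idea can be sketched independently of any storage
layout: a tree whose inner nodes carry names and byte ranges, so that a path
selects a part of the object. This is only an in-memory illustration under
assumed types (struct part_node and resolve_part are hypothetical) and says
nothing about how Reiser4 actually stores its tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: every node names a byte range of one object, and
 * children name sub-ranges of their parent. */
struct part_node {
    const char *name;
    size_t offset, length;
    const struct part_node *children;
    size_t n_children;
};

/* Resolve a '/'-separated path such as "Appendix/Tables" below `root`.
 * Returns the matching node, or NULL if some component does not exist.
 * An empty path selects the root, i.e. the whole object. */
const struct part_node *resolve_part(const struct part_node *root,
                                     const char *path)
{
    const struct part_node *cur = root;

    assert(root != NULL && path != NULL);
    while (*path != '\0') {
        size_t len = strcspn(path, "/");   /* length of this component */
        const struct part_node *next = NULL;

        for (size_t i = 0; i < cur->n_children; i++) {
            const char *n = cur->children[i].name;
            if (strlen(n) == len && strncmp(n, path, len) == 0) {
                next = &cur->children[i];
                break;
            }
        }
        if (next == NULL)
            return NULL;
        cur = next;
        path += len;
        if (*path == '/')
            path++;
    }
    return cur;
}
```

With a node for the book and a child named "Appendix", resolving "Appendix"
yields the offset and length that a tool like wc would then read.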

2005-05-10 16:30:43

by Sean McGrath

[permalink] [raw]
Subject: Re: file as a directory

Peter Foldiak wrote:

>On Tue, 2005-05-10 at 15:53, Hans Reiser wrote:
>
>
>>I agree with the below in that sometimes you want to see a collection of
>>stuff as one file, and sometimes you want to see it as a tree, and that
>>file format browsers can be integrated into file system browsers to look
>>seamless to users.
>>
>>A quibble: A name is just a means to select a file; he is completely
>>wrong to think that file browsers will eliminate filenames.
>>
>>
>
>Yes, even if you think of the whole file system as a single "file", you
>need a way to select the bit you need, and you will use names for that
>(and whether you call that a filename, a file-part name or an object
>name doesn't really matter).
>
>
The thing that interests me most is the difference (if any) between
giving a stream of bytes an opaque name e.g. "Chapter 1 of my book.sxw"
versus giving a stream of bytes a query expression that can also be
considered an opaque name e.g.
"/book/chapter[1] "

This is, in a sense, the Russell/Frege descriptive theory of proper names
applied to storage systems[1].

I've written about this stuff before on ITWorld (warning: chatty prose
style ahead):

Fractals, Self Similarity, and the Whimsical Boundaries of XML Documents
http://www.itworld.com/nl/xml_prac/04252002/

A study in XML culture and evolution
http://www.itworld.com/nl/ebiz_ent/03252003/

[1] http://en.wikipedia.org/wiki/Proper_name

Sean


2005-05-10 17:21:17

by Hans Reiser

[permalink] [raw]
Subject: Re: file as a directory

Peter Foldiak wrote:

>On Tue, 2005-05-10 at 16:14, [email protected] wrote:
>
>
>>On Tue, 10 May 2005 10:39:23 BST, Peter Foldiak said:
>>
>>
>>>Back in November 2004, I suggested on the linux-kernel and reiserfs
>>>lists that the Reiser4 architecture could allow us to abolish the
>>>unnatural naming distinction between directories/files/parts-of-file
>>>(i.e. to unify naming within-file-system and within-file naming) in an
>>>efficient way.
>>>I suggested that one way of doing that would be to extend XPath-like
>>>selection syntax above the (XML) file level.
>>>
>>>
>>I believe the consensus was that this needs to happen at the VFS layer, not
>>the FS level. The next step would be designing an API for this - what would
>>the VFS present to userspace, and in what way, and how would backward
>>compatibility be maintained?
>>
>>
>
>But can it be done efficiently above the file system level??
>
>As far as I understand, Reiser4 has this nice tree structure, which
>means that the part of file selection could be done with almost no extra
>effort, you just attach additional names to inside nodes of the tree, so
>the same tree can be used to store the whole object, and part of the
>same tree can be used to select the object part. Right?
>If you do this above the file system level, I don't think it would have
>such an efficient implementation. Or would it? Peter
>
>
The tree structure Peter speaks of is a storage layer entity, and so I
think Peter's argument is not correct, but what Reiser4 also has is a
plugin architecture, and it would be much easier to code it if we use
the plugin architecture.

Hans

2005-05-10 17:26:10

by Hans Reiser

[permalink] [raw]
Subject: Re: file as a directory

Sean McGrath wrote:

> The thing that interests me most is the difference (if any) between
> giving a stream of bytes an opaque name e.g. "Chapter 1 of my
> book.sxw" versus giving a stream of bytes a query expression that can
> also be considered an opaque name e.g.
> "/book/chapter[1] "
>
What is an opaque name?

2005-05-10 17:39:50

by Sean McGrath

[permalink] [raw]
Subject: Re: file as a directory

Hans Reiser wrote:

>Sean McGrath wrote:
>
>
>
>>The thing that interests me most is the difference (if any) between
>>giving a stream of bytes an opaque name e.g. "Chapter 1 of my
>>book.sxw" versus giving a stream of bytes a query expression that can
>>also be considered an opaque name e.g.
>>"/book/chapter[1] "
>>
>>
>>
>What is an opaque name?
>
>
>
>
By "opaque name" I mean a name that is purely a label. A name that
cannot be interpreted as a query expression.

Sean



2005-05-10 18:52:39

by Hans Reiser

Subject: Re: file as a directory

Sean McGrath wrote:

> Hans Reiser wrote:
>
>> Sean McGrath wrote:
>>
>>
>>
>>> The thing that interests me most is the difference (if any) between
>>> giving a stream of bytes an opaque name e.g. "Chapter 1 of my
>>> book.sxw" versus giving a stream of bytes a query expression that can
>>> also be considered an opaque name e.g.
>>> "/book/chapter[1] "
>>>
>>>
>>
>> What is an opaque name?
>>
>>
>>
>>
> By "opaque name" I mean a name that is purely a label. A name that
> cannot be interpreted as a query expression.

Isn't query just another name for name?

>
> Sean
>
>
>
>
>

2005-05-10 19:39:30

by Sean McGrath

Subject: Re: file as a directory

Hans Reiser wrote:

>Sean McGrath wrote:
>
>
>>Hans Reiser wrote:
>>
>>
>>>Sean McGrath wrote:
>>>
>>>
>>>>The thing that interests me most is the difference (if any) between
>>>>giving a stream of bytes an opaque name e.g. "Chapter 1 of my
>>>>book.sxw" versus giving a stream of bytes a query expression that can
>>>>also be considered an opaque name e.g.
>>>>"/book/chapter[1] "
>>>>
>>>>
>>>>
>>>>
>>>What is an opaque name?
>>>
>>>
>>>
>>>
>>>
>>>
>>By "opaque name" I mean a name that is purely a label. A name that
>>cannot be interpreted as a query expression.
>>
>>
>
>Isn't query just another name for name?
>
>
>
That is a major philosophical nugget :-)

I recommend Saul Kripke's Naming and Necessity:
http://www.answers.com/topic/saul-kripke

Sean


2005-05-10 20:12:12

by Hans Reiser

Subject: Re: file as a directory

Sean McGrath wrote:

>
>>>> What is an opaque name?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> By "opaque name" I mean a name that is purely a label. A name that
>>> cannot be interpreted as a query expression.
>>>
>>
>>
>> Isn't query just another name for name?
>>
>>
>>
> That is a major philosophical nugget :-)
>
> I recommend Saul Kripke's Naming and Necessity:
> http://www.answers.com/topic/saul-kripke
>
> Sean
>
>
>
>
I suggest considering your opaque names to be what reiser4 calls "keys",
that is, names that exist for the purpose of finding the object via the
storage layer.

Hans

2005-05-11 10:18:45

by Helge Hafting

Subject: Re: file as a directory

On Tue, May 10, 2005 at 04:38:48PM +0100, Peter Foldiak wrote:
> On Tue, 2005-05-10 at 16:14, [email protected] wrote:
> > On Tue, 10 May 2005 10:39:23 BST, Peter Foldiak said:
> > > Back in November 2004, I suggested on the linux-kernel and reiserfs
> > > lists that the Reiser4 architecture could allow us to abolish the
> > > unnatural naming distinction between directories/files/parts-of-file
> > > (i.e. to unify naming within-file-system and within-file naming) in an
> > > efficient way.
> > > I suggested that one way of doing that would be to extend XPath-like
> > > selection syntax above the (XML) file level.
> >
> > I believe the consensus was that this needs to happen at the VFS layer, not
> > the FS level. The next step would be designing an API for this - what would
> > the VFS present to userspace, and in what way, and how would backward
> > compatibility be maintained?
>
> But can it be done efficiently above the file system level??
>
Anything that can be done at the fs level should be doable on the vfs level too.
That is simple to show in theory: You could make the VFS api identical to
the reiser4 api, and reiser4 should continue to work as efficiently as before.

> As far as I understand, Reiser4 has this nice tree structure, which
> means that the part of file selection could be done with almost no extra
> effort, you just attach additional names to inside nodes of the tree, so
> the same tree can be used to store the whole object, and part of the
> same tree can be used to select the object part. Right?
> If you do this above the file system level, I don't think it would have
> such an efficient implementation. Or would it? Peter

I cannot see why reiser4 should suffer - but of course this might be hard to
implement for other filesystems.

Helge Hafting

2005-05-16 12:33:18

by Leo Comerford

Subject: Re: file as a directory

On 5/10/05, Peter Foldiak <[email protected]> wrote:
> On Tue, 2005-05-10 at 15:53, Hans Reiser wrote:
> > I agree with the below in that sometimes you want to see a collection of
> > stuff as one file, and sometimes you want to see it as a tree, and that
> > file format browsers can be integrated into file system browsers to look
> > seamless to users.
> >
> > A quibble: A name is just a means to select a file; he is completely
> > wrong to think that file browsers will eliminate filenames.
>
> Yes, even if you think of the whole file system as a single "file", you
> need a way to select the bit you need, and you will use names for that
> (and whether you call that a filename, a file-part name or an object
> name doesn't really matter).
>
> It is interesting that both he and I gave the example of a book and
> chapters, which is essentially a linear sequence, and the issue was just
> the selection of a part of that sequence. It would also be interesting
> to think about how you could map an arbitrary data structure more
> complicated than a linear sequence (an "object") to disk. This brings up
> issues of serialization and object databases....
>
>
Here's how you might go about it.

First, some necessary background. Some (not all) of this I've mentioned
before, mainly on reiserfs-list. I've marked the start of the
"compound object"-related material below.

The fundamental problem with reiser4-style metas is semantic. The core
Unix idiom is that pathnames assert predicates of non-directory files.
So for example "/etc/passwd" should be interpreted as "the file with
inode x is the password file", while "/bin" asserts "this is a binary
executable" of all of its non-directory descendants. In OO terms we
can see these as isAs: '/bin/touch' asserts that the file with that
name isA touch binary and, transitively, isA binary executable
('/bin'). (So pathnames from root are really better not thought of as
names at all, but never mind that now.)

Now in fact this idiom isn't as widely or consistently applied in *nix
as it could be if certain technical barriers were removed. But it's
strong enough to allow us to say that in general *nix directories
exist to provide metadata (in the form of pathnames) about their
non-directory descendants.

Under reiser4, we have two different metadata systems cohabiting in
the same namespace but remaining distinct. (If I want to say something
about a file, I might make a link to it, or I might put something in
its .... directory. Or maybe both.) One is the old Unix pathname
system, designed for finding a file given certain metadata (which is
the password file?). The other is metas, a replacement for stat() and
friends which like them is designed for finding some metadata for a
given file (who owns this file?). It uses the syntax and the API of
Unix filenames but a completely contrary semantics involving
non-directory files which exist to provide metadata about their parent
directories. Many of the difficulties of implementing metas are simply
symptoms of the underlying reality that metas are not namespace
integration but namespace duct-taping.

Now I think Hans is dead right when he says that namespace unification
is vital and that all you need are files and directories. Just
removing the problems that limit the usefulness of the "pathnames are
predicates" convention would achieve a lot. There are many pieces of
metadata that could readily be expressed as pathnames rather than
being stuffed away in stat blocks or ..metas files - ownership and
permissions data, for example.

One limitation to remove is the fact that it's impossible to (using
Sean's language) give a directory an "opaque name". An opaque
directory name would assert a predicate of the directory file itself
and not its descendants, whereas a normal directory name asserts a
predicate of the directory's non-directory descendants and not the
directory itself. So, for example, we might indicate the owner of a
directory by giving it the appropriate opaque name.

Probably the biggest barrier is the fact that it's nigh-impossible to
take a specific (non-directory) file and find its pathnames! We need
the ability to do this for any file, directory or otherwise, and for
all types of pathnames applied to the file.
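
To make the cost of that reverse lookup concrete, here is a minimal
sketch of what it takes on a present-day *nix. It is an illustration
only, not anything from reiser4 or the VFS: the helper name
pathnames_of is made up, and it assumes that "the same file" means
"same device and inode number". With no reverse index to consult, the
only option is to walk the whole tree and compare.

```python
import os

def pathnames_of(target, root):
    """Return every path under root that is a hard link to the same file as target.

    Hypothetical helper: it has to visit every entry below root,
    because nothing maps an inode back to its names.
    """
    want = os.stat(target)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: look at the entry itself, not a symlink's target
            if (st.st_dev, st.st_ino) == (want.st_dev, want.st_ino):
                found.append(path)
    return sorted(found)
```

The work grows with the size of the tree rather than with the number
of names the file has, which is why this is impractical as a routine
operation.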

But of course single-place predicates aren't enough to express all the
file metadata we might want. Sometimes we need relations too.

(**The answer begins here.** Apologies for the long set-up.)

Suppose I am using an improved *nix of the type I am proposing. There
is an image file on my computer named ~/photos/dessau-bauhaus . I have
a short description, stored in a text file, which I want to associate
with the image. I go to the directory

/(something)/description/

and create the following:

/(something)/description/aardvark:

/(something)/description/ is an ordinary directory, and
'/(something)/description' is an ordinary (non-opaque) name such as
directories have in the *nix of today. The only detail is that the
"pathnames are predicates" convention is stronger on this system than
on today's *nixes. So you can be sure that each of the (opaque)
descendants of /(something)/description/ isA
'/(something)/description' while /(something)/description/ itself is
not, unless it happens to be an opaque descendant of itself! (Again,
just as /bin/touch isA binary executable while /bin/ itself is not.)
Call /(something)/description/ a 'predicate-directory' or just a
directory.

/(something)/description/aardvark: (the colon is a delimiter) is
almost an ordinary directory. The calls for dealing with aardvark: are
probably almost exactly the same as those for handling an ordinary
directory; the internal representation of aardvark: may be identical
to that of an ordinary directory. The important difference is
semantic. aardvark: is a 'relation-directory': it expresses an
instance of a relation.

Consequently, '/(something)/description/aardvark' is not an ordinary
directory name. It's opaque: the predicate is asserted of the
directory itself and not its descendants.
/(something)/description/aardvark: isA
'/(something)/description/aardvark'. It therefore also isA
'/(something)/description', and so on transitively. By contrast, no
descendant of /(something)/description/aardvark: is (for example) a
'/(something)/description', except maybe through some other link. But
'/(something)/description/aardvark' is understood as asserting a file
type for aardvark, just as dessau-bauhaus might have another pathname
which would be understood as asserting that it is a jpeg.

Now I link /(whatever)/photos/dessau-bauhaus to aardvark: by the name

/(something)/description/aardvark:described

and I link the description to aardvark: by the name

/(something)/description/aardvark:description

So: aardvark: is an instance of the relation
'/(something)/description'. (Actually, it's also the only instance of
the more specialised relation '/(something)/description/aardvark', but
I haven't described the necessary tweak to get rid of that nuisance.)
In this instance of the relation, the file elsewhere named
'~/photos/dessau-bauhaus' has the role 'described', and the
description has taken the role 'description'. Now when one searches
for the pathnames of the file ~/photos/dessau-bauhaus,
'/(something)/description/aardvark:described' will be among them. A
person or program that knows the semantics of the
'/(something)/description' relation will be able to find the
description and know it as such. And if you don't know the semantics
you can still poke around inside the instance, like examining an XML
document whose schema you don't have.
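
The description example can be roughly emulated today with ordinary
directories and hard links. The sketch below is only that: it assumes
each instance is a plain subdirectory (so "aardvark:described" becomes
"aardvark/described"), it cannot express opaque names, and the
function names make_relation and related are invented for the
illustration.

```python
import os

def make_relation(relation_dir, instance, roles):
    """Create relation_dir/instance/ and hard-link each file into it under its role name."""
    inst = os.path.join(relation_dir, instance)
    os.makedirs(inst)
    for role, path in roles.items():
        os.link(path, os.path.join(inst, role))
    return inst

def related(relation_dir, target, from_role, to_role):
    """For every instance in which target plays from_role, return the file playing to_role."""
    hits = []
    for instance in sorted(os.listdir(relation_dir)):
        src = os.path.join(relation_dir, instance, from_role)
        dst = os.path.join(relation_dir, instance, to_role)
        # samefile compares device and inode, so any hard link to target matches
        if os.path.exists(src) and os.path.exists(dst) and os.path.samefile(src, target):
            hits.append(dst)
    return hits
```

Calling related(".../description", "~/photos/dessau-bauhaus",
"described", "description") (with real paths substituted) would then
return the description file for each instance in which the photo is
the thing described. Note that it still scans every instance, since
the lookup from a file to its pathnames is missing.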

So a relation-directory can express arbitrary relationships between
files. It's a bit like the relational model's weakly-typed sister,
where /(something)/description is a table, aardvark: is a row, and
~/photos/dessau-bauhaus is aardvark's entry in the column
/(something)/description/(the various relation-directories):described.
In OOese, /(something)/description is an association, aardvark: is one
of the links of that association, and (the various
relation-directories):described is a role name. (At least, that's the
Rumbaugh-Blaha-Premerlani-Eddy-and-Lorensen version; the UMLese may
vary.)

A relation-directory can have more than two children, and can have as
children predicate-directories (either opaquely or non-opaquely) or
other relation-directories.

But what about compound objects? Here's an example.

/(something)/concatenation/zebra:
/(something)/concatenation/zebra:1
/(something)/concatenation/zebra:2
/(something)/concatenation/zebra:3

This is of course the "concatenated file" example from the Reiser4
paper redone using a relation-directory. Just as with the earlier
description example, zebra is describing a relationship that exists
between its various member files. If we list the pathnames of
~/sometext, we might find that '/(something)/concatenation/zebra:2' is
one of its pathnames, which tells us that it is part 2 of some
specific concatenation. (Of course, we might also find pathnames
telling us that it is part 2 or 42 of some other concatenation, or a
description for ~/photos/dessau-bauhaus, or very important, or whatever.) The
only difference is that in this case we find it useful to have the
relationship itself as an object in the filesystem. Easily done - just
treat the relationship's in-filesystem representation, the
relation-directory, as a file in its own right rather than just a
source of metadata about other files. In other words, just make some
links to the file otherwise known as
'/(something)/concatenation/zebra'. Call it '~/mybook', or
'/etc/passwd' if you're feeling brave. :) Now the real glory of this
system only reveals itself when you combine it with liberal use of
userspace mount() and (less importantly) file methods. That is the
synthesis of the dialectic between the Unix ideal of the file as the
atom of the filesystem and the world of compound documents, etc. - and
it's hoch-Unix. (Your slogan for the day: "bash Is My Object Browser".
Tomorrow: "mount() Is My Serialiser".) But note for now that if we
define 'atomic file' as 'just a simple sequence of bytes', we can
redefine 'file' as 'either an atomic file or a relationship between
files'.
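
That recursive definition is easy to sketch in userspace. The
following is an assumption-laden illustration, not the Reiser4
mechanism: a concatenation is taken to be an ordinary directory whose
entries are all named by integers ("1", "2", "3", ...), and read_file
is a made-up name.

```python
import os

def read_file(path):
    """Read a 'file' that is either an atomic file or a concatenation of files."""
    if not os.path.isdir(path):
        # atomic file: just a simple sequence of bytes
        with open(path, "rb") as f:
            return f.read()
    # concatenation: role names are assumed to be integers, read in numeric order
    parts = sorted(os.listdir(path), key=int)
    return b"".join(read_file(os.path.join(path, p)) for p in parts)
```

Because a part may itself be a concatenation, read_file("zebra") works
the same whether zebra's chapters are single files or are built from
sections in turn.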

To amplify, though, the "main", "real" purpose (so to speak) of
relation-directories is to express relationships between files. Doing
this properly just happens to give us compound objects for free. As
Rumbaugh, Blaha, Premerlani, Eddy and Lorensen say (in unison?),
"Aggregation is a special form of association, not an independent
concept." Beware the visual/spatial metaphors which subtly warp one's
understanding of the Unix file system. It's not a set of Russian dolls
or a maze of twisty little passages. In particular, files are not
physically inside directories, at any level of abstraction. aardvark:
just provides some metadata about ~/photos/dessau-bauhaus, and that's
all that ~/photos does too.

(Great, isn't it? The filesystem namespace: they're not names, and
it's not a space. :) )

On the other hand, garbage collection will be a significant hurdle,
for two reasons. One is cycles. The semantics of predicate-directories
mean that it's unnecessary to permit cycles containing only
predicate-directories, but if you're going to have instances of, say, the
singly-linked-list-node relation, then the need for cycles is
unavoidable. The other is more sophisticated needs for automatic
deletion. For example, we would probably need
/(something)/description/aardvark: to be marked for deletion as soon
as either of its children were unlinked.
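
To see why cycles are the problem: link counts alone can never
reclaim two list nodes that point at each other, because each keeps
the other's count above zero. A collector has to trace reachability
from the root instead. Here is a minimal in-memory sketch of that
marking step; the link table is a plain dict invented for the
example, not any filesystem's on-disk structure, and it does not
attempt the "delete when a child is unlinked" rule.

```python
def reachable(links, roots):
    """links maps each object to the objects it links to; return everything reachable from roots."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; this is what makes cycles harmless
        seen.add(node)
        stack.extend(links.get(node, ()))
    return seen

def garbage(links, roots):
    """Return every object mentioned in links that cannot be reached from roots."""
    everything = set(links)
    for targets in links.values():
        everything.update(targets)
    return everything - reachable(links, roots)
```

With links = {"/": ["a"], "a": ["b"], "n1": ["n2"], "n2": ["n1"]} and
the single root "/", n1 and n2 each still have one incoming link, yet
garbage() reports both of them.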

Leo Comerford.