2014-04-15 14:05:19

by Emmanuel Colbus

Subject: [RFC][6/11][MANUX] Kernel compatibility : directory hardlinks

Now for something that has to do both with syscalls and filesystems...

My operating system relies heavily upon hardlinks, and, amongst others,
directory hardlinks. (Yes, that's what my ext2l partitions are for. Not
only, but this is part of it).

To allow distinguishing them from true directories, I've introduced a
value S_IFDHL equal to 0130000, both for the file mode in stat(2) and
for the type_entry field in the dirent structure, in getdents64(2).
(However, normal applications never see it: getting this value from
getdents() requires asking for it with a new syscall and having the
privileges to do so, while stat(2) cannot return it to them because
their call gets routed to the target directory. My directory hardlinks
are implemented somewhat like symlinks, but with inode numbers. Thus,
userspace requires no modifications.)
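
A minimal sketch, in Python, of where the proposed value sits relative to the file types Linux already defines. S_IFDHL, is_dir_hardlink and dirent_type are illustrative names that exist in no released kernel or libc; the shift by 12 assumes the usual Linux convention that a dirent type code is the S_IFMT bits of the mode shifted right by 12.

```python
import stat

# Proposed value from this message; not defined by Linux, glibc, or POSIX.
S_IFDHL = 0o130000

# File-type codes that Linux already defines inside the S_IFMT mask (0o170000).
KNOWN_TYPES = {
    "S_IFIFO": stat.S_IFIFO,    # 0o010000
    "S_IFCHR": stat.S_IFCHR,    # 0o020000
    "S_IFDIR": stat.S_IFDIR,    # 0o040000
    "S_IFBLK": stat.S_IFBLK,    # 0o060000
    "S_IFREG": stat.S_IFREG,    # 0o100000
    "S_IFLNK": stat.S_IFLNK,    # 0o120000
    "S_IFSOCK": stat.S_IFSOCK,  # 0o140000
}


def is_dir_hardlink(mode: int) -> bool:
    """True if the type bits of `mode` carry the proposed S_IFDHL code."""
    return stat.S_IFMT(mode) == S_IFDHL


def dirent_type(mode: int) -> int:
    """Type code as it would appear in a dirent: the type bits shifted right by 12."""
    return stat.S_IFMT(mode) >> 12


if __name__ == "__main__":
    mode = S_IFDHL | 0o755
    print("is directory hardlink:", is_dir_hardlink(mode))
    print("collides with a known type:", S_IFDHL in KNOWN_TYPES.values())
    print("dirent type code:", dirent_type(mode))
```

Under these assumptions the dirent code comes out as 11, which falls between DT_LNK (10) and DT_SOCK (12) and is currently unused on Linux.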

Is this value acceptable? And, if it is, could you mark it as reserved
(or otherwise avoid reusing it), so that there's no collision with it in
the future?

Thank you,

Emmanuel


2014-04-15 20:06:47

by Theodore Ts'o

Subject: Re: [RFC][6/11][MANUX] Kernel compatibility : directory hardlinks

On Tue, Apr 15, 2014 at 03:43:01PM +0200, Emmanuel Colbus wrote:
> Now for something that has to do both with syscalls and filesystems...
>
> My operating system relies heavily upon hardlinks, and, amongst others,
> directory hardlinks. (Yes, that's what my ext2l partitions are for. Not
> only, but this is part of it).
>
> To allow distinguishing them from true directories, I've introduced a
> value S_IFDHL equal to 0130000, both for the file mode in stat(2) and
> for the type_entry field in the dirent structure, in getdents64(2).

Not without more information about what the value means, and what it
would be used for. See my previous comments about why reserving code
points for random personal projects is not something which is at all
scalable.

- Ted

2014-04-15 20:53:29

by Emmanuel Colbus

Subject: Re: [RFC][6/11][MANUX] Kernel compatibility : directory hardlinks

On 15/04/2014 22:06, Theodore Ts'o wrote:
> On Tue, Apr 15, 2014 at 03:43:01PM +0200, Emmanuel Colbus wrote:
>> Now for something that has to do both with syscalls and filesystems...
>>
>> My operating system relies heavily upon hardlinks, and, amongst others,
>> directory hardlinks. (Yes, that's what my ext2l partitions are for. Not
>> only, but this is part of it).
>>
>> To allow distinguishing them from true directories, I've introduced a
>> value S_IFDHL equal to 0130000, both for the file mode in stat(2) and
>> for the type_entry field in the dirent structure, in getdents64(2).
>
> Not without more information about what the value means, and what it
> would be used for. See my previous comments about why reserving code
> points for random personal projects is not something which is at all
> scalable.
>
> - Ted
>

Well, I can give you this information, but first, I would like to
mention that, since Alan Cox has pointed out the fact that the best
thing for me was to simply use a modified ELF header and route my own
syscalls this way, this information has become completely irrelevant. I
mean, since this value would only appear in my little personal ext2l
partitions, and in my own little syscalls, there is no point for you to
do anything anymore, not even reserve it. So, to make it clear, I fully
retract my previous request.

Now, to give you this information, if you're still interested :

The value means that the file is not a true directory, but a directory
hardlink. Directory hardlinks, which only appear in my ro-compatible
ext2l partitions, are special files that have no content, and simply
point to a directory inode by using its inode number. This value is
simply stored within the fragment address, as my ext2l partitions don't
support fragmentation. As for the kernel, it uses these a little bit
like automatic mountpoints that can't cross partition boundaries.

But, as I said, that's only a theoretical point now.

Emmanuel

2014-04-15 22:01:24

by Theodore Ts'o

Subject: Re: [RFC][6/11][MANUX] Kernel compatibility : directory hardlinks

On Tue, Apr 15, 2014 at 10:53:24PM +0200, Emmanuel Colbus wrote:
> Well, I can give you this information, but first, I would like to
> mention that, since Alan Cox has pointed out the fact that the best
> thing for me was to simply use a modified ELF header and route my own
> syscalls this way, this information has become completely irrelevant. I
> mean, since this value would only appear in my little personal ext2l
> partitions, and in my own little syscalls, there is no point for you to
> do anything anymore, not even reserve it. So, to make it clear, I fully
> retract my previous request.

The ELF header works fine for your own programs. The file system
format changes only matter if you care about interoperability or
future proofing with ext4. If it's only for a toy operating system,
then it won't matter, of course. But if you're going to depend on
e2fsprogs, or a version of e2fsprogs with your local changes, it's
going to be on you to make it work.

> This value is
> simply stored within the fragment address, as my ext2l partitions don't
> support fragmentation. As for the kernel, it uses these a little bit
> like automatic mountpoint that can't cross partition limits.

We're not currently using the fragment address for anything (yet), but
that could change in the future, as it's the last unallocated 32-bit
field in the inode. (I suspect we'll end up using it to support
per-block metadata, which would be needed to support data block
checksums and reflinks, among other things).

The fragment number is currently being used to support file systems
larger than 16TB (i_file_acl_high).

> The value means that the file is not a true directory, but a directory
> hardlink. Directory hardlinks, which only appear in my ro-compatible
> ext2l partitions, are special files that have no content, and simply
> point to a directory inode by using its inode number.

I'm not sure what's the value of having a directory hard link given
the existence of symlinks. I understand what the difference is, but
what value does it give to an end user? It's confusing, and if there
is a directory hard link to a directory, you won't be able to delete
the directory, lest you leave a dangling reference (and you can't just
remove the primary link to the directory, since then the ".." entry in
the directory will be pointing to the old parent directory). That
makes hard links of files fundamentally different from a directory
hard link, which will be even more confusing to users.

But if I were going to do such an insane thing, instead of trying to
do it by repurposing an inode field and using an inode, I'd probably
do it by using a bit in the file_type field of the directory entry to
indicate that it's this special "directory hard link" thing. This
doesn't solve the semantic questions of what happens if you want to
delete a directory that has one or more hard links to it, though.
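
A rough sketch of that flag-bit idea, in Python, using the ext2 directory entry layout (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name). EXT2_FT_DIR = 2 is the existing type code for directories; the flag value 0x80, the mask, and the helper names are assumptions made up for illustration and are not part of ext2, ext4, or e2fsprogs.

```python
import struct

EXT2_FT_DIR = 2        # existing ext2/ext4 directory-entry type for directories
FT_TYPE_MASK = 0x07    # the existing type codes 0..7 fit in the low three bits
FT_DIRHARDLINK = 0x80  # hypothetical flag bit, picked only for this sketch

# Fixed header of a directory entry: inode, rec_len, name_len, file_type.
_HEADER = struct.Struct("<IHBB")


def pack_dirent(inode: int, name: bytes, file_type: int) -> bytes:
    """Build one directory entry, padded to a 4-byte boundary."""
    rec_len = (_HEADER.size + len(name) + 3) & ~3
    header = _HEADER.pack(inode, rec_len, len(name), file_type)
    return (header + name).ljust(rec_len, b"\0")


def unpack_dirent(buf: bytes):
    """Return (inode, name, base type, is_directory_hardlink)."""
    inode, _rec_len, name_len, file_type = _HEADER.unpack_from(buf)
    name = buf[_HEADER.size:_HEADER.size + name_len]
    base_type = file_type & FT_TYPE_MASK
    return inode, name, base_type, bool(file_type & FT_DIRHARDLINK)


if __name__ == "__main__":
    # An entry named "tmp" that points at inode 12 and carries the flag.
    entry = pack_dirent(12, b"tmp", EXT2_FT_DIR | FT_DIRHARDLINK)
    print(unpack_dirent(entry))
```

With this layout the entry points straight at the target directory's inode, so no extra inode is consumed; the flag only tells the reader that this name is not the directory's primary link.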

Regards,

- Ted

2014-04-15 23:12:59

by Emmanuel Colbus

Subject: Re: [RFC][6/11][MANUX] Kernel compatibility : directory hardlinks

On 16/04/2014 00:01, Theodore Ts'o wrote:
> On Tue, Apr 15, 2014 at 10:53:24PM +0200, Emmanuel Colbus wrote:

>
>> The value means that the file is not a true directory, but a directory
>> hardlink. Directory hardlinks, which only appear in my ro-compatible
>> ext2l partitions, are special files that have no content, and simply
>> point to a directory inode by using its inode number.
>
> I'm not sure what's the value of having a directory hard link given
> the existence of symlinks. I understand what the difference is, but
> what value does it give to an end user? It's confusing, and if there
> is a directory hard link to a directory, you won't be able to delete
> the directory, lest you leave a dangling reference (and you can't just
> remove the primary link to the directory, since then the ".." entry in
> the directory will be pointing to the old parent directory). That
> makes hard links of files fundamentally different from a directory
> hard link, which will be even more confusing to users.

As for the end users, they aren't supposed to create any : this is a
privileged operation. However, for the operating system, this is very
useful.

Consider for example a program that stores some data within its /tmp. In
my OS, every process is chrooted, so it needs to have its own /tmp. But
then, this raises several issues:
1) Some applications use /tmp as a means of communication between each
other. POSIX explicitly allows this. So how is this supposed to work if
two applications, in two different chroots, try to communicate in this way?
2) How is this /tmp directory supposed to be cleaned when the system
shuts down?

The solution is to use directory hardlinks. This way :
- applications that want to communicate through /tmp can simply specify
that they have a dependency on a package that provides a /tmp directory
for them; this way, the package manager will create a directory hardlink
named "/tmp" towards it in their chroot, and they will be able to do
their thing;
- as for the cleaning of /tmp, it is done by having all the /tmp
directories of all the applications hardlinked within the chroot of the
script tasked with cleaning them.

Symlinks, of course, wouldn't work, as they would require giving at
least one application access to the other's root.

>
> But if I were going to do such an insane thing, instead of trying to
> do it by repurposing an inode field and using an inode, I'd probably
> do it by using a bit in the file_type field of the directory entry to
> indicate that it's this special "directory hard link" thing.

Well, I'm using the underlying file as a mount point, so not having any
file seems tricky. And I'm considering using the file's *content* to
allow turning them into automatic cross-partition hardlinks. (Which
would certainly not be very *hard* links anymore, but I think mine
wouldn't be the first OS to implement not-so-hard hardlinks.)

> This
> doesn't solve the semantic questions of what happens if you want to
> delete a directory that has one or more hard links to it, though.

Deleting a directory that has one or more hardlinks towards it simply
fails. (In my ext2l partitions, in the case of directories, I have used
the fragment address field as a counter of hardlinks). My directory
hardlinks are fundamentally different from other hardlinks, both in
their implementation (an additional inode type instead of an additional
directory entry) and in their behaviour (the first hardlink is different
from the other ones).
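
To make these semantics concrete, here is a toy in-memory model in Python. It is only a sketch of the behaviour described in this thread, not the ext2l implementation: the class and method names are invented, the per-directory counter stands in for the fragment address field, and removing a link through the same call that removes a directory is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    kind: str                                    # "dir" or "dhl" (directory hardlink)
    entries: dict = field(default_factory=dict)  # dir only: name -> inode number
    target: int = 0                              # dhl only: inode number of the target
    dhl_count: int = 0                           # dir only: hardlinks pointing here


class ToyFS:
    def __init__(self):
        self.inodes = {1: Inode("dir")}  # inode 1 plays the role of the root
        self._next = 2

    def _alloc(self, inode):
        ino, self._next = self._next, self._next + 1
        self.inodes[ino] = inode
        return ino

    def mkdir(self, parent, name):
        ino = self._alloc(Inode("dir"))
        self.inodes[parent].entries[name] = ino
        return ino

    def link_dir(self, parent, name, target):
        """Create a directory hardlink: a separate inode that stores the target's number."""
        if self.inodes[target].kind != "dir":
            raise OSError("target is not a true directory")
        ino = self._alloc(Inode("dhl", target=target))
        self.inodes[target].dhl_count += 1
        self.inodes[parent].entries[name] = ino
        return ino

    def lookup(self, parent, name):
        """Resolve a name; a hardlink transparently yields its target directory."""
        ino = self.inodes[parent].entries[name]
        node = self.inodes[ino]
        return node.target if node.kind == "dhl" else ino

    def remove(self, parent, name):
        ino = self.inodes[parent].entries[name]
        node = self.inodes[ino]
        if node.kind == "dhl":
            # Removing a link only drops one reference to the target.
            self.inodes[node.target].dhl_count -= 1
        elif node.dhl_count > 0:
            raise OSError("directory still has hardlinks pointing to it")
        elif node.entries:
            raise OSError("directory not empty")
        del self.inodes[parent].entries[name]
        del self.inodes[ino]


if __name__ == "__main__":
    fs = ToyFS()
    shared = fs.mkdir(1, "shared_tmp")
    app = fs.mkdir(1, "app_chroot")
    fs.link_dir(app, "tmp", shared)
    print(fs.lookup(app, "tmp") == shared)
    try:
        fs.remove(1, "shared_tmp")
    except OSError as exc:
        print("refused:", exc)
```

The asymmetry shows up in remove(): the true directory cannot go away while links exist, whereas a link can always be dropped.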

Regards,

Emmanuel

2014-04-15 23:34:38

by Theodore Ts'o

Subject: Re: [RFC][6/11][MANUX] Kernel compatibility : directory hardlinks

On Wed, Apr 16, 2014 at 01:12:53AM +0200, Emmanuel Colbus wrote:
> The solution is to use directory hardlinks. This way :
> - applications that want to communicate through /tmp can simply specify
> that they have a dependency on a package that provides a /tmp directory
> for them; this way, the package manager will create a directory hardlink
> named "/tmp" towards it in their chroot, and they will be able to do
> their thing;
> - as for the cleaning of /tmp, it is done by having all the /tmp
> directories of all the applications hardlinked within the chroot of the
> script tasked with cleaning them.


That's what "bind mounts" in Linux are used for.

If you haven't studied how bind mounts and mount namespaces work in
Linux, I'd strongly encourage that you take a look. First of all,
it's a much more powerful system, and secondly, that's what most
container based systems are using.

Cheers,

- Ted

2014-04-16 02:14:37

by Emmanuel Colbus

Subject: Re: [RFC][6/11][MANUX] Kernel compatibility : directory hardlinks

On 16/04/2014 01:34, Theodore Ts'o wrote:
> On Wed, Apr 16, 2014 at 01:12:53AM +0200, Emmanuel Colbus wrote:
>> The solution is to use directory hardlinks. This way :
>> - applications that want to communicate through /tmp can simply specify
>> that they have a dependency on a package that provides a /tmp directory
>> for them; this way, the package manager will create a directory hardlink
>> named "/tmp" towards it in their chroot, and they will be able to do
>> their thing;
>> - as for the cleaning of /tmp, it is done by having all the /tmp
>> directories of all the applications hardlinked within the chroot of the
>> script tasked with cleaning them.
>
>
> That's what "bind mounts" in Linux are used for.
>
> If you haven't studied how bind mounts and mount namespaces work in
> Linux, I'd strongly encourage that you take a look. First of all,
> it's a much more powerful system, and secondly, that's what most
> container based systems are using.

Yes, I know about bind mounts; in fact, I've even used them as a Linux
sysadmin, and I thought about using them here. My issues with them
were :

- bind mounts are not persistent (they have to be re-instated at every
boot);
- I would have needed plenty of them (if a computer had 10,000
applications that each used a /tmp directory, then this would have
required 10,000 bind mounts);
- the initialization of the operating system would have been impossible:
who would have created the bind mounts required for the proper operation
of init?
- as an annoying note, the creation of the bind mounts would have
required giving mount access to, de facto, any directory in the system;
- more problematically, giving the mount process this access would have
required either installing mount, and all its dependencies, unchrooted
and completely outside of the whole architecture, or using something to
give mount this access from within its chroot. The annoying thing being
that this "something" would have been, uh, a bind mount;
- and finally, unless I'm mistaken, bind mounts do not stack. That is, if
you have three groups of directories, a, b, and c, and you bind mount c
into b/c and b into a/b, you won't be able to access c as a/b/c. Thus,
the number of bind mounts required to do this would have been clearly
astronomical. (1)


In addition, I had already determined that, to do what I wanted to do, I
would need a new kind of partition (for the rootlinks). So I decided I
could also go with directory hardlinks.

However, I *did* recognize that bind mounts were very close to what I
needed. That's why I created my directory hardlinks in this way, as, in
fact, "static" bind mounts.


Regards,

Emmanuel



(1) By the way, don't worry, I didn't forget to add a safeguard
preventing circular mounts, even through intermediate steps :-)