2003-01-10 03:00:40

by Peter Chubb

Subject: ISO-9660 Rock Ridge gives different links different inums


In linux 2.5.54, multiple links to the same file on a rock-ridge CD
have different inode numbers. This confuses cpio, tar and cp -ra
because the multiple links are each copied separately as a single file.
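
For context, here is a minimal sketch of how an archiver can recognise
hard links: it keys each file on (st_dev, st_ino) and treats a repeated
key as another name for a file it has already seen. This is only an
illustration in Python, not the actual cpio, tar or cp code, and the
helper name group_hard_links is made up for the example.

```python
import os
import tempfile


def group_hard_links(paths):
    """Group paths by (st_dev, st_ino); each group is one underlying file."""
    groups = {}
    for path in paths:
        st = os.lstat(path)
        groups.setdefault((st.st_dev, st.st_ino), []).append(path)
    return list(groups.values())


with tempfile.TemporaryDirectory() as d:
    gzip = os.path.join(d, "gzip")
    gunzip = os.path.join(d, "gunzip")
    other = os.path.join(d, "other")
    with open(gzip, "w") as f:
        f.write("data")
    os.link(gzip, gunzip)          # second name for the same file
    with open(other, "w") as f:
        f.write("data")            # same contents, but a separate file

    groups = group_hard_links([gzip, gunzip, other])
    sizes = sorted(len(g) for g in groups)
    print(sizes)
```

If a filesystem reports a different st_ino for every name, each group
has one member and every link is archived as an independent file, which
is the behaviour described above.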

It'll probably also confuse NFS, but I haven't tried that.

Example from the knoppix CD:

$ ls -il gunzip gzip uncompress zcat
1896278 -rwxr-xr-x 4 root root 49256 Oct 10 01:31 gunzip
1896564 -rwxr-xr-x 4 root root 49256 Oct 10 01:31 gzip
1902292 -rwxr-xr-x 4 root root 49256 Oct 10 01:31 uncompress
1902856 -rwxr-xr-x 4 root root 49256 Oct 10 01:31 zcat

(For comparison, here's what I see on XFS:
100663485 -rwxr-xr-x 4 root root 49288 Nov 7 11:37 gunzip
100663485 -rwxr-xr-x 4 root root 49288 Nov 7 11:37 gzip
100663485 -rwxr-xr-x 4 root root 49288 Nov 7 11:37 uncompress
100663485 -rwxr-xr-x 4 root root 49288 Nov 7 11:37 zcat
)


Currently the inode number appears to be the offset in bytes from the start of
the file system to the iso directory entry. Files with multiple
directory entries (i.e., links) therefore have different inums.
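
A minimal sketch of that numbering, assuming 2048-byte logical blocks.
The block/offset pairs are simply back-computed from two of the inode
numbers in the listing above under that assumption; the extent number
4711 and the function name inum_from_dirent are made up for the
illustration, and this is not the isofs code itself.

```python
BLOCK_SIZE = 2048  # assumption: the usual CD-ROM logical block size


def inum_from_dirent(block, offset):
    """Inode number = byte offset of the directory record from the volume start."""
    return block * BLOCK_SIZE + offset


# (name, directory block, offset within block, first data extent)
# Block/offset values are derived from the ls -il output; the extent is hypothetical.
links = [
    ("gunzip", 925, 1878, 4711),
    ("gzip", 926, 116, 4711),
]

inums = {name: inum_from_dirent(block, off) for name, block, off, _ in links}
extents = {extent for *_, extent in links}

print(inums)         # one number per directory record
print(len(extents))  # but a single shared data extent
```

Both names refer to the same data, yet each directory record sits at
its own offset, so the derived numbers can never match.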

I don't know enough about the ISO9660 standard to be sure what's best
to do about this.

--
Dr Peter Chubb [email protected]
You are lost in a maze of BitKeeper repositories, all almost the same.


2003-01-10 03:15:13

by Andrew McGregor

Subject: Re: ISO-9660 Rock Ridge gives different links different inums



--On Friday, January 10, 2003 14:08:59 +1100 Peter Chubb
<[email protected]> wrote:

>
> In linux 2.5.54, multiple links to the same file on a rock-ridge CD
> have different inode numbers. This confuses cpio, tar and cp -ra
> because the multiple links are each copied separately as a single file.
>
> It'll probably also confuse NFS, but I haven't tried that.

Shouldn't do, but it will probably make the buffer cache on the server less
effective.

> Currently the inode number appears to be the offset in bytes from the
> start of the file system to the iso directory entry. Files with multiple
> directory entries (i.e., links) therefore have different inums.
>
> I don't know enough about the ISO9660 standard to be sure what's best
> to do about this.

Change it to be the offset to the data area, which should be the same for
all of them?

>
> --
> Dr Peter Chubb [email protected]
> You are lost in a maze of BitKeeper repositories, all almost the same.

2003-01-10 03:26:39

by Peter Chubb

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

>>>>> "Andrew" == Andrew McGregor <[email protected]> writes:

Andrew> --On Friday, January 10, 2003 14:08:59 +1100 Peter Chubb
Andrew> <[email protected]> wrote:

>> In linux 2.5.54, multiple links to the same file on a rock-ridge CD
>> have different inode numbers. This confuses cpio, tar and cp -ra
>> because the multiple links are each copied separately as a single
>> file.
>>
>> It'll probably also confuse NFS, but I haven't tried that.

Andrew> Shouldn't do, but it will probably make the buffer cache on
Andrew> the server less effective.

>> Currently the inode number appears to be the offset in bytes from
>> the start of the file system to the iso directory entry. Files
>> with multiple directory entries (i.e., links) therefore have
>> different inums.
>>
>> I don't know enough about the ISO9660 standard to be sure what's
>> best to do about this.

Andrew> Change it to be the offset to the data area, which should be
Andrew> the same for all of them?

I thought about that, but I'm unsure if there's any way to get from
that offset to the directory information. As far as I can tell,
there's no concept of an inode separate from directory entry on iso9660
--- the directory entry/entries all contain all the information that
describes a file. Which means that the inumber has to point to some
directory node.

Preferably, all the inumbers for the same file would point to the same
directory entry; but I can see no easy way to do that. Keeping an
in-memory table for files with multiple links might be the best way,
as there aren't that many on a typical filesystem.
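
A rough sketch of what such a table could look like, in Python rather
than kernel C. It assumes that links to one file share a first data
extent and carry a link count above one; the class name LinkTable and
the sample numbers are hypothetical.

```python
class LinkTable:
    """Map every link of a multi-link file to one canonical inode number."""

    def __init__(self):
        # first data extent -> inum of the first directory record seen for it
        self._canonical = {}

    def inum_for(self, dirent_offset, extent, nlink):
        if nlink <= 1:
            # Single-link files keep the directory-record offset and
            # never occupy a slot in the table.
            return dirent_offset
        return self._canonical.setdefault(extent, dirent_offset)


table = LinkTable()
a = table.inum_for(1896278, extent=4711, nlink=4)
b = table.inum_for(1896564, extent=4711, nlink=4)
c = table.inum_for(5000, extent=99, nlink=1)
print(a, b, c)
```

In this sketch the canonical number is whichever directory record was
looked up first, so it would not be stable across mounts.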

--
Dr Peter Chubb [email protected]
You are lost in a maze of BitKeeper repositories, all almost the same.

2003-01-10 06:32:50

by Denis Vlasenko

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

On 10 January 2003 05:34, Peter Chubb wrote:
> >> I don't know enough about the ISO9660 standard to be sure what's
> >> best to do about this.
>
> Andrew> Change it to be the offset to the data area, which should be
> Andrew> the same for all of them?
>
> I thought about that, but I'm unsure if there's any way to get from
> that offset to the directory information. As far as I can tell,
> there's no concept of an inode separate from directory entry on
> iso9660 --- the directory entry/entries all contain all the
> information that describes a file. Which means that the inumber has
> to point to some directory node.
>
> Preferably, all the inumbers for the same file would point to the
> same directory entry; but I can see no easy way to do that. Keeping
> an in-memory table for files with multiple links might be the best
> way, as there aren't that many on a typical filesystem.

And what will happen on a non-typical filesystem with 1 million hardlinks?

The root of the problem is a fundamental layering violation in
traditional Unix filesystems: inode numbers should NOT be visible
to userspace. Userspace just needs a way to tell hardlinks from separate
files, that's all. Exposing inumbers does that, but creates tons
of problems for filesystems which do NOT have such a concept.

There is at least one way to redesign it:
* provide hash number instead of an inumber for each file
with the following semantics:
- hardlinks ALWAYS have equal hash numbers
- different files MAY have equal hash numbers (but rarely)
* provide is_hardlink(file1,file2) system call
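
A userspace sketch of those semantics, for illustration only. Neither
call exists as a system call; here file_hash (a made-up name) is
emulated by truncating a value derived from st_dev and st_ino, and
is_hardlink is emulated with os.path.samefile.

```python
import os
import tempfile


def file_hash(path, bits=16):
    """Equal for all links to one file; unrelated files may collide."""
    st = os.stat(path)
    return hash((st.st_dev, st.st_ino)) & ((1 << bits) - 1)


def is_hardlink(path1, path2):
    """Exact test, consulted only when the two hashes are equal."""
    return os.path.samefile(path1, path2)


with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    b = os.path.join(d, "b")
    c = os.path.join(d, "c")
    with open(a, "w") as f:
        f.write("x")
    os.link(a, b)                  # b is a hard link to a
    with open(c, "w") as f:
        f.write("x")               # c is a separate file

    same_hash = file_hash(a) == file_hash(b)
    linked = is_hardlink(a, b)
    unrelated = is_hardlink(a, c)
    print(same_hash, linked, unrelated)
```

A tool would compare hashes first and make the exact is_hardlink check
only for files whose hashes match.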

But this will cause very long migration period (~10 years?)
and incompatibilities with other Unix variants...
--
vda

2003-01-10 08:48:33

by Peter Chubb

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

>>>>> "Denis" == Denis Vlasenko <[email protected]> writes:

Denis> On 10 January 2003 05:34, Peter Chubb wrote:
>> Preferably, all the inumbers for the same file would point to the
>> same directory entry; but I can see no easy way to do that.
>> Keeping an in-memory table for files with multiple links might be
>> the best way, as there aren't that many on a typical filesystem.

Denis> And what will happen on a non-typical filesystem with 1 million
Denis> hardlinks?

Denis> The root of the problem is a fundamental layering violation in
Denis> traditional Unix filesystems: inode numbers should NOT be
Denis> visible to userspace. Userspace just needs a way to tell
Denis> hardlinks from separate files, that's all. Exposing inumbers
Denis> does that, but creates tons of problems for filesystems which
Denis> do NOT have such a concept.

The problem is that in Unix the fundamental identity of a file is
the tuple (blkdev, inum); names are merely indices (links) that resolve to
that tuple. Personally, I'd swap to a pair of system calls to map
name to (blkdev, inum), and open(blkdev, inum). Think of the inode
number as a unique within-filesystem index.

--
Dr Peter Chubb [email protected]
You are lost in a maze of BitKeeper repositories, all almost the same.

2003-01-10 08:53:45

by Denis Vlasenko

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

On 10 January 2003 10:56, Peter Chubb wrote:
> >>>>> "Denis" == Denis Vlasenko <[email protected]> writes:
>
> Denis> On 10 January 2003 05:34, Peter Chubb wrote:
> >> Preferably, all the inumbers for the same file would point to the
> >> same directory entry; but I can see no easy way to do that.
> >> Keeping an in-memory table for files with multiple links might be
> >> the best way, as there aren't that many on a typical filesystem.
>
> Denis> And what will happen on a non-typical filesystem with 1 million
> Denis> hardlinks?
>
> Denis> The root of the problem is a fundamental layering violation in
> Denis> traditional Unix filesystems: inode numbers should NOT be
> Denis> visible to userspace. Userspace just needs a way to tell
> Denis> hardlinks from separate files, that's all. Exposing inumbers
> Denis> does that, but creates tons of problems for filesystems which
> Denis> do NOT have such a concept.
>
> The problem is that in Unix the fundamental identity of a file is
> the tuple (blkdev, inum); names are merely indices (links) that
> resolve to that tuple.

You are right. It is designed this way. This design is wrong.

> Personally, I'd swap to a pair of system
> calls to map name to (blkdev, inum), and open(blkdev, inum). Think
> of the inode number as a unique within-filesystem index.

This does not fix the design.
--
vda

2003-01-11 15:23:00

by Horst von Brand

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

Peter Chubb <[email protected]> said:

[...]

> The problem is that in Unix the fundamental identity of a file is
> the tuple (blkdev, inum); names are merely indices (links) that resolve to
> that tuple. Personally, I'd swap to a pair of system calls to map
> name to (blkdev, inum), and open(blkdev, inum). Think of the inode
> number as a unique within-filesystem index.

That way any joker can go ahead and open any file, without any regard to
permission bits on the directories that lead there. Not nice.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2003-01-13 22:09:46

by Bill Davidsen

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

On Fri, 10 Jan 2003, Peter Chubb wrote:

> >>>>> "Andrew" == Andrew McGregor <[email protected]> writes:

> Andrew> Change it to be the offset to the data area, which should be
> Andrew> the same for all of them?
>
> I thought about that, but I'm unsure if there's any way to get from
> that offset to the directory information. As far as I can tell,
> there's no concept of an inode separate from directory entry on iso9660
> --- the directory entry/entries all contain all the information that
> describes a file. Which means that the inumber has to point to some
> directory node.

I can see that you would have to carry that information forward to the
"inode" if you used the data area address. For stat that's probably not an
issue. For open, once the file is open you don't really need access
checking, and the times on a CD don't change.

What's the case where you are starting with an inode and trying to get to
a filename without having gone through a dir entry to the inode? No one is
running things like dump/restore on iso9660 I hope!

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-01-13 23:04:29

by Peter Chubb

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

>>>>> "Bill" == Bill Davidsen <[email protected]> writes:

Bill> On Fri, 10 Jan 2003, Peter Chubb wrote:
>> >>>>> "Andrew" == Andrew McGregor <[email protected]> writes:

Andrew> Change it to be the offset to the data area, which should be
Andrew> the same for all of them?
>> I thought about that, but I'm unsure if there's any way to get from
>> that offset to the directory information. As far as I can tell,
>> there's no concept of an inode separate from directory entry on
>> iso9660 --- the directory entry/entries all contain all the
>> information that describes a file. Which means that the inumber
>> has to point to some directory node.

Bill> I can see that you would have to carry that information forward
Bill> to the "inode" if you used the data area address, for stat
Bill> that's probably not an issue, for open after you open the file
Bill> you don't really need access checking and the times on a CD
Bill> don't change.

In isofs, the on-disc `inode' is an iso_directory_record, which
contains the name as well as describing a single extent.
iso_directory_records are chained together for files that have more
than one extent on disc. The code currently uses iget() to get the
chained iso_directory_records.

Bill> What's the case where you are starting with an inode and trying
Bill> to get to a filename without having gone through a dir entry to
Bill> the inode? No one is running things like dump/restore on iso9660
Bill> I hope!

No, it's where you're starting with an inode number and want to get an
inode. Having looked at the code now, I think that's confined to autofs
and to the isofs code itself, so it could be worked around.

Maybe we should deprecate iget() ???

--
Dr Peter Chubb [email protected]
You are lost in a maze of BitKeeper repositories, all almost the same.


2003-01-15 20:58:18

by Mark H. Wood

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

On Fri, 10 Jan 2003, Horst von Brand wrote:
> Peter Chubb <[email protected]> said:
> [...]
> > The problem is that in Unix the fundamental identity of a file is
> > the tuple (blkdev, inum); names are merely indices (links) that resolve to
> > that tuple. Personally, I'd swap to a pair of system calls to map
> > name to (blkdev, inum), and open(blkdev, inum). Think of the inode
> > number as a unique within-filesystem index.
>
> That way any joker can go ahead and open any file, without any regard to
> permission bits on the directories that lead there. Not nice.

Welcome to VMS, which can open files by INDEXF.SYS offset. Some app.s
which create and delete files rapidly never bother to make directory
entries at all. It may not be what you're used to, and it may be contrary
to expected Unix semantics, but it's not unthinkable.

--
Mark H. Wood, Lead System Programmer [email protected]
MS Windows *is* user-friendly, but only for certain values of "user".

2003-01-15 21:23:18

by Jesse Pollard

Subject: Re: ISO-9660 Rock Ridge gives different links different inums

On Wednesday 15 January 2003 03:07 pm, Mark H. Wood wrote:
> On Fri, 10 Jan 2003, Horst von Brand wrote:
> > Peter Chubb <[email protected]> said:
> > [...]
> >
> > > The problem is that in Unix the fundamental identity of a file is
> > > the tuple (blkdev, inum); names are merely indices (links) that resolve
> > > to that tuple. Personally, I'd swap to a pair of system calls to map
> > > name to (blkdev, inum), and open(blkdev, inum). Think of the inode
> > > number as a unique within-filesystem index.
> >
> > That way any joker can go ahead and open any file, without any regard to
> > permission bits on the directories that lead there. Not nice.
>
> Welcome to VMS, which can open files by INDEXF.SYS offset. Some app.s
> which create and delete files rapidly never bother to make directory
> entries at all. It may not be what you're used to, and it may be contrary
> to expected Unix semantics, but it's not unthinkable.

Or UNICOS, which then restricts the system call to only privileged operation:

This is (I believe) used to optimize a user mode NFS daemon by eliminating
multiple namei translations (plus locking). The scheme stays secure because
the user is not permitted the same privilege mapping as the daemon (thus
the old "kill nfsd" denial of service attack fails). There are also hints
that it is used to optimize checkpoint/restart capabilities.

NAME
openi - Opens a file by using the inode number

SYNOPSIS
int openi (long dev, long ino, long gen, long uflag);

IMPLEMENTATION
Cray PVP systems

DESCRIPTION
The openi system call presents the user with a flat view of all native
UNICOS file systems currently mounted. Rather than use the directory
tree structure to search through directories for a file, openi
provides access by inode number.

The openi system call accepts the following arguments:

dev       Specifies the device number as built by the makedev macro
          that is defined outside of the kernel.

ino       Specifies an inode number for the file as reported by the ls
          -i command.

gen       Specifies the generation number of the inode. This provides
          a unique identification for a specific file. The generation
          number changes when an inode is reused. To print the inode
          generation values, use the fck(1) command with the -i and -l
          options.

uflag     Specifies the open flags. These are bit values of the form
          O_name that are defined in the fcntl.h file.

Character, block, and FIFO special files are not allowed. Specifying
a dev and ino pair that point to one of these will produce an EINVAL
error code.

NOTES
Only a process with appropriate privilege can use this system call.

If the PRIV_SU configuration option is enabled, the super user is
allowed to use this system call.

A process with the PRIV_MAC_READ and PRIV_DAC_OVERRIDE effective
privileges is allowed to use this system call. See the effective
privilege discussion in the NOTES section of the open(2) man page for
additional privilege requirements. The open(2) search access
discussions do not apply to this system call.

RETURN VALUES
If openi completes successfully, a nonnegative integer is returned
which may be used in further I/O operations. Otherwise, openi returns
a negative value, and errno is set to indicate the error.

ERRORS
The openi system call fails to open the specified file if the
following error condition or one of those listed on the open(2) man
page occurs.

Error Code     Description

EINVAL         A dev and ino pair point to a character, block, or
               FIFO special file. The openi(2) system call does
               not work with these types of files.

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.