2002-09-05 00:31:47

by Hans Reiser

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

David S. Miller wrote:

> From: Hans Reiser <[email protected]>
> Date: Thu, 05 Sep 2002 00:31:36 +0400
>
> The proper fix should be to make the result of the limit
> computation be accurately architecture specific.
>
>And then each and every Reiserfs partition is platform specific
>and cannot be mounted onto another Linux platform.
>
>Creating such a restriction is a grave error.
>
>
>
>
And you would cripple the 99% usage to aid those users who move disk
drives physically over to a sparc box AND have more than 31k links to a
file?

Or is there some usage pattern that I don't appreciate that makes it
highly likely that people will swap disks into a sparc? Maybe these hot
plug ruggedized with plastic covers IDE disks that there was some press
about being a substitute for DVDs a few months ago?

Frankly, I had doubts about our code that causes CPU order to not be
used in the disk format, but I was persuaded that it was not a
measurable performance loss to do it and said yes.

I recognize that you may see things from a perspective that I have not
experienced, so please articulate on that if so.

Hans


2002-09-05 00:34:43

by David Miller

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

From: Hans Reiser <[email protected]>
Date: Thu, 05 Sep 2002 04:36:15 +0400

And you would cripple the 99% usage to aid those users who move disk
drives physically over to a sparc box AND have more than 31k links to a
file?

This is not a sparc or a x86 or x86_64 or ia64 thing.

It's a global thing.

It's about being portable and clean and not installing unnecessary
restrictions.

2002-09-05 00:43:03

by Chris Mason

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Wed, 2002-09-04 at 20:36, Hans Reiser wrote:
> David S. Miller wrote:
>
> > From: Hans Reiser <[email protected]>
> > Date: Thu, 05 Sep 2002 00:31:36 +0400
> >
> > The proper fix should be to make the result of the limit
> > computation be accurately architecture specific.
> >
> >And then each and every Reiserfs partition is platform specific
> >and cannot be mounted onto another Linux platform.
> >
> >Creating such a restriction is a grave error.
> >
> >
> >
> >
> And you would cripple the 99% usage to aid those users who move disk
> drives physically over to a sparc box AND have more than 31k links to a
> file?

31k links to a file isn't really an issue, I really doubt anyone out
there is doing something like that.

31k links on a directory is a bigger problem, since each subdir is a
link. The good news is that reiserfs already works around this by
setting the link count to 1 and doing other checks to make sure a
directory really is empty.

My point isn't that we should not change the link max, it is that
changing the link max is not sufficient. Portability to sparc doesn't
matter one bit if it means breaking existing i386 users. Our disk
format has link counts > 32k, so any reiserfs fixes for this need to
expect those larger values on disk and play nicely with them.

-chris


2002-09-05 05:39:36

by David Miller

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

From: Tomas Szepe <[email protected]>
Date: Thu, 5 Sep 2002 07:40:08 +0200

> Our disk format has link counts > 32k

Does the internal reiserfs nlink value translate directly
to what stat() puts in st_nlink?

It really doesn't matter. Even if you have some huge value that can't
be represented in st_nlink, you can report to the user that st_nlink
is NLINK_MAX.

This is one possible solution to this whole problem.

2002-09-05 05:35:53

by Tomas Szepe

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

> Our disk format has link counts > 32k

Does the internal reiserfs nlink value translate directly
to what stat() puts in st_nlink?

2002-09-05 05:44:48

by Tomas Szepe

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

> From: Tomas Szepe <[email protected]>
> Date: Thu, 5 Sep 2002 07:40:08 +0200
>
> > Our disk format has link counts > 32k
>
> Does the internal reiserfs nlink value translate directly
> to what stat() puts in st_nlink?
>
> It really doesn't matter. Even if you have some huge value that can't
> be represented in st_nlink, you can report to the user that st_nlink
> is NLINK_MAX.
>
> This is one possible solution to this whole problem.

And a pretty straightforward one, too. Convert the internal reiserfs
link stuff to an unsigned short, find NLINK_MAX using the code I posted
last night (or maybe simply grab it from userspace includes) and add
a check to your stat() code to return NLINK_MAX if necessary.

T.

2002-09-05 05:48:17

by David Miller

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

From: Tomas Szepe <[email protected]>
Date: Thu, 5 Sep 2002 07:48:58 +0200

And a pretty straightforward one, too. Convert the internal reiserfs
link stuff to an unsigned short, find NLINK_MAX using the code I posted
last night (or maybe simply grab it from userspace includes) and add
a check to your stat() code to return NLINK_MAX if necessary.

Whose stat() code? These go straight to userspace via normal
syscalls.

It has to be done generically, in the VFS or reiserfs.

2002-09-05 05:52:04

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Thu, Sep 05, 2002 at 07:48:58AM +0200, Tomas Szepe wrote:
> > Does the internal reiserfs nlink value translate directly
> > to what stat() puts in st_nlink?
> > It really doesn't matter. Even if you have some huge value that can't
> > be represented in st_nlink, you can report to the user that st_nlink
> > is NLINK_MAX.
> > This is one possible solution to this whole problem.
> And a pretty straightforward one, too. Convert the internal reiserfs
> link stuff to an unsigned short, find NLINK_MAX using the code I posted

Too bad it is 32bit nlink field in on disk format ;)

> last night (or maybe simply grab it from userspace includes) and add
> a check to your stat() code to return NLINK_MAX if necessary.

Ok, I think I will rework it to something sensible, because current code is
somewhat a mess and corrupt correct nlink value on overflows. Hm.

Bye,
Oleg

2002-09-05 05:55:23

by Tomas Szepe

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

> > > Does the internal reiserfs nlink value translate directly
> > > to what stat() puts in st_nlink?
> > > It really doesn't matter. Even if you have some huge value that can't
> > > be represented in st_nlink, you can report to the user that st_nlink
> > > is NLINK_MAX.
> > > This is one possible solution to this whole problem.
> > And a pretty straightforward one, too. Convert the internal reiserfs
> > link stuff to an unsigned short, find NLINK_MAX using the code I posted
>
> Too bad it is 32bit nlink field in on disk format ;)

Oh! What a confession. :)

> > last night (or maybe simply grab it from userspace includes) and add
> > a check to your stat() code to return NLINK_MAX if necessary.
>
> Ok, I think I will rework it to something sensible, because current code is
> somewhat a mess and corrupt correct nlink value on overflows. Hm.

Good.

2002-09-05 05:55:06

by David Miller

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

From: Oleg Drokin <[email protected]>
Date: Thu, 5 Sep 2002 09:56:38 +0400

On Thu, Sep 05, 2002 at 07:48:58AM +0200, Tomas Szepe wrote:
> > Does the internal reiserfs nlink value translate directly
> > to what stat() puts in st_nlink?
> > It really doesn't matter. Even if you have some huge value that can't
> > be represented in st_nlink, you can report to the user that st_nlink
> > is NLINK_MAX.
> > This is one possible solution to this whole problem.
>
> And a pretty straightforward one, too. Convert the internal reiserfs
> link stuff to an unsigned short, find NLINK_MAX using the code I posted

Too bad it is 32bit nlink field in on disk format ;)

We're only talking about what the user is told is the
nlink value, not what you happen to compute and put/get
from disk.

Your nlink can be legitimately 128-bits if you want, it
still can be made to work :-)

> last night (or maybe simply grab it from userspace includes) and add
> a check to your stat() code to return NLINK_MAX if necessary.

Ok, I think I will rework it to something sensible, because current code is
somewhat a mess and corrupt correct nlink value on overflows. Hm.

Ok.

Franks a lot,
David S. Miller
[email protected]

2002-09-05 06:03:16

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Wed, Sep 04, 2002 at 10:52:34PM -0700, David S. Miller wrote:

> > And a pretty straightforward one, too. Convert the internal reiserfs
> > link stuff to an unsigned short, find NLINK_MAX using the code I posted
> Too bad it is 32bit nlink field in on disk format ;)
> We're only talking about what the user is told is the
> nlink value, not what you happen to compute and put/get
> from disk.

Yes, I just parsed that suggestion above incorrectly ;)

> Your nlink can be legitimately 128-bits if you want, it
> still can be made to work :-)

Yes, I know. That's what I plan to do ;)

Bye,
Oleg

2002-09-05 09:42:21

by Nikita Danilov

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

David S. Miller writes:
> From: Tomas Szepe <[email protected]>
> Date: Thu, 5 Sep 2002 07:48:58 +0200
>
> And a pretty straightforward one, too. Convert the internal reiserfs
> link stuff to an unsigned short, find NLINK_MAX using the code I posted
> last night (or maybe simply grab it from userspace includes) and add
> a check to your stat() code to return NLINK_MAX if necessary.
>
> Whose stat() code? These go straight to userspace via normal
> syscalls.

In 2.5 is goes through ->getattr() method.

>
> It has to be done generically, in the VFS or reiserfs.

Nikita.

2002-09-05 09:50:16

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Wed, Sep 04, 2002 at 10:36:51PM -0700, David S. Miller wrote:
> > Our disk format has link counts > 32k
> Does the internal reiserfs nlink value translate directly
> to what stat() puts in st_nlink?
> It really doesn't matter. Even if you have some huge value that can't
> be represented in st_nlink, you can report to the user that st_nlink
> is NLINK_MAX.

Ok, since I really like this approach, below is the patch (for 2.4) that
demonstrates my solution.
Also it correctly calculates maximal number given type may hold ( does not work
with unsigned long long, though) with my own way ;)

FS compatibility is still intact and older kernel will work.
If nobody will object against this code, then I'll implement something
similar for 2.5 too and we will fix reiserfsck to understand this correct
nlink accounting.
The only downside of this patch is that v3.5 filesystems won't allow to
create more than 64k-2 subdirs in a dir, like old code did, but this is
not an issue I think. (already existent dirs should still work)

Bye,
Oleg

===== fs/reiserfs/inode.c 1.35 vs edited =====
--- 1.35/fs/reiserfs/inode.c Fri Aug 9 19:22:33 2002
+++ edited/fs/reiserfs/inode.c Thu Sep 5 13:43:26 2002
@@ -891,7 +891,8 @@
set_inode_item_key_version (inode, KEY_FORMAT_3_5);
set_inode_sd_version (inode, STAT_DATA_V1);
inode->i_mode = sd_v1_mode(sd);
- inode->i_nlink = sd_v1_nlink(sd);
+ inode->i_nlink = ( sd_v1_nlink(sd) > REISERFS_LINK_MAX ) ? REISERFS_LINK_MAX:sd_v1_nlink(sd);
+ inode->u.reiserfs_i.i_nlink_real = sd_v1_nlink(sd);
inode->i_uid = sd_v1_uid(sd);
inode->i_gid = sd_v1_gid(sd);
inode->i_size = sd_v1_size(sd);
@@ -923,7 +924,8 @@
struct stat_data * sd = (struct stat_data *)B_I_PITEM (bh, ih);

inode->i_mode = sd_v2_mode(sd);
- inode->i_nlink = sd_v2_nlink(sd);
+ inode->i_nlink = (sd_v2_nlink(sd)>REISERFS_LINK_MAX)?REISERFS_LINK_MAX:sd_v2_nlink(sd);
+ inode->u.reiserfs_i.i_nlink_real = sd_v2_nlink(sd);
inode->i_uid = sd_v2_uid(sd);
inode->i_size = sd_v2_size(sd);
inode->i_gid = sd_v2_gid(sd);
@@ -975,7 +977,7 @@
__u16 flags;

set_sd_v2_mode(sd_v2, inode->i_mode );
- set_sd_v2_nlink(sd_v2, inode->i_nlink );
+ set_sd_v2_nlink(sd_v2, inode->u.reiserfs_i.i_nlink_real );
set_sd_v2_uid(sd_v2, inode->i_uid );
set_sd_v2_size(sd_v2, inode->i_size );
set_sd_v2_gid(sd_v2, inode->i_gid );
@@ -1001,7 +1003,7 @@
set_sd_v1_mode(sd_v1, inode->i_mode );
set_sd_v1_uid(sd_v1, inode->i_uid );
set_sd_v1_gid(sd_v1, inode->i_gid );
- set_sd_v1_nlink(sd_v1, inode->i_nlink );
+ set_sd_v1_nlink(sd_v1, inode->u.reiserfs_i.i_nlink_real );
set_sd_v1_size(sd_v1, inode->i_size );
set_sd_v1_atime(sd_v1, inode->i_atime );
set_sd_v1_ctime(sd_v1, inode->i_ctime );
@@ -1537,7 +1539,7 @@

/* fill stat data */
inode->i_mode = mode;
- inode->i_nlink = (S_ISDIR (mode) ? 2 : 1);
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = (S_ISDIR (mode) ? 2 : 1);
inode->i_uid = current->fsuid;
if (dir->i_mode & S_ISGID) {
inode->i_gid = dir->i_gid;
===== fs/reiserfs/namei.c 1.24 vs edited =====
--- 1.24/fs/reiserfs/namei.c Fri Aug 9 19:22:33 2002
+++ edited/fs/reiserfs/namei.c Thu Sep 5 13:34:43 2002
@@ -8,8 +8,16 @@
#include <linux/reiserfs_fs.h>
#include <linux/smp_lock.h>

-#define INC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) { i->i_nlink++; if (i->i_nlink >= REISERFS_LINK_MAX) i->i_nlink=1; }
-#define DEC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) i->i_nlink--;
+#define INC_INODE_NLINK(i) { if (i->i_nlink < REISERFS_LINK_MAX) i->i_nlink++; i->u.reiserfs_i.i_nlink_real++; }
+#define DEC_INODE_NLINK(i) {if ( --i->u.reiserfs_i.i_nlink_real < REISERFS_LINK_MAX) i->i_nlink--;}
+
+// v3.5 files have 16bit nlink_t, v3.6 files have 32bit nlink_t.
+#define CAN_INCREASE_NLINK(i) ( i->u.reiserfs_i.i_nlink_real < ((get_inode_item_key_version (inode) != KEY_FORMAT_3_5)?MAX_UL_INT:MAX_US_INT))
+
+// Compatibility stuff with old trick that allowed to have lots of subdirs in
+// one dir. Such dirs had 1 as their nlink count.
+#define INC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) INC_INODE_NLINK(i);}
+#define DEC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) DEC_INODE_NLINK(i);}

// directory item contains array of entry headers. This performs
// binary search through that array
@@ -518,7 +526,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
// FIXME: should we put iput here and have stat data deleted
@@ -569,7 +577,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
journal_end(&th, dir->i_sb, jbegin_count) ;
@@ -613,7 +621,7 @@
dentry, inode, &retval);
if (!inode) {
pop_journal_writer(windex) ;
- dir->i_nlink-- ;
+ DEC_DIR_INODE_NLINK(dir);
journal_end(&th, dir->i_sb, jbegin_count) ;
return retval;
}
@@ -627,7 +635,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink = 0;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 0;
DEC_DIR_INODE_NLINK(dir);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
@@ -711,7 +719,7 @@
if ( inode->i_nlink != 2 && inode->i_nlink != 1 )
printk ("reiserfs_rmdir: empty directory has nlink != 2 (%d)\n", inode->i_nlink);

- inode->i_nlink = 0;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 0;
inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
reiserfs_update_sd (&th, inode);

@@ -780,14 +788,14 @@
if (!inode->i_nlink) {
printk("reiserfs_unlink: deleting nonexistent file (%s:%lu), %d\n",
kdevname(inode->i_dev), inode->i_ino, inode->i_nlink);
- inode->i_nlink = 1;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 1;
}

retval = reiserfs_cut_from_item (&th, &path, &(de.de_entry_key), dir, NULL, 0);
if (retval < 0)
goto end_unlink;

- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
inode->i_ctime = CURRENT_TIME;
reiserfs_update_sd (&th, inode);

@@ -868,7 +876,7 @@
retval = reiserfs_add_entry (&th, parent_dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
journal_end(&th, parent_dir->i_sb, jbegin_count) ;
@@ -896,7 +904,7 @@
if (S_ISDIR(inode->i_mode))
return -EPERM;

- if (inode->i_nlink >= REISERFS_LINK_MAX) {
+ if (CAN_INCREASE_NLINK(inode)) {
//FIXME: sd_nlink is 32 bit for new files
return -EMLINK;
}
@@ -917,7 +925,7 @@
return retval;
}

- inode->i_nlink++;
+ INC_INODE_NLINK(inode);
ctime = CURRENT_TIME;
inode->i_ctime = ctime;
reiserfs_update_sd (&th, inode);
@@ -1157,9 +1165,9 @@
if (new_dentry_inode) {
// adjust link number of the victim
if (S_ISDIR(new_dentry_inode->i_mode)) {
- new_dentry_inode->i_nlink = 0;
+ new_dentry_inode->u.reiserfs_i.i_nlink_real = new_dentry_inode->i_nlink = 0;
} else {
- new_dentry_inode->i_nlink--;
+ DEC_INODE_NLINK(new_dentry_inode);
}
ctime = CURRENT_TIME;
new_dentry_inode->i_ctime = ctime;
===== include/linux/reiserfs_fs.h 1.22 vs edited =====
--- 1.22/include/linux/reiserfs_fs.h Tue Aug 20 13:40:53 2002
+++ edited/include/linux/reiserfs_fs.h Thu Sep 5 13:42:08 2002
@@ -1163,7 +1163,6 @@
#define INODE_PKEY(inode) ((struct key *)((inode)->u.reiserfs_i.i_key))

#define MAX_UL_INT 0xffffffff
-#define MAX_INT 0x7ffffff
#define MAX_US_INT 0xffff

// reiserfs version 2 has max offset 60 bits. Version 1 - 32 bit offset
@@ -1186,8 +1185,9 @@
#define MAX_FC_NUM MAX_US_INT


-/* the purpose is to detect overflow of an unsigned short */
-#define REISERFS_LINK_MAX (MAX_US_INT - 1000)
+/* Find maximal number, that nlink_t can hold. GCC is able to calculate this
+ value at compile time, so do not worry about extra CPU overhead. */
+#define REISERFS_LINK_MAX ((((nlink_t) -1) > 0)?~0:((1u<<(sizeof(nlink_t)*8-1))-1))


/* The following defines are used in reiserfs_insert_item and reiserfs_append_item */
===== include/linux/reiserfs_fs_i.h 1.8 vs edited =====
--- 1.8/include/linux/reiserfs_fs_i.h Fri Aug 9 19:22:34 2002
+++ edited/include/linux/reiserfs_fs_i.h Thu Sep 5 13:41:01 2002
@@ -53,6 +53,8 @@
** flushed */
unsigned long i_trans_id ;
unsigned long i_trans_index ;
+ unsigned int i_nlink_real; /* We store real nlink number since field in
+ struct inode is too short for us */
};

#endif

2002-09-05 10:52:55

by David Miller

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

From: Oleg Drokin <[email protected]>
Date: Thu, 5 Sep 2002 13:54:42 +0400

Ok, since I really like this approach, below is the patch (for 2.4) that
demonstrates my solution.

I like it. Now we just need a JFS version, shouldn't bee too
hard :-)

Franks a lot,
David S. Miller
[email protected]

2002-09-05 13:52:45

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Thu, Sep 05, 2002 at 05:49:02PM +0400, Oleg Drokin wrote:

> > Ok, since I really like this approach, below is the patch (for 2.4) that
> > demonstrates my solution.
> > Also it correctly calculates maximal number given type may hold ( does not work
> > with unsigned long long, though) with my own way ;)
> Version that actually works is now here ;)

Heh, diffed against wrong tree :(
Corrected diff is here.

Also I trimmed CC list since I afraid Marcelo, DaveM adn jfs people are not very
interested in this reiserfs only patch.

Bye,
Oleg

===== fs/reiserfs/inode.c 1.35 vs edited =====
--- 1.35/fs/reiserfs/inode.c Fri Aug 9 19:22:33 2002
+++ edited/fs/reiserfs/inode.c Thu Sep 5 13:43:26 2002
@@ -891,7 +891,8 @@
set_inode_item_key_version (inode, KEY_FORMAT_3_5);
set_inode_sd_version (inode, STAT_DATA_V1);
inode->i_mode = sd_v1_mode(sd);
- inode->i_nlink = sd_v1_nlink(sd);
+ inode->i_nlink = ( sd_v1_nlink(sd) > REISERFS_LINK_MAX ) ? REISERFS_LINK_MAX:sd_v1_nlink(sd);
+ inode->u.reiserfs_i.i_nlink_real = sd_v1_nlink(sd);
inode->i_uid = sd_v1_uid(sd);
inode->i_gid = sd_v1_gid(sd);
inode->i_size = sd_v1_size(sd);
@@ -923,7 +924,8 @@
struct stat_data * sd = (struct stat_data *)B_I_PITEM (bh, ih);

inode->i_mode = sd_v2_mode(sd);
- inode->i_nlink = sd_v2_nlink(sd);
+ inode->i_nlink = (sd_v2_nlink(sd)>REISERFS_LINK_MAX)?REISERFS_LINK_MAX:sd_v2_nlink(sd);
+ inode->u.reiserfs_i.i_nlink_real = sd_v2_nlink(sd);
inode->i_uid = sd_v2_uid(sd);
inode->i_size = sd_v2_size(sd);
inode->i_gid = sd_v2_gid(sd);
@@ -975,7 +977,7 @@
__u16 flags;

set_sd_v2_mode(sd_v2, inode->i_mode );
- set_sd_v2_nlink(sd_v2, inode->i_nlink );
+ set_sd_v2_nlink(sd_v2, inode->u.reiserfs_i.i_nlink_real );
set_sd_v2_uid(sd_v2, inode->i_uid );
set_sd_v2_size(sd_v2, inode->i_size );
set_sd_v2_gid(sd_v2, inode->i_gid );
@@ -1001,7 +1003,7 @@
set_sd_v1_mode(sd_v1, inode->i_mode );
set_sd_v1_uid(sd_v1, inode->i_uid );
set_sd_v1_gid(sd_v1, inode->i_gid );
- set_sd_v1_nlink(sd_v1, inode->i_nlink );
+ set_sd_v1_nlink(sd_v1, inode->u.reiserfs_i.i_nlink_real );
set_sd_v1_size(sd_v1, inode->i_size );
set_sd_v1_atime(sd_v1, inode->i_atime );
set_sd_v1_ctime(sd_v1, inode->i_ctime );
@@ -1537,7 +1539,7 @@

/* fill stat data */
inode->i_mode = mode;
- inode->i_nlink = (S_ISDIR (mode) ? 2 : 1);
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = (S_ISDIR (mode) ? 2 : 1);
inode->i_uid = current->fsuid;
if (dir->i_mode & S_ISGID) {
inode->i_gid = dir->i_gid;
===== fs/reiserfs/namei.c 1.24 vs edited =====
--- 1.24/fs/reiserfs/namei.c Fri Aug 9 19:22:33 2002
+++ edited/fs/reiserfs/namei.c Thu Sep 5 16:33:04 2002
@@ -8,8 +8,16 @@
#include <linux/reiserfs_fs.h>
#include <linux/smp_lock.h>

-#define INC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) { i->i_nlink++; if (i->i_nlink >= REISERFS_LINK_MAX) i->i_nlink=1; }
-#define DEC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) i->i_nlink--;
+#define INC_INODE_NLINK(i) { if (i->i_nlink < REISERFS_LINK_MAX) i->i_nlink++; i->u.reiserfs_i.i_nlink_real++; }
+#define DEC_INODE_NLINK(i) {if ( --i->u.reiserfs_i.i_nlink_real < REISERFS_LINK_MAX) i->i_nlink--;}
+
+// v3.5 files have 16bit nlink_t, v3.6 files have 32bit nlink_t.
+#define CAN_INCREASE_NLINK(i) ( i->u.reiserfs_i.i_nlink_real < ((get_inode_sd_version (i) != KEY_FORMAT_3_5)?MAX_UL_INT:MAX_US_INT))
+
+// Compatibility stuff with old trick that allowed to have lots of subdirs in
+// one dir. Such dirs had 1 as their nlink count.
+#define INC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) INC_INODE_NLINK(i);}
+#define DEC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) DEC_INODE_NLINK(i);}

// directory item contains array of entry headers. This performs
// binary search through that array
@@ -518,7 +526,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
// FIXME: should we put iput here and have stat data deleted
@@ -569,7 +577,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
journal_end(&th, dir->i_sb, jbegin_count) ;
@@ -592,6 +600,9 @@
struct reiserfs_transaction_handle th ;
int jbegin_count = JOURNAL_PER_BALANCE_CNT * 3;

+ if ( !CAN_INCREASE_NLINK(dir) )
+ return -EMLINK;
+
if (!(inode = new_inode(dir->i_sb))) {
return -ENOMEM ;
}
@@ -613,7 +624,7 @@
dentry, inode, &retval);
if (!inode) {
pop_journal_writer(windex) ;
- dir->i_nlink-- ;
+ DEC_DIR_INODE_NLINK(dir);
journal_end(&th, dir->i_sb, jbegin_count) ;
return retval;
}
@@ -627,7 +638,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink = 0;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 0;
DEC_DIR_INODE_NLINK(dir);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
@@ -711,7 +722,7 @@
if ( inode->i_nlink != 2 && inode->i_nlink != 1 )
printk ("reiserfs_rmdir: empty directory has nlink != 2 (%d)\n", inode->i_nlink);

- inode->i_nlink = 0;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 0;
inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
reiserfs_update_sd (&th, inode);

@@ -780,14 +791,14 @@
if (!inode->i_nlink) {
printk("reiserfs_unlink: deleting nonexistent file (%s:%lu), %d\n",
kdevname(inode->i_dev), inode->i_ino, inode->i_nlink);
- inode->i_nlink = 1;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 1;
}

retval = reiserfs_cut_from_item (&th, &path, &(de.de_entry_key), dir, NULL, 0);
if (retval < 0)
goto end_unlink;

- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
inode->i_ctime = CURRENT_TIME;
reiserfs_update_sd (&th, inode);

@@ -868,7 +879,7 @@
retval = reiserfs_add_entry (&th, parent_dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
journal_end(&th, parent_dir->i_sb, jbegin_count) ;
@@ -896,8 +907,7 @@
if (S_ISDIR(inode->i_mode))
return -EPERM;

- if (inode->i_nlink >= REISERFS_LINK_MAX) {
- //FIXME: sd_nlink is 32 bit for new files
+ if (!CAN_INCREASE_NLINK(inode)) {
return -EMLINK;
}

@@ -917,7 +927,7 @@
return retval;
}

- inode->i_nlink++;
+ INC_INODE_NLINK(inode);
ctime = CURRENT_TIME;
inode->i_ctime = ctime;
reiserfs_update_sd (&th, inode);
@@ -1021,6 +1031,8 @@
// and that its new parent directory has not too many links
// already

+ if ( !CAN_INCREASE_NLINK(new_dir) )
+ return -EMLINK;
if (new_dentry_inode) {
if (!reiserfs_empty_dir(new_dentry_inode)) {
return -ENOTEMPTY;
@@ -1157,9 +1169,9 @@
if (new_dentry_inode) {
// adjust link number of the victim
if (S_ISDIR(new_dentry_inode->i_mode)) {
- new_dentry_inode->i_nlink = 0;
+ new_dentry_inode->u.reiserfs_i.i_nlink_real = new_dentry_inode->i_nlink = 0;
} else {
- new_dentry_inode->i_nlink--;
+ DEC_INODE_NLINK(new_dentry_inode);
}
ctime = CURRENT_TIME;
new_dentry_inode->i_ctime = ctime;
===== include/linux/reiserfs_fs.h 1.22 vs edited =====
--- 1.22/include/linux/reiserfs_fs.h Tue Aug 20 13:40:53 2002
+++ edited/include/linux/reiserfs_fs.h Thu Sep 5 17:52:50 2002
@@ -1163,7 +1163,6 @@
#define INODE_PKEY(inode) ((struct key *)((inode)->u.reiserfs_i.i_key))

#define MAX_UL_INT 0xffffffff
-#define MAX_INT 0x7ffffff
#define MAX_US_INT 0xffff

// reiserfs version 2 has max offset 60 bits. Version 1 - 32 bit offset
@@ -1186,8 +1185,9 @@
#define MAX_FC_NUM MAX_US_INT


-/* the purpose is to detect overflow of an unsigned short */
-#define REISERFS_LINK_MAX (MAX_US_INT - 1000)
+/* Find maximal number, that nlink_t can hold. GCC is able to calculate this
+ value at compile time, so do not worry about extra CPU overhead. */
+#define REISERFS_LINK_MAX (nlink_t)((((nlink_t) -1) > 0)?~0:((1u<<(sizeof(nlink_t)*8-1))-1))


/* The following defines are used in reiserfs_insert_item and reiserfs_append_item */
===== include/linux/reiserfs_fs_i.h 1.8 vs edited =====
--- 1.8/include/linux/reiserfs_fs_i.h Fri Aug 9 19:22:34 2002
+++ edited/include/linux/reiserfs_fs_i.h Thu Sep 5 13:41:01 2002
@@ -53,6 +53,8 @@
** flushed */
unsigned long i_trans_id ;
unsigned long i_trans_index ;
+ unsigned int i_nlink_real; /* We store real nlink number since field in
+ struct inode is too short for us */
};

#endif

2002-09-05 13:45:00

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Thu, Sep 05, 2002 at 01:54:42PM +0400, Oleg Drokin wrote:

> Ok, since I really like this approach, below is the patch (for 2.4) that
> demonstrates my solution.
> Also it correctly calculates maximal number given type may hold ( does not work
> with unsigned long long, though) with my own way ;)

Version that actually works is now here ;)
Also I have added checks to reiserfs_mkdir and reiserfs_rename to not
overflow the counter. Still reiserfs only version of the patch.
Actually I think this very approach can be used for a lot of other filesystems
including ext2, where max nlink is defined to be 32000 only (I am not sure
how much space is there reserved on disk, though).

Chris, can you please take a look at it?

Bye,
Oleg
===== fs/reiserfs/inode.c 1.35 vs edited =====
--- 1.35/fs/reiserfs/inode.c Fri Aug 9 19:22:33 2002
+++ edited/fs/reiserfs/inode.c Thu Sep 5 13:43:26 2002
@@ -891,7 +891,8 @@
set_inode_item_key_version (inode, KEY_FORMAT_3_5);
set_inode_sd_version (inode, STAT_DATA_V1);
inode->i_mode = sd_v1_mode(sd);
- inode->i_nlink = sd_v1_nlink(sd);
+ inode->i_nlink = ( sd_v1_nlink(sd) > REISERFS_LINK_MAX ) ? REISERFS_LINK_MAX:sd_v1_nlink(sd);
+ inode->u.reiserfs_i.i_nlink_real = sd_v1_nlink(sd);
inode->i_uid = sd_v1_uid(sd);
inode->i_gid = sd_v1_gid(sd);
inode->i_size = sd_v1_size(sd);
@@ -923,7 +924,8 @@
struct stat_data * sd = (struct stat_data *)B_I_PITEM (bh, ih);

inode->i_mode = sd_v2_mode(sd);
- inode->i_nlink = sd_v2_nlink(sd);
+ inode->i_nlink = (sd_v2_nlink(sd)>REISERFS_LINK_MAX)?REISERFS_LINK_MAX:sd_v2_nlink(sd);
+ inode->u.reiserfs_i.i_nlink_real = sd_v2_nlink(sd);
inode->i_uid = sd_v2_uid(sd);
inode->i_size = sd_v2_size(sd);
inode->i_gid = sd_v2_gid(sd);
@@ -975,7 +977,7 @@
__u16 flags;

set_sd_v2_mode(sd_v2, inode->i_mode );
- set_sd_v2_nlink(sd_v2, inode->i_nlink );
+ set_sd_v2_nlink(sd_v2, inode->u.reiserfs_i.i_nlink_real );
set_sd_v2_uid(sd_v2, inode->i_uid );
set_sd_v2_size(sd_v2, inode->i_size );
set_sd_v2_gid(sd_v2, inode->i_gid );
@@ -1001,7 +1003,7 @@
set_sd_v1_mode(sd_v1, inode->i_mode );
set_sd_v1_uid(sd_v1, inode->i_uid );
set_sd_v1_gid(sd_v1, inode->i_gid );
- set_sd_v1_nlink(sd_v1, inode->i_nlink );
+ set_sd_v1_nlink(sd_v1, inode->u.reiserfs_i.i_nlink_real );
set_sd_v1_size(sd_v1, inode->i_size );
set_sd_v1_atime(sd_v1, inode->i_atime );
set_sd_v1_ctime(sd_v1, inode->i_ctime );
@@ -1537,7 +1539,7 @@

/* fill stat data */
inode->i_mode = mode;
- inode->i_nlink = (S_ISDIR (mode) ? 2 : 1);
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = (S_ISDIR (mode) ? 2 : 1);
inode->i_uid = current->fsuid;
if (dir->i_mode & S_ISGID) {
inode->i_gid = dir->i_gid;
===== fs/reiserfs/namei.c 1.24 vs edited =====
--- 1.24/fs/reiserfs/namei.c Fri Aug 9 19:22:33 2002
+++ edited/fs/reiserfs/namei.c Thu Sep 5 16:33:04 2002
@@ -8,8 +8,16 @@
#include <linux/reiserfs_fs.h>
#include <linux/smp_lock.h>

-#define INC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) { i->i_nlink++; if (i->i_nlink >= REISERFS_LINK_MAX) i->i_nlink=1; }
-#define DEC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) i->i_nlink--;
+#define INC_INODE_NLINK(i) { if (i->i_nlink < REISERFS_LINK_MAX) i->i_nlink++; i->u.reiserfs_i.i_nlink_real++; }
+#define DEC_INODE_NLINK(i) {if ( --i->u.reiserfs_i.i_nlink_real < REISERFS_LINK_MAX) i->i_nlink--;}
+
+// v3.5 files have 16bit nlink_t, v3.6 files have 32bit nlink_t.
+#define CAN_INCREASE_NLINK(i) ( i->u.reiserfs_i.i_nlink_real < ((get_inode_sd_version (i) != KEY_FORMAT_3_5)?MAX_UL_INT:MAX_US_INT))
+
+// Compatibility stuff with old trick that allowed to have lots of subdirs in
+// one dir. Such dirs had 1 as their nlink count.
+#define INC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) INC_INODE_NLINK(i);}
+#define DEC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) DEC_INODE_NLINK(i);}

// directory item contains array of entry headers. This performs
// binary search through that array
@@ -518,7 +526,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
// FIXME: should we put iput here and have stat data deleted
@@ -569,7 +577,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
journal_end(&th, dir->i_sb, jbegin_count) ;
@@ -592,6 +600,9 @@
struct reiserfs_transaction_handle th ;
int jbegin_count = JOURNAL_PER_BALANCE_CNT * 3;

+ if ( !CAN_INCREASE_NLINK(dir) )
+ return -EMLINK;
+
if (!(inode = new_inode(dir->i_sb))) {
return -ENOMEM ;
}
@@ -613,7 +624,7 @@
dentry, inode, &retval);
if (!inode) {
pop_journal_writer(windex) ;
- dir->i_nlink-- ;
+ DEC_DIR_INODE_NLINK(dir);
journal_end(&th, dir->i_sb, jbegin_count) ;
return retval;
}
@@ -627,7 +638,7 @@
retval = reiserfs_add_entry (&th, dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink = 0;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 0;
DEC_DIR_INODE_NLINK(dir);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
@@ -711,7 +722,7 @@
if ( inode->i_nlink != 2 && inode->i_nlink != 1 )
printk ("reiserfs_rmdir: empty directory has nlink != 2 (%d)\n", inode->i_nlink);

- inode->i_nlink = 0;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 0;
inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
reiserfs_update_sd (&th, inode);

@@ -780,14 +791,14 @@
if (!inode->i_nlink) {
printk("reiserfs_unlink: deleting nonexistent file (%s:%lu), %d\n",
kdevname(inode->i_dev), inode->i_ino, inode->i_nlink);
- inode->i_nlink = 1;
+ inode->u.reiserfs_i.i_nlink_real = inode->i_nlink = 1;
}

retval = reiserfs_cut_from_item (&th, &path, &(de.de_entry_key), dir, NULL, 0);
if (retval < 0)
goto end_unlink;

- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
inode->i_ctime = CURRENT_TIME;
reiserfs_update_sd (&th, inode);

@@ -868,7 +879,7 @@
retval = reiserfs_add_entry (&th, parent_dir, dentry->d_name.name, dentry->d_name.len,
inode, 1/*visible*/);
if (retval) {
- inode->i_nlink--;
+ DEC_INODE_NLINK(inode);
reiserfs_update_sd (&th, inode);
pop_journal_writer(windex) ;
journal_end(&th, parent_dir->i_sb, jbegin_count) ;
@@ -896,8 +907,7 @@
if (S_ISDIR(inode->i_mode))
return -EPERM;

- if (inode->i_nlink >= REISERFS_LINK_MAX) {
- //FIXME: sd_nlink is 32 bit for new files
+ if (!CAN_INCREASE_NLINK(inode)) {
return -EMLINK;
}

@@ -917,7 +927,7 @@
return retval;
}

- inode->i_nlink++;
+ INC_INODE_NLINK(inode);
ctime = CURRENT_TIME;
inode->i_ctime = ctime;
reiserfs_update_sd (&th, inode);
@@ -1021,6 +1031,8 @@
// and that its new parent directory has not too many links
// already

+ if ( !CAN_INCREASE_NLINK(new_dir) )
+ return -EMLINK;
if (new_dentry_inode) {
if (!reiserfs_empty_dir(new_dentry_inode)) {
return -ENOTEMPTY;
@@ -1157,9 +1169,9 @@
if (new_dentry_inode) {
// adjust link number of the victim
if (S_ISDIR(new_dentry_inode->i_mode)) {
- new_dentry_inode->i_nlink = 0;
+ new_dentry_inode->u.reiserfs_i.i_nlink_real = new_dentry_inode->i_nlink = 0;
} else {
- new_dentry_inode->i_nlink--;
+ DEC_INODE_NLINK(new_dentry_inode);
}
ctime = CURRENT_TIME;
new_dentry_inode->i_ctime = ctime;
===== include/linux/reiserfs_fs.h 1.22 vs edited =====
--- 1.22/include/linux/reiserfs_fs.h Tue Aug 20 13:40:53 2002
+++ edited/include/linux/reiserfs_fs.h Thu Sep 5 13:42:08 2002
@@ -1163,7 +1163,6 @@
#define INODE_PKEY(inode) ((struct key *)((inode)->u.reiserfs_i.i_key))

#define MAX_UL_INT 0xffffffff
-#define MAX_INT 0x7ffffff
#define MAX_US_INT 0xffff

// reiserfs version 2 has max offset 60 bits. Version 1 - 32 bit offset
@@ -1186,8 +1185,9 @@
#define MAX_FC_NUM MAX_US_INT


-/* the purpose is to detect overflow of an unsigned short */
-#define REISERFS_LINK_MAX (MAX_US_INT - 1000)
+/* Find maximal number, that nlink_t can hold. GCC is able to calculate this
+ value at compile time, so do not worry about extra CPU overhead. */
+#define REISERFS_LINK_MAX ((((nlink_t) -1) > 0)?~0:((1u<<(sizeof(nlink_t)*8-1))-1))


/* The following defines are used in reiserfs_insert_item and reiserfs_append_item */
===== include/linux/reiserfs_fs_i.h 1.8 vs edited =====
--- 1.8/include/linux/reiserfs_fs_i.h Fri Aug 9 19:22:34 2002
+++ edited/include/linux/reiserfs_fs_i.h Thu Sep 5 13:41:01 2002
@@ -53,6 +53,8 @@
** flushed */
unsigned long i_trans_id ;
unsigned long i_trans_index ;
+ unsigned int i_nlink_real; /* We store real nlink number since field in
+ struct inode is too short for us */
};

#endif

2002-09-05 13:57:38

by Chris Mason

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Thu, 2002-09-05 at 09:49, Oleg Drokin wrote:
> Hello!
>
> On Thu, Sep 05, 2002 at 01:54:42PM +0400, Oleg Drokin wrote:
>
> > Ok, since I really like this approach, below is the patch (for 2.4) that
> > demonstrates my solution.
> > Also it correctly calculates maximal number given type may hold ( does not work
> > with unsigned long long, though) with my own way ;)
>
> Version that actually works is now here ;)
> Also I have added checks to reiserfs_mkdir and reiserfs_rename to not
> overflow the counter. Still reiserfs only version of the patch.
> Actually I think this very approach can be used for a lot of other filesystems
> including ext2, where max nlink is defined to be 32000 only (I am not sure
> how much space is there reserved on disk, though).
>
> Chris, can you please take a look at it?

read the -noleaf description on the find man page to see why we need to
set the directory link count to 1 when we are lying to userspace about
the actual link count on directories.

find isn't the only program that makes this assumption (it's just the
only one I can think of ;-)

Other than that the patch (the second one diffed against the correct
tree) looks sane.

-chris


2002-09-05 14:12:46

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Thu, Sep 05, 2002 at 10:03:44AM -0400, Chris Mason wrote:

> read the -noleaf description on the find man page to see why we need to
> set the directory link count to 1 when we are lying to userspace about
> the actual link count on directories.

There is nothing about nlink == 1 means assume -noleaf, so it should not work
with old way too, right? Have anybody verified? ;)

Actually patch might be easily modified to represent i_nlink == 1 for
large directories, but still maintain correct on-disk nlink count.
(and show maximal possible nlink count for regular files. Hm,
I wonder if tar and stuff would break if met with file that have
67000 hardlinks ;) ).

> find isn't the only program that makes this assumption (it's just the
> only one I can think of ;-)

All of these programs are broken if there is no way to turn this feature off
then. ;)

> Other than that the patch (the second one diffed against the correct
> tree) looks sane.

Good.

Bye,
Oleg

2002-09-05 16:04:41

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Thursday 05 September 2002 04:54, Oleg Drokin wrote:

> +/* Find maximal number, that nlink_t can hold. GCC is able to
> calculate this + value at compile time, so do not worry about extra
> CPU overhead. */ +#define REISERFS_LINK_MAX ((((nlink_t) -1) >> 0)?~0:((1u<<(sizeof(nlink_t)*8-1))-1))

Shouldn't this be:

#define REISERFS_LINK_MAX ((((nlink_t) -1) >> 0)?(nlink_t) ~0:((1u<<(sizeof(nlink_t)*8-1))-1))

if nlink_t is u16, ~0 would still be 0xffffffff (assuming 32 bits)
--
David Kleikamp
IBM Linux Technology Center

2002-09-05 16:09:07

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

I am sorry but yor mailer corrupted the message or something like that
so I cannot understand what do you mean.

Ah, I see now. My latest version of a patch does a cast
this way (yes, I noticed that problem).
#define REISERFS_LINK_MAX (nlink_t)((((nlink_t) -1) > 0)?~0:((1u<<(sizeof(nlink_t)*8-1))-1))

Bye,
Oleg
On Thu, Sep 05, 2002 at 11:09:12AM -0500, Dave Kleikamp wrote:
> On Thursday 05 September 2002 04:54, Oleg Drokin wrote:
>
> > +/* Find maximal number, that nlink_t can hold. GCC is able to
> > calculate this + value at compile time, so do not worry about extra
> > CPU overhead. */ +#define REISERFS_LINK_MAX ((((nlink_t) -1) >> 0)?~0:((1u<<(sizeof(nlink_t)*8-1))-1))
>
> Shouldn't this be:
>
> #define REISERFS_LINK_MAX ((((nlink_t) -1) >> 0)?(nlink_t) ~0:((1u<<(sizeof(nlink_t)*8-1))-1))
>
> if nlink_t is u16, ~0 would still be 0xffffffff (assuming 32 bits)
> --
> David Kleikamp
> IBM Linux Technology Center
>

2002-09-05 16:39:20

by Chris Mason

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Thu, 2002-09-05 at 10:17, Oleg Drokin wrote:
> Hello!
>
> On Thu, Sep 05, 2002 at 10:03:44AM -0400, Chris Mason wrote:
>
> > read the -noleaf description on the find man page to see why we need to
> > set the directory link count to 1 when we are lying to userspace about
> > the actual link count on directories.
>
> There is nothing about nlink == 1 means assume -noleaf, so it should not work
> with old way too, right? Have anybody verified? ;)

I remember that happening during the initial discussions for the link
patch. 1 was chosen as the best way to do it, since it was a flag to
various programs that the unix directory link convention was not being
followed.

>
> Actually patch might be easily modified to represent i_nlink == 1 for
> large directories, but still maintain correct on-disk nlink count.

Right.

> (and show maximal possible nlink count for regular files. Hm,
> I wonder if tar and stuff would break if met with file that have
> 67000 hardlinks ;) ).

Certainly seems like it would on sparc at least ;-)

-chris


2002-09-05 17:21:14

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Thu, Sep 05, 2002 at 12:45:34PM -0400, Chris Mason wrote:
> > > read the -noleaf description on the find man page to see why we need to
> > > set the directory link count to 1 when we are lying to userspace about
> > > the actual link count on directories.
> >
> > There is nothing about nlink == 1 means assume -noleaf, so it should not work
> > with old way too, right? Have anybody verified? ;)
> I remember that happening during the initial discussions for the link
> patch. 1 was chosen as the best way to do it, since it was a flag to
> various programs that the unix directory link convention was not being
> followed.
> > Actually patch might be easily modified to represent i_nlink == 1 for
> > large directories, but still maintain correct on-disk nlink count.
> Right.

Ok, I will come up with revised patch tomorrow and do 2.5 port (and
advise Vitaly about reiserfsck part)

Bye,
Oleg

2002-09-05 17:53:57

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Here's the JFS patch. When I first saw this thread I didn't expect that
the result would be increasing the max. number of links. :^)

Note that I made JFS_LINK_MAX the maximum supported by JFS,
and VFS_LINK_MAX as the number limited by the size of nlink_t.
VFS_LINK_MAX could be moved to fs.h if other file systems are
to use it in the same way.

I borrowed reiserfs's *_INODE_NLINK macros, but made them inline functions.

(I don't usually send patches from kmail. I hope it doesn't screw up the formatting.)

===== fs/jfs/jfs_filsys.h 1.1 vs edited =====
--- 1.1/fs/jfs/jfs_filsys.h Fri May 31 08:19:24 2002
+++ edited/fs/jfs/jfs_filsys.h Thu Sep 5 11:01:55 2002
@@ -125,7 +125,13 @@
#define MAXBLOCKSIZE 4096
#define MAXFILESIZE ((s64)1 << 52)

-#define JFS_LINK_MAX 65535 /* nlink_t is unsigned short */
+/*
+ * The max link count in struct inode is limited to the size of nlink_t.
+ * The JFS inode uses an unsigned 32-bit int, so we can really go higher
+ */
+#define JFS_LINK_MAX 0xffffffff /* real limit */
+#define VFS_LINK_MAX ((((nlink_t) -1) > 0) ? (nlink_t) ~0 : \
+ ((1u << (sizeof (nlink_t) * 8 - 1)) -1))

/* Minimum number of bytes supported for a JFS partition */
#define MINJFS (0x1000000)
===== fs/jfs/jfs_imap.c 1.6 vs edited =====
--- 1.6/fs/jfs/jfs_imap.c Wed Sep 4 11:11:52 2002
+++ edited/fs/jfs/jfs_imap.c Thu Sep 5 10:52:53 2002
@@ -3035,7 +3035,15 @@
jfs_ip->mode2 = le32_to_cpu(dip->di_mode);

ip->i_mode = le32_to_cpu(dip->di_mode) & 0xffff;
- ip->i_nlink = le32_to_cpu(dip->di_nlink);
+
+ jfs_ip->nlink_real = le32_to_cpu(dip->di_nlink);
+ if (jfs_ip->nlink_real <= VFS_LINK_MAX)
+ ip->i_nlink = jfs_ip->nlink_real;
+ else if (S_ISDIR(ip->i_mode))
+ ip->i_nlink = 1;
+ else
+ ip->i_nlink = VFS_LINK_MAX;
+
ip->i_uid = le32_to_cpu(dip->di_uid);
ip->i_gid = le32_to_cpu(dip->di_gid);
ip->i_size = le64_to_cpu(dip->di_size);
@@ -3089,7 +3097,7 @@
dip->di_gen = cpu_to_le32(ip->i_generation);
dip->di_size = cpu_to_le64(ip->i_size);
dip->di_nblocks = cpu_to_le64(PBLK2LBLK(ip->i_sb, ip->i_blocks));
- dip->di_nlink = cpu_to_le32(ip->i_nlink);
+ dip->di_nlink = cpu_to_le32(jfs_ip->nlink_real);
dip->di_uid = cpu_to_le32(ip->i_uid);
dip->di_gid = cpu_to_le32(ip->i_gid);
/*
===== fs/jfs/jfs_incore.h 1.5 vs edited =====
--- 1.5/fs/jfs/jfs_incore.h Mon Sep 2 05:48:42 2002
+++ edited/fs/jfs/jfs_incore.h Thu Sep 5 10:12:09 2002
@@ -38,6 +38,7 @@
struct inode *inode; /* pointer back to fs-independent inode */
int fileset; /* fileset number (always 16)*/
uint mode2; /* jfs-specific mode */
+ uint nlink_real; /* i_nlink is too short */
pxd_t ixpxd; /* inode extent descriptor */
dxd_t acl; /* dxd describing acl */
dxd_t ea; /* dxd describing ea */
===== fs/jfs/namei.c 1.9 vs edited =====
--- 1.9/fs/jfs/namei.c Mon Aug 26 14:07:42 2002
+++ edited/fs/jfs/namei.c Thu Sep 5 11:11:57 2002
@@ -44,6 +44,34 @@
s64 commitZeroLink(tid_t, struct inode *);

/*
+ * i_nlink accounting
+ * Keep track of real link count, while keeping i_nlink, which may be too
+ * small to hold the real value, sane.
+ */
+static inline void INC_INODE_NLINK(struct inode *inode)
+{
+ if (++JFS_IP(inode)->nlink_real <= VFS_LINK_MAX)
+ inode->i_nlink = JFS_IP(inode)->nlink_real;
+}
+static inline void DEC_INODE_NLINK(struct inode *inode)
+{
+ if (--JFS_IP(inode)->nlink_real <= VFS_LINK_MAX)
+ inode->i_nlink = JFS_IP(inode)->nlink_real;
+}
+/*
+ * Due to an optimazation in the find command (and other cases?),
+ * set i_nlink to one if i_nlink can't be correct.
+ */
+static inline void INC_DIR_INODE_NLINK(struct inode *inode)
+{
+ if (++JFS_IP(inode)->nlink_real <= VFS_LINK_MAX)
+ inode->i_nlink = JFS_IP(inode)->nlink_real;
+ else
+ inode->i_nlink = 1;
+}
+#define DEC_DIR_INODE_NLINK DEC_INODE_NLINK
+
+/*
* NAME: jfs_create(dip, dentry, mode)
*
* FUNCTION: create a regular file in the parent directory <dip>
@@ -142,7 +170,7 @@
up(&JFS_IP(dip)->commit_sem);
up(&JFS_IP(ip)->commit_sem);
if (rc) {
- ip->i_nlink = 0;
+ ip->i_nlink = JFS_IP(ip)->nlink_real = 0;
iput(ip);
}

@@ -185,7 +213,7 @@
jFYI(1, ("jfs_mkdir: dip:0x%p name:%s\n", dip, dentry->d_name.name));

/* link count overflow on parent directory ? */
- if (dip->i_nlink == JFS_LINK_MAX) {
+ if (JFS_IP(dip)->nlink_real == JFS_LINK_MAX) {
rc = EMLINK;
goto out1;
}
@@ -245,7 +273,7 @@
goto out3;
}

- ip->i_nlink = 2; /* for '.' */
+ ip->i_nlink = JFS_IP(ip)->nlink_real = 2; /* for '.' */
ip->i_op = &jfs_dir_inode_operations;
ip->i_fop = &jfs_dir_operations;
ip->i_mapping->a_ops = &jfs_aops;
@@ -256,7 +284,7 @@
d_instantiate(dentry, ip);

/* update parent directory inode */
- dip->i_nlink++; /* for '..' from child directory */
+ INC_DIR_INODE_NLINK(dip); /* for '..' from child directory */
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
mark_inode_dirty(dip);

@@ -267,7 +295,7 @@
up(&JFS_IP(dip)->commit_sem);
up(&JFS_IP(ip)->commit_sem);
if (rc) {
- ip->i_nlink = 0;
+ ip->i_nlink = JFS_IP(ip)->nlink_real = 0;
iput(ip);
}

@@ -351,7 +379,7 @@
/* update parent directory's link count corresponding
* to ".." entry of the target directory deleted
*/
- dip->i_nlink--;
+ DEC_DIR_INODE_NLINK(dip);
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
mark_inode_dirty(dip);

@@ -373,7 +401,7 @@
JFS_IP(ip)->acl.flag = 0;

/* mark the target directory as deleted */
- ip->i_nlink = 0;
+ ip->i_nlink = JFS_IP(ip)->nlink_real = 0;
mark_inode_dirty(ip);

rc = txCommit(tid, 2, &iplist[0], 0);
@@ -470,7 +498,7 @@
mark_inode_dirty(dip);

/* update target's inode */
- ip->i_nlink--;
+ DEC_INODE_NLINK(ip);
mark_inode_dirty(ip);

/*
@@ -768,7 +796,7 @@
down(&JFS_IP(dir)->commit_sem);
down(&JFS_IP(ip)->commit_sem);

- if (ip->i_nlink == JFS_LINK_MAX) {
+ if (JFS_IP(ip)->nlink_real == JFS_LINK_MAX) {
rc = EMLINK;
goto out;
}
@@ -790,7 +818,7 @@
goto out;

/* update object inode */
- ip->i_nlink++; /* for new link */
+ INC_INODE_NLINK(ip);
ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(dir);
atomic_inc(&ip->i_count);
@@ -1004,7 +1032,7 @@
up(&JFS_IP(dip)->commit_sem);
up(&JFS_IP(ip)->commit_sem);
if (rc) {
- ip->i_nlink = 0;
+ ip->i_nlink = JFS_IP(ip)->nlink_real = 0;
iput(ip);
}

@@ -1091,7 +1119,7 @@
goto out3;
}
} else if ((new_dir != old_dir) &&
- (new_dir->i_nlink == JFS_LINK_MAX)) {
+ (JFS_IP(new_dir)->nlink_real == JFS_LINK_MAX)) {
rc = EMLINK;
goto out3;
}
@@ -1118,9 +1146,9 @@
old_ip->i_ino, JFS_RENAME);
if (rc)
goto out4;
- new_ip->i_nlink--;
+ DEC_INODE_NLINK(new_ip);
if (S_ISDIR(new_ip->i_mode)) {
- new_ip->i_nlink--;
+ DEC_DIR_INODE_NLINK(new_ip);
assert(new_ip->i_nlink == 0);
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
@@ -1162,7 +1190,7 @@
goto out4;
}
if (S_ISDIR(old_ip->i_mode))
- new_dir->i_nlink++;
+ INC_DIR_INODE_NLINK(new_dir);
}
/*
* Remove old directory entry
@@ -1178,7 +1206,7 @@
goto out4;
}
if (S_ISDIR(old_ip->i_mode)) {
- old_dir->i_nlink--;
+ DEC_DIR_INODE_NLINK(old_dir);
if (old_dir != new_dir) {
/*
* Change inode number of parent for moved directory
@@ -1351,7 +1379,7 @@
up(&JFS_IP(ip)->commit_sem);
up(&JFS_IP(dir)->commit_sem);
if (rc) {
- ip->i_nlink = 0;
+ ip->i_nlink = JFS_IP(ip)->nlink_real = 0;
iput(ip);
}



2002-09-05 21:14:19

by jw schultz

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Thu, Sep 05, 2002 at 09:25:45PM +0400, Oleg Drokin wrote:
> Hello!
>
> On Thu, Sep 05, 2002 at 12:45:34PM -0400, Chris Mason wrote:
> > > > read the -noleaf description on the find man page to see why we need to
> > > > set the directory link count to 1 when we are lying to userspace about
> > > > the actual link count on directories.
> > >
> > > There is nothing about nlink == 1 means assume -noleaf, so it should not work
> > > with old way too, right? Have anybody verified? ;)
> > I remember that happening during the initial discussions for the link
> > patch. 1 was chosen as the best way to do it, since it was a flag to
> > various programs that the unix directory link convention was not being
> > followed.
> > > Actually patch might be easily modified to represent i_nlink == 1 for
> > > large directories, but still maintain correct on-disk nlink count.
> > Right.

I'm assuming for the moment that the discussion here is
about what to report to stat(2) and not how to deal with
internal overflow (which should produce an error).

So what you are suggesting is that
<pseudocode>
if (i.i_nlink > ST_NLINK_MAX)
st.st_nlink = S_ISDIR(i.i_mode) ? 1 : ST_NLINK_MAX;
</pseudocode>

I don't know the rationale for st_nlink == 1 on directories
but it seems to me the thing to do is st_nlink == 0 on
overflow regardless of type. This simplifies the logic and
eliminates the use a funky special value that gets in the
way of supporting growth.

An st_nlink value of 0 could only occur if you were doing an
fstat() on a deleted file so i doubt much would break on
this odd case. Having said that no doubt someone will come
up with an example to prove me wrong :)

> Ok, I will come up with revised patch tomorrow and do 2.5 port (and
> advise Vitaly about reiserfsck part)

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-05 21:59:25

by Ragnar Kjørstad

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Thu, Sep 05, 2002 at 02:18:49PM -0700, jw schultz wrote:
> I'm assuming for the moment that the discussion here is
> about what to report to stat(2) and not how to deal with
> internal overflow (which should produce an error).
>
> So what you are suggesting is that
> <pseudocode>
> if (i.i_nlink > ST_NLINK_MAX)
> st.st_nlink = S_ISDIR(i.i_mode) ? 1 : ST_NLINK_MAX;
> </pseudocode>
>
> I don't know the rationale for st_nlink == 1 on directories
> but it seems to me the thing to do is st_nlink == 0 on
> overflow regardless of type. This simplifies the logic and
> eliminates the use a funky special value that gets in the
> way of supporting growth.

I believe nlink for directories and files are used differently,
and thus may have to be handled differently as well.

Amongs other things nlink on directories are used when traversing
directory-trees. If nlink=4 on a directory there must be 2
sub-directories, and you can stop looking once you've found the two.

By giving the incorrect nlink-number, applications using this
optimization will break.

Apperently some operatingsystems/filesystems (VMS?) report the special
value of nlink=1 when the information is not available, and some
applications use this information to automaticly disable the
optimization. This is why reiserfs has returned nlink=1 for directories
with more than MAX_REISERFS_NLINK subdirectories.

Now, I've just checked the source of GNU find (v4.1.7) and it does _not_
recognize nlink=1 as a special value. (It works as long as there are
less than 2^32 subdirectories though, because it is looking for -1
subdirectories and it wraps)


I'm not sure exactly what nlink is used for in userspace for regular
files, so I don't know what value should be used when the real number
doesn't fit in the interface.



(Of course new directories/hardlinks shouldn't be created at all once
the limit is exceeded, the above is only a workaround for people that
need it anyway :) )



--
Ragnar Kj?rstad

2002-09-05 22:54:13

by jw schultz

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Fri, Sep 06, 2002 at 12:02:49AM +0200, Ragnar Kj?rstad wrote:
> On Thu, Sep 05, 2002 at 02:18:49PM -0700, jw schultz wrote:
> > I'm assuming for the moment that the discussion here is
> > about what to report to stat(2) and not how to deal with
> > internal overflow (which should produce an error).
> >
> > So what you are suggesting is that
> > <pseudocode>
> > if (i.i_nlink > ST_NLINK_MAX)
> > st.st_nlink = S_ISDIR(i.i_mode) ? 1 : ST_NLINK_MAX;
> > </pseudocode>
> >
> > I don't know the rationale for st_nlink == 1 on directories
> > but it seems to me the thing to do is st_nlink == 0 on
> > overflow regardless of type. This simplifies the logic and
> > eliminates the use a funky special value that gets in the
> > way of supporting growth.
>
> I believe nlink for directories and files are used differently,
> and thus may have to be handled differently as well.
>
> Amongs other things nlink on directories are used when traversing
> directory-trees. If nlink=4 on a directory there must be 2
> sub-directories, and you can stop looking once you've found the two.
>
> By giving the incorrect nlink-number, applications using this
> optimization will break.
>
> Apperently some operatingsystems/filesystems (VMS?) report the special
> value of nlink=1 when the information is not available, and some
> applications use this information to automaticly disable the
> optimization. This is why reiserfs has returned nlink=1 for directories
> with more than MAX_REISERFS_NLINK subdirectories.
>
> Now, I've just checked the source of GNU find (v4.1.7) and it does _not_
> recognize nlink=1 as a special value. (It works as long as there are
> less than 2^32 subdirectories though, because it is looking for -1
> subdirectories and it wraps)

So a value of 0 would have the same effect.
(0 - 2 == -2 vs 1 - 2 == -1) Yes?

>
>
> I'm not sure exactly what nlink is used for in userspace for regular
> files, so I don't know what value should be used when the real number
> doesn't fit in the interface.

I know it is used for reporting purposes such as ls -l. It
would also used by archiving tools like cpio, tar and rsync
to identify files that may be linked so that not every file
must be checked against every previous file. A smart
archiving tool would track the link count and remove entries
that have all links found so any value that isn't recognized
as an overflow indicator would tend to break things. I see
the value of 0 as indicating "link count unsupported".

> (Of course new directories/hardlinks shouldn't be created at all once
> the limit is exceeded, the above is only a workaround for people that
> need it anyway :) )

It should not fail just because the type specified for
st_nlink has overflowed. It should fail only if the FS or
kernel internals overflow. Internally the kernel is moving
to an unsigned int. I looked it up a recently and each
filesystem has it's own idea of a maximum number of links
and most of them are not just arbitrary but even strange
such as 32000 or 0x7FFF - 1000.

The stat structure has already been left behind on this.
Eventually (3.0?) it should be updated to reflect the
changed internals. This update has to be delayed because it
will break binary compatability of oodles of code. I, at
least, don't think this is worth creating yet another
version of stat(2).

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-05 23:57:09

by Ragnar Kjørstad

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Thu, Sep 05, 2002 at 03:57:06PM -0700, jw schultz wrote:
> > Now, I've just checked the source of GNU find (v4.1.7) and it does _not_
> > recognize nlink=1 as a special value. (It works as long as there are
> > less than 2^32 subdirectories though, because it is looking for -1
> > subdirectories and it wraps)
>
> So a value of 0 would have the same effect.
> (0 - 2 == -2 vs 1 - 2 == -1) Yes?

Yes, it will. For GNU find.

But the reasoning for using nlink==1 is that that's how "all non-unix
filesystems" behaved, so applications out there could potentially check
for it.

> I know it is used for reporting purposes such as ls -l. It
> would also used by archiving tools like cpio, tar and rsync
> to identify files that may be linked so that not every file
> must be checked against every previous file. A smart
> archiving tool would track the link count and remove entries
> that have all links found so any value that isn't recognized
> as an overflow indicator would tend to break things. I see
> the value of 0 as indicating "link count unsupported".

Hmm, yes. Values of 1 or NLINK_MAX would definitively confuse such
applications. But then again, so would a value of 0 unless they know
it's meaning.



--
Ragnar Kj?rstad

2002-09-06 01:36:53

by jw schultz

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Fri, Sep 06, 2002 at 02:01:38AM +0200, Ragnar Kj?rstad wrote:
> On Thu, Sep 05, 2002 at 03:57:06PM -0700, jw schultz wrote:
> > > Now, I've just checked the source of GNU find (v4.1.7) and it does _not_
> > > recognize nlink=1 as a special value. (It works as long as there are
> > > less than 2^32 subdirectories though, because it is looking for -1
> > > subdirectories and it wraps)
> >
> > So a value of 0 would have the same effect.
> > (0 - 2 == -2 vs 1 - 2 == -1) Yes?
>
> Yes, it will. For GNU find.
>
> But the reasoning for using nlink==1 is that that's how "all non-unix
> filesystems" behaved, so applications out there could potentially check
> for it.

But we aren't talking about filesystems with different ideas
about directory linking. We are talking about directories
with an overflowed link-count.

> > I know it is used for reporting purposes such as ls -l. It
> > would also used by archiving tools like cpio, tar and rsync
> > to identify files that may be linked so that not every file
> > must be checked against every previous file. A smart
> > archiving tool would track the link count and remove entries
> > that have all links found so any value that isn't recognized
> > as an overflow indicator would tend to break things. I see
> > the value of 0 as indicating "link count unsupported".
>
> Hmm, yes. Values of 1 or NLINK_MAX would definitively confuse such
> applications. But then again, so would a value of 0 unless they know
> it's meaning.

The value of 1 can't be used for regular files because it
would mean the file doesn't need checking for other links.
Coding for NLINK_MAX would mean the apps would have to
adjust every time NLINK_MAX changed. Yes, it could be done
through #define in stat.h. It is a corner case right now
but these apps could know that

1 == no other links
>1 == known number of other links
0 == unknown number of other links

And the code would work on other systems with different or
no overflow/limit. So far as i know Linux is the only OS
that is allowing the creation of more links than 32767.
These apps are going to need to cope with this sooner or
later. Allowing nlinks to overflow st_nlink is going to
break them. Using 0 would mean that the application code to
handle this corner case would simply never get invoked on
other platforms without requiring build options. Sure 1 is
a red flag for directories (if you can spot it) but any
value greater than 0 is potentially valid for files.

That is why i lean toward 0. It is a clear indication that
the value is meaningless. It also sticks out like a sore
thumb. If i do an ls -l and see 0 i will know i've got a
reporting overflow without having to remember what the exact
limit is. This way i can do a 'find -links 0' and spot
directories and files with more links than fit etc. The
application code to cope with it won't break or require
special handling on other platforms.

So you see why i dislike NLINK_MAX and it just makes sens to me
to use the same for files and directories. Filesystems that
don't support hardlinks have been stuffing 1 into st_nlink
for both file and directories (treating them identically).
I am essentially saying we should treat files and directories
the same on overflow. We just shouldn't use some value that
in future could represent reality. Hence 0. So far i
haven't heard any reason why not.

NOTE:
Allowing the creation of links that exceed the capacity of
stat(2) to report will break utilities. There should be a
system wide tunable that allows restricting link creation to
a maximum number until the tools catch up. This tunable
would only affect link creation causing link(2) to set errno
to EMLINK. A mount option might be OK too.
This limit should not affect pseudo filesystems like
proc or driverfs.

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-06 02:24:48

by Ragnar Kjørstad

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

On Thu, Sep 05, 2002 at 06:41:22PM -0700, jw schultz wrote:
> > > So a value of 0 would have the same effect.
> > > (0 - 2 == -2 vs 1 - 2 == -1) Yes?
> >
> > Yes, it will. For GNU find.
> >
> > But the reasoning for using nlink==1 is that that's how "all non-unix
> > filesystems" behaved, so applications out there could potentially check
> > for it.
>
> But we aren't talking about filesystems with different ideas
> about directory linking. We are talking about directories
> with an overflowed link-count.

No, but from an application's point of view they're the same. They're
both directories unable to report the number of subdirectories they
have. So, they should be handled the same. Either with nlink==1, or
nlink==0, but I think it will be difficult to migrate to the later if a
lot of existing software implement the first one.

> The value of 1 can't be used for regular files because it
> would mean the file doesn't need checking for other links.
> Coding for NLINK_MAX would mean the apps would have to
> adjust every time NLINK_MAX changed. Yes, it could be done
> through #define in stat.h. It is a corner case right now
> but these apps could know that
>
> 1 == no other links
> >1 == known number of other links
> 0 == unknown number of other links

I agree.



--
Ragnar Kj?rstad

2002-09-06 13:50:09

by Oleg Drokin

[permalink] [raw]
Subject: Re: [reiserfs-dev] Re: [PATCH] sparc32: wrong type of nlink_t

Hello!

On Thu, Sep 05, 2002 at 12:58:27PM -0500, Dave Kleikamp wrote:
> Here's the JFS patch. When I first saw this thread I didn't expect that
> the result would be increasing the max. number of links. :^)
> Note that I made JFS_LINK_MAX the maximum supported by JFS,
> and VFS_LINK_MAX as the number limited by the size of nlink_t.
> VFS_LINK_MAX could be moved to fs.h if other file systems are
> to use it in the same way.
> I borrowed reiserfs's *_INODE_NLINK macros, but made them inline functions.
> (I don't usually send patches from kmail. I hope it doesn't screw up the formatting.)

Actually Linus' suggestion to change type of struct inode.i_nlink field is
better because otherwise everybody will implement that "real nlink" stuff.
I hope Marcello will accept such a patch.
Here is it (both reiserfs and jfs bits).

Hm. jfs patch became really small ;)

Note I do not play this "if directory, then set nlink to 1" games.
As it was already explained it will only lead to find(1) and others still
count only (nlink_t) -1 links, so not all of the links will be counted, but
exactly the same amount as we present in st_nlink field on arches
that have unsigned nlink_t and probably none on those with signed nlink_t.

Patch is against 2.4.20-pre5.

Bye,
Oleg

===== fs/stat.c 1.4 vs edited =====
--- 1.4/fs/stat.c Tue Feb 5 10:45:18 2002
+++ edited/fs/stat.c Fri Sep 6 17:35:02 2002
@@ -49,7 +49,7 @@
tmp.st_dev = kdev_t_to_nr(inode->i_dev);
tmp.st_ino = inode->i_ino;
tmp.st_mode = inode->i_mode;
- tmp.st_nlink = inode->i_nlink;
+ tmp.st_nlink = min_t(unsigned int, MAX_NLINK_T, inode->i_nlink);
SET_OLDSTAT_UID(tmp, inode->i_uid);
SET_OLDSTAT_GID(tmp, inode->i_gid);
tmp.st_rdev = kdev_t_to_nr(inode->i_rdev);
@@ -75,7 +75,7 @@
tmp.st_dev = kdev_t_to_nr(inode->i_dev);
tmp.st_ino = inode->i_ino;
tmp.st_mode = inode->i_mode;
- tmp.st_nlink = inode->i_nlink;
+ tmp.st_nlink = min_t(unsigned int, MAX_NLINK_T, inode->i_nlink);
SET_STAT_UID(tmp, inode->i_uid);
SET_STAT_GID(tmp, inode->i_gid);
tmp.st_rdev = kdev_t_to_nr(inode->i_rdev);
@@ -282,7 +282,7 @@
tmp.__st_ino = inode->i_ino;
#endif
tmp.st_mode = inode->i_mode;
- tmp.st_nlink = inode->i_nlink;
+ tmp.st_nlink = min_t(unsigned int, MAX_NLINK_T, inode->i_nlink);
tmp.st_uid = inode->i_uid;
tmp.st_gid = inode->i_gid;
tmp.st_rdev = kdev_t_to_nr(inode->i_rdev);
===== fs/jfs/jfs_filsys.h 1.1 vs edited =====
--- 1.1/fs/jfs/jfs_filsys.h Fri May 31 17:19:24 2002
+++ edited/fs/jfs/jfs_filsys.h Fri Sep 6 17:33:32 2002
@@ -125,7 +125,11 @@
#define MAXBLOCKSIZE 4096
#define MAXFILESIZE ((s64)1 << 52)

-#define JFS_LINK_MAX 65535 /* nlink_t is unsigned short */
+/*
+ * The max link count in struct inode is limited to the size of nlink_t.
+ * The JFS inode uses an unsigned 32-bit int, so we can really go higher
+ */
+#define JFS_LINK_MAX 0xffffffff /* real limit */

/* Minimum number of bytes supported for a JFS partition */
#define MINJFS (0x1000000)
===== fs/reiserfs/namei.c 1.24 vs edited =====
--- 1.24/fs/reiserfs/namei.c Fri Aug 9 19:22:33 2002
+++ edited/fs/reiserfs/namei.c Fri Sep 6 17:23:34 2002
@@ -8,8 +8,13 @@
#include <linux/reiserfs_fs.h>
#include <linux/smp_lock.h>

-#define INC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) { i->i_nlink++; if (i->i_nlink >= REISERFS_LINK_MAX) i->i_nlink=1; }
-#define DEC_DIR_INODE_NLINK(i) if (i->i_nlink != 1) i->i_nlink--;
+// v3.5 files have 16bit nlink_t, v3.6 files have 32bit nlink_t.
+#define CAN_INCREASE_NLINK(i) ( i->i_nlink < ((get_inode_sd_version (i) != KEY_FORMAT_3_5)?MAX_UL_INT:MAX_US_INT))
+
+// Compatibility stuff with old trick that allowed to have lots of subdirs in
+// one dir. Such dirs had 1 as their nlink count.
+#define INC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) i->i_nlink++;}
+#define DEC_DIR_INODE_NLINK(i) { if (i->i_nlink != 1) i->i_nlink--;}

// directory item contains array of entry headers. This performs
// binary search through that array
@@ -592,6 +597,9 @@
struct reiserfs_transaction_handle th ;
int jbegin_count = JOURNAL_PER_BALANCE_CNT * 3;

+ if ( !CAN_INCREASE_NLINK(dir) )
+ return -EMLINK;
+
if (!(inode = new_inode(dir->i_sb))) {
return -ENOMEM ;
}
@@ -613,7 +621,7 @@
dentry, inode, &retval);
if (!inode) {
pop_journal_writer(windex) ;
- dir->i_nlink-- ;
+ DEC_DIR_INODE_NLINK(dir);
journal_end(&th, dir->i_sb, jbegin_count) ;
return retval;
}
@@ -896,8 +904,7 @@
if (S_ISDIR(inode->i_mode))
return -EPERM;

- if (inode->i_nlink >= REISERFS_LINK_MAX) {
- //FIXME: sd_nlink is 32 bit for new files
+ if (!CAN_INCREASE_NLINK(inode)) {
return -EMLINK;
}

@@ -1021,6 +1028,8 @@
// and that its new parent directory has not too many links
// already

+ if ( !CAN_INCREASE_NLINK(new_dir) )
+ return -EMLINK;
if (new_dentry_inode) {
if (!reiserfs_empty_dir(new_dentry_inode)) {
return -ENOTEMPTY;
===== include/linux/fs.h 1.68 vs edited =====
--- 1.68/include/linux/fs.h Fri Aug 23 17:27:33 2002
+++ edited/include/linux/fs.h Fri Sep 6 17:43:02 2002
@@ -442,7 +442,7 @@
atomic_t i_count;
kdev_t i_dev;
umode_t i_mode;
- nlink_t i_nlink;
+ unsigned int i_nlink;
uid_t i_uid;
gid_t i_gid;
kdev_t i_rdev;
@@ -513,6 +513,11 @@
void *generic_ip;
} u;
};
+
+/* maximal nlink_t value possible. Used insead of very high i_nlink values
+ that some filesystems might allow to prevent user visible negative
+ nlink counts. */
+#define MAX_NLINK_T (nlink_t)((((nlink_t) -1) > 0)?~0:((1u<<(sizeof(nlink_t)*8-1))-1))

struct fown_struct {
int pid; /* pid or -pgrp where SIGIO should be sent */
===== include/linux/reiserfs_fs.h 1.22 vs edited =====
--- 1.22/include/linux/reiserfs_fs.h Tue Aug 20 13:40:53 2002
+++ edited/include/linux/reiserfs_fs.h Fri Sep 6 17:44:53 2002
@@ -1163,7 +1163,6 @@
#define INODE_PKEY(inode) ((struct key *)((inode)->u.reiserfs_i.i_key))

#define MAX_UL_INT 0xffffffff
-#define MAX_INT 0x7ffffff
#define MAX_US_INT 0xffff

// reiserfs version 2 has max offset 60 bits. Version 1 - 32 bit offset
@@ -1184,10 +1183,6 @@

#define MAX_B_NUM MAX_UL_INT
#define MAX_FC_NUM MAX_US_INT
-
-
-/* the purpose is to detect overflow of an unsigned short */
-#define REISERFS_LINK_MAX (MAX_US_INT - 1000)


/* The following defines are used in reiserfs_insert_item and reiserfs_append_item */