2005-12-01 12:00:36

by Takashi Sato

[permalink] [raw]
Subject: stat64 for over 2TB file returned invalid st_blocks

Hi all,

I found a problem at stat64 on 32bit architecture.

When I called stat64 for a file which is larger than 2TB, stat64
returned an invalid number of blocks at st_blocks on 32bit
architecture, although it returned a valid number of blocks on 64bit
architecture(ia64).

The following describes the cause of this issue:
i_blocks in inode is 4bytes on 32bit architecture. If it receives
more than 2^32 number of blocks, it would overflow and set an
invalid number to st_blocks.

Below describes a sequence of setting overflowed inode.i_blocks
to st_blocks through stat64.

1. generic_fillattr(struct inode *inode, struct kstat *stat)
- Copy data from overflowed inode.i_blocks to kstat.blocks.

2. vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat)
- Return invalid kstat.blocks to sys_stat64().

3. sys_stat64(char __user * filename, struct stat64 __user * statbuf)
- Copy data from invalid kstat.blocks to stat64.st_blocks.

I also found the following problem.

- ioctl with FIOQSIZE command returns the size of file's data which
has written to disk. The size of file's data is calculated as
follows in inode_get_bytes().

(((loff_t)inode->i_blocks) << 9) + inode->i_bytes

On the file which is larger than 2TB, the ioctl will return an
invalid size because i_blocks can't express the right number of
blocks.

I think the following modification is essential to fix these
problems.

1. Change the type of inode.i_blocks and kstat.blocks from unsigned
long to unsigned long long.

2. Change the type of architecture dependent stat64.st_blocks in
include/asm/asm-*/stat.h from unsigned long to unsigned long long.
I tried modifying only stat64 of 32bit architecture
(include/asm-i386/stat.h).

I have some tested for a file whose size is 3TB on JFS filesystem.
The following is the patch.

Signed-off-by: Takashi Sato <[email protected]>

diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
g/include/asm-i386/stat.h linux-2.6.14-blocks/include/asm-i386/stat.h
--- linux-2.6.14.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.14-blocks/include/asm-i386/stat.h 2005-11-18 22:42:37.000000000 +0900
@@ -58,8 +58,7 @@ struct stat64 {
long long st_size;
unsigned long st_blksize;

- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
- unsigned long __pad4; /* future possible st_blocks high bits */
+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

unsigned long st_atime;
unsigned long st_atime_nsec;
diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
g/include/linux/fs.h linux-2.6.14-blocks/include/linux/fs.h
--- linux-2.6.14.org/include/linux/fs.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.14-blocks/include/linux/fs.h 2005-11-18 17:08:03.000000000 +0900
@@ -438,7 +438,7 @@ struct inode {
unsigned int i_blkbits;
unsigned long i_blksize;
unsigned long i_version;
- unsigned long i_blocks;
+ unsigned long long i_blocks;
unsigned short i_bytes;
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
struct semaphore i_sem;
diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
g/include/linux/stat.h linux-2.6.14-blocks/include/linux/stat.h
--- linux-2.6.14.org/include/linux/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.14-blocks/include/linux/stat.h 2005-11-18 17:08:56.000000000 +0900
@@ -69,7 +69,7 @@ struct kstat {
struct timespec mtime;
struct timespec ctime;
unsigned long blksize;
- unsigned long blocks;
+ unsigned long long blocks;
};

#endif

Any feedback and comments are welcome.

Best regards, Takashi Sato


2005-12-01 12:39:51

by Jörn Engel

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Thu, 1 December 2005 21:00:26 +0900, Takashi Sato wrote:
>
> I found a problem at stat64 on 32bit architecture.
>
> When I called stat64 for a file which is larger than 2TB, stat64
> returned an invalid number of blocks at st_blocks on 32bit
> architecture, although it returned a valid number of blocks on 64bit
> architecture(ia64).

My take was to simply hold a u64 in the fs-private inode structure and
use ULONG_MAX for inode->i_blocks in case of an overflow. Also has
the nice advantage of working with fs-sized blocks, not 512-byte ones:
inode->i_blocks = ULONG_MAX;
if (li->li_blocks<<3 < ULONG_MAX)
inode->i_blocks = li->li_blocks<<3;

That said, your solution appears to be much better, as long as it
doesnt subtly break binary compatibility.

J?rn

--
ticks = jiffies;
while (ticks == jiffies);
ticks = jiffies;
-- /usr/src/linux/init/main.c

2005-12-01 12:51:58

by Jörn Engel

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Thu, 1 December 2005 21:00:26 +0900, Takashi Sato wrote:
>
> diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
> g/include/asm-i386/stat.h linux-2.6.14-blocks/include/asm-i386/stat.h
> --- linux-2.6.14.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000
> +0900
> +++ linux-2.6.14-blocks/include/asm-i386/stat.h 2005-11-18
> 22:42:37.000000000 +0900
> @@ -58,8 +58,7 @@ struct stat64 {
> long long st_size;
> unsigned long st_blksize;
>
> - unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> - unsigned long __pad4; /* future possible st_blocks high bits */
> + unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

After a closer look: have you tested this on a big-endian machine as
well? This heavily smells like it will work one one endianness only.

J?rn

--
The only real mistake is the one from which we learn nothing.
-- John Powell

2005-12-01 13:53:13

by Avi Kivity

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

J?rn Engel wrote:

>On Thu, 1 December 2005 21:00:26 +0900, Takashi Sato wrote:
>
>
>>diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
>>g/include/asm-i386/stat.h linux-2.6.14-blocks/include/asm-i386/stat.h
>>--- linux-2.6.14.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000
>>+0900
>>+++ linux-2.6.14-blocks/include/asm-i386/stat.h 2005-11-18
>>22:42:37.000000000 +0900
>>@@ -58,8 +58,7 @@ struct stat64 {
>> long long st_size;
>> unsigned long st_blksize;
>>
>>- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
>>- unsigned long __pad4; /* future possible st_blocks high bits */
>>+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */
>>
>>
>
>After a closer look: have you tested this on a big-endian machine as
>well? This heavily smells like it will work one one endianness only.
>
>
>
It's in asm-i386, which is always little endian.

2005-12-01 14:32:27

by Dave Kleikamp

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Thu, 2005-12-01 at 21:00 +0900, Takashi Sato wrote:
> Hi all,
>
> I found a problem at stat64 on 32bit architecture.
>
> When I called stat64 for a file which is larger than 2TB, stat64
> returned an invalid number of blocks at st_blocks on 32bit
> architecture, although it returned a valid number of blocks on 64bit
> architecture(ia64).

For jfs, it's a bigger problem than just stat64. When writing the inode
to disk, jfs calculates the number of blocks from the 32-bit value:
dip->di_nblocks = cpu_to_le64(PBLK2LBLK(ip->i_sb, ip->i_blocks))

So it won't only report the wrong number of blocks, but it will actually
store the wrong number. :-(

> The following describes the cause of this issue:
> i_blocks in inode is 4bytes on 32bit architecture. If it receives
> more than 2^32 number of blocks, it would overflow and set an
> invalid number to st_blocks.
>
> Below describes a sequence of setting overflowed inode.i_blocks
> to st_blocks through stat64.
>
> 1. generic_fillattr(struct inode *inode, struct kstat *stat)
> - Copy data from overflowed inode.i_blocks to kstat.blocks.
>
> 2. vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
> struct kstat *stat)
> - Return invalid kstat.blocks to sys_stat64().
>
> 3. sys_stat64(char __user * filename, struct stat64 __user * statbuf)
> - Copy data from invalid kstat.blocks to stat64.st_blocks.
>
> I also found the following problem.
>
> - ioctl with FIOQSIZE command returns the size of file's data which
> has written to disk. The size of file's data is calculated as
> follows in inode_get_bytes().
>
> (((loff_t)inode->i_blocks) << 9) + inode->i_bytes
>
> On the file which is larger than 2TB, the ioctl will return an
> invalid size because i_blocks can't express the right number of
> blocks.
>
> I think the following modification is essential to fix these
> problems.
>
> 1. Change the type of inode.i_blocks and kstat.blocks from unsigned
> long to unsigned long long.

This would be okay.

> 2. Change the type of architecture dependent stat64.st_blocks in
> include/asm/asm-*/stat.h from unsigned long to unsigned long long.
> I tried modifying only stat64 of 32bit architecture
> (include/asm-i386/stat.h).

This changes the API, but the structure does suggest that the 4-byte pad
should be used for the high-order bytes of st_blocks, so that's not
really a problem. A correct fix would replace __pad4 with
st_blocks_high (or something like that) and ensure that the high-order
word was stored there. Your proposed fix would only be correct on
little-endian hardware, as J?rn pointed out.

> I have some tested for a file whose size is 3TB on JFS filesystem.
> The following is the patch.
>
> Signed-off-by: Takashi Sato <[email protected]>
>
> diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
> g/include/asm-i386/stat.h linux-2.6.14-blocks/include/asm-i386/stat.h
> --- linux-2.6.14.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.14-blocks/include/asm-i386/stat.h 2005-11-18 22:42:37.000000000 +0900
> @@ -58,8 +58,7 @@ struct stat64 {
> long long st_size;
> unsigned long st_blksize;
>
> - unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> - unsigned long __pad4; /* future possible st_blocks high bits */
> + unsigned long long st_blocks; /* Number 512-byte blocks allocated. */
>
> unsigned long st_atime;
> unsigned long st_atime_nsec;
> diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
> g/include/linux/fs.h linux-2.6.14-blocks/include/linux/fs.h
> --- linux-2.6.14.org/include/linux/fs.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.14-blocks/include/linux/fs.h 2005-11-18 17:08:03.000000000 +0900
> @@ -438,7 +438,7 @@ struct inode {
> unsigned int i_blkbits;
> unsigned long i_blksize;
> unsigned long i_version;
> - unsigned long i_blocks;
> + unsigned long long i_blocks;
> unsigned short i_bytes;
> spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
> struct semaphore i_sem;
> diff -uprN -X linux-2.6.14.org/Documentation/dontdiff linux-2.6.14.or
> g/include/linux/stat.h linux-2.6.14-blocks/include/linux/stat.h
> --- linux-2.6.14.org/include/linux/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.14-blocks/include/linux/stat.h 2005-11-18 17:08:56.000000000 +0900
> @@ -69,7 +69,7 @@ struct kstat {
> struct timespec mtime;
> struct timespec ctime;
> unsigned long blksize;
> - unsigned long blocks;
> + unsigned long long blocks;
> };
>
> #endif
>
> Any feedback and comments are welcome.
>
> Best regards, Takashi Sato
--
David Kleikamp
IBM Linux Technology Center

2005-12-01 14:53:09

by Al Viro

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Thu, Dec 01, 2005 at 09:00:26PM +0900, Takashi Sato wrote:
> 2. Change the type of architecture dependent stat64.st_blocks in
> include/asm/asm-*/stat.h from unsigned long to unsigned long long.
> I tried modifying only stat64 of 32bit architecture
> (include/asm-i386/stat.h).

... watch libc have kittens on big-endian architectures subjected
to that treatment.

2005-12-02 13:18:19

by Takashi Sato

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

Hi,

> > Hi all,
> >
> > I found a problem at stat64 on 32bit architecture.
> >
> > When I called stat64 for a file which is larger than 2TB, stat64
> > returned an invalid number of blocks at st_blocks on 32bit
> > architecture, although it returned a valid number of blocks on 64bit
> > architecture(ia64).
>
> For jfs, it's a bigger problem than just stat64. When writing the inode
> to disk, jfs calculates the number of blocks from the 32-bit value:
> dip->di_nblocks = cpu_to_le64(PBLK2LBLK(ip->i_sb, ip->i_blocks))
>
> So it won't only report the wrong number of blocks, but it will actually
> store the wrong number. :-(

I also found another problem on generic quota code. In
dquot_transfer(), the file usage is calculated from i_blocks via
inode_get_bytes(). If the file is over 2TB, the change of usage is
less than expected.

To solve this problem, I think inode.i_blocks should be 8 byte.

>> 2. Change the type of architecture dependent stat64.st_blocks in
>> include/asm/asm-*/stat.h from unsigned long to unsigned long long.
>> I tried modifying only stat64 of 32bit architecture
>> (include/asm-i386/stat.h).
>
>This changes the API, but the structure does suggest that the 4-byte pad
>should be used for the high-order bytes of st_blocks, so that's not
>really a problem. A correct fix would replace __pad4 with
>st_blocks_high (or something like that) and ensure that the high-order
>word was stored there. Your proposed fix would only be correct on
>little-endian hardware, as J?rn pointed out.

Thank you for your advice. I'll research for glibc and consider
how to implement.
By the way I think, as Avi Kivity said, it's always little-endian on i386,
is it correct?

-- Takashi Sato

2005-12-02 14:11:58

by Dave Kleikamp

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Fri, 2005-12-02 at 22:18 +0900, Takashi Sato wrote:
> >> 2. Change the type of architecture dependent stat64.st_blocks in
> >> include/asm/asm-*/stat.h from unsigned long to unsigned long long.
> >> I tried modifying only stat64 of 32bit architecture
> >> (include/asm-i386/stat.h).
> >
> >This changes the API, but the structure does suggest that the 4-byte pad
> >should be used for the high-order bytes of st_blocks, so that's not
> >really a problem. A correct fix would replace __pad4 with
> >st_blocks_high (or something like that) and ensure that the high-order
> >word was stored there. Your proposed fix would only be correct on
> >little-endian hardware, as J?rn pointed out.
>
> Thank you for your advice. I'll research for glibc and consider
> how to implement.
> By the way I think, as Avi Kivity said, it's always little-endian on i386,
> is it correct?

That's true. The patch does fix i386 without any bad side-effects. A
generic fix that would fix all architectures would be a little more
complicated, since the size of st_blocks varies. 32-bit big-endian
would have to explicitly copy the high and low words. (The first time I
looked at this, I ignored the fact that the change was in asm-i386.)

Shaggy
--
David Kleikamp
IBM Linux Technology Center

2005-12-02 18:58:33

by Andreas Dilger

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Dec 02, 2005 22:18 +0900, Takashi Sato wrote:
> I also found another problem on generic quota code. In
> dquot_transfer(), the file usage is calculated from i_blocks via
> inode_get_bytes(). If the file is over 2TB, the change of usage is
> less than expected.
>
> To solve this problem, I think inode.i_blocks should be 8 byte.

Actually, it should probably be "sector_t", because it isn't really
possible to have a file with more blocks than the size of the block
device. This avoids memory overhead for small systems that have no
need for it in a very highly-used struct. It may be for some network
filesystems that support gigantic non-sparse files they would need to
enable CONFIG_LBD in order to get support for this.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2005-12-03 02:17:08

by Bodo Eggert

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

Takashi Sato <[email protected]> wrote:

> 1. Change the type of inode.i_blocks and kstat.blocks from unsigned
> long?to?unsigned?long?long.

I think the u64 data type should be used. ??
--
Ich danke GMX daf?r, die Verwendung meiner Adressen mittels per SPF
verbreiteten L?gen zu sabotieren.

2005-12-03 13:01:06

by Takashi Sato

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

Hi,

> On Dec 02, 2005 22:18 +0900, Takashi Sato wrote:
>> I also found another problem on generic quota code. In
>> dquot_transfer(), the file usage is calculated from i_blocks via
>> inode_get_bytes(). If the file is over 2TB, the change of usage is
>> less than expected.
>>
>> To solve this problem, I think inode.i_blocks should be 8 byte.
>
> Actually, it should probably be "sector_t", because it isn't really
> possible to have a file with more blocks than the size of the block
> device. This avoids memory overhead for small systems that have no
> need for it in a very highly-used struct. It may be for some network
> filesystems that support gigantic non-sparse files they would need to
> enable CONFIG_LBD in order to get support for this.

I think sector_t is ok for local filesystem as you said. However,
on NFS, there may be over 2TB file on server side, and inode.i_blocks
for over 2TB file will become invalid on client side in case CONFIG_LBD
is disabled.

So, I think inode.i_blocks should be a type of 8 byte
(ex. unsigned long long). How does it look?

-- Takashi Sato

2005-12-05 08:11:22

by Andreas Dilger

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Dec 03, 2005 22:00 +0900, Takashi Sato wrote:
> Andreas Dilger wrote:
> >Actually, it should probably be "sector_t", because it isn't really
> >possible to have a file with more blocks than the size of the block
> >device. This avoids memory overhead for small systems that have no
> >need for it in a very highly-used struct. It may be for some network
> >filesystems that support gigantic non-sparse files they would need to
> >enable CONFIG_LBD in order to get support for this.
>
> I think sector_t is ok for local filesystem as you said. However,
> on NFS, there may be over 2TB file on server side, and inode.i_blocks
> for over 2TB file will become invalid on client side in case CONFIG_LBD
> is disabled.

I don't know the exact specs of NFS v2 and v3, but I doubt they can have
single files larger than 2TB. Even if they could then this is not a
very common situation and if someone is running in such an environment
then they can easily enable CONFIG_LBD (or make e.g. CONFIG_NFS_V4 have
a dependency to enable this if it is important enough). What I'd rather
avoid is needless growth of heavily-used structures for rather uncommon
cases (at the current time at least, this can be re-examined later).

It might also be possible to have a separate CONFIG_LSF (or whatever)
that enables support for large single files, maybe enabled by default
with CONFIG_LBD and also configurable separately for clients of network
filesystems with large single files. Someone who cares more about the
proliferation of configuration options than I can decide whether it
makes sense to keep these as separate options.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2005-12-05 12:35:42

by Takashi Sato

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

Hi,

> On Dec 03, 2005 22:00 +0900, Takashi Sato wrote:
>> Andreas Dilger wrote:
>> >Actually, it should probably be "sector_t", because it isn't really
>> >possible to have a file with more blocks than the size of the block
>> >device. This avoids memory overhead for small systems that have no
>> >need for it in a very highly-used struct. It may be for some network
>> >filesystems that support gigantic non-sparse files they would need to
>> >enable CONFIG_LBD in order to get support for this.
>>
>> I think sector_t is ok for local filesystem as you said. However,
>> on NFS, there may be over 2TB file on server side, and inode.i_blocks
>> for over 2TB file will become invalid on client side in case CONFIG_LBD
>> is disabled.
>
> I don't know the exact specs of NFS v2 and v3, but I doubt they can have
> single files larger than 2TB. Even if they could then this is not a

I believe NFS v3 can handle over 2TB file. :-)

> very common situation and if someone is running in such an environment
> then they can easily enable CONFIG_LBD (or make e.g. CONFIG_NFS_V4 have
> a dependency to enable this if it is important enough). What I'd rather
> avoid is needless growth of heavily-used structures for rather uncommon
> cases (at the current time at least, this can be re-examined later).

I agree above.

> It might also be possible to have a separate CONFIG_LSF (or whatever)
> that enables support for large single files, maybe enabled by default
> with CONFIG_LBD and also configurable separately for clients of network
> filesystems with large single files. Someone who cares more about the
> proliferation of configuration options than I can decide whether it
> makes sense to keep these as separate options.

I don't think CONFIG_LBD should determine whether large single
files is enabled. So, I agree your suggestion basically.
It would be nice if there was a suitable configuration option for it.
I will examine the implementation in detail.

-- Takashi Sato

2005-12-05 13:34:39

by Trond Myklebust

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Mon, 2005-12-05 at 01:11 -0700, Andreas Dilger wrote:
> I don't know the exact specs of NFS v2 and v3, but I doubt they can have
> single files larger than 2TB.

The NFSv3 protocol supports file lengths up to and including 2^64,
however on 32-bit Linux clients, we're limited by the page cache's
inability to actually address more than 16TB.

Cheers,
Trond

2005-12-06 12:42:48

by Takashi Sato

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

Hi,

> > >> 2. Change the type of architecture dependent stat64.st_blocks in
> > >> include/asm/asm-*/stat.h from unsigned long to unsigned long long.
> > >> I tried modifying only stat64 of 32bit architecture
> > >> (include/asm-i386/stat.h).
> > >
> > >This changes the API, but the structure does suggest that the 4-byte pad
> > >should be used for the high-order bytes of st_blocks, so that's not
> > >really a problem. A correct fix would replace __pad4 with
> > >st_blocks_high (or something like that) and ensure that the high-order
> > >word was stored there. Your proposed fix would only be correct on
> > >little-endian hardware, as J?rn pointed out.
> >
> > Thank you for your advice. I'll research for glibc and consider
> > how to implement.
> > By the way I think, as Avi Kivity said, it's always little-endian on i386,
> > is it correct?
>
> That's true. The patch does fix i386 without any bad side-effects. A
> generic fix that would fix all architectures would be a little more
> complicated, since the size of st_blocks varies. 32-bit big-endian
> would have to explicitly copy the high and low words. (The first time I
> looked at this, I ignored the fact that the change was in asm-i386.)

I realized some 32-bit big-endian architectures such as sh and m68k
have a padding before 32-bit st_blocks, though mips and ppc have
64-bit st_blocks.

- asm-sh
#if defined(__BIG_ENDIAN__)
unsigned long __pad4; /* Future possible st_blocks hi bits */
unsigned long st_blocks; /* Number 512-byte blocks allocated. */
#else /* Must be little */
unsigned long st_blocks; /* Number 512-byte blocks allocated. */
unsigned long __pad4; /* Future possible st_blocks hi bits */
#endif

- asm-m68k
unsigned long __pad4; /* future possible st_blocks high bits */
unsigned long st_blocks; /* Number 512-byte blocks allocated. */

So I updated the patch. Any feedback and comments are welcome.

Signed-off-by: Takashi Sato <[email protected]>
Signed-off-by: ASANO Masahiro <[email protected]>

diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-i386/stat.h
linux-2.6.15-rc5-blocks/include/asm-i386/stat.h
--- linux-2.6.15-rc5.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/asm-i386/stat.h 2005-12-06 16:24:31.000000000 +0900
@@ -58,8 +58,7 @@ struct stat64 {
long long st_size;
unsigned long st_blksize;

- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
- unsigned long __pad4; /* future possible st_blocks high bits */
+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

unsigned long st_atime;
unsigned long st_atime_nsec;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-m68k/stat.h
linux-2.6.15-rc5-blocks/include/asm-m68k/stat.h
--- linux-2.6.15-rc5.org/include/asm-m68k/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/asm-m68k/stat.h 2005-12-06 16:29:50.000000000 +0900
@@ -60,8 +60,7 @@ struct stat64 {
long long st_size;
unsigned long st_blksize;

- unsigned long __pad4; /* future possible st_blocks high bits */
- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

unsigned long st_atime;
unsigned long st_atime_nsec;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-sh/stat.h
linux-2.6.15-rc5-blocks/include/asm-sh/stat.h
--- linux-2.6.15-rc5.org/include/asm-sh/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/asm-sh/stat.h 2005-12-06 16:28:37.000000000 +0900
@@ -60,13 +60,7 @@ struct stat64 {
long long st_size;
unsigned long st_blksize;

-#if defined(__BIG_ENDIAN__)
- unsigned long __pad4; /* Future possible st_blocks hi bits */
- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
-#else /* Must be little */
- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
- unsigned long __pad4; /* Future possible st_blocks hi bits */
-#endif
+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

unsigned long st_atime;
unsigned long st_atime_nsec;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/linux/fs.h
linux-2.6.15-rc5-blocks/include/linux/fs.h
--- linux-2.6.15-rc5.org/include/linux/fs.h 2005-12-06 16:20:21.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/linux/fs.h 2005-12-06 16:26:01.000000000 +0900
@@ -450,7 +450,7 @@ struct inode {
unsigned int i_blkbits;
unsigned long i_blksize;
unsigned long i_version;
- unsigned long i_blocks;
+ unsigned long long i_blocks;
unsigned short i_bytes;
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
struct semaphore i_sem;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/linux/stat.h
linux-2.6.15-rc5-blocks/include/linux/stat.h
--- linux-2.6.15-rc5.org/include/linux/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/linux/stat.h 2005-12-06 16:26:26.000000000 +0900
@@ -69,7 +69,7 @@ struct kstat {
struct timespec mtime;
struct timespec ctime;
unsigned long blksize;
- unsigned long blocks;
+ unsigned long long blocks;
};

#endif

-- Takashi Sato


2005-12-06 14:30:38

by Dave Kleikamp

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Tue, 2005-12-06 at 21:42 +0900, Takashi Sato wrote:
> I realized some 32-bit big-endian architectures such as sh and m68k
> have a padding before 32-bit st_blocks, though mips and ppc have
> 64-bit st_blocks.
>
> - asm-sh
> #if defined(__BIG_ENDIAN__)
> unsigned long __pad4; /* Future possible st_blocks hi bits */
> unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> #else /* Must be little */
> unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> unsigned long __pad4; /* Future possible st_blocks hi bits */
> #endif
>
> - asm-m68k
> unsigned long __pad4; /* future possible st_blocks high bits */
> unsigned long st_blocks; /* Number 512-byte blocks allocated. */
>
> So I updated the patch. Any feedback and comments are welcome.

I think it looks good. The only issue I have is that I agree with
Andreas that i_blocks should be of type sector_t. I find the case of
accessing very large files over nfs with CONFIG_LBD disabled to be very
unlikely.

> Signed-off-by: Takashi Sato <[email protected]>
> Signed-off-by: ASANO Masahiro <[email protected]>
>
> diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-i386/stat.h
> linux-2.6.15-rc5-blocks/include/asm-i386/stat.h
> --- linux-2.6.15-rc5.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.15-rc5-blocks/include/asm-i386/stat.h 2005-12-06 16:24:31.000000000 +0900
> @@ -58,8 +58,7 @@ struct stat64 {
> long long st_size;
> unsigned long st_blksize;
>
> - unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> - unsigned long __pad4; /* future possible st_blocks high bits */
> + unsigned long long st_blocks; /* Number 512-byte blocks allocated. */
>
> unsigned long st_atime;
> unsigned long st_atime_nsec;
> diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-m68k/stat.h
> linux-2.6.15-rc5-blocks/include/asm-m68k/stat.h
> --- linux-2.6.15-rc5.org/include/asm-m68k/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.15-rc5-blocks/include/asm-m68k/stat.h 2005-12-06 16:29:50.000000000 +0900
> @@ -60,8 +60,7 @@ struct stat64 {
> long long st_size;
> unsigned long st_blksize;
>
> - unsigned long __pad4; /* future possible st_blocks high bits */
> - unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> + unsigned long long st_blocks; /* Number 512-byte blocks allocated. */
>
> unsigned long st_atime;
> unsigned long st_atime_nsec;
> diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-sh/stat.h
> linux-2.6.15-rc5-blocks/include/asm-sh/stat.h
> --- linux-2.6.15-rc5.org/include/asm-sh/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.15-rc5-blocks/include/asm-sh/stat.h 2005-12-06 16:28:37.000000000 +0900
> @@ -60,13 +60,7 @@ struct stat64 {
> long long st_size;
> unsigned long st_blksize;
>
> -#if defined(__BIG_ENDIAN__)
> - unsigned long __pad4; /* Future possible st_blocks hi bits */
> - unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> -#else /* Must be little */
> - unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> - unsigned long __pad4; /* Future possible st_blocks hi bits */
> -#endif
> + unsigned long long st_blocks; /* Number 512-byte blocks allocated. */
>
> unsigned long st_atime;
> unsigned long st_atime_nsec;
> diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/linux/fs.h
> linux-2.6.15-rc5-blocks/include/linux/fs.h
> --- linux-2.6.15-rc5.org/include/linux/fs.h 2005-12-06 16:20:21.000000000 +0900
> +++ linux-2.6.15-rc5-blocks/include/linux/fs.h 2005-12-06 16:26:01.000000000 +0900
> @@ -450,7 +450,7 @@ struct inode {
> unsigned int i_blkbits;
> unsigned long i_blksize;
> unsigned long i_version;
> - unsigned long i_blocks;
> + unsigned long long i_blocks;
> unsigned short i_bytes;
> spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
> struct semaphore i_sem;
> diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/linux/stat.h
> linux-2.6.15-rc5-blocks/include/linux/stat.h
> --- linux-2.6.15-rc5.org/include/linux/stat.h 2005-10-28 09:02:08.000000000 +0900
> +++ linux-2.6.15-rc5-blocks/include/linux/stat.h 2005-12-06 16:26:26.000000000 +0900
> @@ -69,7 +69,7 @@ struct kstat {
> struct timespec mtime;
> struct timespec ctime;
> unsigned long blksize;
> - unsigned long blocks;
> + unsigned long long blocks;
> };
>
> #endif
>
> -- Takashi Sato
--
David Kleikamp
IBM Linux Technology Center

2005-12-06 14:48:55

by Trond Myklebust

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Tue, 2005-12-06 at 08:30 -0600, Dave Kleikamp wrote:
> On Tue, 2005-12-06 at 21:42 +0900, Takashi Sato wrote:
> > I realized some 32-bit big-endian architectures such as sh and m68k
> > have a padding before 32-bit st_blocks, though mips and ppc have
> > 64-bit st_blocks.
> >
> > - asm-sh
> > #if defined(__BIG_ENDIAN__)
> > unsigned long __pad4; /* Future possible st_blocks hi bits */
> > unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> > #else /* Must be little */
> > unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> > unsigned long __pad4; /* Future possible st_blocks hi bits */
> > #endif
> >
> > - asm-m68k
> > unsigned long __pad4; /* future possible st_blocks high bits */
> > unsigned long st_blocks; /* Number 512-byte blocks allocated. */
> >
> > So I updated the patch. Any feedback and comments are welcome.
>
> I think it looks good. The only issue I have is that I agree with
> Andreas that i_blocks should be of type sector_t. I find the case of
> accessing very large files over nfs with CONFIG_LBD disabled to be very
> unlikely.

NO! sector_t is a block-device specific type. It does not belong in the
generic inode.

Cheers,
Trond

2005-12-06 14:51:58

by Dave Kleikamp

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Tue, 2005-12-06 at 09:48 -0500, Trond Myklebust wrote:
> On Tue, 2005-12-06 at 08:30 -0600, Dave Kleikamp wrote:
> > On Tue, 2005-12-06 at 21:42 +0900, Takashi Sato wrote:
> > > So I updated the patch. Any feedback and comments are welcome.
> >
> > I think it looks good. The only issue I have is that I agree with
> > Andreas that i_blocks should be of type sector_t. I find the case of
> > accessing very large files over nfs with CONFIG_LBD disabled to be very
> > unlikely.
>
> NO! sector_t is a block-device specific type. It does not belong in the
> generic inode.

OK. I withdraw my objection. The patch looks good to me.
--
David Kleikamp
IBM Linux Technology Center

2005-12-06 21:24:18

by Andreas Dilger

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Dec 06, 2005 09:48 -0500, Trond Myklebust wrote:
> On Tue, 2005-12-06 at 08:30 -0600, Dave Kleikamp wrote:
> > I think it looks good. The only issue I have is that I agree with
> > Andreas that i_blocks should be of type sector_t. I find the case of
> > accessing very large files over nfs with CONFIG_LBD disabled to be very
> > unlikely.
>
> NO! sector_t is a block-device specific type. It does not belong in the
> generic inode.

sector_t would imply "units of 512-byte sectors", and this is exactly
what i_blocks is actually measuring, so I don't really understand your
objection.

If you have objection to the use of sector_t, it could be some other type
that is defined virtually identically as CONFIG_LBD sector_t, except that
it might be desirable to allow it to be configured separately for network
filesystems that have large files. I'm sure the embedded linux folks
wouldn't be thrilled at an extra 4 bytes in every inode and 64-bit math
if they don't really use it.

Even in HPC very few users have many-TB files, and Lustre is one of the few
filesystems that can actually do this today. We of course would enable
this option for our kernels, but I don't want to force it upon everyone.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2005-12-07 01:00:08

by Trond Myklebust

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Tue, 2005-12-06 at 14:24 -0700, Andreas Dilger wrote:
> On Dec 06, 2005 09:48 -0500, Trond Myklebust wrote:
> > On Tue, 2005-12-06 at 08:30 -0600, Dave Kleikamp wrote:
> > > I think it looks good. The only issue I have is that I agree with
> > > Andreas that i_blocks should be of type sector_t. I find the case of
> > > accessing very large files over nfs with CONFIG_LBD disabled to be very
> > > unlikely.
> >
> > NO! sector_t is a block-device specific type. It does not belong in the
> > generic inode.
>
> sector_t would imply "units of 512-byte sectors", and this is exactly
> what i_blocks is actually measuring, so I don't really understand your
> objection.

Strictly speaking, sector_t is a block offset that happens to be in
"units of 1<<inode->i_blkbits bytes". It is not a count of the number of
blocks in a file.

> If you have objection to the use of sector_t, it could be some other type
> that is defined virtually identically as CONFIG_LBD sector_t, except that
> it might be desirable to allow it to be configured separately for network
> filesystems that have large files. I'm sure the embedded linux folks
> wouldn't be thrilled at an extra 4 bytes in every inode and 64-bit math
> if they don't really use it.

That would be fine.

Cheers,
Trond

2005-12-07 10:57:46

by Takashi Sato

[permalink] [raw]
Subject: RE: stat64 for over 2TB file returned invalid st_blocks

Hi,

> > On Tue, 2005-12-06 at 08:30 -0600, Dave Kleikamp wrote:
> > > I think it looks good. The only issue I have is that I agree with
> > > Andreas that i_blocks should be of type sector_t. I find the case
> > > of accessing very large files over nfs with CONFIG_LBD disabled to
> > > be very unlikely.
> >
> > NO! sector_t is a block-device specific type. It does not belong in
> > the generic inode.
>
> sector_t would imply "units of 512-byte sectors", and this is exactly
> what i_blocks is actually measuring, so I don't really understand your
> objection.
>
> If you have objection to the use of sector_t, it could be some other
> type that is defined virtually identically as CONFIG_LBD sector_t,
> except that it might be desirable to allow it to be configured
> separately for network filesystems that have large files. I'm sure
> the embedded linux folks wouldn't be thrilled at an extra 4 bytes in
> every inode and 64-bit math if they don't really use it.
>
> Even in HPC very few users have many-TB files, and Lustre is one of
> the few filesystems that can actually do this today. We of course
> would enable this option for our kernels, but I don't want to force it
> upon everyone.

On my previous mail, I said that CONFIG_LBD should not determine
whether large single files is enabled. But after further
consideration, on such a small system that CONFIG_LBD is disabled,
using large filesystem over network seems to be very rare.
So I think that the type of i_blocks should be sector_t.

Below shows the updated patch.

This patch fixes st_blocks of >2TB file on 32-bit linux.
- inode.i_blocks
changes to sector_t whose size depends on CONFIG_LBD.
- kstat.blocks
changes to `unsigned long long'.
- stat64.st_blocks
changes to `unsigned long long' on i386, m68k and sh.

Any feedback and comments are welcome.

Signed-off-by: Takashi Sato <[email protected]>

diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-i386/stat.h
linux-2.6.15-rc5-blocks/include/asm-i386/stat.h
--- linux-2.6.15-rc5.org/include/asm-i386/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/asm-i386/stat.h 2005-12-06 16:24:31.000000000 +0900
@@ -58,8 +58,7 @@ struct stat64 {
long long st_size;
unsigned long st_blksize;

- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
- unsigned long __pad4; /* future possible st_blocks high bits */
+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

unsigned long st_atime;
unsigned long st_atime_nsec;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-m68k/stat.h
linux-2.6.15-rc5-blocks/include/asm-m68k/stat.h
--- linux-2.6.15-rc5.org/include/asm-m68k/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/asm-m68k/stat.h 2005-12-06 16:29:50.000000000 +0900
@@ -60,8 +60,7 @@ struct stat64 {
long long st_size;
unsigned long st_blksize;

- unsigned long __pad4; /* future possible st_blocks high bits */
- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

unsigned long st_atime;
unsigned long st_atime_nsec;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/asm-sh/stat.h
linux-2.6.15-rc5-blocks/include/asm-sh/stat.h
--- linux-2.6.15-rc5.org/include/asm-sh/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/asm-sh/stat.h 2005-12-06 16:28:37.000000000 +0900
@@ -60,13 +60,7 @@ struct stat64 {
long long st_size;
unsigned long st_blksize;

-#if defined(__BIG_ENDIAN__)
- unsigned long __pad4; /* Future possible st_blocks hi bits */
- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
-#else /* Must be little */
- unsigned long st_blocks; /* Number 512-byte blocks allocated. */
- unsigned long __pad4; /* Future possible st_blocks hi bits */
-#endif
+ unsigned long long st_blocks; /* Number 512-byte blocks allocated. */

unsigned long st_atime;
unsigned long st_atime_nsec;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/linux/fs.h
linux-2.6.15-rc5-blocks/include/linux/fs.h
--- linux-2.6.15-rc5.org/include/linux/fs.h 2005-12-06 16:20:21.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/linux/fs.h 2005-12-07 14:35:42.000000000 +0900
@@ -450,7 +450,7 @@ struct inode {
unsigned int i_blkbits;
unsigned long i_blksize;
unsigned long i_version;
- unsigned long i_blocks;
+ sector_t i_blocks;
unsigned short i_bytes;
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
struct semaphore i_sem;
diff -uprN -X linux-2.6.15-rc5.org/Documentation/dontdiff linux-2.6.15-rc5.org/include/linux/stat.h
linux-2.6.15-rc5-blocks/include/linux/stat.h
--- linux-2.6.15-rc5.org/include/linux/stat.h 2005-10-28 09:02:08.000000000 +0900
+++ linux-2.6.15-rc5-blocks/include/linux/stat.h 2005-12-06 16:26:26.000000000 +0900
@@ -69,7 +69,7 @@ struct kstat {
struct timespec mtime;
struct timespec ctime;
unsigned long blksize;
- unsigned long blocks;
+ unsigned long long blocks;
};

#endif

-- Takashi Sato


2005-12-07 13:52:33

by Trond Myklebust

[permalink] [raw]
Subject: RE: stat64 for over 2TB file returned invalid st_blocks

On Wed, 2005-12-07 at 19:57 +0900, Takashi Sato wrote:

> On my previous mail, I said that CONFIG_LBD should not determine
> whether large single files is enabled. But after further
> consideration, on such a small system that CONFIG_LBD is disabled,
> using large filesystem over network seems to be very rare.
> So I think that the type of i_blocks should be sector_t.

???? Where do you get this misinformation from?

Cheers,
Trond

2005-12-07 15:02:05

by Dave Kleikamp

[permalink] [raw]
Subject: RE: stat64 for over 2TB file returned invalid st_blocks

On Wed, 2005-12-07 at 08:52 -0500, Trond Myklebust wrote:
> On Wed, 2005-12-07 at 19:57 +0900, Takashi Sato wrote:
>
> > On my previous mail, I said that CONFIG_LBD should not determine
> > whether large single files is enabled. But after further
> > consideration, on such a small system that CONFIG_LBD is disabled,
> > using large filesystem over network seems to be very rare.
> > So I think that the type of i_blocks should be sector_t.
>
> ???? Where do you get this misinformation from?

Without some kind of counter-example, I would tend to agree with
Takashi. In what scenerio would someone build a kernel with CONFIG_LBD
disabled, yet would be needing to access files > 2TB across the network?

Shaggy
--
David Kleikamp
IBM Linux Technology Center

2005-12-07 15:35:39

by Trond Myklebust

[permalink] [raw]
Subject: RE: stat64 for over 2TB file returned invalid st_blocks

On Wed, 2005-12-07 at 09:01 -0600, Dave Kleikamp wrote:
> On Wed, 2005-12-07 at 08:52 -0500, Trond Myklebust wrote:
> > On Wed, 2005-12-07 at 19:57 +0900, Takashi Sato wrote:
> >
> > > On my previous mail, I said that CONFIG_LBD should not determine
> > > whether large single files is enabled. But after further
> > > consideration, on such a small system that CONFIG_LBD is disabled,
> > > using large filesystem over network seems to be very rare.
> > > So I think that the type of i_blocks should be sector_t.
> >
> > ???? Where do you get this misinformation from?
>
> Without some kind of counter-example, I would tend to agree with
> Takashi. In what scenerio would someone build a kernel with CONFIG_LBD
> disabled, yet would be needing to access files > 2TB across the network?

The word "filesystem" confused me. In my experience > 2TB filesystems
tend to be the norm for NFS shared systems these days, and so the fact
that 'struct kstatfs' uses sector_t irks me.
Access to > TB files is not unheard of in HPC circles (particularly in
the case of simulations), but these files tend to be pretty sparse, so
the block counts are not necessarily that large.

My point stands, though. whether or not sector_t may indeed be the right
size as far as i_block is concerned, a quick grep through (fs|block)/*.c
shows you that as a type, it is anything but a count of the number of
blocks in a file.

If you really want a variable size type here, then the right thing to do
is to define a __kernel_blkcnt_t or some such thing, and hide the
configuration knob for it somewhere in the arch-specific Kconfigs.

Cheers,
Trond

2005-12-07 16:34:12

by Dave Kleikamp

[permalink] [raw]
Subject: RE: stat64 for over 2TB file returned invalid st_blocks

On Wed, 2005-12-07 at 10:34 -0500, Trond Myklebust wrote:
> If you really want a variable size type here, then the right thing to do
> is to define a __kernel_blkcnt_t or some such thing, and hide the
> configuration knob for it somewhere in the arch-specific Kconfigs.

Takashi's patch does improve on what currently exists. Maybe someone
can create a separate patch to replace sector_t with blkcnt_t where it
makes sense.
--
David Kleikamp
IBM Linux Technology Center

2005-12-07 18:56:09

by Trond Myklebust

[permalink] [raw]
Subject: RE: stat64 for over 2TB file returned invalid st_blocks

VFS: Convert abuses of sector_t

The type "sector_t" is heavily tied in to the block layer interface as an
offset/handle to a block, and is subject to a supposedly block-specific
configuration option: CONFIG_LBD. Despite this, it is used in struct
kstatfs to save a couple of bytes on the stack whenever we call the
filesystems' ->statfs().

One consequence is that networked filesystems may break if CONFIG_LBD is
not set, since it is quite common to have multi-TB remote filesystems.

The following patch just converts struct kstatfs to use the standard type u64.

Signed-off-by: Trond Myklebust <[email protected]>
---

include/linux/statfs.h | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index ad83a2b..b34cc82 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -8,11 +8,11 @@
struct kstatfs {
long f_type;
long f_bsize;
- sector_t f_blocks;
- sector_t f_bfree;
- sector_t f_bavail;
- sector_t f_files;
- sector_t f_ffree;
+ u64 f_blocks;
+ u64 f_bfree;
+ u64 f_bavail;
+ u64 f_files;
+ u64 f_ffree;
__kernel_fsid_t f_fsid;
long f_namelen;
long f_frsize;


Attachments:
linux-2.6.15-fix_borken_sector_t.dif (1.16 kB)

2005-12-08 11:39:06

by Takashi Sato

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

Hi,

> On Wed, 2005-12-07 at 10:34 -0500, Trond Myklebust wrote:
>> If you really want a variable size type here, then the right thing to do
>> is to define a __kernel_blkcnt_t or some such thing, and hide the
>> configuration knob for it somewhere in the arch-specific Kconfigs.
>
> Takashi's patch does improve on what currently exists. Maybe someone
> can create a separate patch to replace sector_t with blkcnt_t where it
> makes sense.

I prefer sector_t for i_blocks rather than newly defined blkcnt_t.
The reasons are:

- Both i_blocks and common sector_t are for on-disk 512-byte unit.
In this point of view, they have the same character.

- If we created the type blkcnt_t newly, the patch would have to
touch a lot of files as follows, like sector_t does.
block/Kconfig, asm-i386/types.h, asm-x86_64/types.h,
asm-ppc/types.h, asm-s390/types.h, asm-sh/types.h,
asm-h8300/types.h, asm-mips/types.h
It will be simple if we use sector_t for i_blocks.

Also, I cannot imagine the situation that > 2TB files are used over
network with CONFIG_LBD disabled kernel. Is there such a thing
realistically?

-- Takashi Sato

2005-12-08 14:27:54

by Trond Myklebust

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Thu, 2005-12-08 at 20:38 +0900, Takashi Sato wrote:

> I prefer sector_t for i_blocks rather than newly defined blkcnt_t.
> The reasons are:
>
> - Both i_blocks and common sector_t are for on-disk 512-byte unit.
> In this point of view, they have the same character.

One is a count of the number of blocks used by a file, and exists only
in order to help filesystems cache this value. The other is a handle to
a block. How is that the same?

> - If we created the type blkcnt_t newly, the patch would have to
> touch a lot of files as follows, like sector_t does.
> block/Kconfig, asm-i386/types.h, asm-x86_64/types.h,
> asm-ppc/types.h, asm-s390/types.h, asm-sh/types.h,
> asm-h8300/types.h, asm-mips/types.h
> It will be simple if we use sector_t for i_blocks.

That is not a particularly good reason.

> Also, I cannot imagine the situation that > 2TB files are used over
> network with CONFIG_LBD disabled kernel. Is there such a thing
> realistically?

Apart from this and the kstat wart, there is no reason to set CONFIG_LBD
for a networked filesystem. Why would you want to buy a > 2TB local disk
on an HPC cluster node if you already have a server?

I suppose we can make NFS use a private field instead, and just set
i_blocks to 0, but that's unnecessarily wasteful too.

Cheers,
Trond

2005-12-08 14:51:00

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Thu, 2005-12-08 at 09:27 -0500, Trond Myklebust wrote:
> On Thu, 2005-12-08 at 20:38 +0900, Takashi Sato wrote:
>
> > I prefer sector_t for i_blocks rather than newly defined blkcnt_t.
> > The reasons are:
> >
> > - Both i_blocks and common sector_t are for on-disk 512-byte unit.
> > In this point of view, they have the same character.
>
> One is a count of the number of blocks used by a file, and exists only
> in order to help filesystems cache this value. The other is a handle to
> a block. How is that the same?
>
> > - If we created the type blkcnt_t newly, the patch would have to
> > touch a lot of files as follows, like sector_t does.
> > block/Kconfig, asm-i386/types.h, asm-x86_64/types.h,
> > asm-ppc/types.h, asm-s390/types.h, asm-sh/types.h,
> > asm-h8300/types.h, asm-mips/types.h
> > It will be simple if we use sector_t for i_blocks.
>
> That is not a particularly good reason.
>
> > Also, I cannot imagine the situation that > 2TB files are used over
> > network with CONFIG_LBD disabled kernel. Is there such a thing
> > realistically?
>
> Apart from this and the kstat wart, there is no reason to set CONFIG_LBD
> for a networked filesystem. Why would you want to buy a > 2TB local disk
> on an HPC cluster node if you already have a server?
>
> I suppose we can make NFS use a private field instead, and just set
> i_blocks to 0, but that's unnecessarily wasteful too.

And it breaks applications, too (e.g. du will then report all your files
to be zero size)...

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2005-12-08 15:03:54

by Trond Myklebust

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

On Thu, 2005-12-08 at 14:50 +0000, Anton Altaparmakov wrote:

> And it breaks applications, too (e.g. du will then report all your files
> to be zero size)...

That's not a problem. We have our own ->getattr() method, so we can set
stat->blocks to the correct value. The only real side effect is
therefore that we waste space in the inode.

Cheers,
Trond

2005-12-10 11:23:09

by Takashi Sato

[permalink] [raw]
Subject: Re: stat64 for over 2TB file returned invalid st_blocks

Hi,

>> I prefer sector_t for i_blocks rather than newly defined blkcnt_t.
>> The reasons are:
>>
>> - Both i_blocks and common sector_t are for on-disk 512-byte unit.
>> In this point of view, they have the same character.
>
> One is a count of the number of blocks used by a file, and exists only
> in order to help filesystems cache this value. The other is a handle to
> a block. How is that the same?

Both i_blocks and sector_t handle on-disk blocks in a certain unit.

But their relation seems to be similar to the relation between off_t
and size_t of POSIX. So adding blkcnt_t makes sense too. I'll think
of this point further.

By the way, I think there is a kind of confusions on sector_t. Its
unit is 512 bytes on block/ and driver/ tree, but it is filesystem
block on fs/ tree. Does anyone think the latter should be fsblock_t
or something?

>> - If we created the type blkcnt_t newly, the patch would have to
>> touch a lot of files as follows, like sector_t does.
>> block/Kconfig, asm-i386/types.h, asm-x86_64/types.h,
>> asm-ppc/types.h, asm-s390/types.h, asm-sh/types.h,
>> asm-h8300/types.h, asm-mips/types.h
>> It will be simple if we use sector_t for i_blocks.
>
> That is not a particularly good reason.
>
>> Also, I cannot imagine the situation that > 2TB files are used over
>> network with CONFIG_LBD disabled kernel. Is there such a thing
>> realistically?

CONFIG_LBD is enabled not only on HPC but on all major distribution for
server and desktop. The only exception is embedded system, isn't it?

> Apart from this and the kstat wart, there is no reason to set CONFIG_LBD
> for a networked filesystem. Why would you want to buy a > 2TB local disk
> on an HPC cluster node if you already have a server?
>
> I suppose we can make NFS use a private field instead, and just set
> i_blocks to 0, but that's unnecessarily wasteful too.

I think your suggestion is effective only for the problem of stat64,
but other problems would be left on code which handles inode.i_blocks.
For example, ioctl with FIOQSIZE command returns the size of file's
data calculated from inode.i_blocks. If inode.i_blocks is 0, ioctl
with FIOQSIZE always returns 0 even if the file data exists.

-- Takashi Sato