2007-06-05 12:56:26

by Mark Knibbs

Subject: Possible ext2 bug with large sparse files?

Hi,

I have encountered a problem/issue relating to using ext2 filesystems with
(very) large sparse files, where the file length is much larger than the
filesystem size.

I'm using an x86 PC; the same issue appears in kernels 2.6.14 and 2.6.20. I
used dd 5.97 to replicate the problem.

On a ~500MB ext2 volume with about 60MB free (a 512MB flash card in a USB
reader), testing with dd like this:
dd if=/dev/zero of=test.bin bs=1 count=1 seek=17247252479
works fine, but
dd if=/dev/zero of=test.bin bs=1 count=1 seek=17247252480
gives a "File size limit exceeded" message and this warning appears in dmesg
output:
"EXT2-fs warning (device sdd): ext2_block_to_path: block > big"

[For whatever reason, it seems the maximum file size for that
partition is 17,247,252,480 bytes. In itself that behaviour isn't
necessarily a bug; but if you have any clue how the ext2 maximum file size
is related (or not) to the amount of free space or the volume size, please
let me know!]

After doing
dd if=/dev/zero of=test.bin bs=1 count=1 seek=17247252480
ls -l shows test.bin to be 17247252480 bytes long.
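
The same thing can be reproduced at the syscall level. Here's a rough C
sketch of what dd is doing (my own simplification; the open with O_TRUNC
stands in for dd's default truncation of the output file):

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    /* Pass the seek offset on the command line; 17247252480 is the
     * failing case on this 1K-block filesystem. */
    off_t seek_to = (argc > 1) ? strtoll(argv[1], NULL, 10) : 17247252479LL;
    char byte = 0;

    int fd = open("test.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (lseek(fd, seek_to, SEEK_SET) == (off_t)-1) { perror("lseek"); return 1; }

    /* With seek_to == 17247252480 this one-byte write is what fails.
     * Depending on where the kernel enforces the limit, write() may
     * return an error or the process may be killed by SIGXFSZ, which
     * the shell reports as "File size limit exceeded". */
    if (write(fd, &byte, 1) != 1)
        perror("write");

    close(fd);
    return 0;
}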

I unmounted the partition and used e2fsck -f to check it:
# e2fsck -f /dev/sdd
e2fsck 1.38 (30-Jun-2005)
Pass 1: Checking inodes, blocks, and sizes
Inode 123466, i_size is 17247252480, should be 0. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

VolName: ***** FILE SYSTEM WAS MODIFIED *****
VolName: 107/125488 files (2.8% non-contiguous), 428147/501760 blocks


Regards,
-- Mark


2007-06-05 13:56:15

by Theodore Ts'o

Subject: Re: Possible ext2 bug with large sparse files?

On Tue, Jun 05, 2007 at 01:25:29PM +0100, Mark Knibbs wrote:
> For whatever reason, it seems the maximum file size for that
> partition is 17,247,252,480 bytes. In itself that behaviour isn't
> necessarily a bug; but if you have any clue how the ext2 maximum file size
> is related (or not) to the amount of free space or the volume size, please
> let me know!

You're using a filesystem with a 1k blocksize, and that's the cause of
the limit. The maximum number of 1k blocks that can be addressed
using the direct/indirect scheme is:

12 + 256 + 256*256 + 256*256*256 = 16843020 blocks

or
16843020 blocks * 1024 bytes/block = 17,247,252,480 bytes
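
The same arithmetic applies to other block sizes, since each indirect
block holds blocksize/4 four-byte block pointers. A quick illustrative
sketch of the calculation (just an illustration, not kernel code):

#include <stdio.h>

int main(void)
{
    /* Largest file reachable through the 12 direct blocks plus the
     * single, double and triple indirect trees, per block size.
     * (With 4K blocks other limits, such as the 32-bit i_blocks
     * counter, may cap the file size below this figure.) */
    for (unsigned long long bs = 1024; bs <= 4096; bs *= 2) {
        unsigned long long ptrs = bs / 4;
        unsigned long long blocks = 12 + ptrs + ptrs * ptrs
                                       + ptrs * ptrs * ptrs;
        printf("%lluK blocks: %llu addressable blocks = %llu bytes\n",
               bs / 1024, blocks, blocks * bs);
    }
    return 0;
}

For a 1k block size that reproduces the 17,247,252,480 figure above.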

Unless you are using a really tiny filesystem, you don't want to be
using a 1k block size; in addition to imposing this 16GB file size
limit, it also makes the filesystem much less efficient for large
files.

Regards,

- Ted

2007-06-06 13:18:14

by Mark Knibbs

Subject: Re: Possible ext2 bug with large sparse files?

Hi,

This is a follow-up to my previous message. The bug is also present in ext3,
and applies to partitions with 2K blocks and (at least in part) to those
with 4K blocks. There is also another issue, which may well be a bug in
e2fsck.

For partitions with 2K blocks the maximum file size is 275,415,851,008
bytes; changing the seek= argument in the dd examples to 275415851007 &
275415851008 gives similar results.

The maximum file size on partitions with 4K blocks is 2,196,875,759,616
bytes, so I tested using dd with seek=2196875759615 & 2196875759616. With 4K
blocks there don't seem to be any problems with dmesg warnings or fsck. The
only bug (or what I think is a bug) is that
dd if=/dev/zero of=test.bin bs=1 count=1 seek=2196875759616
causes the file size to show as 2196875759616, but it shouldn't since no
actual write took place.


As before, on an ext3 partition with 1K block size, doing
dd if=/dev/zero of=test.bin bs=1 count=1 seek=17247252480
(which causes the file size to show as 17247252480, but it shouldn't since
no actual write took place) gives a "File size limit exceeded" message and
this warning in dmesg output:
"EXT3-fs warning (device sde): ext3_block_to_path: block > big"

Unlike the ext2 case, however, if you run fsck.ext3 -f it doesn't ask a
question like
"i_size is 17247252480, should be 0. Fix<y>?"
but seems to fix it silently; after running fsck -f and remounting, test.bin
shows as 0 bytes long.


If instead you do
dd if=/dev/zero of=test.bin bs=1 count=1 seek=17247252479
(which should work okay, since the maximum file size is 17247252480) then
fsck -f gives a strange message:
"Inode 6073, i_size is 17247252480, should be 17247252480. Fix<y>?"
Even if you say yes to "fix" it, repeatedly running fsck always asks that
question.

This happens:
# cd /mnt/sde_removable
# dd if=/dev/zero of=test.bin bs=1 count=1 seek=17247252479
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.132501 seconds, 0.0 kB/s
# ls -l
total 16
drwx------ 2 root root 12288 Jun 6 13:04 lost+found/
-rw-r--r-- 1 root root 17247252480 Jun 6 13:09 test.bin
# cd ~
# umount /mnt/sde_removable
# fsck.ext3 -f /dev/sde
e2fsck 1.38 (30-Jun-2005)
Pass 1: Checking inodes, blocks, and sizes
Inode 6073, i_size is 17247252480, should be 17247252480. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

Ext3_test: ***** FILE SYSTEM WAS MODIFIED *****
Ext3_test: 12/125488 files (0.0% non-contiguous), 24081/501760 blocks
# mount -t ext3 -o noatime /dev/sde /mnt/sde_removable
# ls -l /mnt/sde_removable
total 16
drwx------ 2 root root 12288 Jun 6 13:04 lost+found/
-rw-r--r-- 1 root root 17247252480 Jun 6 13:09 test.bin
# umount /mnt/sde_removable
# fsck.ext3 -f /dev/sde
e2fsck 1.38 (30-Jun-2005)
Pass 1: Checking inodes, blocks, and sizes
Inode 6073, i_size is 17247252480, should be 17247252480. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

Ext3_test: ***** FILE SYSTEM WAS MODIFIED *****
Ext3_test: 12/125488 files (0.0% non-contiguous), 24081/501760 blocks


Regards,
-- Mark

2007-06-07 17:51:57

by Andreas Dilger

Subject: Re: Possible ext2 bug with large sparse files?

On Jun 06, 2007 14:18 +0100, Mark Knibbs wrote:
> This is a follow-up to my previous message. The bug is also present in
> ext3, and applies to partitions with 2K blocks and (at least in part) to
> those with 4K blocks. There is also another issue, which may well be a bug
> in e2fsck.

Could you please clarify what the particular defect is that you are looking
at? Presumably it is not just that there is an upper limit on the size of
a file?

> For partitions with 2K blocks the maximum file size is 275,415,851,008
> bytes; changing the seek= argument in the dd examples to 275415851007 &
> 275415851008 gives similar results.
>
> The maximum file size on partitions with 4K blocks is 2,196,875,759,616
> bytes, so I tested using dd with seek=2196875759615 & 2196875759616. With
> 4K blocks there don't seem to be any problems with warnings in dmesg
> output, or fsck. The only bug (or what I think is a bug) is that
> dd if=/dev/zero of=test.bin bs=1 count=1 seek=2196875759616
> causes the file size to show as 2196875759616, but it shouldn't since no
> actual write took place.

This limitation is due to the indirect block limits in the ext2/3 file
layout. In ext4 it is theoretically possible to have files up to 2^60
bytes in size, because the extent format handles 48-bit block numbers,
and the inode has an extra 16 bits to store the high part of the block
count.
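
As a rough illustration of where the 2^60 figure comes from (assuming a
4k block size):

#include <stdio.h>

int main(void)
{
    /* 48-bit physical block numbers, 4096-byte blocks:
     * 2^48 * 2^12 = 2^60 bytes. */
    unsigned long long max_bytes = (1ULL << 48) * 4096ULL;
    printf("2^48 blocks * 4096 bytes/block = %llu bytes (= 2^60)\n",
           max_bytes);
    return 0;
}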

> If instead you do
> dd if=/dev/zero of=test.bin bs=1 count=1 seek=17247252479
> (which should work okay, since the maximum file size is 17247252480) then
> fsck -f gives a strange message:
> "Inode 6073, i_size is 17247252480, should be 17247252480. Fix<y>?"
> Even if you say yes to "fix" it, repeatedly running fsck always asks that
> question.

This is definitely a bug...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-06-07 21:40:25

by Mark Knibbs

Subject: Re: Possible ext2 bug with large sparse files?

Hi,

[Apologies if this reply doesn't thread correctly.]

Andreas Dilger wrote:
> Could you please clarify what the particular defect is that you are
> looking at? Presumably it is not just that there is an upper limit on the
> size of a file?

No. It seems there are some deficiencies in ext2/3's and/or e2fsck's
handling of files of the maximum length. Hopefully what follows is a
little more concise than my previous messages.


First problem
-------------
On a partition with 1K or 2K blocks (maxfilesize depends on the block size,
either 17247252480 or 275415851008), doing this:
dd if=/dev/zero of=test.bin bs=1 count=1 seek=[maxfilesize]
gives a "File size limit exceeded" message, as it should (otherwise the
resulting file would be maxfilesize+1 bytes long). However there are two
things which shouldn't be happening:

1) "EXT2-fs warning (device sdd): ext2_block_to_path: block > big" appears
in dmesg output

2) e2fsck finds a problem/inconsistency, saying something like
"Inode 123466, i_size is [maxfilesize], should be 0. Fix<y>?"
(With ext3, e2fsck doesn't ask that but seems to silently fix it, since
test.bin shows as 0 bytes long afterwards.)

It's as if the write is not caught early enough in the filesystem code and
partially completes, hence the warning from ext2_block_to_path (which
probably shouldn't be called in this case) and the e2fsck issue.

(I didn't mention this in my previous message.) Related to that, doing
dd if=/dev/zero of=test.bin bs=1 count=0 seek=[maxfilesize]
appears to work correctly, as it should, since the zero-byte write doesn't
make the file longer than the maximum. However, the dmesg warning and the
e2fsck problem appear here as well.


Second problem
--------------
After doing the one-byte-write dd test above, test.bin shows as maxfilesize
bytes long. I don't think it should, because the write didn't happen (it
would have caused the file to be maxfilesize+1 bytes long). That also
applies with 4K block size. The sequence is: open file, seek to offset
maxfilesize, attempt a write there (which fails).
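
A rough C sketch of that sequence (this leaves out whatever truncation dd
itself may do to the output file; ignoring SIGXFSZ is my own addition, so
that the failing write returns an error instead of killing the process):

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    const off_t maxfilesize = 17247252480LL;  /* 1K-block case */
    char byte = 0;
    struct stat st;

    signal(SIGXFSZ, SIG_IGN);

    int fd = open("test.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (lseek(fd, maxfilesize, SEEK_SET) == (off_t)-1) { perror("lseek"); return 1; }

    /* This one-byte write must fail (it would make the file
     * maxfilesize+1 bytes long). */
    if (write(fd, &byte, 1) != 1)
        perror("write");

    /* I would expect the size to still be 0 here, since the write failed. */
    if (fstat(fd, &st) == 0)
        printf("size after failed write: %lld\n", (long long)st.st_size);

    close(fd);
    return 0;
}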

I guess opinions here might differ, but I think the resulting file size
should be 0. I say opinions might differ because using dd with a zero-length
write does "fixate" the file length, e.g.
dd if=/dev/zero of=test.bin bs=1 count=0 seek=12345678
creates a 12345678-byte long file. But at least there the notional
zero-length write completes successfully. (I'm guessing that behaviour is
filesystem-dependent; maybe other filesystems require an actual write to
"fixate" the length?)


Third problem
-------------
There's a bug when creating a file of the maximum size. This should work
fine:
dd if=/dev/zero of=test.bin bs=1 count=1 seek=[maxfilesize-1]
It executes without error, but fscking the partition shows a problem which
isn't corrected, something like:
"Inode 6073, i_size is [maxfilesize], should be [maxfilesize]. Fix<y>?"
over and over again even when you tell e2fsck to fix it.

That "e2fsck looping" thing happens after any write that would take a file
past the maximum size (i.e. it probably occurs with any maximum-length
file). In the case of a partition with 2K blocks, the maximum size is
275415851008 bytes. So for example, if you do
dd if=/dev/urandom of=test.bin bs=1000 count=1 seek=275415851
then of course the write fails (with a warning in dmesg output), but the
first 8 bytes do actually get written. (Is that a bug? Should a write be
allowed to partly happen, even when the filesystem knows it cannot be
completed?)
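
At the syscall level that looks something like this (again, ignoring
SIGXFSZ is my own addition so that the result of write() is visible):

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
    const off_t maxfilesize = 275415851008LL;   /* 2K-block case */
    char buf[1000];
    memset(buf, 0xAA, sizeof(buf));

    signal(SIGXFSZ, SIG_IGN);

    int fd = open("test.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Start 8 bytes below the limit, so only 8 of the 1000 bytes fit. */
    if (lseek(fd, maxfilesize - 8, SEEK_SET) == (off_t)-1) { perror("lseek"); return 1; }

    ssize_t n = write(fd, buf, sizeof(buf));
    printf("write() returned %zd\n", n);   /* presumably 8, a short write */

    close(fd);
    return 0;
}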


Regards,
-- Mark