Hello,
I was recently testing a bit with creating very large files on ext2/ext3
(just to see if limits were what they should be). Now, I know that ext2/3
allows files just shy of 2TB right now, because of an issue with i_blocks
being in units of 512-byte sectors, instead of fs blocks.
I tried to create a (sparse!) file of 2TB size with:
dd if=/dev/zero of=/tmp/tt bs=1k count=1 seek=2047M
and it worked fine (finished immediately, don't try this with reiserfs...).
When I tried to make it just a bit bigger, with:
dd if=/dev/zero of=/tmp/tt bs=1k count=1 seek=2048M
dd fails the "llseek(fd, 2T, SEEK_SET)" with -EINVAL, and then proceeds
to loop "infinitely" reading from the file to try and manually advance
the file descriptor offset to the desired offset. That is bad.
I _think_ there is a bug in generic_file_llseek(), with it returning -EINVAL
instead of -EFBIG in the case where the offset is larger than the s_maxbytes.
AFAICS, the return -EINVAL is for the case where "whence" is invalid, not the
case where "offset" is too large for the underlying filesystem (I can see
-EINVAL for seeking to a negative position).
If I use:
dd if=/dev/zero of=/tmp/tt bs=1k count=1025 seek=2097151k
I correctly get "EFBIG (file too large)" and "SIGXFSZ" from write(2).
Does anyone know the correct LFS interpretation on this? From what I can
see (I have not read the whole thing) lseek() should return EOVERFLOW if
the resulting offset is too large to fit in the passed type. It doesn't
really say what should happen in this particular case - can someone try
on a non-Linux system and see what the result is?
Either way, I think the kernel is broken in this regard.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
> I was recently testing a bit with creating very large files on ext2/ext3
> (just to see if limits were what they should be). Now, I know that ext2/3
> allows files just shy of 2TB right now, because of an issue with i_blocks
> being in units of 512-byte sectors, instead of fs blocks.
Does 2.4.13-ac7 show the same? There were some off-by-one fixes, and it's
possible I managed to forget to feed Linus one.
On Nov 15, 2001 00:08 +0000, Alan Cox wrote:
> > I was recently testing a bit with creating very large files on ext2/ext3
> > (just to see if limits were what they should be). Now, I know that ext2/3
> > allows files just shy of 2TB right now, because of an issue with i_blocks
> > being in units of 512-byte sectors, instead of fs blocks.
>
> Does 2.4.13-ac7 show the same? There were some off-by-one fixes, and it's
> possible I managed to forget to feed Linus one.
The test was done on 2.4.13, but I looked through 2.4.14, 2.4.15-pre4, and
2.4.13-ac8, and the code in question was not touched. It is definitely not
just an off-by-one error, since if I try to create a 40TB file it is the same
problem (i.e. llseek returns -EINVAL, dd tries to read 40TB of file to try
and make the file offset correct). When it would eventually get to the
correct offset (I can't see anything on the read path checking s_maxbytes,
and don't know what LFS says on this) it would just get EFBIG as soon as
it tries to write anything.
An example of what I think should be changed (not tested or even compiled
yet; EFBIG might need to be EOVERFLOW, I don't know), just for illustration:
--- linux.orig/fs/read_write.c Tue Aug 14 12:09:09 2001
+++ linux/fs/read_write.c Wed Nov 14 16:11:23 2001
@@ -36,15 +36,24 @@
case 1:
offset += file->f_pos;
}
- retval = -EINVAL;
- if (offset>=0 && offset<=file->f_dentry->d_inode->i_sb->s_maxbytes) {
- if (offset != file->f_pos) {
- file->f_pos = offset;
- file->f_reada = 0;
- file->f_version = ++event;
- }
- retval = offset;
+
+ if (offset < 0) {
+ retval = -EINVAL;
+ goto out;
}
+
+ if (offset > file->f_dentry->d_inode->i_sb->s_maxbytes) {
+ retval = -EFBIG;
+ goto out;
+ }
+
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_reada = 0;
+ file->f_version = ++event;
+ }
+ retval = offset;
+out:
return retval;
}
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
> I _think_ there is a bug in generic_file_llseek(), with it returning -EINVAL
> instead of -EFBIG in the case where the offset is larger than the s_maxbytes.
> AFAICS, the return -EINVAL is for the case where "whence" is invalid, not the
> case where "offset" is too large for the underlying filesystem
My reading of the standards is:
[EFBIG] - File too large. You get this when trying to write. Not for a seek.
The size of a file would exceed the maximum file size
of an implementation or offset maximum established
in the corresponding file description.
[EOVERFLOW] - Value too large for data type. This can happen for a seek.
The resulting file offset would be a value which cannot be represented
correctly in an object of the given type.
[EINVAL] - Bad value for other reasons. This will happen for a seek to a
negative offset.
Andries
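That taxonomy can be written down as a small checking helper. This is only a user-space sketch of the rules above; FS_MAXBYTES and OFF32_MAX are made-up stand-ins for the kernel's s_maxbytes and MAX_NON_LFS values:

```c
/* Sketch of the errno rules above for a seek target.  FS_MAXBYTES
 * and OFF32_MAX are illustrative stand-ins for the kernel's
 * s_maxbytes and MAX_NON_LFS; EFBIG is deliberately absent, since
 * it belongs to write(), not seek. */
#include <errno.h>
#include <stdint.h>

#define FS_MAXBYTES ((int64_t)1 << 41)    /* ~2TB, ext2-like limit */
#define OFF32_MAX   ((int64_t)0x7fffffff) /* non-LFS off_t maximum */

/* Returns 0 if the target offset is seekable, else -errno. */
int check_seek(int64_t offset, int largefile)
{
    if (offset < 0)
        return -EINVAL;     /* bad value: negative position */
    if (!largefile && offset > OFF32_MAX)
        return -EOVERFLOW;  /* would not fit in a 32-bit off_t */
    if (offset > FS_MAXBYTES)
        return -EOVERFLOW;  /* beyond what the fs can represent */
    return 0;
}
```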
After your message I tried to play a bit with dd. Bad idea.
I did 'dd if=/dev/zero of=test bs=1024k seek=2G' on a 10GB IDE disk, and
guess what?
$ ls -l test
-rw-r--r-- 1 huma huma 2251799813685248 Nov 15 02:39 test
$ ls -lh test
-rw-r--r-- 1 huma huma 2.0P Nov 15 02:39 test
Yep, it says I have a 2-petabyte file on a 10GB drive. Something is
_really_ broken here.
Deleting this file gave me some errors like this:
Nov 15 01:50:07 fargo kernel: EXT2-fs error (device ide3(34,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 161087505,
count = 1
Nov 15 01:50:07 fargo kernel: EXT2-fs error (device ide3(34,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 161153041,
count = 1
After that, I unmounted the partition and ran fsck: lots of errors, and
several corrupted files that fsck asked me to delete because some inodes
had illegal blocks.
By the way, it is an ext2 partition. Versions are: kernel 2.4.14, fileutils
4.1, and glibc 2.2.3.
> Hello,
> I was recently testing a bit with creating very large files on ext2/ext3
> (just to see if limits were what they should be). Now, I know that ext2/3
> allows files just shy of 2TB right now, because of an issue with i_blocks
> being in units of 512-byte sectors, instead of fs blocks.
>
> I tried to create a (sparse!) file of 2TB size with:
>
> dd if=/dev/zero of=/tmp/tt bs=1k count=1 seek=2047M
>
> and it worked fine (finished immediately, don't try this with reiserfs...).
>
> When I tried to make it just a bit bigger, with:
>
> dd if=/dev/zero of=/tmp/tt bs=1k count=1 seek=2048M
>
> dd fails the "llseek(fd, 2T, SEEK_SET)" with -EINVAL, and then proceeds
> to loop "infinitely" reading from the file to try and manually advance
> the file descriptor offset to the desired offset. That is bad.
>
> I _think_ there is a bug in generic_file_llseek(), with it returning -EINVAL
> instead of -EFBIG in the case where the offset is larger than the s_maxbytes.
> AFAICS, the return -EINVAL is for the case where "whence" is invalid, not the
> case where "offset" is too large for the underlying filesystem (I can see
> -EINVAL for seeking to a negative position).
>
> If I use:
>
> dd if=/dev/zero of=/tmp/tt bs=1k count=1025 seek=2097151k
>
> I correctly get "EFBIG (file too large)" and "SIGXFSZ" from write(2).
>
> Does anyone know the correct LFS interpretation on this? From what I can
> see (I have not read the whole thing) lseek() should return EOVERFLOW if
> the resulting offset is too large to fit in the passed type. It doesn't
> really say what should happen in this particular case - can someone try
> on a non-Linux system and see what the result is?
>
> Either way, I think the kernel is broken in this regard.
>
> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
David Gómez
"The question of whether computers can think is just like the question of
whether submarines can swim." -- Edsger W. Dijkstra
On Nov 15, 2001 02:47 +0100, David Gomez wrote:
> I did 'dd if=/dev/zero of=test bs=1024k seek=2G' in a 10Gb ide disk, and
> guess what ?
>
> $ ls -l test
> -rw-r--r-- 1 huma huma 2251799813685248 Nov 15 02:39 test
> $ ls -lh test
> -rw-r--r-- 1 huma huma 2.0P Nov 15 02:39 test
>
> Yep, it says i have a 2 Petabyte file in a 10gb drive. Something is
> _really_ broken here.
No, that in itself is fine - it is a sparse file, with a single 1MB block
at 2PB offset. If you were to "du" this file, it would say 1MB of allocated
space. The problem is that this _should_ be impossible to create on ext2,
because the write would be way past the allowed file size limit.
> Deleting this file gave me some errors like this:
>
> Nov 15 01:50:07 fargo kernel: EXT2-fs error (device ide3(34,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 161087505,
> count = 1
> Nov 15 01:50:07 fargo kernel: EXT2-fs error (device ide3(34,1)):
> ext2_free_blocks: Freeing blocks not in datazone - block = 161153041,
> count = 1
>
> After that, i unmounted the partition and did an fsck, lots of errors and
> several files corrupted that fsck ask me to delete because some inodes had
> illegal blocks.
That is really bad; I don't know how it would happen. Maybe there is an
overflow internal to ext2, which causes it to write elsewhere in the fs?
When was the last time (previous to this problem) you fsck'd this fs?
Any chance you had problems before trying this? It's unlikely, but possible.
Note that I just tried this on a test fs, and got an interesting result:
#strace dd if=/dev/zero of=/mnt/tmp/tt bs=1024k count=1 seek=4G
execve("/bin/dd", ["dd", "if=/dev/zero", "of=/mnt/tmp/tt", "bs=1024k",
"count=1", "seek=4G"], [/* 37 vars */]) = 0
brk(0) = 0x804e258
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x40013000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23547, ...}) = 0
mmap(NULL, 23547, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40014000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=5276082, ...}) = 0
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\210"..., 4096) =
4096
mmap(NULL, 946780, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4001a000
mprotect(0x400fa000, 29276, PROT_NONE) = 0
mmap(0x400fa000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3,
0xdf000) = 0x400fa000
mmap(0x400fe000, 12892, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x400fe000
close(3) = 0
munmap(0x40014000, 23547) = 0
personality(PER_LINUX) = 0
getpid() = 14370
brk(0) = 0x804e258
brk(0x804e290) = 0x804e290
brk(0x804f000) = 0x804f000
open("/dev/zero", O_RDONLY|O_LARGEFILE) = 3
open("/mnt/tmp/tt", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = 4
SYS_194(0x4, 0, 0x100000, 0, 0x4) = 0
rt_sigaction(SIGINT, NULL, {SIG_DFL}, 8) = 0
rt_sigaction(SIGINT, {0x80491dc, [], 0x4000000}, NULL, 8) = 0
rt_sigaction(SIGQUIT, NULL, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {0x80491dc, [], 0x4000000}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {SIG_DFL}, 8) = 0
rt_sigaction(SIGPIPE, {0x80491dc, [], 0x4000000}, NULL, 8) = 0
rt_sigaction(SIGUSR1, NULL, {SIG_DFL}, 8) = 0
rt_sigaction(SIGUSR1, {0x8049240, [], 0x4000000}, NULL, 8) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x40102000
SYS_197(0x4, 0xbffff8cc, 0x400fd6e0, 0x40100eac, 0x4) = 0
_llseek(4, 4503599627370496, 0xbffff884, SEEK_SET) = -1 EINVAL (Invalid
argument)
read(4, "", 1048576) = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576)
= 1048576
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576)
= 1048576
munmap(0x40102000, 1052672) = 0
write(2, "1+0 records in\n", 151+0 records in
) = 15
write(2, "1+0 records out\n", 161+0 records out
) = 16
close(3) = 0
close(4) = 0
_exit(0) = ?
So, it tries to seek to 1024kB * 4G = 2^20 * 2^32 = 2^52 = 4503599627370496
bytes. That fails with EINVAL, so dd falls back to advancing the offset by
reading; but the target must be getting truncated to 32 bits somewhere in
dd, because it reads nothing (the file is empty) and instead writes the
1MB chunk at the start of the file, leaving the offset at 2^20 bytes.
Strangely,
# ls -l /mnt/tmp/tt
-rw-r--r-- 1 root root 4503599627370496 Nov 14 22:07 /mnt/tmp/tt
# du -sk /mnt/tmp/tt
1024 /mnt/tmp/tt
which is what you would expect for the above operations.
While the outcome of the operations isn't what was originally intended, I
don't know why this would corrupt your filesystem. Hmm, I now see in my
syslog (from testing earlier this afternoon):
kernel: EXT3-fs warning (device lvm(58,1)): ext3_block_to_path: block < 0
> By the way, is a ext2 partition. Versions are: kernel 2.4.14, fileutils
> 4.1 and glibc 2.2.3.
I have fileutils 4.0, glibc 2.1.3, kernel 2.4.13. Looks like I'll have to
see if the latest fileutils handles this better. It still doesn't resolve
the kernel issue about seeking to an "unsupported" address.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
On Nov 15, 2001 00:55 +0000, [email protected] wrote:
> My reading of the standards is:
>
> [EFBIG] - File too large. You get this when trying to write. Not for a seek.
> The size of a file would exceed the maximum file size
> of an implementation or offset maximum established
> in the corresponding file description.
OK. So, for write only.
> [EOVERFLOW] - Value too large for data type. This can happen for a seek.
> The resulting file offset would be a value which cannot be represented
> correctly in an object of the given type.
Technically, the offset isn't too large for a 64-bit datatype, but it is
too large for an underlying filesystem datatype, so I'm OK with this. I
will change my patch and test it a bit to make sure it actually works.
> [EINVAL] - Bad value for other reasons. This will happen for a seek to a
> negative offset.
Agreed.
Now, there still appear to be a couple of issues related to LFS in the
kernel. One is that there doesn't appear to be any checking of the
file offset on a read call. Secondly, there appear to still be some
overflow or unchecked conditions on a write, namely doing this:
dd if=/dev/zero of=/mnt/tmp/tt bs=1k count=1 seek=81920M
causes the following syslog message:
EXT3-fs warning (device lvm(58,1)): ext3_block_to_path: block < 0
and dd if=/dev/zero of=/mnt/tmp/tt bs=1k count=1 seek=4096M
EXT3-fs warning (device lvm(58,1)): ext3_block_to_path: block > big
Yet, running with a smaller seek just returns -EINVAL (presumably from
generic_file_llseek(), but now I'm not so sure) and breaks dd like
it always has. We are seeking to 2^10 * 2^32 = 2^42 bytes (4TB),
on a 4kB = 2^12 byte ext3 filesystem, so 2^30 filesystem blocks.
We should never be getting as far as the fs.
In sys_lseek() we only set EOVERFLOW _after_ doing the llseek. But
according to LFS 2.1.1.6 this is explicitly what should not happen.
I can see the argument for doing this (to avoid failing the seek,
not checking the return, and overwriting some other data) but what
can happen instead is the creation of a large file by a non-large-file
aware application, which will subsequently not be able to open/read it.
It also says that this violates POSIX (changing the position while also
returning an error).
Also, in all cases, regardless of what works and what doesn't, the file
size is set to the "new" size by ftruncate64, which succeeds:
ftruncate64(0x1, 0, 0x4fa, 0, 0x1) = 0
I don't know what SUS/POSIX/etc and LFS say about this. Is it reasonable
to have a file size which is larger than can be read/written/stored on
disk?
It also appears that if a filesystem sets a very large s_maxbytes (i.e.
larger than MAX_ULONG << PAGE_CACHE_SHIFT) the page index will overflow
in several places in generic_file_read/write. It appears that UDF and
NTFS set such an s_maxbytes (e.g. ~0ULL and ~0ULL >> 1, respectively).
It _looks_ like NTFS handles this correctly internally (it does not use
generic_file_{read,write}), but UDF does not.
Cheers, Andreas
=============== untested generic_file_{llseek,read} patch ================
--- linux.orig/mm/filemap.c Thu Oct 25 03:05:19 2001
+++ linux/mm/filemap.c Thu Nov 15 01:55:49 2001
@@ -1384,10 +1392,34 @@
*/
ssize_t generic_file_read(struct file * filp, char * buf, size_t count, loff_t *ppos)
{
- ssize_t retval;
+ ssize_t retval = -EINVAL;
if ((ssize_t) count < 0)
- return -EINVAL;
+ goto out;
+
+ retval = -EOVERFLOW;
+ /*
+ * LFS rule 2.2.1.25: can't read past MAX_NON_LFS for non-LFS file
+ * as that would put the file position past 2^31-1
+ */
+ if (!(filp->f_flags & O_LARGEFILE)) {
+ if (*ppos > MAX_NON_LFS)
+ goto out;
+
+ if (*ppos + count > MAX_NON_LFS)
+ count = MAX_NON_LFS - *ppos;
+ }
+
+ /*
+ * Are we about to exceed the fs block limit?
+ *
+ * If we are reading data it becomes a short read
+ */
+ if (*ppos > filp->f_dentry->d_inode->i_sb->s_maxbytes)
+ goto out;
+
+ if (*ppos + count > filp->f_dentry->d_inode->i_sb->s_maxbytes)
+ count = filp->f_dentry->d_inode->i_sb->s_maxbytes - *ppos;
retval = -EFAULT;
if (access_ok(VERIFY_WRITE, buf, count)) {
@@ -1407,6 +1439,7 @@
retval = desc.error;
}
}
+out:
return retval;
}
@@ -2680,7 +2713,7 @@
}
/*
- * LFS rule
+ * LFS rule 2.2.1.27
*/
if ( pos + count > MAX_NON_LFS && !(file->f_flags&O_LARGEFILE)) {
if (pos >= MAX_NON_LFS) {
@@ -2816,7 +2849,6 @@
err = written ? written : status;
out:
-
up(&inode->i_sem);
return err;
fail_write:
--- linux.orig/fs/read_write.c Tue Aug 14 12:09:09 2001
+++ linux/fs/read_write.c Thu Nov 15 01:54:35 2001
@@ -36,15 +36,24 @@
case 1:
offset += file->f_pos;
}
+
+ /* LFS 2.1.1.6: can't seek to a position that doesn't fit in off_t */
+ retval = -EOVERFLOW;
+ if ((!(file->f_flags & O_LARGEFILE) && offset > MAX_NON_LFS) ||
+ offset > file->f_dentry->d_inode->i_sb->s_maxbytes)
+ goto out;
+
retval = -EINVAL;
- if (offset>=0 && offset<=file->f_dentry->d_inode->i_sb->s_maxbytes) {
- if (offset != file->f_pos) {
- file->f_pos = offset;
- file->f_reada = 0;
- file->f_version = ++event;
- }
- retval = offset;
+ if (offset < 0)
+ goto out;
+
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_reada = 0;
+ file->f_version = ++event;
}
+ retval = offset;
+out:
return retval;
}
@@ -64,6 +73,12 @@
case 1:
offset += file->f_pos;
}
+
+ /* LFS 2.1.1.6: can't seek to a position that doesn't fit in off_t */
+ retval = -EOVERFLOW;
+ if (!(file->f_flags & O_LARGEFILE) && offset > MAX_NON_LFS)
+ goto out;
+
retval = -EINVAL;
if (offset >= 0) {
if (offset != file->f_pos) {
@@ -73,6 +88,7 @@
}
retval = offset;
}
+out:
return retval;
}
@@ -103,8 +119,6 @@
if (origin <= 2) {
loff_t res = llseek(file, offset, origin);
retval = res;
- if (res != (loff_t)retval)
- retval = -EOVERFLOW; /* LFS: should only happen on 32 bit platforms */
}
fput(file);
bad:
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
>>>>> " " == Andreas Dilger <[email protected]> writes:
> --- linux.orig/fs/read_write.c Tue Aug 14 12:09:09 2001
> +++ linux/fs/read_write.c Thu Nov 15 01:54:35 2001
> @@ -36,15 +36,24 @@
> case 1:
> offset += file->f_pos;
> }
> +
> + /* LFS 2.1.1.6: can't seek to a position that doesn't fit in off_t */
> + retval = -EOVERFLOW;
> + if ((!(file->f_flags & O_LARGEFILE) && offset > MAX_NON_LFS) ||
> + offset > file->f_dentry->d_inode->i_sb->s_maxbytes)
> + goto out;
> +
> retval = -EINVAL;
> - if (offset>=0 && offset<=file->f_dentry->d_inode->i_sb->s_maxbytes) {
> - if (offset != file->f_pos) {
> - file->f_pos = offset;
> - file->f_reada = 0;
> - file->f_version = ++event;
> - }
> - retval = offset;
> + if (offset < 0)
> + goto out;
> +
> + if (offset != file->f_pos) {
> + file->f_pos = offset;
> + file->f_reada = 0;
> + file->f_version = ++event;
> }
> + retval = offset;
> +out:
> return retval;
> }
> @@ -64,6 +73,12 @@
> case 1:
> offset += file->f_pos;
> }
> +
> + /* LFS 2.1.1.6: can't seek to a position that doesn't fit in off_t */
> + retval = -EOVERFLOW;
> + if (!(file->f_flags & O_LARGEFILE) && offset > MAX_NON_LFS)
> + goto out;
> +
> retval = -EINVAL;
> if (offset >= 0) {
> if (offset != file->f_pos) {
> @@ -73,6 +88,7 @@
> }
> retval = offset;
> }
> +out:
> return retval;
> }
> @@ -103,8 +119,6 @@
> if (origin <= 2) {
> loff_t res = llseek(file, offset, origin);
> retval = res;
> - if (res != (loff_t)retval)
> - retval = -EOVERFLOW; /* LFS: should only happen on 32 bit platforms */
> }
> fput(file);
> bad:
This breaks NFS badly for which a directory seek position is *not* a
file offset, but is an unsigned cookie. Please ensure that the above
checks are only made on regular files.
Cheers,
Trond
On Nov 15, 2001 10:38 +0100, Helge Hafting wrote:
> > On Nov 15, 2001 02:47 +0100, David Gomez wrote:
> > > I did 'dd if=/dev/zero of=test bs=1024k seek=2G' in a 10Gb ide disk, and
> > > guess what ?
> > >
> > > $ ls -l test
> > > -rw-r--r-- 1 huma huma 2251799813685248 Nov 15 02:39 test
> > > $ ls -lh test
> > > -rw-r--r-- 1 huma huma 2.0P Nov 15 02:39 test
> >
> > No, that in itself is fine - it is a sparse file, with a single 1MB block
> > at 2PB offset. If you were to "du" this file, it would say 1MB of allocated
> > space. The problem is that this _should_ be impossible to create on ext2,
> > because the write would be way past the allowed file size limit.
> >
> > > After that, i unmounted the partition and did an fsck, lots of errors and
> > > several files corrupted that fsck ask me to delete because some inodes had
> > > illegal blocks.
> >
> > That is really bad, I don't know how it would happen. Maybe there is
> > overflow internal to ext2, which causes it to write elsewhere in the fs?
> > When was the last time (previous to this problem) you fsck'd this fs?
>
> If he's _allowed_ to create a sparse file with impossible offset - what
> happens to the file's index blocks? I guess that's where something
> overflowed.
I think the problem is not coming from the llseek+write, but maybe from
ftruncate? Strace doesn't show any writes for me (only failed llseek +
lots of reads), yet when trying to create files > 4TB I get "block > big"
and > 8TB I get "block < 0" messages, which come from ext2_block_to_path().
In a couple of places (iblock, offsets) we use an int/long to store the
block counts; I don't know why we want a signed value here instead of an
unsigned (long). Changing the block numbers to unsigned longs looks like
it goes into the guts of getblk and such. Ugh.
Maybe also sys_truncate should disallow truncating to a size larger
than s_maxbytes. Al? For now, returning EOVERFLOW from do_truncate()
when (length > inode->i_sb->s_maxbytes) should be OK.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
> Maybe also sys_truncate should disallow truncating to a size larger
> than s_maxbytes. Al? For now, returning EOVERFLOW from do_truncate()
> when (length > inode->i_sb->s_maxbytes) should be OK.
It should do in the last patches I sent Linus.