2005-03-21 23:02:31

by jniehof

Subject: LBD/filesystems over 2TB: is it safe?

Someone posted to the LBD list last December regarding some supposedly
horrible bugs in large filesystems:
https://www.gelato.unsw.edu.au/archives/lbd/2004-December/000075.html
https://www.gelato.unsw.edu.au/archives/lbd/2004-December/000074.html

(I will admit that these were brought to my attention by that paragon of
fact-checking, a slashdot comment...)

I haven't found anything else online regarding these issues. Our initial
tests seem to indicate that it's possible to fill a 2.5TB ext3 filesystem
without corrupting the data or metadata, and as far as I and my colleagues
can understand the code it doesn't look too bad. But, before I start
loading all our production data up, I'd like to feel confident. Does this
poster know what he's talking about? Is there, or was there, any real
issue?

Running x86-32 using kernel 2.6.8 (from Debian sarge), although I can always
roll my own if necessary. Preferred filesystem would be ext3, and I
anticipate no need to grow beyond the initial 2.5TB.


2005-03-22 00:31:22

by Peter Chubb

Subject: Re: LBD/filesystems over 2TB: is it safe?

>>>>> "jniehof" == jniehof <[email protected]> writes:

jniehof> Someone posted to the LBD list last December regarding some
jniehof> supposedly horrible bugs in large filesystems:
jniehof> https://www.gelato.unsw.edu.au/archives/lbd/2004-December/000075.html
jniehof> https://www.gelato.unsw.edu.au/archives/lbd/2004-December/000074.html

The changes in those emails are irrelevant: they fail to take into
account the properties of the filesystems that they modify, which
guarantee that the 32-bit quantities being shifted will not overflow.

They're typically of the form:
- iblock = index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
+ iblock = (sector_t) index << (PAGE_CACHE_SHIFT - inode->i_blkbits);

Now, on a 32-bit processor with 4k pages, PAGE_CACHE_SHIFT is 12, and
i_blkbits is also 12 if you're using 4k blocks (which you need in order
to get a large filesystem). The shift count is therefore zero, so the
cast changes nothing and the original code is safe. The on-disk format
for ext[23] uses 32-bit block numbers, so your maximum filesystem size
is 16TB (2^32 blocks of 4k each), and your maximum value of iblock is
2^32-1.
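
The arithmetic above can be checked with a small userspace sketch. It is
not kernel code: PAGE_SHIFT_4K and every helper name below are
hypothetical, chosen for illustration, and it assumes 4k pages and
32-bit on-disk block numbers as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page shift for x86-32 with 4k pages (illustrative constant,
 * standing in for PAGE_CACHE_SHIFT). */
#define PAGE_SHIFT_4K 12u

/* Shift applied to a page index to obtain a filesystem block number. */
static unsigned block_shift(unsigned blkbits)
{
    assert(blkbits <= PAGE_SHIFT_4K);
    return PAGE_SHIFT_4K - blkbits;
}

/* Largest size, in bytes, addressable with 32-bit block numbers. */
static uint64_t max_fs_bytes(unsigned blkbits)
{
    return (UINT64_C(1) << 32) << blkbits;
}

/* Largest page index whose block number still fits in 32 bits. */
static uint32_t max_page_index(unsigned blkbits)
{
    return UINT32_MAX >> block_shift(blkbits);
}

/* Unpatched form: the shift is done in 32 bits. */
static uint32_t iblock32(uint32_t index, unsigned blkbits)
{
    return index << block_shift(blkbits);
}

/* Patched form: widen to 64 bits before shifting. */
static uint64_t iblock64(uint32_t index, unsigned blkbits)
{
    return (uint64_t)index << block_shift(blkbits);
}
```

With 4k blocks block_shift(12) is 0, so both forms return the index
unchanged. With smaller blocks the shift is non-zero, but the 32-bit
block-number limit caps the page index at max_page_index(), so the two
forms still agree for every index the filesystem can actually hold.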

Please do benchmark XFS and ext3 on your system before choosing. Our
tests (to be presented at Linux.Conf.Au next month) show that XFS is
significantly faster for some workloads. Its scalability to very large
filesystems is also much more mature than ext3's.

--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
The technical we do immediately, the political takes *forever*

2005-03-22 08:17:23

by Brad Campbell

Subject: Re: LBD/filesystems over 2TB: is it safe?

[email protected] wrote:

> Running x86-32 using kernel 2.6.8 (from Debian sarge), although I can always
> roll my own if necessary. Preferred filesystem would be ext3, and I
> anticipate no need to grow beyond the initial 2.5TB.

I'm running 2.1TB and 3TB filesystems on ext3 here. It's probably not fast
or optimal, but it has been solidly reliable. The 2.1TB filesystem has been
running since May 2004 with a reasonable workload. The 3TB filesystem is
only 4 weeks old, but has been beaten pretty hard during burn-in testing.

Both are on x86-32: the 2.1TB filesystem has run under 2.6.[5 6 9 10], and
the 3TB one under 2.6.11-bk kernels.

Both filesystems have been filled to capacity during testing and real use.
I also unmount them and run e2fsck -f on them weekly, just for a laugh.
Never a hitch.

Regards,
Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams