Hi,
I suspect this is a FAQ already, but I could not find[0] it:
does ext4 really use 5% of available space for internal housekeeping?
After formatting and mounting a ~917GB partition I see:
# df -h /mnt/bench/
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 917G 200M 871G 1% /mnt/bench
The "200 MB used" figure seems sensible, but the difference between
"size" and "available" is really 46 GB. How come?
Thanks,
Christian.
[0] http://ext4.wiki.kernel.org/index.php/Frequently_Asked_Questions
PS: a few more details on how the filesystem was created and mounted:
# /opt/e2fsprogs/sbin/mkfs.ext4 -V
mke2fs 1.41.0 (10-Jul-2008)
Using EXT2FS Library version 1.41.0
# /opt/e2fsprogs/sbin/mkfs.ext4 /dev/sdb1
[...]
# /opt/e2fsprogs/sbin/tune2fs -E test_fs /dev/sdb1
tune2fs 1.41.0 (10-Jul-2008)
Setting test filesystem flag
# /opt/e2fsprogs/sbin/blkid /dev/sdb1
/dev/sdb1: UUID="cb2892a3-e42f-47c1-930b-6adab0cf023f" TYPE="ext4dev"
# /opt/e2fsprogs/sbin/tune2fs -l /dev/sdb1 | egrep 'features|flags'
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash test_filesystem
# mount -t ext4dev /dev/sdb1 /mnt/bench
# dmesg | tail
[35021.126539] kjournald2 starting. Commit interval 5 seconds
[35021.126963] EXT4 FS on sdb1, internal journal
[35021.126963] EXT4-fs: mounted filesystem with ordered data mode.
[35021.126963] EXT4-fs: file extents enabled
[35021.307115] EXT4-fs: mballoc enabled
# grep sdb1 /proc/mounts
/dev/sdb1 /mnt/bench ext4dev rw,barrier=1,noextents,nomballoc,data=ordered 0 0
Christian Kujau wrote:
> # /opt/e2fsprogs/sbin/mkfs.ext4 /dev/sdb1
> [...]
Somewhere in the [...] was:
XXX blocks (5.00%) reserved for the super user
This can be tuned, and maybe it should actually be on a sliding scale for
larger filesystems.
-Eric
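As a rough sanity check of the 5% figure against the df output quoted at the start of the thread, the arithmetic can be sketched in shell. The numbers are copied from that df -h line; the variable names are made up for illustration, and because df -h rounds its figures the result is only approximate.

```shell
# Figures from the df -h output in the original report (units as printed by df -h).
size_gb=917
avail_gb=871
reserved_pct=5   # mke2fs reported "(5.00%) reserved for the super user"

# Work in tenths of a unit so plain integer arithmetic is enough.
reserved_tenths=$(( size_gb * 10 * reserved_pct / 100 ))
gap_tenths=$(( (size_gb - avail_gb) * 10 ))

echo "5% of size:   $(( reserved_tenths / 10 )).$(( reserved_tenths % 10 ))G"
echo "size - avail: $(( gap_tenths / 10 )).$(( gap_tenths % 10 ))G"
```

The two values land within a fraction of a gigabyte of each other; the remainder is plausibly the 200M already in use plus rounding.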
On Mon, Jul 14, 2008 at 01:04:20AM +0200, Christian Kujau wrote:
> Hi,
>
> I suspect this being a FAQ already, but I could not find[0] it:
> does ext4 really use 5% of available space for internal housekeeping?
> After formatting and mounting a ~917GB partition I see:
>
5% of your space is being reserved for root. You can disable this with
the "-m" argument to mkfs.
regards, Kyle
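For illustration: the reserve can be set at creation time with the -m option of mkfs.ext4, or changed later on an existing filesystem with tune2fs -m. The sketch below only prints the commands instead of running them, since mkfs destroys existing data. The device name is the one from the thread, and the 1% value is an arbitrary assumption, not a recommendation.

```shell
# Dry run: build and print the commands rather than executing them.
dev=/dev/sdb1      # device from the original report
pct=1              # hypothetical reserve percentage; 0 disables the reserve

mkfs_cmd="mkfs.ext4 -m $pct $dev"   # sets the reserve when creating the filesystem
tune_cmd="tune2fs -m $pct $dev"     # changes the reserve on an existing filesystem

echo "at creation: $mkfs_cmd"
echo "afterwards:  $tune_cmd"
```

After changing the value, "tune2fs -l" on the device lists the reserved block count, which can be used to confirm the new setting.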
On Mon, July 14, 2008 01:45, Kyle McMartin wrote:
> 5% of your space is being reserved for root. You can disable this with
> the "-m" argument to mkfs.
Ah, the reserve for root, of course. It did cross my mind that this was
where the 5% came from and if I had read the mkfs printout more carefully
I'd have seen it (thanks, Eric!). I shall use -m to specify a different
value then.
With filesystems getting bigger and bigger, values like "5% of the
available diskspace" are actually becoming more and more visible. Although
they shouldn't, as diskspace gets cheaper and cheaper :-)
Thanks!
Christian.
On Mon, Jul 14, 2008 at 11:49:42AM +0200, Christian Kujau wrote:
> On Mon, July 14, 2008 01:45, Kyle McMartin wrote:
> > 5% of your space is being reserved for root. You can disable this with
> > the "-m" argument to mkfs.
>
> Ah, the reserve for root, of course. It did cross my mind that this was
> where the 5% came from and if I had read the mkfs printout more carefully
> I'd have seen it (thanks, Eric!). I shall use -m to specify a different
> value then.
>
> With filesystems getting bigger and bigger, values like "5% of the
> available diskspace" are actually becoming more and more visible. Although
> they shouldn't, as diskspace gets cheaper and cheaper :-)
There are two reasons for reserving 5% of the available disk space. One is
an emergency reserve for root, for silly things like /var/log, etc.; a
good argument can be made that this should be adjusted down as disks get
bigger. The other is that as the disk gets fuller, the fragmentation
resistance of a traditional BSD FFS-style cylinder group algorithm breaks
down, and so the block layout and resulting performance of the filesystem
get quite bad.
Both of these problems were cited in the "Design and Implementation of
the BSD Fast Filesystem" paper, if I recall correctly, and the BSD FFS
actually reserves 10%; we reduced it to 5% for ext2.
Can we safely reduce it further for ext4? With delayed allocation,
ext4 does have much better fragmentation resistance for allocating
blocks for a specific file. However, placement of files in the same
directory will break down as the disk fills up. That tends to happen
anyway, so it's probably not a big deal.
For people who care much more about available space than about the
resulting performance if the filesystem gets to 100% full, changing the
tuning parameter is definitely a good thing. In terms of what the default
should be, there probably is a potential OLS/Linux Kongress/LCA paper
here for someone who wants to study the fragmentation resistance of
various filesystems at different levels of filesystem utilization.
(I.e., age a filesystem with a system trace which keeps the filesystem
at a certain average utilization level; repeat at different levels of
utilization and for different filesystem types, and graph the "layout
score" as defined by [1] for each filesystem type and utilization
level.) That would be a pretty simple undergraduate thesis or graduate
student paper...
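To make the "layout score" idea concrete, here is a minimal sketch. It assumes that the score of a file is the fraction of its blocks, not counting the first, that sit directly after the previous block of the same file on disk; that reading of [1] is an assumption here, and the block numbers are invented for the example.

```shell
# Hypothetical on-disk block numbers of one file, in file order.
blocks="100 101 102 200 201 350"

prev=""
contiguous=0   # blocks that directly follow their predecessor
counted=0      # blocks considered (every block except the first)

for b in $blocks; do
  if [ -n "$prev" ]; then
    counted=$(( counted + 1 ))
    if [ "$b" -eq $(( prev + 1 )) ]; then
      contiguous=$(( contiguous + 1 ))
    fi
  fi
  prev=$b
done

# Express the score as an integer percentage.
score_pct=$(( contiguous * 100 / counted ))
echo "layout score: ${score_pct}%"
```

Under this assumption a perfectly contiguous file scores 100%, and the study described above would plot such scores against filesystem utilization.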
The other observation I would make is that I should add a tuning knob
to /etc/mke2fs.conf so system administrators can adjust the default to
their liking much more easily. That should be pretty straightforward
to add.
- Ted
[1] K. Smith and M. Seltzer, "A Comparison of FFS Disk Allocation
Algorithms", Proceedings of the USENIX 1996 Annual Technical Conference
(January 1996), pp. 15-25