2020-08-10 12:38:38

by Maciej Jablonski

Subject: Re: libext2fs: mkfs.ext3 really slow on centos 8.2

Hi,

On upgrading from CentOS 7.6 to CentOS 8.2, mkfs slowed down by more than
an order of magnitude.

E.g. formatting a 35GB partition went from under 8s to 4m+ on the same host.

Most time is spent on writing the journal to the disk.

strace shows that each block is zeroed with its own fallocate call, and
each invocation of fallocate takes about 10ms, which of course accumulates.
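
As a rough cross-check of that figure, the gap between each fallocate line and the next syscall in the strace excerpt quoted further down in this message can be computed directly. The following is a minimal Python sketch, not part of the original report. The helper names parse_events and fallocate_gaps_us are made up for illustration, and it assumes the timestamps mark syscall entry, so each gap is only an upper bound on the time spent inside fallocate.

```python
import re

# fallocate/pwrite64 pairs copied from the strace excerpt in this thread;
# the getuid/prctl calls between the pairs are left out.
TRACE = r'''
16:19:49.827112 fallocate(3, FALLOC_FL_ZERO_RANGE, 3383296, 4096) = 0
16:19:49.835203 pwrite64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 3362816) = 4096
16:19:49.835657 fallocate(3, FALLOC_FL_ZERO_RANGE, 3387392, 4096) = 0
16:19:49.843471 pwrite64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 3366912) = 4096
16:19:49.843836 fallocate(3, FALLOC_FL_ZERO_RANGE, 3391488, 4096) = 0
16:19:49.851885 pwrite64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 3371008) = 4096
'''

# HH:MM:SS.micros followed by the syscall name and an opening parenthesis.
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{6}) (\w+)\(")


def parse_events(trace):
    """Return (timestamp in microseconds since midnight, syscall name) pairs."""
    events = []
    for line in trace.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            hours, minutes, seconds, micros = (int(g) for g in match.groups()[:4])
            stamp = ((hours * 60 + minutes) * 60 + seconds) * 1_000_000 + micros
            events.append((stamp, match.group(5)))
    return events


def fallocate_gaps_us(trace):
    """Microseconds from each fallocate line to the line that follows it."""
    events = parse_events(trace)
    return [
        following - start
        for (start, name), (following, _) in zip(events, events[1:])
        if name == "fallocate"
    ]


gaps = fallocate_gaps_us(TRACE)
mean_ms = sum(gaps) / len(gaps) / 1000
print(gaps, round(mean_ms, 2))  # gaps are 8091, 7814 and 8049 microseconds
```

On these three samples the gap is roughly 8ms per call. At that rate 4 minutes corresponds to about 30,000 calls, i.e. on the order of 117 MiB zeroed 4 KiB at a time. The sketch ignores traces that cross midnight.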

We have found that setting

UNIX_IO_NOZEROOUT=1 in the environment, which affects libext2fs,

brings the timings back down to seconds.
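
A minimal Python sketch of applying that workaround from a wrapper script, assuming UNIX_IO_NOZEROOUT is read from the environment of the mkfs process, as the VAR=1 form above suggests. The function names and the /dev/sdX1 device path are placeholders for illustration only, and nothing is formatted unless run_mkfs is called explicitly.

```python
import os
import subprocess


def nozeroout_env(base=None):
    """Copy an environment mapping and add UNIX_IO_NOZEROOUT=1 to it."""
    env = dict(os.environ if base is None else base)
    env["UNIX_IO_NOZEROOUT"] = "1"
    return env


def mkfs_command(device, fstype="ext3"):
    """Build the argument list, e.g. ['mkfs.ext3', '/dev/sdX1']."""
    return ["mkfs." + fstype, device]


def run_mkfs(device, fstype="ext3"):
    """Run mkfs with the workaround set. This destroys data on `device`."""
    subprocess.run(mkfs_command(device, fstype), env=nozeroout_env(), check=True)


if __name__ == "__main__":
    # Only show what would be run; /dev/sdX1 is a hypothetical device path.
    print("UNIX_IO_NOZEROOUT=1", " ".join(mkfs_command("/dev/sdX1")))
```

Passing the variable through env= keeps it scoped to the one mkfs invocation instead of exporting it for the whole install session.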

If this is not a known bug, I can send more details.

It looks like calling fallocate for each block is very inefficient on some systems.
In our case this is a Dell R640 (Skylake) with a mechanical disk.

Kind Regards,

Maciej


On Mon, 10 Aug 2020 at 13:35, Maciej Jablonski <[email protected]> wrote:
>
> Hi,
>
> On upgrading from centos 7.6 to centos 8.2 mkfs slowed down by orders of magnitude.
>
> e.g. 35GB partition from under 8s to 4m+ on the same host.
>
> Most time is spent on writing the journal to the disk.
>
> strace shows the following:
>
> 16:19:49.827056 prctl(PR_GET_DUMPABLE) = 1 (SUID_DUMP_USER)
> 16:19:49.827112 fallocate(3, FALLOC_FL_ZERO_RANGE, 3383296, 4096) = 0
> 16:19:49.835203 pwrite64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 3362816) = 4096
> 16:19:49.835321 getuid() = 0
> 16:19:49.835403 geteuid() = 0
> 16:19:49.835463 getgid() = 0
> 16:19:49.835513 getegid() = 0
> 16:19:49.835582 prctl(PR_GET_DUMPABLE) = 1 (SUID_DUMP_USER)
> 16:19:49.835657 fallocate(3, FALLOC_FL_ZERO_RANGE, 3387392, 4096) = 0
> 16:19:49.843471 pwrite64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 3366912) = 4096
> 16:19:49.843562 getuid() = 0
> 16:19:49.843619 geteuid() = 0
> 16:19:49.843669 getgid() = 0
> 16:19:49.843715 getegid() = 0
> 16:19:49.843785 prctl(PR_GET_DUMPABLE) = 1 (SUID_DUMP_USER)
> 16:19:49.843836 fallocate(3, FALLOC_FL_ZERO_RANGE, 3391488, 4096) = 0
> 16:19:49.851885 pwrite64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 3371008) = 4096
>
>
> Each invocation of fallocate takes 10ms, this accumulates of course.
> We have found that using
>
> UNIX_IO_NOZEROOUT=1 to affect libext2fs
>
> Brings the timings back in line down to seconds.
>
> If this is not a known bug I can send more details,
>
> Looks that calling fallocate for each block is very inefficient on some system.
> In our case this is dellr640 (skylake) with a mechanical disk.
>
> Kind Regards,
>
> Maciej
>
>


2020-08-12 22:45:41

by Andreas Dilger

Subject: Re: libext2fs: mkfs.ext3 really slow on centos 8.2

On Aug 10, 2020, at 6:37 AM, Maciej Jablonski <[email protected]> wrote:
>
> Hi,
>
> On upgrading from centos 7.6 to centos 8.2 mkfs slowed down by orders
> of magnitude.
>
> e.g. 35GB partition from under 8s to 4m+ on the same host.
>
> Most time is spent on writing the journal to the disk.
>
> strace shows the following:
>
> We have got strace which shows that each each block is zeroed with
> fallocate and each
> invocation of fallocate takes 10ms, this accumulates of course.

Do you really need to use mkfs.ext3, or can you use mkfs.ext4 and
mount the filesystem as type ext4? Then you can use the "flexbg"
feature and it will not only speed up mkfs but also many other
normal operations (e.g. mount, e2fsck, allocation, etc).

Cheers, Andreas



2020-08-12 23:14:48

by Reindl Harald

Subject: Re: libext2fs: mkfs.ext3 really slow on centos 8.2



Am 13.08.20 um 00:45 schrieb Andreas Dilger:
> On Aug 10, 2020, at 6:37 AM, Maciej Jablonski <[email protected]> wrote:
>> On upgrading from centos 7.6 to centos 8.2 mkfs slowed down by orders
>> of magnitude.
>>
>> e.g. 35GB partition from under 8s to 4m+ on the same host.
>>
>> Most time is spent on writing the journal to the disk.
>>
>> strace shows the following:
>>
>> We have got strace which shows that each each block is zeroed with
>> fallocate and each
>> invocation of fallocate takes 10ms, this accumulates of course.
>
> Do you really need to use mkfs.ext3, or can you use mkfs.ext4 and
> mount the filesystem as type ext4? Then you can use the "flexbg"
> feature and it will not only speed up mkfs but also many other
> normal operations (e.g. mount, e2fsck, allocation, etc)

typo: it's "flex_bg", and it is enabled by default (filesystem created
Sun Aug 9 13:24:15 2020):

Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum

ext3 is something of the past for a full decade now

2020-08-14 10:34:41

by Maciej Jablonski

Subject: Re: libext2fs: mkfs.ext3 really slow on centos 8.2

On Thu, 13 Aug 2020 at 00:14, Reindl Harald <[email protected]> wrote:
>
>
>
> Am 13.08.20 um 00:45 schrieb Andreas Dilger:
> > On Aug 10, 2020, at 6:37 AM, Maciej Jablonski <[email protected]> wrote:
> >> On upgrading from centos 7.6 to centos 8.2 mkfs slowed down by orders
> >> of magnitude.
> >>
> >> e.g. 35GB partition from under 8s to 4m+ on the same host.
> >>
> >> Most time is spent on writing the journal to the disk.
> >>
> >> strace shows the following:
> >>
> >> We have got strace which shows that each each block is zeroed with
> >> fallocate and each
> >> invocation of fallocate takes 10ms, this accumulates of course.
> >
> > Do you really need to use mkfs.ext3, or can you use mkfs.ext4 and
> > mount the filesystem as type ext4? Then you can use the "flexbg"
> > feature and it will not only speed up mkfs but also many other
> > normal operations (e.g. mount, e2fsck, allocation, etc)
>
> typo: it's "flex_bg" and enabled by default (Filesystem created: Sun Aug
> 9 13:24:15 2020)
>
> Filesystem features: has_journal ext_attr resize_inode dir_index
> filetype needs_recovery extent 64bit flex_bg sparse_super large_file
> huge_file dir_nlink extra_isize metadata_csum
>
> ext3 is something of the past for a full decade now

Hi Andreas,

Thanks for the insights.

A bit of background on the circumstances in which we see the problem,
and how widely:

We run the stock OS installers of a wide range of Linux distros (all
supported versions of RHEL, Ubuntu, Debian and SLES) on bare-metal
machines, probably 60 different models in total from the major brands,
spanning some 5 past generations up to the current one.

We have noticed that installs of recent distro releases on recent
hardware are significantly slower, e.g. going from RHEL 8.0 to RHEL 8.2
the install time went up from 40 minutes to 90 (300GB partition
with default ext3 journal entries) on Dell R640 and Dell R630
machines. We confirmed that at least some of the slowdown is
down to mkfs, as I mentioned. Note that on other (older) machines
there is seemingly no difference; we only suspect this might be down
to the disk controller.

We have historically been pegged to ext3; however, it now looks worth
reconsidering ext4.

Thanks,
Maciej

2020-08-14 11:42:13

by Reindl Harald

Subject: Re: libext2fs: mkfs.ext3 really slow on centos 8.2



Am 14.08.20 um 12:33 schrieb Maciej Jablonski:
> We have been historically pegged to ext3, however, it now looks worth
> to reconsider ext4.

all our virtual servers were installed in 2008 with Fedora 9, at
that time on ext3

* 2009 we migrated all data disks to ext4
* https://fedoraproject.org/wiki/Dracut

after dracut came into the mix we were also able to convert all rootfs to
ext4 and just reboot

never looked back to ext3

with recent kernels you most likely don't have an ext3 driver at all,
it's the ext4 driver that handles ext3 just fine

tune2fs 1.45.5 (07-Jan-2020)
Filesystem volume name: system
Last mounted on: /
Filesystem UUID: 918f24a7-bc8e-4da5-8a23-8800d5104421
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file uninit_bg dir_nlink
Filesystem flags: signed_directory_hash
Default mount options: journal_data_writeback user_xattr acl nobarrier
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 393216
Block count: 1572354
Reserved block count: 2
Free blocks: 1350883
Free inodes: 366435
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 383
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Filesystem created: Mon Aug 18 06:48:05 2008
Last mount time: Tue Aug 11 02:59:54 2020
Last write time: Fri Aug 14 02:09:32 2020
Mount count: 12
Maximum mount count: 30
Last checked: Thu Aug 6 21:48:28 2020
Check interval: 31104000 (12 months)
Next check after: Sun Aug 1 21:48:28 2021
Lifetime writes: 1483 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Journal inode: 8
First orphan inode: 32820
Default directory hash: half_md4
Directory Hash Seed: 1e9d689f-15fe-4c0d-aaba-9d323049c7f4
Journal backup: inode blocks
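
To check the same thing on another machine, the feature list can be pulled out of this kind of output programmatically. The following is a minimal Python sketch, assuming the "Filesystem features:" entry sits on a single line as tune2fs prints it; the function name filesystem_features is made up for illustration and the sample text is copied from the listing above.

```python
def filesystem_features(tune2fs_output):
    """Return the set of feature names from `tune2fs -l` style output."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()


# Two entries from the listing above, with the feature list on one line.
SAMPLE = (
    "Filesystem features: has_journal ext_attr resize_inode dir_index "
    "filetype needs_recovery extent flex_bg sparse_super large_file "
    "uninit_bg dir_nlink\n"
    "Filesystem created: Mon Aug 18 06:48:05 2008\n"
)

features = filesystem_features(SAMPLE)

# extent and flex_bg are features associated with ext4; finding them on a
# filesystem created in 2008 fits the in-place conversion described above.
print(sorted(features & {"extent", "flex_bg"}))
```

For a live check, the text would come from running tune2fs -l on the device in question, which needs sufficient privileges to read it.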