2022-07-28 10:07:42

by Jan Kara

[permalink] [raw]
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hello!

On Mon 18-07-22 15:29:47, Stefan Wahren wrote:
> i noticed that since Linux 5.18 (Linux 5.19-rc6 is still affected) i'm
> unable to run "rpi-update" without massive performance regression on my
> Raspberry Pi 4 (multi_v7_defconfig + CONFIG_ARM_LPAE). Using Linux 5.17 this
> tool successfully downloads the latest firmware (> 100 MB) on my development
> micro SD card (Kingston 16 GB Industrial) with a ext4 filesystem within ~ 1
> min. The same scenario on Linux 5.18 shows the following symptoms:

Thanks for report and the bisection!

> - download takes endlessly much time and leads to an abort by userspace in
> most cases because of the poor performance
> - massive system load during download even after download has been aborted
> (heartbeat LED goes wild)

OK, is it that the CPU is busy or are we waiting on the storage card?
Observing top(1) for a while should be enough to get the idea. (sorry, I'm
not very familiar with RPi so I'm not sure what heartbeat LED shows).

> - whole system becomes nearly unresponsive
> - system load goes back to normal after > 10 min

So what likely happens is that the downloaded data is in the pagecache and
what is causing the stuckage is that we are writing it back to the SD card
that somehow is much less efficient with mb_optimize_scan=1 for your setup.
Even if you stop the download, we still have dirty data in the page cache
which we need to write out so that is the reason why the system takes so
long to return back to normal.

> - dmesg doesn't show anything suspicious
>
> I was able to bisect this issue:
>
> ff042f4a9b050895a42cae893cc01fa2ca81b95c good
> 4b0986a3613c92f4ec1bdc7f60ec66fea135991f bad
> 25fd2d41b505d0640bdfe67aa77c549de2d3c18a bad
> b4bc93bd76d4da32600795cd323c971f00a2e788 bad
> 3fe2f7446f1e029b220f7f650df6d138f91651f2 bad
> b080cee72ef355669cbc52ff55dc513d37433600 good
> ad9c6ee642a61adae93dfa35582b5af16dc5173a good
> 9b03992f0c88baef524842e411fbdc147780dd5d bad
> aab4ed5816acc0af8cce2680880419cd64982b1d good
> 14705fda8f6273501930dfe1d679ad4bec209f52 good
> 5c93e8ecd5bd3bfdee013b6da0850357eb6ca4d8 good
> 8cb5a30372ef5cf2b1d258fce1711d80f834740a bad
> 077d0c2c78df6f7260cdd015a991327efa44d8ad bad
> cc5095747edfb054ca2068d01af20be3fcc3634f good
> 27b38686a3bb601db48901dbc4e2fc5d77ffa2c1 good
>
> commit 077d0c2c78df6f7260cdd015a991327efa44d8ad
> Author: Ojaswin Mujoo <[email protected]>
> Date:?? Tue Mar 8 15:22:01 2022 +0530
>
> ext4: make mb_optimize_scan performance mount option work with extents
>
> If i revert this commit with Linux 5.19-rc6 the performance regression
> disappears.
>
> Please ask if you need more information.

Can you run "iostat -x 1" while the download is running so that we can see
roughly how the IO pattern looks?

Also can get filesystem metadata image of your card like:
e2image -r <fs-device> - | gzip >/tmp/ext4-image.gz

and put it somewhere for download? The image will contain only fs metadata,
not data so it should be relatively small and we won't have access to your
secrets ;). With the image we'd be able to see how the free space looks
like and whether it perhaps does not trigger some pathological behavior.

My current suspicion is that because the new allocator strategy spreads
allocations over more block groups, we end up with more open erase blocks
on the SD card which forces the firmware to do more garbage collection and
RMW of erase blocks and write performance tanks...

Thanks.
Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR


2022-07-29 05:36:29

by Stefan Wahren

[permalink] [raw]
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi Jan,

Am 28.07.22 um 12:00 schrieb Jan Kara:
> Hello!
> Can you run "iostat -x 1" while the download is running so that we can see
> roughly how the IO pattern looks?
>
> Also can get filesystem metadata image of your card like:
> e2image -r <fs-device> - | gzip >/tmp/ext4-image.gz
>
> and put it somewhere for download? The image will contain only fs metadata,
> not data so it should be relatively small and we won't have access to your
> secrets ;). With the image we'd be able to see how the free space looks
> like and whether it perhaps does not trigger some pathological behavior.
>
> My current suspicion is that because the new allocator strategy spreads
> allocations over more block groups, we end up with more open erase blocks
> on the SD card which forces the firmware to do more garbage collection and
> RMW of erase blocks and write performance tanks...
>
> Thanks.
thanks for your feedback. Unfortunately i'm busy the next days, so it
will take some time to provide this.
> Honza
>

2022-07-31 20:45:09

by Stefan Wahren

[permalink] [raw]
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi Jan,

Am 28.07.22 um 12:00 schrieb Jan Kara:
>
> Also can get filesystem metadata image of your card like:
> e2image -r <fs-device> - | gzip >/tmp/ext4-image.gz
>
> and put it somewhere for download? The image will contain only fs metadata,
> not data so it should be relatively small and we won't have access to your
> secrets ;). With the image we'd be able to see how the free space looks
> like and whether it perhaps does not trigger some pathological behavior.
i've problems with this. If i try store uncompressed the metadata of the
second SD card partition (/dev/sdb2 = rootfs) the generated image file
is nearly big as the whole partition. In compressed state it's 25 MB. Is
this expected?

2022-08-06 15:25:47

by Stefan Wahren

[permalink] [raw]
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi,

Am 31.07.22 um 22:42 schrieb Stefan Wahren:
> Hi Jan,
>
> Am 28.07.22 um 12:00 schrieb Jan Kara:
>>
>> Also can get filesystem metadata image of your card like:
>>    e2image -r <fs-device> - | gzip >/tmp/ext4-image.gz
>>
>> and put it somewhere for download? The image will contain only fs
>> metadata,
>> not data so it should be relatively small and we won't have access to
>> your
>> secrets ;). With the image we'd be able to see how the free space looks
>> like and whether it perhaps does not trigger some pathological behavior.
> i've problems with this. If i try store uncompressed the metadata of
> the second SD card partition (/dev/sdb2 = rootfs) the generated image
> file is nearly big as the whole partition. In compressed state it's 25
> MB. Is this expected?

This performance regression is also reproducible with 5.19 kernel
(arm64, defconfig) and 64-bit Raspberry Pi OS. Unfortunately the problem
with metadata generation is the same, the generated uncompressed file is
15 GB.

2022-08-14 10:12:29

by Stefan Wahren

[permalink] [raw]
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi,

Am 06.08.22 um 17:23 schrieb Stefan Wahren:
> Hi,
>
> Am 31.07.22 um 22:42 schrieb Stefan Wahren:
>> Hi Jan,
>>
>> Am 28.07.22 um 12:00 schrieb Jan Kara:
>>>
>>> Also can get filesystem metadata image of your card like:
>>>    e2image -r <fs-device> - | gzip >/tmp/ext4-image.gz
>>>
>>> and put it somewhere for download? The image will contain only fs
>>> metadata,
>>> not data so it should be relatively small and we won't have access
>>> to your
>>> secrets ;). With the image we'd be able to see how the free space looks
>>> like and whether it perhaps does not trigger some pathological
>>> behavior.
>> i've problems with this. If i try store uncompressed the metadata of
>> the second SD card partition (/dev/sdb2 = rootfs) the generated image
>> file is nearly big as the whole partition. In compressed state it's
>> 25 MB. Is this expected?
>
> This performance regression is also reproducible with 5.19 kernel
> (arm64, defconfig) and 64-bit Raspberry Pi OS. Unfortunately the
> problem with metadata generation is the same, the generated
> uncompressed file is 15 GB.
>
recently i upgraded my build machine (Intel Core i7-1260P) which now
runs Ubuntu 22.04 including a recent 5.15 kernel. On my build machine if
i try to install my build kernel modules on the mentioned Industrial
Kingston SD card 16 GB (SDCIT) and call "sync" immediately, the process
will takes very long with recent LTS kernel (~ 5 minutes) and trigger a
hung task warning:

[ 1692.880208] INFO: task sync:22272 blocked for more than 120 seconds.
[ 1692.880222]       Not tainted 5.15.0-46-generic #49-Ubuntu
[ 1692.880225] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1692.880228] task:sync            state:D stack:    0 pid:22272 ppid: 
5238 flags:0x00004002
[ 1692.880237] Call Trace:
[ 1692.880240]  <TASK>
[ 1692.880246]  __schedule+0x23d/0x590
[ 1692.880257]  schedule+0x4e/0xc0
[ 1692.880261]  wb_wait_for_completion+0x59/0x90
[ 1692.880269]  ? wait_woken+0x70/0x70
[ 1692.880275]  sync_inodes_sb+0xbe/0x100
[ 1692.880282]  sync_inodes_one_sb+0x19/0x20
[ 1692.880288]  iterate_supers+0xab/0x110
[ 1692.880294]  ? __x64_sys_tee+0xe0/0xe0
[ 1692.880300]  ksys_sync+0x42/0xa0
[ 1692.880304]  __do_sys_sync+0xe/0x20
[ 1692.880307]  do_syscall_64+0x59/0xc0
[ 1692.880312]  ? do_user_addr_fault+0x1e7/0x670
[ 1692.880319]  ? syscall_exit_to_user_mode+0x27/0x50
[ 1692.880324]  ? exit_to_user_mode_prepare+0x37/0xb0
[ 1692.880331]  ? irqentry_exit_to_user_mode+0x9/0x20
[ 1692.880336]  ? irqentry_exit+0x1d/0x30
[ 1692.880340]  ? exc_page_fault+0x89/0x170
[ 1692.880345]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1692.880353] RIP: 0033:0x7fcbda436abb
[ 1692.880358] RSP: 002b:00007ffc94923968 EFLAGS: 00000246 ORIG_RAX:
00000000000000a2
[ 1692.880363] RAX: ffffffffffffffda RBX: 00007ffc94923b48 RCX:
00007fcbda436abb
[ 1692.880366] RDX: 00007fcbda53c101 RSI: 00007ffc94923b48 RDI:
00007fcbda4f4e29
[ 1692.880369] RBP: 0000000000000001 R08: 0000000000000001 R09:
0000000000000000
[ 1692.880371] R10: 000055ca76fb4340 R11: 0000000000000246 R12:
000055ca752c3bc0
[ 1692.880373] R13: 000055ca752c119f R14: 00007fcbda53442c R15:
000055ca752c1034
[ 1692.880379]  </TASK>

Interestingly if i switch to the older ubuntu kernel version 5.15.25
(which shouldn't have "ext4: make mb_optimize_scan performance mount
option work with extents" applied) the write process is much faster (~ 1
minute) and i never saw the hung task.

Btw i setup a repo [1] to collect information about this issue. The
first file adds a kernel log in regression case. The kernel was 5.19-rc6
with multi_v7_defconfig and ARM_LPAE & EXT4_DEBUG enabled.

rpi-update was started before (backup & clean). The actual download
started at 11:11:29 and is aborted at 11:14:18.

HTH

[1] - https://github.com/lategoodbye/mb_optimize_scan_regress
[2] -
https://github.com/lategoodbye/mb_optimize_scan_regress/commit/6ff14dd4026d8607290b2727f5a2c3522567fb21

2022-08-15 10:40:52

by Jan Kara

[permalink] [raw]
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi Stefan,

Back from vacation...

On Sun 31-07-22 22:42:56, Stefan Wahren wrote:
> Hi Jan,
>
> Am 28.07.22 um 12:00 schrieb Jan Kara:
> >
> > Also can get filesystem metadata image of your card like:
> > e2image -r <fs-device> - | gzip >/tmp/ext4-image.gz
> >
> > and put it somewhere for download? The image will contain only fs metadata,
> > not data so it should be relatively small and we won't have access to your
> > secrets ;). With the image we'd be able to see how the free space looks
> > like and whether it perhaps does not trigger some pathological behavior.
> i've problems with this. If i try store uncompressed the metadata of the
> second SD card partition (/dev/sdb2 = rootfs) the generated image file is
> nearly big as the whole partition. In compressed state it's 25 MB. Is this
> expected?

Yes, that is expected. The resulting file is a sparse file that contains
only metadata blocks that is the reason why it compresses so well but looks
big.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-08-15 11:08:28

by Stefan Wahren

[permalink] [raw]
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi Jan,

Am 15.08.22 um 12:34 schrieb Jan Kara:
> Hi Stefan,
>
> Back from vacation...
>
> On Sun 31-07-22 22:42:56, Stefan Wahren wrote:
>> Hi Jan,
>>
>> Am 28.07.22 um 12:00 schrieb Jan Kara:
>>> Also can get filesystem metadata image of your card like:
>>> e2image -r <fs-device> - | gzip >/tmp/ext4-image.gz
>>>
>>> and put it somewhere for download? The image will contain only fs metadata,
>>> not data so it should be relatively small and we won't have access to your
>>> secrets ;). With the image we'd be able to see how the free space looks
>>> like and whether it perhaps does not trigger some pathological behavior.
>> i've problems with this. If i try store uncompressed the metadata of the
>> second SD card partition (/dev/sdb2 = rootfs) the generated image file is
>> nearly big as the whole partition. In compressed state it's 25 MB. Is this
>> expected?
> Yes, that is expected. The resulting file is a sparse file that contains
> only metadata blocks that is the reason why it compresses so well but looks
> big.

i've added here:

https://github.com/lategoodbye/mb_optimize_scan_regress/blob/main/Kingston_SDCIT_rootfs_metadata.gz

>
> Honza
>