2022-07-25 15:12:18

by Ojaswin Mujoo

Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

On Mon, Jul 18, 2022 at 03:29:47PM +0200, Stefan Wahren wrote:
> Hi,
>
> I noticed that since Linux 5.18 (Linux 5.19-rc6 is still affected) I'm
> unable to run "rpi-update" without a massive performance regression on my
> Raspberry Pi 4 (multi_v7_defconfig + CONFIG_ARM_LPAE). Using Linux 5.17 this
> tool successfully downloads the latest firmware (> 100 MB) to my development
> micro SD card (Kingston 16 GB Industrial) with an ext4 filesystem within
> ~1 min. The same scenario on Linux 5.18 shows the following symptoms:
>
> - the download takes an extremely long time and, in most cases, is aborted
> by userspace because of the poor performance
> - massive system load during the download, even after the download has
> been aborted (heartbeat LED goes wild)
> - the whole system becomes nearly unresponsive
> - system load only goes back to normal after > 10 min
> - dmesg doesn't show anything suspicious
>
> I was able to bisect this issue:
>
> ff042f4a9b050895a42cae893cc01fa2ca81b95c good
> 4b0986a3613c92f4ec1bdc7f60ec66fea135991f bad
> 25fd2d41b505d0640bdfe67aa77c549de2d3c18a bad
> b4bc93bd76d4da32600795cd323c971f00a2e788 bad
> 3fe2f7446f1e029b220f7f650df6d138f91651f2 bad
> b080cee72ef355669cbc52ff55dc513d37433600 good
> ad9c6ee642a61adae93dfa35582b5af16dc5173a good
> 9b03992f0c88baef524842e411fbdc147780dd5d bad
> aab4ed5816acc0af8cce2680880419cd64982b1d good
> 14705fda8f6273501930dfe1d679ad4bec209f52 good
> 5c93e8ecd5bd3bfdee013b6da0850357eb6ca4d8 good
> 8cb5a30372ef5cf2b1d258fce1711d80f834740a bad
> 077d0c2c78df6f7260cdd015a991327efa44d8ad bad
> cc5095747edfb054ca2068d01af20be3fcc3634f good
> 27b38686a3bb601db48901dbc4e2fc5d77ffa2c1 good
>
> commit 077d0c2c78df6f7260cdd015a991327efa44d8ad
> Author: Ojaswin Mujoo <[email protected]>
> Date:   Tue Mar 8 15:22:01 2022 +0530
>
> ext4: make mb_optimize_scan performance mount option work with extents
>
> If I revert this commit on Linux 5.19-rc6, the performance regression
> disappears.
>
> Please ask if you need more information.

Hi Stefan,

Apologies, I had missed this email initially. This particular patch
simply fixed a typo in an if condition that was preventing the
mb_optimize_scan option from being enabled correctly (the feature was
introduced in commit [1]). I suspect that now that mb_optimize_scan
actually takes effect, it is somehow causing the firmware
download/update to take longer.
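The explanation above can be made concrete with a toy Python model of an inverted guard condition. It is only a sketch: the function and parameter names are hypothetical, the `criteria >= 2` check is an assumption based on the "cr 0 / cr 1" wording of commit [1], and none of this is the ext4 source or the actual patch. It shows how a single typo can keep an enabled option from ever applying to extent-mapped files, which are the default on ext4.

```python
# Toy model, NOT the ext4 source: all names here are hypothetical.
# It shows how one inverted condition can keep an enabled option from
# ever applying to the common case.


def should_optimize_scan_buggy(option_enabled: bool, criteria: int,
                               uses_extents: bool) -> bool:
    """Guard with the inverted check: extent-mapped files are excluded."""
    if not option_enabled:
        return False
    if criteria >= 2:  # assumption: only the cr 0 / cr 1 passes are optimized
        return False
    if uses_extents:  # the "typo": this should have been `not uses_extents`
        return False
    return True


def should_optimize_scan_fixed(option_enabled: bool, criteria: int,
                               uses_extents: bool) -> bool:
    """Guard with the corrected check: extent-mapped files are included."""
    if not option_enabled:
        return False
    if criteria >= 2:
        return False
    if not uses_extents:
        return False
    return True


# With the option enabled, a cr 0 allocation for an extent-mapped file:
print(should_optimize_scan_buggy(True, 0, True))  # False: option silently ignored
print(should_optimize_scan_fixed(True, 0, True))  # True: option takes effect
```

In this model the "fix" does not add any new optimization. It only lets the existing one run, which fits the observation that behavior changed once the option actually took effect.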

I'll try to investigate this and get back with my findings.

Regards,
Ojaswin

[1]
commit 196e402adf2e4cd66f101923409f1970ec5f1af3
From: Harshad Shirwadkar <[email protected]>
Date: Thu, 1 Apr 2021 10:21:27 -0700

ext4: improve cr 0 / cr 1 group scanning



2022-07-25 19:14:48

by Stefan Wahren

Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi Ojaswin,

On 25.07.22 at 17:07, Ojaswin Mujoo wrote:
> Hi Stefan,
>
> Apologies, I had missed this email initially. This particular patch
> simply fixed a typo in an if condition that was preventing the
> mb_optimize_scan option from being enabled correctly (the feature was
> introduced in commit [1]). I suspect that now that mb_optimize_scan
> actually takes effect, it is somehow causing the firmware
> download/update to take longer.
>
> I'll try to investigate this and get back with my findings.

Thanks. I wasn't able to reproduce these heavy-load symptoms with every
SD card. Maybe triggering the situation depends on the write performance
of the SD card. I measured the write performance with:

  dd if=/dev/zero of=/boot/test bs=1M count=30 oflag=dsync,direct
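A rough Python equivalent of the dd measurement above makes it easy to repeat the measurement several times and see whether a card's write rate is stable or fluctuating. This is a sketch under stated assumptions. It opens the file with O_DSYNC but not O_DIRECT, because O_DIRECT needs aligned buffers and is not supported on every filesystem. It reports decimal MB/s (10**6 bytes per second) and deletes the test file afterwards. The function names are hypothetical, and the numbers will not match dd exactly.

```python
import mmap
import os
import time
from typing import Tuple


def sync_write_throughput(path: str, block_size: int = 1 << 20,
                          count: int = 30) -> float:
    """Write `count` zero-filled blocks with O_DSYNC; return MB/s (10**6 B/s).

    Roughly mirrors `dd if=/dev/zero of=PATH bs=1M count=30 oflag=dsync`,
    but without O_DIRECT. The test file is deleted afterwards.
    """
    buf = mmap.mmap(-1, block_size)  # anonymous mapping: zero-filled, page-aligned
    try:
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
        fd = os.open(path, flags, 0o644)
        try:
            start = time.monotonic()
            for _ in range(count):
                # With O_DSYNC, each write waits until the data has reached the device.
                written = os.write(fd, buf)
                if written != block_size:
                    raise OSError(f"short write: {written} of {block_size} bytes")
            elapsed = max(time.monotonic() - start, 1e-9)
        finally:
            os.close(fd)
            os.unlink(path)
    finally:
        buf.close()
    return block_size * count / elapsed / 1e6


def summarize(path: str, runs: int = 5, **kwargs) -> Tuple[float, float, float]:
    """Repeat the measurement and return (min, mean, max) MB/s."""
    rates = [sync_write_throughput(path, **kwargs) for _ in range(runs)]
    return min(rates), sum(rates) / len(rates), max(rates)
```

For example, `summarize("/boot/test")` returns the minimum, mean and maximum rate over five runs of 30 blocks of 1 MiB each. A wide gap between the minimum and the maximum would correspond to the fluctuating 5 to 10 MB/s behavior described below for the industrial cards.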

I tested a Kingston consumer 32 GB card, which had a nearly constant write
performance of 13 MB/s and didn't show the heavy-load symptoms. The
firmware update finished within a few seconds, so it's hard to say
whether even the performance regression is reproducible with that card.

I also tested two Kingston Industrial 16 GB cards, whose write
performance fluctuated between 5 and 10 MB/s (wear leveling?), and both
showed the heavy-load symptoms.

All SD cards were detected as ultra high speed DDR50 by the emmc2
interface.

Best regards


2022-07-26 06:45:14

by Ojaswin Mujoo

Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

On Mon, Jul 25, 2022 at 09:09:32PM +0200, Stefan Wahren wrote:
> Hi Ojaswin,
>
> Thanks. I wasn't able to reproduce these heavy-load symptoms with every
> SD card. Maybe triggering the situation depends on the write performance
> of the SD card. I measured the write performance with:
>
>   dd if=/dev/zero of=/boot/test bs=1M count=30 oflag=dsync,direct
>
> I tested a Kingston consumer 32 GB card, which had a nearly constant write
> performance of 13 MB/s and didn't show the heavy-load symptoms. The
> firmware update finished within a few seconds, so it's hard to say
> whether even the performance regression is reproducible with that card.
>
> I also tested two Kingston Industrial 16 GB cards, whose write
> performance fluctuated between 5 and 10 MB/s (wear leveling?), and both
> showed the heavy-load symptoms.
>
> All SD cards were detected as ultra high speed DDR50 by the emmc2
> interface.
>
> Best regards

Thanks for the info, Stefan. I'm still trying to reproduce the issue, but
it's slightly challenging since I don't have my RPi handy at the moment.

In the meantime, could you please try the mb_optimize_scan=0 mount
option to see if it helps bypass the issue? This will help confirm
whether the issue lies in mb_optimize_scan itself or somewhere else.

You can mount the root filesystem with this option by adding the
following argument to the kernel command line:

rootflags="mb_optimize_scan=0"
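To double-check that the bootloader actually passed the option, here is a small Python sketch that extracts the rootflags= options from a kernel command line string, such as the contents of /proc/cmdline. It approximates the kernel's own parsing with `shlex.split`, which strips the double quotes around the value but also accepts single quotes and backslash escapes that the kernel's parser may treat differently. It also assumes that a later rootflags= replaces an earlier one. The function name is hypothetical.

```python
import shlex
from typing import List


def rootflags_from_cmdline(cmdline: str) -> List[str]:
    """Return the comma-separated options given via rootflags= ([] if absent).

    shlex.split() approximates the kernel's parsing: it strips the double
    quotes in rootflags="...", but also accepts single quotes and backslash
    escapes, which the kernel's parser may treat differently.
    """
    options: List[str] = []
    for token in shlex.split(cmdline):
        if token.startswith("rootflags="):
            # Assumption: a later rootflags= replaces an earlier one.
            options = token[len("rootflags="):].split(",")
    return options
```

Usage: open /proc/cmdline, pass its contents to `rootflags_from_cmdline`, and check that `"mb_optimize_scan=0"` is in the returned list.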

You can also confirm whether mb_optimize_scan was turned off by checking
the first line of the output of:

cat /proc/fs/ext4/<dev>/mb_structs_summary
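The same check can be sketched in Python across all mounted ext4 filesystems by reading the first line of each /proc/fs/ext4/<dev>/mb_structs_summary. The sketch assumes that this line has the form `optimize_scan: 0` (or `1`). Anything else is reported as None rather than guessed. The function names are hypothetical. The `proc_root` parameter exists only so that the parser can also be pointed at a copy of these files.

```python
import glob
import os
from typing import Dict, Optional


def optimize_scan_state(summary_path: str) -> Optional[int]:
    """Parse the first line of an mb_structs_summary file.

    Assumption: the line looks like "optimize_scan: 0" (or 1). Any other
    shape returns None instead of a guess.
    """
    with open(summary_path) as f:
        key, sep, value = f.readline().partition(":")
    value = value.strip()
    if sep and key.strip() == "optimize_scan" and value.isdigit():
        return int(value)
    return None


def all_ext4_states(proc_root: str = "/proc/fs/ext4") -> Dict[str, Optional[int]]:
    """Map each device directory under proc_root to its optimize_scan state."""
    pattern = os.path.join(proc_root, "*", "mb_structs_summary")
    return {
        os.path.basename(os.path.dirname(path)): optimize_scan_state(path)
        for path in sorted(glob.glob(pattern))
    }
```

With the rootflags argument in effect, the entry for the root device would be expected to read 0.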

Regards,
Ojaswin

2022-07-26 16:13:59

by Stefan Wahren

Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

Hi Ojaswin,

On 26.07.22 at 08:43, Ojaswin Mujoo wrote:
> Thanks for the info, Stefan. I'm still trying to reproduce the issue, but
> it's slightly challenging since I don't have my RPi handy at the moment.
>
> In the meantime, could you please try the mb_optimize_scan=0 mount
> option to see if it helps bypass the issue? This will help confirm
> whether the issue lies in mb_optimize_scan itself or somewhere else.
>
I ran the firmware update 5 times with mb_optimize_scan=0 on my
Raspberry Pi 4 with the industrial SD card, and it worked every time.
>
> Regards,
> Ojaswin
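A repeated test like the one above can be scripted with a small Python harness. It runs a command several times and records whether each run succeeded and how long it took, treating any run that exceeds a timeout as a failure. The harness is hypothetical. The exact rpi-update invocation (privileges, prompts, environment) is an assumption left to the caller.

```python
import subprocess
import time
from typing import List, Optional, Tuple


def repeat_run(cmd: List[str], runs: int = 5,
               timeout: Optional[float] = None) -> List[Tuple[bool, float]]:
    """Run `cmd` `runs` times; return (succeeded, seconds) for each attempt."""
    results: List[Tuple[bool, float]] = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            ok = subprocess.run(cmd, timeout=timeout).returncode == 0
        except subprocess.TimeoutExpired:
            ok = False  # subprocess.run() has already killed the child
        results.append((ok, time.monotonic() - start))
    return results
```

For example, `repeat_run(["sudo", "rpi-update"], runs=5, timeout=600)` uses an arbitrary 10-minute cut-off. Five successful entries would correspond to the outcome reported above with mb_optimize_scan=0. The recorded durations also allow a like-for-like comparison when the same loop is run with the option enabled.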

2022-07-28 07:39:25

by Thorsten Leemhuis

Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi



On 26.07.22 17:54, Stefan Wahren wrote:
> Hi Ojaswin,
>
> On 26.07.22 at 08:43, Ojaswin Mujoo wrote:
>> Thanks for the info, Stefan. I'm still trying to reproduce the issue, but
>> it's slightly challenging since I don't have my RPi handy at the moment.
>>
>> In the meantime, could you please try the mb_optimize_scan=0 mount
>> option to see if it helps bypass the issue? This will help confirm
>> whether the issue lies in mb_optimize_scan itself or somewhere else.
>>
> I ran the firmware update 5 times with mb_optimize_scan=0 on my
> Raspberry Pi 4 with the industrial SD card, and it worked every time.
>>

[CCing Jan]

FYI, yesterday Jan reported benchmark regressions that might or might not
be related to Stefan's regression on the Raspberry Pi:
https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot monitor
https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/