2023-12-15 11:20:46

by Ojaswin Mujoo

Subject: [PATCH 0/1] Fix for recent bugzilla reports related to long halts during block allocation

This patch intends to fix the recent bugzilla [1] report where the
kworker flush thread seemed to be taking 100% CPU utilization and was
slowing down the whole system. The backtrace indicated that we were
stuck in the mballoc allocation path. The issue was only seen on kernel
6.5+ and when ext4 was mounted with -o stripe (or the stripe option was
implicitly added due to the mkfs flags used).

Although I was not able to fully replicate this issue, from the perf
probe logs collected I have a possible root cause which I have explained
in the patch commit message.

Now, the one thing I'm still skeptical about is why this was only seen
in kernel 6.5+. We added a new mballoc criteria in kernel 6.5 but I was
not able to find a satisfactory explanation as to why that would have
any effect here. Further, the issue still persisted when I asked one of
the reporters to disable it using the sysfs file and rerun the test.
Maybe there are some more factors at play?

Anyway, I would appreciate it if the people experiencing this issue could
help test this patch and see if it fixes the regression.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217965

Regards,
ojaswin

Ojaswin Mujoo (1):
ext4: fallback to complex scan if aligned scan doesn't work

fs/ext4/mballoc.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)

--
2.39.3
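
The idea named in the patch title (try the stripe-aligned scan first, and
fall back to the general scan when it finds nothing) can be illustrated with
a toy model. This is only a sketch under stated assumptions: the flat
free-block list and the names scan_aligned, scan_complex and allocate are
hypothetical and are not taken from fs/ext4/mballoc.c.

```python
from typing import List, Optional, Tuple

Extent = Tuple[int, int]  # (start block, length), toy representation


def scan_aligned(free: List[bool], stripe: int, goal_len: int) -> Optional[Extent]:
    """Toy aligned scan: only consider starts that are multiples of stripe."""
    for start in range(0, len(free) - goal_len + 1, stripe):
        if all(free[start:start + goal_len]):
            return (start, goal_len)
    return None


def scan_complex(free: List[bool], goal_len: int) -> Optional[Extent]:
    """Toy general scan: first free run >= goal_len, else the longest run."""
    best: Optional[Extent] = None
    run_start: Optional[int] = None
    # A trailing False sentinel closes a free run that reaches the end.
    for i, is_free in enumerate(free + [False]):
        if is_free and run_start is None:
            run_start = i
        elif not is_free and run_start is not None:
            length = i - run_start
            if length >= goal_len:
                return (run_start, goal_len)
            if best is None or length > best[1]:
                best = (run_start, length)
            run_start = None
    return best


def allocate(free: List[bool], stripe: int, goal_len: int) -> Optional[Extent]:
    """Try the aligned scan when it applies, then fall back to the general scan."""
    found = None
    if stripe and goal_len % stripe == 0:
        found = scan_aligned(free, stripe, goal_len)
    if found is None:
        # Without this fallback the group would be given up on even though
        # it holds enough free blocks at an unaligned offset.
        found = scan_complex(free, goal_len)
    return found
```

In this model, a group whose only free run of four blocks starts at block 1
yields nothing from the aligned scan with stripe 4, while allocate() still
returns that run through the fallback.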



2024-01-09 02:54:27

by Theodore Ts'o

Subject: Re: [PATCH 0/1] Fix for recent bugzilla reports related to long halts during block allocation


On Fri, 15 Dec 2023 16:49:49 +0530, Ojaswin Mujoo wrote:
> This patch intends to fix the recent bugzilla [1] report where the
> kworker flush thread seemed to be taking 100% CPU utilization and was
> slowing down the whole system. The backtrace indicated that we were
> stuck in the mballoc allocation path. The issue was only seen on kernel
> 6.5+ and when ext4 was mounted with -o stripe (or the stripe option was
> implicitly added due to the mkfs flags used).
>
> [...]

Applied, thanks!

[1/1] ext4: fallback to complex scan if aligned scan doesn't work
commit: a26b6faf7f1c9c1ba6edb3fea9d1390201f2ed50

Best regards,
--
Theodore Ts'o <[email protected]>

2024-03-20 16:53:17

by Frederick Lawler

Subject: Re: [PATCH 0/1] Fix for recent bugzilla reports related to long halts during block allocation

Hi Theodore and Ojaswin,

On Mon, Jan 08, 2024 at 09:53:18PM -0500, Theodore Ts'o wrote:
>
> On Fri, 15 Dec 2023 16:49:49 +0530, Ojaswin Mujoo wrote:
> > This patch intends to fix the recent bugzilla [1] report where the
> > kworker flush thread seemed to be taking 100% CPU utilization and was
> > slowing down the whole system. The backtrace indicated that we were
> > stuck in the mballoc allocation path. The issue was only seen on kernel
> > 6.5+ and when ext4 was mounted with -o stripe (or the stripe option was
> > implicitly added due to the mkfs flags used).
> >
> > [...]
>
> Applied, thanks!

I backported this patch to 6.6 and tested it on our fleet of
software RAID 0 NVMe SSD nodes. This change worked very nicely
for us. We're interested in getting this backported to at least 6.6.

I tried looking at xfstests, and didn't really see a good match
(user error?) to validate the fix via that. So I'm a little unclear what
the path forward here is.

Although we experienced this issue in 6.1, I didn't backport the patch to
6.1 and test whether it also works there. However, setting stripe to 0 did
work in the 6.1 case.
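
For anyone checking whether a node is running with a non-zero stripe, a
small helper along these lines may be handy. It is a sketch that assumes
ext4 reports the value as a stripe=N entry in the options column of
/proc/mounts when it is set; the function name is hypothetical.

```python
from typing import Dict


def ext4_stripe_by_mountpoint(mounts_text: str) -> Dict[str, int]:
    """Map each ext4 mount point to its stripe option (0 if not listed)."""
    result: Dict[str, int] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        stripe = 0
        for opt in fields[3].split(","):
            # Assumption: the option appears as "stripe=<blocks>".
            if opt.startswith("stripe="):
                stripe = int(opt.split("=", 1)[1])
        result[fields[1]] = stripe
    return result
```

Feeding it the contents of /proc/mounts (for example via
open("/proc/mounts").read()) gives a quick view of which ext4 mounts have a
stripe in effect.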

Best,
Fred

>
> [1/1] ext4: fallback to complex scan if aligned scan doesn't work
> commit: a26b6faf7f1c9c1ba6edb3fea9d1390201f2ed50
>
> Best regards,
> --
> Theodore Ts'o <[email protected]>

2024-03-22 08:33:35

by Ojaswin Mujoo

Subject: Re: [PATCH 0/1] Fix for recent bugzilla reports related to long halts during block allocation

On Wed, Mar 20, 2024 at 11:52:58AM -0500, Frederick Lawler wrote:
> Hi Theodore and Ojaswin,
>
> On Mon, Jan 08, 2024 at 09:53:18PM -0500, Theodore Ts'o wrote:
> >
> > On Fri, 15 Dec 2023 16:49:49 +0530, Ojaswin Mujoo wrote:
> > > This patch intends to fix the recent bugzilla [1] report where the
> > > kworker flush thread seemed to be taking 100% CPU utilization and was
> > > slowing down the whole system. The backtrace indicated that we were
> > > stuck in the mballoc allocation path. The issue was only seen on kernel
> > > 6.5+ and when ext4 was mounted with -o stripe (or the stripe option was
> > > implicitly added due to the mkfs flags used).
> > >
> > > [...]
> >
> > Applied, thanks!
>
> I backported this patch to 6.6 and tested it on our fleet of
> software RAID 0 NVMe SSD nodes. This change worked very nicely
> for us. We're interested in getting this backported to at least 6.6.
>
> I tried looking at xfstests, and didn't really see a good match
> (user error?) to validate the fix via that. So I'm a little unclear what
> the path forward here is.
>
> Although we experienced this issue in 6.1, I didn't backport the patch to
> 6.1 and test whether it also works there. However, setting stripe to 0 did
> work in the 6.1 case.
>
> Best,
> Fred

Hi Fred,

If I understand correctly, you are looking for a test case which you
could use to confirm if the issue exists and if the backport is solving
it, right?

Actually, I was never able to replicate this at my end so I had to rely
on people hitting the bug to confirm if it works. I did set out to write
a testcase that could help us reliably replicate this issue but it needs
a very specially crafted FS that is a bit difficult to achieve from user
space. I was using debugfs to create an FS that could hit it but I kept
running into issues where it won't mount etc. Maybe there's a better
way to craft such an FS that I'm not aware of.

One more option is that maybe we can have a KUnit test for this in the
mballoc code but I'd need to read some more about the KUnit
infrastructure to see if it's possible/feasible.

Regards,
ojaswin
>
> >
> > [1/1] ext4: fallback to complex scan if aligned scan doesn't work
> > commit: a26b6faf7f1c9c1ba6edb3fea9d1390201f2ed50
> >
> > Best regards,
> > --
> > Theodore Ts'o <[email protected]>

2024-03-25 18:33:42

by Frederick Lawler

Subject: Re: [PATCH 0/1] Fix for recent bugzilla reports related to long halts during block allocation

On Fri, Mar 22, 2024 at 02:01:17PM +0530, Ojaswin Mujoo wrote:
> On Wed, Mar 20, 2024 at 11:52:58AM -0500, Frederick Lawler wrote:
> > Hi Theodore and Ojaswin,
> >
> > On Mon, Jan 08, 2024 at 09:53:18PM -0500, Theodore Ts'o wrote:
> > >
> > > On Fri, 15 Dec 2023 16:49:49 +0530, Ojaswin Mujoo wrote:
> > > > This patch intends to fix the recent bugzilla [1] report where the
> > > > kworker flush thread seemed to be taking 100% CPU utilization and was
> > > > slowing down the whole system. The backtrace indicated that we were
> > > > stuck in the mballoc allocation path. The issue was only seen on kernel
> > > > 6.5+ and when ext4 was mounted with -o stripe (or the stripe option was
> > > > implicitly added due to the mkfs flags used).
> > > >
> > > > [...]
> > >
> > > Applied, thanks!
> >
> > I backported this patch to 6.6 and tested it on our fleet of
> > software RAID 0 NVMe SSD nodes. This change worked very nicely
> > for us. We're interested in getting this backported to at least 6.6.
> >
> > I tried looking at xfstests, and didn't really see a good match
> > (user error?) to validate the fix via that. So I'm a little unclear what
> > the path forward here is.
> >
> > Although we experienced this issue in 6.1, I didn't backport the patch to
> > 6.1 and test whether it also works there. However, setting stripe to 0 did
> > work in the 6.1 case.
> >
> > Best,
> > Fred
>
> Hi Fred,
>
> If I understand correctly, you are looking for a test case which you
> could use to confirm if the issue exists and if the backport is solving
> it, right?

Not quite. I made an assumption that having a test was a requirement
for backporting the patch. I know some other file systems prefer a few
loops of kdevops to backport patches, and was curious if that's a similar
flow for ext4. I only backported the patch to 6.6 and ensured that our
affected nodes perform as expected with it.

>
> Actually, I was never able to replicate this at my end so I had to rely
> on people hitting the bug to confirm if it works. I did set out to write
> a testcase that could help us reliably replicate this issue but it needs
> a very specially crafted FS that is a bit difficult to achieve from user
> space. I was using debugfs to create an FS that could hit it but I kept
> running into issues where it won't mount etc. Maybe there's a better
> way to craft such an FS that I'm not aware of.
>
> One more option is that maybe we can have a KUnit test for this in the
> mballoc code but I'd need to read some more about the KUnit
> infrastructure to see if it's possible/feasible.
>

I think KUnit is an interesting idea. One thing to keep in mind is that
mocking is going to be the real problem with that approach, and more
mocking may mean more brittle tests.

> Regards,
> ojaswin
> >
> > >
> > > [1/1] ext4: fallback to complex scan if aligned scan doesn't work
> > > commit: a26b6faf7f1c9c1ba6edb3fea9d1390201f2ed50
> > >
> > > Best regards,
> > > --
> > > Theodore Ts'o <[email protected]>