2023-12-22 20:54:45

by Carlos Carvalho

Subject: parity raid and ext4 get stuck in writes

This is finally a summary of a long-standing problem. When lots of writes to
many files are issued in a short time, the kernel gets stuck and stops sending
write requests to the disks. Sometimes it recovers and eventually writes the
modified pages to permanent storage; sometimes it does not, other functions
gradually degrade, and the machine crashes.

A simple way to reproduce: expand a kernel source tree, like
xzcat linux-6.5.tar.xz | tar x -f -

With the default vm settings for dirty_background_ratio and dirty_ratio this
will finish quickly, leaving ~1.5GB of dirty pages in RAM and ~100k inodes to
be written, and the kernel then gets stuck.
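
To watch the dirty pages pile up during the reproduction, something like the following could be used. It is only a sketch: the helper name dirty_kb is made up for illustration, and it assumes the usual "Dirty:" and "Writeback:" lines of /proc/meminfo.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument only exists so the
# parser can be tried on a saved copy. dirty_kb is a hypothetical name.
dirty_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s\n", $1, $2 }' \
        "${1:-/proc/meminfo}"
}

# Example: sample once per second while the tar extraction runs elsewhere.
# while sleep 1; do dirty_kb | tr '\n' ' '; echo; done
```

If Dirty stays high and Writeback stays near zero after tar has exited, that would match the "stops sending write requests" symptom described above.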

The bug exists in all 6.* kernels; I've tested the latest release of all
6.[1-6]. However some conditions must exist for the problem to appear:

- there must be many inodes to be flushed; many bytes in just a few files do not
  trigger the problem
- it happens only with ext4 on a parity raid array

I've moved one of our arrays to xfs and everything works fine, so it's either
specific to ext4 or xfs is not affected. When the lockup happens the flush
kworker starts using 100% cpu permanently. I have not observed the bug in
raid10, only in raid[56].

The problem is more easily triggered with 6.[56] but 6.1 is also affected.

Limiting dirty_bytes and dirty_background_bytes to low values reduces the
probability of lockup, probably because the process generating writes is
stopped before too many files are created.
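
As a rough illustration of that workaround, the helper below prints sysctl.conf-style lines for the two limits. The function name, the file name in the usage comment and the 256/64 MiB values are assumptions chosen for the example, not values tested in this report. Note that writing dirty_bytes takes over from dirty_ratio (and likewise for the background pair), since only one of each pair is in effect at a time.

```shell
#!/bin/sh
# Emit sysctl.conf-style lines capping dirty memory at the given sizes in MiB.
# dirty_limits_conf is a hypothetical helper; $1 is the hard limit and
# $2 the background-writeback threshold.
dirty_limits_conf() {
    limit_mib=$1
    background_mib=$2
    printf 'vm.dirty_bytes = %d\n' "$((limit_mib * 1024 * 1024))"
    printf 'vm.dirty_background_bytes = %d\n' "$((background_mib * 1024 * 1024))"
}

# Example (illustrative values and file name):
# dirty_limits_conf 256 64 | sudo tee /etc/sysctl.d/90-dirty-limits.conf
# sudo sysctl --system
```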


2023-12-22 23:06:27

by Eyal Lebedinsky

Subject: Re: parity raid and ext4 get stuck in writes

On 23/12/23 07:48, Carlos Carvalho wrote:
> This is finally a summary of a long standing problem. When lots of writes to
> many files are sent in a short time the kernel gets stuck and stops sending
> write requests to the disks. Sometimes it recovers and finally sends the
> modified pages to permanent storage, sometimes not and eventually other
> functions degrade and the machine crashes.
>
> A simple way to reproduce: expand a kernel source tree, like
> xzcat linux-6.5.tar.xz | tar x -f -
>
> With the default vm settings for dirty_background_ratio and dirty_ratio this
> will finish quickly with ~1.5GB of dirty pages in ram and ~100k inodes to be
> written and the kernel gets stuck.
>
> The bug exists in all 6.* kernels; I've tested the latest release of all
> 6.[1-6]. However some conditions must exist for the problem to appear:
>
> - there must be many inodes to be flushed; just many bytes in a few files don't
> show the problem
> - it happens only with ext4 on a parity raid array

This may be unrelated but there is an open problem that looks somewhat similar.
It is tracked at
https://bugzilla.kernel.org/show_bug.cgi?id=217965

If your fs is mounted with a non-zero 'stripe=' (as RAID arrays usually are),
try to get around the issue with
$ sudo mount -o remount,stripe=0 YourFS
If it makes a difference then you may be looking at a similar issue.
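
To see whether a filesystem currently has a non-zero stripe, the mount options can be checked first. The sketch below assumes ext4 lists a "stripe=N" entry in /proc/mounts when one is set; stripe_of is a made-up helper name and /srv a placeholder mount point.

```shell
#!/bin/sh
# Print the stripe= value of the ext4 filesystem mounted at $1, if any.
# Reads /proc/mounts by default; $2 can point at a saved copy instead.
# stripe_of is a hypothetical helper for illustration.
stripe_of() {
    awk -v mp="$1" '$2 == mp && $3 == "ext4" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^stripe=/) {
                sub(/^stripe=/, "", opts[i])
                print opts[i]
            }
    }' "${2:-/proc/mounts}"
}

# Example: stripe_of /srv
```

An empty result suggests no stripe is set, in which case the remount above is unlikely to change anything.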

> I've moved one of our arrays to xfs and everything works fine, so it's either
> specific to ext4 or xfs is not affected. When the lockup happens the flush
> kworker starts using 100% cpu permanently. I have not observed the bug in
> raid10, only in raid[56].
>
> The problem is more easily triggered with 6.[56] but 6.1 is also affected.

The issue tracked there was seen in kernels 6.5 and later but not in 6.4, so it
may not be the same thing.

> Limiting dirty_bytes and dirty_background_bytes to low values reduce the
> probability of lockup, probably because the process generating writes is
> stopped before too many files are created.

HTH

--
Eyal at Home


2023-12-25 07:39:39

by Daniel Dawson

Subject: Re: parity raid and ext4 get stuck in writes

On 12/22/23 12:48 PM, Carlos Carvalho wrote:
> This is finally a summary of a long standing problem. When lots of writes to
> many files are sent in a short time the kernel gets stuck and stops sending
> write requests to the disks. Sometimes it recovers and finally sends the
> modified pages to permanent storage, sometimes not and eventually other
> functions degrade and the machine crashes.
>
> A simple way to reproduce: expand a kernel source tree, like
> xzcat linux-6.5.tar.xz | tar x -f -

This sounds almost exactly like a problem I was having, right down to
triggering it by writing the files of a kernel tree, though the details
in my case are slightly different. I meant to report it, but wanted to
get a better handle on it first and never managed to, and now I've changed
my setup such that it doesn't happen anymore.

> - it happens only with ext4 on a parity raid array

This is where it differs for me. I experienced it only with btrfs. But I
had two arrays with it, one on SSDs and one on HDDs. The HDD array
exhibited the problem almost exclusively: the SSDs, I think, exhibited
it once in several months, while the HDDs did so pretty much every time I
tried to compile a new kernel (until I started working around it), and
even from some other things, which happened a couple of times a week. I
imagine this is because HDDs are much slower and therefore allow more data
to get cached.

Now that I've switched the HDD array to ext4, I haven't experienced the
issue even once. But the setup has better performance, so maybe it's
just because it flushes its writes faster.

--
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


2024-01-04 06:08:33

by Ojaswin Mujoo

Subject: Re: parity raid and ext4 get stuck in writes

On Fri, Dec 22, 2023 at 05:48:01PM -0300, Carlos Carvalho wrote:
> This is finally a summary of a long standing problem. When lots of writes to
> many files are sent in a short time the kernel gets stuck and stops sending
> write requests to the disks. Sometimes it recovers and finally sends the
> modified pages to permanent storage, sometimes not and eventually other
> functions degrade and the machine crashes.
>
> A simple way to reproduce: expand a kernel source tree, like
> xzcat linux-6.5.tar.xz | tar x -f -
>
> With the default vm settings for dirty_background_ratio and dirty_ratio this
> will finish quickly with ~1.5GB of dirty pages in ram and ~100k inodes to be
> written and the kernel gets stuck.
>
> The bug exists in all 6.* kernels; I've tested the latest release of all
> 6.[1-6]. However some conditions must exist for the problem to appear:
>
> - there must be many inodes to be flushed; just many bytes in a few files don't
> show the problem
> - it happens only with ext4 on a parity raid array
>
> I've moved one of our arrays to xfs and everything works fine, so it's either
> specific to ext4 or xfs is not affected. When the lockup happens the flush
> kworker starts using 100% cpu permanently. I have not observed the bug in
> raid10, only in raid[56].
>
> The problem is more easily triggered with 6.[56] but 6.1 is also affected.
>
> Limiting dirty_bytes and dirty_background_bytes to low values reduce the
> probability of lockup, probably because the process generating writes is
> stopped before too many files are created.

Hey Carlos,

Thanks for sharing this. So as per your comment on the kernel bugzilla,
it seems like the issue gets fixed for you with stripe=0 as well, so it
might actually be the same issue. However, most of the people there are
not able to replicate this in kernels before 6.5, so I'm interested in
your statement that you see this in 6.1 as well.

Would it be possible to replicate this on 6.1 or any pre 6.5 kernel with
some perf probes and share the report? I've added the steps to add the
probes in pre 6.4 kernel here [1] (although it should hopefully work
with 6.1 - 6.3 as well, since I don't think there'll be much change in
the functions probed there). The probes would be helpful to confirm whether
the issue we see in 6.5+ kernels and the one you are seeing in 6.1 are
the same.

Thanks,
ojaswin

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217965#c36

2024-01-04 06:11:34

by Ojaswin Mujoo

Subject: Re: parity raid and ext4 get stuck in writes

On Sun, Dec 24, 2023 at 11:39:05PM -0800, Daniel Dawson wrote:
> On 12/22/23 12:48 PM, Carlos Carvalho wrote:
> > This is finally a summary of a long standing problem. When lots of writes to
> > many files are sent in a short time the kernel gets stuck and stops sending
> > write requests to the disks. Sometimes it recovers and finally sends the
> > modified pages to permanent storage, sometimes not and eventually other
> > functions degrade and the machine crashes.
> >
> > A simple way to reproduce: expand a kernel source tree, like
> > xzcat linux-6.5.tar.xz | tar x -f -
> This sounds almost exactly like a problem I was having, right down to
> triggering it by writing the files of a kernel tree, though the details in
> my case are slightly different. I wanted to report it, but wanted to get a
> better handle on it and never managed it, and now I've changed my setup such
> that it doesn't happen anymore.
> > - it happens only with ext4 on a parity raid array
>
> This is where it differs for me. I experienced it only with btrfs. But I had
> two arrays with it, one on SSDs and one on HDDs. The HDD array exhibited the
> problem almost exclusively (the SSDs, I think, exhibited it once in several
> months, while the HDDs did pretty much every time I tried to compile a new
> kernel (until I started working around it), and even from some other things,
> which was a couple of times a week). I imagine because HDDs much slower and
> therefore allow more data to get cached.
>
> Now that I've switched the HDD array to ext4, I haven't experienced the
> issue even once. But the setup has better performance, so maybe it's just
> because it flushes its writes faster.
>
> --
> PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A

Hi Daniel,

So I think there are some other people noticing something similar on
btrfs as well [1]. Maybe this is related to the issue you are noticing,
although they have not mentioned anything about raid in btrfs.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2242391

Regards,
ojaswin