Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
> After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent but random crashes in a user space program. After a lot of reduction, I have come up with the following reproducer program:
>
> $ uname -a
> Linux jacob 6.4.1-gentoo #1 SMP PREEMPT_DYNAMIC Sat Jul 1 19:02:42 EDT 2023 x86_64 AMD Ryzen 9 7950X3D 16-Core Processor AuthenticAMD GNU/Linux
> $ cat repro.c
> #define _GNU_SOURCE
> #include <sched.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> void *threadSafeAlloc(size_t n) {
>     static size_t end_index = 0;
>     static char buffer[1 << 25];
>     size_t start_index = __atomic_load_n(&end_index, __ATOMIC_SEQ_CST);
>     while (1) {
>         if (start_index + n > sizeof(buffer)) _exit(1);
>         if (__atomic_compare_exchange_n(&end_index, &start_index, start_index + n, 1, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) return buffer + start_index;
>     }
> }
>
> int thread(void *arg) {
>     size_t i;
>     size_t n = 1 << 7;
>     char *items;
>     (void)arg;
>     while (1) {
>         items = threadSafeAlloc(n);
>         for (i = 0; i != n; i += 1) items[i] = '@';
>         for (i = 0; i != n; i += 1) if (items[i] != '@') _exit(2);
>     }
> }
>
> int main(void) {
>     static size_t stacks[2][1 << 9];
>     size_t i;
>     for (i = 0; i != 2; i += 1) clone(&thread, &stacks[i] + 1, CLONE_THREAD | CLONE_VM | CLONE_SIGHAND, NULL);
>     while (1) {
>         if (fork() == 0) _exit(0);
>         (void)wait(NULL);
>     }
> }
> $ cc repro.c
> $ ./a.out
> $ echo $?
> 2
>
> After tuning the various parameters for my computer, exit code 2, which indicates that memory corruption was detected, occurs approximately 99% of the time. Exit code 1, which occurs approximately 1% of the time, means it ran out of statically-allocated memory before reproducing the issue, and increasing the memory usage any more only leads to diminishing returns. There is also something like a 0.1% chance that it segfaults due to memory corruption elsewhere than in the statically-allocated buffer.
>
> With this reproducer in hand, I was able to perform the following bisection:
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [6995e2de6891c724bfeb2db33d7b87775f913ad1] Linux 6.4
> git bisect bad 6995e2de6891c724bfeb2db33d7b87775f913ad1
> # status: waiting for good commit(s), bad commit known
> # good: [457391b0380335d5e9a5babdec90ac53928b23b4] Linux 6.3
> git bisect good 457391b0380335d5e9a5babdec90ac53928b23b4
> # good: [d42b1c47570eb2ed818dc3fe94b2678124af109d] Merge tag 'devicetree-for-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> git bisect good d42b1c47570eb2ed818dc3fe94b2678124af109d
> # bad: [58390c8ce1bddb6c623f62e7ed36383e7fa5c02f] Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
> git bisect bad 58390c8ce1bddb6c623f62e7ed36383e7fa5c02f
> # good: [888d3c9f7f3ae44101a3fd76528d3dd6f96e9fd0] Merge tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
> git bisect good 888d3c9f7f3ae44101a3fd76528d3dd6f96e9fd0
> # bad: [86e98ed15b3e34460d1b3095bd119b6fac11841c] Merge tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
> git bisect bad 86e98ed15b3e34460d1b3095bd119b6fac11841c
> # bad: [7fa8a8ee9400fe8ec188426e40e481717bc5e924] Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 7fa8a8ee9400fe8ec188426e40e481717bc5e924
> # bad: [0120dd6e4e202e19a0e011e486fb2da40a5ea279] zram: make zram_bio_discard more self-contained
> git bisect bad 0120dd6e4e202e19a0e011e486fb2da40a5ea279
> # good: [fce0b4213edb960859dcc65ea414c8efb11948e1] mm/page_alloc: add helper for checking if check_pages_enabled
> git bisect good fce0b4213edb960859dcc65ea414c8efb11948e1
> # bad: [59f876fb9d68a4d8c20305d7a7a0daf4ee9478a8] mm: avoid passing 0 to __ffs()
> git bisect bad 59f876fb9d68a4d8c20305d7a7a0daf4ee9478a8
> # good: [0050d7f5ee532f92e8ab1efcec6547bfac527973] afs: split afs_pagecache_valid() out of afs_validate()
> git bisect good 0050d7f5ee532f92e8ab1efcec6547bfac527973
> # good: [2ac0af1b66e3b66307f53b1cc446514308ec466d] mm: fall back to mmap_lock if vma->anon_vma is not yet set
> git bisect good 2ac0af1b66e3b66307f53b1cc446514308ec466d
> # skip: [0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a] mm/mmap: free vm_area_struct without call_rcu in exit_mmap
> git bisect skip 0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a
> # skip: [70d4cbc80c88251de0a5b3e8df3275901f1fa99a] powerc/mm: try VMA lock-based page fault handling first
> git bisect skip 70d4cbc80c88251de0a5b3e8df3275901f1fa99a
> # good: [444eeb17437a0ef526c606e9141a415d3b7dfddd] mm: prevent userfaults to be handled under per-vma lock
> git bisect good 444eeb17437a0ef526c606e9141a415d3b7dfddd
> # bad: [e06f47a16573decc57498f2d02f9af3bb3e84cf2] s390/mm: try VMA lock-based page fault handling first
> git bisect bad e06f47a16573decc57498f2d02f9af3bb3e84cf2
> # skip: [0bff0aaea03e2a3ed6bfa302155cca8a432a1829] x86/mm: try VMA lock-based page fault handling first
> git bisect skip 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
> # skip: [cd7f176aea5f5929a09a91c661a26912cc995d1b] arm64/mm: try VMA lock-based page fault handling first
> git bisect skip cd7f176aea5f5929a09a91c661a26912cc995d1b
> # good: [52f238653e452e0fda61e880f263a173d219acd1] mm: introduce per-VMA lock statistics
> git bisect good 52f238653e452e0fda61e880f263a173d219acd1
> # bad: [c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f] mm: separate vma->lock from vm_area_struct
> git bisect bad c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f
> # only skipped commits left to test
> # possible first bad commit: [c7f8f31c00d187a2c71a241c7f2bd6aa102a4e6f] mm: separate vma->lock from vm_area_struct
> # possible first bad commit: [0d2ebf9c3f7822e7ba3e4792ea3b6b19aa2da34a] mm/mmap: free vm_area_struct without call_rcu in exit_mmap
> # possible first bad commit: [70d4cbc80c88251de0a5b3e8df3275901f1fa99a] powerc/mm: try VMA lock-based page fault handling first
> # possible first bad commit: [cd7f176aea5f5929a09a91c661a26912cc995d1b] arm64/mm: try VMA lock-based page fault handling first
> # possible first bad commit: [0bff0aaea03e2a3ed6bfa302155cca8a432a1829] x86/mm: try VMA lock-based page fault handling first
>
> I do not usually see any kernel log output while running the program, just occasional logs about user space segfaults.
See Bugzilla for the full thread.
Jacob: Can you repeat the bisection, please? Why did you skip the VMA
lock-based page fault commits in your bisection?
Anyway, I'm adding it to regzbot:
#regzbot introduced: 0bff0aaea03e2a..c7f8f31c00d187 https://bugzilla.kernel.org/show_bug.cgi?id=217624
#regzbot title: Memory corruption in multithreaded user space program while calling fork (possibly caused by trying VMA lock-based page fault)
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217624
--
An old man doll... just what I always wanted! - Clara
On 02.07.23 14:27, Bagas Sanjaya wrote:
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
>> After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent but random crashes in a user space program. After a lot of reduction, I have come up with the following reproducer program:
> [...]
>> After tuning the various parameters for my computer, exit code 2, which indicates that memory corruption was detected, occurs approximately 99% of the time. Exit code 1, which occurs approximately 1% of the time, means it ran out of statically-allocated memory before reproducing the issue, and increasing the memory usage any more only leads to diminishing returns. There is also something like a 0.1% chance that it segfaults due to memory corruption elsewhere than in the statically-allocated buffer.
>>
>> With this reproducer in hand, I was able to perform the following bisection:
> [...]
>
> See Bugzilla for the full thread.
Additional details from
https://bugzilla.kernel.org/show_bug.cgi?id=217624#c5 :
```
I can confirm that v6.4 with 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
reverted no longer causes any memory corruption with either my
reproducer or the original program.
```
FWIW: 0bff0aaea03 ("x86/mm: try VMA lock-based page fault handling
first") [merged for v6.4-rc1, authored by Suren Baghdasaryan [already CCed]]
That's the same commit that causes build problems with go:
https://lore.kernel.org/all/[email protected]/
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
#regzbot introduced: 0bff0aaea03e2a3
On Mon, Jul 3, 2023 at 2:53 AM Linux regression tracking (Thorsten
Leemhuis) <[email protected]> wrote:
>
> On 02.07.23 14:27, Bagas Sanjaya wrote:
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> >
> >> After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent but random crashes in a user space program. After a lot of reduction, I have come up with the following reproducer program:
> > [...]
> >> After tuning the various parameters for my computer, exit code 2, which indicates that memory corruption was detected, occurs approximately 99% of the time. Exit code 1, which occurs approximately 1% of the time, means it ran out of statically-allocated memory before reproducing the issue, and increasing the memory usage any more only leads to diminishing returns. There is also something like a 0.1% chance that it segfaults due to memory corruption elsewhere than in the statically-allocated buffer.
> >>
> >> With this reproducer in hand, I was able to perform the following bisection:
> > [...]
> >
> > See Bugzilla for the full thread.
>
> Additional details from
> https://bugzilla.kernel.org/show_bug.cgi?id=217624#c5 :
>
> ```
> I can confirm that v6.4 with 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
> reverted no longer causes any memory corruption with either my
> reproducer or the original program.
> ```
>
> FWIW: 0bff0aaea03 ("x86/mm: try VMA lock-based page fault handling
> first") [merged for v6.4-rc1, authored by Suren Baghdasaryan [already CCed]]
>
> That's the same commit that causes build problems with go:
>
> https://lore.kernel.org/all/[email protected]/
Thanks! I'll investigate this later today. After discussing with
Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
the issue is fixed. I'll post a patch shortly.
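[To make the proposal concrete: "disable ... by default" means changing
the option's entry in mm/Kconfig so it is no longer enabled unless
explicitly selected. A rough sketch of the idea -- illustrative only,
the real hunk is whatever Suren's posted patch says (v6.4 used
"def_bool y"):
```
config PER_VMA_LOCK
	bool "Enable per-VMA locking during page fault handling"
	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
	help
	  Allow per-vma locks, which can reduce mmap_lock contention
	  during page fault handling.
```
]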
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot introduced: 0bff0aaea03e2a3
On Mon, Jul 3, 2023 at 11:08 AM Suren Baghdasaryan <[email protected]> wrote:
>
> On Mon, Jul 3, 2023 at 2:53 AM Linux regression tracking (Thorsten
> Leemhuis) <[email protected]> wrote:
> >
> > On 02.07.23 14:27, Bagas Sanjaya wrote:
> > > I notice a regression report on Bugzilla [1]. Quoting from it:
> > >
> > >> After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent but random crashes in a user space program. After a lot of reduction, I have come up with the following reproducer program:
> > > [...]
> > >> After tuning the various parameters for my computer, exit code 2, which indicates that memory corruption was detected, occurs approximately 99% of the time. Exit code 1, which occurs approximately 1% of the time, means it ran out of statically-allocated memory before reproducing the issue, and increasing the memory usage any more only leads to diminishing returns. There is also something like a 0.1% chance that it segfaults due to memory corruption elsewhere than in the statically-allocated buffer.
> > >>
> > >> With this reproducer in hand, I was able to perform the following bisection:
> > > [...]
> > >
> > > See Bugzilla for the full thread.
> >
> > Additional details from
> > https://bugzilla.kernel.org/show_bug.cgi?id=217624#c5 :
> >
> > ```
> > I can confirm that v6.4 with 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
> > reverted no longer causes any memory corruption with either my
> > reproducer or the original program.
> > ```
> >
> > FWIW: 0bff0aaea03 ("x86/mm: try VMA lock-based page fault handling
> > first") [merged for v6.4-rc1, authored by Suren Baghdasaryan [already CCed]]
> >
> > That's the same commit that causes build problems with go:
> >
> > https://lore.kernel.org/all/[email protected]/
>
> Thanks! I'll investigate this later today. After discussing with
> Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> the issue is fixed. I'll post a patch shortly.
Posted at: https://lore.kernel.org/all/[email protected]/
>
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> >
> > #regzbot introduced: 0bff0aaea03e2a3
On Mon, Jul 03, 2023 at 11:27:19AM -0700, Suren Baghdasaryan wrote:
> On Mon, Jul 3, 2023 at 11:08 AM Suren Baghdasaryan <[email protected]> wrote:
> >
> > On Mon, Jul 3, 2023 at 2:53 AM Linux regression tracking (Thorsten
> > Leemhuis) <[email protected]> wrote:
> > >
> > > On 02.07.23 14:27, Bagas Sanjaya wrote:
> > > > I notice a regression report on Bugzilla [1]. Quoting from it:
> > > >
> > > >> After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent but random crashes in a user space program. After a lot of reduction, I have come up with the following reproducer program:
> > > > [...]
> > > >> After tuning the various parameters for my computer, exit code 2, which indicates that memory corruption was detected, occurs approximately 99% of the time. Exit code 1, which occurs approximately 1% of the time, means it ran out of statically-allocated memory before reproducing the issue, and increasing the memory usage any more only leads to diminishing returns. There is also something like a 0.1% chance that it segfaults due to memory corruption elsewhere than in the statically-allocated buffer.
> > > >>
> > > >> With this reproducer in hand, I was able to perform the following bisection:
> > > > [...]
> > > >
> > > > See Bugzilla for the full thread.
> > >
> > > Additional details from
> > > https://bugzilla.kernel.org/show_bug.cgi?id=217624#c5 :
> > >
> > > ```
> > > I can confirm that v6.4 with 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
> > > reverted no longer causes any memory corruption with either my
> > > reproducer or the original program.
> > > ```
> > >
> > > FWIW: 0bff0aaea03 ("x86/mm: try VMA lock-based page fault handling
> > > first") [merged for v6.4-rc1, authored by Suren Baghdasaryan [already CCed]]
> > >
> > > That's the same commit that causes build problems with go:
> > >
> > > https://lore.kernel.org/all/[email protected]/
> >
> > Thanks! I'll investigate this later today. After discussing with
> > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > the issue is fixed. I'll post a patch shortly.
>
> Posted at: https://lore.kernel.org/all/[email protected]/
As that change fixes something in 6.4, why not cc: stable on it as well?
thanks,
greg k-h
On Mon, Jul 3, 2023 at 11:44 AM Greg KH <[email protected]> wrote:
>
> On Mon, Jul 03, 2023 at 11:27:19AM -0700, Suren Baghdasaryan wrote:
> > On Mon, Jul 3, 2023 at 11:08 AM Suren Baghdasaryan <[email protected]> wrote:
> > >
> > > On Mon, Jul 3, 2023 at 2:53 AM Linux regression tracking (Thorsten
> > > Leemhuis) <[email protected]> wrote:
> > > >
> > > > On 02.07.23 14:27, Bagas Sanjaya wrote:
> > > > > I notice a regression report on Bugzilla [1]. Quoting from it:
> > > > >
> > > > >> After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent but random crashes in a user space program. After a lot of reduction, I have come up with the following reproducer program:
> > > > > [...]
> > > > >> After tuning the various parameters for my computer, exit code 2, which indicates that memory corruption was detected, occurs approximately 99% of the time. Exit code 1, which occurs approximately 1% of the time, means it ran out of statically-allocated memory before reproducing the issue, and increasing the memory usage any more only leads to diminishing returns. There is also something like a 0.1% chance that it segfaults due to memory corruption elsewhere than in the statically-allocated buffer.
> > > > >>
> > > > >> With this reproducer in hand, I was able to perform the following bisection:
> > > > > [...]
> > > > >
> > > > > See Bugzilla for the full thread.
> > > >
> > > > Additional details from
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=217624#c5 :
> > > >
> > > > ```
> > > > I can confirm that v6.4 with 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
> > > > reverted no longer causes any memory corruption with either my
> > > > reproducer or the original program.
> > > > ```
> > > >
> > > > FWIW: 0bff0aaea03 ("x86/mm: try VMA lock-based page fault handling
> > > > first") [merged for v6.4-rc1, authored by Suren Baghdasaryan [already CCed]]
> > > >
> > > > That's the same commit that causes build problems with go:
> > > >
> > > > https://lore.kernel.org/all/[email protected]/
> > >
> > > Thanks! I'll investigate this later today. After discussing with
> > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > the issue is fixed. I'll post a patch shortly.
> >
> > Posted at: https://lore.kernel.org/all/[email protected]/
>
> As that change fixes something in 6.4, why not cc: stable on it as well?
Sorry, I thought that since per-VMA locks were introduced in 6.4 and
this patch is fixing 6.4, I didn't need to send it to stable for older
versions. Did I miss something?
Thanks,
Suren.
>
> thanks,
>
> greg k-h
On Tue, Jul 04, 2023 at 12:45:39AM -0700, Suren Baghdasaryan wrote:
> On Mon, Jul 3, 2023 at 11:44 AM Greg KH <[email protected]> wrote:
> >
> > On Mon, Jul 03, 2023 at 11:27:19AM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Jul 3, 2023 at 11:08 AM Suren Baghdasaryan <[email protected]> wrote:
> > > >
> > > > On Mon, Jul 3, 2023 at 2:53 AM Linux regression tracking (Thorsten
> > > > Leemhuis) <[email protected]> wrote:
> > > > >
> > > > > On 02.07.23 14:27, Bagas Sanjaya wrote:
> > > > > > I notice a regression report on Bugzilla [1]. Quoting from it:
> > > > > >
> > > > > >> After upgrading to kernel version 6.4.0 from 6.3.9, I noticed frequent but random crashes in a user space program. After a lot of reduction, I have come up with the following reproducer program:
> > > > > > [...]
> > > > > >> After tuning the various parameters for my computer, exit code 2, which indicates that memory corruption was detected, occurs approximately 99% of the time. Exit code 1, which occurs approximately 1% of the time, means it ran out of statically-allocated memory before reproducing the issue, and increasing the memory usage any more only leads to diminishing returns. There is also something like a 0.1% chance that it segfaults due to memory corruption elsewhere than in the statically-allocated buffer.
> > > > > >>
> > > > > >> With this reproducer in hand, I was able to perform the following bisection:
> > > > > > [...]
> > > > > >
> > > > > > See Bugzilla for the full thread.
> > > > >
> > > > > Additional details from
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=217624#c5 :
> > > > >
> > > > > ```
> > > > > I can confirm that v6.4 with 0bff0aaea03e2a3ed6bfa302155cca8a432a1829
> > > > > reverted no longer causes any memory corruption with either my
> > > > > reproducer or the original program.
> > > > > ```
> > > > >
> > > > > FWIW: 0bff0aaea03 ("x86/mm: try VMA lock-based page fault handling
> > > > > first") [merged for v6.4-rc1, authored by Suren Baghdasaryan [already CCed]]
> > > > >
> > > > > That's the same commit that causes build problems with go:
> > > > >
> > > > > https://lore.kernel.org/all/[email protected]/
> > > >
> > > > Thanks! I'll investigate this later today. After discussing with
> > > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > > the issue is fixed. I'll post a patch shortly.
> > >
> > > Posted at: https://lore.kernel.org/all/[email protected]/
> >
> > As that change fixes something in 6.4, why not cc: stable on it as well?
>
> Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> patch is fixing 6.4 I didn't need to send it to stable for older
> versions. Did I miss something?
6.4.y is a stable kernel tree right now, so yes, it needs to be included
there :)
On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
> > > > > Thanks! I'll investigate this later today. After discussing with
> > > > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > > > the issue is fixed. I'll post a patch shortly.
> > > >
> > > > Posted at: https://lore.kernel.org/all/[email protected]/
> > >
> > > As that change fixes something in 6.4, why not cc: stable on it as well?
> >
> > Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> > patch is fixing 6.4 I didn't need to send it to stable for older
> > versions. Did I miss something?
>
> 6.4.y is a stable kernel tree right now, so yes, it needs to be included
> there :)
I'm in wait-a-few-days-mode on this. To see if we have a backportable
fix rather than disabling the feature in -stable.
On Tue, Jul 4, 2023 at 9:18 AM Andrew Morton <[email protected]> wrote:
>
> On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
>
> > > > > > Thanks! I'll investigate this later today. After discussing with
> > > > > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > > > > the issue is fixed. I'll post a patch shortly.
> > > > >
> > > > > Posted at: https://lore.kernel.org/all/[email protected]/
> > > >
> > > > As that change fixes something in 6.4, why not cc: stable on it as well?
> > >
> > > Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> > > patch is fixing 6.4 I didn't need to send it to stable for older
> > > versions. Did I miss something?
> >
> > 6.4.y is a stable kernel tree right now, so yes, it needs to be included
> > there :)
>
> I'm in wait-a-few-days-mode on this. To see if we have a backportable
> fix rather than disabling the feature in -stable.
Ok, I think we have a fix posted at [2], and it applies cleanly to the
6.4.y stable branch as well. However, fork() performance might slightly
regress, so disabling per-VMA locks by default for now seems
preferable even with this fix (see the discussion at
https://lore.kernel.org/all/[email protected]/).
IOW, both [1] and [2] should be applied to 6.4.y stable. Both apply
cleanly and I CC'ed stable on [2]. Greg, should I send [1] separately
to stable@vger?
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
>
On Tue, 4 Jul 2023 13:22:54 -0700 Suren Baghdasaryan <[email protected]> wrote:
> On Tue, Jul 4, 2023 at 9:18 AM Andrew Morton <[email protected]> wrote:
> >
> > On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
> >
> > > > > > > Thanks! I'll investigate this later today. After discussing with
> > > > > > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > > > > > the issue is fixed. I'll post a patch shortly.
> > > > > >
> > > > > > Posted at: https://lore.kernel.org/all/[email protected]/
> > > > >
> > > > > As that change fixes something in 6.4, why not cc: stable on it as well?
> > > >
> > > > Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> > > > patch is fixing 6.4 I didn't need to send it to stable for older
> > > > versions. Did I miss something?
> > >
> > > 6.4.y is a stable kernel tree right now, so yes, it needs to be included
> > > there :)
> >
> > I'm in wait-a-few-days-mode on this. To see if we have a backportable
> > fix rather than disabling the feature in -stable.
>
> Ok, I think we have a fix posted at [2] and it's cleanly applies to
> 6.4.y stable branch as well. However fork() performance might slightly
> regress, therefore disabling per-VMA locks by default for now seems to
> be preferable even with this fix (see discussion at
> https://lore.kernel.org/all/[email protected]/).
> IOW, both [1] and [2] should be applied to 6.4.y stable. Both apply
> cleanly and I CC'ed stable on [2]. Greg, should I send [1] separately
> to stable@vger?
>
> [1] https://lore.kernel.org/all/[email protected]/
This one isn't sufficient for .configs which already have
PER_VMA_LOCK=y. Using `depends on BROKEN' would be better.
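[For reference: BROKEN is a Kconfig symbol that nothing ever enables,
so a `depends on BROKEN' line makes an option unselectable and also
switches it off in existing .configs on the next oldconfig run -- which
is Andrew's point above. A hypothetical sketch of what he suggests:
```
config PER_VMA_LOCK
	def_bool y
	# the BROKEN dependency can be dropped once the corruption is fixed
	depends on BROKEN
	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
```
]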
> [2] https://lore.kernel.org/all/[email protected]/
>
We're still awaiting tester input on this?
I think a clean new fully-changelogged two-patch series would be the
best way to handle this. Please ensure that the [0/2] intro clearly
explains what we're proposing here, and why.
Also, "might slightly regress" is a bit weak. These things are
measurable, no? Because a better solution would be to fix 6.4.x and
mainline and leave it at that.
On Tue, Jul 4, 2023 at 2:28 PM Andrew Morton <[email protected]> wrote:
>
> On Tue, 4 Jul 2023 13:22:54 -0700 Suren Baghdasaryan <[email protected]> wrote:
>
> > On Tue, Jul 4, 2023 at 9:18 AM Andrew Morton <[email protected]> wrote:
> > >
> > > On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
> > >
> > > > > > > > Thanks! I'll investigate this later today. After discussing with
> > > > > > > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > > > > > > the issue is fixed. I'll post a patch shortly.
> > > > > > >
> > > > > > > Posted at: https://lore.kernel.org/all/[email protected]/
> > > > > >
> > > > > > As that change fixes something in 6.4, why not cc: stable on it as well?
> > > > >
> > > > > Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> > > > > patch is fixing 6.4 I didn't need to send it to stable for older
> > > > > versions. Did I miss something?
> > > >
> > > > 6.4.y is a stable kernel tree right now, so yes, it needs to be included
> > > > there :)
> > >
> > > I'm in wait-a-few-days-mode on this. To see if we have a backportable
> > > fix rather than disabling the feature in -stable.
> >
> > Ok, I think we have a fix posted at [2] and it's cleanly applies to
> > 6.4.y stable branch as well. However fork() performance might slightly
> > regress, therefore disabling per-VMA locks by default for now seems to
> > be preferable even with this fix (see discussion at
> > https://lore.kernel.org/all/[email protected]/).
> > IOW, both [1] and [2] should be applied to 6.4.y stable. Both apply
> > cleanly and I CC'ed stable on [2]. Greg, should I send [1] separately
> > to stable@vger?
> >
> > [1] https://lore.kernel.org/all/[email protected]/
>
> This one isn't sufficient for .configs which already have
> PER_VMA_LOCK=y. Using `depends on BROKEN' would be better.
>
> > [2] https://lore.kernel.org/all/[email protected]/
> >
>
> We're still awaiting tester input on this?
Yeah, and it seems to be negative... Anyway, I'll post a dependency on BROKEN.
>
> I think a clean new fully-changelogged two-patch series would be the
> best way to handle this. Please ensure that the [0/2] intro clearly
> explains what we're proposing here, and why.
>
> Also, "might slightly regress" is a bit weak. These things are
> measurable, no? Because a better solution would be to fix 6.4.x and
> mainline and leave it at that.
>
On Tue, Jul 4, 2023 at 3:04 PM Suren Baghdasaryan <[email protected]> wrote:
>
> On Tue, Jul 4, 2023 at 2:28 PM Andrew Morton <[email protected]> wrote:
> >
> > On Tue, 4 Jul 2023 13:22:54 -0700 Suren Baghdasaryan <[email protected]> wrote:
> >
> > > On Tue, Jul 4, 2023 at 9:18 AM Andrew Morton <[email protected]> wrote:
> > > >
> > > > On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
> > > >
> > > > > > > > > Thanks! I'll investigate this later today. After discussing with
> > > > > > > > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > > > > > > > the issue is fixed. I'll post a patch shortly.
> > > > > > > >
> > > > > > > > Posted at: https://lore.kernel.org/all/[email protected]/
> > > > > > >
> > > > > > > As that change fixes something in 6.4, why not cc: stable on it as well?
> > > > > >
> > > > > > Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> > > > > > patch is fixing 6.4 I didn't need to send it to stable for older
> > > > > > versions. Did I miss something?
> > > > >
> > > > > 6.4.y is a stable kernel tree right now, so yes, it needs to be included
> > > > > there :)
> > > >
> > > > I'm in wait-a-few-days-mode on this. To see if we have a backportable
> > > > fix rather than disabling the feature in -stable.
> > >
> > > Ok, I think we have a fix posted at [2] and it's cleanly applies to
> > > 6.4.y stable branch as well. However fork() performance might slightly
> > > regress, therefore disabling per-VMA locks by default for now seems to
> > > be preferable even with this fix (see discussion at
> > > https://lore.kernel.org/all/[email protected]/).
> > > IOW, both [1] and [2] should be applied to 6.4.y stable. Both apply
> > > cleanly and I CC'ed stable on [2]. Greg, should I send [1] separately
> > > to stable@vger?
> > >
> > > [1] https://lore.kernel.org/all/[email protected]/
> >
> > This one isn't sufficient for .configs which already have
> > PER_VMA_LOCK=y. Using `depends on BROKEN' would be better.
> >
> > > [2] https://lore.kernel.org/all/[email protected]/
> > >
> >
> > We're still awaiting tester input on this?
>
> Yeah, and it seems to be negative... Anyway, I'll post a dependency on BROKEN.
I posted the patchset at
https://lore.kernel.org/all/[email protected]/
CC'ing stable@vger with the cover letter explaining the situation. The
negative report might have been a fluke, so let's wait for more
testing. In the meantime we can disable the feature by applying the
last patch in that series.
>
> >
> > I think a clean new fully-changelogged two-patch series would be the
> > best way to handle this. Please ensure that the [0/2] intro clearly
> > explains what we're proposing here, and why.
Done.
> >
> > Also, "might slightly regress" is a bit weak. These things are
> > measurable, no? Because a better solution would be to fix 6.4.x and
> > mainline and leave it at that.
They are measurable, and the numbers were included in the fix I
posted. I added the numbers to the new cover letter as well.
Thanks,
Suren.
> >
On Tue, Jul 04, 2023 at 01:22:54PM -0700, Suren Baghdasaryan wrote:
> On Tue, Jul 4, 2023 at 9:18 AM Andrew Morton <[email protected]> wrote:
> >
> > On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
> >
> > > > > > > Thanks! I'll investigate this later today. After discussing with
> > > > > > > Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> > > > > > > the issue is fixed. I'll post a patch shortly.
> > > > > >
> > > > > > Posted at: https://lore.kernel.org/all/[email protected]/
> > > > >
> > > > > As that change fixes something in 6.4, why not cc: stable on it as well?
> > > >
> > > > Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> > > > patch is fixing 6.4 I didn't need to send it to stable for older
> > > > versions. Did I miss something?
> > >
> > > 6.4.y is a stable kernel tree right now, so yes, it needs to be included
> > > there :)
> >
> > I'm in wait-a-few-days-mode on this. To see if we have a backportable
> > fix rather than disabling the feature in -stable.
>
> Ok, I think we have a fix posted at [2] and it's cleanly applies to
> 6.4.y stable branch as well. However fork() performance might slightly
> regress, therefore disabling per-VMA locks by default for now seems to
> be preferable even with this fix (see discussion at
> https://lore.kernel.org/all/[email protected]/).
> IOW, both [1] and [2] should be applied to 6.4.y stable. Both apply
> cleanly and I CC'ed stable on [2]. Greg, should I send [1] separately
> to stable@vger?
We can't do anything for stable until it lands in Linus's tree, so if
you didn't happen to have the stable@ tag in the patch, just email us
the git SHA1 and I can pick it up that way.
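[For reference, the stable@ tag Greg mentions is a line next to the
Signed-off-by in the patch itself, per
Documentation/process/stable-kernel-rules.rst, e.g.:
```
Cc: <[email protected]> # 6.4.x
```
]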
thanks,
greg k-h
On 05.07.23 09:08, Greg KH wrote:
> On Tue, Jul 04, 2023 at 01:22:54PM -0700, Suren Baghdasaryan wrote:
>> On Tue, Jul 4, 2023 at 9:18 AM Andrew Morton <[email protected]> wrote:
>>> On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
>>>>>>>> Thanks! I'll investigate this later today. After discussing with
>>>>>>>> Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
>>>>>>>> the issue is fixed. I'll post a patch shortly.
>>>>>>>
>>>>>>> Posted at: https://lore.kernel.org/all/[email protected]/
>>>>>>
>>>>>> As that change fixes something in 6.4, why not cc: stable on it as well?
>>>>>
>>>>> Sorry, I thought since per-VMA locks were introduced in 6.4 and this
>>>>> patch is fixing 6.4 I didn't need to send it to stable for older
>>>>> versions. Did I miss something?
>>>>
>>>> 6.4.y is a stable kernel tree right now, so yes, it needs to be included
>>>> there :)
>>>
>>> I'm in wait-a-few-days-mode on this. To see if we have a backportable
>>> fix rather than disabling the feature in -stable.
Andrew, how long will you remain in "wait-a-few-days-mode"? Given what
Greg said below, and that we already have three reports I know of, I'd
prefer to fix this in mainline sooner rather than later --
especially as Arch Linux and openSUSE Tumbleweed have likely switched to
6.4.y already or will do so soon.
>> Ok, I think we have a fix posted at [2] and it's cleanly applies to
>> 6.4.y stable branch as well. However fork() performance might slightly
>> regress, therefore disabling per-VMA locks by default for now seems to
>> be preferable even with this fix (see discussion at
>> https://lore.kernel.org/all/[email protected]/).
>> IOW, both [1] and [2] should be applied to 6.4.y stable. Both apply
>> cleanly and I CC'ed stable on [2]. Greg, should I send [1] separately
>> to stable@vger?
>
> We can't do anything for stable until it lands in Linus's tree, so if
> you didn't happen to have the stable@ tag in the patch, just email us
> the git SHA1 and I can pick it up that way.
>
> thanks,
>
> greg k-h
Ciao, Thorsten
On Wed, Jul 05, 2023 at 10:51:57AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 05.07.23 09:08, Greg KH wrote:
> > On Tue, Jul 04, 2023 at 01:22:54PM -0700, Suren Baghdasaryan wrote:
> >> On Tue, Jul 4, 2023 at 9:18 AM Andrew Morton <[email protected]> wrote:
> >>> On Tue, 4 Jul 2023 09:00:19 +0100 Greg KH <[email protected]> wrote:
> >>>>>>>> Thanks! I'll investigate this later today. After discussing with
> >>>>>>>> Andrew, we would like to disable CONFIG_PER_VMA_LOCK by default until
> >>>>>>>> the issue is fixed. I'll post a patch shortly.
> >>>>>>>
> >>>>>>> Posted at: https://lore.kernel.org/all/[email protected]/
> >>>>>>
> >>>>>> As that change fixes something in 6.4, why not cc: stable on it as well?
> >>>>>
> >>>>> Sorry, I thought since per-VMA locks were introduced in 6.4 and this
> >>>>> patch is fixing 6.4 I didn't need to send it to stable for older
> >>>>> versions. Did I miss something?
> >>>>
> >>>> 6.4.y is a stable kernel tree right now, so yes, it needs to be included
> >>>> there :)
> >>>
> >>> I'm in wait-a-few-days-mode on this. To see if we have a backportable
> >>> fix rather than disabling the feature in -stable.
>
> Andrew, how long will you remain in "wait-a-few-days-mode"? Given what
> Greg said below and that we already had three reports I know of I'd
> prefer if we could fix this rather sooner than later in mainline --
> especially as Arch Linux and openSUSE Tumbleweed likely have switched to
> 6.4.y already or will do so soon.
Ick, yeah, and Fedora should be switching soon too, and I want to drop
support for 6.3.y "any day now". Is there just a revert we can do now
first to resolve the regression and then work on fixing this up "better"
for 6.6-rc1?
thanks,
greg k-h
On Wed, 5 Jul 2023 10:51:57 +0200 "Linux regression tracking (Thorsten Leemhuis)" <[email protected]> wrote:
> >>> I'm in wait-a-few-days-mode on this. To see if we have a backportable
> >>> fix rather than disabling the feature in -stable.
>
> Andrew, how long will you remain in "wait-a-few-days-mode"? Given what
> Greg said below and that we already had three reports I know of I'd
> prefer if we could fix this rather sooner than later in mainline --
> especially as Arch Linux and openSUSE Tumbleweed likely have switched to
> 6.4.y already or will do so soon.
I'll send today's 2-patch series to Linus today or tomorrow.
On Wed, Jul 5, 2023 at 8:49 AM Andrew Morton <[email protected]> wrote:
>
> On Wed, 5 Jul 2023 10:51:57 +0200 "Linux regression tracking (Thorsten Leemhuis)" <[email protected]> wrote:
>
> > >>> I'm in wait-a-few-days-mode on this. To see if we have a backportable
> > >>> fix rather than disabling the feature in -stable.
> >
> > Andrew, how long will you remain in "wait-a-few-days-mode"? Given what
> > Greg said below and that we already had three reports I know of I'd
> > prefer if we could fix this rather sooner than later in mainline --
> > especially as Arch Linux and openSUSE Tumbleweed likely have switched to
> > 6.4.y already or will do so soon.
>
> I'll send today's 2-patch series to Linus today or tomorrow.
I need to make a correction to the patch fixing the issue (the first
one in the series) and will post it within an hour. Thanks!
On Wed, Jul 5, 2023 at 9:14 AM Suren Baghdasaryan <[email protected]> wrote:
>
> On Wed, Jul 5, 2023 at 8:49 AM Andrew Morton <[email protected]> wrote:
> >
> > On Wed, 5 Jul 2023 10:51:57 +0200 "Linux regression tracking (Thorsten Leemhuis)" <[email protected]> wrote:
> >
> > > >>> I'm in wait-a-few-days-mode on this. To see if we have a backportable
> > > >>> fix rather than disabling the feature in -stable.
> > >
> > > Andrew, how long will you remain in "wait-a-few-days-mode"? Given what
> > > Greg said below and that we already had three reports I know of I'd
> > > prefer if we could fix this rather sooner than later in mainline --
> > > especially as Arch Linux and openSUSE Tumbleweed likely have switched to
> > > 6.4.y already or will do so soon.
> >
> > I'll send today's 2-patch series to Linus today or tomorrow.
>
> I need to make a correction to the patch fixing the issue (the first
> one in the series) and will post it within an hour. Thanks!
The corrected patch is posted at
https://lore.kernel.org/all/[email protected]/.
The other patch is unchanged but I posted v3 of the entire patchset
anyway.
Andrew, could you please replace the old version with this one before
sending it to Linus?
Thanks,
Suren.
[adding Linus to the list of recipients to ensure the fix makes it into
-rc1 (and can finally be backported to -stable).
Linus, here is the backstory, as I assume you haven't seen this yet:
CONFIG_PER_VMA_LOCK (which defaults to Y; merged for v6.4-rc1 in
0bff0aaea03 ("x86/mm: try VMA lock-based page fault handling first"))
sometimes causes memory corruption reported here:
https://lore.kernel.org/all/[email protected]/
https://bugzilla.kernel.org/show_bug.cgi?id=217624
The plan since early this week is to mark CONFIG_PER_VMA_LOCK as broken;
latest patch that does this is this one afaics:
https://lore.kernel.org/all/[email protected]/
But that change or something similar hasn't reached you yet afaics;
note, this is the second patch of a series with two patches]
On 05.07.23 17:49, Andrew Morton wrote:
> On Wed, 5 Jul 2023 10:51:57 +0200 "Linux regression tracking (Thorsten Leemhuis)" <[email protected]> wrote:
>
>>>>> I'm in wait-a-few-days-mode on this. To see if we have a backportable
>>>>> fix rather than disabling the feature in -stable.
>>
>> Andrew, how long will you remain in "wait-a-few-days-mode"? Given what
>> Greg said below and that we already had three reports I know of I'd
>> prefer if we could fix this rather sooner than later in mainline --
>> especially as Arch Linux and openSUSE Tumbleweed likely have switched to
>> 6.4.y already or will do so soon.
>
> I'll send today's 2-patch series to Linus today or tomorrow.
That afaics has not happened yet. :-(
This makes me regret that I did not CC Linus earlier. I always feel like
a snitcher when I do that. But in retrospect it seems it would have
been the right thing to do given the problem, as I suspect Linus would
have quickly applied the patch or marked the feature as broken himself.
So thanks to this (and a handful of earlier, similar situations) I have
now fully made my peace with feeling like a snitcher (I always knew that
it's kinda part of the position). When something in me says "Ick, this
looks bad to my untrained eyes" I'll immediately CC Linus.
Linus, if I take things too far, just let me know. But I assume you get a
lot of mails and won't mind a few more.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
On Sat, 8 Jul 2023 at 04:35, Thorsten Leemhuis
<[email protected]> wrote:
>
> The plan since early this week is to mark CONFIG_PER_VMA_LOCK as broken;
> latest patch that does this is this one afaics:
Bah.
Both marking it as broken and the pending fix seem excessive.
Why isn't the trivial fix just to say "yes, fork() gets the mmap_lock
for writing for a reason, and that reason is that it acts kind of like
mprotect()".
And then just do what those functions do.
IOW, why isn't the fix just to do
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -686,6 +686,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	for_each_vma(old_vmi, mpnt) {
 		struct file *file;

+		vma_start_write(mpnt);
 		if (mpnt->vm_flags & VM_DONTCOPY) {
 			vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
 			continue;
and be done with this? Yes, we could move it down a bit more, ignoring
the VM_DONTCOPY vma's, but they are so uncommon as to not matter, so
who cares?
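[The "move it down a bit more" variant Linus alludes to would look
roughly like this -- hypothetical placement, not a posted patch:
```c
	for_each_vma(old_vmi, mpnt) {
		struct file *file;

		if (mpnt->vm_flags & VM_DONTCOPY) {
			vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
			continue;
		}
		/* write-lock only the VMAs we actually copy; Linus notes
		 * VM_DONTCOPY VMAs are rare enough that locking them too,
		 * as in the diff above, would also be fine */
		vma_start_write(mpnt);
		/* ... remainder of the dup_mmap() copy loop ... */
	}
```
]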
Linus
On Sat, 8 Jul 2023 at 10:39, Andrew Morton <[email protected]> wrote:
>
> That was the v1 fix, but after some discussion
> (https://lkml.kernel.org/r/[email protected])
> it was decided to take the "excessive" approach.
That makes absolutely _zero_ sense.
It seems to be complete voodoo programming.
To some degree I don't care what happens in stable kernels, but
there's no way we'll do that kind of thing in mainline without some
logic or reason, when it makes no sense.
flush_cache_dup_mm() is entirely irrelevant to the whole issue, for
several reasons, the core one being that it only matters on broken
virtually indexed caches, so none of the architectures that do per-vma
locking.
And the argument that "After the mmap_write_lock_killable(), there
will still be a period where page faults can happen" may be true
(that's kind of the *point* of per-vma locking), but it's irrelevant.
It's true for *all* users of mmap_write_lock_killable, whether in fork
or anywhere else. What makes fork() so magically special?
It's why we have that vma_start_write(), to say "I'm now modifying
*this* vma, so stop accessing it in parallel".
Because no, flush_cache_dup_mm() is not the magical reason to do that thing.
Maybe there is something else going on, but no, we don't write crazy
code without a reason for it. That's completely unmaintainable,
because people will look at that code, not understand it (because
there is nothing to understand) and be afraid to touch it. For no
actual reason.
The obvious place to say "I'm now starting to modify the vma" is when
you actually start to modify the vma.
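[Background on vma_start_write(): it is the v6.4 primitive that marks a
VMA write-locked so that lockless page faults on it, taken via
lock_vma_under_rcu(), bail out and retry under the mmap_lock. A
simplified sketch, not the verbatim include/linux/mm.h code:
```c
static inline void vma_start_write(struct vm_area_struct *vma)
{
	int mm_lock_seq;

	/* caller holds mmap_lock for write; nothing to do if this VMA
	 * was already write-locked in the current mmap_lock cycle */
	if (__is_vma_write_locked(vma, &mm_lock_seq))
		return;

	down_write(&vma->vm_lock->lock);	/* drain current readers */
	vma->vm_lock_seq = mm_lock_seq;		/* new readers back off */
	up_write(&vma->vm_lock->lock);
}
```
]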
> Also, this change needs a couple more updates:
Those updates seem sane, and come with explanations of why they exist.
Looks fine to me.
Suren, please send me the proper fixes. Not the voodoo one. The ones
you can explain.
And if stable wants to do something else, then that's fine. But for
the development kernel, we have two options:
- fix the PER_VMA_LOCK code
- decide that it's not worth it, and just revert it all
and honestly, I'm ok with that second option, simply because this has
all been way too much pain.
But no, we don't mark it broken thinking we can't deal with it, or do
random non-sensible code we can't explain.
Linus
On Sat, 8 Jul 2023 10:29:42 -0700 Linus Torvalds <[email protected]> wrote:
> On Sat, 8 Jul 2023 at 04:35, Thorsten Leemhuis
> <[email protected]> wrote:
> >
> > The plan since early this week is to mark CONFIG_PER_VMA_LOCK as broken;
> > latest patch that does this is this one afaics:
>
> Bah.
>
> Both marking it as broken and the pending fix seems excessive.
>
> Why isn't the trivial fix just to say "yes, fork() gets the mmap_lock
> for writing for a reason, and that reason is that it acts kind of like
> mprotect()".
>
> And then just do what those functions do.
>
> IOW, why isn't the fix just to do
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -686,6 +686,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  	for_each_vma(old_vmi, mpnt) {
>  		struct file *file;
>
> +		vma_start_write(mpnt);
>  		if (mpnt->vm_flags & VM_DONTCOPY) {
>  			vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
>  			continue;
>
> and be done with this? Yes, we could move it down a bit more, ignoring
> the VM_DONTCOPY vma's, but they are so uncommon as to not matter, so
> who cares?
That was the v1 fix, but after some discussion
(https://lkml.kernel.org/r/[email protected])
it was decided to take the "excessive" approach.
Also, this change needs a couple more updates:
https://lkml.kernel.org/r/[email protected]
https://lkml.kernel.org/r/[email protected]
So I'm thinking it's best to disable the feature in 6.4.x and reenable
it for 6.5 once all this is sorted out.
On Sat, 8 Jul 2023 at 11:40, Suren Baghdasaryan <[email protected]> wrote:
>
> My understanding was that flush_cache_dup_mm() is there to ensure
> nothing is in the cache, so locking VMAs before doing that would
> ensure that no page faults would pollute the caches after we flushed
> them. Is that reasoning incorrect?
It is indeed incorrect.
The VIVT caches are fundamentally broken, and we have various random
hacks for them to make them work in legacy situations.
And that flush_cache_dup_mm() is exactly that: a band-aid to make sure
that when we do a fork(), any previous writes that are dirty in the
caches will have made it to memory, so that they will show up in the
*new* process that has a different virtual mapping.
BUT!
This has nothing to do with page faults, or other threads.
If you have a threaded application that does fork(), it can - and will
- dirty the VIVT caches *during* the fork, and so the whole
"flush_cache_dup_mm()" is completely and fundamentally race wrt any
*new* activity.
It's not even what it is trying to deal with. All it tries to do is to
make sure that the newly forked child AT LEAST sees all the changes
that the parent did up to the point of the fork. Anything after that
is simply not relevant at all.
So think of all this not as some kind of absolute synchronization and
cache coherency (because you will never get that on a VIVT
architecture anyway), but as a "for the simple cases, this will at
least get you the expected behavior".
But as mentioned, for the issue of PER_VMA_LOCK, this is all *doubly*
irrelevant. Not only was it not relevant to begin with (ie that cache
flush only synchronizes parent -> child, not other-threads -> child),
but VIVT caches don't even exist on any relevant architecture because
they are fundamentally broken in so many other ways.
So all our "synchronize caches by hand" is literally just band-aid for
legacy architectures. I think it's mostly things like the old broken
MIPS chips, some sparc32, pa-risc: the "old RISC" stuff, where people
simplified the hardware a bit too much.
VIVT is lovely for hardware people because they get a shortcut. But
it's "lovely" in the same way that "PI=3" is lovely. It's simpler -
but it's _wrong_.
And it's almost entirely useless if you ever do SMP. I guarantee we
have tons of races with it for very fundamental reasons - the problems
it causes for software are not fixable, they are "hidable for the
simple case".
So you'll also find things like flush_dcache_page(), which flushes
writes to a page to memory. And exactly like the fork() case, it's
*not* real cache coherency, and it's *not* some kind of true global
serialization.
It's used in cases where we have a particular user that wants the
changes *it* made to be made visible. And exactly like
flush_cache_dup_mm(), it cannot deal with concurrent changes that
other threads make.
> Ok, I think these two are non-controversial:
> https://lkml.kernel.org/r/[email protected]
> https://lkml.kernel.org/r/[email protected]
These look sane to me. I wonder if the vma_start_write() should have
been somewhere else, but at least it makes sense in context, even if I
get the feeling that maybe it should have been done in some helper
earlier.
As it is, we randomly do it in other helpers like vm_flags_set(), and
I've often had the reaction that these vma_start_write() calls are
randomly sprinkled around without any clear _design_ for where they
are.
> and the question now is how we fix the fork() case:
> https://lore.kernel.org/all/[email protected]/
> (if my above explanation makes sense to you)
See above. That patch is nonsensical. Trying to order
flush_cache_dup_mm() is not about page faults, and is fundamentally
not doable with threads anyway.
> https://lore.kernel.org/all/[email protected]/
This is the one that makes sense to me.
Linus
On Sat, Jul 8, 2023 at 12:23 PM Linus Torvalds
<[email protected]> wrote:
>
> On Sat, 8 Jul 2023 at 12:17, Suren Baghdasaryan <[email protected]> wrote:
> >
> > Do you want me to disable per-VMA locks by default as well?
>
> No. I seriously believe that if the per-vma locking is so broken that
> it needs to be disabled in a development kernel, we should just admit
> failure, and revert it all.
>
> And not in a "revert it for a later attempt" kind of way.
>
> So it would be a "revert it because it added insurmountable problems
> that we couldn't figure out" thing that implies *not* trying it again
> in that form at all, and much soul-searching before somebody decides
> that they have a more maintainable model for it all.
Got it. I hope that's not the case; so far we haven't received any
indication that the fixes are insufficient.
>
> If stable decides that the fixes are not back-portable, and the whole
> thing needs to be disabled for stable, that's one thing. But if we
> decide that in mainline, it's a "this was a failure" thing.
The patches applied cleanly to the 6.4.y stable branch the last time I
checked, so this should not be a problem.
Thanks,
Suren.
>
> Linus
On Sat, Jul 8, 2023 at 11:05 AM Linus Torvalds
<[email protected]> wrote:
>
> On Sat, 8 Jul 2023 at 10:39, Andrew Morton <[email protected]> wrote:
> >
> > That was the v1 fix, but after some discussion
> > (https://lkml.kernel.org/r/[email protected])
> > it was decided to take the "excessive" approach.
>
> That makes absolutely _zero_ sense.
>
> It seems to be complete voodoo programming.
>
> To some degree I don't care what happens in stable kernels, but
> there's no way we'll do that kind of thing in mainline without some
> logic or reason, when it makes no sense.
>
> flush_cache_dup_mm() is entirely irrelevant to the whole issue, for
> several reason, but the core one being that it only matters on broken
> virtually indexed caches, so none of the architectures that do per-vma
> locking.
>
> And the argument that "After the mmap_write_lock_killable(), there
> will still be a period where page faults can happen" may be true
> (that's kind of the *point* of per-vma locking), but it's irrelevant.
>
> It's true for *all* users of mmap_write_lock_killable, whether in fork
> or anywhere else. What makes fork() so magically special?
>
> It's why we have that vma_start_write(), to say "I'm now modifying
> *this* vma, so stop accessing it in parallel".
>
> Because no, flush_cache_dup_mm() is not the magical reason to do that thing.
My understanding was that flush_cache_dup_mm() is there to ensure
nothing is in the cache, so locking VMAs before doing that would
ensure that no page faults would pollute the caches after we flushed
them. Is that reasoning incorrect?
>
> Maybe there is something else going on, but no, we don't write crazy
> code without a reason for it. That's completely unmaintainable,
> because people will look at that code, not understand it (because
> there is nothing to understand) and be afraid to touch it. For no
> actual reason.
>
> The obvious place to say "I'm now starting to modify the vma" is when
> you actually start to modify the vma.
>
> > Also, this change needs a couple more updates:
>
> Those updates seem sane, and come with explanations of why they exist.
> Looks fine to me.
>
> Suren, please send me the proper fixes. Not the voodoo one. The ones
> you can explain.
Ok, I think these two are non-controversial:
https://lkml.kernel.org/r/[email protected]
https://lkml.kernel.org/r/[email protected]
and the question now is how we fix the fork() case:
https://lore.kernel.org/all/[email protected]/
(if my above explanation makes sense to you)
or
https://lore.kernel.org/all/[email protected]/
Please let me know which ones and I'll send you the patchset including
these patches.
Thanks,
Suren.
>
> And if stable wants to do something else, then that's fine. But for
> the development kernel,. we have two options:
>
> - fix the PER_VMA_LOCK code
>
> - decide that it's not worth it, and just revert it all
>
> and honestly, I'm ok with that second option, simply because this has
> all been way too much pain.
>
> But no, we don't mark it broken thinking we can't deal with it, or do
> random non-sensible code code we can't explain.
>
> Linus
On Sat, 8 Jul 2023 at 12:17, Suren Baghdasaryan <[email protected]> wrote:
>
> Do you want me to disable per-VMA locks by default as well?
No. I seriously believe that if the per-vma locking is so broken that
it needs to be disabled in a development kernel, we should just admit
failure, and revert it all.
And not in a "revert it for a later attempt" kind of way.
So it would be a "revert it because it added insurmountable problems
that we couldn't figure out" thing that implies *not* trying it again
in that form at all, and much soul-searching before somebody decides
that they have a more maintainable model for it all.
If stable decides that the fixes are not back-portable, and the whole
thing needs to be disabled for stable, that's one thing. But if we
decide that in mainline, it's a "this was a failure" thing.
Linus
On Sat, Jul 8, 2023 at 12:06 PM Linus Torvalds
<[email protected]> wrote:
>
> On Sat, 8 Jul 2023 at 11:40, Suren Baghdasaryan <[email protected]> wrote:
> >
> > My understanding was that flush_cache_dup_mm() is there to ensure
> > nothing is in the cache, so locking VMAs before doing that would
> > ensure that no page faults would pollute the caches after we flushed
> > them. Is that reasoning incorrect?
>
> It is indeed incorrect.
>
> The VIVT caches are fundamentally broken, and we have various random
> hacks for them to make them work in legacy situations.
>
> And that flush_cache_dup_mm() is exactly that: a band-aid to make sure
> that when we do a fork(), any previous writes that are dirty in the
> caches will have made it to memory, so that they will show up in the
> *new* process that has a different virtual mapping.
>
> BUT!
>
> This has nothing to do with page faults, or other threads.
>
> If you have a threaded application that does fork(), it can - and will
> - dirty the VIVT caches *during* the fork, and so the whole
> "flush_cache_dup_mm()" is completely and fundamentally race wrt any
> *new* activity.
>
> It's not even what it is trying to deal with. All it tries to do is to
> make sure that the newly forked child AT LEAST sees all the changes
> that the parent did up to the point of the fork. Anything after that
> is simply not relevant at all.
>
> So think of all this not as some kind of absolute synchronization and
> cache coherency (because you will never get that on a VIVT
> architecture anyway), but as a "for the simple cases, this will at
> least get you the expected behavior".
>
> But as mentioned, for the issue of PER_VMA_LOCK, this is all *doubly*
> irrelevant. Not only was it not relevant to begin with (ie that cache
> flush only synchronizes parent -> child, not other-threads -> child),
> but VIVT caches don't even exist on any relevant architecture because
> they are fundamentally broken in so many other ways.
>
> So all our "synchronize caches by hand" is literally just band-aid for
> legacy architectures. I think it's mostly things like the old broken
> MIPS chips, some sparc32, pa-risc: the "old RISC" stuff, where people
> simplified the hardware a bit too much.
>
> VIVT is lovely for hardware people becasue they get a shortcut. But
> it's "lovely" in the same way that "PI=3" is lovely. It's simpler -
> but it's _wrong_.
>
> And it's almost entirely useless if you ever do SMP. I guarantee we
> have tons of races with it for very fundamental reasons - the problems
> it causes for software are not fixable, they are "hidable for the
> simple case".
>
> So you'll also find things like dcache_page_flush(), which flushes
> writes to a page to memory. And exactly like the fork() case, it's
> *not* real cache coherency, and it's *not* some kind of true global
> serialization.
>
> It's used in cases where we have a particular user that wants the
> changes *it* made to be made visible. And exactly like
> flush_cache_dup_mm(), it cannot deal with concurrent changes that
> other threads make.
Thanks for the explanation! It's quite educational.
>
> > Ok, I think these two are non-controversial:
> > https://lkml.kernel.org/r/[email protected]
> > https://lkml.kernel.org/r/[email protected]
>
> These look sane to me. I wonder if the vma_start_write() should have
> been somewhere else, but at least it makes sense in context, even if I
> get the feeling that maybe it should have been done in some helper
> earlier.
>
> As it is, we randomly do it in other helpers like vm_flags_set(), and
> I've often had the reaction that these vma_start_write() calls are
> randomly sprinked around without any clear _design_ for where they
> are.
We write-lock a VMA before any modification. I tried to provide an
explanation for each such locking site in my comments/patch
descriptions, but I guess I haven't done a good job at that...
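[One concrete instance of the rule Suren states: in v6.4 the vm_flags
helpers write-lock the VMA before touching the flags (lightly abridged
from include/linux/mm.h):
```c
static inline void vm_flags_set(struct vm_area_struct *vma,
				vm_flags_t flags)
{
	vma_start_write(vma);	/* exclude lockless faults on this VMA */
	ACCESS_PRIVATE(vma, __vm_flags) |= flags;
}
```
]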
>
> > and the question now is how we fix the fork() case:
> > https://lore.kernel.org/all/[email protected]/
> > (if my above explanation makes sense to you)
>
> See above. That patch is nonsensical. Trying to order
> flush_cache_dup_mm() is not about page faults, and is fundamentally
> not doable with threads anyway.
>
> > https://lore.kernel.org/all/[email protected]/
>
> This is the one that makes sense to me.
Ok, I sent you the 3-patch series with the fixes here:
https://lore.kernel.org/all/[email protected]/
Do you want me to disable per-VMA locks by default as well?
>
> Linus