2024-02-21 09:12:36

by Paolo Bonzini

Subject: Re: CVE-2023-52437: Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d" [resend]

Resending with LKML in Cc, since posting to the linux-cve-announce

mailing list is restricted to moderators.



On 2/20/24 19:34, Greg Kroah-Hartman wrote:

> Description

> ===========

>

> In the Linux kernel, the following vulnerability has been resolved:

>

> Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"

>

> This reverts commit 5e2cf333b7bd5d3e62595a44d598a254c697cd74.

>

> That commit introduced the following race and can cause system hung.

>

> md_write_start:             raid5d:
>  // mddev->in_sync == 1
>  set "MD_SB_CHANGE_PENDING"
>                             // running before md_write_start wakeup it
>                              waiting "MD_SB_CHANGE_PENDING" cleared
>                              >>>>>>>>> hung
>  wakeup mddev->thread
>  ...
>  waiting "MD_SB_CHANGE_PENDING" cleared
>  >>>> hung, raid5d should clear this flag
>  but get hung by same flag.

>

> The issue reverted commit fixing is fixed by last patch in a new way.



Sometimes less than optimal descriptions end up even in Linux commit

messages, and I understand that you're "not going to be adding anything

additional to the report" other than the git commit message. [1] But

this description is not just "suboptimal" English, it also makes zero

sense since it refers to a "last patch" that does not exist.



There are dozens of distros, both commercial and non-commercial, whose

users need a *real* description of what is being fixed. By writing CVE

descriptions that make no sense, you're creating more work for everyone

involved, without putting in place a process to clarify these things

except through "the maintainers of the relevant subsystem

affected"---who are not CC'd to these messages and therefore might not

even know that the CVE announcement exists.



My suggestion is to CC the author of the fix and the maintainer, and if

possible even go through a pre-verification phase similar to what's done

for AUTOSEL patches. If some commit messages are irredeemable, or some

situations are just too complex, and no one is willing to put in the work

required to do it properly, the maintainer should have the

possibility to NACK the creation of an unusable CVE entry like this one.



(Somewhat related to this, how are you going to handle patch

dependencies? Sasha's GSD updates have a separate entry for each commit,

the result being "vulnerabilities" with "no functional change" in their

description. Are they instead going to be rolled into a single entry

like this one now that you're actually creating CVEs?)



I am cautiously optimistic that this can be worked out and I agree with

you that lots of bug fixes going into stable have potential security

impact. But as this example shows, there are still more than a few kinks

to be ironed out.



> The Linux kernel CVE team has assigned CVE-2023-52437 to this issue.

>

>

> Affected and fixed versions

> ===========================

>

> Issue introduced in 5.15.75 with commit 9e86dffd0b02 and fixed in 5.15.148 with commit 84c39986fe6d

> Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.74 with commit bed0acf330b2

> Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.75 with commit cfa468382858

> Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.13 with commit e16a0bbdb7e5

> Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.14 with commit aab69ef76970

> Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.1 with commit 0de40f76d567

> Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.2 with commit 87165c64fe1a



So which of these 6.{1,6,7}.y releases actually fixed the issue?
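
To make the ambiguity concrete, here is a minimal sketch (an illustration only, not the CVE team's tooling). It takes the "fixed in" lines quoted above verbatim, groups them by stable series, and reports every series that lists more than one fixing release. The names `fixes_by_series` and `ambiguous_series` are made up for this example.

```python
import re
from collections import defaultdict

# The "Affected and fixed versions" lines, copied from the announcement.
ENTRIES = """\
Issue introduced in 5.15.75 with commit 9e86dffd0b02 and fixed in 5.15.148 with commit 84c39986fe6d
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.74 with commit bed0acf330b2
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.75 with commit cfa468382858
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.13 with commit e16a0bbdb7e5
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.14 with commit aab69ef76970
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.1 with commit 0de40f76d567
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.2 with commit 87165c64fe1a
"""

# Captures the full fixed version, its stable series (major.minor),
# and the abbreviated commit hash.
FIXED_RE = re.compile(r"fixed in ((\d+\.\d+)(?:\.\d+)?) with commit ([0-9a-f]+)")


def fixes_by_series(text):
    """Map each stable series to the list of (version, commit) pairs that claim to fix it."""
    series = defaultdict(list)
    for line in text.splitlines():
        match = FIXED_RE.search(line)
        if match:
            version, branch, commit = match.groups()
            series[branch].append((version, commit))
    return dict(series)


def ambiguous_series(text):
    """Return only the series that list more than one fixing release."""
    return {b: fixes for b, fixes in fixes_by_series(text).items() if len(fixes) > 1}


if __name__ == "__main__":
    for branch, fixes in sorted(ambiguous_series(ENTRIES).items()):
        print(branch, fixes)
```

Run on the list above, only 5.15.y has a single fixing release. The 6.1.y, 6.6.y and 6.7.y series each name two.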



> The Linux kernel CVE team recommends that you update to the latest

> stable kernel version for this, and many other bugfixes. Individual

> changes are never tested alone, but rather are part of a larger kernel

> release. Cherry-picking individual commits is not recommended or

> supported by the Linux kernel community at all. If however, updating to

> the latest release is impossible, the individual changes to resolve this

> issue can be found at these commits:

> https://git.kernel.org/stable/c/84c39986fe6dd77aa15f08712339f5d4eb7dbe27

> https://git.kernel.org/stable/c/bed0acf330b2c50c688f6d9cfbcac2aa57a8e613

> https://git.kernel.org/stable/c/cfa46838285814c3a27faacf7357f0a65bb5d152

> https://git.kernel.org/stable/c/e16a0bbdb7e590a6607b0d82915add738c03c069

> https://git.kernel.org/stable/c/aab69ef769707ad987ff905d79e0bd6591812580

> https://git.kernel.org/stable/c/0de40f76d567133b871cd6ad46bb87afbce46983

> https://git.kernel.org/stable/c/87165c64fe1a98bbab7280c58df3c83be2c98478

> https://git.kernel.org/stable/c/bed9e27baf52a09b7ba2a3714f1e24e17ced386d



Half of these are reverting the revert. I understand that

"cherry-picking individual commits is not recommended" but it looks like

this is a bug in whatever scripts you are using. Are they public, so

that fixes can be developed in the open?
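
One check that would catch this is sketched below. It is an assumption about what such a check could look like, not a description of the actual scripts. It counts how many times a commit subject is wrapped in `Revert "..."`: an odd depth leaves the original change reverted, an even depth re-applies it. The function names are hypothetical. The nested subject in the usage lines is built from the title quoted in the announcement, assuming the conventional `Revert "<subject>"` naming. It is not copied from the stable tree.

```python
import re

# Matches one leading 'Revert "' wrapper at the start of a subject line.
_REVERT_PREFIX = re.compile(r'^\s*Revert\s+"')


def revert_depth(subject):
    """Count how many nested 'Revert "' wrappers open the subject line."""
    depth = 0
    rest = subject
    while True:
        match = _REVERT_PREFIX.match(rest)
        if match is None:
            return depth
        depth += 1
        rest = rest[match.end():]


def net_effect(subject):
    """Classify a subject by whether the innermost change ends up applied or reverted."""
    depth = revert_depth(subject)
    if depth == 0:
        return "plain commit"
    return "reverts original" if depth % 2 == 1 else "re-applies original"


if __name__ == "__main__":
    original = "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
    revert = 'Revert "%s"' % original
    revert_of_revert = 'Revert "%s"' % revert
    for subject in (original, revert, revert_of_revert):
        print(net_effect(subject), "<-", subject)
```

A tool that lists "fixing" commits could then refuse to put subjects with opposite net effects under the same CVE entry without a human looking at them first.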



Also, commit 87165c64fe1a9 (the revert of the revert) was marked 5.19+

but 5.15.148 does have the original revert. Does that mean that 5.15.148

still has the "issue with raid5 with journal device" (another hang, see

https://lore.kernel.org/linux-raid/[email protected]/)

mentioned in the commit message for 87165c64fe1a9? If so, that

contradicts the claim that updating to the latest release of a given LTS

branch is the best course of action, since for some users 5.15.147 might

be better than 5.15.148.



Paolo



[1] https://lwn.net/ml/linux-kernel/2024021518-stature-frightful-e7fc@gregkh/

[2] https://lwn.net/ml/linux-kernel/2024021430-blanching-spotter-c7c8@gregkh/