2022-06-21 15:48:43

by Joe Korty

Subject: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

Mismatched get_uid/free_uid usage in signals in 4.9.312-rt193

[ First attempt using mutt did not show up on the mailing lists.
Trying again with office365 Outlook. Also added the 4.9-rt
maintainers. ]

The 4.19-rt patch,

0329-signal-Prevent-double-free-of-user-struct.patch

needs to be ported to LAG 4.9-rt, as that release now has the Linus commit,

fda31c50292a ("signal: avoid double atomic counter increments for user accounting")

which breaks the longstanding rt patch,

0259-signals-Allow-rt-tasks-to-cache-one-sigqueue-struct.patch
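
To illustrate the kind of get_uid/free_uid mismatch involved, here is a hypothetical userspace model written with plain C11 atomics. It is not kernel code: `struct user_model`, `run_model` and the `model_*` helpers are invented for illustration. The sketch assumes that after fda31c50292a the allocation side takes the user reference only when the per-user sigpending count goes from 0 to 1, and __sigqueue_free drops it only when the count returns to 0. It also assumes that an RT free path copied from the older __sigqueue_free still calls free_uid() once for every freed sigqueue.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct user_struct: `refs` models the count
 * managed by get_uid()/free_uid(), `sigpending` the per-user count of
 * queued signals. Not the kernel's definition. */
struct user_model {
    atomic_int refs;
    atomic_int sigpending;
};

static void model_get_uid(struct user_model *u)  { atomic_fetch_add(&u->refs, 1); }
static void model_free_uid(struct user_model *u) { atomic_fetch_sub(&u->refs, 1); }

/* Allocation side (assumed post-fda31c50292a behaviour): take the user
 * reference only on the 0 -> 1 transition of sigpending. */
static void model_sigqueue_alloc(struct user_model *u)
{
    if (atomic_fetch_add(&u->sigpending, 1) == 0)
        model_get_uid(u);
}

/* Matching free side: drop the reference only on the 1 -> 0 transition. */
static void model_sigqueue_free_matched(struct user_model *u)
{
    if (atomic_fetch_sub(&u->sigpending, 1) == 1)
        model_free_uid(u);
}

/* Older-style free, as an un-updated RT copy would do it: one free_uid()
 * per freed sigqueue, regardless of sigpending. */
static void model_sigqueue_free_unconditional(struct user_model *u)
{
    atomic_fetch_sub(&u->sigpending, 1);
    model_free_uid(u);
}

/* Queue n signals, free them all, and return the final reference count.
 * The count starts at 1 to represent the credential's own reference. */
int run_model(bool matched, int n)
{
    struct user_model u;
    atomic_init(&u.refs, 1);
    atomic_init(&u.sigpending, 0);

    for (int i = 0; i < n; i++)
        model_sigqueue_alloc(&u);
    for (int i = 0; i < n; i++) {
        if (matched)
            model_sigqueue_free_matched(&u);
        else
            model_sigqueue_free_unconditional(&u);
    }
    return atomic_load(&u.refs);
}
```

With three queued signals, run_model(true, 3) ends back at the single owner reference (1), while run_model(false, 3) ends at -1: two more references were dropped than were taken. In the real kernel the count would hit zero while signals still pointed at the user struct, which is consistent with the use-after-free symptoms reported in this thread.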

Current application status:

4.4.302-rt232    OK       has both Linus's patch and the fix needed for rt.
4.9.312-rt193    BROKE    has Linus's patch but not the fix.
4.14.87-rt50     OK       has neither Linus's patch nor its rt fix.
4.19.246-rt110   OK       has both Linus's patch and the fix needed for rt.
5.4.193-rt74     OK       has both Linus's patch and the fix needed for rt.
5.10.120-rt70    OK       has both Linus's patch and the fix needed for rt.
5.15.44-rt46     UNKNOWN  no get_uid/free_uid usage in kernel/signal.c anymore.

Regards,
Joe


2022-06-24 17:29:17

by Mark Gross

Subject: Re: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

On Tue, Jun 21, 2022 at 03:16:39PM +0000, Joe Korty wrote:
> Mismatched get_uid/free_uid usage in signals in 4.9.312-rt193
>
> [ First attempt using mutt did not show up on the mailing lists.
> Trying again with office365 Outlook. Also added the 4.9-rt
> maintainers. ]
>
> The 4.19-rt patch,
>
> 0329-signal-Prevent-double-free-of-user-struct.patch
>
> needs to be ported to LAG 4.9-rt, as that release now has the Linus commit,
What does LAG stand for?

FWIW the cherry-pick within the RT-stable tree worked without conflict.
(cherry picked from commit a99e09659e6cd4b633c3689f2c3aa5f8a816fe5b)
It compiles.
See 58a584ee59b2 signal: Prevent double-free of user struct in
linux-stable-rt.git/v4.9-rt-next

>
> fda31c50292a ("signal: avoid double atomic counter increments for user accounting")
>
This was added to 4.9.y on March 20, 2020.
commit 4306259ff6b8b682322d9aeb0c12b27c61c4a548 in linux-stable.

How did you find this issue? What is missing from my testing?

Do you have a test case that I can use to confirm my cherry-pick works?
Could you test the v4.9-rt-next branch to see if it fixes your issue?

--mark

> which breaks the longstanding rt patch,
>
> 0259-signals-Allow-rt-tasks-to-cache-one-sigqueue-struct.patch
>
> Current application status:
>
> 4.4.302-rt232 OK has both Linus's patch and the fix needed for rt.
> 4.9.312-rt193 BROKE has Linus's patch but not the fix.
> 4.14.87-rt50 OK has neither Linus's patch nor its rt fix.
> 4.19.246-rt110 OK has both Linus's patch and the fix needed for rt.
> 5.4.193-rt74 OK has both Linus's patch and the fix needed for rt.
> 5.10.120-rt70 OK has both Linus's patch and the fix needed for rt.
> 5.15.44-rt46 UNKNOWN no get_uid/free_uid usage in kernel/signal.c anymore.
>
> Regards,
> Joe

2022-06-24 18:59:40

by Joe Korty

Subject: Re: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

[ Fixed incorrect linux-rt-users email address in CC ]

On Fri, Jun 24, 2022 at 09:58:07AM -0700, Mark Gross wrote:
> On Tue, Jun 21, 2022 at 03:16:39PM +0000, Joe Korty wrote:
> > Mismatched get_uid/free_uid usage in signals in 4.9.312-rt193
> >
> > [ First attempt using mutt did not show up on the mailing lists.
> > Trying again with office365 Outlook. Also added the 4.9-rt
> > maintainers. ]
> >
> > The 4.19-rt patch,
> >
> > 0329-signal-Prevent-double-free-of-user-struct.patch
> >
> > needs to be ported to LAG 4.9-rt, as that release now has the Linus commit,
> What does LAG stand for?

Hi Mark,
LAG = Latest and Greatest



> FWIW the cherry-pick within the RT-stable tree worked without conflict.
> (cherry picked from commit a99e09659e6cd4b633c3689f2c3aa5f8a816fe5b)
> It compiles.
> See 58a584ee59b2 signal: Prevent double-free of user struct in
> linux-stable-rt.git/v4.9-rt-next
>
> >
> > fda31c50292a ("signal: avoid double atomic counter increments for user accounting")
> >
> This was added to 4.9.y on March 20, 2020.
> commit 4306259ff6b8b682322d9aeb0c12b27c61c4a548 in linux-stable.
>
> How did you find this issue? What is missing from my testing?
>
> Do you have a test case that I can use to confirm my cherry-pick works?
> Could you test the v4.9-rt-next branch to see if it fixes your issue?

We do not have a standard test. We were seeing crashes in NFS, and only
on arm64 systems. We have a custom kernel with changes, and the test
consisted of exercising one of those changes, which involved lots of signals,
then running NFS tests in loopback mode. On occasion NFS would crash in
a way it had never crashed before, which suggested use-after-free corruption.
It would never crash unless we hit signals heavily first, which implied that
something in signals was wrong. After that it wasn't too hard to find the
patch that fixed the problem in 4.4, 4.14, 4.19, 5.4, and 5.10.

We have not seen the NFS crash since applying the fix.
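
Since there is no standard test, the following is only a hypothetical sketch of one way to generate heavy sigqueue allocate/free traffic from userspace. It is not the custom test described above and is not known to reproduce the crash. `churn_rt_signals` is an invented name. It blocks SIGRTMIN, queues signals to its own process with sigqueue(), then drains them with sigwaitinfo(). Each queued real-time signal should make the kernel allocate a sigqueue entry, and each dequeue should free one. Based on the 0259 patch title ("Allow rt-tasks to cache one sigqueue struct"), the RT-specific cache path is assumed to apply only to tasks running at real-time priority, so such a program would presumably need to run under, e.g., `chrt -f 50`.

```c
#define _POSIX_C_SOURCE 199309L
#include <signal.h>
#include <unistd.h>

/* Queue up to n copies of SIGRTMIN to this process while it is blocked,
 * then dequeue them all with sigwaitinfo(). Returns the number drained,
 * or -1 if the signal mask could not be changed. Queuing stops early if
 * sigqueue() fails (e.g. EAGAIN once RLIMIT_SIGPENDING is reached). */
int churn_rt_signals(int n)
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    int queued = 0;
    for (int i = 0; i < n; i++) {
        union sigval v;
        v.sival_int = i;
        if (sigqueue(getpid(), SIGRTMIN, v) != 0)
            break;
        queued++;
    }

    /* Each successful sigwaitinfo() removes one queued signal. */
    int drained = 0;
    siginfo_t info;
    while (drained < queued) {
        if (sigwaitinfo(&set, &info) < 0)
            break;
        drained++;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);  /* restore the caller's mask */
    return drained;
}
```

Calling it repeatedly in a loop before starting some other workload would be one way to "hit signals heavily first", in the spirit of the report above.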

Joe


PS: Correction to the table below: I had tested a too-early version of 4.14-rt, so I retested with a current one.

Current application status:

  4.4.302-rt232    OK       has both Linus's patch and the fix needed for rt.
  4.9.312-rt193    BROKE    has Linus's patch but not the fix.
- 4.14.87-rt50     OK       has neither Linus's patch nor its rt fix.
+ 4.14.282-rt135   OK       has both Linus's patch and the fix needed for rt.
  4.19.246-rt110   OK       has both Linus's patch and the fix needed for rt.
  5.4.193-rt74     OK       has both Linus's patch and the fix needed for rt.
  5.10.120-rt70    OK       has both Linus's patch and the fix needed for rt.
  5.15.44-rt46     UNKNOWN  no get_uid/free_uid usage in kernel/signal.c anymore.

2022-06-26 12:53:41

by Joe Korty

Subject: Re: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

On Fri, Jun 24, 2022 at 02:44:31PM -0400, Joe Korty wrote:
> [ Fixed incorrect linux-rt-users email address in CC ]
>
> On Fri, Jun 24, 2022 at 09:58:07AM -0700, Mark Gross wrote:
> > On Tue, Jun 21, 2022 at 03:16:39PM +0000, Joe Korty wrote:
> > > Mismatched get_uid/free_uid usage in signals in 4.9.312-rt193
> > >
> > > [ First attempt using mutt did not show up on the mailing lists.
> > > Trying again with office365 Outlook. Also added the 4.9-rt
> > > maintainers. ]
> > >
> > > The 4.19-rt patch,
> > >
> > > 0329-signal-Prevent-double-free-of-user-struct.patch
> > >
> > > needs to be ported to LAG 4.9-rt, as that release now has the Linus commit,
> > What does LAG stand for?
>
> Hi Mark,
> LAG = Latest and Greatest
>
>
>
> > FWIW the cherry-pick within the RT-stable tree worked without conflict.
> > (cherry picked from commit a99e09659e6cd4b633c3689f2c3aa5f8a816fe5b)
> > It compiles.
> > See 58a584ee59b2 signal: Prevent double-free of user struct in
> > linux-stable-rt.git/v4.9-rt-next


Hi Mark,
Absent an actual test of your port of a99e09659e6c to 4.9-rt, I just
eye-verified that the change it makes to sigqueue_free_current looks
correct. In detail, I checked:

That it matches the change the Linus patch makes to __sigqueue_free (i.e.,
to the routine that sigqueue_free_current is a copy of).

That the new variable 'up' in sigqueue_free_current is used by the patch
(some variants of this fix do not have 'up'), and that this variable is
present in 4.9's version of sigqueue_free_current.

That atomic_dec_and_test, rather than the refcount version of that same
function, is used (some versions of this patch use the refcount variant
instead).
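
For illustration only, the three checks above can be pictured with another hypothetical userspace model. All `*_model` names are invented. The branch structure of the real 4.9-rt sigqueue_free_current is assumed rather than copied, and the model omits returning the struct to the slab. The point it shows is that both the one-entry cache path and the normal path drop the user reference only when an atomic_dec_and_test-style decrement of sigpending reaches zero, which is the same condition __sigqueue_free uses after the Linus patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct user_model     { atomic_int refs; atomic_int sigpending; };
struct sigqueue_model { struct user_model *user; bool prealloc; };
struct task_model     { bool rt; struct sigqueue_model *sigqueue_cache; };

/* Model of atomic_dec_and_test(): true when the decrement reaches zero. */
static bool dec_and_test(atomic_int *v) { return atomic_fetch_sub(v, 1) == 1; }
static void free_uid_model(struct user_model *u) { atomic_fetch_sub(&u->refs, 1); }

/* Assumed shape of the fixed RT free path, using a local 'up'. */
static void sigqueue_free_current_model(struct task_model *cur,
                                        struct sigqueue_model *q)
{
    struct user_model *up;

    if (q->prealloc)
        return;

    up = q->user;
    if (cur->rt && !cur->sigqueue_cache) {
        /* Cache path: same accounting rule as the normal free path. */
        if (dec_and_test(&up->sigpending))
            free_uid_model(up);
        cur->sigqueue_cache = q;   /* keep one sigqueue for reuse */
    } else {
        /* Stand-in for the normal __sigqueue_free path. */
        if (dec_and_test(&up->sigpending))
            free_uid_model(up);
    }
}

/* One user with two pending signals (owner ref + one pending-signal ref),
 * both freed by an RT task. Returns the final reference count. */
int demo_two_frees(void)
{
    struct user_model u;
    atomic_init(&u.refs, 2);
    atomic_init(&u.sigpending, 2);
    struct sigqueue_model a = { &u, false }, b = { &u, false };
    struct task_model t = { true, NULL };

    sigqueue_free_current_model(&t, &a);   /* cached; ref kept */
    sigqueue_free_current_model(&t, &b);   /* cache full; last pending ref dropped */
    return atomic_load(&u.refs);
}
```

demo_two_frees() is expected to return 1: exactly one reference (the one taken when sigpending went from 0 to 1) is dropped, regardless of which path frees each sigqueue.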

Regards,
Joe

Subject: Re: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

On 2022-06-26 08:30:19 [-0400], Joe Korty wrote:
> Hi Mark,
> Absent an actual test of your port of a99e09659e6c to 4.9-rt, I just
> eye-verified that the change it makes to sigqueue_free_current looks
> correct. In detail,
>
> That it matches the same change the Linus patch makes to __sigqueue_free (ie,
> to the routine that sigqueue_free_current is a copy of).
>
> That the new variable 'up', in sigqueue_free_current, is being used
> in the patch (some variants of this fix do not have 'up'), and that
> variable is present in 4.9's version of sigqueue_free_current.
>
> That atomic_dec_and_test, rather than the refcounting version of that
> same function, is being used (some versions of this patch are refcounted
> instead).

What is the status here? Is this still needed?

> Regards,
> Joe

Sebastian

2022-08-18 16:42:48

by Joe Korty

Subject: Re: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

On Thu, Aug 18, 2022 at 06:08:16PM +0200, Sebastian Andrzej Siewior wrote:
> On 2022-06-26 08:30:19 [-0400], Joe Korty wrote:
> > Hi Mark,
> > Absent an actual test of your port of a99e09659e6c to 4.9-rt, I just
> > eye-verified that the change it makes to sigqueue_free_current looks
> > correct. In detail,
> >
> > That it matches the same change the Linus patch makes to __sigqueue_free (ie,
> > to the routine that sigqueue_free_current is a copy of).
> >
> > That the new variable 'up', in sigqueue_free_current, is being used
> > in the patch (some variants of this fix do not have 'up'), and that
> > variable is present in 4.9's version of sigqueue_free_current.
> >
> > That atomic_dec_and_test, rather than the refcounting version of that
> > same function, is being used (some versions of this patch are refcounted
> > instead).
>
> What is the status here? Is this still needed?

Hi Sebastian,
I just verified that 4.9.319-rt195 has this fix.
Regards,
Joe

Subject: Re: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

On 2022-08-18 12:31:26 [-0400], Joe Korty wrote:
> Hi Sebastian,
Hi Joe,

> I just verified that 4.9.319-rt195 has this fix.

Good to hear. And the v4.9 series was the only broken one, right?

> Regards,
> Joe

Sebastian

2022-08-18 17:19:26

by Joe Korty

Subject: Re: [RT BUG] Mismatched get_uid/free_uid usage in signals in some rts (2nd try)

On Thu, Aug 18, 2022 at 06:33:28PM +0200, Sebastian Andrzej Siewior wrote:
> Hi Joe,
>
> > I just verified that 4.9.319-rt195 has this fix.
>
> Good to hear. And the v4.9 series was the only broken one, right?

Yep. Back then I eye-verified that all our other long-term
rt's, from 4.4 onward, already had the fix.

Joe