2022-06-19 13:25:19

by Thomas Gleixner

[permalink] [raw]
Subject: [GIT pull] locking/urgent for 5.19-rc3

Linus,

please pull the latest locking/urgent branch from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-urgent-2022-06-19

up to: 4051a81774d6: locking/lockdep: Use sched_clock() for random numbers


A RT fix for lockdep. lockdep invokes prandom_u32() to create cookies. This
worked until prandom_u32() was switched to the real random generator, which
takes a spinlock for extraction, which does not work on RT when invoked
from atomic contexts. lockdep has no requirement for real random numbers
and it turns out sched_clock() is good enough to create the cookie. That
works everywhere and is faster.

Thanks,

tglx

------------------>
Sebastian Andrzej Siewior (1):
locking/lockdep: Use sched_clock() for random numbers


kernel/locking/lockdep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 81e87280513e..f06b91ca6482 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5432,7 +5432,7 @@ static struct pin_cookie __lock_pin_lock(struct lockdep_map *lock)
* be guessable and still allows some pin nesting in
* our u32 pin_count.
*/
- cookie.val = 1 + (prandom_u32() >> 16);
+ cookie.val = 1 + (sched_clock() & 0xffff);
hlock->pin_count += cookie.val;
return cookie;
}


2022-06-19 15:02:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT pull] locking/urgent for 5.19-rc3

On Sun, Jun 19, 2022 at 8:12 AM Thomas Gleixner <[email protected]> wrote:
>
> A RT fix for lockdep. lockdep invokes prandom_u32() to create cookies. This
> worked until prandom_u32() was switched to the real random generator, which
> takes a spinlock for extraction, which does not work on RT when invoked
> from atomic contexts. lockdep has no requirement for real random numbers
> and it turns out sched_clock() is good enough to create the cookie. That
> works everywhere and is faster.

So this is obviously fine and works ok, but I do think it highlights
that maybe that prandom change was a bad bad idea.

Even outside of RT, you might end up getting nasty locks within locks.
Not a deadlock, but a "this was just pointless".

Linus

2022-06-19 15:03:41

by pr-tracker-bot

[permalink] [raw]
Subject: Re: [GIT pull] locking/urgent for 5.19-rc3

The pull request you sent on Sun, 19 Jun 2022 15:12:35 +0200 (CEST):

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-urgent-2022-06-19

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4afb65156a79c59fbdbc10abb0bf06ff83f73e23

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2022-06-19 16:40:26

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [GIT pull] locking/urgent for 5.19-rc3

Hi Linus,

On Sun, Jun 19, 2022 at 09:50:32AM -0500, Linus Torvalds wrote:
> On Sun, Jun 19, 2022 at 8:12 AM Thomas Gleixner <[email protected]> wrote:
> >
> > A RT fix for lockdep. lockdep invokes prandom_u32() to create cookies. This
> > worked until prandom_u32() was switched to the real random generator, which
> > takes a spinlock for extraction, which does not work on RT when invoked
> > from atomic contexts. lockdep has no requirement for real random numbers
> > and it turns out sched_clock() is good enough to create the cookie. That
> > works everywhere and is faster.
>
> So this is obviously fine and works ok, but I do think it highlights
> that maybe that prandom change was a bad bad idea.
>
> Even outside of RT, you might end up getting nasty locks within locks.
> Not a deadlock, but a "this was just pointless".

This was initially my concern too, which I expressed to Sebastian, but
he made the point that this area here is rather "special". Actually,
randomness isn't really required here. A counter would have worked, but
Peter thought that sched_clock() would be faster than incrementing a
counter. And if it did want some sort of max period generator that's not
sequential, there's still the deterministic prandom_u32_state()
function.

Anyway, if it turns out that this is terrible, and we're running into
issues all over the place, I'll start looking for more comprehensive
solutions. But as long as very special cases are fairly few and rare, it
seems like it's going to be okay. I guess we'll see. For now, I'm
undeterred.

Jason

2022-06-19 20:26:08

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT pull] locking/urgent for 5.19-rc3

On Sun, Jun 19, 2022 at 11:38 AM Jason A. Donenfeld <[email protected]> wrote:
>
> This was initially my concern too, which I expressed to Sebastian, but
> he made the point that this area here is rather "special". Actually,
> randomness isn't really required here.

That wasn't really my point.

My point was that there are a lot of uses of prandom_u32() and friends
in random places. Just grepping for it, there's lots of different
drivers that use it. Who knows what locking they have.

Clearly nobody *thought* about it. This one issue is purely about RT
correctness, but how about all the uses that just want a pseudo-random
number and may have performance issues, or may be calling things so
much that a lock is just bad.

The thing is, that prandom code used to be FAST. Not just "no locks",
but also "fairly simple siphash round because its a PSEUDO random
thing and shouldn't be anything more".

The whole "make it use the same randomness" may just have been a huge
and fundamental mistake.

We've seen one actual outright bug because of it already. That was
easy to fix by avoiding the new thing that now was a mistake. What
about all the other uses with lock bouncing or whatever subtler issues
that aren't pointed out by outright correctness tests?

Linus

2022-06-19 20:33:18

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [GIT pull] locking/urgent for 5.19-rc3

Hey Linus,

On Sun, Jun 19, 2022 at 03:05:23PM -0500, Linus Torvalds wrote:
> On Sun, Jun 19, 2022 at 11:38 AM Jason A. Donenfeld <[email protected]> wrote:
> >
> > This was initially my concern too, which I expressed to Sebastian, but
> > he made the point that this area here is rather "special". Actually,
> > randomness isn't really required here.
>
> That wasn't really my point.
>
> My point was that there are a lot of uses of prandom_u32() and friends
> in random places. Just grepping for it, there's lots of different
> drivers that use it. Who knows what locking they have.
>
> Clearly nobody *thought* about it. This one issue is purely about RT
> correctness, but how about all the uses that just want a pseudo-random
> number and may have performance issues, or may be calling things so
> much that a lock is just bad.
>
> The thing is, that prandom code used to be FAST. Not just "no locks",

In my benchmarks, get_random_u32() is on par with the old prandom_u32()
now. And a large part of that is that get_random_u32() is almost always
lockless. Every once in a while it needs to refill its buffer, so it
uses get_random_bytes(). And guess what? That too is almost always
lockless. But every once in a while -- on the order of once a minute --
get_random_byes() takes an extremely short spinlock that just does one
block of chacha. And that's only if get_random_u32()'s buffer being
empty is the thing to trigger the reseeding there. So taken together,
this means there's a very short spinlock once per minute, and only if
the stars align. That means performance and lock contention here is
really not an issue.

That's not to say we couldn't optimize the whole thing even further. I
think we probably can, should it become necessary or desirable or even
simply a fun thing to do. But hard as I tried, I couldn't find anywhere
that this was a problem from a performance perspective. And in some real
world cases (e.g. network stack), the performance was a little better.

So far, the one category of gotchas are when this is used from inside of
a raw spinlock, because that makes RT upset. I've seen two cases of
this, both of which were trivial to resolve. If it balloons into lots of
cases, or if multiple hard to address categories emerge, then maybe
we'll need to look at it differently. But for now this seems like a very
manageable problem.

Jason