2023-08-08 17:28:27

by Peter Zijlstra

Subject: Re: [RFC][PATCH 1/3] test-ww_mutex: Use prng instead of rng to avoid hangs at bootup

On Tue, Aug 08, 2023 at 06:26:41AM +0000, John Stultz wrote:
> Booting w/ qemu without kvm, I noticed we'd sometimes seem to get
> stuck in get_random_u32_below(). This seems potentially to be
> entropy exhaustion (with the test module linked statically, it
> runs pretty early in the bootup).
>
> I'm not 100% sure on this, but this patch switches to use the
> prng instead since we don't need true randomness, just mixed up
> orders for testing ww_mutex lock acquisitions.
>
> With this patch, I no longer see hangs in get_random_u32_below()
>
> Feedback would be appreciated!

Jason, I thought part of the 'recent' random rework was avoiding the
exhaustion problem, could you please give an opinion on the below?

> Signed-off-by: John Stultz <[email protected]>
> ---
> kernel/locking/test-ww_mutex.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
> index 93cca6e69860..9bceba65858a 100644
> --- a/kernel/locking/test-ww_mutex.c
> +++ b/kernel/locking/test-ww_mutex.c
> @@ -9,7 +9,7 @@
> #include <linux/delay.h>
> #include <linux/kthread.h>
> #include <linux/module.h>
> -#include <linux/random.h>
> +#include <linux/prandom.h>
> #include <linux/slab.h>
> #include <linux/ww_mutex.h>
>
> @@ -386,6 +386,19 @@ struct stress {
> int nlocks;
> };
>
> +struct rnd_state rng;
> +DEFINE_SPINLOCK(rng_lock);
> +
> +static inline u32 prandom_u32_below(u32 ceil)
> +{
> + u32 ret;
> +
> + spin_lock(&rng_lock);
> + ret = prandom_u32_state(&rng) % ceil;
> + spin_unlock(&rng_lock);
> + return ret;
> +}
> +
> static int *get_random_order(int count)
> {
> int *order;
> @@ -399,7 +412,7 @@ static int *get_random_order(int count)
> order[n] = n;
>
> for (n = count - 1; n > 1; n--) {
> - r = get_random_u32_below(n + 1);
> + r = prandom_u32_below(n + 1);
> if (r != n) {
> tmp = order[n];
> order[n] = order[r];
> @@ -625,6 +638,8 @@ static int __init test_ww_mutex_init(void)
>
> printk(KERN_INFO "Beginning ww mutex selftests\n");
>
> + prandom_seed_state(&rng, get_random_u64());
> +
> wq = alloc_workqueue("test-ww_mutex", WQ_UNBOUND, 0);
> if (!wq)
> return -ENOMEM;
> --
> 2.41.0.640.ga95def55d0-goog
>


2023-08-08 21:42:26

by Jason A. Donenfeld

Subject: Re: [RFC][PATCH 1/3] test-ww_mutex: Use prng instead of rng to avoid hangs at bootup

Hi Peter, John,

On Tue, Aug 8, 2023 at 12:36 PM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Aug 08, 2023 at 06:26:41AM +0000, John Stultz wrote:
> > Booting w/ qemu without kvm, I noticed we'd sometimes seem to get
> > stuck in get_random_u32_below(). This seems potentially to be
> > entropy exhaustion (with the test module linked statically, it
> > runs pretty early in the bootup).
> >
> > I'm not 100% sure on this, but this patch switches to use the
> > prng instead since we don't need true randomness, just mixed up
> > orders for testing ww_mutex lock acquisitions.
> >
> > With this patch, I no longer see hangs in get_random_u32_below()
> >
> > Feedback would be appreciated!
>
> Jason, I thought part of the 'recent' random rework was avoiding the
> exhaustion problem, could you please give an opinion on the below?

Thanks for looping me in. I actually can't reproduce this. I'm using a
minimal config and using QEMU without KVM. The RNG doesn't initialize
until much later on in the boot process, expectedly, yet
get_random_u32_below() does _not_ hang in my trials. And indeed it's
designed to never hang, since that would create boot deadlocks. So I'm
not sure why you're seeing a hang.

It is worth noting that in those early boot test-case scenarios,
before the RNG initializes, get_random_u32_below() will be somewhat
slower than it normally is, and also slower than prandom_u32_state().
(But only in this early boot scenario edge case; this isn't a general
statement about speed.) It's possible that in your QEMU machine,
things are slow enough that you're simply noticing the difference. On
my system, however, I replaced `get_random_u32_below()` with `static
u32 x; return ++x % ceil;` and I didn't see any difference running it
under TCG -- it took about 7 seconds either way.
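
Concretely, that stand-in amounts to something like the helper below,
dropped in where get_random_order() calls get_random_u32_below() -- the
name is made up and it is only meant for this timing experiment, not as
a real replacement:

static inline u32 fake_u32_below(u32 ceil)
{
	/*
	 * Not random at all -- just a cheap counter, so the RNG is
	 * taken out of the picture for the timing comparison.
	 */
	static u32 x;

	return ++x % ceil;
}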

So, from my perspective, you shouldn't see any hang. That function
never blocks. I'm happy to look more into what's happening on your end
though. Maybe share your .config and qemu command line and I'll see if
I can repro?

Jason

2023-08-08 22:35:10

by John Stultz

Subject: Re: [RFC][PATCH 1/3] test-ww_mutex: Use prng instead of rng to avoid hangs at bootup

On Tue, Aug 8, 2023 at 7:05 AM Jason A. Donenfeld <[email protected]> wrote:
>
> Hi Peter, John,
>
> On Tue, Aug 8, 2023 at 12:36 PM Peter Zijlstra <[email protected]> wrote:
> >
> > On Tue, Aug 08, 2023 at 06:26:41AM +0000, John Stultz wrote:
> > > Booting w/ qemu without kvm, I noticed we'd sometimes seem to get
> > > stuck in get_random_u32_below(). This seems potentially to be
> > > entropy exhaustion (with the test module linked statically, it
> > > runs pretty early in the bootup).
> > >
> > > I'm not 100% sure on this, but this patch switches to use the
> > > prng instead since we don't need true randomness, just mixed up
> > > orders for testing ww_mutex lock acquisitions.
> > >
> > > With this patch, I no longer see hangs in get_random_u32_below()
> > >
> > > Feedback would be appreciated!
> >
> > Jason, I thought part of the 'recent' random rework was avoiding the
> > exhaustion problem, could you please give an opinion on the below?
>
> Thanks for looping me in. I actually can't reproduce this. I'm using a
> minimal config and using QEMU without KVM. The RNG doesn't initialize
> until much later on in the boot process, expectedly, yet
> get_random_u32_below() does _not_ hang in my trials. And indeed it's
> designed to never hang, since that would create boot deadlocks. So I'm
> not sure why you're seeing a hang.

Ok. My hesitancy here was in part due to my understanding that the
entropy exhaustion hangs weren't supposed to be an issue anymore
(thanks, btw).

> It is worth noting that in those early boot test-case scenarios,
> before the RNG initializes, get_random_u32_below() will be somewhat
> slower than it normally is, and also slower than prandom_u32_state().
> (But only in this early boot scenario edge case; this isn't a general
> statement about speed.) It's possible that in your QEMU machine,
> things are slow enough that you're simply noticing the difference. On
> my system, however, I replaced `get_random_u32_below()` with `static
> u32 x; return ++x % ceil;` and I didn't see any difference running it
> under TCG -- it took about 7 seconds either way.
>
> So, from my perspective, you shouldn't see any hang. That function
> never blocks. I'm happy to look more into what's happening on your end
> though. Maybe share your .config and qemu command line and I'll see if
> I can repro?

Yeah, it may just be that the real RNG is slow enough that I'm hitting
the hung task watchdog?
(I'm running with 64 cpus, so the test is trying to use 128 threads
all hitting get_random_u32_below over and over to create their own
random order of 16 locks)

When the softlockup watchdog hits, the trace looks like:
[ 64.268881] _get_random_bytes+0x3b/0x140
[ 64.268881] ? __kmem_cache_alloc_node+0x17a/0x310
[ 64.268881] ? get_random_order+0x28/0xa0
[ 64.268881] get_random_u32+0x1c2/0x1e0
[ 64.268881] __get_random_u32_below+0xd/0x60
[ 64.268881] get_random_order+0x60/0xa0
[ 64.268881] stress_inorder_work+0x26/0x390
[ 64.268881] ? lock_acquire+0xd4/0x290
[ 64.268881] process_one_work+0x267/0x520
[ 64.268881] worker_thread+0x4a/0x390
[ 64.268881] ? __pfx_worker_thread+0x10/0x10
[ 64.268881] kthread+0xf5/0x130
[ 64.268881] ? __pfx_kthread+0x10/0x10
[ 64.268881] ret_from_fork+0x2b/0x40
[ 64.268881] ? __pfx_kthread+0x10/0x10
[ 64.268881] ret_from_fork_asm+0x1b/0x30


Attached is the defconfig I just used.

My script to run is below.

thanks
-john

#base boot args
kcmd="root=/dev/vda2 console=ttyS0 earlycon "
#debug helper
kcmd+="ftrace_dump_on_oops sysrq_always_enabled debug_boot_weak_hash "
#locktorture bits
kcmd+="torture.random_shuffle=1 locktorture.writer_fifo=1
locktorture.torture_type=mutex_lock locktorture.nested_locks=8
locktorture.rt_boost=1 locktorture.rt_boost_factor=50
locktorture.stutter=0 "

cpu_count=64
mem_size="6G"
#use_kvm="-enable-kvm"
device_config="-smp ${cpu_count} -m ${mem_size} ${use_kvm} \
-drive if=virtio,format=qcow2,file=ubuntu.img \
-device virtio-scsi-pci,id=scsi0 \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0 \
-device virtio-net-pci,netdev=net0 \
-netdev user,id=net0,hostfwd=tcp::8022-:22 \
-chardev stdio,id=char0,mux=on,logfile=serial.log,signal=off \
-serial chardev:char0 -mon chardev=char0 \
-nographic \
"
kbin="bzImage"

qemu-system-x86_64 \
-kernel "${kbin}" -initrd initrd.img -append "${kcmd}" \
${device_config}


Attachments:
defconfig (26.39 kB)

2023-08-08 22:50:18

by John Stultz

Subject: Re: [RFC][PATCH 1/3] test-ww_mutex: Use prng instead of rng to avoid hangs at bootup

On Tue, Aug 8, 2023 at 11:20 AM John Stultz <[email protected]> wrote:
> On Tue, Aug 8, 2023 at 7:05 AM Jason A. Donenfeld <[email protected]> wrote:
> > So, from my perspective, you shouldn't see any hang. That function
> > never blocks. I'm happy to look more into what's happening on your end
> > though. Maybe share your .config and qemu command line and I'll see if
> > I can repro?
>
> Yeah, it may just be that the real RNG is slow enough that I'm hitting
> the hung task watchdog?
> (I'm running with 64 cpus, so the test is trying to use 128 threads
> all hitting get_random_u32_below over and over to create their own
> random order of 16 locks)

Just following up on this point, I went through and disabled all the
hung task and delay detection (and pushed the rcu stall boundary up to
two minutes), and indeed the test did complete without actually
hanging. However, the test took something like 90 seconds to finish
using the get_random_u32_below() calls, whereas with this patch it
finishes in ~18s.

So indeed it's not blocking, just not fast enough to avoid the hung
task watchdogs in this admittedly contrived case (though one that has
been helpful in uncovering issues with proposed scheduler changes).
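
For reference, the knobs involved are roughly the following boot
parameters, in the same style as the script above (illustrative values;
not necessarily the exact set I used):

#watchdog/stall knobs (illustrative -- not necessarily the exact set used above)
kcmd+="nowatchdog nosoftlockup "
kcmd+="sysctl.kernel.hung_task_timeout_secs=0 "
kcmd+="rcupdate.rcu_cpu_stall_timeout=120 "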

I'll try to rework the commit message so the above is clear and resubmit.

thanks
-john