2022-06-23 05:13:45

by Gregory Erwin

[permalink] [raw]
Subject: ath9k: hwrng blocks for several minutes when phy is un-associated

Hello,

I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
directly dumping into random.c'' while investigating a long delay when entering
suspend on kernels v5.18 onward. There are other reports of hangs or
unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.

AFAIKT, the issue is triggered by the ath9k hwrng when the interface is up,
but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
for up to 231 seconds before finally returning an input/output error. Similarly,
I get a kernel log message "hwrng: no data available" every 231 seconds.

The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
data when the phy is transmitting.

Admittedly, I don't actually know if this blocking behavior is
expected or not, but it certainly seems undesirable.


2022-06-23 12:22:54

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: ath9k: hwrng blocks for several minutes when phy is un-associated

Hi Gregory,

On Wed, Jun 22, 2022 at 10:08:15PM -0700, Gregory Erwin wrote:
> Hello,
>
> I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
> directly dumping into random.c'' while investigating a long delay when entering
> suspend on kernels v5.18 onward. There are other reports of hangs or
> unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.
>
> AFAIKT, the issue is triggered by the ath9k hwrng when the interface is up,
> but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
> for up to 231 seconds before finally returning an input/output error. Similarly,
> I get a kernel log message "hwrng: no data available" every 231 seconds.
>
> The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
> but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
> data when the phy is transmitting.
>
> Admittedly, I don't actually know if this blocking behavior is
> expected or not, but it certainly seems undesirable.

Thanks for the report. I wish somebody from one of those bug reports
would have emailed earlier.

I don't have hardware to test this, but could you let me know if the
below patch does something? I'm sort of guessing, but maybe this is
right?

Jason

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..a6291f5f0d47 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -80,7 +80,7 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
bytes_read += max & 3UL;
memzero_explicit(&word, sizeof(word));
}
- if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+ if (!wait || !max || likely(bytes_read) || fail_stats > 110 || kthread_should_stop())
break;

msleep_interruptible(ath9k_rng_delay_get(++fail_stats));

2022-06-23 12:30:41

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: ath9k: hwrng blocks for several minutes when phy is un-associated

Or perhaps more simply:

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..5b44cd918c2b 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -83,7 +83,8 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
if (!wait || !max || likely(bytes_read) || fail_stats > 110)
break;

- msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
+ if (msleep_interruptible(ath9k_rng_delay_get(++fail_stats)))
+ break;
}

if (wait && !bytes_read && max)

2022-06-24 00:13:59

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: ath9k: hwrng blocks for several minutes when phy is un-associated

Hey again,

On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <[email protected]> wrote:
>
> Hey Gregory,
>
> On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <[email protected]> wrote:
> >
> > No luck.
> >
> > The first patch caused a warning and oops with ath9k_rng_read() at the
> > top of the call stack when reading from /dev/hwrng:
> > WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> >
> > The second didn't have a noticeable effect, for better or worse.
>
> Alright. That's actually getting us somewhere. So the path in question
> here is from reading /dev/hwrng, not from the kthread that's doing the
> same read.
>
> Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> you do that causes everything to hang, and then while things are hung
> in the bad way, look at the contents of /proc/[the pid of the cat you
> just ran]/stack?

There's another flow I'm interested in. You said it prevents the
system from sleeping. Does it also make a `ip link set wlan0 down`
hang too? If so, could you send the `/proc/[pid of ip link set
down]/stack ` of a hung ip process? That seems like the more relevant
deadlock to look into.

Jason

2022-06-24 00:52:57

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: ath9k: hwrng blocks for several minutes when phy is un-associated

Hey again again,

On Fri, Jun 24, 2022 at 02:01:22AM +0200, Jason A. Donenfeld wrote:
> Hey again,
>
> On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <[email protected]> wrote:
> >
> > Hey Gregory,
> >
> > On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <[email protected]> wrote:
> > >
> > > No luck.
> > >
> > > The first patch caused a warning and oops with ath9k_rng_read() at the
> > > top of the call stack when reading from /dev/hwrng:
> > > WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> > > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >
> > > The second didn't have a noticeable effect, for better or worse.
> >
> > Alright. That's actually getting us somewhere. So the path in question
> > here is from reading /dev/hwrng, not from the kthread that's doing the
> > same read.
> >
> > Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> > you do that causes everything to hang, and then while things are hung
> > in the bad way, look at the contents of /proc/[the pid of the cat you
> > just ran]/stack?
>
> There's another flow I'm interested in. You said it prevents the
> system from sleeping. Does it also make a `ip link set wlan0 down`
> hang too? If so, could you send the `/proc/[pid of ip link set
> down]/stack ` of a hung ip process? That seems like the more relevant
> deadlock to look into.

I think I have a plausible theory. I'll send a real patch, and you can
test it. Incoming shortly...

Jason