2024-03-18 07:54:32

by Guoyong Wang

[permalink] [raw]
Subject: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

'input_handle_event' runs in an atomic context
(spinlock). In rare instances, it may call
the '_might_sleep' function, which could trigger
a kernel exception.

Backtrace:
[<ffffffd613025ba0>] die+0xa8/0x2fc
[<ffffffd613027428>] bug_handler+0x44/0xec
[<ffffffd613016964>] brk_handler+0x90/0x144
[<ffffffd613041e58>] do_debug_exception+0xa0/0x148
[<ffffffd61400c208>] el1_dbg+0x60/0x7c
[<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
[<ffffffd613011294>] el1h_64_sync+0x64/0x6c
[<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
[<ffffffd613102b54>] __might_sleep+0x44/0x7c
[<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
[<ffffffd6132c2820>] static_key_enable+0x14/0x38
[<ffffffd61400ac08>] crng_set_ready+0x14/0x28
[<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
[<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
[<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
[<ffffffd613857e54>] add_input_randomness+0x38/0x48
[<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
[<ffffffd613a81310>] input_event+0x6c/0x98

Signed-off-by: Guoyong Wang <[email protected]>
---
drivers/char/random.c | 2 +-
include/linux/workqueue.h | 1 +
kernel/workqueue.c | 26 ++++++++++++++++++++++++++
3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 456be28ba67c..00be9426a6fc 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -719,7 +719,7 @@ static void __cold _credit_init_bits(size_t bits)
if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) {
crng_reseed(NULL); /* Sets crng_init to CRNG_READY under base_crng.lock. */
if (static_key_initialized)
- execute_in_process_context(crng_set_ready, &set_ready);
+ execute_in_non_atomic_context(crng_set_ready, &set_ready);
atomic_notifier_call_chain(&random_ready_notifier, 0, NULL);
wake_up_interruptible(&crng_init_wait);
kill_fasync(&fasync, SIGIO, POLL_IN);
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 158784dd189a..eb17c62d23aa 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -550,6 +550,7 @@ extern void drain_workqueue(struct workqueue_struct *wq);
extern int schedule_on_each_cpu(work_func_t func);

int execute_in_process_context(work_func_t fn, struct execute_work *);
+int execute_in_non_atomic_context(work_func_t fn, struct execute_work *ew);

extern bool flush_work(struct work_struct *work);
extern bool cancel_work(struct work_struct *work);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bf2bdac46843..8f212346da7a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4449,6 +4449,32 @@ int execute_in_process_context(work_func_t fn, struct execute_work *ew)
}
EXPORT_SYMBOL_GPL(execute_in_process_context);

+/**
+ * execute_in_non_atomic_context - reliably execute the routine with user context
+ * @fn: the function to execute
+ * @ew: guaranteed storage for the execute work structure (must
+ * be available when the work executes)
+ *
+ * Schedules the function for delayed execution if atomic context is available,
+ * otherwise executes the function immediately .
+ *
+ * Return: 0 - function was executed
+ * 1 - function was scheduled for execution
+ */
+int execute_in_non_atomic_context(work_func_t fn, struct execute_work *ew)
+{
+ if (!in_atomic()) {
+ fn(&ew->work);
+ return 0;
+ }
+
+ INIT_WORK(&ew->work, fn);
+ schedule_work(&ew->work);
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(execute_in_non_atomic_context);
+
/**
* free_workqueue_attrs - free a workqueue_attrs
* @attrs: workqueue_attrs to free
--
2.18.0



2024-03-18 20:00:54

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

Hi Guoyong,

On Mon, Mar 18, 2024 at 03:53:27PM +0800, Guoyong Wang wrote:
> 'input_handle_event' runs in an atomic context
> (spinlock). In rare instances, it may call
> the '_might_sleep' function, which could trigger
> a kernel exception.
>
> Backtrace:
> [<ffffffd613025ba0>] die+0xa8/0x2fc
> [<ffffffd613027428>] bug_handler+0x44/0xec
> [<ffffffd613016964>] brk_handler+0x90/0x144
> [<ffffffd613041e58>] do_debug_exception+0xa0/0x148
> [<ffffffd61400c208>] el1_dbg+0x60/0x7c
> [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
> [<ffffffd613011294>] el1h_64_sync+0x64/0x6c
> [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
> [<ffffffd613102b54>] __might_sleep+0x44/0x7c
> [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
> [<ffffffd6132c2820>] static_key_enable+0x14/0x38
> [<ffffffd61400ac08>] crng_set_ready+0x14/0x28
> [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
> [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
> [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
> [<ffffffd613857e54>] add_input_randomness+0x38/0x48
> [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
> [<ffffffd613a81310>] input_event+0x6c/0x98

Thanks for reporting this.

I'm wondering, though, rather than introducing a second function, maybe
execute_in_process_context() should just gain a `&& !in_atomic()`.
That'd make things a bit simpler.

However, I'm pretty sure in_atomic() isn't actually a reliable way of
determining that, depending on config. So maybe this should just call
the worker always (if system_wq isn't null).

Alternatively, any chance the call to add_input_randomness() could be
moved outside the spinlock, or does this not look possible?

Jason

2024-03-19 09:31:31

by Guoyong Wang

[permalink] [raw]
Subject: Re: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

On Mon, 18 Mar 2024 21:00:42 +0100, Jason A. Donenfeld wrote:
> I'm wondering, though, rather than introducing a second function, maybe
> execute_in_process_context() should just gain a `&& !in_atomic()`.
> That'd make things a bit simpler.

> However, I'm pretty sure in_atomic() isn't actually a reliable way of
> determining that, depending on config. So maybe this should just call
> the worker always (if system_wq isn't null).

> Alternatively, any chance the call to add_input_randomness() could be
> moved outside the spinlock, or does this not look possible?

Hi Jason,

Thanks for your suggestions.

I am inclined to accept your second suggestion. My reluctance to accept
the first is due to the concern that "&& !in_atomic()" could potentially
alter the original meaning of the 'execute_in_process_context' interface.
Regarding the third suggestion, modifying the logic associated with 'input'
is not recommended.

2024-03-20 01:09:38

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

On Tue, Mar 19, 2024 at 05:30:55PM +0800, Guoyong Wang wrote:
> On Mon, 18 Mar 2024 21:00:42 +0100, Jason A. Donenfeld wrote:
> > I'm wondering, though, rather than introducing a second function, maybe
> > execute_in_process_context() should just gain a `&& !in_atomic()`.
> > That'd make things a bit simpler.
>
> > However, I'm pretty sure in_atomic() isn't actually a reliable way of
> > determining that, depending on config. So maybe this should just call
> > the worker always (if system_wq isn't null).
>
> > Alternatively, any chance the call to add_input_randomness() could be
> > moved outside the spinlock, or does this not look possible?
>
> Hi Jason,
>
> Thanks for your suggestions.
>
> I am inclined to accept your second suggestion. My reluctance to accept
> the first is due to the concern that "&& !in_atomic()" could potentially
> alter the original meaning of the 'execute_in_process_context' interface.
> Regarding the third suggestion, modifying the logic associated with 'input'
> is not recommended.

Doesn't something like the below seem simplest? Just move the call out
of the spinlock and we're done.

diff --git a/drivers/input/input-core-private.h b/drivers/input/input-core-private.h
index 116834cf8868..717f239e28d0 100644
--- a/drivers/input/input-core-private.h
+++ b/drivers/input/input-core-private.h
@@ -10,7 +10,7 @@
struct input_dev;

void input_mt_release_slots(struct input_dev *dev);
-void input_handle_event(struct input_dev *dev,
+bool input_handle_event(struct input_dev *dev,
unsigned int type, unsigned int code, int value);

#endif /* _INPUT_CORE_PRIVATE_H */
diff --git a/drivers/input/input.c b/drivers/input/input.c
index f71ea4fb173f..2faf46218c66 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -391,20 +391,22 @@ static void input_event_dispose(struct input_dev *dev, int disposition,
}
}

-void input_handle_event(struct input_dev *dev,
+bool input_handle_event(struct input_dev *dev,
unsigned int type, unsigned int code, int value)
{
int disposition;
+ bool should_contribute_to_rng = false;

lockdep_assert_held(&dev->event_lock);

disposition = input_get_disposition(dev, type, code, &value);
if (disposition != INPUT_IGNORE_EVENT) {
if (type != EV_SYN)
- add_input_randomness(type, code, value);
+ should_contribute_to_rng = true;

input_event_dispose(dev, disposition, type, code, value);
}
+ return should_contribute_to_rng;
}

/**
@@ -428,12 +430,15 @@ void input_event(struct input_dev *dev,
unsigned int type, unsigned int code, int value)
{
unsigned long flags;
+ bool should_contribute_to_rng;

if (is_event_supported(type, dev->evbit, EV_MAX)) {

spin_lock_irqsave(&dev->event_lock, flags);
- input_handle_event(dev, type, code, value);
+ should_contribute_to_rng = input_handle_event(dev, type, code, value);
spin_unlock_irqrestore(&dev->event_lock, flags);
+ if (should_contribute_to_rng)
+ add_input_randomness(type, code, value);
}
}
EXPORT_SYMBOL(input_event);


2024-03-20 09:04:25

by Guoyong Wang

[permalink] [raw]
Subject: Re: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

On Web, 20 Mar 2024 02:09:21 +0100, Jason A. Donenfeld wrote:
>> Hi Jason,
>>
>> Thanks for your suggestions.
>>
>> I am inclined to accept your second suggestion. My reluctance to accept
>> the first is due to the concern that "&& !in_atomic()" could potentially
>> alter the original meaning of the 'execute_in_process_context' interface.
>> Regarding the third suggestion, modifying the logic associated with 'input'
>> is not recommended.
>
> Doesn't something like the below seem simplest? Just move the call out
> of the spinlock and we're done.
>
> diff --git a/drivers/input/input-core-private.h b/drivers/input/input-core-private.h
> index 116834cf8868..717f239e28d0 100644
> --- a/drivers/input/input-core-private.h
> +++ b/drivers/input/input-core-private.h
> @@ -10,7 +10,7 @@
> struct input_dev;
>
> void input_mt_release_slots(struct input_dev *dev);
> -void input_handle_event(struct input_dev *dev,
> +bool input_handle_event(struct input_dev *dev,
> unsigned int type, unsigned int code, int value);
>
> #endif /* _INPUT_CORE_PRIVATE_H */
> diff --git a/drivers/input/input.c b/drivers/input/input.c
> index f71ea4fb173f..2faf46218c66 100644
> --- a/drivers/input/input.c
> +++ b/drivers/input/input.c
> @@ -391,20 +391,22 @@ static void input_event_dispose(struct input_dev *dev, int disposition,
> }
> }
>
> -void input_handle_event(struct input_dev *dev,
> +bool input_handle_event(struct input_dev *dev,
> unsigned int type, unsigned int code, int value)
> {
> int disposition;
> +bool should_contribute_to_rng = false;
>
> lockdep_assert_held(&dev->event_lock);
>
> disposition = input_get_disposition(dev, type, code, &value);
> if (disposition != INPUT_IGNORE_EVENT) {
> if (type != EV_SYN)
> -add_input_randomness(type, code, value);
> +should_contribute_to_rng = true;
>
> input_event_dispose(dev, disposition, type, code, value);
> }
> +return should_contribute_to_rng;
> }
>
> /**
> @@ -428,12 +430,15 @@ void input_event(struct input_dev *dev,
> unsigned int type, unsigned int code, int value)
> {
> unsigned long flags;
> +bool should_contribute_to_rng;
>
> if (is_event_supported(type, dev->evbit, EV_MAX)) {
>
> spin_lock_irqsave(&dev->event_lock, flags);
> -input_handle_event(dev, type, code, value);
> +should_contribute_to_rng = input_handle_event(dev, type, code, value);
> spin_unlock_irqrestore(&dev->event_lock, flags);
> +if (should_contribute_to_rng)
> +add_input_randomness(type, code, value);
> }
> }
> EXPORT_SYMBOL(input_event);

Hi Jason,

Your proposal is not suitable for scenarios where input_event is called within an atomic context.

For example:
spin_lock(&dev->spinlock);
input_event(dev, EV_ABS, ABS_X, x);
spin_unlock(&dev->spinlock);

2024-04-02 08:12:48

by Guoyong Wang

[permalink] [raw]
Subject: Re: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

On Web, 20 Mar 2024 02:09:21 +0100, Jason A. Donenfeld wrote:
>> Hi Jason,
>>
>> Thanks for your suggestions.
>>
>> I am inclined to accept your second suggestion. My reluctance to accept
>> the first is due to the concern that "&& !in_atomic()" could potentially
>> alter the original meaning of the 'execute_in_process_context' interface.
>> Regarding the third suggestion, modifying the logic associated with 'input'
>> is not recommended.
>
> Doesn't something like the below seem simplest? Just move the call out
> of the spinlock and we're done.
>
> diff --git a/drivers/input/input-core-private.h b/drivers/input/input-core-private.h
> index 116834cf8868..717f239e28d0 100644
> --- a/drivers/input/input-core-private.h
> +++ b/drivers/input/input-core-private.h
> @@ -10,7 +10,7 @@
> struct input_dev;
>
> void input_mt_release_slots(struct input_dev *dev);
> -void input_handle_event(struct input_dev *dev,
> +bool input_handle_event(struct input_dev *dev,
> unsigned int type, unsigned int code, int value);
>
> #endif /* _INPUT_CORE_PRIVATE_H */
> diff --git a/drivers/input/input.c b/drivers/input/input.c
> index f71ea4fb173f..2faf46218c66 100644
> --- a/drivers/input/input.c
> +++ b/drivers/input/input.c
> @@ -391,20 +391,22 @@ static void input_event_dispose(struct input_dev *dev, int disposition,
> }
> }
>
> -void input_handle_event(struct input_dev *dev,
> +bool input_handle_event(struct input_dev *dev,
> unsigned int type, unsigned int code, int value)
> {
> int disposition;
> +bool should_contribute_to_rng = false;
>
> lockdep_assert_held(&dev->event_lock);
>
> disposition = input_get_disposition(dev, type, code, &value);
> if (disposition != INPUT_IGNORE_EVENT) {
> if (type != EV_SYN)
> -add_input_randomness(type, code, value);
> +should_contribute_to_rng = true;
>
> input_event_dispose(dev, disposition, type, code, value);
> }
> +return should_contribute_to_rng;
> }
>
> /**
> @@ -428,12 +430,15 @@ void input_event(struct input_dev *dev,
> unsigned int type, unsigned int code, int value)
> {
> unsigned long flags;
> +bool should_contribute_to_rng;
>
> if (is_event_supported(type, dev->evbit, EV_MAX)) {
>
> spin_lock_irqsave(&dev->event_lock, flags);
> -input_handle_event(dev, type, code, value);
> +should_contribute_to_rng = input_handle_event(dev, type, code, value);
> spin_unlock_irqrestore(&dev->event_lock, flags);
> +if (should_contribute_to_rng)
> +add_input_randomness(type, code, value);
> }
> }
> EXPORT_SYMBOL(input_event);

Hi Jason,

As I mentioned last time: Your solution may not be applicable when 'input_event' is executed in users spinlock.
What are you thoughts on this? I'm looking forward to your suggestions so we can reach an agreement and expedite
the upstream process, Thanks!


2024-04-17 12:12:20

by Jason A. Donenfeld

[permalink] [raw]
Subject: [PATCH] random: handle creditable entropy from atomic process context

The entropy accounting changes a static key when the RNG has
initialized, since it only ever initializes once. Static key changes,
however, cannot be made from atomic context, so depending on where the
last creditable entropy comes from, the static key change might need to
be deferred to a worker.

Previously the code used the execute_in_process_context() helper
function, which accounts for whether or not the caller is
in_interrupt(). However, that doesn't account for the case where the
caller is actually in process context but is holding a spinlock.

This turned out to be the case with input_handle_event() in
drivers/input/input.c contributing entropy:

[<ffffffd613025ba0>] die+0xa8/0x2fc
[<ffffffd613027428>] bug_handler+0x44/0xec
[<ffffffd613016964>] brk_handler+0x90/0x144
[<ffffffd613041e58>] do_debug_exception+0xa0/0x148
[<ffffffd61400c208>] el1_dbg+0x60/0x7c
[<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
[<ffffffd613011294>] el1h_64_sync+0x64/0x6c
[<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
[<ffffffd613102b54>] __might_sleep+0x44/0x7c
[<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
[<ffffffd6132c2820>] static_key_enable+0x14/0x38
[<ffffffd61400ac08>] crng_set_ready+0x14/0x28
[<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
[<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
[<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
[<ffffffd613857e54>] add_input_randomness+0x38/0x48
[<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
[<ffffffd613a81310>] input_event+0x6c/0x98

According to Guoyong, it's not really possible to refactor the various
drivers to never hold a spinlock there. And in_atomic() isn't reliable.

So, rather than trying to be too fancy, just punt the change in the
static key to a workqueue always. There's basically no drawback of doing
this, as the code already needed to account for the static key not
changing immediately, and given that it's just an optimization, there's
not exactly a hurry to change the static key right away, so deferal is
fine.

Reported-by: Guoyong Wang <[email protected]>
Cc: [email protected]
Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
Guoyong- can you test this and tell me whether it fixes the problem you
were seeing? If so, I'll try to get this sent up for 6.9. -Jason

drivers/char/random.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 456be28ba67c..2597cb43f438 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -702,7 +702,7 @@ static void extract_entropy(void *buf, size_t len)

static void __cold _credit_init_bits(size_t bits)
{
- static struct execute_work set_ready;
+ static DECLARE_WORK(set_ready, crng_set_ready);
unsigned int new, orig, add;
unsigned long flags;

@@ -718,8 +718,8 @@ static void __cold _credit_init_bits(size_t bits)

if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) {
crng_reseed(NULL); /* Sets crng_init to CRNG_READY under base_crng.lock. */
- if (static_key_initialized)
- execute_in_process_context(crng_set_ready, &set_ready);
+ if (static_key_initialized && system_unbound_wq)
+ queue_work(system_unbound_wq, &set_ready);
atomic_notifier_call_chain(&random_ready_notifier, 0, NULL);
wake_up_interruptible(&crng_init_wait);
kill_fasync(&fasync, SIGIO, POLL_IN);
@@ -890,8 +890,8 @@ void __init random_init(void)

/*
* If we were initialized by the cpu or bootloader before jump labels
- * are initialized, then we should enable the static branch here, where
- * it's guaranteed that jump labels have been initialized.
+ * or workqueues are initialized, then we should enable the static
+ * branch here, where it's guaranteed that these have been initialized.
*/
if (!static_branch_likely(&crng_is_ready) && crng_init >= CRNG_READY)
crng_set_ready(NULL);
--
2.44.0


2024-04-19 08:41:17

by Guoyong Wang

[permalink] [raw]
Subject: Re: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

On Wed, 17 Apr 2024 14:01:11 +0200, Jason A. Donenfeld wrote:
> The entropy accounting changes a static key when the RNG has
> initialized, since it only ever initializes once. Static key changes,
> however, cannot be made from atomic context, so depending on where the
> last creditable entropy comes from, the static key change might need to
> be deferred to a worker.
>
> Previously the code used the execute_in_process_context() helper
> function, which accounts for whether or not the caller is
> in_interrupt(). However, that doesn't account for the case where the
> caller is actually in process context but is holding a spinlock.
>
> This turned out to be the case with input_handle_event() in
> drivers/input/input.c contributing entropy:
>
> [<ffffffd613025ba0>] die+0xa8/0x2fc
> [<ffffffd613027428>] bug_handler+0x44/0xec
> [<ffffffd613016964>] brk_handler+0x90/0x144
> [<ffffffd613041e58>] do_debug_exception+0xa0/0x148
> [<ffffffd61400c208>] el1_dbg+0x60/0x7c
> [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
> [<ffffffd613011294>] el1h_64_sync+0x64/0x6c
> [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
> [<ffffffd613102b54>] __might_sleep+0x44/0x7c
> [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
> [<ffffffd6132c2820>] static_key_enable+0x14/0x38
> [<ffffffd61400ac08>] crng_set_ready+0x14/0x28
> [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
> [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
> [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
> [<ffffffd613857e54>] add_input_randomness+0x38/0x48
> [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
> [<ffffffd613a81310>] input_event+0x6c/0x98
>
> According to Guoyong, it's not really possible to refactor the various
> drivers to never hold a spinlock there. And in_atomic() isn't reliable.
>
> So, rather than trying to be too fancy, just punt the change in the
> static key to a workqueue always. There's basically no drawback of doing
> this, as the code already needed to account for the static key not
> changing immediately, and given that it's just an optimization, there's
> not exactly a hurry to change the static key right away, so deferal is
> fine.
>
> Reported-by: Guoyong Wang <[email protected]>
> Cc: [email protected]
> Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> ---
> Guoyong- can you test this and tell me whether it fixes the problem you
> were seeing? If so, I'll try to get this sent up for 6.9. -Jason
>
> drivers/char/random.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 456be28ba67c..2597cb43f438 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -702,7 +702,7 @@ static void extract_entropy(void *buf, size_t len)
>
> static void __cold _credit_init_bits(size_t bits)
> {
> - static struct execute_work set_ready;
> + static DECLARE_WORK(set_ready, crng_set_ready);
> unsigned int new, orig, add;
> unsigned long flags;
>
> @@ -718,8 +718,8 @@ static void __cold _credit_init_bits(size_t bits)
>
> if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) {
> crng_reseed(NULL); /* Sets crng_init to CRNG_READY under base_crng.lock. */
> - if (static_key_initialized)
> - execute_in_process_context(crng_set_ready, &set_ready);
> + if (static_key_initialized && system_unbound_wq)
> + queue_work(system_unbound_wq, &set_ready);
> atomic_notifier_call_chain(&random_ready_notifier, 0, NULL);
> wake_up_interruptible(&crng_init_wait);
> kill_fasync(&fasync, SIGIO, POLL_IN);
> @@ -890,8 +890,8 @@ void __init random_init(void)
>
> /*
> * If we were initialized by the cpu or bootloader before jump labels
> - * are initialized, then we should enable the static branch here, where
> - * it's guaranteed that jump labels have been initialized.
> + * or workqueues are initialized, then we should enable the static
> + * branch here, where it's guaranteed that these have been initialized.
> */
> if (!static_branch_likely(&crng_is_ready) && crng_init >= CRNG_READY)
> crng_set_ready(NULL);
> --
> 2.44.0

Hi Jason,

Thanks for your feedback. We concur with the proposed change and have verified that it works well
in our tests. Next, I will provide a patch v2 for the changes discussed.


2024-04-19 08:56:00

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] random: Fix the issue of '_might_sleep' function running in an atomic contex

On Fri, Apr 19, 2024 at 04:41:12PM +0800, Guoyong Wang wrote:
> On Wed, 17 Apr 2024 14:01:11 +0200, Jason A. Donenfeld wrote:
> > The entropy accounting changes a static key when the RNG has
> > initialized, since it only ever initializes once. Static key changes,
> > however, cannot be made from atomic context, so depending on where the
> > last creditable entropy comes from, the static key change might need to
> > be deferred to a worker.
> >
> > Previously the code used the execute_in_process_context() helper
> > function, which accounts for whether or not the caller is
> > in_interrupt(). However, that doesn't account for the case where the
> > caller is actually in process context but is holding a spinlock.
> >
> > This turned out to be the case with input_handle_event() in
> > drivers/input/input.c contributing entropy:
> >
> > [<ffffffd613025ba0>] die+0xa8/0x2fc
> > [<ffffffd613027428>] bug_handler+0x44/0xec
> > [<ffffffd613016964>] brk_handler+0x90/0x144
> > [<ffffffd613041e58>] do_debug_exception+0xa0/0x148
> > [<ffffffd61400c208>] el1_dbg+0x60/0x7c
> > [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
> > [<ffffffd613011294>] el1h_64_sync+0x64/0x6c
> > [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
> > [<ffffffd613102b54>] __might_sleep+0x44/0x7c
> > [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
> > [<ffffffd6132c2820>] static_key_enable+0x14/0x38
> > [<ffffffd61400ac08>] crng_set_ready+0x14/0x28
> > [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
> > [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
> > [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
> > [<ffffffd613857e54>] add_input_randomness+0x38/0x48
> > [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
> > [<ffffffd613a81310>] input_event+0x6c/0x98
> >
> > According to Guoyong, it's not really possible to refactor the various
> > drivers to never hold a spinlock there. And in_atomic() isn't reliable.
> >
> > So, rather than trying to be too fancy, just punt the change in the
> > static key to a workqueue always. There's basically no drawback of doing
> > this, as the code already needed to account for the static key not
> > changing immediately, and given that it's just an optimization, there's
> > not exactly a hurry to change the static key right away, so deferal is
> > fine.
> >
> > Reported-by: Guoyong Wang <[email protected]>
> > Cc: [email protected]
> > Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
> > Signed-off-by: Jason A. Donenfeld <[email protected]>
> > ---
> > Guoyong- can you test this and tell me whether it fixes the problem you
> > were seeing? If so, I'll try to get this sent up for 6.9. -Jason
> >
> > drivers/char/random.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 456be28ba67c..2597cb43f438 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -702,7 +702,7 @@ static void extract_entropy(void *buf, size_t len)
> >
> > static void __cold _credit_init_bits(size_t bits)
> > {
> > - static struct execute_work set_ready;
> > + static DECLARE_WORK(set_ready, crng_set_ready);
> > unsigned int new, orig, add;
> > unsigned long flags;
> >
> > @@ -718,8 +718,8 @@ static void __cold _credit_init_bits(size_t bits)
> >
> > if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) {
> > crng_reseed(NULL); /* Sets crng_init to CRNG_READY under base_crng.lock. */
> > - if (static_key_initialized)
> > - execute_in_process_context(crng_set_ready, &set_ready);
> > + if (static_key_initialized && system_unbound_wq)
> > + queue_work(system_unbound_wq, &set_ready);
> > atomic_notifier_call_chain(&random_ready_notifier, 0, NULL);
> > wake_up_interruptible(&crng_init_wait);
> > kill_fasync(&fasync, SIGIO, POLL_IN);
> > @@ -890,8 +890,8 @@ void __init random_init(void)
> >
> > /*
> > * If we were initialized by the cpu or bootloader before jump labels
> > - * are initialized, then we should enable the static branch here, where
> > - * it's guaranteed that jump labels have been initialized.
> > + * or workqueues are initialized, then we should enable the static
> > + * branch here, where it's guaranteed that these have been initialized.
> > */
> > if (!static_branch_likely(&crng_is_ready) && crng_init >= CRNG_READY)
> > crng_set_ready(NULL);
> > --
> > 2.44.0
>
> Hi Jason,
>
> Thanks for your feedback. We concur with the proposed change and have verified that it works well
> in our tests. Next, I will provide a patch v2 for the changes discussed.

No need. This is already upstream and has your Reported-by. It'll get
backported to stable too.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e871abcda3b67d0820b4182ebe93435624e9c6a4