2023-07-07 00:24:49

by Waiman Long

[permalink] [raw]
Subject: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

It was found that running the refscale test might sometimes crash the
kernel with the following error:

[ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
[ 8569.952900] #PF: supervisor read access in kernel mode
[ 8569.952902] #PF: error_code(0x0000) - not-present page
[ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
[ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
[ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
:
[ 8569.952940] Call Trace:
[ 8569.952941] <TASK>
[ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
[ 8569.952959] kthread+0x10e/0x130
[ 8569.952966] ret_from_fork+0x1f/0x30
[ 8569.952973] </TASK>

This is likely caused by the fact that init_waitqueue_head() is called
after the ref_scale_reader kthread is created. So the kthread may try
to use the waitqueue head before it is properly initialized. Fix this
by initializing the waitqueue head first before kthread creation.

Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/rcu/refscale.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c
index 1970ce5f22d4..e365d6f8c139 100644
--- a/kernel/rcu/refscale.c
+++ b/kernel/rcu/refscale.c
@@ -1107,12 +1107,11 @@ ref_scale_init(void)
VERBOSE_SCALEOUT("Starting %d reader threads", nreaders);

for (i = 0; i < nreaders; i++) {
+ init_waitqueue_head(&(reader_tasks[i].wq));
firsterr = torture_create_kthread(ref_scale_reader, (void *)i,
reader_tasks[i].task);
if (torture_init_error(firsterr))
goto unwind;
-
- init_waitqueue_head(&(reader_tasks[i].wq));
}

// Main Task
--
2.31.1



2023-07-07 07:45:07

by Qiuxu Zhuo

[permalink] [raw]
Subject: RE: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

> From: Waiman Long <[email protected]>
> ...
> Subject: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t
>
> It was found that running the refscale test might sometimes crash the kernel
> with the following error:
>
> [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
> [ 8569.952900] #PF: supervisor read access in kernel mode [ 8569.952902]
> #PF: error_code(0x0000) - not-present page [ 8569.952904] PGD c4b048067
> P4D c4b049067 PUD c4b04b067 PMD 0 [ 8569.952910] Oops: 0000 [#1]
> PREEMPT_RT SMP NOPTI [ 8569.952916] Hardware name: Dell Inc.
> PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021 [ 8569.952917] RIP:
> 0010:prepare_to_wait_event+0x101/0x190
> :
> [ 8569.952940] Call Trace:
> [ 8569.952941] <TASK>
> [ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale] [ 8569.952959]
> kthread+0x10e/0x130 [ 8569.952966] ret_from_fork+0x1f/0x30
> [ 8569.952973] </TASK>
>
> This is likely caused by the fact that init_waitqueue_head() is called after the
> ref_scale_reader kthread is created. So the kthread may try to use the
> waitqueue head before it is properly initialized. Fix this by initializing the
> waitqueue head first before kthread creation.
>
> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-
> side synchronization")
> Signed-off-by: Waiman Long <[email protected]>
> ---
> kernel/rcu/refscale.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c index
> 1970ce5f22d4..e365d6f8c139 100644
> --- a/kernel/rcu/refscale.c
> +++ b/kernel/rcu/refscale.c
> @@ -1107,12 +1107,11 @@ ref_scale_init(void)
> VERBOSE_SCALEOUT("Starting %d reader threads", nreaders);
>
> for (i = 0; i < nreaders; i++) {
> + init_waitqueue_head(&(reader_tasks[i].wq));

Running checkpatch.pl tool with the " --strict" option, it complained that
"CHECK: Unnecessary parentheses around reader_tasks[i].wq".
I know that you just moved the code position. The tool should have
complained the original code. ????

Other than that, this patch LGTM.

Reviewed-by: Qiuxu Zhuo <[email protected]>

Thanks!
-Qiuxu

> firsterr = torture_create_kthread(ref_scale_reader, (void *)i,
> reader_tasks[i].task);
> if (torture_init_error(firsterr))
> goto unwind;
> -
> - init_waitqueue_head(&(reader_tasks[i].wq));
> }
>
> // Main Task
> --
> 2.31.1

2023-07-07 13:17:35

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

On 7/7/23 03:27, Zhuo, Qiuxu wrote:
>> From: Waiman Long <[email protected]>
>> ...
>> Subject: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t
>>
>> It was found that running the refscale test might sometimes crash the kernel
>> with the following error:
>>
>> [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
>> [ 8569.952900] #PF: supervisor read access in kernel mode [ 8569.952902]
>> #PF: error_code(0x0000) - not-present page [ 8569.952904] PGD c4b048067
>> P4D c4b049067 PUD c4b04b067 PMD 0 [ 8569.952910] Oops: 0000 [#1]
>> PREEMPT_RT SMP NOPTI [ 8569.952916] Hardware name: Dell Inc.
>> PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021 [ 8569.952917] RIP:
>> 0010:prepare_to_wait_event+0x101/0x190
>> :
>> [ 8569.952940] Call Trace:
>> [ 8569.952941] <TASK>
>> [ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale] [ 8569.952959]
>> kthread+0x10e/0x130 [ 8569.952966] ret_from_fork+0x1f/0x30
>> [ 8569.952973] </TASK>
>>
>> This is likely caused by the fact that init_waitqueue_head() is called after the
>> ref_scale_reader kthread is created. So the kthread may try to use the
>> waitqueue head before it is properly initialized. Fix this by initializing the
>> waitqueue head first before kthread creation.
>>
>> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-
>> side synchronization")
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>> kernel/rcu/refscale.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c index
>> 1970ce5f22d4..e365d6f8c139 100644
>> --- a/kernel/rcu/refscale.c
>> +++ b/kernel/rcu/refscale.c
>> @@ -1107,12 +1107,11 @@ ref_scale_init(void)
>> VERBOSE_SCALEOUT("Starting %d reader threads", nreaders);
>>
>> for (i = 0; i < nreaders; i++) {
>> + init_waitqueue_head(&(reader_tasks[i].wq));
> Running checkpatch.pl tool with the " --strict" option, it complained that
> "CHECK: Unnecessary parentheses around reader_tasks[i].wq".
> I know that you just moved the code position. The tool should have
> complained the original code. ????
>
> Other than that, this patch LGTM.
>
> Reviewed-by: Qiuxu Zhuo <[email protected]>
>
> Thanks!
> -Qiuxu

Thanks for the review. I will do --strict test next time.

Cheers,
Longman

>
>> firsterr = torture_create_kthread(ref_scale_reader, (void *)i,
>> reader_tasks[i].task);
>> if (torture_init_error(firsterr))
>> goto unwind;
>> -
>> - init_waitqueue_head(&(reader_tasks[i].wq));
>> }
>>
>> // Main Task
>> --
>> 2.31.1


2023-07-07 15:11:10

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

On Thu, 06 Jul 2023, Waiman Long wrote:

>It was found that running the refscale test might sometimes crash the
>kernel with the following error:
>
>[ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
>[ 8569.952900] #PF: supervisor read access in kernel mode
>[ 8569.952902] #PF: error_code(0x0000) - not-present page
>[ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
>[ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
>[ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
>[ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
> :
>[ 8569.952940] Call Trace:
>[ 8569.952941] <TASK>
>[ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
>[ 8569.952959] kthread+0x10e/0x130
>[ 8569.952966] ret_from_fork+0x1f/0x30
>[ 8569.952973] </TASK>
>
>This is likely caused by the fact that init_waitqueue_head() is called
>after the ref_scale_reader kthread is created. So the kthread may try
>to use the waitqueue head before it is properly initialized. Fix this
>by initializing the waitqueue head first before kthread creation.
>
>Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
>Signed-off-by: Waiman Long <[email protected]>

Strange this wasn't reported sooner.

Reviewed-by: Davidlohr Bueso <[email protected]>

2023-07-07 15:16:48

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

On 7/7/23 10:07, Davidlohr Bueso wrote:
> On Thu, 06 Jul 2023, Waiman Long wrote:
>
>> It was found that running the refscale test might sometimes crash the
>> kernel with the following error:
>>
>> [ 8569.952896] BUG: unable to handle page fault for address:
>> ffffffffffffffe8
>> [ 8569.952900] #PF: supervisor read access in kernel mode
>> [ 8569.952902] #PF: error_code(0x0000) - not-present page
>> [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
>> [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
>> [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS
>> 1.2.4 05/28/2021
>> [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
>>  :
>> [ 8569.952940] Call Trace:
>> [ 8569.952941]  <TASK>
>> [ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
>> [ 8569.952959]  kthread+0x10e/0x130
>> [ 8569.952966]  ret_from_fork+0x1f/0x30
>> [ 8569.952973]  </TASK>
>>
>> This is likely caused by the fact that init_waitqueue_head() is called
>> after the ref_scale_reader kthread is created. So the kthread may try
>> to use the waitqueue head before it is properly initialized. Fix this
>> by initializing the waitqueue head first before kthread creation.
>>
>> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of
>> read-side synchronization")
>> Signed-off-by: Waiman Long <[email protected]>
>
> Strange this wasn't reported sooner.

Red Hat does have a pretty large QE organization that run all sort of
tests include this one pretty frequently. The race window is pretty
small, but they did hit this once in a while.

Cheers,
Longman


2023-07-07 17:09:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

On Fri, Jul 07, 2023 at 10:56:51AM -0400, Waiman Long wrote:
> On 7/7/23 10:07, Davidlohr Bueso wrote:
> > On Thu, 06 Jul 2023, Waiman Long wrote:
> >
> > > It was found that running the refscale test might sometimes crash the
> > > kernel with the following error:
> > >
> > > [ 8569.952896] BUG: unable to handle page fault for address:
> > > ffffffffffffffe8
> > > [ 8569.952900] #PF: supervisor read access in kernel mode
> > > [ 8569.952902] #PF: error_code(0x0000) - not-present page
> > > [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
> > > [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
> > > [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS
> > > 1.2.4 05/28/2021
> > > [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
> > > ?:
> > > [ 8569.952940] Call Trace:
> > > [ 8569.952941]? <TASK>
> > > [ 8569.952944]? ref_scale_reader+0x380/0x4a0 [refscale]
> > > [ 8569.952959]? kthread+0x10e/0x130
> > > [ 8569.952966]? ret_from_fork+0x1f/0x30
> > > [ 8569.952973]? </TASK>
> > >
> > > This is likely caused by the fact that init_waitqueue_head() is called
> > > after the ref_scale_reader kthread is created. So the kthread may try
> > > to use the waitqueue head before it is properly initialized. Fix this
> > > by initializing the waitqueue head first before kthread creation.
> > >
> > > Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of
> > > read-side synchronization")
> > > Signed-off-by: Waiman Long <[email protected]>
> >
> > Strange this wasn't reported sooner.
>
> Red Hat does have a pretty large QE organization that run all sort of tests
> include this one pretty frequently. The race window is pretty small, but
> they did hit this once in a while.

I do run this fairly frequently, but haven't managed to hit it.

Good show on making it happen, and looking forward to the updated patch!

Thanx, Paul

2023-07-07 18:28:05

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

On 7/7/23 12:16, Paul E. McKenney wrote:
> On Fri, Jul 07, 2023 at 10:56:51AM -0400, Waiman Long wrote:
>> On 7/7/23 10:07, Davidlohr Bueso wrote:
>>> On Thu, 06 Jul 2023, Waiman Long wrote:
>>>
>>>> It was found that running the refscale test might sometimes crash the
>>>> kernel with the following error:
>>>>
>>>> [ 8569.952896] BUG: unable to handle page fault for address:
>>>> ffffffffffffffe8
>>>> [ 8569.952900] #PF: supervisor read access in kernel mode
>>>> [ 8569.952902] #PF: error_code(0x0000) - not-present page
>>>> [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
>>>> [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
>>>> [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS
>>>> 1.2.4 05/28/2021
>>>> [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
>>>>  :
>>>> [ 8569.952940] Call Trace:
>>>> [ 8569.952941]  <TASK>
>>>> [ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
>>>> [ 8569.952959]  kthread+0x10e/0x130
>>>> [ 8569.952966]  ret_from_fork+0x1f/0x30
>>>> [ 8569.952973]  </TASK>
>>>>
>>>> This is likely caused by the fact that init_waitqueue_head() is called
>>>> after the ref_scale_reader kthread is created. So the kthread may try
>>>> to use the waitqueue head before it is properly initialized. Fix this
>>>> by initializing the waitqueue head first before kthread creation.
>>>>
>>>> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of
>>>> read-side synchronization")
>>>> Signed-off-by: Waiman Long <[email protected]>
>>> Strange this wasn't reported sooner.
>> Red Hat does have a pretty large QE organization that run all sort of tests
>> include this one pretty frequently. The race window is pretty small, but
>> they did hit this once in a while.
> I do run this fairly frequently, but haven't managed to hit it.
>
> Good show on making it happen, and looking forward to the updated patch!

v2 sent with a bit more explanation text in the commit log.

Cheers,
Longman


2023-07-07 19:56:57

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t



> On Jul 7, 2023, at 10:56 AM, Waiman Long <[email protected]> wrote:
>
> On 7/7/23 10:07, Davidlohr Bueso wrote:
>>> On Thu, 06 Jul 2023, Waiman Long wrote:
>>>
>>> It was found that running the refscale test might sometimes crash the
>>> kernel with the following error:
>>>
>>> [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
>>> [ 8569.952900] #PF: supervisor read access in kernel mode
>>> [ 8569.952902] #PF: error_code(0x0000) - not-present page
>>> [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
>>> [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
>>> [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
>>> [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
>>> :
>>> [ 8569.952940] Call Trace:
>>> [ 8569.952941] <TASK>
>>> [ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
>>> [ 8569.952959] kthread+0x10e/0x130
>>> [ 8569.952966] ret_from_fork+0x1f/0x30
>>> [ 8569.952973] </TASK>
>>>
>>> This is likely caused by the fact that init_waitqueue_head() is called
>>> after the ref_scale_reader kthread is created. So the kthread may try
>>> to use the waitqueue head before it is properly initialized. Fix this
>>> by initializing the waitqueue head first before kthread creation.
>>>
>>> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
>>> Signed-off-by: Waiman Long <[email protected]>
>>
>> Strange this wasn't reported sooner.
>
> Red Hat does have a pretty large QE organization that run all sort of tests include this one pretty frequently. The race window is pretty small, but they did hit this once in a while.

Having worked on this test initially, I am happy to hear that Redhat runs this test!

Thanks for fixing this issue.
Acked-by: Joel Fernandes (Google) <[email protected]>

-Joel

>
> Cheers,
> Longman
>

2023-07-07 20:03:14

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] refscale: Fix use of uninitalized wait_queue_head_t

On Fri, Jul 07, 2023 at 03:24:58PM -0400, Joel Fernandes wrote:
>
>
> > On Jul 7, 2023, at 10:56 AM, Waiman Long <[email protected]> wrote:
> >
> > On 7/7/23 10:07, Davidlohr Bueso wrote:
> >>> On Thu, 06 Jul 2023, Waiman Long wrote:
> >>>
> >>> It was found that running the refscale test might sometimes crash the
> >>> kernel with the following error:
> >>>
> >>> [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
> >>> [ 8569.952900] #PF: supervisor read access in kernel mode
> >>> [ 8569.952902] #PF: error_code(0x0000) - not-present page
> >>> [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
> >>> [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
> >>> [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
> >>> [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
> >>> :
> >>> [ 8569.952940] Call Trace:
> >>> [ 8569.952941] <TASK>
> >>> [ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
> >>> [ 8569.952959] kthread+0x10e/0x130
> >>> [ 8569.952966] ret_from_fork+0x1f/0x30
> >>> [ 8569.952973] </TASK>
> >>>
> >>> This is likely caused by the fact that init_waitqueue_head() is called
> >>> after the ref_scale_reader kthread is created. So the kthread may try
> >>> to use the waitqueue head before it is properly initialized. Fix this
> >>> by initializing the waitqueue head first before kthread creation.
> >>>
> >>> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
> >>> Signed-off-by: Waiman Long <[email protected]>
> >>
> >> Strange this wasn't reported sooner.
> >
> > Red Hat does have a pretty large QE organization that run all sort of tests include this one pretty frequently. The race window is pretty small, but they did hit this once in a while.
>
> Having worked on this test initially, I am happy to hear that Redhat runs this test!
>
> Thanks for fixing this issue.
> Acked-by: Joel Fernandes (Google) <[email protected]>

Applied, thank you!

Thanx, Paul