Hello Paul and RCU folks,
I am afraid I correctly understand and fix it. But I really wonder why
sync_rcu_exp_handler() reports the quiescent state even in the case that
current task is within a RCU read-side section. Do I miss something?
If I correctly understand it and you agree with it, I can add more logic
which make it more expedited by boosting current or making it urgent
when we fail to report the quiescent state on the IPI.
----->8-----
From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
From: Byungchul Park <[email protected]>
Date: Tue, 6 Mar 2018 13:54:41 +0900
Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side section
We report the quiescent state for this cpu if it's out of RCU read-side
section at the moment IPI was just fired during the expedite process.
However, current code reports the quiescent state even in the case:
1) the current task is still within a RCU read-side section
2) the current task has been blocked within the RCU read-side section
Since we don't get to the quiescent state yet in the case, we shouldn't
report it but check it another time.
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/rcu/tree_exp.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 73e1d3d..cc69d14 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
/*
* We are either exiting an RCU read-side critical section (negative
* values of t->rcu_read_lock_nesting) or are not in one at all
- * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
- * read-side critical section that blocked before this expedited
- * grace period started. Either way, we can immediately report
- * the quiescent state.
+ * (zero value of t->rcu_read_lock_nesting). We can immediately
+ * report the quiescent state.
*/
- rdp = this_cpu_ptr(rsp->rda);
- rcu_report_exp_rdp(rsp, rdp, true);
+ if (t->rcu_read_lock_nesting <= 0) {
+ rdp = this_cpu_ptr(rsp->rda);
+ rcu_report_exp_rdp(rsp, rdp, true);
+ }
}
/**
--
1.9.1
On Mar 6, 2018 2:34 PM, "Byungchul Park" <[email protected]> wrote:
>
> Hello Paul and RCU folks,
>
> I am afraid I correctly understand and fix it. But I really wonder why
> sync_rcu_exp_handler() reports the quiescent state even in the case that
> current task is within a RCU read-side section. Do I miss something?
Hello,
I missed the fact that the original code is anyway safe because
the case is gonna be handled properly in rcu_read_unlock().
This patch just makes unnecessary spin lock/unlock within *report*()
avoided. Please ignore this if you don't think it's that worthy. I am
also not sure if it is.
Sorry bothering you. And thanks.
> If I correctly understand it and you agree with it, I can add more logic
> which make it more expedited by boosting current or making it urgent
> when we fail to report the quiescent state on the IPI.
>
> ----->8-----
> From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <[email protected]>
> Date: Tue, 6 Mar 2018 13:54:41 +0900
> Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side section
>
> We report the quiescent state for this cpu if it's out of RCU read-side
> section at the moment IPI was just fired during the expedite process.
>
> However, current code reports the quiescent state even in the case:
>
> 1) the current task is still within a RCU read-side section
> 2) the current task has been blocked within the RCU read-side section
>
> Since we don't get to the quiescent state yet in the case, we shouldn't
> report it but check it another time.
>
> Signed-off-by: Byungchul Park <[email protected]>
> ---
> kernel/rcu/tree_exp.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 73e1d3d..cc69d14 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
> /*
> * We are either exiting an RCU read-side critical section (negative
> * values of t->rcu_read_lock_nesting) or are not in one at all
> - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
> - * read-side critical section that blocked before this expedited
> - * grace period started. Either way, we can immediately report
> - * the quiescent state.
> + * (zero value of t->rcu_read_lock_nesting). We can immediately
> + * report the quiescent state.
> */
> - rdp = this_cpu_ptr(rsp->rda);
> - rcu_report_exp_rdp(rsp, rdp, true);
> + if (t->rcu_read_lock_nesting <= 0) {
> + rdp = this_cpu_ptr(rsp->rda);
> + rcu_report_exp_rdp(rsp, rdp, true);
> + }
> }
>
> /**
> --
> 1.9.1
>
On Tue, Mar 06, 2018 at 02:31:58PM +0900, Byungchul Park wrote:
> Hello Paul and RCU folks,
>
> I am afraid I correctly understand and fix it. But I really wonder why
> sync_rcu_exp_handler() reports the quiescent state even in the case that
> current task is within a RCU read-side section. Do I miss something?
>
> If I correctly understand it and you agree with it, I can add more logic
> which make it more expedited by boosting current or making it urgent
> when we fail to report the quiescent state on the IPI.
>
> ----->8-----
> From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <[email protected]>
> Date: Tue, 6 Mar 2018 13:54:41 +0900
> Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side section
>
> We report the quiescent state for this cpu if it's out of RCU read-side
> section at the moment IPI was just fired during the expedite process.
>
> However, current code reports the quiescent state even in the case:
>
> 1) the current task is still within a RCU read-side section
> 2) the current task has been blocked within the RCU read-side section
>
If this happens, the task will queue itself in
rcu_preempt_note_context_switch() using rcu_preempt_ctxt_queue(). The gp
kthread will wait for this task to dequeue itself. IOW, we have other
mechanism to wait for this task other than bottom-up qs reporting tree.
So I think we are fine here.
Regards,
Boqun
> Since we don't get to the quiescent state yet in the case, we shouldn't
> report it but check it another time.
>
> Signed-off-by: Byungchul Park <[email protected]>
> ---
> kernel/rcu/tree_exp.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 73e1d3d..cc69d14 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
> /*
> * We are either exiting an RCU read-side critical section (negative
> * values of t->rcu_read_lock_nesting) or are not in one at all
> - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
> - * read-side critical section that blocked before this expedited
> - * grace period started. Either way, we can immediately report
> - * the quiescent state.
> + * (zero value of t->rcu_read_lock_nesting). We can immediately
> + * report the quiescent state.
> */
> - rdp = this_cpu_ptr(rsp->rda);
> - rcu_report_exp_rdp(rsp, rdp, true);
> + if (t->rcu_read_lock_nesting <= 0) {
> + rdp = this_cpu_ptr(rsp->rda);
> + rcu_report_exp_rdp(rsp, rdp, true);
> + }
> }
>
> /**
> --
> 1.9.1
>
On Tue, Mar 06, 2018 at 09:43:19PM +0900, Byungchul Park wrote:
> On Mar 6, 2018 2:34 PM, "Byungchul Park" <[email protected]> wrote:
> >
> > Hello Paul and RCU folks,
> >
> > I am afraid I correctly understand and fix it. But I really wonder why
> > sync_rcu_exp_handler() reports the quiescent state even in the case that
> > current task is within a RCU read-side section. Do I miss something?
>
> Hello,
>
> I missed the fact that the original code is anyway safe because
> the case is gonna be handled properly in rcu_read_unlock().
>
> This patch just makes unnecessary spin lock/unlock within *report*()
> avoided. Please ignore this if you don't think it's that worthy. I am
> also not sure if it is.
>
> Sorry bothering you. And thanks.
Not a problem, especially given that you figured it out before I got
to your email. And thank you for your review of RCU!
Thanx, Paul
> > If I correctly understand it and you agree with it, I can add more logic
> > which make it more expedited by boosting current or making it urgent
> > when we fail to report the quiescent state on the IPI.
> >
> > ----->8-----
> > From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
> > From: Byungchul Park <[email protected]>
> > Date: Tue, 6 Mar 2018 13:54:41 +0900
> > Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side section
> >
> > We report the quiescent state for this cpu if it's out of RCU read-side
> > section at the moment IPI was just fired during the expedite process.
> >
> > However, current code reports the quiescent state even in the case:
> >
> > 1) the current task is still within a RCU read-side section
> > 2) the current task has been blocked within the RCU read-side section
> >
> > Since we don't get to the quiescent state yet in the case, we shouldn't
> > report it but check it another time.
> >
> > Signed-off-by: Byungchul Park <[email protected]>
> > ---
> > kernel/rcu/tree_exp.h | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 73e1d3d..cc69d14 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
> > /*
> > * We are either exiting an RCU read-side critical section (negative
> > * values of t->rcu_read_lock_nesting) or are not in one at all
> > - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
> > - * read-side critical section that blocked before this expedited
> > - * grace period started. Either way, we can immediately report
> > - * the quiescent state.
> > + * (zero value of t->rcu_read_lock_nesting). We can immediately
> > + * report the quiescent state.
> > */
> > - rdp = this_cpu_ptr(rsp->rda);
> > - rcu_report_exp_rdp(rsp, rdp, true);
> > + if (t->rcu_read_lock_nesting <= 0) {
> > + rdp = this_cpu_ptr(rsp->rda);
> > + rcu_report_exp_rdp(rsp, rdp, true);
> > + }
> > }
> >
> > /**
> > --
> > 1.9.1
> >
>
On Tue, Mar 06, 2018 at 09:42:05PM +0800, Boqun Feng wrote:
> On Tue, Mar 06, 2018 at 02:31:58PM +0900, Byungchul Park wrote:
> > Hello Paul and RCU folks,
> >
> > I am afraid I correctly understand and fix it. But I really wonder why
> > sync_rcu_exp_handler() reports the quiescent state even in the case that
> > current task is within a RCU read-side section. Do I miss something?
> >
> > If I correctly understand it and you agree with it, I can add more logic
> > which make it more expedited by boosting current or making it urgent
> > when we fail to report the quiescent state on the IPI.
> >
> > ----->8-----
> > From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
> > From: Byungchul Park <[email protected]>
> > Date: Tue, 6 Mar 2018 13:54:41 +0900
> > Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side section
> >
> > We report the quiescent state for this cpu if it's out of RCU read-side
> > section at the moment IPI was just fired during the expedite process.
> >
> > However, current code reports the quiescent state even in the case:
> >
> > 1) the current task is still within a RCU read-side section
> > 2) the current task has been blocked within the RCU read-side section
>
> If this happens, the task will queue itself in
> rcu_preempt_note_context_switch() using rcu_preempt_ctxt_queue(). The gp
> kthread will wait for this task to dequeue itself. IOW, we have other
> mechanism to wait for this task other than bottom-up qs reporting tree.
> So I think we are fine here.
That is indeed the trick! ;-)
Thanx, Paul
> Regards,
> Boqun
>
> > Since we don't get to the quiescent state yet in the case, we shouldn't
> > report it but check it another time.
> >
> > Signed-off-by: Byungchul Park <[email protected]>
> > ---
> > kernel/rcu/tree_exp.h | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 73e1d3d..cc69d14 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
> > /*
> > * We are either exiting an RCU read-side critical section (negative
> > * values of t->rcu_read_lock_nesting) or are not in one at all
> > - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
> > - * read-side critical section that blocked before this expedited
> > - * grace period started. Either way, we can immediately report
> > - * the quiescent state.
> > + * (zero value of t->rcu_read_lock_nesting). We can immediately
> > + * report the quiescent state.
> > */
> > - rdp = this_cpu_ptr(rsp->rda);
> > - rcu_report_exp_rdp(rsp, rdp, true);
> > + if (t->rcu_read_lock_nesting <= 0) {
> > + rdp = this_cpu_ptr(rsp->rda);
> > + rcu_report_exp_rdp(rsp, rdp, true);
> > + }
> > }
> >
> > /**
> > --
> > 1.9.1
> >
On 3/6/2018 10:42 PM, Boqun Feng wrote:
> On Tue, Mar 06, 2018 at 02:31:58PM +0900, Byungchul Park wrote:
>> Hello Paul and RCU folks,
>>
>> I am afraid I correctly understand and fix it. But I really wonder why
>> sync_rcu_exp_handler() reports the quiescent state even in the case that
>> current task is within a RCU read-side section. Do I miss something?
>>
>> If I correctly understand it and you agree with it, I can add more logic
>> which make it more expedited by boosting current or making it urgent
>> when we fail to report the quiescent state on the IPI.
>>
>> ----->8-----
>> From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
>> From: Byungchul Park <[email protected]>
>> Date: Tue, 6 Mar 2018 13:54:41 +0900
>> Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side section
>>
>> We report the quiescent state for this cpu if it's out of RCU read-side
>> section at the moment IPI was just fired during the expedite process.
>>
>> However, current code reports the quiescent state even in the case:
>>
>> 1) the current task is still within a RCU read-side section
>> 2) the current task has been blocked within the RCU read-side section
>>
>
> If this happens, the task will queue itself in
> rcu_preempt_note_context_switch() using rcu_preempt_ctxt_queue(). The gp
> kthread will wait for this task to dequeue itself. IOW, we have other
> mechanism to wait for this task other than bottom-up qs reporting tree.
> So I think we are fine here.
Right. Basically we consider both the quiscent state within the current
task and queued tasks on rcu nodes that you mentioned, to control grace
periods when PREEMPT kernel is used.
Actually my concern was if it's safe to clear the bit of 'expmask' on
the IPI for all possible cases, even though anyway blocked tasks would
try to prevent the grace period from ending.
I worried if something subtle might cause problems, but the code looks
fine on second thought as you said. Thank you for your explanation.
> Regards,
> Boqun
>
>> Since we don't get to the quiescent state yet in the case, we shouldn't
>> report it but check it another time.
>>
>> Signed-off-by: Byungchul Park <[email protected]>
>> ---
>> kernel/rcu/tree_exp.h | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>> index 73e1d3d..cc69d14 100644
>> --- a/kernel/rcu/tree_exp.h
>> +++ b/kernel/rcu/tree_exp.h
>> @@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
>> /*
>> * We are either exiting an RCU read-side critical section (negative
>> * values of t->rcu_read_lock_nesting) or are not in one at all
>> - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
>> - * read-side critical section that blocked before this expedited
>> - * grace period started. Either way, we can immediately report
>> - * the quiescent state.
>> + * (zero value of t->rcu_read_lock_nesting). We can immediately
>> + * report the quiescent state.
>> */
>> - rdp = this_cpu_ptr(rsp->rda);
>> - rcu_report_exp_rdp(rsp, rdp, true);
>> + if (t->rcu_read_lock_nesting <= 0) {
>> + rdp = this_cpu_ptr(rsp->rda);
>> + rcu_report_exp_rdp(rsp, rdp, true);
>> + }
>> }
>>
>> /**
>> --
>> 1.9.1
>>
--
Thanks,
Byungchul
On 3/7/2018 2:55 PM, Byungchul Park wrote:
> On 3/6/2018 10:42 PM, Boqun Feng wrote:
>> On Tue, Mar 06, 2018 at 02:31:58PM +0900, Byungchul Park wrote:
>>> Hello Paul and RCU folks,
>>>
>>> I am afraid I correctly understand and fix it. But I really wonder why
>>> sync_rcu_exp_handler() reports the quiescent state even in the case that
>>> current task is within a RCU read-side section. Do I miss something?
>>>
>>> If I correctly understand it and you agree with it, I can add more logic
>>> which make it more expedited by boosting current or making it urgent
>>> when we fail to report the quiescent state on the IPI.
>>>
>>> ----->8-----
>>> From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
>>> From: Byungchul Park <[email protected]>
>>> Date: Tue, 6 Mar 2018 13:54:41 +0900
>>> Subject: [RFC] rcu: Prevent expedite reporting within RCU read-side
>>> section
>>>
>>> We report the quiescent state for this cpu if it's out of RCU read-side
>>> section at the moment IPI was just fired during the expedite process.
>>>
>>> However, current code reports the quiescent state even in the case:
>>>
>>> 1) the current task is still within a RCU read-side section
>>> 2) the current task has been blocked within the RCU read-side
>>> section
>>>
>>
>> If this happens, the task will queue itself in
>> rcu_preempt_note_context_switch() using rcu_preempt_ctxt_queue(). The gp
>> kthread will wait for this task to dequeue itself. IOW, we have other
>> mechanism to wait for this task other than bottom-up qs reporting tree.
>> So I think we are fine here.
>
> Right. Basically we consider both the quiscent state within the current
> task and queued tasks on rcu nodes that you mentioned, to control grace
> periods when PREEMPT kernel is used.
>
> Actually my concern was if it's safe to clear the bit of 'expmask' on
> the IPI for all possible cases, even though anyway blocked tasks would
> try to prevent the grace period from ending.
>
> I worried if something subtle might cause problems, but the code looks
> fine on second thought as you said. Thank you for your explanation.
In addition, by making quiescent states reported and bits of expmask
cleared only when it's out of rcu read sections, of course keeping
other mechanism unchanged like what you mentioned, I think we can avoid
unnecessary locking ops and other statements, keeping the code still
sane, even though the benefit might be small.
For example, by removing some evitable calls to rcu_report_cpu_mult()
either directly or indirectly. I'm not sure if RCU maintainers think
it's worthy tho.
>> Regards,
>> Boqun
>>
>>> Since we don't get to the quiescent state yet in the case, we shouldn't
>>> report it but check it another time.
>>>
>>> Signed-off-by: Byungchul Park <[email protected]>
>>> ---
>>> kernel/rcu/tree_exp.h | 12 ++++++------
>>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>> index 73e1d3d..cc69d14 100644
>>> --- a/kernel/rcu/tree_exp.h
>>> +++ b/kernel/rcu/tree_exp.h
>>> @@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
>>> /*
>>> * We are either exiting an RCU read-side critical section
>>> (negative
>>> * values of t->rcu_read_lock_nesting) or are not in one at all
>>> - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
>>> - * read-side critical section that blocked before this expedited
>>> - * grace period started. Either way, we can immediately report
>>> - * the quiescent state.
>>> + * (zero value of t->rcu_read_lock_nesting). We can immediately
>>> + * report the quiescent state.
>>> */
>>> - rdp = this_cpu_ptr(rsp->rda);
>>> - rcu_report_exp_rdp(rsp, rdp, true);
>>> + if (t->rcu_read_lock_nesting <= 0) {
>>> + rdp = this_cpu_ptr(rsp->rda);
>>> + rcu_report_exp_rdp(rsp, rdp, true);
>>> + }
>>> }
>>> /**
>>> --
>>> 1.9.1
>>>
>
--
Thanks,
Byungchul
On Wed, Mar 07, 2018 at 03:25:36PM +0900, Byungchul Park wrote:
> On 3/7/2018 2:55 PM, Byungchul Park wrote:
> >On 3/6/2018 10:42 PM, Boqun Feng wrote:
> >>On Tue, Mar 06, 2018 at 02:31:58PM +0900, Byungchul Park wrote:
> >>>Hello Paul and RCU folks,
> >>>
> >>>I am afraid I correctly understand and fix it. But I really wonder why
> >>>sync_rcu_exp_handler() reports the quiescent state even in the case that
> >>>current task is within a RCU read-side section. Do I miss something?
> >>>
> >>>If I correctly understand it and you agree with it, I can add more logic
> >>>which make it more expedited by boosting current or making it urgent
> >>>when we fail to report the quiescent state on the IPI.
> >>>
> >>>----->8-----
> >>>?From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
> >>>From: Byungchul Park <[email protected]>
> >>>Date: Tue, 6 Mar 2018 13:54:41 +0900
> >>>Subject: [RFC] rcu: Prevent expedite reporting within RCU
> >>>read-side section
> >>>
> >>>We report the quiescent state for this cpu if it's out of RCU read-side
> >>>section at the moment IPI was just fired during the expedite process.
> >>>
> >>>However, current code reports the quiescent state even in the case:
> >>>
> >>>??? 1) the current task is still within a RCU read-side section
> >>>??? 2) the current task has been blocked within the RCU
> >>>read-side section
> >>>
> >>
> >>If this happens, the task will queue itself in
> >>rcu_preempt_note_context_switch() using rcu_preempt_ctxt_queue(). The gp
> >>kthread will wait for this task to dequeue itself. IOW, we have other
> >>mechanism to wait for this task other than bottom-up qs reporting tree.
> >>So I think we are fine here.
> >
> >Right. Basically we consider both the quiscent state within the current
> >task and queued tasks on rcu nodes that you mentioned, to control grace
> >periods when PREEMPT kernel is used.
> >
> >Actually my concern was if it's safe to clear the bit of 'expmask' on
> >the IPI for all possible cases, even though anyway blocked tasks would
> >try to prevent the grace period from ending.
> >
> >I worried if something subtle might cause problems, but the code looks
> >fine on second thought as you said. Thank you for your explanation.
>
> In addition, by making quiescent states reported and bits of expmask
> cleared only when it's out of rcu read sections, of course keeping
> other mechanism unchanged like what you mentioned, I think we can avoid
> unnecessary locking ops and other statements, keeping the code still
> sane, even though the benefit might be small.
>
> For example, by removing some evitable calls to rcu_report_cpu_mult()
> either directly or indirectly. I'm not sure if RCU maintainers think
> it's worthy tho.
You mean rcu_report_exp_cpu_mult()? If so, which calls to this function
do you believe should be removed?
(Please note that there are likely to be changes in this area soon,
but still, I would like to understand what might be make more
efficient.)
Thanx, Paul
> >>Regards,
> >>Boqun
> >>
> >>>Since we don't get to the quiescent state yet in the case, we shouldn't
> >>>report it but check it another time.
> >>>
> >>>Signed-off-by: Byungchul Park <[email protected]>
> >>>---
> >>>? kernel/rcu/tree_exp.h | 12 ++++++------
> >>>? 1 file changed, 6 insertions(+), 6 deletions(-)
> >>>
> >>>diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> >>>index 73e1d3d..cc69d14 100644
> >>>--- a/kernel/rcu/tree_exp.h
> >>>+++ b/kernel/rcu/tree_exp.h
> >>>@@ -731,13 +731,13 @@ static void sync_rcu_exp_handler(void *info)
> >>>????? /*
> >>>?????? * We are either exiting an RCU read-side critical
> >>>section (negative
> >>>?????? * values of t->rcu_read_lock_nesting) or are not in one at all
> >>>-???? * (zero value of t->rcu_read_lock_nesting).? Or we are in an RCU
> >>>-???? * read-side critical section that blocked before this expedited
> >>>-???? * grace period started.? Either way, we can immediately report
> >>>-???? * the quiescent state.
> >>>+???? * (zero value of t->rcu_read_lock_nesting). We can immediately
> >>>+???? * report the quiescent state.
> >>>?????? */
> >>>-??? rdp = this_cpu_ptr(rsp->rda);
> >>>-??? rcu_report_exp_rdp(rsp, rdp, true);
> >>>+??? if (t->rcu_read_lock_nesting <= 0) {
> >>>+??????? rdp = this_cpu_ptr(rsp->rda);
> >>>+??????? rcu_report_exp_rdp(rsp, rdp, true);
> >>>+??? }
> >>>? }
> >>>? /**
> >>>--
> >>>1.9.1
> >>>
> >
>
> --
> Thanks,
> Byungchul
>
On Wed, Mar 07, 2018 at 07:03:43AM -0800, Paul E. McKenney wrote:
> On Wed, Mar 07, 2018 at 03:25:36PM +0900, Byungchul Park wrote:
> > On 3/7/2018 2:55 PM, Byungchul Park wrote:
> > >On 3/6/2018 10:42 PM, Boqun Feng wrote:
> > >>On Tue, Mar 06, 2018 at 02:31:58PM +0900, Byungchul Park wrote:
> > >>>Hello Paul and RCU folks,
> > >>>
> > >>>I am afraid I correctly understand and fix it. But I really wonder why
> > >>>sync_rcu_exp_handler() reports the quiescent state even in the case that
> > >>>current task is within a RCU read-side section. Do I miss something?
> > >>>
> > >>>If I correctly understand it and you agree with it, I can add more logic
> > >>>which make it more expedited by boosting current or making it urgent
> > >>>when we fail to report the quiescent state on the IPI.
> > >>>
> > >>>----->8-----
> > >>>?From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
> > >>>From: Byungchul Park <[email protected]>
> > >>>Date: Tue, 6 Mar 2018 13:54:41 +0900
> > >>>Subject: [RFC] rcu: Prevent expedite reporting within RCU
> > >>>read-side section
> > >>>
> > >>>We report the quiescent state for this cpu if it's out of RCU read-side
> > >>>section at the moment IPI was just fired during the expedite process.
> > >>>
> > >>>However, current code reports the quiescent state even in the case:
> > >>>
> > >>>??? 1) the current task is still within a RCU read-side section
> > >>>??? 2) the current task has been blocked within the RCU
> > >>>read-side section
> > >>>
> > >>
> > >>If this happens, the task will queue itself in
> > >>rcu_preempt_note_context_switch() using rcu_preempt_ctxt_queue(). The gp
> > >>kthread will wait for this task to dequeue itself. IOW, we have other
> > >>mechanism to wait for this task other than bottom-up qs reporting tree.
> > >>So I think we are fine here.
> > >
> > >Right. Basically we consider both the quiscent state within the current
> > >task and queued tasks on rcu nodes that you mentioned, to control grace
> > >periods when PREEMPT kernel is used.
> > >
> > >Actually my concern was if it's safe to clear the bit of 'expmask' on
> > >the IPI for all possible cases, even though anyway blocked tasks would
> > >try to prevent the grace period from ending.
> > >
> > >I worried if something subtle might cause problems, but the code looks
> > >fine on second thought as you said. Thank you for your explanation.
> >
> > In addition, by making quiescent states reported and bits of expmask
> > cleared only when it's out of rcu read sections, of course keeping
> > other mechanism unchanged like what you mentioned, I think we can avoid
> > unnecessary locking ops and other statements, keeping the code still
> > sane, even though the benefit might be small.
> >
> > For example, by removing some evitable calls to rcu_report_cpu_mult()
> > either directly or indirectly. I'm not sure if RCU maintainers think
> > it's worthy tho.
>
> You mean rcu_report_exp_cpu_mult()? If so, which calls to this function
> do you believe should be removed?
>
> (Please note that there are likely to be changes in this area soon,
> but still, I would like to understand what might be make more
> efficient.)
Hello Paul,
I think there are two options to manage an expedite-gp when a current
task is preempted within a RCU read section (with CONFIG_PREEMPT_RCU):
1. Clear its bit of ->expmask even though it's within a RCU read
section and rely on ->exp_tasks to prevent the gp from ending.
This option would work because we anyway check both ->exp_tasks
and ->expmask to finish the expedite-gp.
2. Clear its bit of ->expmask *only* when it's out of RCU read
sections and keep others unchanged. So it will be cleared at the
end of the RCU read section in that case.
This option would also work because we anyway check both
->exp_tasks and ->expmask to finish the expedite-gp.
Current code chose the 1st option and try to report the quiescent state
using rcu_report_exp_rdp() when it's out of RCU read sections or the
task is preempted, while an expedite-gp is in progress.
However, when reporting it within a RCU read section, the reporting
hardly goes further since sync_rcu_preempt_exp_done() of course returns
false, furthermore, it might add *unnecessary* lock contention of
rcu_node's spin lock within rcu_report_exp_cpu_mult(). I think those are
unnecessary at all even though there's no logical problem.
Showing the patch reflecting the 2nd might be more helpful for you to
understand what I'd like to do. Please take a look at the following
patch. And it would be appriciated if you answer it. (In addition, we
can do similar work to normal-gp but this time I worked it only with
expedite-gp.)
--
Thanks,
Byungchul
-----8<-----
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 73e1d3d..08c2944 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -716,28 +716,22 @@ static void sync_rcu_exp_handler(void *info)
struct rcu_state *rsp = info;
struct task_struct *t = current;
- /*
- * Within an RCU read-side critical section, request that the next
- * rcu_read_unlock() report. Unless this RCU read-side critical
- * section has already blocked, in which case it is already set
- * up for the expedited grace period to wait on it.
- */
- if (t->rcu_read_lock_nesting > 0 &&
- !t->rcu_read_unlock_special.b.blocked) {
+ if (t->rcu_read_lock_nesting > 0) {
+ /*
+ * Within an RCU read-side critical section, request that
+ * the next rcu_read_unlock() report.
+ */
t->rcu_read_unlock_special.b.exp_need_qs = true;
- return;
+ } else {
+ /*
+ * We are either exiting an RCU read-side critical section
+ * (negative values of t->rcu_read_lock_nesting) or are not
+ * in one at all (zero value of t->rcu_read_lock_nesting).
+ * We can immediately report the quiescent state.
+ */
+ rdp = this_cpu_ptr(rsp->rda);
+ rcu_report_exp_rdp(rsp, rdp, true);
}
-
- /*
- * We are either exiting an RCU read-side critical section (negative
- * values of t->rcu_read_lock_nesting) or are not in one at all
- * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
- * read-side critical section that blocked before this expedited
- * grace period started. Either way, we can immediately report
- * the quiescent state.
- */
- rdp = this_cpu_ptr(rsp->rda);
- rcu_report_exp_rdp(rsp, rdp, true);
}
/**
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b0d7f9b..b303b63 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -269,20 +269,6 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
WARN_ON_ONCE(!(blkd_state & RCU_EXP_BLKD) !=
!(rnp->expmask & rdp->grpmask));
raw_spin_unlock_rcu_node(rnp); /* interrupts remain disabled. */
-
- /*
- * Report the quiescent state for the expedited GP. This expedited
- * GP should not be able to end until we report, so there should be
- * no need to check for a subsequent expedited GP. (Though we are
- * still in a quiescent state in any case.)
- */
- if (blkd_state & RCU_EXP_BLKD &&
- t->rcu_read_unlock_special.b.exp_need_qs) {
- t->rcu_read_unlock_special.b.exp_need_qs = false;
- rcu_report_exp_rdp(rdp->rsp, rdp, true);
- } else {
- WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
- }
}
/*
@@ -414,9 +400,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
*/
void rcu_read_unlock_special(struct task_struct *t)
{
- bool empty_exp;
bool empty_norm;
- bool empty_exp_now;
unsigned long flags;
struct list_head *np;
bool drop_boost_mutex = false;
@@ -436,6 +420,7 @@ void rcu_read_unlock_special(struct task_struct *t)
* t->rcu_read_unlock_special cannot change.
*/
special = t->rcu_read_unlock_special;
+
if (special.b.need_qs) {
rcu_preempt_qs();
t->rcu_read_unlock_special.b.need_qs = false;
@@ -445,39 +430,22 @@ void rcu_read_unlock_special(struct task_struct *t)
}
}
- /*
- * Respond to a request for an expedited grace period, but only if
- * we were not preempted, meaning that we were running on the same
- * CPU throughout. If we were preempted, the exp_need_qs flag
- * would have been cleared at the time of the first preemption,
- * and the quiescent state would be reported when we were dequeued.
- */
- if (special.b.exp_need_qs) {
- WARN_ON_ONCE(special.b.blocked);
- t->rcu_read_unlock_special.b.exp_need_qs = false;
- rdp = this_cpu_ptr(rcu_state_p->rda);
- rcu_report_exp_rdp(rcu_state_p, rdp, true);
- if (!t->rcu_read_unlock_special.s) {
+ /* Clean up if blocked during RCU read-side critical section. */
+ if (special.b.blocked) {
+
+ /* Hardware IRQ handlers cannot block, complain if they get here. */
+ if (in_irq() || in_serving_softirq()) {
+ lockdep_rcu_suspicious(__FILE__, __LINE__,
+ "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
+ pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
+ t->rcu_read_unlock_special.s,
+ t->rcu_read_unlock_special.b.blocked,
+ t->rcu_read_unlock_special.b.exp_need_qs,
+ t->rcu_read_unlock_special.b.need_qs);
local_irq_restore(flags);
return;
}
- }
- /* Hardware IRQ handlers cannot block, complain if they get here. */
- if (in_irq() || in_serving_softirq()) {
- lockdep_rcu_suspicious(__FILE__, __LINE__,
- "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
- pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
- t->rcu_read_unlock_special.s,
- t->rcu_read_unlock_special.b.blocked,
- t->rcu_read_unlock_special.b.exp_need_qs,
- t->rcu_read_unlock_special.b.need_qs);
- local_irq_restore(flags);
- return;
- }
-
- /* Clean up if blocked during RCU read-side critical section. */
- if (special.b.blocked) {
t->rcu_read_unlock_special.b.blocked = false;
/*
@@ -491,7 +459,6 @@ void rcu_read_unlock_special(struct task_struct *t)
WARN_ON_ONCE(rnp != t->rcu_blocked_node);
WARN_ON_ONCE(rnp->level != rcu_num_lvls - 1);
empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
- empty_exp = sync_rcu_preempt_exp_done(rnp);
smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
np = rcu_next_node_entry(t, rnp);
list_del_init(&t->rcu_node_entry);
@@ -515,7 +482,6 @@ void rcu_read_unlock_special(struct task_struct *t)
* Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
* so we must take a snapshot of the expedited state.
*/
- empty_exp_now = sync_rcu_preempt_exp_done(rnp);
if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) {
trace_rcu_quiescent_state_report(TPS("preempt_rcu"),
rnp->gpnum,
@@ -532,16 +498,18 @@ void rcu_read_unlock_special(struct task_struct *t)
/* Unboost if we were boosted. */
if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex)
rt_mutex_futex_unlock(&rnp->boost_mtx);
-
- /*
- * If this was the last task on the expedited lists,
- * then we need to report up the rcu_node hierarchy.
- */
- if (!empty_exp && empty_exp_now)
- rcu_report_exp_rnp(rcu_state_p, rnp, true);
} else {
local_irq_restore(flags);
}
+
+ /*
+ * Respond to a request for an expedited grace period.
+ */
+ if (special.b.exp_need_qs) {
+ t->rcu_read_unlock_special.b.exp_need_qs = false;
+ rdp = this_cpu_ptr(rcu_state_p->rda);
+ rcu_report_exp_rdp(rcu_state_p, rdp, true);
+ }
}
/*
On Thu, Mar 08, 2018 at 07:08:25PM +0900, Byungchul Park wrote:
> On Wed, Mar 07, 2018 at 07:03:43AM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 07, 2018 at 03:25:36PM +0900, Byungchul Park wrote:
> > > On 3/7/2018 2:55 PM, Byungchul Park wrote:
> > > >On 3/6/2018 10:42 PM, Boqun Feng wrote:
> > > >>On Tue, Mar 06, 2018 at 02:31:58PM +0900, Byungchul Park wrote:
> > > >>>Hello Paul and RCU folks,
> > > >>>
> > > >>>I am afraid I correctly understand and fix it. But I really wonder why
> > > >>>sync_rcu_exp_handler() reports the quiescent state even in the case that
> > > >>>current task is within a RCU read-side section. Do I miss something?
> > > >>>
> > > >>>If I correctly understand it and you agree with it, I can add more logic
> > > >>>which make it more expedited by boosting current or making it urgent
> > > >>>when we fail to report the quiescent state on the IPI.
> > > >>>
> > > >>>----->8-----
> > > >>>?From 0b0191f506c19ce331a1fdb7c2c5a00fb23fbcf2 Mon Sep 17 00:00:00 2001
> > > >>>From: Byungchul Park <[email protected]>
> > > >>>Date: Tue, 6 Mar 2018 13:54:41 +0900
> > > >>>Subject: [RFC] rcu: Prevent expedite reporting within RCU
> > > >>>read-side section
> > > >>>
> > > >>>We report the quiescent state for this cpu if it's out of RCU read-side
> > > >>>section at the moment IPI was just fired during the expedite process.
> > > >>>
> > > >>>However, current code reports the quiescent state even in the case:
> > > >>>
> > > >>>??? 1) the current task is still within a RCU read-side section
> > > >>>??? 2) the current task has been blocked within the RCU
> > > >>>read-side section
> > > >>>
> > > >>
> > > >>If this happens, the task will queue itself in
> > > >>rcu_preempt_note_context_switch() using rcu_preempt_ctxt_queue(). The gp
> > > >>kthread will wait for this task to dequeue itself. IOW, we have other
> > > >>mechanism to wait for this task other than bottom-up qs reporting tree.
> > > >>So I think we are fine here.
> > > >
> > > >Right. Basically we consider both the quiscent state within the current
> > > >task and queued tasks on rcu nodes that you mentioned, to control grace
> > > >periods when PREEMPT kernel is used.
> > > >
> > > >Actually my concern was if it's safe to clear the bit of 'expmask' on
> > > >the IPI for all possible cases, even though anyway blocked tasks would
> > > >try to prevent the grace period from ending.
> > > >
> > > >I worried if something subtle might cause problems, but the code looks
> > > >fine on second thought as you said. Thank you for your explanation.
> > >
> > > In addition, by making quiescent states reported and bits of expmask
> > > cleared only when it's out of rcu read sections, of course keeping
> > > other mechanism unchanged like what you mentioned, I think we can avoid
> > > unnecessary locking ops and other statements, keeping the code still
> > > sane, even though the benefit might be small.
> > >
> > > For example, by removing some evitable calls to rcu_report_cpu_mult()
> > > either directly or indirectly. I'm not sure if RCU maintainers think
> > > it's worthy tho.
> >
> > You mean rcu_report_exp_cpu_mult()? If so, which calls to this function
> > do you believe should be removed?
> >
> > (Please note that there are likely to be changes in this area soon,
> > but still, I would like to understand what might be make more
> > efficient.)
>
> Hello Paul,
>
> I think there are two options to manage an expedite-gp when a current
> task is preempted within a RCU read section (with CONFIG_PREEMPT_RCU):
>
> 1. Clear its bit of ->expmask even though it's within a RCU read
> section and rely on ->exp_tasks to prevent the gp from ending.
>
> This option would work because we anyway check both ->exp_tasks
> and ->expmask to finish the expedite-gp.
And this is indeed what is currently done.
> 2. Clear its bit of ->expmask *only* when it's out of RCU read
> sections and keep others unchanged. So it will be cleared at the
> end of the RCU read section in that case.
>
> This option would also work because we anyway check both
> ->exp_tasks and ->expmask to finish the expedite-gp.
This could be made to work, but one shortcoming is that the grace
period would end up waiting on later read-side critical sections
that it does not really need to wait on. Also, eventually all the
bits would clear, and they might clear before the last preempted
task resumed, so the race would still exist.
Or am I missing your point?
> Current code chose the 1st option and try to report the quiescent state
> using rcu_report_exp_rdp() when it's out of RCU read sections or the
> task is preempted, while an expedite-gp is in progress.
>
> However, when reporting it within a RCU read section, the reporting
> hardly goes further since sync_rcu_preempt_exp_done() of course returns
> false, furthermore, it might add *unnecessary* lock contention of
> rcu_node's spin lock within rcu_report_exp_cpu_mult(). I think those are
> unnecessary at all even though there's no logical problem.
It would be possible to use ordering and memory barriers to avoid at
least some lock acquisitions, but the resulting code would be more
complex and fragile. So I would want to see a real problem before
using a more aggressive design.
> Showing the patch reflecting the 2nd might be more helpful for you to
> understand what I'd like to do. Please take a look at the following
> patch. And it would be appriciated if you answer it. (In addition, we
> can do similar work to normal-gp but this time I worked it only with
> expedite-gp.)
Some questions and comments below.
> --
> Thanks,
> Byungchul
>
> -----8<-----
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 73e1d3d..08c2944 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -716,28 +716,22 @@ static void sync_rcu_exp_handler(void *info)
> struct rcu_state *rsp = info;
> struct task_struct *t = current;
>
> - /*
> - * Within an RCU read-side critical section, request that the next
> - * rcu_read_unlock() report. Unless this RCU read-side critical
> - * section has already blocked, in which case it is already set
> - * up for the expedited grace period to wait on it.
> - */
> - if (t->rcu_read_lock_nesting > 0 &&
> - !t->rcu_read_unlock_special.b.blocked) {
> + if (t->rcu_read_lock_nesting > 0) {
> + /*
> + * Within an RCU read-side critical section, request that
> + * the next rcu_read_unlock() report.
> + */
> t->rcu_read_unlock_special.b.exp_need_qs = true;
> - return;
> + } else {
> + /*
> + * We are either exiting an RCU read-side critical section
> + * (negative values of t->rcu_read_lock_nesting) or are not
> + * in one at all (zero value of t->rcu_read_lock_nesting).
> + * We can immediately report the quiescent state.
> + */
> + rdp = this_cpu_ptr(rsp->rda);
> + rcu_report_exp_rdp(rsp, rdp, true);
> }
> -
> - /*
> - * We are either exiting an RCU read-side critical section (negative
> - * values of t->rcu_read_lock_nesting) or are not in one at all
> - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
> - * read-side critical section that blocked before this expedited
> - * grace period started. Either way, we can immediately report
> - * the quiescent state.
> - */
> - rdp = this_cpu_ptr(rsp->rda);
> - rcu_report_exp_rdp(rsp, rdp, true);
This code is equivalent, correct? All that happened is that a
"return" statement was replaced with an "else" clause. Or am I
blind this morning?
> }
>
> /**
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index b0d7f9b..b303b63 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -269,20 +269,6 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
> WARN_ON_ONCE(!(blkd_state & RCU_EXP_BLKD) !=
> !(rnp->expmask & rdp->grpmask));
> raw_spin_unlock_rcu_node(rnp); /* interrupts remain disabled. */
> -
> - /*
> - * Report the quiescent state for the expedited GP. This expedited
> - * GP should not be able to end until we report, so there should be
> - * no need to check for a subsequent expedited GP. (Though we are
> - * still in a quiescent state in any case.)
> - */
> - if (blkd_state & RCU_EXP_BLKD &&
> - t->rcu_read_unlock_special.b.exp_need_qs) {
> - t->rcu_read_unlock_special.b.exp_need_qs = false;
> - rcu_report_exp_rdp(rdp->rsp, rdp, true);
> - } else {
> - WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
> - }
Again, this does not eliminate the race, from what I can see. And it
can unnecessarily extend the expedited grace period.
> }
>
> /*
> @@ -414,9 +400,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
> */
> void rcu_read_unlock_special(struct task_struct *t)
> {
> - bool empty_exp;
> bool empty_norm;
> - bool empty_exp_now;
> unsigned long flags;
> struct list_head *np;
> bool drop_boost_mutex = false;
> @@ -436,6 +420,7 @@ void rcu_read_unlock_special(struct task_struct *t)
> * t->rcu_read_unlock_special cannot change.
> */
> special = t->rcu_read_unlock_special;
> +
> if (special.b.need_qs) {
> rcu_preempt_qs();
> t->rcu_read_unlock_special.b.need_qs = false;
> @@ -445,39 +430,22 @@ void rcu_read_unlock_special(struct task_struct *t)
> }
> }
>
> - /*
> - * Respond to a request for an expedited grace period, but only if
> - * we were not preempted, meaning that we were running on the same
> - * CPU throughout. If we were preempted, the exp_need_qs flag
> - * would have been cleared at the time of the first preemption,
> - * and the quiescent state would be reported when we were dequeued.
> - */
> - if (special.b.exp_need_qs) {
> - WARN_ON_ONCE(special.b.blocked);
> - t->rcu_read_unlock_special.b.exp_need_qs = false;
> - rdp = this_cpu_ptr(rcu_state_p->rda);
> - rcu_report_exp_rdp(rcu_state_p, rdp, true);
> - if (!t->rcu_read_unlock_special.s) {
> + /* Clean up if blocked during RCU read-side critical section. */
> + if (special.b.blocked) {
> +
> + /* Hardware IRQ handlers cannot block, complain if they get here. */
> + if (in_irq() || in_serving_softirq()) {
> + lockdep_rcu_suspicious(__FILE__, __LINE__,
> + "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
> + pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
> + t->rcu_read_unlock_special.s,
> + t->rcu_read_unlock_special.b.blocked,
> + t->rcu_read_unlock_special.b.exp_need_qs,
> + t->rcu_read_unlock_special.b.need_qs);
> local_irq_restore(flags);
> return;
> }
> - }
>
> - /* Hardware IRQ handlers cannot block, complain if they get here. */
> - if (in_irq() || in_serving_softirq()) {
> - lockdep_rcu_suspicious(__FILE__, __LINE__,
> - "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
> - pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
> - t->rcu_read_unlock_special.s,
> - t->rcu_read_unlock_special.b.blocked,
> - t->rcu_read_unlock_special.b.exp_need_qs,
> - t->rcu_read_unlock_special.b.need_qs);
> - local_irq_restore(flags);
> - return;
> - }
> -
> - /* Clean up if blocked during RCU read-side critical section. */
> - if (special.b.blocked) {
> t->rcu_read_unlock_special.b.blocked = false;
>
> /*
> @@ -491,7 +459,6 @@ void rcu_read_unlock_special(struct task_struct *t)
> WARN_ON_ONCE(rnp != t->rcu_blocked_node);
> WARN_ON_ONCE(rnp->level != rcu_num_lvls - 1);
> empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
> - empty_exp = sync_rcu_preempt_exp_done(rnp);
> smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
> np = rcu_next_node_entry(t, rnp);
> list_del_init(&t->rcu_node_entry);
> @@ -515,7 +482,6 @@ void rcu_read_unlock_special(struct task_struct *t)
> * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
> * so we must take a snapshot of the expedited state.
> */
> - empty_exp_now = sync_rcu_preempt_exp_done(rnp);
> if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) {
> trace_rcu_quiescent_state_report(TPS("preempt_rcu"),
> rnp->gpnum,
> @@ -532,16 +498,18 @@ void rcu_read_unlock_special(struct task_struct *t)
> /* Unboost if we were boosted. */
> if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex)
> rt_mutex_futex_unlock(&rnp->boost_mtx);
> -
> - /*
> - * If this was the last task on the expedited lists,
> - * then we need to report up the rcu_node hierarchy.
> - */
> - if (!empty_exp && empty_exp_now)
> - rcu_report_exp_rnp(rcu_state_p, rnp, true);
> } else {
> local_irq_restore(flags);
> }
> +
> + /*
> + * Respond to a request for an expedited grace period.
> + */
> + if (special.b.exp_need_qs) {
> + t->rcu_read_unlock_special.b.exp_need_qs = false;
> + rdp = this_cpu_ptr(rcu_state_p->rda);
> + rcu_report_exp_rdp(rcu_state_p, rdp, true);
> + }
OK, so it looks like you are thinking in terms of combining the two
possible calls to rcu_report_exp_rdp(). But can't you lose reports
this way? For example, suppose that this rcu_node structure has
two tasks queued behind ->exp_tasks?
The current code uses the non-empty-to-empty transition to make the
later report happen. How does this code make that happen?
In addition, preemption is rare, so we don't necessarily need to
optimize for that case.
Thanx, Paul
> }
>
> /*
>
On Thu, Mar 08, 2018 at 10:01:56AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 08, 2018 at 07:08:25PM +0900, Byungchul Park wrote:
[...]
> > 2. Clear its bit of ->expmask *only* when it's out of RCU read
> > sections and keep others unchanged. So it will be cleared at the
> > end of the RCU read section in that case.
> >
> > This option would also work because we anyway check both
> > ->exp_tasks and ->expmask to finish the expedite-gp.
>
> This could be made to work, but one shortcoming is that the grace
> period would end up waiting on later read-side critical sections
> that it does not really need to wait on. Also, eventually all the
I don't think it waits on any later ones since ->expmask would be
cleared at the end of the previous RCU read section.
> bits would clear, and they might clear before the last preempted
> task resumed, so the race would still exist.
No problem since ->exp_tasks will prevent the expedite-gp from ending.
> > Current code chose the 1st option and try to report the quiescent state
> > using rcu_report_exp_rdp() when it's out of RCU read sections or the
> > task is preempted, while an expedite-gp is in progress.
> >
> > However, when reporting it within a RCU read section, the reporting
> > hardly goes further since sync_rcu_preempt_exp_done() of course returns
> > false, furthermore, it might add *unnecessary* lock contention of
> > rcu_node's spin lock within rcu_report_exp_cpu_mult(). I think those are
> > unnecessary at all even though there's no logical problem.
>
> It would be possible to use ordering and memory barriers to avoid at
> least some lock acquisitions, but the resulting code would be more
> complex and fragile. So I would want to see a real problem before
Yes, the simpler the better.
> using a more aggressive design.
I admit that I've done far more than I intended. I understand you and
don't wanna change the design aggressively.
> > @@ -716,28 +716,22 @@ static void sync_rcu_exp_handler(void *info)
> > struct rcu_state *rsp = info;
> > struct task_struct *t = current;
> >
> > - /*
> > - * Within an RCU read-side critical section, request that the next
> > - * rcu_read_unlock() report. Unless this RCU read-side critical
> > - * section has already blocked, in which case it is already set
> > - * up for the expedited grace period to wait on it.
> > - */
> > - if (t->rcu_read_lock_nesting > 0 &&
> > - !t->rcu_read_unlock_special.b.blocked) {
> > + if (t->rcu_read_lock_nesting > 0) {
> > + /*
> > + * Within an RCU read-side critical section, request that
> > + * the next rcu_read_unlock() report.
> > + */
> > t->rcu_read_unlock_special.b.exp_need_qs = true;
> > - return;
> > + } else {
> > + /*
> > + * We are either exiting an RCU read-side critical section
> > + * (negative values of t->rcu_read_lock_nesting) or are not
> > + * in one at all (zero value of t->rcu_read_lock_nesting).
> > + * We can immediately report the quiescent state.
> > + */
> > + rdp = this_cpu_ptr(rsp->rda);
> > + rcu_report_exp_rdp(rsp, rdp, true);
> > }
> > -
> > - /*
> > - * We are either exiting an RCU read-side critical section (negative
> > - * values of t->rcu_read_lock_nesting) or are not in one at all
> > - * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU
> > - * read-side critical section that blocked before this expedited
> > - * grace period started. Either way, we can immediately report
> > - * the quiescent state.
> > - */
> > - rdp = this_cpu_ptr(rsp->rda);
> > - rcu_report_exp_rdp(rsp, rdp, true);
>
> This code is equivalent, correct? All that happened is that a
> "return" statement was replaced with an "else" clause. Or am I
> blind this morning?
It's different. I made it do rcu_report_exp_rdp() only when it's out of
RCU read sections, otherwise set the exp_need_qs flag. But I noticed
that this part should be modified thanks to your explanation below.
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index b0d7f9b..b303b63 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -269,20 +269,6 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
> > WARN_ON_ONCE(!(blkd_state & RCU_EXP_BLKD) !=
> > !(rnp->expmask & rdp->grpmask));
> > raw_spin_unlock_rcu_node(rnp); /* interrupts remain disabled. */
> > -
> > - /*
> > - * Report the quiescent state for the expedited GP. This expedited
> > - * GP should not be able to end until we report, so there should be
> > - * no need to check for a subsequent expedited GP. (Though we are
> > - * still in a quiescent state in any case.)
> > - */
> > - if (blkd_state & RCU_EXP_BLKD &&
> > - t->rcu_read_unlock_special.b.exp_need_qs) {
> > - t->rcu_read_unlock_special.b.exp_need_qs = false;
> > - rcu_report_exp_rdp(rdp->rsp, rdp, true);
> > - } else {
> > - WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
> > - }
>
> Again, this does not eliminate the race, from what I can see. And it
There are races neither in the current code nor in my patch, I think.
> can unnecessarily extend the expedited grace period.
I don't know what you're pointing out here. Does the patch I'm attaching
below still have the same problem? Do you remind that it's gonna be
handled in rcu_read_unlock_special()?
> > @@ -532,16 +498,18 @@ void rcu_read_unlock_special(struct task_struct *t)
> > /* Unboost if we were boosted. */
> > if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex)
> > rt_mutex_futex_unlock(&rnp->boost_mtx);
> > -
> > - /*
> > - * If this was the last task on the expedited lists,
> > - * then we need to report up the rcu_node hierarchy.
> > - */
> > - if (!empty_exp && empty_exp_now)
> > - rcu_report_exp_rnp(rcu_state_p, rnp, true);
> > } else {
> > local_irq_restore(flags);
> > }
> > +
> > + /*
> > + * Respond to a request for an expedited grace period.
> > + */
> > + if (special.b.exp_need_qs) {
> > + t->rcu_read_unlock_special.b.exp_need_qs = false;
> > + rdp = this_cpu_ptr(rcu_state_p->rda);
> > + rcu_report_exp_rdp(rcu_state_p, rdp, true);
> > + }
>
> OK, so it looks like you are thinking in terms of combining the two
> possible calls to rcu_report_exp_rdp(). But can't you lose reports
> this way? For example, suppose that this rcu_node structure has
> two tasks queued behind ->exp_tasks?
I certainly missed something here. I noticed it thanks to you.
> The current code uses the non-empty-to-empty transition to make the
> later report happen. How does this code make that happen?
Exactly. My code doesn't make it.
What about the modified version which doesn't change the design but just
eliminates the obvious unnecessary? And it might answer your question
properly saying "which calls to this function (rcu_report_exp_cpu_mult())
do you believe should be removed?".
----->8-----
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 73e1d3d..33dfe6b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -736,8 +736,10 @@ static void sync_rcu_exp_handler(void *info)
* grace period started. Either way, we can immediately report
* the quiescent state.
*/
- rdp = this_cpu_ptr(rsp->rda);
- rcu_report_exp_rdp(rsp, rdp, true);
+ if (t->rcu_read_lock_nesting <= 0) {
+ rdp = this_cpu_ptr(rsp->rda);
+ rcu_report_exp_rdp(rsp, rdp, true);
+ }
}
/**
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b0d7f9b..bb6b2dc 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -271,15 +271,12 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
raw_spin_unlock_rcu_node(rnp); /* interrupts remain disabled. */
/*
- * Report the quiescent state for the expedited GP. This expedited
- * GP should not be able to end until we report, so there should be
- * no need to check for a subsequent expedited GP. (Though we are
- * still in a quiescent state in any case.)
+ * Reporting the quiescent state for the expedited GP will be
+ * handled by special.b.blocked field in rcu_read_unlock_special().
*/
if (blkd_state & RCU_EXP_BLKD &&
t->rcu_read_unlock_special.b.exp_need_qs) {
t->rcu_read_unlock_special.b.exp_need_qs = false;
- rcu_report_exp_rdp(rdp->rsp, rdp, true);
} else {
WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
}
@@ -491,7 +488,7 @@ void rcu_read_unlock_special(struct task_struct *t)
WARN_ON_ONCE(rnp != t->rcu_blocked_node);
WARN_ON_ONCE(rnp->level != rcu_num_lvls - 1);
empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
- empty_exp = sync_rcu_preempt_exp_done(rnp);
+ empty_exp = rnp->exp_tasks == NULL;
smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
np = rcu_next_node_entry(t, rnp);
list_del_init(&t->rcu_node_entry);
@@ -515,7 +512,7 @@ void rcu_read_unlock_special(struct task_struct *t)
* Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
* so we must take a snapshot of the expedited state.
*/
- empty_exp_now = sync_rcu_preempt_exp_done(rnp);
+ empty_exp_now = rnp->exp_tasks == NULL;
if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) {
trace_rcu_quiescent_state_report(TPS("preempt_rcu"),
rnp->gpnum,
@@ -537,8 +534,10 @@ void rcu_read_unlock_special(struct task_struct *t)
* If this was the last task on the expedited lists,
* then we need to report up the rcu_node hierarchy.
*/
- if (!empty_exp && empty_exp_now)
- rcu_report_exp_rnp(rcu_state_p, rnp, true);
+ if (!empty_exp && empty_exp_now) {
+ rdp = this_cpu_ptr(rcu_state_p->rda);
+ rcu_report_exp_rdp(rcu_state_p, rdp, true);
+ }
} else {
local_irq_restore(flags);
}
On 3/9/2018 5:41 PM, Byungchul Park wrote:
> On Thu, Mar 08, 2018 at 10:01:56AM -0800, Paul E. McKenney wrote:
>> On Thu, Mar 08, 2018 at 07:08:25PM +0900, Byungchul Park wrote:
>
> [...]
>
>>> 2. Clear its bit of ->expmask *only* when it's out of RCU read
>>> sections and keep others unchanged. So it will be cleared at the
>>> end of the RCU read section in that case.
>>>
>>> This option would also work because we anyway check both
>>> ->exp_tasks and ->expmask to finish the expedite-gp.
>>
>> This could be made to work, but one shortcoming is that the grace
>> period would end up waiting on later read-side critical sections
>> that it does not really need to wait on. Also, eventually all the
>
> I don't think it waits on any later ones since ->expmask would be
> cleared at the end of the previous RCU read section.
Now having simplified my patch, with the simplified version, I can
see what you were saying. Right, the expedite-gp might be extended
as you said. That optimization cannot be achieved this way without
reggression.
You're awesome. Thank you for explaning the reason why not.
--
Thanks,
Byungchul