2013-08-20 03:51:10

by Chen Gang

Subject: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

According to the comment above rcu_cpu_has_callbacks(): "If there are
no callbacks, all of them are deemed to be lazy".

So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
false.


Signed-off-by: Chen Gang <[email protected]>
---
kernel/rcutree.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5b53a89..9ee9565 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2725,7 +2725,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
 			hc = true;
 	}
 	if (all_lazy)
-		*all_lazy = al;
+		*all_lazy = !hc ? true : al;
 	return hc;
 }

--
1.7.7.6


2013-08-20 03:52:24

by Chen Gang

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.



If 'hc' is false, 'al' will never be false either (we only need to check
'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).

I recommend improving the related code, as in the diff below.

----------------------------------diff begin------------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5b53a89..421caf0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2719,10 +2719,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)

 	for_each_rcu_flavor(rsp) {
 		rdp = per_cpu_ptr(rsp->rda, cpu);
-		if (rdp->qlen != rdp->qlen_lazy)
-			al = false;
-		if (rdp->nxtlist)
+		if (rdp->nxtlist) {
 			hc = true;
+			if (rdp->qlen != rdp->qlen_lazy) {
+				al = false;
+				break;
+			}
+		}
 	}
 	if (all_lazy)
 		*all_lazy = al;

----------------------------------diff end--------------------------------------




--
Chen Gang

2013-08-20 04:11:00

by Paul E. McKenney

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On Tue, Aug 20, 2013 at 11:50:02AM +0800, Chen Gang wrote:
> According to the comment above rcu_cpu_has_callbacks(): "If there are
> no callbacks, all of them are deemed to be lazy".
>
> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
> false.

If there are no callbacks, what must the value of "al" be at this
point in the code? Given this, what is the effect of your patch?

Thanx, Paul
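
As a way to work through the question above, the loop can be modeled in
userspace C. This is only a sketch: struct model_rcu_data and
model_cpu_has_callbacks() are hypothetical stand-ins rather than kernel
code, and the assert() encodes an assumption, namely that a flavor with
an empty callback list has qlen and qlen_lazy both equal to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct rcu_data. */
struct model_rcu_data {
	void *nxtlist;		/* non-NULL when callbacks are queued */
	long qlen;		/* total number of queued callbacks */
	long qlen_lazy;		/* number of queued callbacks that are lazy */
};

/*
 * Userspace model of the loop in rcu_cpu_has_callbacks() as quoted in
 * this thread. Returns true if any flavor has callbacks, and stores
 * whether all callbacks are lazy through all_lazy when it is non-NULL.
 */
static bool model_cpu_has_callbacks(const struct model_rcu_data *flavors,
				    size_t nflavors, bool *all_lazy)
{
	bool al = true;
	bool hc = false;
	size_t i;

	for (i = 0; i < nflavors; i++) {
		const struct model_rcu_data *rdp = &flavors[i];

		/* Assumption: an empty list means both counts are zero. */
		if (!rdp->nxtlist)
			assert(rdp->qlen == 0 && rdp->qlen_lazy == 0);
		if (rdp->qlen != rdp->qlen_lazy)
			al = false;
		if (rdp->nxtlist)
			hc = true;
	}
	if (all_lazy)
		*all_lazy = al;
	return hc;
}
```

Under that assumption, a CPU with no callbacks never sets 'al' to false
in this model, so the function already reports all_lazy as true and the
extra '!hc ? true : al' test would not change the result.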


2013-08-20 04:18:39

by Paul E. McKenney

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
>
>
> If 'hc' is false, 'al' will never be false either (we only need to check
> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
>
> I recommend improving the related code, as in the diff below.

Are you sure that this represents an improvement? If so, why?

Or to put it another way, I see a patch that increases the size of the
kernel by three lines. What is the corresponding benefit given common
kernel workloads?

Thanx, Paul
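
One way to examine the question is to compare the two loop bodies side
by side in a userspace model. The sketch below assumes, as stated earlier
in the thread, that a flavor with no callbacks has qlen == qlen_lazy; the
struct and function names are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct rcu_data. */
struct model_rcu_data {
	void *nxtlist;		/* non-NULL when callbacks are queued */
	long qlen;		/* total number of queued callbacks */
	long qlen_lazy;		/* number of queued callbacks that are lazy */
};

/* The two values the real function hands back to its caller. */
struct model_result {
	bool has_callbacks;
	bool all_lazy;
};

/* Loop body as in the unpatched function: always visits every flavor. */
static struct model_result model_original(const struct model_rcu_data *f,
					  size_t n)
{
	struct model_result r = { .has_callbacks = false, .all_lazy = true };
	size_t i;

	for (i = 0; i < n; i++) {
		if (f[i].qlen != f[i].qlen_lazy)
			r.all_lazy = false;
		if (f[i].nxtlist)
			r.has_callbacks = true;
	}
	return r;
}

/* Loop body as in the proposed diff: stops at the first non-lazy flavor. */
static struct model_result model_early_exit(const struct model_rcu_data *f,
					    size_t n)
{
	struct model_result r = { .has_callbacks = false, .all_lazy = true };
	size_t i;

	for (i = 0; i < n; i++) {
		if (f[i].nxtlist) {
			r.has_callbacks = true;
			if (f[i].qlen != f[i].qlen_lazy) {
				r.all_lazy = false;
				break;
			}
		}
	}
	return r;
}
```

Under that assumption both variants return the same pair of values for
every input, so any benefit of the early exit would have to come from
skipped iterations rather than from different results.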


2013-08-20 04:31:14

by Chen Gang

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On 08/20/2013 12:10 PM, Paul E. McKenney wrote:
> On Tue, Aug 20, 2013 at 11:50:02AM +0800, Chen Gang wrote:
>> According to the comment above rcu_cpu_has_callbacks(): "If there are
>> no callbacks, all of them are deemed to be lazy".
>>
>> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
>> false.
>
> If there are no callbacks, what must the value of "al" be at this
> point in the code? Given this, what is the effect of your patch?
>

Hmm... I found it by reading the code. The C code says that 'hc' and 'al'
have no relationship with each other, so a reader may assume that when
'hc' is false, 'al' can be either true or false.



--
Chen Gang

2013-08-20 04:44:19

by Chen Gang

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
>>
>>
>> If 'hc' is false, 'al' will never be false either (we only need to check
>> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
>>
>> I recommend improving the related code, as in the diff below.
>
> Are you sure that this represents an improvement? If so, why?
>

If 'hc' and 'al' really are related, it is better to let the C code
express that, which will make the code clearer.

> Or to put it another way, I see a patch that increases the size of the
> kernel by three lines. What is the corresponding benefit given common
> kernel workloads?
>

For 'al', there is no need to check on every iteration, and for 'hc', it
may save useless iterations (so it can improve performance).

In C, it really does add 3 lines, but it may not in assembly (excuse me,
I have not checked it; I think it is not important, although it would be
easy to compare the binaries).



--
Chen Gang

2013-08-20 04:46:30

by Chen Gang

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On 08/20/2013 12:43 PM, Chen Gang wrote:
> On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
>> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
>>>
>>>
>>> If 'hc' is false, 'al' will never be false either (we only need to check
>>> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
>>>
>>> I recommend improving the related code, as in the diff below.
>>
>> Are you sure that this represents an improvement? If so, why?
>>
>
> If 'hc' and 'al' really are related, it is better to let the C code
> express that, which will make the code clearer.
>
>> Or to put it another way, I see a patch that increases the size of the
>> kernel by three lines. What is the corresponding benefit given common
>> kernel workloads?
>>
>
> For 'al', there is no need to check on every iteration, and for 'hc', it
> may save useless iterations (so it can improve performance).
>
> In C, it really does add 3 lines, but it may not in assembly (excuse me,
> I have not checked it; I think it is not important, although it would be
> easy to compare the binaries).
>

Oh, sorry, I mean: only for our case, "it is not important".




--
Chen Gang

2013-08-21 06:00:32

by Chen Gang

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.


If we still have doubts about it but cannot find a suitable way to fix it
(neither of us is familiar with it),

is it suitable to use BUG_ON() for it (the diff might look like the one below)?

-------------------------------diff begin-------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index dbf74b5..1d02659 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2728,6 +2728,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
 		if (rdp->nxtlist)
 			hc = true;
 	}
+	BUG_ON(!hc && !al);
 	if (all_lazy)
 		*all_lazy = al;
 	return hc;

-------------------------------diff end---------------------------------

Thanks.




--
Chen Gang

2013-08-21 14:23:15

by Paul E. McKenney

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:
>
> If we still have doubts about it but cannot find a suitable way to fix it
> (neither of us is familiar with it),

Well, you have that halfway correct, which some might well argue is an
upward trend from your earlier postings. I do appreciate your honesty
in saying that you are not familiar with it. That said, most people
reading these emails will have figured that out by now.

> is it suitable to use BUG_ON() for it (the diff might look like the one below)?

Hmmm... Since my use of questions seems to have confused you, I will
not use questions in this reply. (The Google search string "quick quiz
site:lwn.net" may help you avoid this confusion in the future.)

And no, the patch below is not appropriate. Of course, even if it
was appropriate to accept it, I would be unable to do so, as it has
no Signed-off-by. But I would expect that you already knew that,
given that your first patch (which was also inappropriate) did have
your Signed-off-by.

In addition, I cordially invite you to interpret my questions in the
earlier emails in this thread as quick quizzes, and to contemplate how
you have been doing with your answers.

Don't get me wrong, I do welcome appropriate patches. In fact, if
you look at RCU's git history, you will see that I frequently accept
patches from a fair number of people. And if you were willing to
invest some time and thought, you might eventually be able to generate
an appropriate (albeit low priority) patch to this function. However,
you seem to be motivated to submit small patches with a minimum of
thought and preparation, perhaps because you need to meet some external
or self-imposed quota of accepted patches. And if you are in fact driven
by a quota that prevents you from taking the time required to carefully
think things through, you are wasting your time with RCU.

Good luck!

Thanx, Paul


2013-08-22 03:02:57

by Chen Gang

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On 08/21/2013 10:23 PM, Paul E. McKenney wrote:
> On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:
>>
>> If we still have doubts about it but cannot find a suitable way to fix it
>> (neither of us is familiar with it),
>
> Well, you have that halfway correct, which some might well argue is an
> upward trend from your earlier postings. I do appreciate your honesty
> in saying that you are not familiar with it. That said, most people
> reading these emails will have figured that out by now.
>

Hmm... do you mean that you are familiar with it and know the correct fix?

If so, please say so directly, and let the discussion finish.


>> is it suitable to use BUG_ON() for it (the diff might look like the one below)?
>
> Hmmm... Since my use of questions seems to have confused you, I will
> not use questions in this reply. (The Google search string "quick quiz
> site:lwn.net" may help you avoid this confusion in the future.)
>

It seems "quick quiz site:lwn.net" is valuable (at least, I did not know
about it before), thanks.


> And no, the patch below is not appropriate. Of course, even if it
> was appropriate to accept it, I would be unable to do so, as it has
> no Signed-off-by. But I would expect that you already knew that,
> given that your first patch (which was also inappropriate) did have
> your Signed-off-by.
>

That is why I said it is a diff below, not a patch below: it is only for
discussion, not for applying.

Hmm... if making it a patch (including Signed-off-by and the rest) is
better for efficiency (no need to send the patch again), I will do so next time.


> In addition, I cordially invite you to interpret my questions in the
> earlier emails in this thread as quick quizzes, and to contemplate how
> you have been doing with your answers.
>

Do you know the answer? If so, there is no need for me to reply.

The main goal of the discussion is not learning; it is providing
contributions (in our case, fixing issues).

Normally one really can learn from the discussions, but that is not
our main goal (every member's time is expensive).


But if you do not know the answer either, I should try.


> Don't get me wrong, I do welcome appropriate patches. In fact, if
> you look at RCU's git history, you will see that I frequently accept
> patches from a fair number of people. And if you were willing to
> invest some time and thought, you might eventually be able to generate
> an appropriate (albeit low priority) patch to this function. However,
> you seem to be motivated to submit small patches with a minimum of
> thought and preparation, perhaps because you need to meet some external
> or self-imposed quota of accepted patches. And if you are in fact driven
> by a quota that prevents you from taking the time required to carefully
> think things through, you are wasting your time with RCU.
>

Hmm... at least some of what you said above is correct for me.

At least, I should provide 10 patches per month; it is a necessary
basic requirement for me.

And my focus is efficiency: letting appliers and maintainers work
together to provide contributions to the outside efficiently.

If you already know about it, why do I need to continue? But if you do
not know either, I should try.


> Good luck!
>

Thanks.



--
Chen Gang

2013-08-25 19:19:17

by Paul E. McKenney

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On Thu, Aug 22, 2013 at 11:01:53AM +0800, Chen Gang wrote:
> On 08/21/2013 10:23 PM, Paul E. McKenney wrote:
> > On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:

[ . . . ]

> > Don't get me wrong, I do welcome appropriate patches. In fact, if
> > you look at RCU's git history, you will see that I frequently accept
> > patches from a fair number of people. And if you were willing to
> > invest some time and thought, you might eventually be able to generate
> > an appropriate (albeit low priority) patch to this function. However,
> > you seem to be motivated to submit small patches with a minimum of
> > thought and preparation, perhaps because you need to meet some external
> > or self-imposed quota of accepted patches. And if you are in fact driven
> > by a quota that prevents you from taking the time required to carefully
> > think things through, you are wasting your time with RCU.
>
> Hmm... at least some of what you said above is correct for me.
>
> At least, I should provide 10 patches per month; it is a necessary
> basic requirement for me.

OK, that does help explain the otherwise inexplicable approach you have
been taking. Let's see how you have been doing, based on committer date
in Linus's tree:

     1 2012-11
    15 2013-01
     7 2013-02
    20 2013-03
    21 2013-04
    12 2013-05
    17 2013-06
    10 2013-07

The last few months might be understated a bit due to patches
still being in maintainer trees. This is a nice contrast from my
first impression of you from https://lkml.org/lkml/2013/6/9/64 and
https://lkml.org/lkml/2013/8/19/650, neither of which gave me any
reason to trust your work, to put it mildly. And if I cannot trust
your work, I obviously cannot accept your patches.

You do seem to select for localized bug fixes, which require less work
than the performance-motivated patches you were putting forward earlier
in this thread. With a localized bug, you demonstrate the bug, show the
fix, and that is that. From what I can see, part of the problem with
your patches in this email thread is that you are trying to move from
localized bug fixes to performance issues without doing the additional
work required. Please see below for a rough outline of this additional
work.

> And my focus is efficiency: letting appliers and maintainers work
> together to provide contributions to the outside efficiently.

Sounds great, but there are many possible definitions of "efficiency".
Given your quota, I would expect your definition to involve number of
patches accepted. In contrast, my definition for RCU instead involves
maintainability, robustness, scalability, and, for a few critical
code paths, performance. I therefore need you to have thought through
and carefully tested your patch.

> If you already know about it, why do I need to continue? But if you do
> not know either, I should try.

What I need you to do in future RCU performance patch submissions is:

1. Think through your patch and the code that it is modifying.
If you submit a patch to me, you should be able to answer the
sorts of questions that I was asking in this thread.

2. Tell me which situations your patch helps and which it does not.

3. Tell me how much your patch improves performance in the
situations where it helps.

4. Test the code. If it makes a measurable difference, present
the performance results. (It would be very surprising if your
early-loop exit patch made a significant difference, especially
on a CONFIG_PREEMPT=n kernel.)

5. Rather than randomly dropping into the code, use actual measurements
to determine where to focus your performance-improvement efforts.
Developers, even experienced ones, are really bad at guessing
where the most important performance problems are.

6. Use your judgement. For example, a 1000-line patch to improve a
slowpath by 0.1% simply isn't worth it. A high risk of adding
bugs for a microscopic benefit? Thanks, but no thanks!!!

For your patch https://lkml.org/lkml/2013/8/19/651, which was closest
of the three to being useful, here are some things about RCU that you
should have taken the time to learn -before- submitting the patch:

a. Q: How many iterations for the for_each_rcu_flavor() loop?
A: On CONFIG_PREEMPT=n kernels, only two iterations.
On CONFIG_PREEMPT=y kernels, only three iterations.

b. Q: Which flavor of RCU is most likely to have non-lazy callbacks
queued?

A: On CONFIG_PREEMPT=y kernels, the first one in the list.
For CONFIG_PREEMPT=n kernels, it is last in the list.
(In other words, for CONFIG_PREEMPT=n kernels, this change
won't help at all, at least not without also changing the
order of the list.)

c. Q: Do any of the other for_each_rcu_flavor() loops care what order
the flavors are in?

A: No. (In other words, it is OK to reorder the list to improve
the performance.)

d. Q: What is the performance benefit of this change?

A: Quite small, for example, much less than an atomic operation
on a shared data item. It is probably not possible to
measure the performance difference.

e. Q: Is the change on a hotpath?

A: Somewhat. It is not on the read side, but it is on the path
to and from idle, which can be important for latency-sensitive
workloads.

f. Q: How did you test this patch?

A: As far as I can see, you did no testing.

If I receive a future patch from you that does not convince me that you
know the answer to questions like these, I will most likely ignore it.

Just for practice, let's rework your second patch to make it something
that I might accept. Here is what you had:

	for_each_rcu_flavor(rsp) {
		rdp = per_cpu_ptr(rsp->rda, cpu);
-		if (rdp->qlen != rdp->qlen_lazy)
-			al = false;
-		if (rdp->nxtlist)
+		if (rdp->nxtlist) {
			hc = true;
+			if (rdp->qlen != rdp->qlen_lazy) {
+				al = false;
+				break;
+			}
+		}
	}
	if (all_lazy)
		*all_lazy = al;

We need to do something about the indentation, perhaps as follows:

	for_each_rcu_flavor(rsp) {
		rdp = per_cpu_ptr(rsp->rda, cpu);
		if (!rdp->nxtlist)
			continue;
		hc = true;
		if (rdp->qlen != rdp->qlen_lazy) {
			al = false;
			break;
		}
	}
	if (all_lazy)
		*all_lazy = al;
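
The reworked loop can be tried outside the kernel with a minimal
user-space sketch. This is not kernel code: struct cb_state and
has_callbacks() are hypothetical stand-ins for the per-CPU rcu_data and
for rcu_cpu_has_callbacks(), and a plain array replaces
for_each_rcu_flavor(). The assert() mirrors the BUG_ON(!hc && !al) check
proposed earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU, per-flavor callback state. */
struct cb_state {
	void *nxtlist;		/* non-NULL when callbacks are queued */
	long qlen;		/* total number of queued callbacks */
	long qlen_lazy;		/* how many of those are lazy */
};

/*
 * Model of the reworked loop: skip flavors that have no callbacks and
 * stop at the first flavor that has a non-lazy callback.
 */
static bool has_callbacks(const struct cb_state *flavors, size_t n,
			  bool *all_lazy)
{
	bool al = true;
	bool hc = false;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct cb_state *rdp = &flavors[i];

		if (!rdp->nxtlist)
			continue;
		hc = true;
		if (rdp->qlen != rdp->qlen_lazy) {
			al = false;
			break;
		}
	}
	/* No callbacks at all must leave "all lazy" true. */
	assert(hc || al);
	if (all_lazy)
		*all_lazy = al;
	return hc;
}
```

Because "al" can only be cleared after "hc" has been set, the case of
"no callbacks but not all lazy" cannot arise in this form, which is the
relationship between the two flags that the patch wanted the C code to
express.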


We also need to change the following code in rcu_init() in the file
kernel/rcutree.c:

	rcu_init_one(&rcu_sched_state, &rcu_sched_data);
	rcu_init_one(&rcu_bh_state, &rcu_bh_data);
	__rcu_init_preempt();

So that it gets rcu_sched_state in the right place, which I believe is
like this:

	rcu_init_one(&rcu_bh_state, &rcu_bh_data);
	rcu_init_one(&rcu_sched_state, &rcu_sched_data);
	__rcu_init_preempt();
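
Why the initialization order matters can be shown with a small
user-space sketch. It assumes that each initialization call links its
flavor at the head of the flavor list, so that the flavor initialized
last is visited first; that assumption should be checked against
rcu_init_one() before relying on it. struct flavor, flavor_init_one()
and first_flavor() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for one RCU flavor on the flavor list. */
struct flavor {
	const char *name;
	struct flavor *next;
};

/* Assumed behavior: each init call links its flavor at the list head. */
static void flavor_init_one(struct flavor **head, struct flavor *f)
{
	f->next = *head;
	*head = f;
}

/* Name of the flavor that a walk of the list would visit first. */
static const char *first_flavor(const struct flavor *head)
{
	return head ? head->name : NULL;
}
```

Under that assumption, initializing rcu_bh first and rcu_sched second
leaves rcu_sched at the front of the list, which is where the early
exit needs it on PREEMPT=n kernels.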


If you make these changes, test them with RCU_FAST_NO_HZ both set and
not set, and verify that rcu_sched_state is first in the flavor list
for kernels with PREEMPT=n and that rcu_preempt_state is first in flavor
list for kernels with PREEMPT=y, and send me the resulting patch by end
of day Friday, China time, I will seriously consider it for acceptance.
Otherwise, I will author the patch myself with your Reported-by.

Again, good luck!

Thanx, Paul

> > Good luck!
> >
>
> Thanks.
>
> > Thanx, Paul
> >
> >> -------------------------------diff begin-------------------------------
> >>
> >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >> index dbf74b5..1d02659 100644
> >> --- a/kernel/rcutree.c
> >> +++ b/kernel/rcutree.c
> >> @@ -2728,6 +2728,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
> >> if (rdp->nxtlist)
> >> hc = true;
> >> }
> >> + BUG_ON(!hc && !al);
> >> if (all_lazy)
> >> *all_lazy = al;
> >> return hc;
> >>
> >> -------------------------------diff end---------------------------------
> >>
> >> Thanks.
> >>
> >>
> >> On 08/20/2013 12:45 PM, Chen Gang wrote:
> >>> On 08/20/2013 12:43 PM, Chen Gang wrote:
> >>>> On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
> >>>>> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
> >>>>>>
> >>>>>>
> >>>>>> If 'hc' is false, 'al' will never be false, either (only need to check
> >>>>>> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
> >>>>>>
> >>>>>> Recommend to improve the related code, like the diff below.
> >>>>>
> >>>>> Are you sure that this represents an improvement? If so, why?
> >>>>>
> >>>>
> >>>> If 'hc' and 'al' really has relationships, better to let 'C code'
> >>>> express it, that will make the code clearer.
> >>>>
> >>>>> Or to put it another way, I see a patch that increases the size of the
> >>>>> kernel by three lines. What is the corresponding benefit given common
> >>>>> kernel workloads?
> >>>>>
> >>>>
> >>>> For 'al', need not check for each looping, and for 'hc', may save the
> >>>> useless looping (so it can make performance better).
> >>>>
> >>>> For C code, it really increases 3 lines, but may not for assembly code
> >>>> (excuse me, I am not check it, I think it is not important, although it
> >>>> is easy to give a comparing for binary).
> >>>>
> >>>
> >>> Oh, sorry, I mean: only for our case, "it is not important".
> >>>
> >>>
> >>>>> Thanx, Paul
> >>>>>
> >>>>>> ----------------------------------diff begin------------------------------------
> >>>>>>
> >>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >>>>>> index 5b53a89..421caf0 100644
> >>>>>> --- a/kernel/rcutree.c
> >>>>>> +++ b/kernel/rcutree.c
> >>>>>> @@ -2719,10 +2719,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
> >>>>>>
> >>>>>> for_each_rcu_flavor(rsp) {
> >>>>>> rdp = per_cpu_ptr(rsp->rda, cpu);
> >>>>>> - if (rdp->qlen != rdp->qlen_lazy)
> >>>>>> - al = false;
> >>>>>> - if (rdp->nxtlist)
> >>>>>> + if (rdp->nxtlist) {
> >>>>>> hc = true;
> >>>>>> + if (rdp->qlen != rdp->qlen_lazy) {
> >>>>>> + al = false;
> >>>>>> + break;
> >>>>>> + }
> >>>>>> + }
> >>>>>> }
> >>>>>> if (all_lazy)
> >>>>>> *all_lazy = al;
> >>>>>>
> >>>>>> ----------------------------------diff end--------------------------------------
> >>>>>>
> >>>>>>
> >>>>>> On 08/20/2013 11:50 AM, Chen Gang wrote:
> >>>>>>> According to the comment above rcu_cpu_has_callbacks(): "If there are
> >>>>>>> no callbacks, all of them are deemed to be lazy".
> >>>>>>>
> >>>>>>> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
> >>>>>>> false.
> >>>>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Chen Gang <[email protected]>
> >>>>>>> ---
> >>>>>>> kernel/rcutree.c | 2 +-
> >>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >>>>>>> index 5b53a89..9ee9565 100644
> >>>>>>> --- a/kernel/rcutree.c
> >>>>>>> +++ b/kernel/rcutree.c
> >>>>>>> @@ -2725,7 +2725,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
> >>>>>>> hc = true;
> >>>>>>> }
> >>>>>>> if (all_lazy)
> >>>>>>> - *all_lazy = al;
> >>>>>>> + *all_lazy = !hc ? true : al;
> >>>>>>> return hc;
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Chen Gang
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Chen Gang
> >>
> >
> >
> >
>
>
> --
> Chen Gang
>

2013-08-26 02:22:27

by Chen Gang F T

[permalink] [raw]
Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.


Firstly, thank you for your reply with these details.

On 08/26/2013 03:18 AM, Paul E. McKenney wrote:
> On Thu, Aug 22, 2013 at 11:01:53AM +0800, Chen Gang wrote:
>> On 08/21/2013 10:23 PM, Paul E. McKenney wrote:
>>> On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:
>
> [ . . . ]
>
>>> Don't get me wrong, I do welcome appropriate patches. In fact, if
>>> you look at RCU's git history, you will see that I frequently accept
>>> patches from a fair number of people. And if you were willing to
>>> invest some time and thought, you might eventually be able to generate
>>> an appropriate (albeit low priority) patch to this function. However,
>>> you seem to be motivated to submit small patches with a minimum of
>>> thought and preparation, perhaps because you need to meet some external
>>> or self-imposed quota of accepted patches. And if you are in fact driven
>>> by a quota that prevents you from taking the time required to carefully
>>> think things through, you are wasting your time with RCU.
>>
>> Hmm... at least, some contents you said above is correct to me.
>>
>> At least, I should provide 10 patches per month, it is a necessary
>> basic requirement to me.
>
> OK, that does help explain the otherwise inexplicable approach you have
> been taking. Let's see how you have been doing, based on committer date
> in Linus's tree:
>
> 1 2012-11
> 15 2013-01
> 7 2013-02
> 20 2013-03
> 21 2013-04
> 12 2013-05
> 17 2013-06
> 10 2013-07
>
> The last few months might be understated a bit due to patches
> still being in maintainer trees. This is a nice contrast from my
> first impression of you from https://lkml.org/lkml/2013/6/9/64 and
> https://lkml.org/lkml/2013/8/19/650, neither of which gave me any
> reason to trust your work, to put it mildly. And if I cannot trust
> your work, I obviously cannot accept your patches.
>

Hmm... it is better to judge patches independently of personal feelings
(whether you trust someone or not).

;-)


> You do seem to select for localized bug fixes, which require less work
> than the performance-motivated patches you were putting forward earlier
> in this thread. With a localized bug, you demonstrate the bug, show the
> fix, and that is that. From what I can see, part of the problem with
> your patches in this email thread is that you are trying to move from
> localized bug fixes to performance issues without doing the additional
> work required. Please see below for a rough outline of this additional
> work.
>

Hmm... it seems I need to describe my workflow for fixing bugs in detail.

1. Is it a bug?
if so, I can be marked as Reported-by and continue to step 2.
otherwise, the mail was a waste.

2. Try to fix it in a simple way (to save the maintainers' time).
if the maintainers accept the fix, that is OK (I can be Signed-off-by).
otherwise, continue to step 3.

exception: if I cannot find a simple way to fix it, I will send a [Suggestion] mail.

3. Do the maintainers know how to fix it?
if yes, fix it together with the maintainers (I may be marked only as Reported-by).
otherwise, continue to the last step.

Last: I should analyze it and fix it (it is my duty to fix it).


How do you feel about this workflow? Any suggestions or additions are
welcome.

Thanks.

>> And what my focus is efficiency: let appliers and maintainers together
>> to provide contributes to outside with efficiency.
>
> Sounds great, but there are many possible definitions of "efficiency".
> Given your quota, I would expect your definition to involve number of
> patches accepted. In contrast, my definition for RCU instead involves
> maintainability, robustness, scalability, and, for a few critical
> code paths, performance. I therefore need you to have thought through
> and carefully tested your patch.
>

Hmm... it seems I need to describe in more detail the 'efficiency' that
I am referring to.

If there is no negative effect on quality, we should try to use fewer
resources (e.g. time) to provide more contributions (e.g. fixing
issues).


>> If you already know about it, why need I continue ? but if you don't
>> know either, I should try.
>
> What I need you to do in future RCU performance patch submissions is:
>
> 1. Think through your patch and the code that it is modifying.
> If you submit a patch to me, you should be able to answer the
> sorts of questions that I was asking in this thread.
>
> 2. Tell me which situations your patch helps and which it does not.
>
> 3. Tell me how much your patch improves performance in the
> situations where it helps.
>
> 4. Test the code. If it makes a measurable difference, present
> the performance results. (It would be very surprising if your
> early-loop exit patch made a significant difference, especially
> on a CONFIG_PREEMPT=n kernel.)
>
> 5. Rather than randomly dropping into the code, use actual measurements
> to determine where to focus your performance-improvement efforts.
> Developers, even experienced ones, are really bad at guessing
> where the most important performance problems are.
>
> 6. Use your judgement. For example, a 1000-line patch to improve a
> slowpath by 0.1% simply isn't worth it. A high risk of adding
> bugs for a microscopic benefit? Thanks, but no thanks!!!
>
> For your patch https://lkml.org/lkml/2013/8/19/651, which was closest
> of the three to being useful, here are some things about RCU that you
> should have taken the time to learn -before- submitting the patch:
>
> a. Q: How many iterations for the for_each_rcu_flavor() loop?
> A: On CONFIG_PREEMPT=n kernels, only two iterations.
> On CONFIG_PREEMPT=y kernels, only three iterations.
>
> b. Q: Which flavor of RCU is most likely to have non-lazy callbacks
> queued?
>
> A: On CONFIG_PREEMPT=y kernels, the first one in the list.
> For CONFIG_PREEMPT=n kernels, it is last in the list.
> (In other words, for CONFIG_PREEMPT=n kernels, this change
> won't help at all, at least not without also changing the
> order of the list.)
>
> c. Q: Do any of the other for_each_rcu_flavor() loops care what order
> the flavors are in?
>
> A: No. (In other words, it is OK to reorder the list to improve
> the performance.)
>
> d. Q: What is the performance benefit of this change?
>
> A: Quite small, for example, much less than an atomic operation
> on a shared data item. It is probably not possible to
> measure the performance difference.
>
> e. Q: Is the change on a hotpath?
>
> A: Somewhat. It is not on the read side, but it is on the path
> to and from idle, which can be important for latency-sensitive
> workloads.
>
> f. Q: How did you test this patch?
>
> A: As far as I can see, you did no testing.
>
> If I receive a future patch from you that does not convince me that you
> know the answer to questions like these, I will most likely ignore it.
>

Hmm... that sounds reasonable for some cases.

e.g.

when neither you nor I know how to fix it.

As a patch maker, I should continue trying to fix it.
(what you said above is a valuable reference for me).

As an integrator, you should perform the necessary checks on it.
(what you said above is that necessary check).


If the integrator already knows how to fix it, what you said above does
not seem very efficient.


> Just for practice, let's rework your second patch to make it something
> that I might accept. Here is what you had:
>
> 	for_each_rcu_flavor(rsp) {
> 		rdp = per_cpu_ptr(rsp->rda, cpu);
> -		if (rdp->qlen != rdp->qlen_lazy)
> -			al = false;
> -		if (rdp->nxtlist)
> +		if (rdp->nxtlist) {
> 			hc = true;
> +			if (rdp->qlen != rdp->qlen_lazy) {
> +				al = false;
> +				break;
> +			}
> +		}
> 	}
> 	if (all_lazy)
> 		*all_lazy = al;
>
> We need to do something about the indentation, perhaps as follows:
>
> 	for_each_rcu_flavor(rsp) {
> 		rdp = per_cpu_ptr(rsp->rda, cpu);
> 		if (!rdp->nxtlist)
> 			continue;
> 		hc = true;
> 		if (rdp->qlen != rdp->qlen_lazy) {
> 			al = false;
> 			break;
> 		}
> 	}
> 	if (all_lazy)
> 		*all_lazy = al;
>
>
> We also need to change the following code in rcu_init() in the file
> kernel/rcutree.c:
>
> rcu_init_one(&rcu_sched_state, &rcu_sched_data);
> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
> __rcu_init_preempt();
>
> So that it gets rcu_sched_state in the right place, which I believe is
> like this:
>
> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
> rcu_init_one(&rcu_sched_state, &rcu_sched_data);
> __rcu_init_preempt();
>
>

At least to me, it sounds reasonable. It seems you already know how to
fix it (you never directly said that you know, so I use 'seems').


> If you make these changes, test them with RCU_FAST_NO_HZ both set and
> not set, and verify that rcu_sched_state is first in the flavor list
> for kernels with PREEMPT=n and that rcu_preempt_state is first in flavor
> list for kernels with PREEMPT=y, and send me the resulting patch by end
> of day Friday, China time, I will seriously consider it for acceptance.
> Otherwise, I will author the patch myself with your Reported-by.
>

If you already know how to fix it, please fix it as soon as you have
time (marking me as Reported-by is OK).

If you need additional help from me on this issue, please let me know,
and I will try.


:-)


Thanks.

> Again, good luck!
>
> Thanx, Paul
>
>>> Good luck!
>>>
>>
>> Thanks.
>>
>>> Thanx, Paul
>>>
>>>> -------------------------------diff begin-------------------------------
>>>>
>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>> index dbf74b5..1d02659 100644
>>>> --- a/kernel/rcutree.c
>>>> +++ b/kernel/rcutree.c
>>>> @@ -2728,6 +2728,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>> if (rdp->nxtlist)
>>>> hc = true;
>>>> }
>>>> + BUG_ON(!hc && !al);
>>>> if (all_lazy)
>>>> *all_lazy = al;
>>>> return hc;
>>>>
>>>> -------------------------------diff end---------------------------------
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On 08/20/2013 12:45 PM, Chen Gang wrote:
>>>>> On 08/20/2013 12:43 PM, Chen Gang wrote:
>>>>>> On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
>>>>>>> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> If 'hc' is false, 'al' will never be false, either (only need to check
>>>>>>>> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
>>>>>>>>
>>>>>>>> Recommend to improve the related code, like the diff below.
>>>>>>>
>>>>>>> Are you sure that this represents an improvement? If so, why?
>>>>>>>
>>>>>>
>>>>>> If 'hc' and 'al' really has relationships, better to let 'C code'
>>>>>> express it, that will make the code clearer.
>>>>>>
>>>>>>> Or to put it another way, I see a patch that increases the size of the
>>>>>>> kernel by three lines. What is the corresponding benefit given common
>>>>>>> kernel workloads?
>>>>>>>
>>>>>>
>>>>>> For 'al', need not check for each looping, and for 'hc', may save the
>>>>>> useless looping (so it can make performance better).
>>>>>>
>>>>>> For C code, it really increases 3 lines, but may not for assembly code
>>>>>> (excuse me, I am not check it, I think it is not important, although it
>>>>>> is easy to give a comparing for binary).
>>>>>>
>>>>>
>>>>> Oh, sorry, I mean: only for our case, "it is not important".
>>>>>
>>>>>
>>>>>>> Thanx, Paul
>>>>>>>
>>>>>>>> ----------------------------------diff begin------------------------------------
>>>>>>>>
>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>> index 5b53a89..421caf0 100644
>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>> @@ -2719,10 +2719,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>>>>>
>>>>>>>> for_each_rcu_flavor(rsp) {
>>>>>>>> rdp = per_cpu_ptr(rsp->rda, cpu);
>>>>>>>> - if (rdp->qlen != rdp->qlen_lazy)
>>>>>>>> - al = false;
>>>>>>>> - if (rdp->nxtlist)
>>>>>>>> + if (rdp->nxtlist) {
>>>>>>>> hc = true;
>>>>>>>> + if (rdp->qlen != rdp->qlen_lazy) {
>>>>>>>> + al = false;
>>>>>>>> + break;
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> }
>>>>>>>> if (all_lazy)
>>>>>>>> *all_lazy = al;
>>>>>>>>
>>>>>>>> ----------------------------------diff end--------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/20/2013 11:50 AM, Chen Gang wrote:
>>>>>>>>> According to the comment above rcu_cpu_has_callbacks(): "If there are
>>>>>>>>> no callbacks, all of them are deemed to be lazy".
>>>>>>>>>
>>>>>>>>> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
>>>>>>>>> false.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Signed-off-by: Chen Gang <[email protected]>
>>>>>>>>> ---
>>>>>>>>> kernel/rcutree.c | 2 +-
>>>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>>> index 5b53a89..9ee9565 100644
>>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>>> @@ -2725,7 +2725,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>>>>>> hc = true;
>>>>>>>>> }
>>>>>>>>> if (all_lazy)
>>>>>>>>> - *all_lazy = al;
>>>>>>>>> + *all_lazy = !hc ? true : al;
>>>>>>>>> return hc;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chen Gang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Chen Gang
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Chen Gang
>>
>


--
Chen Gang

2013-09-03 05:42:10

by Chen Gang

[permalink] [raw]
Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

Hello Maintainers:

Is this issue finished?

If you need additional help from me (e.g. some testing or anything else;
if you have no time, you can let me try), please let me know, and I will try.


Thanks.

On 08/26/2013 10:21 AM, Chen Gang F T wrote:
>
> Firstly, thank you for your reply with these details.
>
> On 08/26/2013 03:18 AM, Paul E. McKenney wrote:
>> On Thu, Aug 22, 2013 at 11:01:53AM +0800, Chen Gang wrote:
>>> On 08/21/2013 10:23 PM, Paul E. McKenney wrote:
>>>> On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:
>>
>> [ . . . ]
>>
>>>> Don't get me wrong, I do welcome appropriate patches. In fact, if
>>>> you look at RCU's git history, you will see that I frequently accept
>>>> patches from a fair number of people. And if you were willing to
>>>> invest some time and thought, you might eventually be able to generate
>>>> an appropriate (albeit low priority) patch to this function. However,
>>>> you seem to be motivated to submit small patches with a minimum of
>>>> thought and preparation, perhaps because you need to meet some external
>>>> or self-imposed quota of accepted patches. And if you are in fact driven
>>>> by a quota that prevents you from taking the time required to carefully
>>>> think things through, you are wasting your time with RCU.
>>>
>>> Hmm... at least, some contents you said above is correct to me.
>>>
>>> At least, I should provide 10 patches per month, it is a necessary
>>> basic requirement to me.
>>
>> OK, that does help explain the otherwise inexplicable approach you have
>> been taking. Let's see how you have been doing, based on committer date
>> in Linus's tree:
>>
>> 1 2012-11
>> 15 2013-01
>> 7 2013-02
>> 20 2013-03
>> 21 2013-04
>> 12 2013-05
>> 17 2013-06
>> 10 2013-07
>>
>> The last few months might be understated a bit due to patches
>> still being in maintainer trees. This is a nice contrast from my
>> first impression of you from https://lkml.org/lkml/2013/6/9/64 and
>> https://lkml.org/lkml/2013/8/19/650, neither of which gave me any
>> reason to trust your work, to put it mildly. And if I cannot trust
>> your work, I obviously cannot accept your patches.
>>
>
> Hmm... better to check patches independent personal feelings (trust
> some one, or not).
>
> ;-)
>
>
>> You do seem to select for localized bug fixes, which require less work
>> than the performance-motivated patches you were putting forward earlier
>> in this thread. With a localized bug, you demonstrate the bug, show the
>> fix, and that is that. From what I can see, part of the problem with
>> your patches in this email thread is that you are trying to move from
>> localized bug fixes to performance issues without doing the additional
>> work required. Please see below for a rough outline of this additional
>> work.
>>
>
> Hmm... it seems I need describe my work flow for fixing bugs in details.
>
> 1. Is it a bug ?
> if so, I can be marked as Reported-by and continue to 2nd.
> else, it is a waste mail.
>
> 2. Try to fix it in simple ways (so can save the maintainers time resource).
> if it can be accepted by maintainers, it is OK (I can be Signed-off-by).
> else need continue to 3rd.
>
> exception: if I can not find a simple way to fix it, I will send [Suggestion] mail.
>
> 3. Do the maintainers know how to fix it ?
> if yes, fix it together with maintainers (may mark me only as Reported-by).
> else need continue to Last.
>
> Last: I should analyze it and fix it (it is my duty to fix it).
>
>
> How do you feel about this work flow ? welcome any suggestions or
> completions.
>
> Thanks.
>
>>> And what my focus is efficiency: let appliers and maintainers together
>>> to provide contributes to outside with efficiency.
>>
>> Sounds great, but there are many possible definitions of "efficiency".
>> Given your quota, I would expect your definition to involve number of
>> patches accepted. In contrast, my definition for RCU instead involves
>> maintainability, robustness, scalability, and, for a few critical
>> code paths, performance. I therefore need you to have thought through
>> and carefully tested your patch.
>>
>
> Hmm... it seems I need give more description for the 'efficiency' which
> I point to.
>
> If it is no negative effect with the quality, we need try to use less
> resources (e.g. time resources) to provide more contributions (e.g. fix
> issue).
>
>
>>> If you already know about it, why need I continue ? but if you don't
>>> know either, I should try.
>>
>> What I need you to do in future RCU performance patch submissions is:
>>
>> 1. Think through your patch and the code that it is modifying.
>> If you submit a patch to me, you should be able to answer the
>> sorts of questions that I was asking in this thread.
>>
>> 2. Tell me what situations your patch helps and not.
>>
>> 3. Tell me how much your patch improves performance in the
>> situations where it helps.
>>
>> 4. Test the code. If it makes a measurable difference, present
>> the performance results. (It would be very surprising if your
>> early-loop exit patch made a significant difference, especially
>> on a CONFIG_PREEMPT=n kernel.)
>>
>> 5. Rather than randomly dropping into the code, use actual measurements
>> to determine where to focus your performance-improvement efforts.
>> Developers, even experienced ones, are really bad at guessing
>> where the most important performance problems are.
>>
>> 6. Use your judgement. For example, 1000-line patch to improve a
>> slowpath by 0.1% simply isn't worth it. A high risk of adding
>> bugs for a microscopic benefit? Thanks, but no thanks!!!
>>
>> For your patch https://lkml.org/lkml/2013/8/19/651, which was closest
>> of the three to being useful, here are some things about RCU that you
>> should have taken the time to learn -before- submitting the patch:
>>
>> a. Q: How many iterations for the for_each_rcu_flavor() loop?
>> A: On CONFIG_PREEMPT=n kernels, only two iterations.
>> On CONFIG_PREEMPT=y kernels, only three iterations.
>>
>> b. Q: Which flavor of RCU is most likely to have non-lazy callbacks
>> queued?
>>
>> A: On CONFIG_PREEMPT=y kernels, the first one in the list.
>> For CONFIG_PREEMPT=n kernels, it is last in the list.
>> (In other words, for CONFIG_PREEMPT=n kernels, this change
>> won't help at all, at least not without also changing the
>> order of the list.)
>>
>> c. Q: Do any of the other for_each_rcu_flavor() loops care what order
>> the flavors are in?
>>
>> A: No. (In other words, it is OK to reorder the list to improve
>> the performance.)
>>
>> d. Q: What is the performance benefit of this change?
>>
>> A: Quite small, for example, much less than an atomic operation
>> on a shared data item. It is probably not possible to
>> measure the performance difference.
>>
>> e. Q: Is the change on a hotpath?
>>
>> A: Somewhat. It is not on the read side, but it is on the path
>> to and from idle, which can be important for latency-sensitive
>> workloads.
>>
>> f. Q: How did you test this patch?
>>
>> A: As far as I can see, you did no testing.
>>
>> If I receive a future patch from you that does not convince me that you
>> know the answer to questions like these, I will most likely ignore it.
>>
>
> Hmm... it sounds reasonable for some cases.
>
> e.g.
>
> when neither you nor me know about how to fix it.
>
> As a patch maker, I should continue trying to fix it.
> (what you said above is valuable reference to me).
>
> As an integrator, you should give a necessary check for it.
> (what you said above is the necessary check for it).
>
>
> If the integrator already know about how to fix it, it seems what you
> said above is not quite efficient.
>
>
>> Just for practice, let's rework your second patch to make it something
>> that I might accept. Here is what you had:
>>
>> for_each_rcu_flavor(rsp) {
>> rdp = per_cpu_ptr(rsp->rda, cpu);
>> - if (rdp->qlen != rdp->qlen_lazy)
>> - al = false;
>> - if (rdp->nxtlist)
>> + if (rdp->nxtlist) {
>> hc = true;
>> + if (rdp->qlen != rdp->qlen_lazy) {
>> + al = false;
>> + break;
>> + }
>> + }
>> }
>> if (all_lazy)
>> *all_lazy = al;
>>
>> We need to do something about the indentation, perhaps as follows:
>>
>> for_each_rcu_flavor(rsp) {
>> rdp = per_cpu_ptr(rsp->rda, cpu);
>> if (!rdp->nxtlist)
>> continue;
>> hc = true;
>> if (rdp->qlen != rdp->qlen_lazy) {
>> al = false;
>> break;
>> }
>> }
>> if (all_lazy)
>> *all_lazy = al;
>>
>>
>> We also need to change the following code in rcu_init() in the file
>> kernel/rcutree.c:
>>
>> rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
>> __rcu_init_preempt();
>>
>> So that it gets rcu_sched_state in the right place, which I believe is
>> like this:
>>
>> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
>> rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>> __rcu_init_preempt();
>>
>>
>
> At least for me, it sounds reasonable. It seems you have already know
> about how to fix it (you never directly say you know about it, so I use
> 'seems').
>
>
>> If you make these changes, test them with RCU_FAST_NO_HZ both set and
>> not set, and verify that rcu_sched_state is first in the flavor list
>> for kernels with PREEMPT=n and that rcu_preempt_state is first in flavor
>> list for kernels with PREEMPT=y, and send me the resulting patch by end
>> of day Friday, China time, I will seriously consider it for acceptance.
>> Otherwise, I will author the patch myself with your Reported-by.
>>
>
> If you have already know about how to fix it, please fix it as soon as
> possible when you have time (mark me as Reported-by is OK).
>
> If you need additional help from me for this issue, please let me know,
> I should try.
>
>
> :-)
>
>
> Thanks.
>
>> Again, good luck!
>>
>> Thanx, Paul
>>
>>>> Good luck!
>>>>
>>>
>>> Thanks.
>>>
>>>> Thanx, Paul
>>>>
>>>>> -------------------------------diff begin-------------------------------
>>>>>
>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>> index dbf74b5..1d02659 100644
>>>>> --- a/kernel/rcutree.c
>>>>> +++ b/kernel/rcutree.c
>>>>> @@ -2728,6 +2728,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>> if (rdp->nxtlist)
>>>>> hc = true;
>>>>> }
>>>>> + BUG_ON(!hc && !al);
>>>>> if (all_lazy)
>>>>> *all_lazy = al;
>>>>> return hc;
>>>>>
>>>>> -------------------------------diff end---------------------------------
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On 08/20/2013 12:45 PM, Chen Gang wrote:
>>>>>> On 08/20/2013 12:43 PM, Chen Gang wrote:
>>>>>>> On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
>>>>>>>> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If 'hc' is false, 'al' will never be false, either (only need to check
>>>>>>>>> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
>>>>>>>>>
>>>>>>>>> Recommend to improve the related code, like the diff below.
>>>>>>>>
>>>>>>>> Are you sure that this represents an improvement? If so, why?
>>>>>>>>
>>>>>>>
>>>>>>> If 'hc' and 'al' really has relationships, better to let 'C code'
>>>>>>> express it, that will make the code clearer.
>>>>>>>
>>>>>>>> Or to put it another way, I see a patch that increases the size of the
>>>>>>>> kernel by three lines. What is the corresponding benefit given common
>>>>>>>> kernel workloads?
>>>>>>>>
>>>>>>>
>>>>>>> For 'al', need not check for each looping, and for 'hc', may save the
>>>>>>> useless looping (so it can make performance better).
>>>>>>>
>>>>>>> For C code, it really increases 3 lines, but may not for assembly code
>>>>>>> (excuse me, I am not check it, I think it is not important, although it
>>>>>>> is easy to give a comparing for binary).
>>>>>>>
>>>>>>
>>>>>> Oh, sorry, I mean: only for our case, "it is not important".
>>>>>>
>>>>>>
>>>>>>>> Thanx, Paul
>>>>>>>>
>>>>>>>>> ----------------------------------diff begin------------------------------------
>>>>>>>>>
>>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>>> index 5b53a89..421caf0 100644
>>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>>> @@ -2719,10 +2719,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>>>>>>
>>>>>>>>> for_each_rcu_flavor(rsp) {
>>>>>>>>> rdp = per_cpu_ptr(rsp->rda, cpu);
>>>>>>>>> - if (rdp->qlen != rdp->qlen_lazy)
>>>>>>>>> - al = false;
>>>>>>>>> - if (rdp->nxtlist)
>>>>>>>>> + if (rdp->nxtlist) {
>>>>>>>>> hc = true;
>>>>>>>>> + if (rdp->qlen != rdp->qlen_lazy) {
>>>>>>>>> + al = false;
>>>>>>>>> + break;
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>> }
>>>>>>>>> if (all_lazy)
>>>>>>>>> *all_lazy = al;
>>>>>>>>>
>>>>>>>>> ----------------------------------diff end--------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/20/2013 11:50 AM, Chen Gang wrote:
>>>>>>>>>> According to the comment above rcu_cpu_has_callbacks(): "If there are
>>>>>>>>>> no callbacks, all of them are deemed to be lazy".
>>>>>>>>>>
>>>>>>>>>> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
>>>>>>>>>> false.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Chen Gang <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> kernel/rcutree.c | 2 +-
>>>>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>>>> index 5b53a89..9ee9565 100644
>>>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>>>> @@ -2725,7 +2725,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>>>>>>> hc = true;
>>>>>>>>>> }
>>>>>>>>>> if (all_lazy)
>>>>>>>>>> - *all_lazy = al;
>>>>>>>>>> + *all_lazy = !hc ? true : al;
>>>>>>>>>> return hc;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Chen Gang
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Chen Gang
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Chen Gang
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>
>


--
Chen Gang

2013-09-03 18:08:00

by Paul E. McKenney

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On Tue, Sep 03, 2013 at 01:41:03PM +0800, Chen Gang wrote:
> Hello Maintainers:
>
> Is this issue finished?
>
> If you need additional help from me (e.g. some testing or other things; if
> you have no time, you can let me try), please let me know, and I will try.

Ah, sorry, here is the patch.

Thanx, Paul

------------------------------------------------------------------------

rcu: Micro-optimize rcu_cpu_has_callbacks()

The for_each_rcu_flavor() loop unconditionally scans all flavors, even
when the first flavor might have some non-lazy callbacks. Once the
loop has seen a non-lazy callback, further passes through the loop
cannot change the state. This is not a huge problem, given that there
can be at most three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched),
but this code is on the path to idle, so speeding it up even a small
amount would have some benefit.

This commit therefore does two things:

1.  Rearranges the order of the list of RCU flavors in order to
    place the most active flavor first in the list. The most active
    RCU flavor is RCU-preempt, or, if there is no RCU-preempt,
    RCU-sched.

2.  Reworks the for_each_rcu_flavor() loop to exit early when the first
    non-lazy callback is seen, or, in the case where the caller
    does not care about non-lazy callbacks (RCU_FAST_NO_HZ=n),
    when the first callback is seen.

Reported-by: Chen Gang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index b1b959d..38596be 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2756,10 +2756,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)

 	for_each_rcu_flavor(rsp) {
 		rdp = per_cpu_ptr(rsp->rda, cpu);
-		if (rdp->qlen != rdp->qlen_lazy)
+		if (!rdp->nxtlist)
+			continue;
+		hc = true;
+		if (rdp->qlen != rdp->qlen_lazy || !all_lazy) {
 			al = false;
-		if (rdp->nxtlist)
-			hc = true;
+			break;
+		}
 	}
 	if (all_lazy)
 		*all_lazy = al;
@@ -3326,8 +3329,8 @@ void __init rcu_init(void)

 	rcu_bootup_announce();
 	rcu_init_geometry();
-	rcu_init_one(&rcu_sched_state, &rcu_sched_data);
 	rcu_init_one(&rcu_bh_state, &rcu_bh_data);
+	rcu_init_one(&rcu_sched_state, &rcu_sched_data);
 	__rcu_init_preempt();
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);

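As a rough illustration (not part of the posted patch), the effect of both
hunks can be tried in user space with a small stand-alone model. Everything
named mock_* below is hypothetical, and the head insertion in mock_init_one()
is an assumption about how rcu_init_one() builds the flavor list, which would
explain why the flavor initialized last is scanned first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-CPU rcu_data. */
struct mock_rdp {
	void *nxtlist;		/* non-NULL when callbacks are queued */
	long qlen;		/* total queued callbacks */
	long qlen_lazy;		/* queued callbacks that are lazy */
};

/* Hypothetical stand-in for rcu_state; models a single CPU only. */
struct mock_flavor {
	const char *name;
	struct mock_rdp rdp;
	struct mock_flavor *next;
};

static struct mock_flavor *flavors;	/* head of the flavor list */

/* Assumption: like rcu_init_one(), each new flavor goes to the head. */
static void mock_init_one(struct mock_flavor *f)
{
	f->next = flavors;
	flavors = f;
}

/* Same control flow as the reworked loop in the patch above. */
static int mock_cpu_has_callbacks(bool *all_lazy)
{
	bool al = true;
	bool hc = false;
	struct mock_flavor *f;

	for (f = flavors; f; f = f->next) {
		struct mock_rdp *rdp = &f->rdp;

		if (!rdp->nxtlist)
			continue;	/* empty flavor: nothing to learn */
		hc = true;
		if (rdp->qlen != rdp->qlen_lazy || !all_lazy) {
			al = false;
			break;		/* later flavors cannot change the result */
		}
	}
	if (all_lazy)
		*all_lazy = al;
	return hc;
}

/* Builds the no-RCU-preempt order from the patch (bh first, then sched). */
void mock_demo(void)
{
	struct mock_flavor bh = { .name = "rcu_bh" };
	struct mock_flavor sched = { .name = "rcu_sched" };
	int cb = 0;	/* any non-NULL address stands for a queued callback */
	bool lazy = false;

	flavors = NULL;
	mock_init_one(&bh);
	mock_init_one(&sched);
	assert(flavors == &sched);	/* last initialized is scanned first */

	/* No callbacks at all: reported as "all lazy". */
	assert(!mock_cpu_has_callbacks(&lazy) && lazy);

	/* One non-lazy callback on the first flavor ends the scan at once. */
	sched.rdp = (struct mock_rdp){ .nxtlist = &cb, .qlen = 2, .qlen_lazy = 1 };
	assert(mock_cpu_has_callbacks(&lazy) && !lazy);

	/* Caller that does not care about laziness passes NULL. */
	assert(mock_cpu_has_callbacks(NULL));

	flavors = NULL;	/* do not leave pointers to locals behind */
}
```

Calling mock_demo() from a main() should run silently. Because
mock_init_one() adds at the head, initializing rcu_bh before rcu_sched leaves
rcu_sched first in the list, which is the order the patch aims for when there
is no RCU-preempt.
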
2013-09-03 19:36:47

by Paul E. McKenney

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On Mon, Aug 26, 2013 at 10:21:18AM +0800, Chen Gang F T wrote:
>
> Firstly, thank you for your reply with these details.
>
> On 08/26/2013 03:18 AM, Paul E. McKenney wrote:
> > On Thu, Aug 22, 2013 at 11:01:53AM +0800, Chen Gang wrote:
> >> On 08/21/2013 10:23 PM, Paul E. McKenney wrote:
> >>> On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:
> >
> > [ . . . ]
> >
> >>> Don't get me wrong, I do welcome appropriate patches. In fact, if
> >>> you look at RCU's git history, you will see that I frequently accept
> >>> patches from a fair number of people. And if you were willing to
> >>> invest some time and thought, you might eventually be able to generate
> >>> an appropriate (albeit low priority) patch to this function. However,
> >>> you seem to be motivated to submit small patches with a minimum of
> >>> thought and preparation, perhaps because you need to meet some external
> >>> or self-imposed quota of accepted patches. And if you are in fact driven
> >>> by a quota that prevents you from taking the time required to carefully
> >>> think things through, you are wasting your time with RCU.
> >>
> >> Hmm... at least, some of what you said above is correct about me.
> >>
> >> At least, I should provide 10 patches per month; it is a necessary
> >> basic requirement for me.
> >
> > OK, that does help explain the otherwise inexplicable approach you have
> > been taking. Let's see how you have been doing, based on committer date
> > in Linus's tree:
> >
> > 1 2012-11
> > 15 2013-01
> > 7 2013-02
> > 20 2013-03
> > 21 2013-04
> > 12 2013-05
> > 17 2013-06
> > 10 2013-07
> >
> > The last few months might be understated a bit due to patches
> > still being in maintainer trees. This is a nice contrast from my
> > first impression of you from https://lkml.org/lkml/2013/6/9/64 and
> > https://lkml.org/lkml/2013/8/19/650, neither of which gave me any
> > reason to trust your work, to put it mildly. And if I cannot trust
> > your work, I obviously cannot accept your patches.
> >
>
> Hmm... better to check patches independently of personal feelings (trusting
> someone, or not).
>
> ;-)

Believe me, I judged based on your first two patches! Those were my
first impression of you.

> > You do seem to select for localized bug fixes, which require less work
> > than the performance-motivated patches you were putting forward earlier
> > in this thread. With a localized bug, you demonstrate the bug, show the
> > fix, and that is that. From what I can see, part of the problem with
> > your patches in this email thread is that you are trying to move from
> > localized bug fixes to performance issues without doing the additional
> > work required. Please see below for a rough outline of this additional
> > work.
> >
>
> Hmm... it seems I need to describe my work flow for fixing bugs in detail.
>
> 1. Is it a bug?
>    If so, I can be marked as Reported-by and continue to the 2nd step.
>    Else, it is a wasted mail.
>
> 2. Try to fix it in a simple way (to save the maintainers' time).
>    If the fix is accepted by the maintainers, it is OK (I can be Signed-off-by).
>    Else, continue to the 3rd step.
>
>    Exception: if I cannot find a simple way to fix it, I will send a [Suggestion] mail.
>
> 3. Do the maintainers know how to fix it?
>    If yes, fix it together with the maintainers (I may be marked only as Reported-by).
>    Else, continue to Last.
>
> Last: I should analyze it and fix it (it is my duty to fix it).
>
>
> How do you feel about this work flow? Any suggestions or additions are
> welcome.

I am surprised that there are no testing or validation steps.

Especially if you ever want to progress to more complex fixes, your life
will be easier if you do some testing where feasible. As might your
maintainers' lives: Any bug your testing catches is one buggy patch that
the maintainers do not need to look at.

In addition, as noted earlier in this thread, validation is important
for performance improvements.

Thanx, Paul
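
One way to act on the validation point, sketched here under assumptions that
are not from the thread: compare the old and the reworked loop exhaustively in
user space. The sketch assumes that a flavor with an empty callback list also
has qlen == qlen_lazy; without that invariant the two loops can disagree on
*all_lazy. The names mock_rdp, set_state(), count_mismatches() and
validate_loops() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NFLAVORS 3	/* at most three flavors, per the commit log */
#define NSTATES  27	/* 3 states per flavor, 3 flavors */

struct mock_rdp {
	void *nxtlist;
	long qlen;
	long qlen_lazy;
};

/* Loop as it was before the patch: every flavor is always scanned. */
static int has_callbacks_old(const struct mock_rdp *r, bool *all_lazy)
{
	bool al = true, hc = false;
	int i;

	for (i = 0; i < NFLAVORS; i++) {
		if (r[i].qlen != r[i].qlen_lazy)
			al = false;
		if (r[i].nxtlist)
			hc = true;
	}
	if (all_lazy)
		*all_lazy = al;
	return hc;
}

/* Loop as reworked by the patch: skip empty flavors, exit early. */
static int has_callbacks_new(const struct mock_rdp *r, bool *all_lazy)
{
	bool al = true, hc = false;
	int i;

	for (i = 0; i < NFLAVORS; i++) {
		if (!r[i].nxtlist)
			continue;
		hc = true;
		if (r[i].qlen != r[i].qlen_lazy || !all_lazy) {
			al = false;
			break;
		}
	}
	if (all_lazy)
		*all_lazy = al;
	return hc;
}

/* Fills one flavor: 0 = empty, 1 = only lazy callbacks, 2 = one non-lazy. */
static void set_state(struct mock_rdp *r, int state)
{
	static int cb;	/* any non-NULL address stands for a queued callback */

	r->nxtlist = state ? &cb : NULL;
	r->qlen = state ? 2 : 0;
	r->qlen_lazy = (state == 2) ? 1 : r->qlen;
}

/* Counts the state combinations on which the two loops disagree. */
static int count_mismatches(void)
{
	struct mock_rdp r[NFLAVORS];
	int n, i, bad = 0;

	for (n = 0; n < NSTATES; n++) {
		bool al_old = false, al_new = false;
		int code = n;

		for (i = 0; i < NFLAVORS; i++, code /= 3)
			set_state(&r[i], code % 3);
		if (has_callbacks_old(r, &al_old) != has_callbacks_new(r, &al_new) ||
		    al_old != al_new)
			bad++;
		/* Callers that pass NULL only look at the return value. */
		if (has_callbacks_old(r, NULL) != has_callbacks_new(r, NULL))
			bad++;
	}
	return bad;
}

void validate_loops(void)
{
	assert(count_mismatches() == 0);
}
```

This only checks functional equivalence under the stated invariant. It says
nothing about performance, which would still have to be measured on a real
kernel.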

> Thanks.
>
> >> And what my focus is efficiency: let appliers and maintainers together
> >> to provide contributes to outside with efficiency.
> >
> > Sounds great, but there are many possible definitions of "efficiency".
> > Given your quota, I would expect your definition to involve number of
> > patches accepted. In contrast, my definition for RCU instead involves
> > maintainability, robustness, scalability, and, for a few critical
> > code paths, performance. I therefore need you to have thought through
> > and carefully tested your patch.
> >
>
> Hmm... it seems I need to describe in more detail the 'efficiency' that
> I am referring to.
>
> If there is no negative effect on quality, we should try to use fewer
> resources (e.g. time) to provide more contributions (e.g. fixing
> issues).
>
>
> >> If you already know about it, why do I need to continue? But if you do not
> >> know either, I should try.
> >
> > What I need you to do in future RCU performance patch submissions is:
> >
> > 1. Think through your patch and the code that it is modifying.
> > If you submit a patch to me, you should be able to answer the
> > sorts of questions that I was asking in this thread.
> >
> > 2. Tell me in which situations your patch helps and in which it does not.
> >
> > 3. Tell me how much your patch improves performance in the
> > situations where it helps.
> >
> > 4. Test the code. If it makes a measurable difference, present
> > the performance results. (It would be very surprising if your
> > early-loop exit patch made a significant difference, especially
> > on a CONFIG_PREEMPT=n kernel.)
> >
> > 5. Rather than randomly dropping into the code, use actual measurements
> > to determine where to focus your performance-improvement efforts.
> > Developers, even experienced ones, are really bad at guessing
> > where the most important performance problems are.
> >
> > 6. Use your judgement. For example, a 1000-line patch to improve a
> > slowpath by 0.1% simply isn't worth it. A high risk of adding
> > bugs for a microscopic benefit? Thanks, but no thanks!!!
> >
> > For your patch https://lkml.org/lkml/2013/8/19/651, which was closest
> > of the three to being useful, here are some things about RCU that you
> > should have taken the time to learn -before- submitting the patch:
> >
> > a. Q: How many iterations for the for_each_rcu_flavor() loop?
> > A: On CONFIG_PREEMPT=n kernels, only two iterations.
> > On CONFIG_PREEMPT=y kernels, only three iterations.
> >
> > b. Q: Which flavor of RCU is most likely to have non-lazy callbacks
> > queued?
> >
> > A: On CONFIG_PREEMPT=y kernels, the first one in the list.
> > For CONFIG_PREEMPT=n kernels, it is last in the list.
> > (In other words, for CONFIG_PREEMPT=n kernels, this change
> > won't help at all, at least not without also changing the
> > order of the list.)
> >
> > c. Q: Do any of the other for_each_rcu_flavor() loops care what order
> > the flavors are in?
> >
> > A: No. (In other words, it is OK to reorder the list to improve
> > the performance.)
> >
> > d. Q: What is the performance benefit of this change?
> >
> > A: Quite small, for example, much less than an atomic operation
> > on a shared data item. It is probably not possible to
> > measure the performance difference.
> >
> > e. Q: Is the change on a hotpath?
> >
> > A: Somewhat. It is not on the read side, but it is on the path
> > to and from idle, which can be important for latency-sensitive
> > workloads.
> >
> > f. Q: How did you test this patch?
> >
> > A: As far as I can see, you did no testing.
> >
> > If I receive a future patch from you that does not convince me that you
> > know the answer to questions like these, I will most likely ignore it.
> >
>
> Hmm... it sounds reasonable for some cases.
>
> e.g.
>
> when neither you nor I know how to fix it.
>
> As a patch maker, I should continue trying to fix it.
> (what you said above is a valuable reference for me).
>
> As an integrator, you should give it the necessary checks.
> (what you said above is the necessary check for it).
>
>
> If the integrator already knows how to fix it, it seems what you
> said above is not quite efficient.
>
>
> > Just for practice, let's rework your second patch to make it something
> > that I might accept. Here is what you had:
> >
> > for_each_rcu_flavor(rsp) {
> > rdp = per_cpu_ptr(rsp->rda, cpu);
> > - if (rdp->qlen != rdp->qlen_lazy)
> > - al = false;
> > - if (rdp->nxtlist)
> > + if (rdp->nxtlist) {
> > hc = true;
> > + if (rdp->qlen != rdp->qlen_lazy) {
> > + al = false;
> > + break;
> > + }
> > + }
> > }
> > if (all_lazy)
> > *all_lazy = al;
> >
> > We need to do something about the indentation, perhaps as follows:
> >
> > for_each_rcu_flavor(rsp) {
> > rdp = per_cpu_ptr(rsp->rda, cpu);
> > if (!rdp->nxtlist)
> > continue;
> > hc = true;
> > if (rdp->qlen != rdp->qlen_lazy) {
> > al = false;
> > break;
> > }
> > }
> > if (all_lazy)
> > *all_lazy = al;
> >
> >
> > We also need to change the following code in rcu_init() in the file
> > kernel/rcutree.c:
> >
> > rcu_init_one(&rcu_sched_state, &rcu_sched_data);
> > rcu_init_one(&rcu_bh_state, &rcu_bh_data);
> > __rcu_init_preempt();
> >
> > So that it gets rcu_sched_state in the right place, which I believe is
> > like this:
> >
> > rcu_init_one(&rcu_bh_state, &rcu_bh_data);
> > rcu_init_one(&rcu_sched_state, &rcu_sched_data);
> > __rcu_init_preempt();
> >
> >
>
> At least for me, it sounds reasonable. It seems you already know
> how to fix it (you never directly said that you know, so I use
> 'seems').
>
>
> > If you make these changes, test them with RCU_FAST_NO_HZ both set and
> > not set, and verify that rcu_sched_state is first in the flavor list
> > for kernels with PREEMPT=n and that rcu_preempt_state is first in flavor
> > list for kernels with PREEMPT=y, and send me the resulting patch by end
> > of day Friday, China time, I will seriously consider it for acceptance.
> > Otherwise, I will author the patch myself with your Reported-by.
> >
>
> If you already know how to fix it, please fix it as soon as
> possible when you have time (marking me as Reported-by is OK).
>
> If you need additional help from me for this issue, please let me know,
> and I will try.
>
>
> :-)
>
>
> Thanks.
>
> > Again, good luck!
> >
> > Thanx, Paul
> >
> >>> Good luck!
> >>>
> >>
> >> Thanks.
> >>
> >>> Thanx, Paul
> >>>
> >>>> -------------------------------diff begin-------------------------------
> >>>>
> >>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >>>> index dbf74b5..1d02659 100644
> >>>> --- a/kernel/rcutree.c
> >>>> +++ b/kernel/rcutree.c
> >>>> @@ -2728,6 +2728,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
> >>>> if (rdp->nxtlist)
> >>>> hc = true;
> >>>> }
> >>>> + BUG_ON(!hc && !al);
> >>>> if (all_lazy)
> >>>> *all_lazy = al;
> >>>> return hc;
> >>>>
> >>>> -------------------------------diff end---------------------------------
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>>> On 08/20/2013 12:45 PM, Chen Gang wrote:
> >>>>> On 08/20/2013 12:43 PM, Chen Gang wrote:
> >>>>>> On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
> >>>>>>> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> If 'hc' is false, 'al' will never be false either (we only need to check
> >>>>>>>> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
> >>>>>>>>
> >>>>>>>> Recommend to improve the related code, like the diff below.
> >>>>>>>
> >>>>>>> Are you sure that this represents an improvement? If so, why?
> >>>>>>>
> >>>>>>
> >>>>>> If 'hc' and 'al' really has relationships, better to let 'C code'
> >>>>>> express it, that will make the code clearer.
> >>>>>>
> >>>>>>> Or to put it another way, I see a patch that increases the size of the
> >>>>>>> kernel by three lines. What is the corresponding benefit given common
> >>>>>>> kernel workloads?
> >>>>>>>
> >>>>>>
> >>>>>> For 'al', need not check for each looping, and for 'hc', may save the
> >>>>>> useless looping (so it can make performance better).
> >>>>>>
> >>>>>> For C code, it really increases 3 lines, but may not for assembly code
> >>>>>> (excuse me, I am not check it, I think it is not important, although it
> >>>>>> is easy to give a comparing for binary).
> >>>>>>
> >>>>>
> >>>>> Oh, sorry, I mean: only for our case, "it is not important".
> >>>>>
> >>>>>
> >>>>>>> Thanx, Paul
> >>>>>>>
> >>>>>>>> ----------------------------------diff begin------------------------------------
> >>>>>>>>
> >>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >>>>>>>> index 5b53a89..421caf0 100644
> >>>>>>>> --- a/kernel/rcutree.c
> >>>>>>>> +++ b/kernel/rcutree.c
> >>>>>>>> @@ -2719,10 +2719,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
> >>>>>>>>
> >>>>>>>> for_each_rcu_flavor(rsp) {
> >>>>>>>> rdp = per_cpu_ptr(rsp->rda, cpu);
> >>>>>>>> - if (rdp->qlen != rdp->qlen_lazy)
> >>>>>>>> - al = false;
> >>>>>>>> - if (rdp->nxtlist)
> >>>>>>>> + if (rdp->nxtlist) {
> >>>>>>>> hc = true;
> >>>>>>>> + if (rdp->qlen != rdp->qlen_lazy) {
> >>>>>>>> + al = false;
> >>>>>>>> + break;
> >>>>>>>> + }
> >>>>>>>> + }
> >>>>>>>> }
> >>>>>>>> if (all_lazy)
> >>>>>>>> *all_lazy = al;
> >>>>>>>>
> >>>>>>>> ----------------------------------diff end--------------------------------------
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 08/20/2013 11:50 AM, Chen Gang wrote:
> >>>>>>>>> According to the comment above rcu_cpu_has_callbacks(): "If there are
> >>>>>>>>> no callbacks, all of them are deemed to be lazy".
> >>>>>>>>>
> >>>>>>>>> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
> >>>>>>>>> false.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Chen Gang <[email protected]>
> >>>>>>>>> ---
> >>>>>>>>> kernel/rcutree.c | 2 +-
> >>>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >>>>>>>>> index 5b53a89..9ee9565 100644
> >>>>>>>>> --- a/kernel/rcutree.c
> >>>>>>>>> +++ b/kernel/rcutree.c
> >>>>>>>>> @@ -2725,7 +2725,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
> >>>>>>>>> hc = true;
> >>>>>>>>> }
> >>>>>>>>> if (all_lazy)
> >>>>>>>>> - *all_lazy = al;
> >>>>>>>>> + *all_lazy = !hc ? true : al;
> >>>>>>>>> return hc;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Chen Gang
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Chen Gang
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Chen Gang
> >>
> >
> >
>
>
> --
> Chen Gang
>

2013-09-04 01:58:28

by Chen Gang

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On 09/04/2013 01:59 AM, Paul E. McKenney wrote:
> On Tue, Sep 03, 2013 at 01:41:03PM +0800, Chen Gang wrote:
>> Hello Maintainers:
>>
>> Is this issue finished ?
>>
>> If need additional help from me (e.g. some test things, or others, if
>> you have no time, can let me try), please let me know, I should try.
>
> Ah, sorry, here is the patch.
>

Thanks.

:-)

> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Micro-optimize rcu_cpu_has_callbacks()
>
> The for_each_rcu_flavor() loop unconditionally scans all flavors, even
> when the first flavor might have some non-lazy callbacks. Once the
> loop has seen a non-lazy callback, further passes through the loop
> cannot change the state. This is not a huge problem, given that there
> can be at most three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched),
> but this code is on the path to idle, so speeding it up even a small
> amount would have some benefit.
>
> This commit therefore does two things:
>
> 1. Rearranges the order of the list of RCU flavors in order to
> place the most active flavor first in the list. The most active
> RCU flavor is RCU-preempt, or, if there is no RCU-preempt,
> RCU-sched.
>
> 2. Reworks the for_each_rcu_flavor() to exit early when the first
> non-lazy callback is seen, or, in the case where the caller
> does not care about non-lazy callbacks (RCU_FAST_NO_HZ=n),
> when the first callback is seen.
>
> Reported-by: Chen Gang <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index b1b959d..38596be 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -2756,10 +2756,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>
> for_each_rcu_flavor(rsp) {
> rdp = per_cpu_ptr(rsp->rda, cpu);
> - if (rdp->qlen != rdp->qlen_lazy)
> + if (!rdp->nxtlist)
> + continue;
> + hc = true;
> + if (rdp->qlen != rdp->qlen_lazy || !all_lazy) {
> al = false;
> - if (rdp->nxtlist)
> - hc = true;
> + break;
> + }
> }
> if (all_lazy)
> *all_lazy = al;
> @@ -3326,8 +3329,8 @@ void __init rcu_init(void)
>
> rcu_bootup_announce();
> rcu_init_geometry();
> - rcu_init_one(&rcu_sched_state, &rcu_sched_data);
> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
> + rcu_init_one(&rcu_sched_state, &rcu_sched_data);
> __rcu_init_preempt();
> open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
>
>
>
>


--
Chen Gang

2013-09-04 02:42:18

by Chen Gang F T

Subject: Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

On 09/04/2013 03:36 AM, Paul E. McKenney wrote:
> On Mon, Aug 26, 2013 at 10:21:18AM +0800, Chen Gang F T wrote:
>>
>> Firstly, thank you for your reply with these details.
>>
>> On 08/26/2013 03:18 AM, Paul E. McKenney wrote:
>>> On Thu, Aug 22, 2013 at 11:01:53AM +0800, Chen Gang wrote:
>>>> On 08/21/2013 10:23 PM, Paul E. McKenney wrote:
>>>>> On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:
>>>
>>> [ . . . ]
>>>
>>>>> Don't get me wrong, I do welcome appropriate patches. In fact, if
>>>>> you look at RCU's git history, you will see that I frequently accept
>>>>> patches from a fair number of people. And if you were willing to
>>>>> invest some time and thought, you might eventually be able to generate
>>>>> an appropriate (albeit low priority) patch to this function. However,
>>>>> you seem to be motivated to submit small patches with a minimum of
>>>>> thought and preparation, perhaps because you need to meet some external
>>>>> or self-imposed quota of accepted patches. And if you are in fact driven
>>>>> by a quota that prevents you from taking the time required to carefully
>>>>> think things through, you are wasting your time with RCU.
>>>>
>>>> Hmm... at least, some contents you said above is correct to me.
>>>>
>>>> At least, I should provide 10 patches per month, it is a necessary
>>>> basic requirement to me.
>>>
>>> OK, that does help explain the otherwise inexplicable approach you have
>>> been taking. Let's see how you have been doing, based on committer date
>>> in Linus's tree:
>>>
>>> 1 2012-11
>>> 15 2013-01
>>> 7 2013-02
>>> 20 2013-03
>>> 21 2013-04
>>> 12 2013-05
>>> 17 2013-06
>>> 10 2013-07
>>>
>>> The last few months might be understated a bit due to patches
>>> still being in maintainer trees. This is a nice contrast from my
>>> first impression of you from https://lkml.org/lkml/2013/6/9/64 and
>>> https://lkml.org/lkml/2013/8/19/650, neither of which gave me any
>>> reason to trust your work, to put it mildly. And if I cannot trust
>>> your work, I obviously cannot accept your patches.
>>>
>>
>> Hmm... better to check patches independent personal feelings (trust
>> some one, or not).
>>
>> ;-)
>
> Believe me, I judged based on your first two patches! Those were my
> first impression of you.
>

OK, I can understand.


>>> You do seem to select for localized bug fixes, which require less work
>>> than the performance-motivated patches you were putting forward earlier
>>> in this thread. With a localized bug, you demonstrate the bug, show the
>>> fix, and that is that. From what I can see, part of the problem with
>>> your patches in this email thread is that you are trying to move from
>>> localized bug fixes to performance issues without doing the additional
>>> work required. Please see below for a rough outline of this additional
>>> work.
>>>
>>
>> Hmm... it seems I need describe my work flow for fixing bugs in details.
>>
>> 1. Is it a bug ?
>> if so, I can be marked as Reported-by and continue to 2nd.
>> else, it is a waste mail.
>>
>> 2. Try to fix it in simple ways (so can save the maintainers time resource).
>> if it can be accepted by maintainers, it is OK (I can be Signed-off-by).
>> else need continue to 3rd.
>>
>> exception: if I can not find a simple way to fix it, I will send [Suggestion] mail.
>>
>> 3. Do the maintainers know how to fix it ?
>> if yes, fix it together with maintainers (may mark me only as Reported-by).
>> else need continue to Last.
>>
>> Last: I should analyze it and fix it (it is my duty to fix it).
>>
>>
>> How do you feel about this work flow ? welcome any suggestions or
>> completions.
>
> I am surprised that there are no testing or validation steps.
>

Hmm... "fix it together" in the 3rd step may include testing.

And for the 'Last' step (I should analyze it and fix it, if the maintainers
do not know how either, or if there is no maintainer), your original
information is a very good reference.

Our mailing list is also a development mailing list (not only a
result-reporting or integration mailing list), so I can develop and test
together with other maintainers on the list (not only by myself).


Thanks.

> Especially if you ever want to progress to more complex fixes, your life
> will be easier if you do some testing where feasible. As might your
> maintainers' lives: Any bug your testing catches is one buggy patch that
> the maintainers do not need to look at.
>

Yeah, that is the reason why I have planned to do something for LTP (Linux
Test Project) in the 2nd half of 2013 (some of my earlier mails mentioned
it).

It will make "life easier", especially for finding and solving more issues.

My internal work within my company can also be improved by becoming more familiar with LTP.

The precise starting point for LTP: I plan to start in the 4th quarter of 2013 (2013-10-01).


"LTP + GCC + reading code" will be my main ways of finding and
solving kernel issues.


Thanks.

> In addition, as noted earlier in this thread, validation is important
> for performance improvements.
>

Yeah, it is necessary.

> Thanx, Paul
>
>> Thanks.
>>
>>>> And what my focus is efficiency: let appliers and maintainers together
>>>> to provide contributes to outside with efficiency.
>>>
>>> Sounds great, but there are many possible definitions of "efficiency".
>>> Given your quota, I would expect your definition to involve number of
>>> patches accepted. In contrast, my definition for RCU instead involves
>>> maintainability, robustness, scalability, and, for a few critical
>>> code paths, performance. I therefore need you to have thought through
>>> and carefully tested your patch.
>>>
>>
>> Hmm... it seems I need give more description for the 'efficiency' which
>> I point to.
>>
>> If it is no negative effect with the quality, we need try to use less
>> resources (e.g. time resources) to provide more contributions (e.g. fix
>> issue).
>>
>>
>>>> If you already know about it, why need I continue ? but if you don't
>>>> know either, I should try.
>>>
>>> What I need you to do in future RCU performance patch submissions is:
>>>
>>> 1. Think through your patch and the code that it is modifying.
>>> If you submit a patch to me, you should be able to answer the
>>> sorts of questions that I was asking in this thread.
>>>
>>> 2. Tell me what situations your patch helps and not.
>>>
>>> 3. Tell me how much your patch improves performance in the
>>> situations where it helps.
>>>
>>> 4. Test the code. If it makes a measurable difference, present
>>> the performance results. (It would be very surprising if your
>>> early-loop exit patch made a significant difference, especially
>>> on a CONFIG_PREEMPT=n kernel.)
>>>
>>> 5. Rather than randomly dropping into the code, use actual measurements
>>> to determine where to focus your performance-improvement efforts.
>>> Developers, even experienced ones, are really bad at guessing
>>> where the most important performance problems are.
>>>
>>> 6. Use your judgement. For example, 1000-line patch to improve a
>>> slowpath by 0.1% simply isn't worth it. A high risk of adding
>>> bugs for a microscopic benefit? Thanks, but no thanks!!!
>>>
>>> For your patch https://lkml.org/lkml/2013/8/19/651, which was closest
>>> of the three to being useful, here are some things about RCU that you
>>> should have taken the time to learn -before- submitting the patch:
>>>
>>> a. Q: How many iterations for the for_each_rcu_flavor() loop?
>>> A: On CONFIG_PREEMPT=n kernels, only two iterations.
>>> On CONFIG_PREEMPT=y kernels, only three iterations.
>>>
>>> b. Q: Which flavor of RCU is most likely to have non-lazy callbacks
>>> queued?
>>>
>>> A: On CONFIG_PREEMPT=y kernels, the first one in the list.
>>> For CONFIG_PREEMPT=n kernels, it is last in the list.
>>> (In other words, for CONFIG_PREEMPT=n kernels, this change
>>> won't help at all, at least not without also changing the
>>> order of the list.)
>>>
>>> c. Q: Do any of the other for_each_rcu_flavor() loops care what order
>>> the flavors are in?
>>>
>>> A: No. (In other words, it is OK to reorder the list to improve
>>> the performance.)
>>>
>>> d. Q: What is the performance benefit of this change?
>>>
>>> A: Quite small, for example, much less than an atomic operation
>>> on a shared data item. It is probably not possible to
>>> measure the performance difference.
>>>
>>> e. Q: Is the change on a hotpath?
>>>
>>> A: Somewhat. It is not on the read side, but it is on the path
>>> to and from idle, which can be important for latency-sensitive
>>> workloads.
>>>
>>> f. Q: How did you test this patch?
>>>
>>> A: As far as I can see, you did no testing.
>>>
>>> If I receive a future patch from you that does not convince me that you
>>> know the answer to questions like these, I will most likely ignore it.
>>>
>>
>> Hmm... that sounds reasonable in some cases.
>>
>> For example, when neither you nor I know how to fix it:
>>
>> As the patch author, I should keep trying to fix it
>> (what you said above is a valuable reference for me).
>>
>> As the integrator, you should give it the necessary checks
>> (what you said above is that necessary check).
>>
>>
>> If the integrator already knows how to fix it, then what you said
>> above does not seem very efficient.
>>
>>
>>> Just for practice, let's rework your second patch to make it something
>>> that I might accept. Here is what you had:
>>>
>>>  	for_each_rcu_flavor(rsp) {
>>>  		rdp = per_cpu_ptr(rsp->rda, cpu);
>>> -		if (rdp->qlen != rdp->qlen_lazy)
>>> -			al = false;
>>> -		if (rdp->nxtlist)
>>> +		if (rdp->nxtlist) {
>>>  			hc = true;
>>> +			if (rdp->qlen != rdp->qlen_lazy) {
>>> +				al = false;
>>> +				break;
>>> +			}
>>> +		}
>>>  	}
>>>  	if (all_lazy)
>>>  		*all_lazy = al;
>>>
>>> We need to do something about the indentation, perhaps as follows:
>>>
>>> 	for_each_rcu_flavor(rsp) {
>>> 		rdp = per_cpu_ptr(rsp->rda, cpu);
>>> 		if (!rdp->nxtlist)
>>> 			continue;
>>> 		hc = true;
>>> 		if (rdp->qlen != rdp->qlen_lazy) {
>>> 			al = false;
>>> 			break;
>>> 		}
>>> 	}
>>> 	if (all_lazy)
>>> 		*all_lazy = al;
>>>
>>>
>>> We also need to change the following code in rcu_init() in the file
>>> kernel/rcutree.c:
>>>
>>> rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>>> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
>>> __rcu_init_preempt();
>>>
>>> So that it gets rcu_sched_state in the right place, which I believe is
>>> like this:
>>>
>>> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
>>> rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>>> __rcu_init_preempt();
>>>
>>>
>>
>> At least to me, that sounds reasonable. It seems you already know
>> how to fix it (you never directly said that you know, so I say
>> 'seems').
>>
>>
>>> If you make these changes, test them with RCU_FAST_NO_HZ both set and
>>> not set, and verify that rcu_sched_state is first in the flavor list
>>> for kernels with PREEMPT=n and that rcu_preempt_state is first in the flavor
>>> list for kernels with PREEMPT=y, and send me the resulting patch by end
>>> of day Friday, China time, I will seriously consider it for acceptance.
>>> Otherwise, I will author the patch myself with your Reported-by.
>>>
>>
>> If you already know how to fix it, please fix it as soon as
>> possible when you have time (marking me as Reported-by is OK).
>>
>> If you need additional help from me on this issue, please let me know;
>> I will try.
>>
>>
>> :-)
>>
>>
>> Thanks.
>>
>>> Again, good luck!
>>>
>>> Thanx, Paul
>>>
>>>>> Good luck!
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>>> Thanx, Paul
>>>>>
>>>>>> -------------------------------diff begin-------------------------------
>>>>>>
>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>> index dbf74b5..1d02659 100644
>>>>>> --- a/kernel/rcutree.c
>>>>>> +++ b/kernel/rcutree.c
>>>>>> @@ -2728,6 +2728,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>>>  		if (rdp->nxtlist)
>>>>>>  			hc = true;
>>>>>>  	}
>>>>>> +	BUG_ON(!hc && !al);
>>>>>>  	if (all_lazy)
>>>>>>  		*all_lazy = al;
>>>>>>  	return hc;
>>>>>>
>>>>>> -------------------------------diff end---------------------------------
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> On 08/20/2013 12:45 PM, Chen Gang wrote:
>>>>>>> On 08/20/2013 12:43 PM, Chen Gang wrote:
>>>>>>>> On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
>>>>>>>>> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If 'hc' is false, 'al' will never be false either (we only need to check
>>>>>>>>>> 'rdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' exists).
>>>>>>>>>>
>>>>>>>>>> I recommend improving the related code, like the diff below.
>>>>>>>>>
>>>>>>>>> Are you sure that this represents an improvement? If so, why?
>>>>>>>>>
>>>>>>>>
>>>>>>>> If 'hc' and 'al' really are related, it is better to let the C code
>>>>>>>> express that, which will make the code clearer.
>>>>>>>>
>>>>>>>>> Or to put it another way, I see a patch that increases the size of the
>>>>>>>>> kernel by three lines. What is the corresponding benefit given common
>>>>>>>>> kernel workloads?
>>>>>>>>>
>>>>>>>>
>>>>>>>> For 'al', the check is not needed on every iteration, and for 'hc', we
>>>>>>>> may save useless iterations (so it can improve performance).
>>>>>>>>
>>>>>>>> In C code it really does add 3 lines, but it may not in the assembly code
>>>>>>>> (excuse me, I have not checked it; I think it is not important, although
>>>>>>>> it is easy to compare the binaries).
>>>>>>>>
>>>>>>>
>>>>>>> Oh, sorry, I mean: only for our case, "it is not important".
>>>>>>>
>>>>>>>
>>>>>>>>> Thanx, Paul
>>>>>>>>>
>>>>>>>>>> ----------------------------------diff begin------------------------------------
>>>>>>>>>>
>>>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>>>> index 5b53a89..421caf0 100644
>>>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>>>> @@ -2719,10 +2719,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>>>>>>>
>>>>>>>>>>  	for_each_rcu_flavor(rsp) {
>>>>>>>>>>  		rdp = per_cpu_ptr(rsp->rda, cpu);
>>>>>>>>>> -		if (rdp->qlen != rdp->qlen_lazy)
>>>>>>>>>> -			al = false;
>>>>>>>>>> -		if (rdp->nxtlist)
>>>>>>>>>> +		if (rdp->nxtlist) {
>>>>>>>>>>  			hc = true;
>>>>>>>>>> +			if (rdp->qlen != rdp->qlen_lazy) {
>>>>>>>>>> +				al = false;
>>>>>>>>>> +				break;
>>>>>>>>>> +			}
>>>>>>>>>> +		}
>>>>>>>>>>  	}
>>>>>>>>>>  	if (all_lazy)
>>>>>>>>>>  		*all_lazy = al;
>>>>>>>>>>
>>>>>>>>>> ----------------------------------diff end--------------------------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 08/20/2013 11:50 AM, Chen Gang wrote:
>>>>>>>>>>> According to the comment above rcu_cpu_has_callbacks(): "If there are
>>>>>>>>>>> no callbacks, all of them are deemed to be lazy".
>>>>>>>>>>>
>>>>>>>>>>> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
>>>>>>>>>>> false.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Chen Gang <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>> kernel/rcutree.c | 2 +-
>>>>>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>>>>> index 5b53a89..9ee9565 100644
>>>>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>>>>> @@ -2725,7 +2725,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
>>>>>>>>>>> hc = true;
>>>>>>>>>>> }
>>>>>>>>>>> if (all_lazy)
>>>>>>>>>>> - *all_lazy = al;
>>>>>>>>>>> + *all_lazy = !hc ? true : al;
>>>>>>>>>>> return hc;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Chen Gang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chen Gang
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Chen Gang
>>>>
>>>
>>>
>>
>>
>> --
>> Chen Gang
>>
>
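
For readers following the thread, the reworked loop quoted above can be
modeled in user space. This is only a sketch under stated assumptions, not
the kernel code: struct rcu_data_model, its has_list field (standing in for
rdp->nxtlist being non-NULL) and cpu_has_callbacks_model() are hypothetical
names, and a plain array replaces for_each_rcu_flavor(). The assert()
plays the role of the BUG_ON(!hc && !al) check from the earlier diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the per-CPU, per-flavor rcu_data. */
struct rcu_data_model {
	long qlen;      /* total callbacks queued */
	long qlen_lazy; /* how many of those are lazy */
	bool has_list;  /* models rdp->nxtlist != NULL */
};

/*
 * Return true if any flavor has callbacks queued.  If all_lazy is
 * non-NULL, also report whether every queued callback is lazy; with
 * no callbacks at all, they are deemed to be lazy.
 */
static bool cpu_has_callbacks_model(const struct rcu_data_model *flavors,
				    size_t nflavors, bool *all_lazy)
{
	bool al = true;
	bool hc = false;
	size_t i;

	for (i = 0; i < nflavors; i++) {
		const struct rcu_data_model *rdp = &flavors[i];

		if (!rdp->has_list)
			continue;	/* no callbacks for this flavor */
		hc = true;
		if (rdp->qlen != rdp->qlen_lazy) {
			al = false;	/* found a non-lazy callback */
			break;		/* nothing more to learn */
		}
	}
	/* Mirrors BUG_ON(!hc && !al): no callbacks implies all lazy. */
	assert(hc || al);
	if (all_lazy)
		*all_lazy = al;
	return hc;
}
```

In this model the early break only saves iterations when the flavor holding
non-lazy callbacks comes early in the array, which is why the order of the
flavor list is discussed above.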


--
Chen Gang