Since commit aa40c138cc8f3 ("rcu: Report QS for outermost
PREEMPT=n rcu_read_unlock() for strict GPs"). A real function call
rcu_read_unlock_strict() is added to the inlined rcu_read_unlock().
The rcu_read_unlock_strict() call is only needed if the performance
sagging CONFIG_RCU_STRICT_GRACE_PERIOD option is set. This config
option isn't set for most production kernels while the function call
overhead remains.
To provide a slight performance improvement, the
CONFIG_RCU_STRICT_GRACE_PERIOD config check is moved from
rcu_read_unlock_strict() to __rcu_read_unlock() so that the function
call can be compiled out in most cases.
Besides, the GPL exported rcu_read_unlock_strict() also impact the
the compilation of non-GPL kernel modules as rcu_read_unlock() is a
frequently used kernel API.
Signed-off-by: Waiman Long <[email protected]>
---
include/linux/rcupdate.h | 3 ++-
kernel/rcu/tree_plugin.h | 3 +--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index d9680b798b21..945594770d57 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -71,7 +71,8 @@ static inline void __rcu_read_lock(void)
static inline void __rcu_read_unlock(void)
{
preempt_enable();
- rcu_read_unlock_strict();
+ if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+ rcu_read_unlock_strict();
}
static inline int rcu_preempt_depth(void)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index de1dc3bb7f70..7fa518bef15d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -844,8 +844,7 @@ void rcu_read_unlock_strict(void)
{
struct rcu_data *rdp;
- if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
- irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
+ if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
return;
rdp = this_cpu_ptr(&rcu_data);
rcu_report_qs_rdp(rdp);
--
2.18.1
On Thu, Aug 26, 2021 at 10:21:22PM -0400, Waiman Long wrote:
> Since commit aa40c138cc8f3 ("rcu: Report QS for outermost
> PREEMPT=n rcu_read_unlock() for strict GPs"). A real function call
> rcu_read_unlock_strict() is added to the inlined rcu_read_unlock().
> The rcu_read_unlock_strict() call is only needed if the performance
> sagging CONFIG_RCU_STRICT_GRACE_PERIOD option is set. This config
> option isn't set for most production kernels while the function call
> overhead remains.
>
> To provide a slight performance improvement, the
> CONFIG_RCU_STRICT_GRACE_PERIOD config check is moved from
> rcu_read_unlock_strict() to __rcu_read_unlock() so that the function
> call can be compiled out in most cases.
>
> Besides, the GPL exported rcu_read_unlock_strict() also impact the
> the compilation of non-GPL kernel modules as rcu_read_unlock() is a
> frequently used kernel API.
>
> Signed-off-by: Waiman Long <[email protected]>
Nice, and good eyes!!!
I have queued this for v5.16, that is, not the upcoming merge window
but the one after that.
I did my usual wordsmithing, so please check the following in case I
messed something up. I intentionally omitted the EXPORT_SYMBOL_GPL()
discussion because:
1. Kernels built with CONFIG_PREEMPT=y have the same issue
with the __rcu_read_lock() and __rcu_read_unlock() functions.
2. Many other RCU functions are EXPORT_SYMBOL_GPL() and have
been for almost two decades.
But if someone does use RCU readers within CONFIG_PREEMPT=n kernels from
a binary module, I will happily refer them to you for any RCU issues
that they encounter. ;-)
I am also CCing the BPF guys in case my interpretation of the code in
the BPF verifier is incorrect.
Thanx, Paul
------------------------------------------------------------------------
commit 4a9f53b997b809c0256838e31c604aeeded2345a
Author: Waiman Long <[email protected]>
Date: Thu Aug 26 22:21:22 2021 -0400
rcu: Avoid unneeded function call in rcu_read_unlock()
Since commit aa40c138cc8f3 ("rcu: Report QS for outermost PREEMPT=n
rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict()
is invoked by the inlined rcu_read_unlock() function. However,
rcu_read_unlock_strict() is an empty function in production kernels,
which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n.
There is a mention of rcu_read_unlock_strict() in the BPF verifier,
but this is in a deny-list, meaning that BPF does not care whether
rcu_read_unlock_strict() is ever called.
This commit therefore provides a slight performance improvement
by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from
rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the
pointless call to an empty function.
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 434d12fe2d4f..5e0beb5c5659 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -71,7 +71,8 @@ static inline void __rcu_read_lock(void)
static inline void __rcu_read_unlock(void)
{
preempt_enable();
- rcu_read_unlock_strict();
+ if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
+ rcu_read_unlock_strict();
}
static inline int rcu_preempt_depth(void)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7a4876a3a882..0b55c647ab80 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -814,8 +814,7 @@ void rcu_read_unlock_strict(void)
{
struct rcu_data *rdp;
- if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
- irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
+ if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
return;
rdp = this_cpu_ptr(&rcu_data);
rcu_report_qs_rdp(rdp);
On 8/27/21 2:34 PM, Paul E. McKenney wrote:
> On Thu, Aug 26, 2021 at 10:21:22PM -0400, Waiman Long wrote:
>> Since commit aa40c138cc8f3 ("rcu: Report QS for outermost
>> PREEMPT=n rcu_read_unlock() for strict GPs"). A real function call
>> rcu_read_unlock_strict() is added to the inlined rcu_read_unlock().
>> The rcu_read_unlock_strict() call is only needed if the performance
>> sagging CONFIG_RCU_STRICT_GRACE_PERIOD option is set. This config
>> option isn't set for most production kernels while the function call
>> overhead remains.
>>
>> To provide a slight performance improvement, the
>> CONFIG_RCU_STRICT_GRACE_PERIOD config check is moved from
>> rcu_read_unlock_strict() to __rcu_read_unlock() so that the function
>> call can be compiled out in most cases.
>>
>> Besides, the GPL exported rcu_read_unlock_strict() also impact the
>> the compilation of non-GPL kernel modules as rcu_read_unlock() is a
>> frequently used kernel API.
>>
>> Signed-off-by: Waiman Long <[email protected]>
> Nice, and good eyes!!!
>
> I have queued this for v5.16, that is, not the upcoming merge window
> but the one after that.
>
> I did my usual wordsmithing, so please check the following in case I
> messed something up. I intentionally omitted the EXPORT_SYMBOL_GPL()
> discussion because:
>
> 1. Kernels built with CONFIG_PREEMPT=y have the same issue
> with the __rcu_read_lock() and __rcu_read_unlock() functions.
>
> 2. Many other RCU functions are EXPORT_SYMBOL_GPL() and have
> been for almost two decades.
>
> But if someone does use RCU readers within CONFIG_PREEMPT=n kernels from
> a binary module, I will happily refer them to you for any RCU issues
> that they encounter. ;-)
>
> I am also CCing the BPF guys in case my interpretation of the code in
> the BPF verifier is incorrect.
>
> Thanx, Paul
>
It looks good to me. Thanks for the rewording. I did regret mentioning
about about the GPL export symbol in the commit log and it is good that
you had taken it out.
Cheers,
Longman
On Fri, Aug 27, 2021 at 11:34 AM Paul E. McKenney <[email protected]> wrote:
>
> On Thu, Aug 26, 2021 at 10:21:22PM -0400, Waiman Long wrote:
> > Since commit aa40c138cc8f3 ("rcu: Report QS for outermost
> > PREEMPT=n rcu_read_unlock() for strict GPs"). A real function call
> > rcu_read_unlock_strict() is added to the inlined rcu_read_unlock().
> > The rcu_read_unlock_strict() call is only needed if the performance
> > sagging CONFIG_RCU_STRICT_GRACE_PERIOD option is set. This config
> > option isn't set for most production kernels while the function call
> > overhead remains.
> >
> > To provide a slight performance improvement, the
> > CONFIG_RCU_STRICT_GRACE_PERIOD config check is moved from
> > rcu_read_unlock_strict() to __rcu_read_unlock() so that the function
> > call can be compiled out in most cases.
> >
> > Besides, the GPL exported rcu_read_unlock_strict() also impact the
> > the compilation of non-GPL kernel modules as rcu_read_unlock() is a
> > frequently used kernel API.
> >
> > Signed-off-by: Waiman Long <[email protected]>
>
> Nice, and good eyes!!!
>
> I have queued this for v5.16, that is, not the upcoming merge window
> but the one after that.
>
> I did my usual wordsmithing, so please check the following in case I
> messed something up. I intentionally omitted the EXPORT_SYMBOL_GPL()
> discussion because:
>
> 1. Kernels built with CONFIG_PREEMPT=y have the same issue
> with the __rcu_read_lock() and __rcu_read_unlock() functions.
>
> 2. Many other RCU functions are EXPORT_SYMBOL_GPL() and have
> been for almost two decades.
>
> But if someone does use RCU readers within CONFIG_PREEMPT=n kernels from
> a binary module, I will happily refer them to you for any RCU issues
> that they encounter. ;-)
>
> I am also CCing the BPF guys in case my interpretation of the code in
> the BPF verifier is incorrect.
>
LGTM from the BPF side, nothing really changed about when
rcu_read_unlock_strict is an actual function vs no-op macro. It's also
important to minimize the number of function calls in the context of
recent LBR on-demand work done by Song, so this is a great
improvement!
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 4a9f53b997b809c0256838e31c604aeeded2345a
> Author: Waiman Long <[email protected]>
> Date: Thu Aug 26 22:21:22 2021 -0400
>
> rcu: Avoid unneeded function call in rcu_read_unlock()
>
> Since commit aa40c138cc8f3 ("rcu: Report QS for outermost PREEMPT=n
> rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict()
> is invoked by the inlined rcu_read_unlock() function. However,
> rcu_read_unlock_strict() is an empty function in production kernels,
> which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n.
>
> There is a mention of rcu_read_unlock_strict() in the BPF verifier,
> but this is in a deny-list, meaning that BPF does not care whether
> rcu_read_unlock_strict() is ever called.
>
> This commit therefore provides a slight performance improvement
> by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from
> rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the
> pointless call to an empty function.
>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Signed-off-by: Waiman Long <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 434d12fe2d4f..5e0beb5c5659 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -71,7 +71,8 @@ static inline void __rcu_read_lock(void)
> static inline void __rcu_read_unlock(void)
> {
> preempt_enable();
> - rcu_read_unlock_strict();
> + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> + rcu_read_unlock_strict();
> }
>
> static inline int rcu_preempt_depth(void)
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 7a4876a3a882..0b55c647ab80 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -814,8 +814,7 @@ void rcu_read_unlock_strict(void)
> {
> struct rcu_data *rdp;
>
> - if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
> - irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> + if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> return;
> rdp = this_cpu_ptr(&rcu_data);
> rcu_report_qs_rdp(rdp);
On Mon, Aug 30, 2021 at 11:36:51AM -0700, Andrii Nakryiko wrote:
> On Fri, Aug 27, 2021 at 11:34 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Thu, Aug 26, 2021 at 10:21:22PM -0400, Waiman Long wrote:
> > > Since commit aa40c138cc8f3 ("rcu: Report QS for outermost
> > > PREEMPT=n rcu_read_unlock() for strict GPs"). A real function call
> > > rcu_read_unlock_strict() is added to the inlined rcu_read_unlock().
> > > The rcu_read_unlock_strict() call is only needed if the performance
> > > sagging CONFIG_RCU_STRICT_GRACE_PERIOD option is set. This config
> > > option isn't set for most production kernels while the function call
> > > overhead remains.
> > >
> > > To provide a slight performance improvement, the
> > > CONFIG_RCU_STRICT_GRACE_PERIOD config check is moved from
> > > rcu_read_unlock_strict() to __rcu_read_unlock() so that the function
> > > call can be compiled out in most cases.
> > >
> > > Besides, the GPL exported rcu_read_unlock_strict() also impact the
> > > the compilation of non-GPL kernel modules as rcu_read_unlock() is a
> > > frequently used kernel API.
> > >
> > > Signed-off-by: Waiman Long <[email protected]>
> >
> > Nice, and good eyes!!!
> >
> > I have queued this for v5.16, that is, not the upcoming merge window
> > but the one after that.
> >
> > I did my usual wordsmithing, so please check the following in case I
> > messed something up. I intentionally omitted the EXPORT_SYMBOL_GPL()
> > discussion because:
> >
> > 1. Kernels built with CONFIG_PREEMPT=y have the same issue
> > with the __rcu_read_lock() and __rcu_read_unlock() functions.
> >
> > 2. Many other RCU functions are EXPORT_SYMBOL_GPL() and have
> > been for almost two decades.
> >
> > But if someone does use RCU readers within CONFIG_PREEMPT=n kernels from
> > a binary module, I will happily refer them to you for any RCU issues
> > that they encounter. ;-)
> >
> > I am also CCing the BPF guys in case my interpretation of the code in
> > the BPF verifier is incorrect.
> >
>
> LGTM from the BPF side, nothing really changed about when
> rcu_read_unlock_strict is an actual function vs no-op macro. It's also
> important to minimize the number of function calls in the context of
> recent LBR on-demand work done by Song, so this is a great
> improvement!
Thank you for looking this over! May I add your Acked-by or similar?
Thanx, Paul
> > ------------------------------------------------------------------------
> >
> > commit 4a9f53b997b809c0256838e31c604aeeded2345a
> > Author: Waiman Long <[email protected]>
> > Date: Thu Aug 26 22:21:22 2021 -0400
> >
> > rcu: Avoid unneeded function call in rcu_read_unlock()
> >
> > Since commit aa40c138cc8f3 ("rcu: Report QS for outermost PREEMPT=n
> > rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict()
> > is invoked by the inlined rcu_read_unlock() function. However,
> > rcu_read_unlock_strict() is an empty function in production kernels,
> > which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n.
> >
> > There is a mention of rcu_read_unlock_strict() in the BPF verifier,
> > but this is in a deny-list, meaning that BPF does not care whether
> > rcu_read_unlock_strict() is ever called.
> >
> > This commit therefore provides a slight performance improvement
> > by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from
> > rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the
> > pointless call to an empty function.
> >
> > Cc: Alexei Starovoitov <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Signed-off-by: Waiman Long <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 434d12fe2d4f..5e0beb5c5659 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -71,7 +71,8 @@ static inline void __rcu_read_lock(void)
> > static inline void __rcu_read_unlock(void)
> > {
> > preempt_enable();
> > - rcu_read_unlock_strict();
> > + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> > + rcu_read_unlock_strict();
> > }
> >
> > static inline int rcu_preempt_depth(void)
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 7a4876a3a882..0b55c647ab80 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -814,8 +814,7 @@ void rcu_read_unlock_strict(void)
> > {
> > struct rcu_data *rdp;
> >
> > - if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
> > - irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> > + if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> > return;
> > rdp = this_cpu_ptr(&rcu_data);
> > rcu_report_qs_rdp(rdp);
On Mon, Aug 30, 2021 at 11:46 AM Paul E. McKenney <[email protected]> wrote:
>
> On Mon, Aug 30, 2021 at 11:36:51AM -0700, Andrii Nakryiko wrote:
> > On Fri, Aug 27, 2021 at 11:34 AM Paul E. McKenney <[email protected]> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 10:21:22PM -0400, Waiman Long wrote:
> > > > Since commit aa40c138cc8f3 ("rcu: Report QS for outermost
> > > > PREEMPT=n rcu_read_unlock() for strict GPs"). A real function call
> > > > rcu_read_unlock_strict() is added to the inlined rcu_read_unlock().
> > > > The rcu_read_unlock_strict() call is only needed if the performance
> > > > sagging CONFIG_RCU_STRICT_GRACE_PERIOD option is set. This config
> > > > option isn't set for most production kernels while the function call
> > > > overhead remains.
> > > >
> > > > To provide a slight performance improvement, the
> > > > CONFIG_RCU_STRICT_GRACE_PERIOD config check is moved from
> > > > rcu_read_unlock_strict() to __rcu_read_unlock() so that the function
> > > > call can be compiled out in most cases.
> > > >
> > > > Besides, the GPL exported rcu_read_unlock_strict() also impact the
> > > > the compilation of non-GPL kernel modules as rcu_read_unlock() is a
> > > > frequently used kernel API.
> > > >
> > > > Signed-off-by: Waiman Long <[email protected]>
> > >
> > > Nice, and good eyes!!!
> > >
> > > I have queued this for v5.16, that is, not the upcoming merge window
> > > but the one after that.
> > >
> > > I did my usual wordsmithing, so please check the following in case I
> > > messed something up. I intentionally omitted the EXPORT_SYMBOL_GPL()
> > > discussion because:
> > >
> > > 1. Kernels built with CONFIG_PREEMPT=y have the same issue
> > > with the __rcu_read_lock() and __rcu_read_unlock() functions.
> > >
> > > 2. Many other RCU functions are EXPORT_SYMBOL_GPL() and have
> > > been for almost two decades.
> > >
> > > But if someone does use RCU readers within CONFIG_PREEMPT=n kernels from
> > > a binary module, I will happily refer them to you for any RCU issues
> > > that they encounter. ;-)
> > >
> > > I am also CCing the BPF guys in case my interpretation of the code in
> > > the BPF verifier is incorrect.
> > >
> >
> > LGTM from the BPF side, nothing really changed about when
> > rcu_read_unlock_strict is an actual function vs no-op macro. It's also
> > important to minimize the number of function calls in the context of
> > recent LBR on-demand work done by Song, so this is a great
> > improvement!
>
> Thank you for looking this over! May I add your Acked-by or similar?
>
Sure.
Acked-by: Andrii Nakryiko <[email protected]>
> Thanx, Paul
>
> > > ------------------------------------------------------------------------
> > >
> > > commit 4a9f53b997b809c0256838e31c604aeeded2345a
> > > Author: Waiman Long <[email protected]>
> > > Date: Thu Aug 26 22:21:22 2021 -0400
> > >
> > > rcu: Avoid unneeded function call in rcu_read_unlock()
> > >
> > > Since commit aa40c138cc8f3 ("rcu: Report QS for outermost PREEMPT=n
> > > rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict()
> > > is invoked by the inlined rcu_read_unlock() function. However,
> > > rcu_read_unlock_strict() is an empty function in production kernels,
> > > which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n.
> > >
> > > There is a mention of rcu_read_unlock_strict() in the BPF verifier,
> > > but this is in a deny-list, meaning that BPF does not care whether
> > > rcu_read_unlock_strict() is ever called.
> > >
> > > This commit therefore provides a slight performance improvement
> > > by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from
> > > rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the
> > > pointless call to an empty function.
> > >
> > > Cc: Alexei Starovoitov <[email protected]>
> > > Cc: Andrii Nakryiko <[email protected]>
> > > Signed-off-by: Waiman Long <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 434d12fe2d4f..5e0beb5c5659 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -71,7 +71,8 @@ static inline void __rcu_read_lock(void)
> > > static inline void __rcu_read_unlock(void)
> > > {
> > > preempt_enable();
> > > - rcu_read_unlock_strict();
> > > + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> > > + rcu_read_unlock_strict();
> > > }
> > >
> > > static inline int rcu_preempt_depth(void)
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 7a4876a3a882..0b55c647ab80 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -814,8 +814,7 @@ void rcu_read_unlock_strict(void)
> > > {
> > > struct rcu_data *rdp;
> > >
> > > - if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
> > > - irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> > > + if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> > > return;
> > > rdp = this_cpu_ptr(&rcu_data);
> > > rcu_report_qs_rdp(rdp);
On Mon, Aug 30, 2021 at 12:34:04PM -0700, Andrii Nakryiko wrote:
> On Mon, Aug 30, 2021 at 11:46 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Mon, Aug 30, 2021 at 11:36:51AM -0700, Andrii Nakryiko wrote:
> > > On Fri, Aug 27, 2021 at 11:34 AM Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 10:21:22PM -0400, Waiman Long wrote:
> > > > > Since commit aa40c138cc8f3 ("rcu: Report QS for outermost
> > > > > PREEMPT=n rcu_read_unlock() for strict GPs"). A real function call
> > > > > rcu_read_unlock_strict() is added to the inlined rcu_read_unlock().
> > > > > The rcu_read_unlock_strict() call is only needed if the performance
> > > > > sagging CONFIG_RCU_STRICT_GRACE_PERIOD option is set. This config
> > > > > option isn't set for most production kernels while the function call
> > > > > overhead remains.
> > > > >
> > > > > To provide a slight performance improvement, the
> > > > > CONFIG_RCU_STRICT_GRACE_PERIOD config check is moved from
> > > > > rcu_read_unlock_strict() to __rcu_read_unlock() so that the function
> > > > > call can be compiled out in most cases.
> > > > >
> > > > > Besides, the GPL exported rcu_read_unlock_strict() also impact the
> > > > > the compilation of non-GPL kernel modules as rcu_read_unlock() is a
> > > > > frequently used kernel API.
> > > > >
> > > > > Signed-off-by: Waiman Long <[email protected]>
> > > >
> > > > Nice, and good eyes!!!
> > > >
> > > > I have queued this for v5.16, that is, not the upcoming merge window
> > > > but the one after that.
> > > >
> > > > I did my usual wordsmithing, so please check the following in case I
> > > > messed something up. I intentionally omitted the EXPORT_SYMBOL_GPL()
> > > > discussion because:
> > > >
> > > > 1. Kernels built with CONFIG_PREEMPT=y have the same issue
> > > > with the __rcu_read_lock() and __rcu_read_unlock() functions.
> > > >
> > > > 2. Many other RCU functions are EXPORT_SYMBOL_GPL() and have
> > > > been for almost two decades.
> > > >
> > > > But if someone does use RCU readers within CONFIG_PREEMPT=n kernels from
> > > > a binary module, I will happily refer them to you for any RCU issues
> > > > that they encounter. ;-)
> > > >
> > > > I am also CCing the BPF guys in case my interpretation of the code in
> > > > the BPF verifier is incorrect.
> > > >
> > >
> > > LGTM from the BPF side, nothing really changed about when
> > > rcu_read_unlock_strict is an actual function vs no-op macro. It's also
> > > important to minimize the number of function calls in the context of
> > > recent LBR on-demand work done by Song, so this is a great
> > > improvement!
> >
> > Thank you for looking this over! May I add your Acked-by or similar?
> >
>
> Sure.
>
> Acked-by: Andrii Nakryiko <[email protected]>
Thank you! I will add this on the next rebase.
Thanx, Paul
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit 4a9f53b997b809c0256838e31c604aeeded2345a
> > > > Author: Waiman Long <[email protected]>
> > > > Date: Thu Aug 26 22:21:22 2021 -0400
> > > >
> > > > rcu: Avoid unneeded function call in rcu_read_unlock()
> > > >
> > > > Since commit aa40c138cc8f3 ("rcu: Report QS for outermost PREEMPT=n
> > > > rcu_read_unlock() for strict GPs") the function rcu_read_unlock_strict()
> > > > is invoked by the inlined rcu_read_unlock() function. However,
> > > > rcu_read_unlock_strict() is an empty function in production kernels,
> > > > which are built with CONFIG_RCU_STRICT_GRACE_PERIOD=n.
> > > >
> > > > There is a mention of rcu_read_unlock_strict() in the BPF verifier,
> > > > but this is in a deny-list, meaning that BPF does not care whether
> > > > rcu_read_unlock_strict() is ever called.
> > > >
> > > > This commit therefore provides a slight performance improvement
> > > > by hoisting the check of CONFIG_RCU_STRICT_GRACE_PERIOD from
> > > > rcu_read_unlock_strict() into rcu_read_unlock(), thus avoiding the
> > > > pointless call to an empty function.
> > > >
> > > > Cc: Alexei Starovoitov <[email protected]>
> > > > Cc: Andrii Nakryiko <[email protected]>
> > > > Signed-off-by: Waiman Long <[email protected]>
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > >
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index 434d12fe2d4f..5e0beb5c5659 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -71,7 +71,8 @@ static inline void __rcu_read_lock(void)
> > > > static inline void __rcu_read_unlock(void)
> > > > {
> > > > preempt_enable();
> > > > - rcu_read_unlock_strict();
> > > > + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> > > > + rcu_read_unlock_strict();
> > > > }
> > > >
> > > > static inline int rcu_preempt_depth(void)
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 7a4876a3a882..0b55c647ab80 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -814,8 +814,7 @@ void rcu_read_unlock_strict(void)
> > > > {
> > > > struct rcu_data *rdp;
> > > >
> > > > - if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
> > > > - irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> > > > + if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> > > > return;
> > > > rdp = this_cpu_ptr(&rcu_data);
> > > > rcu_report_qs_rdp(rdp);