2021-12-22 07:54:13

by Zqiang

Subject: [PATCH] rcu: record kasan stack before enter local_irq_save()/restore() critical area

The kasan_record_aux_stack_noalloc() only record stack, it doesn't need
to be called in local_irq_save()/restore() critical area, and the global
spinlock (depot_lock) will be acquired in this function, When enable
kasan stack, locking contention may increase the time in the critical area.

Signed-off-by: Zqiang <[email protected]>
---
kernel/rcu/tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 347dae1876a6..5198e44cb124 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3030,8 +3030,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 	head->func = func;
 	head->next = NULL;
-	local_irq_save(flags);
 	kasan_record_aux_stack_noalloc(head);
+	local_irq_save(flags);
 	rdp = this_cpu_ptr(&rcu_data);
 
 	/* Add the callback to our list. */
--
2.25.1



2021-12-23 15:08:49

by Marco Elver

Subject: Re: [PATCH] rcu: record kasan stack before enter local_irq_save()/restore() critical area

On Wed, 22 Dec 2021 at 08:54, Zqiang <[email protected]> wrote:
> The kasan_record_aux_stack_noalloc() only record stack, it doesn't need
> to be called in local_irq_save()/restore() critical area, and the global
> spinlock (depot_lock) will be acquired in this function, When enable
> kasan stack, locking contention may increase the time in the critical area.

I think the change itself is harmless, because
kasan_record_aux_stack_noalloc() doesn't care if interrupts are
enabled or not when called, but the justification isn't clear to me.

What "locking contention" are you speaking about? You're moving a
local_irq_save() which disables interrupts. Yes, it might be nice to
reduce the time interrupts are disabled, but in this case the benefit
(if any) isn't clear at all, also because this only benefits
non-production KASAN kernels.

Can you provide better justification? Did you encounter a specific
problem, maybe together with data?

Thanks,
-- Marco

> Signed-off-by: Zqiang <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 347dae1876a6..5198e44cb124 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3030,8 +3030,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
> }
> head->func = func;
> head->next = NULL;
> - local_irq_save(flags);
> kasan_record_aux_stack_noalloc(head);
> + local_irq_save(flags);
> rdp = this_cpu_ptr(&rcu_data);
>
> /* Add the callback to our list. */
> --
> 2.25.1
>

2021-12-24 03:23:02

by Zqiang

Subject: RE: [PATCH] rcu: record kasan stack before enter local_irq_save()/restore() critical area


On Wed, 22 Dec 2021 at 08:54, Zqiang <[email protected]> wrote:
> The kasan_record_aux_stack_noalloc() only record stack, it doesn't
> need to be called in local_irq_save()/restore() critical area, and the
> global spinlock (depot_lock) will be acquired in this function, When
> enable kasan stack, locking contention may increase the time in the critical area.
>
>I think the change itself is harmless, because
>kasan_record_aux_stack_noalloc() doesn't care if interrupts are enabled or not when called, but the justification isn't clear to me.
>
>What "locking contention" are you speaking about? You're moving a
>local_irq_save() which disables interrupts. Yes, it might be nice to reduce the time interrupts are disabled, but in this case the benefit (if any) isn't clear at all, also because this only benefits non-production KASAN kernels.
>
>Can you provide better justification? Did you encounter a specific problem, maybe together with data?
>

Thanks for the reply. Yes, this only benefits non-production KASAN kernels. In a KASAN kernel,
a lot of call stacks may be recorded; in addition to lock contention, find_stack() will
also take a long time.


Thanks,
Zqiang

>Thanks,
>-- Marco

> Signed-off-by: Zqiang <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 347dae1876a6..5198e44cb124 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3030,8 +3030,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
> }
> head->func = func;
> head->next = NULL;
> - local_irq_save(flags);
> kasan_record_aux_stack_noalloc(head);
> + local_irq_save(flags);
> rdp = this_cpu_ptr(&rcu_data);
>
> /* Add the callback to our list. */
> --
> 2.25.1
>

2021-12-24 11:02:37

by Marco Elver

Subject: Re: [PATCH] rcu: record kasan stack before enter local_irq_save()/restore() critical area

On Fri, 24 Dec 2021 at 04:23, Zhang, Qiang1 <[email protected]> wrote:
>
>
> On Wed, 22 Dec 2021 at 08:54, Zqiang <[email protected]> wrote:
> > The kasan_record_aux_stack_noalloc() only record stack, it doesn't
> > need to be called in local_irq_save()/restore() critical area, and the
> > global spinlock (depot_lock) will be acquired in this function, When
> > enable kasan stack, locking contention may increase the time in the critical area.
> >
> >I think the change itself is harmless, because
> >kasan_record_aux_stack_noalloc() doesn't care if interrupts are enabled or not when called, but the justification isn't clear to me.
> >
> >What "locking contention" are you speaking about? You're moving a
> >local_irq_save() which disables interrupts. Yes, it might be nice to reduce the time interrupts are disabled, but in this case the benefit (if any) isn't clear at all, also because this only benefits non-production KASAN kernels.
> >
> >Can you provide better justification? Did you encounter a specific problem, maybe together with data?
> >
>
> Thanks for the reply. Yes, this only benefits non-production KASAN kernels. In a KASAN kernel,
> a lot of call stacks may be recorded; in addition to lock contention, find_stack() will
> also take a long time.

But there's no locking here, it's disabling interrupts. Yes, a lock is
taken inside kasan_record_aux_stack_noalloc(), but that's not one you
can do much about.

I don't mind this patch, but I think there might be some confusion. A
better explanation (in commit message or otherwise) would help make
sure we're not talking about different things.

2021-12-25 08:39:56

by Zqiang

Subject: RE: [PATCH] rcu: record kasan stack before enter local_irq_save()/restore() critical area

>
> On Wed, 22 Dec 2021 at 08:54, Zqiang <[email protected]> wrote:
> > The kasan_record_aux_stack_noalloc() only record stack, it doesn't
> > need to be called in local_irq_save()/restore() critical area, and
> > the global spinlock (depot_lock) will be acquired in this function,
> > When enable kasan stack, locking contention may increase the time in the critical area.
> >
> >I think the change itself is harmless, because
> >kasan_record_aux_stack_noalloc() doesn't care if interrupts are enabled or not when called, but the justification isn't clear to me.
> >
> >What "locking contention" are you speaking about? You're moving a
> >local_irq_save() which disables interrupts. Yes, it might be nice to reduce the time interrupts are disabled, but in this case the benefit (if any) isn't clear at all, also because this only benefits non-production KASAN kernels.
> >
> >Can you provide better justification? Did you encounter a specific problem, maybe together with data?
> >
>
> Thanks for the reply. Yes, this only benefits non-production KASAN kernels.
> In a KASAN kernel, a lot of call stacks may be recorded; in
> addition to lock contention, find_stack() will also take a long time.
>
>But there's no locking here, it's disabling interrupts. Yes, a lock is taken inside kasan_record_aux_stack_noalloc(), but that's not one you can do much about.

>I don't mind this patch, but I think there might be some confusion. A better explanation (in commit message or otherwise) would help make sure we're not talking about different things.

Hi Marco, Are the following modifications clear to you?

Subject: [PATCH] rcu: Reduce the consumption time of
local_irq_save()/restore() critical area

In non-production KASAN kernel, a large number of call stacks are recorded,
it takes some time to acquire the global spinlock(depot_lock) inside
kasan_record_aux_stack_noalloc(), increased interrupts disable time,
kasan_record_aux_stack_noalloc() doesn't care if interrupts are enabled or
not when called, so move it outside the critical area.

Signed-off-by: Zqiang <[email protected]>
---
kernel/rcu/tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9b58bae0527a..36bd3f9e57b3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3068,8 +3068,8 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 	head->func = func;
 	head->next = NULL;
-	local_irq_save(flags);
 	kasan_record_aux_stack_noalloc(head);
+	local_irq_save(flags);
 	rdp = this_cpu_ptr(&rcu_data);

Thanks,
Zqiang

2021-12-25 09:36:17

by Marco Elver

Subject: Re: [PATCH] rcu: record kasan stack before enter local_irq_save()/restore() critical area

On Sat, 25 Dec 2021 at 09:39, Zhang, Qiang1 <[email protected]> wrote:
[...]
> Hi Marco, Are the following modifications clear to you?

I understood now that the contention you're talking about is from
depot_lock, which wasn't clear before (I thought you intended to
reduce contention by shortening some other critical section).

> Subject: [PATCH] rcu: Reduce the consumption time of
> local_irq_save()/restore() critical area

Subject: rcu, kasan: Record work creation stack trace with interrupts enabled

> In non-production KASAN kernel, a large number of call stacks are recorded,
> it takes some time to acquire the global spinlock(depot_lock) inside
> kasan_record_aux_stack_noalloc(), increased interrupts disable time,
> kasan_record_aux_stack_noalloc() doesn't care if interrupts are enabled or
> not when called, so move it outside the critical area.

I think this might be clearer:

"Recording the work creation stack trace for KASAN reports in
call_rcu() is expensive, due to unwinding the stack, but also due to
acquiring depot_lock inside stackdepot (which may be contended).
Because calling kasan_record_aux_stack_noalloc() does not require
interrupts to already be disabled, this may unnecessarily extend the
time with interrupts disabled.

Therefore, move calling kasan_record_aux_stack() before the section
with interrupts disabled."

> Signed-off-by: Zqiang <[email protected]>

Acked-by: Marco Elver <[email protected]>

> ---
> kernel/rcu/tree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9b58bae0527a..36bd3f9e57b3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3068,8 +3068,8 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> }
> head->func = func;
> head->next = NULL;
> - local_irq_save(flags);
> kasan_record_aux_stack_noalloc(head);
> + local_irq_save(flags);
> rdp = this_cpu_ptr(&rcu_data);
>
> Thanks,
> Zqiang

2021-12-25 10:48:09

by Zqiang

Subject: RE: [PATCH] rcu: record kasan stack before enter local_irq_save()/restore() critical area


>> Hi Marco, Are the following modifications clear to you?

>>I understood now that the contention you're talking about is from depot_lock, which wasn't clear before (I thought you intended to reduce contention by shortening some other critical section).

Sorry, I didn't explain clearly before.

> Subject: [PATCH] rcu: Reduce the consumption time of
> local_irq_save()/restore() critical area

>>Subject: rcu, kasan: Record work creation stack trace with interrupts enabled

> In non-production KASAN kernel, a large number of call stacks are
> recorded, it takes some time to acquire the global
> spinlock(depot_lock) inside kasan_record_aux_stack_noalloc(),
> increased interrupts disable time,
> kasan_record_aux_stack_noalloc() doesn't care if interrupts are
> enabled or not when called, so move it outside the critical area.

>>I think this might be clearer:

>>"Recording the work creation stack trace for KASAN reports in
>>call_rcu() is expensive, due to unwinding the stack, but also due to acquiring depot_lock inside stackdepot (which may be contended).
>>Because calling kasan_record_aux_stack_noalloc() does not require interrupts to already be disabled, this may unnecessarily extend the time with interrupts disabled.
>>
>>Therefore, move calling kasan_record_aux_stack() before the section with interrupts disabled."


Thanks Marco, your description is clearer, I will resend it.


> Signed-off-by: Zqiang <[email protected]>

>>Acked-by: Marco Elver <[email protected]>

> ---
> kernel/rcu/tree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9b58bae0527a..36bd3f9e57b3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3068,8 +3068,8 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> }
> head->func = func;
> head->next = NULL;
> - local_irq_save(flags);
> kasan_record_aux_stack_noalloc(head);
> + local_irq_save(flags);
> rdp = this_cpu_ptr(&rcu_data);
>
> Thanks,
> Zqiang
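
The effect of the reordering discussed in this thread can be illustrated with a small user-space sketch in C. This is not kernel code and not the actual call_rcu() implementation: every name prefixed with sim_ is a hypothetical stand-in, "interrupts disabled" is modelled by a plain flag, and the two cost constants are arbitrary assumptions chosen only to make the difference visible. The sketch counts how many units of work run while the flag is set, once with the stack recording inside the section (the original order) and once with it moved in front (the patched order).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
static bool sim_irqs_off;            /* models "interrupts disabled on this CPU" */
static unsigned long sim_units_off;  /* work units executed while sim_irqs_off is set */

/* Assumed, arbitrary costs: recording a stack is far more expensive than queueing. */
#define SIM_RECORD_COST  1000UL  /* stands in for unwinding plus the stackdepot lookup */
#define SIM_ENQUEUE_COST 10UL    /* stands in for adding the callback to the list */

/* Perform `units` of work, counting those done with "interrupts" off. */
static void sim_work(unsigned long units)
{
	for (unsigned long i = 0; i < units; i++)
		if (sim_irqs_off)
			sim_units_off++;
}

/* Simulate one call; returns the work units spent with "interrupts" off. */
static unsigned long sim_call_rcu(bool record_before_irq_save)
{
	sim_units_off = 0;

	if (record_before_irq_save)
		sim_work(SIM_RECORD_COST);   /* patched order: record first */

	sim_irqs_off = true;                 /* models local_irq_save(flags) */
	if (!record_before_irq_save)
		sim_work(SIM_RECORD_COST);   /* original order: record inside */
	sim_work(SIM_ENQUEUE_COST);
	sim_irqs_off = false;                /* models local_irq_restore(flags) */

	return sim_units_off;
}

/* Usage: compare both orderings under the assumed costs. */
static void sim_demo(void)
{
	assert(sim_call_rcu(false) == SIM_RECORD_COST + SIM_ENQUEUE_COST);
	assert(sim_call_rcu(true) == SIM_ENQUEUE_COST);
}
```

Under these assumed costs, only the enqueue work remains inside the simulated interrupts-off section after the reordering. The sketch says nothing about real-world magnitudes, which depend on stack depth and on how contended depot_lock is.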