2023-01-30 07:28:25

by Changbin Du

[permalink] [raw]
Subject: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

From: Changbin Du <[email protected]>

The task of ftrace_arch_code_modify(_post)_prepare() caller is
stop_machine, whose caller and work thread are of different tasks. The
lockdep checker needs the same task context, or it's wrong. That means
it's a bug here to use lockdep_assert_held because we don't guarantee
the same task context.

kernel/locking/lockdep.c:
int __lock_is_held(const struct lockdep_map *lock, int read)
{
struct task_struct *curr = current;
int i;

for (i = 0; i < curr->lockdep_depth; i++) {
^^^^^^^^^^^^^^^^^^^
struct held_lock *hlock = curr->held_locks + i;
^^^^^^^^^^^^^^^^
if (match_held_lock(hlock, lock)) {
if (read == -1 || !!hlock->read == read)
return LOCK_STATE_HELD;

The __lock_is_held depends on current held_locks records; if
stop_machine makes the checker runing on another task, that's wrong.

Here is the log:
[ 15.761523] ------------[ cut here ]------------
[ 15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
[ 15.763258] Modules linked in:
[ 15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
[ 15.765339] Hardware name: riscv-virtio,qemu (DT)
[ 15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
[ 15.766711] epc : patch_insn_write+0x72/0x364
[ 15.767011] ra : patch_insn_write+0x70/0x364
[ 15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
[ 15.767622] gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
[ 15.767919] t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
[ 15.768238] s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
[ 15.768537] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[ 15.768837] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[ 15.769139] s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
[ 15.769447] s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
[ 15.769740] s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
[ 15.770027] s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
[ 15.770323] t5 : ffffffff819af098 t6 : ff2000000067ba28
[ 15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[ 15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
[ 15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
[ 15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
[ 15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
[ 15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
[ 15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
[ 15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
[ 15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
[ 15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
[ 15.773471] ---[ end trace 0000000000000000 ]---

By the way, this also fixes the same issue for patch_text().

Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
Co-developed-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Cc: Zong Li <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Signed-off-by: Changbin Du <[email protected]>
---
Changes in v3:
- denote this also fixes function patch_text().

Changes in v2:
- Rewrite commit log with lockdep explanation [Guo Ren]
- Rebase on v6.1 [Guo Ren]

v1:
https://lore.kernel.org/linux-riscv/[email protected]/
---
arch/riscv/kernel/patch.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 765004b60513..8619706f8dfd 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
int ret;

- /*
- * Before reaching here, it was expected to lock the text_mutex
- * already, so we don't need to give another lock here and could
- * ensure that it was safe between each cores.
- */
- lockdep_assert_held(&text_mutex);
-
if (across_pages)
patch_map(addr + len, FIX_TEXT_POKE1);

--
2.25.1



2023-01-30 15:10:23

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

Hey Changbin,

On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote:
> From: Changbin Du <[email protected]>
>
> The task of ftrace_arch_code_modify(_post)_prepare() caller is
> stop_machine, whose caller and work thread are of different tasks. The
> lockdep checker needs the same task context, or it's wrong. That means
> it's a bug here to use lockdep_assert_held because we don't guarantee
> the same task context.
>
> kernel/locking/lockdep.c:
> int __lock_is_held(const struct lockdep_map *lock, int read)
> {
> struct task_struct *curr = current;
> int i;
>
> for (i = 0; i < curr->lockdep_depth; i++) {
> ^^^^^^^^^^^^^^^^^^^
> struct held_lock *hlock = curr->held_locks + i;
> ^^^^^^^^^^^^^^^^
> if (match_held_lock(hlock, lock)) {
> if (read == -1 || !!hlock->read == read)
> return LOCK_STATE_HELD;
>
> The __lock_is_held depends on current held_locks records; if
> stop_machine makes the checker runing on another task, that's wrong.
>
> Here is the log:
> [ 15.761523] ------------[ cut here ]------------
> [ 15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
> [ 15.763258] Modules linked in:
> [ 15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
> [ 15.765339] Hardware name: riscv-virtio,qemu (DT)
> [ 15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
> [ 15.766711] epc : patch_insn_write+0x72/0x364
> [ 15.767011] ra : patch_insn_write+0x70/0x364
> [ 15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
> [ 15.767622] gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
> [ 15.767919] t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
> [ 15.768238] s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
> [ 15.768537] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> [ 15.768837] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> [ 15.769139] s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
> [ 15.769447] s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
> [ 15.769740] s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
> [ 15.770027] s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
> [ 15.770323] t5 : ffffffff819af098 t6 : ff2000000067ba28
> [ 15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> [ 15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
> [ 15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
> [ 15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
> [ 15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
> [ 15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
> [ 15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
> [ 15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
> [ 15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
> [ 15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
> [ 15.773471] ---[ end trace 0000000000000000 ]---

FWIW, you can always crop the [15.192321] stuff out of commit messages,
as it just adds noise.

> By the way, this also fixes the same issue for patch_text().
>
> Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
> Co-developed-by: Guo Ren <[email protected]>
> Signed-off-by: Guo Ren <[email protected]>
> Cc: Zong Li <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Signed-off-by: Changbin Du <[email protected]>
> ---
> Changes in v3:
> - denote this also fixes function patch_text().
>
> Changes in v2:
> - Rewrite commit log with lockdep explanation [Guo Ren]
> - Rebase on v6.1 [Guo Ren]
>
> v1:
> https://lore.kernel.org/linux-riscv/[email protected]/
> ---
> arch/riscv/kernel/patch.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 765004b60513..8619706f8dfd 100644
> --- a/arch/riscv/kernel/patch.c
> +++ b/arch/riscv/kernel/patch.c
> @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> int ret;
>
> - /*
> - * Before reaching here, it was expected to lock the text_mutex
> - * already, so we don't need to give another lock here and could
> - * ensure that it was safe between each cores.
> - */
> - lockdep_assert_held(&text_mutex);

I must admit, patches like this do concern me a little, as a someone
unfamiliar with the world of probing and tracing.
Seeing an explicit check that the lock was held, leads me to believe
that the original author (Zong Li I think) thought that the text_mutex
lock was insufficient.
Do you think that their fear is unfounded? Explaining why it is safe to
remove this assertion in the commit message would go a long way towards
easing my anxiety!

Also, why delete the comment altogether? The comment provides some
information that doesn't appear to become invalid, even with the
assertion removed?

Thanks,
Conor.

> -
> if (across_pages)
> patch_map(addr + len, FIX_TEXT_POKE1);
>
> --
> 2.25.1
>
>


Attachments:
(No filename) (5.30 kB)
signature.asc (228.00 B)
Download all attachments

2023-01-31 07:26:53

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

On Mon, Jan 30, 2023 at 11:10 PM Conor Dooley
<[email protected]> wrote:
>
> Hey Changbin,
>
> On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote:
> > From: Changbin Du <[email protected]>
> >
> > The task of ftrace_arch_code_modify(_post)_prepare() caller is
> > stop_machine, whose caller and work thread are of different tasks. The
> > lockdep checker needs the same task context, or it's wrong. That means
> > it's a bug here to use lockdep_assert_held because we don't guarantee
> > the same task context.
> >
> > kernel/locking/lockdep.c:
> > int __lock_is_held(const struct lockdep_map *lock, int read)
> > {
> > struct task_struct *curr = current;
> > int i;
> >
> > for (i = 0; i < curr->lockdep_depth; i++) {
> > ^^^^^^^^^^^^^^^^^^^
> > struct held_lock *hlock = curr->held_locks + i;
> > ^^^^^^^^^^^^^^^^
> > if (match_held_lock(hlock, lock)) {
> > if (read == -1 || !!hlock->read == read)
> > return LOCK_STATE_HELD;
> >
> > The __lock_is_held depends on current held_locks records; if
> > stop_machine makes the checker runing on another task, that's wrong.
> >
> > Here is the log:
> > [ 15.761523] ------------[ cut here ]------------
> > [ 15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
> > [ 15.763258] Modules linked in:
> > [ 15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
> > [ 15.765339] Hardware name: riscv-virtio,qemu (DT)
> > [ 15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
> > [ 15.766711] epc : patch_insn_write+0x72/0x364
> > [ 15.767011] ra : patch_insn_write+0x70/0x364
> > [ 15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
> > [ 15.767622] gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
> > [ 15.767919] t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
> > [ 15.768238] s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
> > [ 15.768537] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> > [ 15.768837] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> > [ 15.769139] s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
> > [ 15.769447] s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
> > [ 15.769740] s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
> > [ 15.770027] s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
> > [ 15.770323] t5 : ffffffff819af098 t6 : ff2000000067ba28
> > [ 15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > [ 15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
> > [ 15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
> > [ 15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
> > [ 15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
> > [ 15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
> > [ 15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
> > [ 15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
> > [ 15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
> > [ 15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
> > [ 15.773471] ---[ end trace 0000000000000000 ]---
>
> FWIW, you can always crop the [15.192321] stuff out of commit messages,
> as it just adds noise.
>
> > By the way, this also fixes the same issue for patch_text().
> >
> > Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
> > Co-developed-by: Guo Ren <[email protected]>
> > Signed-off-by: Guo Ren <[email protected]>
> > Cc: Zong Li <[email protected]>
> > Cc: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Changbin Du <[email protected]>
> > ---
> > Changes in v3:
> > - denote this also fixes function patch_text().
> >
> > Changes in v2:
> > - Rewrite commit log with lockdep explanation [Guo Ren]
> > - Rebase on v6.1 [Guo Ren]
> >
> > v1:
> > https://lore.kernel.org/linux-riscv/[email protected]/
> > ---
> > arch/riscv/kernel/patch.c | 7 -------
> > 1 file changed, 7 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > index 765004b60513..8619706f8dfd 100644
> > --- a/arch/riscv/kernel/patch.c
> > +++ b/arch/riscv/kernel/patch.c
> > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> > bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> > int ret;
> >
> > - /*
> > - * Before reaching here, it was expected to lock the text_mutex
> > - * already, so we don't need to give another lock here and could
> > - * ensure that it was safe between each cores.
> > - */
> > - lockdep_assert_held(&text_mutex);
>
> I must admit, patches like this do concern me a little, as a someone
> unfamiliar with the world of probing and tracing.
> Seeing an explicit check that the lock was held, leads me to believe
> that the original author (Zong Li I think) thought that the text_mutex
> lock was insufficient.
> Do you think that their fear is unfounded? Explaining why it is safe to
> remove this assertion in the commit message would go a long way towards
> easing my anxiety!
>
> Also, why delete the comment altogether? The comment provides some
> information that doesn't appear to become invalid, even with the
> assertion removed?
Stop_machine separated the mutex context and made a lockdep warning.
So text_mutex can't be used here. We need to find another check
solution. I agree with the patch.

>
> Thanks,
> Conor.
>
> > -
> > if (across_pages)
> > patch_map(addr + len, FIX_TEXT_POKE1);
> >
> > --
> > 2.25.1
> >
> >



--
Best Regards
Guo Ren

2023-01-31 07:51:01

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> On Mon, Jan 30, 2023 at 11:10 PM Conor Dooley
> <[email protected]> wrote:
> >
> > Hey Changbin,
> >
> > On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote:
> > > From: Changbin Du <[email protected]>
> > >
> > > The task of ftrace_arch_code_modify(_post)_prepare() caller is
> > > stop_machine, whose caller and work thread are of different tasks. The
> > > lockdep checker needs the same task context, or it's wrong. That means
> > > it's a bug here to use lockdep_assert_held because we don't guarantee
> > > the same task context.
> > >
> > > kernel/locking/lockdep.c:
> > > int __lock_is_held(const struct lockdep_map *lock, int read)
> > > {
> > > struct task_struct *curr = current;
> > > int i;
> > >
> > > for (i = 0; i < curr->lockdep_depth; i++) {
> > > ^^^^^^^^^^^^^^^^^^^
> > > struct held_lock *hlock = curr->held_locks + i;
> > > ^^^^^^^^^^^^^^^^
> > > if (match_held_lock(hlock, lock)) {
> > > if (read == -1 || !!hlock->read == read)
> > > return LOCK_STATE_HELD;
> > >
> > > The __lock_is_held depends on current held_locks records; if
> > > stop_machine makes the checker runing on another task, that's wrong.
> > >
> > > Here is the log:
> > > [ 15.761523] ------------[ cut here ]------------
> > > [ 15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
> > > [ 15.763258] Modules linked in:
> > > [ 15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
> > > [ 15.765339] Hardware name: riscv-virtio,qemu (DT)
> > > [ 15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
> > > [ 15.766711] epc : patch_insn_write+0x72/0x364
> > > [ 15.767011] ra : patch_insn_write+0x70/0x364
> > > [ 15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
> > > [ 15.767622] gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
> > > [ 15.767919] t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
> > > [ 15.768238] s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
> > > [ 15.768537] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> > > [ 15.768837] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> > > [ 15.769139] s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
> > > [ 15.769447] s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
> > > [ 15.769740] s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
> > > [ 15.770027] s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
> > > [ 15.770323] t5 : ffffffff819af098 t6 : ff2000000067ba28
> > > [ 15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > > [ 15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
> > > [ 15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
> > > [ 15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
> > > [ 15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
> > > [ 15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
> > > [ 15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
> > > [ 15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
> > > [ 15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
> > > [ 15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
> > > [ 15.773471] ---[ end trace 0000000000000000 ]---
> >
> > FWIW, you can always crop the [15.192321] stuff out of commit messages,
> > as it just adds noise.
> >
> > > By the way, this also fixes the same issue for patch_text().
> > >
> > > Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
> > > Co-developed-by: Guo Ren <[email protected]>
> > > Signed-off-by: Guo Ren <[email protected]>
> > > Cc: Zong Li <[email protected]>
> > > Cc: Palmer Dabbelt <[email protected]>
> > > Signed-off-by: Changbin Du <[email protected]>
> > > ---
> > > Changes in v3:
> > > - denote this also fixes function patch_text().
> > >
> > > Changes in v2:
> > > - Rewrite commit log with lockdep explanation [Guo Ren]
> > > - Rebase on v6.1 [Guo Ren]
> > >
> > > v1:
> > > https://lore.kernel.org/linux-riscv/[email protected]/
> > > ---
> > > arch/riscv/kernel/patch.c | 7 -------
> > > 1 file changed, 7 deletions(-)
> > >
> > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > > index 765004b60513..8619706f8dfd 100644
> > > --- a/arch/riscv/kernel/patch.c
> > > +++ b/arch/riscv/kernel/patch.c
> > > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> > > bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> > > int ret;
> > >
> > > - /*
> > > - * Before reaching here, it was expected to lock the text_mutex
> > > - * already, so we don't need to give another lock here and could
> > > - * ensure that it was safe between each cores.
> > > - */
> > > - lockdep_assert_held(&text_mutex);
> >
> > I must admit, patches like this do concern me a little, as a someone
> > unfamiliar with the world of probing and tracing.
> > Seeing an explicit check that the lock was held, leads me to believe
> > that the original author (Zong Li I think) thought that the text_mutex
> > lock was insufficient.
> > Do you think that their fear is unfounded? Explaining why it is safe to
> > remove this assertion in the commit message would go a long way towards
> > easing my anxiety!
> >
> > Also, why delete the comment altogether? The comment provides some
> > information that doesn't appear to become invalid, even with the
> > assertion removed?
> Stop_machine separated the mutex context and made a lockdep warning.
> So text_mutex can't be used here. We need to find another check
> solution. I agree with the patch.

Whether or not you agree with the change is not the point (with your SoB
I'd hope you agree with it).
I understand that you two are trying to fix a false positive lockdep
warning, but what I am asking for an explanation as to why the original
author's fear is unfounded.
Surely, having added the assertion, they were not thinking of the same
code path that you guys are hitting the false positive on?

Perhaps Zong themselves can tell us what the original fear was?

Thanks,
Conor.


Attachments:
(No filename) (6.48 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-01 05:00:35

by Changbin Du

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
[snip]
> > > >
> > > > - /*
> > > > - * Before reaching here, it was expected to lock the text_mutex
> > > > - * already, so we don't need to give another lock here and could
> > > > - * ensure that it was safe between each cores.
> > > > - */
> > > > - lockdep_assert_held(&text_mutex);
> > >
> > > I must admit, patches like this do concern me a little, as a someone
> > > unfamiliar with the world of probing and tracing.
> > > Seeing an explicit check that the lock was held, leads me to believe
> > > that the original author (Zong Li I think) thought that the text_mutex
> > > lock was insufficient.
> > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > remove this assertion in the commit message would go a long way towards
> > > easing my anxiety!
> > >
> > > Also, why delete the comment altogether? The comment provides some
> > > information that doesn't appear to become invalid, even with the
> > > assertion removed?
> > Stop_machine separated the mutex context and made a lockdep warning.
> > So text_mutex can't be used here. We need to find another check
> > solution. I agree with the patch.
>
> Whether or not you agree with the change is not the point (with your SoB
> I'd hope you agree with it).
> I understand that you two are trying to fix a false positive lockdep
> warning, but what I am asking for an explanation as to why the original
> author's fear is unfounded.
> Surely, having added the assertion, they were not thinking of the same
> code path that you guys are hitting the false positive on?
>
The assertion is reasonable since the fixmap entry is shared. The text_mutex
does should be held before entering that function. But the false positive cases
make some functions (ftrace for example) difficult to use due to warning log
storm.

Either the lockdep should be fixed for stop_machine, or remove the assertion
simply now (we can keep the comments). (or do the assertion conditionly?)

And this is not a riscv only problem but common for architectures which use
stop_machine to patch text. (arm for example)

> Perhaps Zong themselves can tell us what the original fear was?
>
> Thanks,
> Conor.



--
Cheers,
Changbin Du

2023-02-01 14:02:19

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> [snip]
> > > > >
> > > > > - /*
> > > > > - * Before reaching here, it was expected to lock the text_mutex
> > > > > - * already, so we don't need to give another lock here and could
> > > > > - * ensure that it was safe between each cores.
> > > > > - */
> > > > > - lockdep_assert_held(&text_mutex);
> > > >
> > > > I must admit, patches like this do concern me a little, as a someone
> > > > unfamiliar with the world of probing and tracing.
> > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > lock was insufficient.
> > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > remove this assertion in the commit message would go a long way towards
> > > > easing my anxiety!
> > > >
> > > > Also, why delete the comment altogether? The comment provides some
> > > > information that doesn't appear to become invalid, even with the
> > > > assertion removed?
> > > Stop_machine separated the mutex context and made a lockdep warning.
> > > So text_mutex can't be used here. We need to find another check
> > > solution. I agree with the patch.
> >
> > Whether or not you agree with the change is not the point (with your SoB
> > I'd hope you agree with it).
> > I understand that you two are trying to fix a false positive lockdep
> > warning, but what I am asking for an explanation as to why the original
> > author's fear is unfounded.
> > Surely, having added the assertion, they were not thinking of the same
> > code path that you guys are hitting the false positive on?
> >
> The assertion is reasonable since the fixmap entry is shared. The text_mutex
> does should be held before entering that function. But the false positive cases
> make some functions (ftrace for example) difficult to use due to warning log
> storm.
>
> Either the lockdep should be fixed for stop_machine, or remove the assertion
> simply now (we can keep the comments). (or do the assertion conditionly?)

How would you suggest checking it conditionally?

Also, if you persist in removing the assertion, there is a comment in
arch/riscv/kernel/ftrace.c that would need to be updated. (L129-ish)

The comment you removed in this patch seems valid both before and after
though, so I don't see a compelling reason for its removal.

> And this is not a riscv only problem but common for architectures which use
> stop_machine to patch text. (arm for example)
>
> > Perhaps Zong themselves can tell us what the original fear was?


Attachments:
(No filename) (2.71 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-02 07:00:49

by Changbin Du

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote:
> On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> > [snip]
> > > > > >
> > > > > > - /*
> > > > > > - * Before reaching here, it was expected to lock the text_mutex
> > > > > > - * already, so we don't need to give another lock here and could
> > > > > > - * ensure that it was safe between each cores.
> > > > > > - */
> > > > > > - lockdep_assert_held(&text_mutex);
> > > > >
> > > > > I must admit, patches like this do concern me a little, as a someone
> > > > > unfamiliar with the world of probing and tracing.
> > > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > > lock was insufficient.
> > > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > > remove this assertion in the commit message would go a long way towards
> > > > > easing my anxiety!
> > > > >
> > > > > Also, why delete the comment altogether? The comment provides some
> > > > > information that doesn't appear to become invalid, even with the
> > > > > assertion removed?
> > > > Stop_machine separated the mutex context and made a lockdep warning.
> > > > So text_mutex can't be used here. We need to find another check
> > > > solution. I agree with the patch.
> > >
> > > Whether or not you agree with the change is not the point (with your SoB
> > > I'd hope you agree with it).
> > > I understand that you two are trying to fix a false positive lockdep
> > > warning, but what I am asking for an explanation as to why the original
> > > author's fear is unfounded.
> > > Surely, having added the assertion, they were not thinking of the same
> > > code path that you guys are hitting the false positive on?
> > >
> > The assertion is reasonable since the fixmap entry is shared. The text_mutex
> > does should be held before entering that function. But the false positive cases
> > make some functions (ftrace for example) difficult to use due to warning log
> > storm.
> >
> > Either the lockdep should be fixed for stop_machine, or remove the assertion
> > simply now (we can keep the comments). (or do the assertion conditionly?)
>
> How would you suggest checking it conditionally?
>
Please refer to a early patch from Palmer Dabbelt.
https://lore.kernel.org/all/[email protected]/

> Also, if you persist in removing the assertion, there is a comment in
> arch/riscv/kernel/ftrace.c that would need to be updated. (L129-ish)
>
No problem.

> The comment you removed in this patch seems valid both before and after
> though, so I don't see a compelling reason for its removal.
We all agreed. The key is to get rid of false positive case.

>
> > And this is not a riscv only problem but common for architectures which use
> > stop_machine to patch text. (arm for example)
> >
> > > Perhaps Zong themselves can tell us what the original fear was?
>



--
Cheers,
Changbin Du

2023-02-02 08:02:34

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

On Fri, Feb 03, 2023 at 07:00:43AM +0800, Changbin Du wrote:

btw, something is wrong with your mail client or host machine.
Everything that you are sending is timestamped in the future,
as it is currently 15:57 on the 2nd in UTC+8.

> On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote:
> > On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> > > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> > > [snip]
> > > > > > >
> > > > > > > - /*
> > > > > > > - * Before reaching here, it was expected to lock the text_mutex
> > > > > > > - * already, so we don't need to give another lock here and could
> > > > > > > - * ensure that it was safe between each cores.
> > > > > > > - */
> > > > > > > - lockdep_assert_held(&text_mutex);
> > > > > >
> > > > > > I must admit, patches like this do concern me a little, as a someone
> > > > > > unfamiliar with the world of probing and tracing.
> > > > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > > > lock was insufficient.
> > > > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > > > remove this assertion in the commit message would go a long way towards
> > > > > > easing my anxiety!
> > > > > >
> > > > > > Also, why delete the comment altogether? The comment provides some
> > > > > > information that doesn't appear to become invalid, even with the
> > > > > > assertion removed?
> > > > > Stop_machine separated the mutex context and made a lockdep warning.
> > > > > So text_mutex can't be used here. We need to find another check
> > > > > solution. I agree with the patch.
> > > >
> > > > Whether or not you agree with the change is not the point (with your SoB
> > > > I'd hope you agree with it).
> > > > I understand that you two are trying to fix a false positive lockdep
> > > > warning, but what I am asking for an explanation as to why the original
> > > > author's fear is unfounded.
> > > > Surely, having added the assertion, they were not thinking of the same
> > > > code path that you guys are hitting the false positive on?
> > > >
> > > The assertion is reasonable since the fixmap entry is shared. The text_mutex
> > > does should be held before entering that function. But the false positive cases
> > > make some functions (ftrace for example) difficult to use due to warning log
> > > storm.
> > >
> > > Either the lockdep should be fixed for stop_machine, or remove the assertion
> > > simply now (we can keep the comments). (or do the assertion conditionly?)
> >
> > How would you suggest checking it conditionally?
> >
> Please refer to a early patch from Palmer Dabbelt.
> https://lore.kernel.org/all/[email protected]/

Oh cool, thanks for that.
Why not resend that approach, with your suggested fixup for
ftrace_init_nop() then?
It looks more complex, but is less worrisome & has an R-b from Steven
already.

Thanks,
Conor.


Attachments:
(No filename) (3.04 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-02 11:40:05

by Changbin Du

[permalink] [raw]
Subject: Re: [PATCH v3] riscv: patch: Fixup lockdep warning in stop_machine

On Thu, Feb 02, 2023 at 08:01:44AM +0000, Conor Dooley wrote:
> On Fri, Feb 03, 2023 at 07:00:43AM +0800, Changbin Du wrote:
>
> btw, something is wrong with your mail client or host machine.
> Everything that you are sending is timestamped in the future,
> as it is currently 15:57 on the 2nd in UTC+8.
>
hmm, my machine goes 12 hours ahead. Thanks for your remainding.

> > On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote:
> > > On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> > > > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > > > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> > > > [snip]
> > > > > > > >
> > > > > > > > - /*
> > > > > > > > - * Before reaching here, it was expected to lock the text_mutex
> > > > > > > > - * already, so we don't need to give another lock here and could
> > > > > > > > - * ensure that it was safe between each cores.
> > > > > > > > - */
> > > > > > > > - lockdep_assert_held(&text_mutex);
> > > > > > >
> > > > > > > I must admit, patches like this do concern me a little, as a someone
> > > > > > > unfamiliar with the world of probing and tracing.
> > > > > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > > > > lock was insufficient.
> > > > > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > > > > remove this assertion in the commit message would go a long way towards
> > > > > > > easing my anxiety!
> > > > > > >
> > > > > > > Also, why delete the comment altogether? The comment provides some
> > > > > > > information that doesn't appear to become invalid, even with the
> > > > > > > assertion removed?
> > > > > > Stop_machine separated the mutex context and made a lockdep warning.
> > > > > > So text_mutex can't be used here. We need to find another check
> > > > > > solution. I agree with the patch.
> > > > >
> > > > > Whether or not you agree with the change is not the point (with your SoB
> > > > > I'd hope you agree with it).
> > > > > I understand that you two are trying to fix a false positive lockdep
> > > > > warning, but what I am asking for an explanation as to why the original
> > > > > author's fear is unfounded.
> > > > > Surely, having added the assertion, they were not thinking of the same
> > > > > code path that you guys are hitting the false positive on?
> > > > >
> > > > The assertion is reasonable since the fixmap entry is shared. The text_mutex
> > > > does should be held before entering that function. But the false positive cases
> > > > make some functions (ftrace for example) difficult to use due to warning log
> > > > storm.
> > > >
> > > > Either the lockdep should be fixed for stop_machine, or remove the assertion
> > > > simply now (we can keep the comments). (or do the assertion conditionly?)
> > >
> > > How would you suggest checking it conditionally?
> > >
> > Please refer to a early patch from Palmer Dabbelt.
> > https://lore.kernel.org/all/[email protected]/
>
> Oh cool, thanks for that.
> Why not resend that approach, with your suggested fixup for
> ftrace_init_nop() then?
> It looks more complex, but is less worrisome & has an R-b from Steven
> already.
>
Personally I don't like the complex change, because the clients of text patching
api are very limited. So I think it's not hard to take care of it. (The arm code
also doesn't have the assertion)

So I would leave the decision making to maintainers :) Anyway I will send V4 as
a candidate.

> Thanks,
> Conor.
>



--
Cheers,
Changbin Du