2022-11-30 12:58:00

by Naresh Kamboju

[permalink] [raw]
Subject: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

[Please ignore if it is already reported, and not an expert of KCSAN]

While booting arm64 with allmodconfig following kernel BUG found,
this build is enabled with CONFIG_INIT_STACK_NONE=y

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[ 0.000000] Linux version 6.1.0-rc7-next-20221130 (tuxmake@tuxmake)
(aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils
for Debian) 2.39) #2 SMP PREEMPT_DYNAMIC @1669786411
[ 0.000000] random: crng init done
[ 0.000000] Machine model: linux,dummy-virt
...
[ 424.408466] ==================================================================
[ 424.412792] BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc
[ 424.416806]
[ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0:
[ 424.422437] p9_client_cb+0x84/0x100
[ 424.425048] req_done+0xfc/0x1c0
[ 424.427443] vring_interrupt+0x174/0x1c0
[ 424.430204] __handle_irq_event_percpu+0x2c8/0x680
[ 424.433455] handle_irq_event+0x9c/0x180
[ 424.436187] handle_fasteoi_irq+0x2b0/0x340
[ 424.439139] generic_handle_domain_irq+0x78/0xc0
[ 424.442323] __gic_handle_irq_from_irqson.isra.0+0x3d8/0x480
[ 424.446054] gic_handle_irq+0xb4/0x100
[ 424.448663] call_on_irq_stack+0x2c/0x38
[ 424.451443] do_interrupt_handler+0xd0/0x140
[ 424.454452] el1_interrupt+0x88/0xc0
[ 424.457001] el1h_64_irq_handler+0x18/0x40
[ 424.459856] el1h_64_irq+0x78/0x7c
[ 424.462331] arch_local_irq_enable+0x50/0x80
[ 424.465273] arm64_preempt_schedule_irq+0x80/0xc0
[ 424.468497] el1_interrupt+0x90/0xc0
[ 424.471096] el1h_64_irq_handler+0x18/0x40
[ 424.474009] el1h_64_irq+0x78/0x7c
[ 424.476464] __tsan_read8+0x118/0x280
[ 424.479086] __delay+0x104/0x140
[ 424.481521] __udelay+0x5c/0xc0
[ 424.483905] kcsan_setup_watchpoint+0x6cc/0x7c0
[ 424.487081] __tsan_read4+0x168/0x280
[ 424.489729] p9_client_rpc+0x1d0/0x580
[ 424.492429] p9_client_getattr_dotl+0xd0/0x3c0
[ 424.495457] v9fs_inode_from_fid_dotl+0x48/0x1c0
[ 424.498602] v9fs_vfs_lookup+0x23c/0x3c0
[ 424.501386] __lookup_slow+0x1b0/0x240
[ 424.504056] walk_component+0x168/0x280
[ 424.506807] path_lookupat+0x154/0x2c0
[ 424.509489] filename_lookup+0x160/0x2c0
[ 424.512261] vfs_statx+0xc0/0x280
[ 424.514710] vfs_fstatat+0x84/0x100
[ 424.517308] __do_sys_newfstatat+0x64/0x100
[ 424.520189] __arm64_sys_newfstatat+0x74/0xc0
[ 424.523262] invoke_syscall+0xb0/0x1c0
[ 424.525939] el0_svc_common.constprop.0+0x10c/0x180
[ 424.529219] do_el0_svc+0x54/0x80
[ 424.531662] el0_svc+0x4c/0xc0
[ 424.533944] el0t_64_sync_handler+0xc8/0x180
[ 424.536837] el0t_64_sync+0x1a4/0x1a8
[ 424.539436]
[ 424.540810] read to 0xffff00000a753000 of 4 bytes by task 74 on cpu 0:
[ 424.544927] p9_client_rpc+0x1d0/0x580
[ 424.547692] p9_client_getattr_dotl+0xd0/0x3c0
[ 424.550564] v9fs_inode_from_fid_dotl+0x48/0x1c0
[ 424.553550] v9fs_vfs_lookup+0x23c/0x3c0
[ 424.556144] __lookup_slow+0x1b0/0x240
[ 424.558655] walk_component+0x168/0x280
[ 424.561192] path_lookupat+0x154/0x2c0
[ 424.563721] filename_lookup+0x160/0x2c0
[ 424.566337] vfs_statx+0xc0/0x280
[ 424.568638] vfs_fstatat+0x84/0x100
[ 424.571051] __do_sys_newfstatat+0x64/0x100
[ 424.573821] __arm64_sys_newfstatat+0x74/0xc0
[ 424.576650] invoke_syscall+0xb0/0x1c0
[ 424.579144] el0_svc_common.constprop.0+0x10c/0x180
[ 424.582212] do_el0_svc+0x54/0x80
[ 424.584475] el0_svc+0x4c/0xc0
[ 424.586611] el0t_64_sync_handler+0xc8/0x180
[ 424.589347] el0t_64_sync+0x1a4/0x1a8
[ 424.591758]
[ 424.593045] 1 lock held by systemd-journal/74:
[ 424.595821] #0: ffff00000a0ead88
(&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x158/0x280
[ 424.601588] irq event stamp: 416642
[ 424.603875] hardirqs last enabled at (416641):
[<ffff80000a552040>] preempt_schedule_irq+0x40/0x100
[ 424.609078] hardirqs last disabled at (416642):
[<ffff80000a5422b8>] el1_interrupt+0x78/0xc0
[ 424.613887] softirqs last enabled at (416464):
[<ffff800008011130>] __do_softirq+0x5b0/0x694
[ 424.618699] softirqs last disabled at (416453):
[<ffff80000801a9b0>] ____do_softirq+0x30/0x80
[ 424.623562]
[ 424.624841] value changed: 0x00000002 -> 0x00000003
[ 424.627838]
[ 424.629117] Reported by Kernel Concurrency Sanitizer on:
[ 424.632298] CPU: 0 PID: 74 Comm: systemd-journal Tainted: G
T 6.1.0-rc7-next-20221130 #2
26b4d3787db66414ab23fce17d22967bb2169e1f
[ 424.639393] Hardware name: linux,dummy-virt (DT)

Reported-by: Linux Kernel Functional Testing <[email protected]>

--
Linaro LKFT
https://lkft.linaro.org


2022-11-30 13:31:08

by Marco Elver

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

On Wed, 30 Nov 2022 at 13:50, Naresh Kamboju <[email protected]> wrote:
>
> [Please ignore if it is already reported, and not an expert of KCSAN]
>
> While booting arm64 with allmodconfig following kernel BUG found,
> this build is enabled with CONFIG_INIT_STACK_NONE=y

Unsure why CONFIG_INIT_STACK_NONE=y is relevant.

> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> [ 0.000000] Linux version 6.1.0-rc7-next-20221130 (tuxmake@tuxmake)
> (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils
> for Debian) 2.39) #2 SMP PREEMPT_DYNAMIC @1669786411
> [ 0.000000] random: crng init done
> [ 0.000000] Machine model: linux,dummy-virt
> ...
> [ 424.408466] ==================================================================
> [ 424.412792] BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc
> [ 424.416806]
> [ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0:
> [ 424.422437] p9_client_cb+0x84/0x100

Please always provide line numbers and kernel commit hash or tag (I
think it's next-20221130, but not entirely clear).

Then we can look at git blame of the lines and see if it's new code.

> [ 424.425048] req_done+0xfc/0x1c0
> [ 424.427443] vring_interrupt+0x174/0x1c0
> [ 424.430204] __handle_irq_event_percpu+0x2c8/0x680
> [ 424.433455] handle_irq_event+0x9c/0x180
> [ 424.436187] handle_fasteoi_irq+0x2b0/0x340
> [ 424.439139] generic_handle_domain_irq+0x78/0xc0
> [ 424.442323] __gic_handle_irq_from_irqson.isra.0+0x3d8/0x480
> [ 424.446054] gic_handle_irq+0xb4/0x100
> [ 424.448663] call_on_irq_stack+0x2c/0x38
> [ 424.451443] do_interrupt_handler+0xd0/0x140
> [ 424.454452] el1_interrupt+0x88/0xc0
> [ 424.457001] el1h_64_irq_handler+0x18/0x40
> [ 424.459856] el1h_64_irq+0x78/0x7c
> [ 424.462331] arch_local_irq_enable+0x50/0x80
> [ 424.465273] arm64_preempt_schedule_irq+0x80/0xc0
> [ 424.468497] el1_interrupt+0x90/0xc0
> [ 424.471096] el1h_64_irq_handler+0x18/0x40
> [ 424.474009] el1h_64_irq+0x78/0x7c
> [ 424.476464] __tsan_read8+0x118/0x280
> [ 424.479086] __delay+0x104/0x140
> [ 424.481521] __udelay+0x5c/0xc0
> [ 424.483905] kcsan_setup_watchpoint+0x6cc/0x7c0
> [ 424.487081] __tsan_read4+0x168/0x280
> [ 424.489729] p9_client_rpc+0x1d0/0x580
> [ 424.492429] p9_client_getattr_dotl+0xd0/0x3c0
> [ 424.495457] v9fs_inode_from_fid_dotl+0x48/0x1c0
> [ 424.498602] v9fs_vfs_lookup+0x23c/0x3c0
> [ 424.501386] __lookup_slow+0x1b0/0x240
> [ 424.504056] walk_component+0x168/0x280
> [ 424.506807] path_lookupat+0x154/0x2c0
> [ 424.509489] filename_lookup+0x160/0x2c0
> [ 424.512261] vfs_statx+0xc0/0x280
> [ 424.514710] vfs_fstatat+0x84/0x100
> [ 424.517308] __do_sys_newfstatat+0x64/0x100
> [ 424.520189] __arm64_sys_newfstatat+0x74/0xc0
> [ 424.523262] invoke_syscall+0xb0/0x1c0
> [ 424.525939] el0_svc_common.constprop.0+0x10c/0x180
> [ 424.529219] do_el0_svc+0x54/0x80
> [ 424.531662] el0_svc+0x4c/0xc0
> [ 424.533944] el0t_64_sync_handler+0xc8/0x180
> [ 424.536837] el0t_64_sync+0x1a4/0x1a8
> [ 424.539436]
> [ 424.540810] read to 0xffff00000a753000 of 4 bytes by task 74 on cpu 0:
> [ 424.544927] p9_client_rpc+0x1d0/0x580
> [ 424.547692] p9_client_getattr_dotl+0xd0/0x3c0
> [ 424.550564] v9fs_inode_from_fid_dotl+0x48/0x1c0
> [ 424.553550] v9fs_vfs_lookup+0x23c/0x3c0
> [ 424.556144] __lookup_slow+0x1b0/0x240
> [ 424.558655] walk_component+0x168/0x280
> [ 424.561192] path_lookupat+0x154/0x2c0
> [ 424.563721] filename_lookup+0x160/0x2c0
> [ 424.566337] vfs_statx+0xc0/0x280
> [ 424.568638] vfs_fstatat+0x84/0x100
> [ 424.571051] __do_sys_newfstatat+0x64/0x100
> [ 424.573821] __arm64_sys_newfstatat+0x74/0xc0
> [ 424.576650] invoke_syscall+0xb0/0x1c0
> [ 424.579144] el0_svc_common.constprop.0+0x10c/0x180
> [ 424.582212] do_el0_svc+0x54/0x80
> [ 424.584475] el0_svc+0x4c/0xc0
> [ 424.586611] el0t_64_sync_handler+0xc8/0x180
> [ 424.589347] el0t_64_sync+0x1a4/0x1a8
> [ 424.591758]
> [ 424.593045] 1 lock held by systemd-journal/74:
> [ 424.595821] #0: ffff00000a0ead88
> (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x158/0x280
> [ 424.601588] irq event stamp: 416642
> [ 424.603875] hardirqs last enabled at (416641):
> [<ffff80000a552040>] preempt_schedule_irq+0x40/0x100
> [ 424.609078] hardirqs last disabled at (416642):
> [<ffff80000a5422b8>] el1_interrupt+0x78/0xc0
> [ 424.613887] softirqs last enabled at (416464):
> [<ffff800008011130>] __do_softirq+0x5b0/0x694
> [ 424.618699] softirqs last disabled at (416453):
> [<ffff80000801a9b0>] ____do_softirq+0x30/0x80
> [ 424.623562]
> [ 424.624841] value changed: 0x00000002 -> 0x00000003
> [ 424.627838]
> [ 424.629117] Reported by Kernel Concurrency Sanitizer on:
> [ 424.632298] CPU: 0 PID: 74 Comm: systemd-journal Tainted: G
> T 6.1.0-rc7-next-20221130 #2
> 26b4d3787db66414ab23fce17d22967bb2169e1f
> [ 424.639393] Hardware name: linux,dummy-virt (DT)
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> --
> Linaro LKFT
> https://lkft.linaro.org

2022-11-30 16:22:42

by Naresh Kamboju

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

On Wed, 30 Nov 2022 at 18:25, Marco Elver <[email protected]> wrote:
>
> On Wed, 30 Nov 2022 at 13:50, Naresh Kamboju <[email protected]> wrote:
> >
> > [Please ignore if it is already reported, and not an expert of KCSAN]
> >
> > While booting arm64 with allmodconfig following kernel BUG found,
> > this build is enabled with CONFIG_INIT_STACK_NONE=y
>
> Unsure why CONFIG_INIT_STACK_NONE=y is relevant.

I agree.

This is from qemu-arm64 boot log.

>
> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> > [ 0.000000] Linux version 6.1.0-rc7-next-20221130 (tuxmake@tuxmake)
> > (aarch64-linux-gnu-gcc (Debian 11.3.0-6) 11.3.0, GNU ld (GNU Binutils
> > for Debian) 2.39) #2 SMP PREEMPT_DYNAMIC @1669786411
> > [ 0.000000] random: crng init done
> > [ 0.000000] Machine model: linux,dummy-virt
> > ...
> > [ 424.408466] ==================================================================
> > [ 424.412792] BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc
> > [ 424.416806]
> > [ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0:
> > [ 424.422437] p9_client_cb+0x84/0x100
>
> Please always provide line numbers and kernel commit hash or tag (I
> think it's next-20221130, but not entirely clear).

It is the Linux next-20221130 tag.
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/?h=next-20221130

>
> Then we can look at git blame of the lines and see if it's new code.

True.
Hope that tree and tag could help you get git details.


> > [ 424.425048] req_done+0xfc/0x1c0
> > [ 424.427443] vring_interrupt+0x174/0x1c0
> > [ 424.430204] __handle_irq_event_percpu+0x2c8/0x680
> > [ 424.433455] handle_irq_event+0x9c/0x180
> > [ 424.436187] handle_fasteoi_irq+0x2b0/0x340
> > [ 424.439139] generic_handle_domain_irq+0x78/0xc0
> > [ 424.442323] __gic_handle_irq_from_irqson.isra.0+0x3d8/0x480
> > [ 424.446054] gic_handle_irq+0xb4/0x100
> > [ 424.448663] call_on_irq_stack+0x2c/0x38
> > [ 424.451443] do_interrupt_handler+0xd0/0x140
> > [ 424.454452] el1_interrupt+0x88/0xc0
> > [ 424.457001] el1h_64_irq_handler+0x18/0x40
> > [ 424.459856] el1h_64_irq+0x78/0x7c
> > [ 424.462331] arch_local_irq_enable+0x50/0x80
> > [ 424.465273] arm64_preempt_schedule_irq+0x80/0xc0
> > [ 424.468497] el1_interrupt+0x90/0xc0
> > [ 424.471096] el1h_64_irq_handler+0x18/0x40
> > [ 424.474009] el1h_64_irq+0x78/0x7c
> > [ 424.476464] __tsan_read8+0x118/0x280
> > [ 424.479086] __delay+0x104/0x140
> > [ 424.481521] __udelay+0x5c/0xc0
> > [ 424.483905] kcsan_setup_watchpoint+0x6cc/0x7c0
> > [ 424.487081] __tsan_read4+0x168/0x280
> > [ 424.489729] p9_client_rpc+0x1d0/0x580
> > [ 424.492429] p9_client_getattr_dotl+0xd0/0x3c0
> > [ 424.495457] v9fs_inode_from_fid_dotl+0x48/0x1c0
> > [ 424.498602] v9fs_vfs_lookup+0x23c/0x3c0
> > [ 424.501386] __lookup_slow+0x1b0/0x240
> > [ 424.504056] walk_component+0x168/0x280
> > [ 424.506807] path_lookupat+0x154/0x2c0
> > [ 424.509489] filename_lookup+0x160/0x2c0
> > [ 424.512261] vfs_statx+0xc0/0x280
> > [ 424.514710] vfs_fstatat+0x84/0x100
> > [ 424.517308] __do_sys_newfstatat+0x64/0x100
> > [ 424.520189] __arm64_sys_newfstatat+0x74/0xc0
> > [ 424.523262] invoke_syscall+0xb0/0x1c0
> > [ 424.525939] el0_svc_common.constprop.0+0x10c/0x180
> > [ 424.529219] do_el0_svc+0x54/0x80
> > [ 424.531662] el0_svc+0x4c/0xc0
> > [ 424.533944] el0t_64_sync_handler+0xc8/0x180
> > [ 424.536837] el0t_64_sync+0x1a4/0x1a8
> > [ 424.539436]
> > [ 424.540810] read to 0xffff00000a753000 of 4 bytes by task 74 on cpu 0:
> > [ 424.544927] p9_client_rpc+0x1d0/0x580
> > [ 424.547692] p9_client_getattr_dotl+0xd0/0x3c0
> > [ 424.550564] v9fs_inode_from_fid_dotl+0x48/0x1c0
> > [ 424.553550] v9fs_vfs_lookup+0x23c/0x3c0
> > [ 424.556144] __lookup_slow+0x1b0/0x240
> > [ 424.558655] walk_component+0x168/0x280
> > [ 424.561192] path_lookupat+0x154/0x2c0
> > [ 424.563721] filename_lookup+0x160/0x2c0
> > [ 424.566337] vfs_statx+0xc0/0x280
> > [ 424.568638] vfs_fstatat+0x84/0x100
> > [ 424.571051] __do_sys_newfstatat+0x64/0x100
> > [ 424.573821] __arm64_sys_newfstatat+0x74/0xc0
> > [ 424.576650] invoke_syscall+0xb0/0x1c0
> > [ 424.579144] el0_svc_common.constprop.0+0x10c/0x180
> > [ 424.582212] do_el0_svc+0x54/0x80
> > [ 424.584475] el0_svc+0x4c/0xc0
> > [ 424.586611] el0t_64_sync_handler+0xc8/0x180
> > [ 424.589347] el0t_64_sync+0x1a4/0x1a8
> > [ 424.591758]
> > [ 424.593045] 1 lock held by systemd-journal/74:
> > [ 424.595821] #0: ffff00000a0ead88
> > (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x158/0x280
> > [ 424.601588] irq event stamp: 416642
> > [ 424.603875] hardirqs last enabled at (416641):
> > [<ffff80000a552040>] preempt_schedule_irq+0x40/0x100
> > [ 424.609078] hardirqs last disabled at (416642):
> > [<ffff80000a5422b8>] el1_interrupt+0x78/0xc0
> > [ 424.613887] softirqs last enabled at (416464):
> > [<ffff800008011130>] __do_softirq+0x5b0/0x694
> > [ 424.618699] softirqs last disabled at (416453):
> > [<ffff80000801a9b0>] ____do_softirq+0x30/0x80
> > [ 424.623562]
> > [ 424.624841] value changed: 0x00000002 -> 0x00000003
> > [ 424.627838]
> > [ 424.629117] Reported by Kernel Concurrency Sanitizer on:
> > [ 424.632298] CPU: 0 PID: 74 Comm: systemd-journal Tainted: G
> > T 6.1.0-rc7-next-20221130 #2
> > 26b4d3787db66414ab23fce17d22967bb2169e1f
> > [ 424.639393] Hardware name: linux,dummy-virt (DT)
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org

2022-11-30 20:14:46

by Dominique Martinet

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

Naresh Kamboju wrote on Wed, Nov 30, 2022 at 09:34:45PM +0530:
> > > [ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0:
> > > [ 424.422437] p9_client_cb+0x84/0x100
> >
> > Then we can look at git blame of the lines and see if it's new code.
>
> True.
> Hope that tree and tag could help you get git details.

Even with the git tag, if we don't build for the same arch with the same
compiler version/options and the same .config we aren't likely to have
identical binaries, so we cannot make sense of these offsets without
much work.

As much as I'd like to investigate a data race in 9p (and geez that code
has been such a headache from syzbot already so I don't doubt there are
more), having line numbers is really not optional if we want to scale at
all.
If you still have the vmlinux binary from that build (or if you can
rebuild with the same options), running this text through addr2line
should not take you too long.
(You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not
reduced), but that is on by default for aarch64)

--
Dominique

2022-12-01 08:27:22

by Naresh Kamboju

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

On Thu, 1 Dec 2022 at 01:35, Dominique Martinet <[email protected]> wrote:
>
> Naresh Kamboju wrote on Wed, Nov 30, 2022 at 09:34:45PM +0530:
> > > > [ 424.418214] write to 0xffff00000a753000 of 4 bytes by interrupt on cpu 0:
> > > > [ 424.422437] p9_client_cb+0x84/0x100
> > >
> > > Then we can look at git blame of the lines and see if it's new code.
> >
> > True.
> > Hope that tree and tag could help you get git details.
>
> Even with the git tag, if we don't build for the same arch with the same
> compiler version/options and the same .config we aren't likely to have
> identical binaries, so we cannot make sense of these offsets without
> much work.
>
> As much as I'd like to investigate a data race in 9p (and geez that code
> has been such a headache from syzbot already so I don't doubt there are
> more), having line numbers is really not optional if we want to scale at
> all.
> If you still have the vmlinux binary from that build (or if you can
> rebuild with the same options), running this text through addr2line
> should not take you too long.

Please find build artifacts in this link,
- config
- vmlinux
- System.map
https://people.linaro.org/~anders.roxell/next-20221130-allmodconfig-arm64-tuxmake-build/

And

# aarch64-linux-gnu-objdump -D vmlinux|less search for p9_client_cb

Attached objdump log and here is the link.
- http://ix.io/4hk1

> (You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not
> reduced), but that is on by default for aarch64)

Thanks for the suggestions.
The Kconfig is enabled now.
CONFIG_DEBUG_INFO_REDUCED=y

> --
> Dominique


- Naresh


Attachments:
objdump-p9_client_cb.txt (2.47 kB)

2022-12-03 15:43:45

by Dominique Martinet

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

(reply out of order)

Naresh Kamboju wrote on Thu, Dec 01, 2022 at 01:13:25PM +0530:
> > (You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not
> > reduced), but that is on by default for aarch64)
>
> Thanks for the suggestions.
> The Kconfig is enabled now.
> CONFIG_DEBUG_INFO_REDUCED=y

It looks enabled in your the config file you linked at, I don't
understand this remark?
Did you produce the trace the other day without it and rebuild the
kernel with it?
In this case you also have CONFIG_DEBUG_INFO_SPLIT set, so the vmlinux
file does not contain enough informations to retrieve line numbers or
types, and in particular addr2line cannot be used on the files you
provided.
I've never used split debug infos before, but digging old threads I'm
not too hopeful unless that changed:
https://lkml.iu.edu/hypermail/linux/kernel/1711.1/03393.html
https://sourceware.org/bugzilla/show_bug.cgi?id=22434

(...a test build later, it's still mostly useless...
normal build
$ ./scripts/faddr2line vmlinux __schedule+0x314
__schedule+0x314/0x6c0:
perf_fetch_caller_regs at include/linux/perf_event.h:1286
(inlined by) __perf_sw_event_sched at include/linux/perf_event.h:1307
(inlined by) perf_event_task_sched_out at include/linux/perf_event.h:1347
(inlined by) prepare_task_switch at kernel/sched/core.c:5053
(inlined by) context_switch at kernel/sched/core.c:5195
(inlined by) __schedule at kernel/sched/core.c:6561

split dwarf build
$ ./scripts/faddr2line vmlinux __schedule+0x314
aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923
__schedule+0x314/0x780:
aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923
__schedule at core.c:?

I'd tend to agree build time/space savings aren't worth the developer
time.
)

Anyway, address sanitizer used to have a kasan_symbolize.py script but
it looks like it got removed as no longer maintained, and I'm not sure
what's a good tool to just run these logs through nowadays, might want
to ask other test projects folks what they use...

> > If you still have the vmlinux binary from that build (or if you can
> > rebuild with the same options), running this text through addr2line
> > should not take you too long.
>
> Please find build artifacts in this link,
> - config
> - vmlinux
> - System.map
> https://people.linaro.org/~anders.roxell/next-20221130-allmodconfig-arm64-tuxmake-build/

So from the disassembly...

- p9_client_cb+0x84 is right before the wake_up and after the wmb(), so
I assume we're on writing req->status line 441:

---
p9_client_cb(...)
{
...
smp_wmb();
req->status = status;

wake_up(&req->wq);
---

report is about a write from 2 to 3, this makes sense we're going from
REQ_STATUS_SENT (2) to REQ_STATUS_RCVD (3).


- p9_client_rpc+0x1d0 isn't as simple to pin down as I'm having a hard
time making sense of the kcsan instrumentations...
The report is talking about a READ of 4 bytes at the same address, so
I'd expect to see an ccess to req->status (and we're likely spot on
wait_event_killable which checks req->status), but this doesn't seem to
match up with the assembly: here's the excerpt from disass around 0x1d0
= 464 (why doesn't gdb provide hex offsets..)
---
0xffff80000a46e9b8 <+440>: cmn w28, #0x200
0xffff80000a46e9bc <+444>: ccmn w28, #0xe, #0x4, ne // ne = any
0xffff80000a46e9c0 <+448>: b.eq 0xffff80000a46ecfc <p9_client_rpc+1276> // b.none
0xffff80000a46e9c4 <+452>: mov x0, x25
0xffff80000a46e9c8 <+456>: bl 0xffff800008543640 <__tsan_write4>
0xffff80000a46e9cc <+460>: mov w0, #0x2 // #2
0xffff80000a46e9d0 <+464>: str w0, [x21, #88]
0xffff80000a46e9d4 <+468>: b 0xffff80000a46ecfc <p9_client_rpc+1276>
0xffff80000a46e9d8 <+472>: mov w27, #0x1 // #1
0xffff80000a46e9dc <+476>: mov x0, x23
0xffff80000a46e9e0 <+480>: mov w1, #0x2bc // #700
0xffff80000a46e9e4 <+484>: bl 0xffff800008192d80 <__might_sleep>
---

+464 is a write to x21 (client 'c', from looking at how it is passed
into x0 for other function calls) at offset 88 (status field according
to dwarf infos from a rebuild with your config/same sources)

So, err, I'm a bit lost on this side.
But I can't really find a problem with what KCSAN complains about --
we are indeed accessing status from two threads without any locks.
Instead of a lock, we're using a barrier so that:
- recv thread/cb: writes to req stuff || write to req status
- p9_client_rpc: reads req status || reads other fields from req

Which has been working well enough (at least, without the barrier things
blow up quite fast).

So can I'll just consider this a false positive, but if someone knows
how much one can read into this that'd be appreciated.


Thanks,
--
Dominique


2022-12-03 17:37:25

by Marco Elver

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

On Sat, 3 Dec 2022 at 16:37, Dominique Martinet <[email protected]> wrote:
>
> (reply out of order)
>
> Naresh Kamboju wrote on Thu, Dec 01, 2022 at 01:13:25PM +0530:
> > > (You might need to build with at least CONFIG_DEBUG_INFO_REDUCED (or not
> > > reduced), but that is on by default for aarch64)
> >
> > Thanks for the suggestions.
> > The Kconfig is enabled now.
> > CONFIG_DEBUG_INFO_REDUCED=y
>
> It looks enabled in your the config file you linked at, I don't
> understand this remark?
> Did you produce the trace the other day without it and rebuild the
> kernel with it?
> In this case you also have CONFIG_DEBUG_INFO_SPLIT set, so the vmlinux
> file does not contain enough informations to retrieve line numbers or
> types, and in particular addr2line cannot be used on the files you
> provided.
> I've never used split debug infos before, but digging old threads I'm
> not too hopeful unless that changed:
> https://lkml.iu.edu/hypermail/linux/kernel/1711.1/03393.html
> https://sourceware.org/bugzilla/show_bug.cgi?id=22434
>
> (...a test build later, it's still mostly useless...
> normal build
> $ ./scripts/faddr2line vmlinux __schedule+0x314
> __schedule+0x314/0x6c0:
> perf_fetch_caller_regs at include/linux/perf_event.h:1286
> (inlined by) __perf_sw_event_sched at include/linux/perf_event.h:1307
> (inlined by) perf_event_task_sched_out at include/linux/perf_event.h:1347
> (inlined by) prepare_task_switch at kernel/sched/core.c:5053
> (inlined by) context_switch at kernel/sched/core.c:5195
> (inlined by) __schedule at kernel/sched/core.c:6561
>
> split dwarf build
> $ ./scripts/faddr2line vmlinux __schedule+0x314
> aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923
> __schedule+0x314/0x780:
> aarch64-linux-gnu-addr2line: DWARF error: could not find abbrev number 860923
> __schedule at core.c:?
>
> I'd tend to agree build time/space savings aren't worth the developer
> time.
> )
>
> Anyway, address sanitizer used to have a kasan_symbolize.py script but
> it looks like it got removed as no longer maintained, and I'm not sure
> what's a good tool to just run these logs through nowadays, might want
> to ask other test projects folks what they use...
>
> > > If you still have the vmlinux binary from that build (or if you can
> > > rebuild with the same options), running this text through addr2line
> > > should not take you too long.
> >
> > Please find build artifacts in this link,
> > - config
> > - vmlinux
> > - System.map
> > https://people.linaro.org/~anders.roxell/next-20221130-allmodconfig-arm64-tuxmake-build/
>
> So from the disassembly...
>
> - p9_client_cb+0x84 is right before the wake_up and after the wmb(), so
> I assume we're on writing req->status line 441:
>
> ---
> p9_client_cb(...)
> {
> ...
> smp_wmb();
> req->status = status;
>
> wake_up(&req->wq);
> ---
>
> report is about a write from 2 to 3, this makes sense we're going from
> REQ_STATUS_SENT (2) to REQ_STATUS_RCVD (3).
>
>
> - p9_client_rpc+0x1d0 isn't as simple to pin down as I'm having a hard
> time making sense of the kcsan instrumentations...
> The report is talking about a READ of 4 bytes at the same address, so
> I'd expect to see an ccess to req->status (and we're likely spot on
> wait_event_killable which checks req->status), but this doesn't seem to
> match up with the assembly: here's the excerpt from disass around 0x1d0
> = 464 (why doesn't gdb provide hex offsets..)
> ---
> 0xffff80000a46e9b8 <+440>: cmn w28, #0x200
> 0xffff80000a46e9bc <+444>: ccmn w28, #0xe, #0x4, ne // ne = any
> 0xffff80000a46e9c0 <+448>: b.eq 0xffff80000a46ecfc <p9_client_rpc+1276> // b.none
> 0xffff80000a46e9c4 <+452>: mov x0, x25
> 0xffff80000a46e9c8 <+456>: bl 0xffff800008543640 <__tsan_write4>
> 0xffff80000a46e9cc <+460>: mov w0, #0x2 // #2
> 0xffff80000a46e9d0 <+464>: str w0, [x21, #88]
> 0xffff80000a46e9d4 <+468>: b 0xffff80000a46ecfc <p9_client_rpc+1276>
> 0xffff80000a46e9d8 <+472>: mov w27, #0x1 // #1
> 0xffff80000a46e9dc <+476>: mov x0, x23
> 0xffff80000a46e9e0 <+480>: mov w1, #0x2bc // #700
> 0xffff80000a46e9e4 <+484>: bl 0xffff800008192d80 <__might_sleep>
> ---
>
> +464 is a write to x21 (client 'c', from looking at how it is passed
> into x0 for other function calls) at offset 88 (status field according
> to dwarf infos from a rebuild with your config/same sources)
>
> So, err, I'm a bit lost on this side.
> But I can't really find a problem with what KCSAN complains about --
> we are indeed accessing status from two threads without any locks.
> Instead of a lock, we're using a barrier so that:
> - recv thread/cb: writes to req stuff || write to req status
> - p9_client_rpc: reads req status || reads other fields from req
>
> Which has been working well enough (at least, without the barrier things
> blow up quite fast).
>
> So can I'll just consider this a false positive, but if someone knows
> how much one can read into this that'd be appreciated.

The barriers only ensure ordering, but not atomicity of the accesses
themselves (for one, the compiler is well in its right to transform
plain accesses in ways that the concurrent algorithm wasn't designed
for). In this case it looks like it's just missing
READ_ONCE()/WRITE_ONCE().

A (relatively) quick primer on the kernel's memory model and
where/what/how we need to "mark" accesses:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt

Thanks,
-- Marco

2022-12-03 23:49:49

by Dominique Martinet

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

Marco Elver wrote on Sat, Dec 03, 2022 at 05:46:46PM +0100:
> > But I can't really find a problem with what KCSAN complains about --
> > we are indeed accessing status from two threads without any locks.
> > Instead of a lock, we're using a barrier so that:
> > - recv thread/cb: writes to req stuff || write to req status
> > - p9_client_rpc: reads req status || reads other fields from req
> >
> > Which has been working well enough (at least, without the barrier things
> > blow up quite fast).
> >
> > So can I'll just consider this a false positive, but if someone knows
> > how much one can read into this that'd be appreciated.
>
> The barriers only ensure ordering, but not atomicity of the accesses
> themselves (for one, the compiler is well in its right to transform
> plain accesses in ways that the concurrent algorithm wasn't designed
> for). In this case it looks like it's just missing
> READ_ONCE()/WRITE_ONCE().

Aha! Thanks for this!

I've always believed plain int types accesses are always atomic and the
only thing to watch for would be compilers reordering instrucions, which
would be ensured by the barrier in this case, but I guess there are some
architectures or places where this isn't true?


I'm a bit confused though, I can only see five places where wait_event*
functions use READ_ONCE and I believe they more or less all would
require such a marker -- I guess non-equality checks might be safe
(waiting for a value to change from a known value) but if non-atomic
updates are on the table equality and comparisons checks all would need
to be decorated with READ_ONCE; afaiu, unlike usespace loops with
pthread_cond_wait there is nothing protecting the condition itself.

Should I just update the wrapped condition, as below?

- err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
+ err = wait_event_killable(req->wq,
+ READ_ONCE(req->status) >= REQ_STATUS_RCVD);

The writes all are straightforward, there's all the error paths to
convert to WRITE_ONCE too but that's not difficult (leaving only the
init without such a marker); I'll send a patch when you've confirmed the
read looks good.
(the other reads are a bit less obvious as some are protected by a lock
in trans_fd, which should cover all cases of possible concurrent updates
there as far as I can see, but this mixed model is definitely hard to
reason with... Well, that's how it was written and I won't ever have time
to rewrite any of this. Enough ranting.)


> A (relatively) quick primer on the kernel's memory model and
> where/what/how we need to "mark" accesses:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt

I read Documentation/memory-barriers.txt ages ago but wasn't aware of
this memory-model directory; I've skimmed through and will have a proper
read as time permits.

Thank you,
--
Dominique

2022-12-05 07:21:46

by Marco Elver

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

On Sun, 4 Dec 2022 at 00:08, Dominique Martinet <[email protected]> wrote:
>
> Marco Elver wrote on Sat, Dec 03, 2022 at 05:46:46PM +0100:
> > > But I can't really find a problem with what KCSAN complains about --
> > > we are indeed accessing status from two threads without any locks.
> > > Instead of a lock, we're using a barrier so that:
> > > - recv thread/cb: writes to req stuff || write to req status
> > > - p9_client_rpc: reads req status || reads other fields from req
> > >
> > > Which has been working well enough (at least, without the barrier things
> > > blow up quite fast).
> > >
> > > So can I'll just consider this a false positive, but if someone knows
> > > how much one can read into this that'd be appreciated.
> >
> > The barriers only ensure ordering, but not atomicity of the accesses
> > themselves (for one, the compiler is well in its right to transform
> > plain accesses in ways that the concurrent algorithm wasn't designed
> > for). In this case it looks like it's just missing
> > READ_ONCE()/WRITE_ONCE().
>
> Aha! Thanks for this!
>
> I've always believed plain int types accesses are always atomic and the
> only thing to watch for would be compilers reordering instrucions, which
> would be ensured by the barrier in this case, but I guess there are some
> architectures or places where this isn't true?
>
>
> I'm a bit confused though, I can only see five places where wait_event*
> functions use READ_ONCE and I believe they more or less all would
> require such a marker -- I guess non-equality checks might be safe
> (waiting for a value to change from a known value) but if non-atomic
> updates are on the table equality and comparisons checks all would need
> to be decorated with READ_ONCE; afaiu, unlike usespace loops with
> pthread_cond_wait there is nothing protecting the condition itself.
>
> Should I just update the wrapped condition, as below?
>
> - err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
> + err = wait_event_killable(req->wq,
> + READ_ONCE(req->status) >= REQ_STATUS_RCVD);

Yes, this looks good!

> The writes all are straightforward, there's all the error paths to
> convert to WRITE_ONCE too but that's not difficult (leaving only the
> init without such a marker); I'll send a patch when you've confirmed the
> read looks good.
> (the other reads are a bit less obvious as some are protected by a lock
> in trans_fd, which should cover all cases of possible concurrent updates
> there as far as I can see, but this mixed model is definitely hard to
> reason with... Well, that's how it was written and I won't ever have time
> to rewrite any of this. Enough ranting.)

If the lock-protected accesses indeed are non-racy, they should be
left unmarked. If some assumption here turns out to be wrong, KCSAN
would (hopefully) tell us one way or another.

Thanks!

-- Marco

2022-12-05 07:30:33

by Dominique Martinet

[permalink] [raw]
Subject: Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

Marco Elver wrote on Mon, Dec 05, 2022 at 08:00:00AM +0100:
> > Should I just update the wrapped condition, as below?
> >
> > - err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
> > + err = wait_event_killable(req->wq,
> > + READ_ONCE(req->status) >= REQ_STATUS_RCVD);
>
> Yes, this looks good!
>
> > The writes all are straightforward, there's all the error paths to
> > convert to WRITE_ONCE too but that's not difficult (leaving only the
> > init without such a marker); I'll send a patch when you've confirmed the
> > read looks good.
> > (the other reads are a bit less obvious as some are protected by a lock
> > in trans_fd, which should cover all cases of possible concurrent updates
> > there as far as I can see, but this mixed model is definitely hard to
> > reason with... Well, that's how it was written and I won't ever have time
> > to rewrite any of this. Enough ranting.)
>
> If the lock-protected accesses indeed are non-racy, they should be
> left unmarked. If some assumption here turns out to be wrong, KCSAN
> would (hopefully) tell us one way or another.

Great, that makes sense.

I've left the commit at home, will submit it tonight -- you and Naresh
will be in Cc from suggested/reported-by tags.

--
Dominique