2024-05-31 05:53:10

by kernel test robot

[permalink] [raw]
Subject: [linus:master] [shmem,percpu_counter] beb9868628: BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add



Hello,

kernel test robot noticed "BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add" on:

commit: beb9868628445306958fd7b2da1cd369a4a381cc ("shmem,percpu_counter: add _limited_add(fbc, limit, amount)")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 2bfcfd584ff5ccc8bb7acde19b42570414bf880b]
[test failed on linux-next/master 6dc544b66971c7f9909ff038b62149105272d26a]

in testcase: trinity
version:
with following parameters:

runtime: 300s
group: group-01
nr_groups: 5



compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


in our tests,
dmesg.BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add
does not always happen, 88 times out of 150 runs as below.


3022fd7af9604d44 beb9868628445306958fd7b2da1
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:60 147% 88:150 dmesg.BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]



[ 301.127451][ T4651] ==================================================================
[ 301.128485][ T4651] BUG: KCSAN: data-race in __percpu_counter_limited_add / __percpu_counter_limited_add
[ 301.129715][ T4651]
[ 301.130014][ T4651] write to 0xffff88810ffc3450 of 8 bytes by task 4674 on cpu 0:
[ 301.130981][ T4651] __percpu_counter_limited_add (lib/percpu_counter.c:325)
[ 301.131692][ T4651] shmem_inode_acct_blocks (mm/shmem.c:220)
[ 301.132319][ T4651] shmem_get_folio_gfp+0x3a9/0x750
[ 301.133054][ T4651] shmem_write_begin (mm/shmem.c:2135 mm/shmem.c:2692)
[ 301.133690][ T4651] generic_perform_write (mm/filemap.c:3962)
[ 301.134347][ T4651] shmem_file_write_iter (mm/shmem.c:2868)
[ 301.134956][ T4651] do_iter_readv_writev (include/linux/fs.h:1956 fs/read_write.c:735)
[ 301.135628][ T4651] do_iter_write (fs/read_write.c:860)
[ 301.136178][ T4651] vfs_writev (fs/read_write.c:933)
[ 301.136716][ T4651] __x64_sys_pwritev2 (fs/read_write.c:1030 fs/read_write.c:1089 fs/read_write.c:1080 fs/read_write.c:1080)
[ 301.139719][ T4651] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 301.140316][ T4651] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 301.141129][ T4651]
[ 301.141465][ T4651] read to 0xffff88810ffc3450 of 8 bytes by task 4651 on cpu 1:
[ 301.142436][ T4651] __percpu_counter_limited_add (lib/percpu_counter.c:302 (discriminator 1))
[ 301.143222][ T4651] shmem_inode_acct_blocks (mm/shmem.c:220)
[ 301.143931][ T4651] shmem_get_folio_gfp+0x3a9/0x750
[ 301.144696][ T4651] shmem_write_begin (mm/shmem.c:2135 mm/shmem.c:2692)
[ 301.145370][ T4651] generic_perform_write (mm/filemap.c:3962)
[ 301.146112][ T4651] shmem_file_write_iter (mm/shmem.c:2868)
[ 301.146697][ T4651] do_iter_readv_writev (include/linux/fs.h:1956 fs/read_write.c:735)
[ 301.147330][ T4651] do_iter_write (fs/read_write.c:860)
[ 301.147918][ T4651] vfs_writev (fs/read_write.c:933)
[ 301.148480][ T4651] __x64_sys_pwritev2 (fs/read_write.c:1030 fs/read_write.c:1089 fs/read_write.c:1080 fs/read_write.c:1080)
[ 301.149047][ T4651] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 301.149629][ T4651] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 301.150358][ T4651]
[ 301.150643][ T4651] value changed: 0x0000000000059e35 -> 0x0000000000059e56
[ 301.151483][ T4651]
[ 301.151778][ T4651] Reported by Kernel Concurrency Sanitizer on:
[ 301.152559][ T4651] CPU: 1 PID: 4651 Comm: trinity-c7 Not tainted 6.6.0-rc4-00235-gbeb986862844 #1
[ 301.153676][ T4651] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 301.154962][ T4651] ==================================================================

..

[ 341.345671][ T4958] ==================================================================
[ 341.346844][ T4958] BUG: KCSAN: data-race in __percpu_counter_limited_add / __percpu_counter_limited_add
[ 341.348131][ T4958]
[ 341.348468][ T4958] write to 0xffff88810ffc3450 of 8 bytes by task 4967 on cpu 0:
[ 341.349472][ T4958] __percpu_counter_limited_add (lib/percpu_counter.c:325)
[ 341.350219][ T4958] shmem_inode_acct_blocks (mm/shmem.c:220)
[ 341.350902][ T4958] shmem_get_folio_gfp+0x3a9/0x750
[ 341.351630][ T4958] shmem_write_begin (mm/shmem.c:2135 mm/shmem.c:2692)
[ 341.352270][ T4958] generic_perform_write (mm/filemap.c:3962)
[ 341.352893][ T4958] shmem_file_write_iter (mm/shmem.c:2868)
[ 341.353521][ T4958] do_iter_readv_writev (include/linux/fs.h:1956 fs/read_write.c:735)
[ 341.354170][ T4958] do_iter_write (fs/read_write.c:860)
[ 341.354738][ T4958] vfs_writev (fs/read_write.c:933)
[ 341.355289][ T4958] __x64_sys_pwritev2 (fs/read_write.c:1030 fs/read_write.c:1089 fs/read_write.c:1080 fs/read_write.c:1080)
[ 341.355926][ T4958] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 341.356511][ T4958] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 341.357277][ T4958]
[ 341.357592][ T4958] read to 0xffff88810ffc3450 of 8 bytes by task 4958 on cpu 1:
[ 341.358556][ T4958] __percpu_counter_limited_add (lib/percpu_counter.c:302 (discriminator 1))
[ 341.359312][ T4958] shmem_inode_acct_blocks (mm/shmem.c:220)
[ 341.360021][ T4958] shmem_get_folio_gfp+0x3a9/0x750
[ 341.360772][ T4958] shmem_write_begin (mm/shmem.c:2135 mm/shmem.c:2692)
[ 341.361417][ T4958] generic_perform_write (mm/filemap.c:3962)
[ 341.362106][ T4958] shmem_file_write_iter (mm/shmem.c:2868)
[ 341.362751][ T4958] do_iter_readv_writev (include/linux/fs.h:1956 fs/read_write.c:735)
[ 341.363443][ T4958] do_iter_write (fs/read_write.c:860)
[ 341.364025][ T4958] vfs_writev (fs/read_write.c:933)
[ 341.364585][ T4958] __x64_sys_pwritev2 (fs/read_write.c:1030 fs/read_write.c:1089 fs/read_write.c:1080 fs/read_write.c:1080)
[ 341.365226][ T4958] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 341.365824][ T4958] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 341.366587][ T4958]
[ 341.366921][ T4958] value changed: 0x000000000008c0e8 -> 0x000000000008c109
[ 341.367883][ T4958]
[ 341.368191][ T4958] Reported by Kernel Concurrency Sanitizer on:
[ 341.369014][ T4958] CPU: 1 PID: 4958 Comm: trinity-c5 Not tainted 6.6.0-rc4-00235-gbeb986862844 #1
[ 341.370255][ T4958] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 341.371512][ T4958] ==================================================================



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240531/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



2024-05-31 10:29:47

by Mateusz Guzik

[permalink] [raw]
Subject: Re: [linus:master] [shmem,percpu_counter] beb9868628: BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add

On Fri, May 31, 2024 at 01:52:40PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add" on:
>
> commit: beb9868628445306958fd7b2da1cd369a4a381cc ("shmem,percpu_counter: add _limited_add(fbc, limit, amount)")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master 2bfcfd584ff5ccc8bb7acde19b42570414bf880b]
> [test failed on linux-next/master 6dc544b66971c7f9909ff038b62149105272d26a]
>
> in testcase: trinity
> version:
> with following parameters:
>
> runtime: 300s
> group: group-01
> nr_groups: 5
>
>
>
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> in our tests,
> dmesg.BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add
> does not always happen, 88 times out of 150 runs as below.
>
>
> 3022fd7af9604d44 beb9868628445306958fd7b2da1
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :60 147% 88:150 dmesg.BUG:KCSAN:data-race_in__percpu_counter_limited_add/__percpu_counter_limited_add
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-lkp/[email protected]
>
>
>
> [ 301.127451][ T4651] ==================================================================
> [ 301.128485][ T4651] BUG: KCSAN: data-race in __percpu_counter_limited_add / __percpu_counter_limited_add
> [ 301.129715][ T4651]
> [ 301.130014][ T4651] write to 0xffff88810ffc3450 of 8 bytes by task 4674 on cpu 0:
> [ 301.130981][ T4651] __percpu_counter_limited_add (lib/percpu_counter.c:325)
> [ 301.131692][ T4651] shmem_inode_acct_blocks (mm/shmem.c:220)
> [ 301.132319][ T4651] shmem_get_folio_gfp+0x3a9/0x750
> [ 301.133054][ T4651] shmem_write_begin (mm/shmem.c:2135 mm/shmem.c:2692)
> [ 301.133690][ T4651] generic_perform_write (mm/filemap.c:3962)
> [ 301.134347][ T4651] shmem_file_write_iter (mm/shmem.c:2868)
> [ 301.134956][ T4651] do_iter_readv_writev (include/linux/fs.h:1956 fs/read_write.c:735)
> [ 301.135628][ T4651] do_iter_write (fs/read_write.c:860)
> [ 301.136178][ T4651] vfs_writev (fs/read_write.c:933)
> [ 301.136716][ T4651] __x64_sys_pwritev2 (fs/read_write.c:1030 fs/read_write.c:1089 fs/read_write.c:1080 fs/read_write.c:1080)
> [ 301.139719][ T4651] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [ 301.140316][ T4651] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [ 301.141129][ T4651]
> [ 301.141465][ T4651] read to 0xffff88810ffc3450 of 8 bytes by task 4651 on cpu 1:
> [ 301.142436][ T4651] __percpu_counter_limited_add (lib/percpu_counter.c:302 (discriminator 1))
> [ 301.143222][ T4651] shmem_inode_acct_blocks (mm/shmem.c:220)
> [ 301.143931][ T4651] shmem_get_folio_gfp+0x3a9/0x750
> [ 301.144696][ T4651] shmem_write_begin (mm/shmem.c:2135 mm/shmem.c:2692)
> [ 301.145370][ T4651] generic_perform_write (mm/filemap.c:3962)
> [ 301.146112][ T4651] shmem_file_write_iter (mm/shmem.c:2868)
> [ 301.146697][ T4651] do_iter_readv_writev (include/linux/fs.h:1956 fs/read_write.c:735)
> [ 301.147330][ T4651] do_iter_write (fs/read_write.c:860)
> [ 301.147918][ T4651] vfs_writev (fs/read_write.c:933)
> [ 301.148480][ T4651] __x64_sys_pwritev2 (fs/read_write.c:1030 fs/read_write.c:1089 fs/read_write.c:1080 fs/read_write.c:1080)
> [ 301.149047][ T4651] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [ 301.149629][ T4651] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [ 301.150358][ T4651]
> [ 301.150643][ T4651] value changed: 0x0000000000059e35 -> 0x0000000000059e56
> [ 301.151483][ T4651]
> [ 301.151778][ T4651] Reported by Kernel Concurrency Sanitizer on:
> [ 301.152559][ T4651] CPU: 1 PID: 4651 Comm: trinity-c7 Not tainted 6.6.0-rc4-00235-gbeb986862844 #1
> [ 301.153676][ T4651] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 301.154962][ T4651] ==================================================================
>

This is the bug I described in my mail:
https://lore.kernel.org/linux-mm/5eemkb4lo5eefp7ijgncgogwmadyzmvjfjmmmvfiki6cwdskfs@hi2z4drqeuz6/

write side:
if (good) {
count = __this_cpu_read(*fbc->counters);
fbc->count += count + amount; <---- line 325
__this_cpu_sub(*fbc->counters, count);
}


read side:
/* Skip taking the lock when safe */
if (abs(count + amount) <= batch &&
fbc->count + unknown <= limit) { <---- line 302
this_cpu_add(*fbc->counters, amount);
local_irq_restore(flags);
return true;
}


As described in the e-mail I don't believe the existence of the routine
is warranted to begin with.

>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240531/[email protected]
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>