2024-04-16 06:27:47

by Oliver Sang

[permalink] [raw]
Subject: [tip:timers/core] [timekeeping] e84f43e34f: BUG:KCSAN:data-race_in_timekeeping_advance/timekeeping_debug_get_ns



Hello,

kernel test robot noticed "BUG:KCSAN:data-race_in_timekeeping_advance/timekeeping_debug_get_ns" on:

commit: e84f43e34faf85816587f80594541ec978449d6e ("timekeeping: Consolidate timekeeping helpers")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core

[test failed on linux-next/master 9ed46da14b9b9b2ad4edb3b0c545b6dbe5c00d39]

in testcase: boot

compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


we noticed this issue doesn't always happen on this commit (63 times out of
111 runs as below), however, parent keeps clean for this issue, but has other
KCSAN:data-race issues which does not happen on e84f43e34f.

e8e9d21a5df655a6 e84f43e34faf85816587f805945
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
16:60 -27% :111 dmesg.BUG:KCSAN:data-race_in_ktime_get/timekeeping_advance
7:60 -12% :111 dmesg.BUG:KCSAN:data-race_in_ktime_get_update_offsets_now/timekeeping_advance
:60 105% 63:111 dmesg.BUG:KCSAN:data-race_in_timekeeping_advance/timekeeping_debug_get_ns



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 108.068027][ C1] BUG: KCSAN: data-race in timekeeping_advance / timekeeping_debug_get_ns
[ 108.069188][ C1]
[ 108.069511][ C1] write to 0xffffffff95672dc0 of 296 bytes by interrupt on cpu 0:
[ 108.070559][ C1] timekeeping_advance (kernel/time/timekeeping.c:2207 (discriminator 1))
[ 108.071242][ C1] update_wall_time (kernel/time/timekeeping.c:2222 (discriminator 1))
[ 108.071883][ C1] tick_do_update_jiffies64 (kernel/time/tick-sched.c:149)
[ 108.072638][ C1] tick_nohz_handler (kernel/time/tick-sched.c:229 kernel/time/tick-sched.c:287)
[ 108.073235][ C1] __hrtimer_run_queues (kernel/time/hrtimer.c:1692 kernel/time/hrtimer.c:1756)
[ 108.073840][ C1] hrtimer_interrupt (kernel/time/hrtimer.c:1821)
[ 108.074430][ C1] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1050)
[ 108.075132][ C1] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2))
[ 108.075821][ C1] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
[ 108.076657][ C1]
[ 108.076982][ C1] read to 0xffffffff95672de0 of 8 bytes by interrupt on cpu 1:
[ 108.077994][ C1] timekeeping_debug_get_ns (kernel/time/timekeeping.c:373 kernel/time/timekeeping.c:383 kernel/time/timekeeping.c:280)
[ 108.078766][ C1] ktime_get (kernel/time/timekeeping.c:394 kernel/time/timekeeping.c:838)
[ 108.079325][ C1] tick_nohz_handler (kernel/time/tick-sched.c:220 kernel/time/tick-sched.c:287)
[ 108.079995][ C1] __hrtimer_run_queues (kernel/time/hrtimer.c:1692 kernel/time/hrtimer.c:1756)
[ 108.080740][ C1] hrtimer_interrupt (kernel/time/hrtimer.c:1821)
[ 108.081423][ C1] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1050)
[ 108.082241][ C1] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2))
[ 108.083014][ C1] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
[ 108.083849][ C1]
[ 108.084183][ C1] value changed: 0x000e771a64000000 -> 0x000e959ee4000000
[ 108.085518][ C1]
[ 108.085775][ C1] Reported by Kernel Concurrency Sanitizer on:
[ 108.086563][ C1] CPU: 1 PID: 265 Comm: sed Tainted: G W E N 6.9.0-rc3-00015-ge84f43e34faf #1 ddd7212d5d239f10e5f20ca1605d3d23d4ce80eb
[ 108.088374][ C1] ==================================================================



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240416/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



2024-04-16 10:18:48

by Adrian Hunter

[permalink] [raw]
Subject: Re: [tip:timers/core] [timekeeping] e84f43e34f: BUG:KCSAN:data-race_in_timekeeping_advance/timekeeping_debug_get_ns

On 16/04/24 09:27, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "BUG:KCSAN:data-race_in_timekeeping_advance/timekeeping_debug_get_ns" on:
>
> commit: e84f43e34faf85816587f80594541ec978449d6e ("timekeeping: Consolidate timekeeping helpers")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core
>
> [test failed on linux-next/master 9ed46da14b9b9b2ad4edb3b0c545b6dbe5c00d39]
>
> in testcase: boot
>
> compiler: gcc-13
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we noticed this issue doesn't always happen on this commit (63 times out of
> 111 runs as below), however, parent keeps clean for this issue, but has other
> KCSAN:data-race issues which does not happen on e84f43e34f.
>
> e8e9d21a5df655a6 e84f43e34faf85816587f805945
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> 16:60 -27% :111 dmesg.BUG:KCSAN:data-race_in_ktime_get/timekeeping_advance
> 7:60 -12% :111 dmesg.BUG:KCSAN:data-race_in_ktime_get_update_offsets_now/timekeeping_advance
> :60 105% 63:111 dmesg.BUG:KCSAN:data-race_in_timekeeping_advance/timekeeping_debug_get_ns
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-lkp/[email protected]
>
>
> [ 108.068027][ C1] BUG: KCSAN: data-race in timekeeping_advance / timekeeping_debug_get_ns
> [ 108.069188][ C1]
> [ 108.069511][ C1] write to 0xffffffff95672dc0 of 296 bytes by interrupt on cpu 0:
> [ 108.070559][ C1] timekeeping_advance (kernel/time/timekeeping.c:2207 (discriminator 1))
> [ 108.071242][ C1] update_wall_time (kernel/time/timekeeping.c:2222 (discriminator 1))
> [ 108.071883][ C1] tick_do_update_jiffies64 (kernel/time/tick-sched.c:149)
> [ 108.072638][ C1] tick_nohz_handler (kernel/time/tick-sched.c:229 kernel/time/tick-sched.c:287)
> [ 108.073235][ C1] __hrtimer_run_queues (kernel/time/hrtimer.c:1692 kernel/time/hrtimer.c:1756)
> [ 108.073840][ C1] hrtimer_interrupt (kernel/time/hrtimer.c:1821)
> [ 108.074430][ C1] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1050)
> [ 108.075132][ C1] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2))
> [ 108.075821][ C1] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
> [ 108.076657][ C1]
> [ 108.076982][ C1] read to 0xffffffff95672de0 of 8 bytes by interrupt on cpu 1:
> [ 108.077994][ C1] timekeeping_debug_get_ns (kernel/time/timekeeping.c:373 kernel/time/timekeeping.c:383 kernel/time/timekeeping.c:280)

Looks like the nested seqlock in timekeeping_debug_get_ns()
results in premature kcsan_atomic_next(0), so the subsequent
access via timekeeping_cycles_to_ns(), although still under
seqlock, does not look that way to KCSAN.

> [ 108.078766][ C1] ktime_get (kernel/time/timekeeping.c:394 kernel/time/timekeeping.c:838)
> [ 108.079325][ C1] tick_nohz_handler (kernel/time/tick-sched.c:220 kernel/time/tick-sched.c:287)
> [ 108.079995][ C1] __hrtimer_run_queues (kernel/time/hrtimer.c:1692 kernel/time/hrtimer.c:1756)
> [ 108.080740][ C1] hrtimer_interrupt (kernel/time/hrtimer.c:1821)
> [ 108.081423][ C1] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1050)
> [ 108.082241][ C1] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2))
> [ 108.083014][ C1] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
> [ 108.083849][ C1]
> [ 108.084183][ C1] value changed: 0x000e771a64000000 -> 0x000e959ee4000000
> [ 108.085518][ C1]
> [ 108.085775][ C1] Reported by Kernel Concurrency Sanitizer on:
> [ 108.086563][ C1] CPU: 1 PID: 265 Comm: sed Tainted: G W E N 6.9.0-rc3-00015-ge84f43e34faf #1 ddd7212d5d239f10e5f20ca1605d3d23d4ce80eb
> [ 108.088374][ C1] ==================================================================
>
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240416/[email protected]
>
>
>