LinuxLists.cc - UML time-travel warning from __run

2022-03-30 14:09:44

Subject: UML time-travel warning from __run_timers

Hello Johannes,

As requested in the roadtest thread[0], here is some information about
reproducing the warnings I'm seeing from the timekeeping code with time
travel in UML, after about 10-20 wall time seconds of idling.

[0] https://lore.kernel.org/lkml/[email protected]/

The warning is the WARN_ON_ONCE(!levels && !base->next_expiry_recalc) in
__run_timers(). The log and the config generated with savedefconfig are
below. The full .config is attached.

The kernel version is v5.17, but the splats are seen on older kernels
and on latest mainline also.

Thanks,
Vincent

$ ./vmlinux mem=64M rootfstype=hostfs rw init=/bin/sh time-travel
[ 0.000000][ T0] Linux version 5.17.0 (user@server) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #4 Wed Mar 30 12:50:22 CEST 2022
[ 0.000000][ T0] Zone ranges:
[ 0.000000][ T0] Normal [mem 0x0000000000000000-0x0000000065e5dfff]
[ 0.000000][ T0] Movable zone start for each node
[ 0.000000][ T0] Early memory node ranges
[ 0.000000][ T0] node 0: [mem 0x0000000000000000-0x0000000005e5dfff]
[ 0.000000][ T0] Initmem setup node 0 [mem 0x0000000000000000-0x0000000005e5dfff]
[ 0.000000][ T0] Built 1 zonelists, mobility grouping on. Total pages: 23827
[ 0.000000][ T0] Kernel command line: mem=64M rootfstype=hostfs rw init=/bin/sh time-travel root=98:0 console=tty
[ 0.000000][ T0] Unknown kernel command line parameters "mem=64M", will be passed to user space.
[ 0.000000][ T0] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 0.000000][ T0] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[ 0.000000][ T0] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000][ T0] Memory: 44216K/96632K available (4153K kernel code, 1452K rwdata, 1784K rodata, 220K init, 12103K bss, 52416K reserved, 0K cma-reserved)
[ 0.000000][ T0] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000][ T0] Running RCU self tests
[ 0.000000][ T0] NR_IRQS: 128
[ 0.000000][ T0] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000][ T0] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000][ T0] ... MAX_LOCK_DEPTH: 48
[ 0.000000][ T0] ... MAX_LOCKDEP_KEYS: 8192
[ 0.000000][ T0] ... CLASSHASH_SIZE: 4096
[ 0.000000][ T0] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.000000][ T0] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.000000][ T0] ... CHAINHASH_SIZE: 32768
[ 0.000000][ T0] memory used by lock dependency info: 6365 kB
[ 0.000000][ T0] memory used for stack traces: 4224 kB
[ 0.000000][ T0] per task-struct memory footprint: 1920 bytes
[ 0.000000][ T0] clocksource: time-travel-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
[ 0.000000][ T0] Calibrating delay loop... 4434.32 BogoMIPS (lpj=22171648)
[ 0.070000][ T0] pid_max: default: 32768 minimum: 301
[ 0.070000][ T0] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.070000][ T0] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.070000][ T0] Checking that host ptys support output SIGIO...Yes
[ 0.070000][ T1] devtmpfs: initialized
[ 0.070000][ T1] random: get_random_bytes called from 0x6001a9d3 with crng_init=0
[ 0.070000][ T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.070000][ T1] futex hash table entries: 256 (order: 2, 22528 bytes, linear)
[ 0.070000][ T1] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.070000][ T1] vgaarb: loaded
[ 0.070000][ T1] pps_core: LinuxPPS API ver. 1 registered
[ 0.070000][ T1] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <[email protected]>
[ 0.070000][ T1] PTP clock support registered
[ 0.070000][ T1] clocksource: Switched to clocksource time-travel-clock
[ 0.100000][ T1] VFS: Disk quotas dquot_6.6.0
[ 0.100000][ T1] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.100000][ T1] NET: Registered PF_UNIX/PF_LOCAL protocol family
[ 0.100000][ T1] PCI: CLS 0 bytes, default 64
[ 0.100000][ T1] printk: console [stderr0] disabled
[ 0.100000][ T1] mconsole (version 2) initialized on /home/user/.uml/soRICy/mconsole
[ 0.100000][ T1] Checking host MADV_REMOVE support...OK
[ 0.100000][ T1] PCI host bridge to bus 0000:00
[ 0.100000][ T1] pci_bus 0000:00: root bus resource [mem 0xf0000000-0xffffffff]
[ 0.100000][ T1] workingset: timestamp_bits=62 max_order=14 bucket_order=0
[ 0.100000][ T1] jitterentropy: Initialization failed with host not compliant with requirements: 10
[ 0.100000][ T1] io scheduler mq-deadline registered
[ 0.100000][ T1] io scheduler kyber registered
[ 0.100000][ T1] i2c_dev: i2c /dev entries driver
[ 0.100000][ T1] Initialized stdio console driver
[ 0.100000][ T1] Console initialized on /dev/tty0
[ 0.100000][ T1] printk: console [tty0] enabled
[ 0.100000][ T1] Initializing software serial port version 1
[ 0.100000][ T1] printk: console [mc-1] enabled
[ 0.100000][ T1] VFS: Mounted root (hostfs filesystem) on device 0:14.
[ 0.100000][ T1] devtmpfs: mounted
[ 0.100000][ T1] This architecture does not have kernel memory protection.
[ 0.100000][ T1] Run /bin/sh as init process
sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
[?2004hsh-5.1# [ 1031.280000][ C0] random: fast init done
[ 1911.750000][ C0] random: crng init done
[10737482.720000][ C0] ------------[ cut here ]------------
[10737482.720000][ C0] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1729 __run_timers+0x36d/0x380
[10737482.720000][ C0] Modules linked in:
[10737482.720000][ C0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0 #4
[10737482.720000][ C0] Stack:
[10737482.720000][ C0] 603f5639 00000100 00000000 00000000
[10737482.720000][ C0] 603fabe8 600b1fcd 000006c1 603f550a
[10737482.720000][ C0] 160643fc0 9604234c7 60614ec0 604234c7
[10737482.720000][ C0] Call Trace:
[10737482.720000][ C0] [<603f5639>] ? warn_slowpath_fmt+0x0/0xda
[10737482.720000][ C0] [<603fabe8>] ? _printk+0x0/0x9b
[10737482.720000][ C0] [<600b1fcd>] ? __run_timers+0x36d/0x380
[10737482.720000][ C0] [<603f550a>] ? __warn.cold+0x1f/0x14e
[10737482.720000][ C0] [<604234c7>] ? lock_is_held_type+0x157/0x1e0
[10737482.720000][ C0] [<603f5705>] ? warn_slowpath_fmt+0xcc/0xda
[10737482.720000][ C0] [<60092189>] ? lock_acquire+0x209/0x340
[10737482.720000][ C0] [<600b1c94>] ? __run_timers+0x34/0x380
[10737482.720000][ C0] [<603f5639>] ? warn_slowpath_fmt+0x0/0xda
[10737482.720000][ C0] [<600b1d85>] ? __run_timers+0x125/0x380
[10737482.720000][ C0] [<600b19a0>] ? call_timer_fn+0x0/0x2c0
[10737482.720000][ C0] [<6042d8dc>] ? _raw_spin_lock_irq+0x4c/0x80
[10737482.720000][ C0] [<600b1c94>] ? __run_timers+0x34/0x380
[10737482.720000][ C0] [<600b1fcd>] ? __run_timers+0x36d/0x380
[10737482.720000][ C0] [<604234c7>] ? lock_is_held_type+0x157/0x1e0
[10737482.720000][ C0] [<600247d0>] ? um_trace_signals_on+0x0/0x20
[10737482.720000][ C0] [<600b1c60>] ? __run_timers+0x0/0x380
[10737482.720000][ C0] [<600aa850>] ? rcu_read_lock_sched_held+0x0/0x40
[10737482.720000][ C0] [<6042e4d5>] ? __do_softirq+0x115/0x576
[10737482.720000][ C0] [<60091cbb>] ? lock_release+0x10b/0x3d0
[10737482.720000][ C0] [<600247d0>] ? um_trace_signals_on+0x0/0x20
[10737482.720000][ C0] [<60036560>] ? sig_handler_common+0x0/0xd0
[10737482.720000][ C0] [<60047585>] ? __irq_exit_rcu+0x175/0x1a0
[10737482.720000][ C0] [<6009ebe1>] ? generic_handle_irq+0x41/0x60
[10737482.720000][ C0] [<60047770>] ? irq_exit+0x10/0x30
[10737482.720000][ C0] [<60022adc>] ? do_IRQ+0x3c/0x50
[10737482.720000][ C0] [<60036e20>] ? um_set_signals+0x0/0x50
[10737482.720000][ C0] [<60025c57>] ? timer_handler+0x47/0x120
[10737482.720000][ C0] [<60036669>] ? timer_real_alarm_handler+0x39/0x70
[10737482.720000][ C0] [<600247f0>] ? um_trace_signals_off+0x0/0x20
[10737482.720000][ C0] [<60036e0e>] ? unblock_signals+0x10e/0x120
[10737482.720000][ C0] [<600aa850>] ? rcu_read_lock_sched_held+0x0/0x40
[10737482.720000][ C0] [<60422da0>] ? lockdep_hardirqs_on+0x0/0x1c0
[10737482.720000][ C0] [<60079960>] ? arch_cpu_idle_enter+0x0/0x10
[10737482.720000][ C0] [<60036ce0>] ? block_signals+0x0/0x20
[10737482.720000][ C0] [<6042d623>] ? default_idle_call+0xb3/0x1fa
[10737482.720000][ C0] [<60079970>] ? arch_cpu_idle_exit+0x0/0x10
[10737482.720000][ C0] [<60079960>] ? arch_cpu_idle_enter+0x0/0x10
[10737482.720000][ C0] [<60079970>] ? arch_cpu_idle_exit+0x0/0x10
[10737482.720000][ C0] [<60079a8d>] ? do_idle+0xdd/0x180
[10737482.720000][ C0] [<60091bb0>] ? lock_release+0x0/0x3d0
[10737482.720000][ C0] [<60066f40>] ? find_task_by_pid_ns+0x0/0xa0
[10737482.720000][ C0] [<600799b0>] ? do_idle+0x0/0x180
[10737482.720000][ C0] [<602996b0>] ? strlen+0x0/0x30
[10737482.720000][ C0] [<60079f9b>] ? cpu_startup_entry+0x1b/0x20
[10737482.720000][ C0] [<603fabe8>] ? _printk+0x0/0x9b
[10737482.720000][ C0] [<60001f3e>] ? 0x60001f3e
[10737482.720000][ C0] [<60068b00>] ? parse_args+0x0/0x4f0
[10737482.720000][ C0] [<60036ce0>] ? block_signals+0x0/0x20
[10737482.720000][ C0] [<60003ac3>] ? 0x60003ac3
[10737482.720000][ C0] [<600232b2>] ? new_thread_handler+0x82/0xc0
[10737482.720000][ C0] irq event stamp: 18391811
[10737482.720000][ C0] hardirqs last enabled at (18391810): [<000000006042dc14>] _raw_spin_unlock_irq+0x34/0x50
[10737482.720000][ C0] hardirqs last disabled at (18391811): [<000000006042d904>] _raw_spin_lock_irq+0x74/0x80
[10737482.720000][ C0] softirqs last enabled at (18391702): [<000000006042e743>] __do_softirq+0x383/0x576
[10737482.720000][ C0] softirqs last disabled at (18391799): [<0000000060047585>] __irq_exit_rcu+0x175/0x1a0
[10737482.720000][ C0] ---[ end trace 0000000000000000 ]---

CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_EXPERT=y
CONFIG_HOSTFS=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_UML_TIME_TRAVEL_SUPPORT=y
CONFIG_SSL=y
CONFIG_NULL_CHAN=y
CONFIG_PORT_CHAN=y
CONFIG_PTY_CHAN=y
CONFIG_TTY_CHAN=y
CONFIG_XTERM_CHAN=y
CONFIG_CON_CHAN="pts"
CONFIG_SSL_CHAN="pts"
CONFIG_VIRTIO_UML=y
CONFIG_UML_PCI_OVER_VIRTIO=y
CONFIG_UML_PCI_OVER_VIRTIO_DEVICE_ID=1234
CONFIG_GCOV_KERNEL=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_BINFMT_MISC=m
# CONFIG_COMPACTION is not set
CONFIG_NET=y
CONFIG_UNIX=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_OF=y
# CONFIG_INPUT is not set
CONFIG_LEGACY_PTY_COUNT=32
CONFIG_HW_RANDOM=y
CONFIG_RANDOM_TRUST_BOOTLOADER=y
CONFIG_I2C=y
# CONFIG_I2C_COMPAT is not set
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_VIRTIO=y
CONFIG_I2C_STUB=m
CONFIG_GPIOLIB=y
CONFIG_GPIO_VIRTIO=y
CONFIG_REGULATOR=y
CONFIG_REGULATOR_DEBUG=y
CONFIG_REGULATOR_VIRTUAL_CONSUMER=y
CONFIG_REGULATOR_TPS6286X=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_RTC_CLASS=y
# CONFIG_RTC_HCTOSYS is not set
# CONFIG_RTC_SYSTOHC is not set
CONFIG_RTC_DEBUG=y
# CONFIG_RTC_NVMEM is not set
CONFIG_RTC_DRV_PCF8563=m
CONFIG_IIO=y
CONFIG_OPT3001=m
CONFIG_VCNL4000=m
CONFIG_QUOTA=y
CONFIG_AUTOFS4_FS=m
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_NLS=y
CONFIG_CRYPTO=y
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRC16=y
CONFIG_PRINTK_TIME=y
CONFIG_PRINTK_CALLER=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_FRAME_WARN=1024
CONFIG_READABLE_ASM=y
CONFIG_DEBUG_FS=y
CONFIG_UBSAN=y
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_PROVE_LOCKING=y
CONFIG_ENABLE_DEFAULT_TRACERS=y

Attachments:

(No filename) (12.10 kB)
uml-config (72.42 kB)
Download all attachments

2022-03-31 04:58:53

by Johannes Berg

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Wed, 2022-03-30 at 13:01 +0200, Vincent Whitchurch wrote:
> Hello Johannes,
>
> As requested in the roadtest thread[0], here is some information about
> reproducing the warnings I'm seeing from the timekeeping code with time
> travel in UML, after about 10-20 wall time seconds of idling.
>
> [0] https://lore.kernel.org/lkml/[email protected]/
>
> The warning is the WARN_ON_ONCE(!levels && !base->next_expiry_recalc) in
> __run_timers(). The log and the config generated with savedefconfig are
> below. The full .config is attached.
>
> The kernel version is v5.17, but the splats are seen on older kernels
> and on latest mainline also.
>

So interestingly, I cannot reproduce it with my config (it's already
March 2023), but I can with yours ...

johannes

2022-04-05 00:17:08

by Johannes Berg

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Sun, 2022-04-03 at 19:13 +0200, Johannes Berg wrote:
> On Sun, 2022-04-03 at 18:18 +0200, Thomas Gleixner wrote:
> > On Sat, Apr 02 2022 at 16:09, Johannes Berg wrote:
> > > At init, we get
> > >
> > > init_timer_cpu(0) base 0 clk=0xffff8ad0, next_expiry=0x13fff8acf
> > > init_timer_cpu(0) base 1 clk=0xffff8ad0, next_expiry=0x13fff8acf
> > >
> > > which makes sense, jiffies is set up to wrap very quickly after boot.
> > >
> > > The warning triggers when we have jiffies=0x13fff9600, so it's just
> > > after the "next_expiry", so in this code:
> >
> > which does not make sense.
>
> If you say so, I have no idea :)
>
> > If next_expiry is 0x13fff8acf and jiffies
> > advanced to 0x13fff9600 when the warning triggered, then either it
> > missed to expire the timer at 0x13fff8acf or it failed to recalculate
> > next_expiry.
>
> There was no timer. If there's ever a timer on this base (BASE_DEF) then
> this doesn't happen.
>
> So it has to be the latter, but I'm trying to understand in the code
> where it would*ever* recalculate next_expiry if it in fact never
> expires?
>

Actually, in a sense, this *is* the case of (just) recalculating
next_expiry, no? We just never set next_expiry_recalc since there was
never any timer on this?

So actually this also makes the warning go away:

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1729,6 +1733,7 @@ static inline void __run_timers(struct timer_base *base)
WARN_ON_ONCE(!levels && !base->next_expiry_recalc);
base->clk++;
base->next_expiry = __next_timer_interrupt(base);
+ base->next_expiry_recalc = !levels;

while (levels--)
expire_timers(base, heads + levels);
@@ -2005,6 +2010,7 @@ static void __init init_timer_cpu(int cpu)
raw_spin_lock_init(&base->lock);
base->clk = jiffies;
base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
+ base->next_expiry_recalc = true;
timer_base_init_expiry_lock(base);
}
}

johannes

2022-04-05 00:18:01

by Thomas Gleixner

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Sat, Apr 02 2022 at 16:09, Johannes Berg wrote:
> At init, we get
>
> init_timer_cpu(0) base 0 clk=0xffff8ad0, next_expiry=0x13fff8acf
> init_timer_cpu(0) base 1 clk=0xffff8ad0, next_expiry=0x13fff8acf
>
> which makes sense, jiffies is set up to wrap very quickly after boot.
>
> The warning triggers when we have jiffies=0x13fff9600, so it's just
> after the "next_expiry", so in this code:

which does not make sense. If next_expiry is 0x13fff8acf and jiffies
advanced to 0x13fff9600 when the warning triggered, then either it
missed to expire the timer at 0x13fff8acf or it failed to recalculate
next_expiry.

Could you enable all [hr]timer tracepoints on the kernel command line,
set panic on warn and ftrace_dump_on_oops which should spill out the
tracebuffer on the console and provide the output. That should at least
give us an hint what's going on.

Thanks,

tglx

2022-04-05 00:31:10

by Thomas Gleixner

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Sun, Apr 03 2022 at 19:13, Johannes Berg wrote:
> On Sun, 2022-04-03 at 18:18 +0200, Thomas Gleixner wrote:
>> On Sat, Apr 02 2022 at 16:09, Johannes Berg wrote:
> There was no timer. If there's ever a timer on this base (BASE_DEF) then
> this doesn't happen.

You said:

>> > init_timer_cpu(0) base 0 clk=0xffff8ad0, next_expiry=0x13fff8acf
>> > init_timer_cpu(0) base 1 clk=0xffff8ad0, next_expiry=0x13fff8acf

which confused me. It's actually initialized to:

base->clk + NEXT_TIMER_MAX_DELTA

but that's fine and it is overwritten by every timer which is inserted
to expire before that. So that's not an issue as the prandom timer is
firing and rearmed.

That would not happen if next_expiry would stay at 0x13fff8acf. The
first one in your trace expires at 5339070200, i.e. 0x13e3bbef8, which
is way before that.

So that 0x13fff8acf thing is really a red herring.

> Sure, but since we simulate ~50 days of uptime, that's a massive number
> of events and lots of them are not there:

Thanks for providing that. This does not give much of an hint. jiffies,
base clock and clock monotonic advance as expected. The prandom timer
seems to be the only timer which is armed here, but of course the trace
does not go far enough back to be sure. OTOH, there is no timer canceled
just before that which might trigger this. I suspect that's a base
forward call missing and I have a nagging feeling where, but I'm not
sure.

Can you please apply the debug patch below and run with the same
parameters as before?

Thanks,

tglx
---
Hint: I tried to figure out how to use that time travel muck, but did
not get to the point where I bothered to try myself. Might be
either my incompetence or lack of documentation. Clearly the bug
report lacks any hint how to reproduce that problem.
---

--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -52,15 +52,19 @@ DEFINE_EVENT(timer_class, timer_init,
TRACE_EVENT(timer_start,

TP_PROTO(struct timer_list *timer,
- unsigned long expires,
- unsigned int flags),
+ unsigned long expires,
+ unsigned long bucket_expiry,
+ unsigned long next_expiry,
+ unsigned int flags),

- TP_ARGS(timer, expires, flags),
+ TP_ARGS(timer, expires, bucket_expiry, next_expiry, flags),

TP_STRUCT__entry(
__field( void *, timer )
__field( void *, function )
__field( unsigned long, expires )
+ __field( unsigned long, bucket_expiry )
+ __field( unsigned long, next_expiry )
__field( unsigned long, now )
__field( unsigned int, flags )
),
@@ -69,13 +73,16 @@ TRACE_EVENT(timer_start,
__entry->timer = timer;
__entry->function = timer->function;
__entry->expires = expires;
+ __entry->bucket_expiry = bucket_expiry;
+ __entry->next_expiry = next_expiry;
__entry->now = jiffies;
__entry->flags = flags;
),

- TP_printk("timer=%p function=%ps expires=%lu [timeout=%ld] cpu=%u idx=%u flags=%s",
+ TP_printk("timer=%p function=%ps expires=%lu [timeout=%ld] bucket_expiry=%lu next_expiry=%lu cpu=%u idx=%u flags=%s",
__entry->timer, __entry->function, __entry->expires,
(long)__entry->expires - __entry->now,
+ __entry->bucket_expiry, __entry->next_expiry,
__entry->flags & TIMER_CPUMASK,
__entry->flags >> TIMER_ARRAYSHIFT,
decode_timer_flags(__entry->flags & TIMER_TRACE_FLAGMASK))
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -583,8 +583,6 @@ static void enqueue_timer(struct timer_b
__set_bit(idx, base->pending_map);
timer_set_idx(timer, idx);

- trace_timer_start(timer, timer->expires, timer->flags);
-
/*
* Check whether this is the new first expiring timer. The
* effective expiry time of the timer is required here
@@ -600,6 +598,7 @@ static void enqueue_timer(struct timer_b
base->next_expiry_recalc = false;
trigger_dyntick_cpu(base, timer);
}
+ trace_timer_start(timer, timer->expires, bucket_expiry, base->next_expiry, timer->flags);
}

static void internal_add_timer(struct timer_base *base, struct timer_list *timer)
@@ -893,6 +892,9 @@ static inline void forward_timer_base(st
{
unsigned long jnow = READ_ONCE(jiffies);

+ trace_printk("FWD: now=%lu clk=%lu next_expiry=%lu recalc=%d\n",
+ jnow, base->clk, base->next_expiry,
+ base->next_expiry_recalc);
/*
* No need to forward if we are close enough below jiffies.
* Also while executing timers, base->clk is 1 offset ahead
@@ -1710,6 +1712,7 @@ void timer_clear_idle(void)
static inline void __run_timers(struct timer_base *base)
{
struct hlist_head heads[LVL_DEPTH];
+ bool confused = false;
int levels;

if (time_before(jiffies, base->next_expiry))
@@ -1718,6 +1721,10 @@ static inline void __run_timers(struct t
timer_base_lock_expiry(base);
raw_spin_lock_irq(&base->lock);

+ trace_printk("RUN: now=%lu clk=%lu next_expiry=%lu recalc=%d\n",
+ jiffies, base->clk, base->next_expiry,
+ base->next_expiry_recalc);
+
while (time_after_eq(jiffies, base->clk) &&
time_after_eq(jiffies, base->next_expiry)) {
levels = collect_expired_timers(base, heads);
@@ -1726,7 +1733,12 @@ static inline void __run_timers(struct t
* timer at this clk is that all matching timers have been
* dequeued.
*/
- WARN_ON_ONCE(!levels && !base->next_expiry_recalc);
+ if (!levels && !base->next_expiry_recalc) {
+ trace_printk("WTH: now=%lu clk=%lu next_expiry=%lu recalc=%d\n",
+ jiffies, base->clk, base->next_expiry,
+ base->next_expiry_recalc);
+ confused = true;
+ }
base->clk++;
base->next_expiry = __next_timer_interrupt(base);

@@ -1735,6 +1747,9 @@ static inline void __run_timers(struct t
}
raw_spin_unlock_irq(&base->lock);
timer_base_unlock_expiry(base);
+
+ if (confused)
+ panic("Confused");
}

/*

2022-04-05 00:32:02

by Thomas Gleixner

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Mon, Apr 04 2022 at 09:02, Johannes Berg wrote:
> On Sun, 2022-04-03 at 21:51 +0200, Thomas Gleixner wrote:
>> but that's fine and it is overwritten by every timer which is inserted
>> to expire before that. So that's not an issue as the prandom timer is
>> firing and rearmed.
>
> No, as I said before, there's never any timer with base 1 (BASE_DEF) in
> the config we have. The prandom timer is not TIMER_DEFERRABLE (it
> probably could be, but it's not now). There's no deferrable timer at
> all. Once there is at least one, the warning goes away.

Groan. I overlooked the deferrable part. Yes, you are right. next_expiry
of the deferrable base is stale when there is no timer queued up to the
point where base->clk reaches the initial next_expiry value. So the
check is bogus.

Thanks,

tglx
---
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1724,9 +1724,8 @@ static inline void __run_timers(struct t
/*
* The only possible reason for not finding any expired
* timer at this clk is that all matching timers have been
- * dequeued.
+ * dequeued or no timer has been ever queued.
*/
- WARN_ON_ONCE(!levels && !base->next_expiry_recalc);
base->clk++;
base->next_expiry = __next_timer_interrupt(base);

2022-04-05 00:34:07

by Johannes Berg

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Mon, 2022-04-04 at 10:32 +0200, Thomas Gleixner wrote:
> On Mon, Apr 04 2022 at 09:02, Johannes Berg wrote:
> > On Sun, 2022-04-03 at 21:51 +0200, Thomas Gleixner wrote:
> > > but that's fine and it is overwritten by every timer which is inserted
> > > to expire before that. So that's not an issue as the prandom timer is
> > > firing and rearmed.
> >
> > No, as I said before, there's never any timer with base 1 (BASE_DEF) in
> > the config we have. The prandom timer is not TIMER_DEFERRABLE (it
> > probably could be, but it's not now). There's no deferrable timer at
> > all. Once there is at least one, the warning goes away.
>
> Groan. I overlooked the deferrable part. Yes, you are right. next_expiry
> of the deferrable base is stale when there is no timer queued up to the
> point where base->clk reaches the initial next_expiry value. So the
> check is bogus.
>
> Thanks,
>
> tglx
> ---
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1724,9 +1724,8 @@ static inline void __run_timers(struct t
> /*
> * The only possible reason for not finding any expired
> * timer at this clk is that all matching timers have been
> - * dequeued.
> + * dequeued or no timer has been ever queued.
> */
> - WARN_ON_ONCE(!levels && !base->next_expiry_recalc);
>

So I'm pretty sure we don't even need to test a patch simply removing
the WARN_ON_ONCE() since the entire problem Vincent reported was hitting
the WARN_ON_ONCE :)

(And I'm pretty sure I did at some point test some additional condition
inside it)

Are you going to merge that patch?

johannes

2022-04-05 01:05:30

by Johannes Berg

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Sun, 2022-04-03 at 21:51 +0200, Thomas Gleixner wrote:
>
> > There was no timer. If there's ever a timer on this base (BASE_DEF) then
> > this doesn't happen.
>
> You said:
>
> > > > init_timer_cpu(0) base 0 clk=0xffff8ad0, next_expiry=0x13fff8acf
> > > > init_timer_cpu(0) base 1 clk=0xffff8ad0, next_expiry=0x13fff8acf
>
> which confused me. It's actually initialized to:
>
> base->clk + NEXT_TIMER_MAX_DELTA
>
> but that's fine and it is overwritten by every timer which is inserted
> to expire before that. So that's not an issue as the prandom timer is
> firing and rearmed.

No, as I said before, there's never any timer with base 1 (BASE_DEF) in
the config we have. The prandom timer is not TIMER_DEFERRABLE (it
probably could be, but it's not now). There's no deferrable timer at
all. Once there is at least one, the warning goes away.

> That would not happen if next_expiry would stay at 0x13fff8acf. The
> first one in your trace expires at 5339070200, i.e. 0x13e3bbef8, which
> is way before that.

But it's not a deferrable timer, so it's on another timer wheel (base),
so it doesn't affect the "base 1" part above.

> Can you please apply the debug patch below and run with the same
> parameters as before?
>
> Thanks,
>
> tglx
> ---
> Hint: I tried to figure out how to use that time travel muck, but did
> not get to the point where I bothered to try myself. Might be
> either my incompetence or lack of documentation. Clearly the bug
> report lacks any hint how to reproduce that problem.

Well, the original bug report did have all the information, I gave the
link to it before:

https://lore.kernel.org/r/[email protected]

With that kernel config and command line, you can reproduce it easily.
All you need to know is to use "make ARCH=um" with that .config file :)

> + trace_printk("RUN: now=%lu clk=%lu next_expiry=%lu
> recalc=%d\n",
> + jiffies, base->clk, base->next_expiry,
> + base->next_expiry_recalc);

IMHO all of this extra debug is a waste of time since you're not
differentiating the two bases anywhere. You'll just get confused (as
above) since timers do happen on BASE_STD, just not on BASE_DEF.

johannes

2022-04-05 01:10:11

by Johannes Berg

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Sat, 2022-04-02 at 16:09 +0200, Johannes Berg wrote:
>
> (I put a WARN_ON into get_timer_cpu_base() and get_timer_this_cpu_base()
> in the if, and it never triggered; I guess my config has something that
> creates a deferrable timer, so it didn't trigger, but I didn't check
> that now.)
>

OK, so FWIW, I checked that now, and I have e.g. CONFIG_WQ_WATCHDOG
enabled, which makes a deferrable timer:

static void wq_watchdog_init(void)
{
timer_setup(&wq_watchdog_timer, wq_watchdog_timer_fn, TIMER_DEFERRABLE);

but I think a bunch of other (networking) things too that end up using a
TIMER_DEFERRABLE timer, via workqueues.

But maybe this would also happen if it was used just a single time,
since the timer would run & go away, leaving base->next_expiry never to
change again? But the WQ watchdog and also neigh_managed_work() are
periodic.

johannes

2022-04-05 01:35:00

by Johannes Berg

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

Hi Vincent,

> [10737482.720000][ C0] ------------[ cut here ]------------
> [10737482.720000][ C0] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1729 __run_timers+0x36d/0x380
>

[for those new on the thread, full message and config here:
https://lore.kernel.org/r/[email protected]]

I think maybe you found a bug in the timers code?

Your config has CONFIG_NO_HZ_COMMON, so we have both BASE_STD and
BASE_DEF.

Evidently, in your config, we *never* have any timer with
TIMER_DEFERRABLE, which would put it into BASE_DEF.

(I put a WARN_ON into get_timer_cpu_base() and get_timer_this_cpu_base()
in the if, and it never triggered; I guess my config has something that
creates a deferrable timer, so it didn't trigger, but I didn't check
that now.)

Therefore, base->next_expiry never changes or something?

At init, we get

init_timer_cpu(0) base 0 clk=0xffff8ad0, next_expiry=0x13fff8acf
init_timer_cpu(0) base 1 clk=0xffff8ad0, next_expiry=0x13fff8acf

which makes sense, jiffies is set up to wrap very quickly after boot.

The warning triggers when we have jiffies=0x13fff9600, so it's just
after the "next_expiry", so in this code:

static inline void __run_timers(struct timer_base *base)
{
struct hlist_head heads[LVL_DEPTH];
int levels;

if (time_before(jiffies, base->next_expiry))
return;

we no longer return. Previously, we've *never* executed past that if for
BASE_DEF.

But we never touched this timer base nor did we ever want to recalc it I
guess, so

WARN_ON_ONCE(!levels && !base->next_expiry_recalc);

triggers.

I thought about changing that condition to

if (time_before(...) || !base->timers_pending)
return;

and that *does* make the splat go away, but I fear that might make it
not recalculate when needed, so perhaps in the condition we should have

if (time_before(...) ||
(!base->timers_pending && !base->next_expiry_recalc))
return;

or something? (which also avoids hitting the warning)

But I really don't know anything about this code, so adding a few CCs.
Can you help?

Thanks,
johannes

2022-04-05 01:37:29

by Thomas Gleixner

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

Johannes,

On Sun, Apr 03 2022 at 19:19, Johannes Berg wrote:
> Actually, in a sense, this *is* the case of (just) recalculating
> next_expiry, no? We just never set next_expiry_recalc since there was
> never any timer on this?

why are you insisting on fishing in the dark?

> So actually this also makes the warning go away:
>
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1729,6 +1733,7 @@ static inline void __run_timers(struct timer_base *base)
> WARN_ON_ONCE(!levels && !base->next_expiry_recalc);
> base->clk++;
> base->next_expiry = __next_timer_interrupt(base);
> + base->next_expiry_recalc = !levels;

You are papering over the problem. That makes the warnign go away, but
does not explain anyhting about the root cause. Can you please provide
the information which was asked for?

> while (levels--)
> expire_timers(base, heads + levels);
> @@ -2005,6 +2010,7 @@ static void __init init_timer_cpu(int cpu)
> raw_spin_lock_init(&base->lock);
> base->clk = jiffies;
> base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
> + base->next_expiry_recalc = true;

This is complete nonsense because at the point where the CPU base is
initialized next_expiry _IS_ correct at the outer max. Why would it be
required to be recalculated? The only reason why it needs to be
recalculated is when a timer is canceled before expiry, but there is
_NO_ timer which can be canceled at this point.

So what are you trying to solve here?

Thanks,

tglx

2022-04-05 02:25:59

by Johannes Berg

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Sun, 2022-04-03 at 18:18 +0200, Thomas Gleixner wrote:
> On Sat, Apr 02 2022 at 16:09, Johannes Berg wrote:
> > At init, we get
> >
> > init_timer_cpu(0) base 0 clk=0xffff8ad0, next_expiry=0x13fff8acf
> > init_timer_cpu(0) base 1 clk=0xffff8ad0, next_expiry=0x13fff8acf
> >
> > which makes sense, jiffies is set up to wrap very quickly after boot.
> >
> > The warning triggers when we have jiffies=0x13fff9600, so it's just
> > after the "next_expiry", so in this code:
>
> which does not make sense.

If you say so, I have no idea :)

> If next_expiry is 0x13fff8acf and jiffies
> advanced to 0x13fff9600 when the warning triggered, then either it
> missed to expire the timer at 0x13fff8acf or it failed to recalculate
> next_expiry.

There was no timer. If there's ever a timer on this base (BASE_DEF) then
this doesn't happen.

So it has to be the latter, but I'm trying to understand in the code
where it would*ever* recalculate next_expiry if it in fact never
expires?

> Could you enable all [hr]timer tracepoints on the kernel command line,
> set panic on warn and ftrace_dump_on_oops which should spill out the
> tracebuffer on the console and provide the output. That should at least
> give us an hint what's going on.
>

Sure, but since we simulate ~50 days of uptime, that's a massive number
of events and lots of them are not there:

https://p.sipsolutions.net/fb491cbbde82c600.txt

johannes

2022-04-05 03:20:55

by Thomas Gleixner

[permalink] [raw]

Subject: Re: UML time-travel warning from __run_timers

On Mon, Apr 04 2022 at 10:37, Johannes Berg wrote:
> On Mon, 2022-04-04 at 10:32 +0200, Thomas Gleixner wrote:
>> --- a/kernel/time/timer.c
>> +++ b/kernel/time/timer.c
>> @@ -1724,9 +1724,8 @@ static inline void __run_timers(struct t
>> /*
>> * The only possible reason for not finding any expired
>> * timer at this clk is that all matching timers have been
>> - * dequeued.
>> + * dequeued or no timer has been ever queued.
>> */
>> - WARN_ON_ONCE(!levels && !base->next_expiry_recalc);
>>
>
> So I'm pretty sure we don't even need to test a patch simply removing
> the WARN_ON_ONCE() since the entire problem Vincent reported was hitting
> the WARN_ON_ONCE :)

:)

> (And I'm pretty sure I did at some point test some additional condition
> inside it)
>
> Are you going to merge that patch?

Let me write a coherent changelog.