2022-11-08 11:31:11

by Alexandru Elisei

[permalink] [raw]
Subject: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message

Hi,

Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
possible") exposed a firmware error on an Ampere Altra machine that was
causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
Recover from synchronous exceptions occurring in firmware") made the EFI
exception non-fatal, and disabled runtime services when the exception
happens. The interaction between those two patches are being discussed in a
separate thread [1], but that should be orthogonal to this.

Now efi.get_time() fails and each time an error message is printed to
dmesg, which happens several times a second and clutters dmesg
unnecessarily, to the point it becomes unusable.

I was wondering if it would be possible to turn dev_err() into a
dev_WARN_ONCE() or do something to avoid this issue. Tried to replace
dev_err() with dev_err_ratelimited(), and the error message was displayed
less often (about once per second), but dmesg was still being cluttered.

Here's a log with what is happening (the boot part of the log has been
removed for brevity, I've kept the kernel splats for context, can provide
full logs, kernel config, command line, etc, to reproduce it; goes without
saying that I am willing to test the fix myself):

[ 55.479519] [Firmware Bug]: Unable to handle paging request in EFI runtime service
[ 55.487122] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
[ 55.487128] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 55.487131] Workqueue: efi_rts_wq efi_call_rts
[ 55.487158] Call trace:
[ 55.487161] dump_backtrace.part.0+0xdc/0xf0
[ 55.487177] show_stack+0x18/0x40
[ 55.487180] dump_stack_lvl+0x68/0x84
[ 55.487190] dump_stack+0x18/0x34
[ 55.487192] efi_runtime_fixup_exception+0x74/0x88
[ 55.487199] __do_kernel_fault+0x108/0x1b0
[ 55.487204] do_page_fault+0xd0/0x400
[ 55.487207] do_translation_fault+0xac/0xc0
[ 55.487209] do_mem_abort+0x44/0x94
[ 55.487212] el1_abort+0x40/0x6c
[ 55.487214] el1h_64_sync_handler+0xd8/0xe4
[ 55.487218] el1h_64_sync+0x64/0x68
[ 55.487221] 0xb7eb7ae4
[ 55.487224] 0xb7eb8668
[ 55.487225] 0xb7eb6e08
[ 55.487227] 0xb7eb68ec
[ 55.487228] 0xb7eb3824
[ 55.487230] 0xb7eb05a8
[ 55.487231] 0xb7eb12a0
[ 55.487232] 0xb7e43504
[ 55.487234] 0xb7e43650
[ 55.487235] 0xb7e482d0
[ 55.487237] 0xb7e4907c
[ 55.487238] 0xb7e49ff4
[ 55.487239] 0xb7e40888
[ 55.487241] 0xb7cb3328
[ 55.487242] 0xb7cb0674
[ 55.487243] __efi_rt_asm_wrapper+0x54/0x70
[ 55.487246] efi_call_rts+0x28c/0x3d0
[ 55.487249] process_one_work+0x1d0/0x320
[ 55.487258] worker_thread+0x14c/0x444
[ 55.487261] kthread+0x10c/0x110
[ 55.487264] ret_from_fork+0x10/0x20
[ 55.487268] [Firmware Bug]: Synchronous exception occurred in EFI runtime service set_time()
[ 55.495735] ------------[ cut here ]------------
[ 55.495739] WARNING: CPU: 62 PID: 9 at drivers/firmware/efi/runtime-wrappers.c:111 efi_call_virt_check_flags+0x40/0xac
[ 55.495746] Modules linked in:
[ 55.495749] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
[ 55.495751] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 55.495753] Workqueue: efi_rts_wq efi_call_rts
[ 55.495757] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 55.495761] pc : efi_call_virt_check_flags+0x40/0xac
[ 55.495764] lr : efi_call_rts+0x29c/0x3d0
[ 55.495767] sp : ffff80000861bd40
[ 55.495768] x29: ffff80000861bd40 x28: 0000000000000000 x27: 0000000000000000
[ 55.495772] x26: ffffb251470e9e68 x25: ffff3fff89714805 x24: 0000000000000000
[ 55.495775] x23: 0000000000000000 x22: 0000000000000000 x21: 00000000000000c0
[ 55.495778] x20: ffffb25146688de0 x19: 0000000000000000 x18: ffffffffffffffff
[ 55.495780] x17: 657320656d69746e x16: 757220494645206e x15: 6920646572727563
[ 55.495784] x14: 636f206e6f697470 x13: ffff403e40540000 x12: 0000000000001c14
[ 55.495787] x11: 000000000000095c x10: ffff403e40800000 x9 : ffff403e40540000
[ 55.495790] x8 : 00000000ffff7fff x7 : ffff403e40800000 x6 : 0000000000000000
[ 55.495792] x5 : ffff083e7fe9aaa0 x4 : 0000000000000000 x3 : 0000000000000000
[ 55.495796] x2 : 0000000000000000 x1 : ffffb25146688de0 x0 : 00000000000000c0
[ 55.495799] Call trace:
[ 55.495800] efi_call_virt_check_flags+0x40/0xac
[ 55.495802] efi_call_rts+0x29c/0x3d0
[ 55.495805] process_one_work+0x1d0/0x320
[ 55.495808] worker_thread+0x14c/0x444
[ 55.495811] kthread+0x10c/0x110
[ 55.495814] ret_from_fork+0x10/0x20
[ 55.495815] ---[ end trace 0000000000000000 ]---
[ 55.495818] Disabling lock debugging due to kernel taint
[ 55.495822] efi: [Firmware Bug]: IRQ flags corrupted (0x00000000=>0x000000c0) by EFI set_time
[ 55.504434] efi: EFI Runtime Services are disabled!
[ 55.504465] rtc-efi rtc-efi.0: can't read time
[ 56.479370] efi: EFI Runtime Services are disabled!
[ 56.479394] rtc-efi rtc-efi.0: can't read time
[ 56.483855] rtc-efi rtc-efi.0: can't read time
[ 56.488306] rtc-efi rtc-efi.0: can't read time
[ 57.479574] rtc-efi rtc-efi.0: can't read time
[ 57.484030] rtc-efi rtc-efi.0: can't read time
[ 57.488474] rtc-efi rtc-efi.0: can't read time
[ 58.479692] rtc-efi rtc-efi.0: can't read time
[ 58.484139] rtc-efi rtc-efi.0: can't read time
[ 58.488582] rtc-efi rtc-efi.0: can't read time
[ 59.479691] rtc-efi rtc-efi.0: can't read time
... on, and on, on ...

[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/

Thanks,
Alex


2022-11-08 22:14:55

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message

On 08/11/2022 10:55:15+0000, Alexandru Elisei wrote:
> Hi,
>
> Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
> possible") exposed a firmware error on an Ampere Altra machine that was
> causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
> Recover from synchronous exceptions occurring in firmware") made the EFI
> exception non-fatal, and disabled runtime services when the exception
> happens. The interaction between those two patches are being discussed in a
> separate thread [1], but that should be orthogonal to this.
>
> Now efi.get_time() fails and each time an error message is printed to
> dmesg, which happens several times a second and clutters dmesg
> unnecessarily, to the point it becomes unusable.
>
> I was wondering if it would be possible to turn dev_err() into a
> dev_WARN_ONCE() or do something to avoid this issue. Tried to replace
> dev_err() with dev_err_ratelimited(), and the error message was displayed
> less often (about once per second), but dmesg was still being cluttered.
>

The question this raise is what is actually trying to read the RTC this
often?

This should be read once at boot and maybe every time you wake up from
suspend but there is no real reason to read it multiple times per
seconds.

> Here's a log with what is happening (the boot part of the log has been
> removed for brevity, I've kept the kernel splats for context, can provide
> full logs, kernel config, command line, etc, to reproduce it; goes without
> saying that I am willing to test the fix myself):
>
> [ 55.479519] [Firmware Bug]: Unable to handle paging request in EFI runtime service
> [ 55.487122] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
> [ 55.487128] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 55.487131] Workqueue: efi_rts_wq efi_call_rts
> [ 55.487158] Call trace:
> [ 55.487161] dump_backtrace.part.0+0xdc/0xf0
> [ 55.487177] show_stack+0x18/0x40
> [ 55.487180] dump_stack_lvl+0x68/0x84
> [ 55.487190] dump_stack+0x18/0x34
> [ 55.487192] efi_runtime_fixup_exception+0x74/0x88
> [ 55.487199] __do_kernel_fault+0x108/0x1b0
> [ 55.487204] do_page_fault+0xd0/0x400
> [ 55.487207] do_translation_fault+0xac/0xc0
> [ 55.487209] do_mem_abort+0x44/0x94
> [ 55.487212] el1_abort+0x40/0x6c
> [ 55.487214] el1h_64_sync_handler+0xd8/0xe4
> [ 55.487218] el1h_64_sync+0x64/0x68
> [ 55.487221] 0xb7eb7ae4
> [ 55.487224] 0xb7eb8668
> [ 55.487225] 0xb7eb6e08
> [ 55.487227] 0xb7eb68ec
> [ 55.487228] 0xb7eb3824
> [ 55.487230] 0xb7eb05a8
> [ 55.487231] 0xb7eb12a0
> [ 55.487232] 0xb7e43504
> [ 55.487234] 0xb7e43650
> [ 55.487235] 0xb7e482d0
> [ 55.487237] 0xb7e4907c
> [ 55.487238] 0xb7e49ff4
> [ 55.487239] 0xb7e40888
> [ 55.487241] 0xb7cb3328
> [ 55.487242] 0xb7cb0674
> [ 55.487243] __efi_rt_asm_wrapper+0x54/0x70
> [ 55.487246] efi_call_rts+0x28c/0x3d0
> [ 55.487249] process_one_work+0x1d0/0x320
> [ 55.487258] worker_thread+0x14c/0x444
> [ 55.487261] kthread+0x10c/0x110
> [ 55.487264] ret_from_fork+0x10/0x20
> [ 55.487268] [Firmware Bug]: Synchronous exception occurred in EFI runtime service set_time()
> [ 55.495735] ------------[ cut here ]------------
> [ 55.495739] WARNING: CPU: 62 PID: 9 at drivers/firmware/efi/runtime-wrappers.c:111 efi_call_virt_check_flags+0x40/0xac
> [ 55.495746] Modules linked in:
> [ 55.495749] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
> [ 55.495751] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 55.495753] Workqueue: efi_rts_wq efi_call_rts
> [ 55.495757] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 55.495761] pc : efi_call_virt_check_flags+0x40/0xac
> [ 55.495764] lr : efi_call_rts+0x29c/0x3d0
> [ 55.495767] sp : ffff80000861bd40
> [ 55.495768] x29: ffff80000861bd40 x28: 0000000000000000 x27: 0000000000000000
> [ 55.495772] x26: ffffb251470e9e68 x25: ffff3fff89714805 x24: 0000000000000000
> [ 55.495775] x23: 0000000000000000 x22: 0000000000000000 x21: 00000000000000c0
> [ 55.495778] x20: ffffb25146688de0 x19: 0000000000000000 x18: ffffffffffffffff
> [ 55.495780] x17: 657320656d69746e x16: 757220494645206e x15: 6920646572727563
> [ 55.495784] x14: 636f206e6f697470 x13: ffff403e40540000 x12: 0000000000001c14
> [ 55.495787] x11: 000000000000095c x10: ffff403e40800000 x9 : ffff403e40540000
> [ 55.495790] x8 : 00000000ffff7fff x7 : ffff403e40800000 x6 : 0000000000000000
> [ 55.495792] x5 : ffff083e7fe9aaa0 x4 : 0000000000000000 x3 : 0000000000000000
> [ 55.495796] x2 : 0000000000000000 x1 : ffffb25146688de0 x0 : 00000000000000c0
> [ 55.495799] Call trace:
> [ 55.495800] efi_call_virt_check_flags+0x40/0xac
> [ 55.495802] efi_call_rts+0x29c/0x3d0
> [ 55.495805] process_one_work+0x1d0/0x320
> [ 55.495808] worker_thread+0x14c/0x444
> [ 55.495811] kthread+0x10c/0x110
> [ 55.495814] ret_from_fork+0x10/0x20
> [ 55.495815] ---[ end trace 0000000000000000 ]---
> [ 55.495818] Disabling lock debugging due to kernel taint
> [ 55.495822] efi: [Firmware Bug]: IRQ flags corrupted (0x00000000=>0x000000c0) by EFI set_time
> [ 55.504434] efi: EFI Runtime Services are disabled!
> [ 55.504465] rtc-efi rtc-efi.0: can't read time
> [ 56.479370] efi: EFI Runtime Services are disabled!
> [ 56.479394] rtc-efi rtc-efi.0: can't read time
> [ 56.483855] rtc-efi rtc-efi.0: can't read time
> [ 56.488306] rtc-efi rtc-efi.0: can't read time
> [ 57.479574] rtc-efi rtc-efi.0: can't read time
> [ 57.484030] rtc-efi rtc-efi.0: can't read time
> [ 57.488474] rtc-efi rtc-efi.0: can't read time
> [ 58.479692] rtc-efi rtc-efi.0: can't read time
> [ 58.484139] rtc-efi rtc-efi.0: can't read time
> [ 58.488582] rtc-efi rtc-efi.0: can't read time
> [ 59.479691] rtc-efi rtc-efi.0: can't read time
> ... on, and on, on ...
>
> [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> Thanks,
> Alex

--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2022-11-09 14:54:53

by Alexandru Elisei

[permalink] [raw]
Subject: Re: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message

Hi,

On Tue, Nov 08, 2022 at 10:41:18PM +0100, Alexandre Belloni wrote:
> On 08/11/2022 10:55:15+0000, Alexandru Elisei wrote:
> > Hi,
> >
> > Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
> > possible") exposed a firmware error on an Ampere Altra machine that was
> > causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
> > Recover from synchronous exceptions occurring in firmware") made the EFI
> > exception non-fatal, and disabled runtime services when the exception
> > happens. The interaction between those two patches are being discussed in a
> > separate thread [1], but that should be orthogonal to this.
> >
> > Now efi.get_time() fails and each time an error message is printed to
> > dmesg, which happens several times a second and clutters dmesg
> > unnecessarily, to the point it becomes unusable.
> >
> > I was wondering if it would be possible to turn dev_err() into a
> > dev_WARN_ONCE() or do something to avoid this issue. Tried to replace
> > dev_err() with dev_err_ratelimited(), and the error message was displayed
> > less often (about once per second), but dmesg was still being cluttered.
> >
>
> The question this raise is what is actually trying to read the RTC this
> often?
>
> This should be read once at boot and maybe every time you wake up from
> suspend but there is no real reason to read it multiple times per
> seconds.

Reverted the commit the exposed the firmware bug, which means rtc-efi works as
it should. Added these debug statements to check how many times efi_read_time()
is called if there are no errors:

--- a/drivers/rtc/rtc-efi.c
+++ b/drivers/rtc/rtc-efi.c
@@ -154,6 +154,7 @@ static int efi_set_alarm(struct device *dev, struct rtc_wkalrm *wkalrm)
return status == EFI_SUCCESS ? 0 : -EINVAL;
}

+static unsigned long i = 0;
static int efi_read_time(struct device *dev, struct rtc_time *tm)
{
efi_status_t status;
@@ -162,6 +163,9 @@ static int efi_read_time(struct device *dev, struct rtc_time *tm)

status = efi.get_time(&eft, &cap);

+ i++;
+ pr_info("%s: Call number %lu\n", __func__, i);
+
if (status != EFI_SUCCESS) {
/* should never happen */
dev_err(dev, "can't read time\n");

The function gets called 3 times, twice during boot and once after. I would say
that efi_read_time() gets called so many times because it fails.

Thanks,
Alex

>
> > Here's a log with what is happening (the boot part of the log has been
> > removed for brevity, I've kept the kernel splats for context, can provide
> > full logs, kernel config, command line, etc, to reproduce it; goes without
> > saying that I am willing to test the fix myself):
> >
> > [ 55.479519] [Firmware Bug]: Unable to handle paging request in EFI runtime service
> > [ 55.487122] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
> > [ 55.487128] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> > [ 55.487131] Workqueue: efi_rts_wq efi_call_rts
> > [ 55.487158] Call trace:
> > [ 55.487161] dump_backtrace.part.0+0xdc/0xf0
> > [ 55.487177] show_stack+0x18/0x40
> > [ 55.487180] dump_stack_lvl+0x68/0x84
> > [ 55.487190] dump_stack+0x18/0x34
> > [ 55.487192] efi_runtime_fixup_exception+0x74/0x88
> > [ 55.487199] __do_kernel_fault+0x108/0x1b0
> > [ 55.487204] do_page_fault+0xd0/0x400
> > [ 55.487207] do_translation_fault+0xac/0xc0
> > [ 55.487209] do_mem_abort+0x44/0x94
> > [ 55.487212] el1_abort+0x40/0x6c
> > [ 55.487214] el1h_64_sync_handler+0xd8/0xe4
> > [ 55.487218] el1h_64_sync+0x64/0x68
> > [ 55.487221] 0xb7eb7ae4
> > [ 55.487224] 0xb7eb8668
> > [ 55.487225] 0xb7eb6e08
> > [ 55.487227] 0xb7eb68ec
> > [ 55.487228] 0xb7eb3824
> > [ 55.487230] 0xb7eb05a8
> > [ 55.487231] 0xb7eb12a0
> > [ 55.487232] 0xb7e43504
> > [ 55.487234] 0xb7e43650
> > [ 55.487235] 0xb7e482d0
> > [ 55.487237] 0xb7e4907c
> > [ 55.487238] 0xb7e49ff4
> > [ 55.487239] 0xb7e40888
> > [ 55.487241] 0xb7cb3328
> > [ 55.487242] 0xb7cb0674
> > [ 55.487243] __efi_rt_asm_wrapper+0x54/0x70
> > [ 55.487246] efi_call_rts+0x28c/0x3d0
> > [ 55.487249] process_one_work+0x1d0/0x320
> > [ 55.487258] worker_thread+0x14c/0x444
> > [ 55.487261] kthread+0x10c/0x110
> > [ 55.487264] ret_from_fork+0x10/0x20
> > [ 55.487268] [Firmware Bug]: Synchronous exception occurred in EFI runtime service set_time()
> > [ 55.495735] ------------[ cut here ]------------
> > [ 55.495739] WARNING: CPU: 62 PID: 9 at drivers/firmware/efi/runtime-wrappers.c:111 efi_call_virt_check_flags+0x40/0xac
> > [ 55.495746] Modules linked in:
> > [ 55.495749] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
> > [ 55.495751] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> > [ 55.495753] Workqueue: efi_rts_wq efi_call_rts
> > [ 55.495757] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 55.495761] pc : efi_call_virt_check_flags+0x40/0xac
> > [ 55.495764] lr : efi_call_rts+0x29c/0x3d0
> > [ 55.495767] sp : ffff80000861bd40
> > [ 55.495768] x29: ffff80000861bd40 x28: 0000000000000000 x27: 0000000000000000
> > [ 55.495772] x26: ffffb251470e9e68 x25: ffff3fff89714805 x24: 0000000000000000
> > [ 55.495775] x23: 0000000000000000 x22: 0000000000000000 x21: 00000000000000c0
> > [ 55.495778] x20: ffffb25146688de0 x19: 0000000000000000 x18: ffffffffffffffff
> > [ 55.495780] x17: 657320656d69746e x16: 757220494645206e x15: 6920646572727563
> > [ 55.495784] x14: 636f206e6f697470 x13: ffff403e40540000 x12: 0000000000001c14
> > [ 55.495787] x11: 000000000000095c x10: ffff403e40800000 x9 : ffff403e40540000
> > [ 55.495790] x8 : 00000000ffff7fff x7 : ffff403e40800000 x6 : 0000000000000000
> > [ 55.495792] x5 : ffff083e7fe9aaa0 x4 : 0000000000000000 x3 : 0000000000000000
> > [ 55.495796] x2 : 0000000000000000 x1 : ffffb25146688de0 x0 : 00000000000000c0
> > [ 55.495799] Call trace:
> > [ 55.495800] efi_call_virt_check_flags+0x40/0xac
> > [ 55.495802] efi_call_rts+0x29c/0x3d0
> > [ 55.495805] process_one_work+0x1d0/0x320
> > [ 55.495808] worker_thread+0x14c/0x444
> > [ 55.495811] kthread+0x10c/0x110
> > [ 55.495814] ret_from_fork+0x10/0x20
> > [ 55.495815] ---[ end trace 0000000000000000 ]---
> > [ 55.495818] Disabling lock debugging due to kernel taint
> > [ 55.495822] efi: [Firmware Bug]: IRQ flags corrupted (0x00000000=>0x000000c0) by EFI set_time
> > [ 55.504434] efi: EFI Runtime Services are disabled!
> > [ 55.504465] rtc-efi rtc-efi.0: can't read time
> > [ 56.479370] efi: EFI Runtime Services are disabled!
> > [ 56.479394] rtc-efi rtc-efi.0: can't read time
> > [ 56.483855] rtc-efi rtc-efi.0: can't read time
> > [ 56.488306] rtc-efi rtc-efi.0: can't read time
> > [ 57.479574] rtc-efi rtc-efi.0: can't read time
> > [ 57.484030] rtc-efi rtc-efi.0: can't read time
> > [ 57.488474] rtc-efi rtc-efi.0: can't read time
> > [ 58.479692] rtc-efi rtc-efi.0: can't read time
> > [ 58.484139] rtc-efi rtc-efi.0: can't read time
> > [ 58.488582] rtc-efi rtc-efi.0: can't read time
> > [ 59.479691] rtc-efi rtc-efi.0: can't read time
> > ... on, and on, on ...
> >
> > [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> >
> > Thanks,
> > Alex
>
> --
> Alexandre Belloni, co-owner and COO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

2022-11-09 16:39:17

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message #forregzbot

[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few templates
paragraphs you might have encountered already already in similar form.]

Hi, this is your Linux kernel regression tracker.

On 08.11.22 11:55, Alexandru Elisei wrote:

> Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
> possible") exposed a firmware error on an Ampere Altra machine that was
> causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
> Recover from synchronous exceptions occurring in firmware") made the EFI
> exception non-fatal, and disabled runtime services when the exception
> happens. The interaction between those two patches are being discussed in a
> separate thread [1], but that should be orthogonal to this.
>
> Now efi.get_time() fails and each time an error message is printed to
> dmesg, which happens several times a second and clutters dmesg
> unnecessarily, to the point it becomes unusable.

Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced 23715a26c8d8
#regzbot title arm64: efi: efi.get_time() fails and clutters dmesg
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained for
in the Linux kernel's documentation; above webpage explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

2022-11-09 20:55:09

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message

On 09/11/2022 14:51:19+0000, Alexandru Elisei wrote:
> Hi,
>
> On Tue, Nov 08, 2022 at 10:41:18PM +0100, Alexandre Belloni wrote:
> > On 08/11/2022 10:55:15+0000, Alexandru Elisei wrote:
> > > Hi,
> > >
> > > Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
> > > possible") exposed a firmware error on an Ampere Altra machine that was
> > > causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
> > > Recover from synchronous exceptions occurring in firmware") made the EFI
> > > exception non-fatal, and disabled runtime services when the exception
> > > happens. The interaction between those two patches are being discussed in a
> > > separate thread [1], but that should be orthogonal to this.
> > >
> > > Now efi.get_time() fails and each time an error message is printed to
> > > dmesg, which happens several times a second and clutters dmesg
> > > unnecessarily, to the point it becomes unusable.
> > >
> > > I was wondering if it would be possible to turn dev_err() into a
> > > dev_WARN_ONCE() or do something to avoid this issue. Tried to replace
> > > dev_err() with dev_err_ratelimited(), and the error message was displayed
> > > less often (about once per second), but dmesg was still being cluttered.
> > >
> >
> > The question this raise is what is actually trying to read the RTC this
> > often?
> >
> > This should be read once at boot and maybe every time you wake up from
> > suspend but there is no real reason to read it multiple times per
> > seconds.
>
> Reverted the commit the exposed the firmware bug, which means rtc-efi works as
> it should. Added these debug statements to check how many times efi_read_time()
> is called if there are no errors:
>
> --- a/drivers/rtc/rtc-efi.c
> +++ b/drivers/rtc/rtc-efi.c
> @@ -154,6 +154,7 @@ static int efi_set_alarm(struct device *dev, struct rtc_wkalrm *wkalrm)
> return status == EFI_SUCCESS ? 0 : -EINVAL;
> }
>
> +static unsigned long i = 0;
> static int efi_read_time(struct device *dev, struct rtc_time *tm)
> {
> efi_status_t status;
> @@ -162,6 +163,9 @@ static int efi_read_time(struct device *dev, struct rtc_time *tm)
>
> status = efi.get_time(&eft, &cap);
>
> + i++;
> + pr_info("%s: Call number %lu\n", __func__, i);
> +
> if (status != EFI_SUCCESS) {
> /* should never happen */
> dev_err(dev, "can't read time\n");
>
> The function gets called 3 times, twice during boot and once after. I would say
> that efi_read_time() gets called so many times because it fails.
>

It should really get called only once, at device registration when
CONFIG_RTC_HCTOSYS is set (which I despise):
https://elixir.bootlin.com/linux/latest/source/drivers/rtc/class.c#L431

Could you maybe use dump_stack() ?


--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2022-11-10 17:10:25

by Alexandru Elisei

[permalink] [raw]
Subject: Re: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message

Hi,

On Wed, Nov 09, 2022 at 09:31:27PM +0100, Alexandre Belloni wrote:
> On 09/11/2022 14:51:19+0000, Alexandru Elisei wrote:
> > Hi,
> >
> > On Tue, Nov 08, 2022 at 10:41:18PM +0100, Alexandre Belloni wrote:
> > > On 08/11/2022 10:55:15+0000, Alexandru Elisei wrote:
> > > > Hi,
> > > >
> > > > Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
> > > > possible") exposed a firmware error on an Ampere Altra machine that was
> > > > causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
> > > > Recover from synchronous exceptions occurring in firmware") made the EFI
> > > > exception non-fatal, and disabled runtime services when the exception
> > > > happens. The interaction between those two patches are being discussed in a
> > > > separate thread [1], but that should be orthogonal to this.
> > > >
> > > > Now efi.get_time() fails and each time an error message is printed to
> > > > dmesg, which happens several times a second and clutters dmesg
> > > > unnecessarily, to the point it becomes unusable.
> > > >
> > > > I was wondering if it would be possible to turn dev_err() into a
> > > > dev_WARN_ONCE() or do something to avoid this issue. Tried to replace
> > > > dev_err() with dev_err_ratelimited(), and the error message was displayed
> > > > less often (about once per second), but dmesg was still being cluttered.
> > > >
> > >
> > > The question this raise is what is actually trying to read the RTC this
> > > often?
> > >
> > > This should be read once at boot and maybe every time you wake up from
> > > suspend but there is no real reason to read it multiple times per
> > > seconds.
> >
> > Reverted the commit the exposed the firmware bug, which means rtc-efi works as
> > it should. Added these debug statements to check how many times efi_read_time()
> > is called if there are no errors:
> >
> > --- a/drivers/rtc/rtc-efi.c
> > +++ b/drivers/rtc/rtc-efi.c
> > @@ -154,6 +154,7 @@ static int efi_set_alarm(struct device *dev, struct rtc_wkalrm *wkalrm)
> > return status == EFI_SUCCESS ? 0 : -EINVAL;
> > }
> >
> > +static unsigned long i = 0;
> > static int efi_read_time(struct device *dev, struct rtc_time *tm)
> > {
> > efi_status_t status;
> > @@ -162,6 +163,9 @@ static int efi_read_time(struct device *dev, struct rtc_time *tm)
> >
> > status = efi.get_time(&eft, &cap);
> >
> > + i++;
> > + pr_info("%s: Call number %lu\n", __func__, i);
> > +
> > if (status != EFI_SUCCESS) {
> > /* should never happen */
> > dev_err(dev, "can't read time\n");
> >
> > The function gets called 3 times, twice during boot and once after. I would say
> > that efi_read_time() gets called so many times because it fails.
> >
>
> It should really get called only once, at device registration when
> CONFIG_RTC_HCTOSYS is set (which I despise):
> https://elixir.bootlin.com/linux/latest/source/drivers/rtc/class.c#L431
>
> Could you maybe use dump_stack() ?

Made this change to the driver:

--- a/drivers/rtc/rtc-efi.c
+++ b/drivers/rtc/rtc-efi.c
@@ -160,6 +160,9 @@ static int efi_read_time(struct device *dev, struct rtc_time *tm)
efi_time_t eft;
efi_time_cap_t cap;

+ pr_info("efi_read_time()\n");
+ dump_stack();
+
status = efi.get_time(&eft, &cap);

if (status != EFI_SUCCESS) {

I don't know if it makes a difference, but the driver is compiled in
(CONFIG_RTC_DRV_EFI=y) instead of it being a module.

For the case when there's no synchronous exception (so runtime services are
available):

[ 16.106871] rtc_efi: efi_read_time()
[ 16.110461] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00002-g5c5bcc194a43 #85
[ 16.118455] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 16.131221] Call trace:
[ 16.133657] dump_backtrace.part.0+0xdc/0xf0
[ 16.137936] show_stack+0x18/0x40
[ 16.141241] dump_stack_lvl+0x68/0x84
[ 16.144902] dump_stack+0x18/0x34
[ 16.148205] efi_read_time+0x38/0xac
[ 16.151770] __rtc_read_time+0x44/0x8c
[ 16.155515] __rtc_read_alarm+0x27c/0x400
[ 16.159512] __devm_rtc_register_device+0x100/0x300
[ 16.164378] efi_rtc_probe+0xc4/0xf8
[ 16.167951] platform_probe+0x68/0xe0
[ 16.171609] really_probe+0xbc/0x2dc
[ 16.175174] __driver_probe_device+0x78/0xe0
[ 16.179432] driver_probe_device+0x3c/0x160
[ 16.183604] __driver_attach+0x80/0x190
[ 16.187428] bus_for_each_dev+0x70/0xd0
[ 16.191254] driver_attach+0x24/0x30
[ 16.194817] bus_add_driver+0x150/0x200
[ 16.198641] driver_register+0x78/0x130
[ 16.202466] __platform_driver_probe+0x58/0x11c
[ 16.206985] efi_rtc_driver_init+0x24/0x30
[ 16.211070] do_one_initcall+0x50/0x1c0
[ 16.214895] kernel_init_freeable+0x214/0x280
[ 16.219245] kernel_init+0x24/0x12c
[ 16.222725] ret_from_fork+0x10/0x20
[ 16.226874] rtc-efi rtc-efi.0: registered as rtc0
[ 16.231577] rtc_efi: efi_read_time()
[ 16.235142] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00002-g5c5bcc194a43 #85
[ 16.243133] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 16.255897] Call trace:
[ 16.258331] dump_backtrace.part.0+0xdc/0xf0
[ 16.262589] show_stack+0x18/0x40
[ 16.265893] dump_stack_lvl+0x68/0x84
[ 16.269544] dump_stack+0x18/0x34
[ 16.272847] efi_read_time+0x38/0xac
[ 16.276411] __rtc_read_time+0x44/0x8c
[ 16.280148] rtc_read_time+0x3c/0x70
[ 16.283711] __devm_rtc_register_device+0x16c/0x300
[ 16.288575] efi_rtc_probe+0xc4/0xf8
[ 16.292139] platform_probe+0x68/0xe0
[ 16.295790] really_probe+0xbc/0x2dc
[ 16.299353] __driver_probe_device+0x78/0xe0
[ 16.303611] driver_probe_device+0x3c/0x160
[ 16.307783] __driver_attach+0x80/0x190
[ 16.311607] bus_for_each_dev+0x70/0xd0
[ 16.315431] driver_attach+0x24/0x30
[ 16.318994] bus_add_driver+0x150/0x200
[ 16.322817] driver_register+0x78/0x130
[ 16.326642] __platform_driver_probe+0x58/0x11c
[ 16.331160] efi_rtc_driver_init+0x24/0x30
[ 16.335245] do_one_initcall+0x50/0x1c0
[ 16.339069] kernel_init_freeable+0x214/0x280
[ 16.343414] kernel_init+0x24/0x12c
[ 16.346891] ret_from_fork+0x10/0x20
[ 16.351038] rtc-efi rtc-efi.0: setting system clock to 2022-11-10T14:57:06 UTC (1668092226)
[..]
[ 55.871457] rtc_efi: efi_read_time()
[ 55.871483] CPU: 20 PID: 1744 Comm: kworker/20:2 Not tainted 6.1.0-rc4-00002-g5c5bcc194a43 #85
[ 55.871495] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 55.871500] Workqueue: events rtc_timer_do_work
[ 55.871529] Call trace:
[ 55.871533] dump_backtrace.part.0+0xdc/0xf0
[ 55.871553] show_stack+0x18/0x40
[ 55.871560] dump_stack_lvl+0x68/0x84
[ 55.871573] dump_stack+0x18/0x34
[ 55.871578] efi_read_time+0x38/0xac
[ 55.871584] __rtc_read_time+0x44/0x8c
[ 55.871589] rtc_timer_do_work+0x64/0x1c0
[ 55.871593] process_one_work+0x1d0/0x320
[ 55.871604] worker_thread+0x14c/0x444
[ 55.871610] kthread+0x10c/0x110
[ 55.871617] ret_from_fork+0x10/0x20

Running a v6.1-rc4 kernel, after the EFI synchronous abort, so runtime services
are disabled:

[ 55.547383] rtc_efi: efi_read_time()
[ 55.547394] CPU: 1 PID: 829 Comm: kworker/1:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
[ 55.547397] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 55.547399] Workqueue: events rtc_timer_do_work
[ 55.547405] Call trace:
[ 55.547406] dump_backtrace.part.0+0xdc/0xf0
[ 55.547411] show_stack+0x18/0x40
[ 55.547414] dump_stack_lvl+0x68/0x84
[ 55.547418] dump_stack+0x18/0x34
[ 55.547420] efi_read_time+0x38/0xac
[ 55.547422] __rtc_read_time+0x44/0x8c
[ 55.547424] rtc_timer_do_work+0x64/0x1c0
[ 55.547425] process_one_work+0x1d0/0x320
[ 55.547429] worker_thread+0x14c/0x444
[ 55.547432] kthread+0x10c/0x110
[ 55.547435] ret_from_fork+0x10/0x20
[ 55.547437] efi: EFI Runtime Services are disabled!
[ 55.547441] rtc-efi rtc-efi.0: can't read time
[ 56.522572] efi: EFI Runtime Services are disabled!
[ 56.522602] rtc_efi: efi_read_time()
[ 56.522607] CPU: 46 PID: 883 Comm: kworker/46:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
[ 56.522620] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 56.522627] Workqueue: events rtc_timer_do_work
[ 56.522640] Call trace:
[ 56.522644] dump_backtrace.part.0+0xdc/0xf0
[ 56.522658] show_stack+0x18/0x40
[ 56.522670] dump_stack_lvl+0x68/0x84
[ 56.522680] dump_stack+0x18/0x34
[ 56.522689] efi_read_time+0x38/0xac
[ 56.522696] __rtc_read_time+0x44/0x8c
[ 56.522703] rtc_timer_do_work+0x64/0x1c0
[ 56.522710] process_one_work+0x1d0/0x320
[ 56.522721] worker_thread+0x14c/0x444
[ 56.522731] kthread+0x10c/0x110
[ 56.522740] ret_from_fork+0x10/0x20
[ 56.522751] rtc-efi rtc-efi.0: can't read time
[ 56.527202] CPU: 46 PID: 883 Comm: kworker/46:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
[ 56.527205] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 56.527207] Workqueue: events rtc_timer_do_work
[ 56.527209] Call trace:
[ 56.527210] dump_backtrace.part.0+0xdc/0xf0
[ 56.527213] show_stack+0x18/0x40
[ 56.527217] dump_stack_lvl+0x68/0x84
[ 56.527219] dump_stack+0x18/0x34
[ 56.527222] efi_read_time+0x38/0xac
[ 56.527224] __rtc_read_time+0x44/0x8c
[ 56.527226] rtc_timer_do_work+0x64/0x1c0
[ 56.527227] process_one_work+0x1d0/0x320
[ 56.527230] worker_thread+0x14c/0x444
[ 56.527234] kthread+0x10c/0x110
[ 56.527236] ret_from_fork+0x10/0x20
[ 56.527239] rtc-efi rtc-efi.0: can't read time
[ 56.531689] rtc_efi: efi_read_time()
[ 56.531700] CPU: 61 PID: 900 Comm: kworker/61:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
[ 56.531705] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 56.531708] Workqueue: events rtc_timer_do_work
[ 56.531714] Call trace:
[ 56.531716] dump_backtrace.part.0+0xdc/0xf0
[ 56.531723] show_stack+0x18/0x40
[ 56.531728] dump_stack_lvl+0x68/0x84
[ 56.531732] dump_stack+0x18/0x34
[ 56.531737] efi_read_time+0x38/0xac
[ 56.531741] __rtc_read_time+0x44/0x8c
[ 56.531743] rtc_timer_do_work+0x64/0x1c0
[ 56.531747] process_one_work+0x1d0/0x320
[ 56.531752] worker_thread+0x14c/0x444
[ 56.531757] kthread+0x10c/0x110
[ 56.531761] ret_from_fork+0x10/0x20
[ 56.531768] rtc-efi rtc-efi.0: can't read time
[..]

I've removed the first two GetTime() calls that happens before runtime
services become unavailable, they looked similar to the calls in the first
scenario. Can provide full log if needed.

Thanks,
Alex

2022-11-13 17:59:45

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message #forregzbot

[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

On 09.11.22 16:50, Thorsten Leemhuis wrote:
> On 08.11.22 11:55, Alexandru Elisei wrote:
>
>> Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
>> possible") exposed a firmware error on an Ampere Altra machine that was
>> causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
>> Recover from synchronous exceptions occurring in firmware") made the EFI
>> exception non-fatal, and disabled runtime services when the exception
>> happens. The interaction between those two patches are being discussed in a
>> separate thread [1], but that should be orthogonal to this.
>>
>> Now efi.get_time() fails and each time an error message is printed to
>> dmesg, which happens several times a second and clutters dmesg
>> unnecessarily, to the point it becomes unusable.
>
> Thanks for the report. To be sure below issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced 23715a26c8d8
> #regzbot title arm64: efi: efi.get_time() fails and clutters dmesg
> #regzbot ignore-activity

#regzbot fixed-by: 550b33cfd445296

2023-02-17 14:15:26

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [BUG] rtc-efi: Error in efi.get_time() spams dmesg with error message

On Thu, 10 Nov 2022 at 17:25, Alexandru Elisei <[email protected]> wrote:
>
> Hi,
>
> On Wed, Nov 09, 2022 at 09:31:27PM +0100, Alexandre Belloni wrote:
> > On 09/11/2022 14:51:19+0000, Alexandru Elisei wrote:
> > > Hi,
> > >
> > > On Tue, Nov 08, 2022 at 10:41:18PM +0100, Alexandre Belloni wrote:
> > > > On 08/11/2022 10:55:15+0000, Alexandru Elisei wrote:
> > > > > Hi,
> > > > >
> > > > > Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
> > > > > possible") exposed a firmware error on an Ampere Altra machine that was
> > > > > causing the machine to panic. Then commit 23715a26c8d8 ("arm64: efi:
> > > > > Recover from synchronous exceptions occurring in firmware") made the EFI
> > > > > exception non-fatal, and disabled runtime services when the exception
> > > > > happens. The interaction between those two patches are being discussed in a
> > > > > separate thread [1], but that should be orthogonal to this.
> > > > >
> > > > > Now efi.get_time() fails and each time an error message is printed to
> > > > > dmesg, which happens several times a second and clutters dmesg
> > > > > unnecessarily, to the point it becomes unusable.
> > > > >
> > > > > I was wondering if it would be possible to turn dev_err() into a
> > > > > dev_WARN_ONCE() or do something to avoid this issue. Tried to replace
> > > > > dev_err() with dev_err_ratelimited(), and the error message was displayed
> > > > > less often (about once per second), but dmesg was still being cluttered.
> > > > >
> > > >
> > > > The question this raise is what is actually trying to read the RTC this
> > > > often?
> > > >
> > > > This should be read once at boot and maybe every time you wake up from
> > > > suspend but there is no real reason to read it multiple times per
> > > > seconds.
> > >
> > > Reverted the commit the exposed the firmware bug, which means rtc-efi works as
> > > it should. Added these debug statements to check how many times efi_read_time()
> > > is called if there are no errors:
> > >
> > > --- a/drivers/rtc/rtc-efi.c
> > > +++ b/drivers/rtc/rtc-efi.c
> > > @@ -154,6 +154,7 @@ static int efi_set_alarm(struct device *dev, struct rtc_wkalrm *wkalrm)
> > > return status == EFI_SUCCESS ? 0 : -EINVAL;
> > > }
> > >
> > > +static unsigned long i = 0;
> > > static int efi_read_time(struct device *dev, struct rtc_time *tm)
> > > {
> > > efi_status_t status;
> > > @@ -162,6 +163,9 @@ static int efi_read_time(struct device *dev, struct rtc_time *tm)
> > >
> > > status = efi.get_time(&eft, &cap);
> > >
> > > + i++;
> > > + pr_info("%s: Call number %lu\n", __func__, i);
> > > +
> > > if (status != EFI_SUCCESS) {
> > > /* should never happen */
> > > dev_err(dev, "can't read time\n");
> > >
> > > The function gets called 3 times, twice during boot and once after. I would say
> > > that efi_read_time() gets called so many times because it fails.
> > >
> >
> > It should really get called only once, at device registration when
> > CONFIG_RTC_HCTOSYS is set (which I despise):
> > https://elixir.bootlin.com/linux/latest/source/drivers/rtc/class.c#L431
> >
> > Could you maybe use dump_stack() ?
>
> Made this change to the driver:
>
> --- a/drivers/rtc/rtc-efi.c
> +++ b/drivers/rtc/rtc-efi.c
> @@ -160,6 +160,9 @@ static int efi_read_time(struct device *dev, struct rtc_time *tm)
> efi_time_t eft;
> efi_time_cap_t cap;
>
> + pr_info("efi_read_time()\n");
> + dump_stack();
> +
> status = efi.get_time(&eft, &cap);
>
> if (status != EFI_SUCCESS) {
>
> I don't know if it makes a difference, but the driver is compiled in
> (CONFIG_RTC_DRV_EFI=y) instead of it being a module.
>
> For the case when there's no synchronous exception (so runtime services are
> available):
>
> [ 16.106871] rtc_efi: efi_read_time()
> [ 16.110461] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00002-g5c5bcc194a43 #85
> [ 16.118455] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 16.131221] Call trace:
> [ 16.133657] dump_backtrace.part.0+0xdc/0xf0
> [ 16.137936] show_stack+0x18/0x40
> [ 16.141241] dump_stack_lvl+0x68/0x84
> [ 16.144902] dump_stack+0x18/0x34
> [ 16.148205] efi_read_time+0x38/0xac
> [ 16.151770] __rtc_read_time+0x44/0x8c
> [ 16.155515] __rtc_read_alarm+0x27c/0x400
> [ 16.159512] __devm_rtc_register_device+0x100/0x300
> [ 16.164378] efi_rtc_probe+0xc4/0xf8
> [ 16.167951] platform_probe+0x68/0xe0
> [ 16.171609] really_probe+0xbc/0x2dc
> [ 16.175174] __driver_probe_device+0x78/0xe0
> [ 16.179432] driver_probe_device+0x3c/0x160
> [ 16.183604] __driver_attach+0x80/0x190
> [ 16.187428] bus_for_each_dev+0x70/0xd0
> [ 16.191254] driver_attach+0x24/0x30
> [ 16.194817] bus_add_driver+0x150/0x200
> [ 16.198641] driver_register+0x78/0x130
> [ 16.202466] __platform_driver_probe+0x58/0x11c
> [ 16.206985] efi_rtc_driver_init+0x24/0x30
> [ 16.211070] do_one_initcall+0x50/0x1c0
> [ 16.214895] kernel_init_freeable+0x214/0x280
> [ 16.219245] kernel_init+0x24/0x12c
> [ 16.222725] ret_from_fork+0x10/0x20
> [ 16.226874] rtc-efi rtc-efi.0: registered as rtc0
> [ 16.231577] rtc_efi: efi_read_time()
> [ 16.235142] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00002-g5c5bcc194a43 #85
> [ 16.243133] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 16.255897] Call trace:
> [ 16.258331] dump_backtrace.part.0+0xdc/0xf0
> [ 16.262589] show_stack+0x18/0x40
> [ 16.265893] dump_stack_lvl+0x68/0x84
> [ 16.269544] dump_stack+0x18/0x34
> [ 16.272847] efi_read_time+0x38/0xac
> [ 16.276411] __rtc_read_time+0x44/0x8c
> [ 16.280148] rtc_read_time+0x3c/0x70
> [ 16.283711] __devm_rtc_register_device+0x16c/0x300
> [ 16.288575] efi_rtc_probe+0xc4/0xf8
> [ 16.292139] platform_probe+0x68/0xe0
> [ 16.295790] really_probe+0xbc/0x2dc
> [ 16.299353] __driver_probe_device+0x78/0xe0
> [ 16.303611] driver_probe_device+0x3c/0x160
> [ 16.307783] __driver_attach+0x80/0x190
> [ 16.311607] bus_for_each_dev+0x70/0xd0
> [ 16.315431] driver_attach+0x24/0x30
> [ 16.318994] bus_add_driver+0x150/0x200
> [ 16.322817] driver_register+0x78/0x130
> [ 16.326642] __platform_driver_probe+0x58/0x11c
> [ 16.331160] efi_rtc_driver_init+0x24/0x30
> [ 16.335245] do_one_initcall+0x50/0x1c0
> [ 16.339069] kernel_init_freeable+0x214/0x280
> [ 16.343414] kernel_init+0x24/0x12c
> [ 16.346891] ret_from_fork+0x10/0x20
> [ 16.351038] rtc-efi rtc-efi.0: setting system clock to 2022-11-10T14:57:06 UTC (1668092226)
> [..]
> [ 55.871457] rtc_efi: efi_read_time()
> [ 55.871483] CPU: 20 PID: 1744 Comm: kworker/20:2 Not tainted 6.1.0-rc4-00002-g5c5bcc194a43 #85
> [ 55.871495] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 55.871500] Workqueue: events rtc_timer_do_work
> [ 55.871529] Call trace:
> [ 55.871533] dump_backtrace.part.0+0xdc/0xf0
> [ 55.871553] show_stack+0x18/0x40
> [ 55.871560] dump_stack_lvl+0x68/0x84
> [ 55.871573] dump_stack+0x18/0x34
> [ 55.871578] efi_read_time+0x38/0xac
> [ 55.871584] __rtc_read_time+0x44/0x8c
> [ 55.871589] rtc_timer_do_work+0x64/0x1c0
> [ 55.871593] process_one_work+0x1d0/0x320
> [ 55.871604] worker_thread+0x14c/0x444
> [ 55.871610] kthread+0x10c/0x110
> [ 55.871617] ret_from_fork+0x10/0x20
>
> Running a v6.1-rc4 kernel, after the EFI synchronous abort, so runtime services
> are disabled:
>
> [ 55.547383] rtc_efi: efi_read_time()
> [ 55.547394] CPU: 1 PID: 829 Comm: kworker/1:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
> [ 55.547397] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 55.547399] Workqueue: events rtc_timer_do_work
> [ 55.547405] Call trace:
> [ 55.547406] dump_backtrace.part.0+0xdc/0xf0
> [ 55.547411] show_stack+0x18/0x40
> [ 55.547414] dump_stack_lvl+0x68/0x84
> [ 55.547418] dump_stack+0x18/0x34
> [ 55.547420] efi_read_time+0x38/0xac
> [ 55.547422] __rtc_read_time+0x44/0x8c
> [ 55.547424] rtc_timer_do_work+0x64/0x1c0
> [ 55.547425] process_one_work+0x1d0/0x320
> [ 55.547429] worker_thread+0x14c/0x444
> [ 55.547432] kthread+0x10c/0x110
> [ 55.547435] ret_from_fork+0x10/0x20
> [ 55.547437] efi: EFI Runtime Services are disabled!
> [ 55.547441] rtc-efi rtc-efi.0: can't read time
> [ 56.522572] efi: EFI Runtime Services are disabled!
> [ 56.522602] rtc_efi: efi_read_time()
> [ 56.522607] CPU: 46 PID: 883 Comm: kworker/46:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
> [ 56.522620] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 56.522627] Workqueue: events rtc_timer_do_work
> [ 56.522640] Call trace:
> [ 56.522644] dump_backtrace.part.0+0xdc/0xf0
> [ 56.522658] show_stack+0x18/0x40
> [ 56.522670] dump_stack_lvl+0x68/0x84
> [ 56.522680] dump_stack+0x18/0x34
> [ 56.522689] efi_read_time+0x38/0xac
> [ 56.522696] __rtc_read_time+0x44/0x8c
> [ 56.522703] rtc_timer_do_work+0x64/0x1c0
> [ 56.522710] process_one_work+0x1d0/0x320
> [ 56.522721] worker_thread+0x14c/0x444
> [ 56.522731] kthread+0x10c/0x110
> [ 56.522740] ret_from_fork+0x10/0x20
> [ 56.522751] rtc-efi rtc-efi.0: can't read time
> [ 56.527202] CPU: 46 PID: 883 Comm: kworker/46:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
> [ 56.527205] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 56.527207] Workqueue: events rtc_timer_do_work
> [ 56.527209] Call trace:
> [ 56.527210] dump_backtrace.part.0+0xdc/0xf0
> [ 56.527213] show_stack+0x18/0x40
> [ 56.527217] dump_stack_lvl+0x68/0x84
> [ 56.527219] dump_stack+0x18/0x34
> [ 56.527222] efi_read_time+0x38/0xac
> [ 56.527224] __rtc_read_time+0x44/0x8c
> [ 56.527226] rtc_timer_do_work+0x64/0x1c0
> [ 56.527227] process_one_work+0x1d0/0x320
> [ 56.527230] worker_thread+0x14c/0x444
> [ 56.527234] kthread+0x10c/0x110
> [ 56.527236] ret_from_fork+0x10/0x20
> [ 56.527239] rtc-efi rtc-efi.0: can't read time
> [ 56.531689] rtc_efi: efi_read_time()
> [ 56.531700] CPU: 61 PID: 900 Comm: kworker/61:1 Tainted: G W I 6.1.0-rc4-00001-gbd8082c3515b #86
> [ 56.531705] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
> [ 56.531708] Workqueue: events rtc_timer_do_work
> [ 56.531714] Call trace:
> [ 56.531716] dump_backtrace.part.0+0xdc/0xf0
> [ 56.531723] show_stack+0x18/0x40
> [ 56.531728] dump_stack_lvl+0x68/0x84
> [ 56.531732] dump_stack+0x18/0x34
> [ 56.531737] efi_read_time+0x38/0xac
> [ 56.531741] __rtc_read_time+0x44/0x8c
> [ 56.531743] rtc_timer_do_work+0x64/0x1c0
> [ 56.531747] process_one_work+0x1d0/0x320
> [ 56.531752] worker_thread+0x14c/0x444
> [ 56.531757] kthread+0x10c/0x110
> [ 56.531761] ret_from_fork+0x10/0x20
> [ 56.531768] rtc-efi rtc-efi.0: can't read time
> [..]
>
> I've removed the first two GetTime() calls that happens before runtime
> services become unavailable, they looked similar to the calls in the first
> scenario. Can provide full log if needed.
>

Apologies - it appears I failed to follow up here.

I am sending out a patch that changes the dev_err to dev_err_once(),
as it is kind of immaterial whether the failure occurs once or
multiple times - the underlying issue is not going to be any
different.

That said, it would still be interesting to understand why this is
happening. Unfortunately, the EFI runtime calls are made from a
separate worker kthread, so the backtrace will not give you the
caller.

Recent kernels will give you the backtrace of the calling thread as
well, so if you apply the diff below, you should be able to tell where
the calls are coming from

diff --git a/drivers/firmware/efi/runtime-wrappers.c
b/drivers/firmware/efi/runtime-wrappers.c
index 7feee3d9c2bfbeec..2920849570b239d6 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -62,6 +62,7 @@ struct efi_runtime_work efi_rts_work;
\
if (!efi_enabled(EFI_RUNTIME_SERVICES)) { \
pr_warn_once("EFI Runtime Services are disabled!\n"); \
+ WARN_ON(1); \
goto exit; \
} \
\