2023-11-28 14:46:57

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH v4 0/4] Extend time to wait for UIP for some callers

A number of users have reported their system will have a failure reading
the RTC around s2idle entry or exit.

This failure manifests as UIP clear taking longer than 10ms.

By a debugging patch provided by Mateusz Jończyk it is shown that this
has taken upwards of 300ms in some cases.

This series adjusts the UIP timeout to be configurable by the caller and
changes some callers which aren't called in an interrupt context to allow
longer timeouts.

Mario Limonciello (4):
rtc: mc146818-lib: Adjust failure return code for mc146818_get_time()
rtc: Adjust failure return code for cmos_set_alarm()
rtc: Add support for configuring the UIP timeout for RTC reads
rtc: Extend timeout for waiting for UIP to clear to 1s

arch/alpha/kernel/rtc.c | 2 +-
arch/x86/kernel/hpet.c | 2 +-
arch/x86/kernel/rtc.c | 2 +-
drivers/base/power/trace.c | 2 +-
drivers/rtc/rtc-cmos.c | 10 ++++-----
drivers/rtc/rtc-mc146818-lib.c | 39 ++++++++++++++++++++++++++--------
include/linux/mc146818rtc.h | 3 ++-
7 files changed, 41 insertions(+), 19 deletions(-)


base-commit: 2cc14f52aeb78ce3f29677c2de1f06c0e91471ab
--
2.34.1


2023-11-28 14:46:57

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH v4 2/4] rtc: Adjust failure return code for cmos_set_alarm()

When mc146818_avoid_UIP() fails to return a valid value, this is because
UIP didn't clear in the timeout period. Adjust the return code in this
case to -ETIMEDOUT.

Tested-by: Mateusz Jończyk <[email protected]>
Reviewed-by: Mateusz Jończyk <[email protected]>
Acked-by: Mateusz Jończyk <[email protected]>
Cc: [email protected]
Fixes: cdedc45c579f ("rtc: cmos: avoid UIP when reading alarm time")
Fixes: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time")
Signed-off-by: Mario Limonciello <[email protected]>
---
v1->v2:
* cover cmos_read_alarm() too
* Add stable tags
---
drivers/rtc/rtc-cmos.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 228fb2d11c70..391f91d3d144 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -292,7 +292,7 @@ static int cmos_read_alarm(struct device *dev, struct rtc_wkalrm *t)

/* This not only a rtc_op, but also called directly */
if (!is_valid_irq(cmos->irq))
- return -EIO;
+ return -ETIMEDOUT;

/* Basic alarms only support hour, minute, and seconds fields.
* Some also support day and month, for alarms up to a year in
@@ -557,7 +557,7 @@ static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)
* Use mc146818_avoid_UIP() to avoid this.
*/
if (!mc146818_avoid_UIP(cmos_set_alarm_callback, &p))
- return -EIO;
+ return -ETIMEDOUT;

cmos->alarm_expires = rtc_tm_to_time64(&t->time);

--
2.34.1

2023-11-28 14:47:06

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH v4 4/4] rtc: Extend timeout for waiting for UIP to clear to 1s

Specs don't say anything about UIP being cleared within 10ms. They
only say that UIP won't occur for another 244uS. If a long NMI occurs
while UIP is still updating it might not be possible to get valid
data in 10ms.

This has been observed in the wild that around s2idle some calls can
take up to 480ms before UIP is clear.

Adjust callers from outside an interrupt context to wait for up to a
1s instead of 10ms.

Cc: [email protected] # 6.1.y
Fixes: ec5895c0f2d8 ("rtc: mc146818-lib: extract mc146818_avoid_UIP")
Reported-by: Carsten Hatger <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217626
Tested-by: Mateusz Jończyk <[email protected]>
Reviewed-by: Mateusz Jończyk <[email protected]>
Acked-by: Mateusz Jończyk <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
---
v2->v3:
* Add tags
* Add Carsten's name
v1->v2:
* Add tags
---
arch/x86/kernel/rtc.c | 2 +-
drivers/base/power/trace.c | 2 +-
drivers/rtc/rtc-cmos.c | 2 +-
drivers/rtc/rtc-mc146818-lib.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index 961ebc7f1872..2e7066980f3e 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -67,7 +67,7 @@ void mach_get_cmos_time(struct timespec64 *now)
return;
}

- if (mc146818_get_time(&tm, 10)) {
+ if (mc146818_get_time(&tm, 1000)) {
pr_err("Unable to read current time from RTC\n");
now->tv_sec = now->tv_nsec = 0;
return;
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
index c2e925357474..cd6e559648b2 100644
--- a/drivers/base/power/trace.c
+++ b/drivers/base/power/trace.c
@@ -120,7 +120,7 @@ static unsigned int read_magic_time(void)
struct rtc_time time;
unsigned int val;

- if (mc146818_get_time(&time, 10) < 0) {
+ if (mc146818_get_time(&time, 1000) < 0) {
pr_err("Unable to read current time from RTC\n");
return 0;
}
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index d278b085821e..6bc1e0279cd9 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -231,7 +231,7 @@ static int cmos_read_time(struct device *dev, struct rtc_time *t)
if (!pm_trace_rtc_valid())
return -EIO;

- ret = mc146818_get_time(t, 10);
+ ret = mc146818_get_time(t, 1000);
if (ret < 0) {
dev_err_ratelimited(dev, "unable to read current time\n");
return ret;
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c
index 4c17b3cef11e..651bf3c279c7 100644
--- a/drivers/rtc/rtc-mc146818-lib.c
+++ b/drivers/rtc/rtc-mc146818-lib.c
@@ -93,7 +93,7 @@ EXPORT_SYMBOL_GPL(mc146818_avoid_UIP);
*/
bool mc146818_does_rtc_work(void)
{
- return mc146818_avoid_UIP(NULL, 10, NULL);
+ return mc146818_avoid_UIP(NULL, 1000, NULL);
}
EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);

--
2.34.1

2023-11-28 14:47:08

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH v4 1/4] rtc: mc146818-lib: Adjust failure return code for mc146818_get_time()

mc146818_get_time() calls mc146818_avoid_UIP() to avoid fetching the
time while RTC update is in progress (UIP). When this fails, the return
code is -EIO, but actually there was no IO failure.

The reason for the return from mc146818_avoid_UIP() is that the UIP
wasn't cleared in the time period. Adjust the return code to -ETIMEDOUT
to match the behavior.

Tested-by: Mateusz Jończyk <[email protected]>
Reviewed-by: Mateusz Jończyk <[email protected]>
Acked-by: Mateusz Jończyk <[email protected]>
Cc: [email protected]
Fixes: 2a61b0ac5493 ("rtc: mc146818-lib: refactor mc146818_get_time")
Signed-off-by: Mario Limonciello <[email protected]>
---
v2->v3:
* Add tags
v1->v2:
* Add stable and fixes tag
---
drivers/rtc/rtc-mc146818-lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c
index f1c09f1db044..43a28e82674e 100644
--- a/drivers/rtc/rtc-mc146818-lib.c
+++ b/drivers/rtc/rtc-mc146818-lib.c
@@ -138,7 +138,7 @@ int mc146818_get_time(struct rtc_time *time)

if (!mc146818_avoid_UIP(mc146818_get_time_callback, &p)) {
memset(time, 0, sizeof(*time));
- return -EIO;
+ return -ETIMEDOUT;
}

if (!(p.ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD)
--
2.34.1

2023-11-28 14:47:15

by Mario Limonciello

[permalink] [raw]
Subject: [PATCH v4 3/4] rtc: Add support for configuring the UIP timeout for RTC reads

The UIP timeout is hardcoded to 10ms for all RTC reads, but in some
contexts this might not be enough time. Add a timeout parameter to
mc146818_get_time() and mc146818_get_time_callback().

If UIP timeout is configured by caller to be >=100 ms and a call
takes this long, log a warning.

Make all callers use 10ms to ensure no functional changes.

Cc: [email protected] # 6.1.y
Fixes: ec5895c0f2d8 ("rtc: mc146818-lib: extract mc146818_avoid_UIP")
Signed-off-by: Mario Limonciello <[email protected]>
---
v3->v4:
* Change label
* Logic adjustment for for loop
* Fix pr_warn variable
v2->v3:
* Logic adjustments
* Clarify warning message
v1->v2:
* Add a warning if 100ms or more
* Add stable and fixes tags
---
arch/alpha/kernel/rtc.c | 2 +-
arch/x86/kernel/hpet.c | 2 +-
arch/x86/kernel/rtc.c | 2 +-
drivers/base/power/trace.c | 2 +-
drivers/rtc/rtc-cmos.c | 6 +++---
drivers/rtc/rtc-mc146818-lib.c | 37 ++++++++++++++++++++++++++--------
include/linux/mc146818rtc.h | 3 ++-
7 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c
index fb3025396ac9..cfdf90bc8b3f 100644
--- a/arch/alpha/kernel/rtc.c
+++ b/arch/alpha/kernel/rtc.c
@@ -80,7 +80,7 @@ init_rtc_epoch(void)
static int
alpha_rtc_read_time(struct device *dev, struct rtc_time *tm)
{
- int ret = mc146818_get_time(tm);
+ int ret = mc146818_get_time(tm, 10);

if (ret < 0) {
dev_err_ratelimited(dev, "unable to read current time\n");
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 41eecf180b7f..17adad4cbe78 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1438,7 +1438,7 @@ irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id)
memset(&curr_time, 0, sizeof(struct rtc_time));

if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) {
- if (unlikely(mc146818_get_time(&curr_time) < 0)) {
+ if (unlikely(mc146818_get_time(&curr_time, 10) < 0)) {
pr_err_ratelimited("unable to read current time from RTC\n");
return IRQ_HANDLED;
}
diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index 1309b9b05338..961ebc7f1872 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -67,7 +67,7 @@ void mach_get_cmos_time(struct timespec64 *now)
return;
}

- if (mc146818_get_time(&tm)) {
+ if (mc146818_get_time(&tm, 10)) {
pr_err("Unable to read current time from RTC\n");
now->tv_sec = now->tv_nsec = 0;
return;
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
index 72b7a92337b1..c2e925357474 100644
--- a/drivers/base/power/trace.c
+++ b/drivers/base/power/trace.c
@@ -120,7 +120,7 @@ static unsigned int read_magic_time(void)
struct rtc_time time;
unsigned int val;

- if (mc146818_get_time(&time) < 0) {
+ if (mc146818_get_time(&time, 10) < 0) {
pr_err("Unable to read current time from RTC\n");
return 0;
}
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 391f91d3d144..d278b085821e 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -231,7 +231,7 @@ static int cmos_read_time(struct device *dev, struct rtc_time *t)
if (!pm_trace_rtc_valid())
return -EIO;

- ret = mc146818_get_time(t);
+ ret = mc146818_get_time(t, 10);
if (ret < 0) {
dev_err_ratelimited(dev, "unable to read current time\n");
return ret;
@@ -307,7 +307,7 @@ static int cmos_read_alarm(struct device *dev, struct rtc_wkalrm *t)
*
* Use the mc146818_avoid_UIP() function to avoid this.
*/
- if (!mc146818_avoid_UIP(cmos_read_alarm_callback, &p))
+ if (!mc146818_avoid_UIP(cmos_read_alarm_callback, 10, &p))
return -EIO;

if (!(p.rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
@@ -556,7 +556,7 @@ static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)
*
* Use mc146818_avoid_UIP() to avoid this.
*/
- if (!mc146818_avoid_UIP(cmos_set_alarm_callback, &p))
+ if (!mc146818_avoid_UIP(cmos_set_alarm_callback, 10, &p))
return -ETIMEDOUT;

cmos->alarm_expires = rtc_tm_to_time64(&t->time);
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c
index 43a28e82674e..4c17b3cef11e 100644
--- a/drivers/rtc/rtc-mc146818-lib.c
+++ b/drivers/rtc/rtc-mc146818-lib.c
@@ -8,26 +8,31 @@
#include <linux/acpi.h>
#endif

+#define UIP_RECHECK_DELAY 100 /* usec */
+#define UIP_RECHECK_DELAY_MS (USEC_PER_MSEC / UIP_RECHECK_DELAY)
+#define UIP_RECHECK_LOOPS_MS(x) (x / UIP_RECHECK_DELAY_MS)
+
/*
* Execute a function while the UIP (Update-in-progress) bit of the RTC is
- * unset.
+ * unset. The timeout is configurable by the caller in ms.
*
* Warning: callback may be executed more then once.
*/
bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),
+ int timeout,
void *param)
{
int i;
unsigned long flags;
unsigned char seconds;

- for (i = 0; i < 100; i++) {
+ for (i = 0; UIP_RECHECK_LOOPS_MS(i) < timeout; i++) {
spin_lock_irqsave(&rtc_lock, flags);

/*
* Check whether there is an update in progress during which the
* readout is unspecified. The maximum update time is ~2ms. Poll
- * every 100 usec for completion.
+ * for completion.
*
* Store the second value before checking UIP so a long lasting
* NMI which happens to hit after the UIP check cannot make
@@ -37,7 +42,7 @@ bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),

if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) {
spin_unlock_irqrestore(&rtc_lock, flags);
- udelay(100);
+ udelay(UIP_RECHECK_DELAY);
continue;
}

@@ -56,7 +61,7 @@ bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),
*/
if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) {
spin_unlock_irqrestore(&rtc_lock, flags);
- udelay(100);
+ udelay(UIP_RECHECK_DELAY);
continue;
}

@@ -72,6 +77,10 @@ bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),
}
spin_unlock_irqrestore(&rtc_lock, flags);

+ if (UIP_RECHECK_LOOPS_MS(i) >= 100)
+ pr_warn("Reading current time from RTC took around %li ms\n",
+ UIP_RECHECK_LOOPS_MS(i));
+
return true;
}
return false;
@@ -84,7 +93,7 @@ EXPORT_SYMBOL_GPL(mc146818_avoid_UIP);
*/
bool mc146818_does_rtc_work(void)
{
- return mc146818_avoid_UIP(NULL, NULL);
+ return mc146818_avoid_UIP(NULL, 10, NULL);
}
EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);

@@ -130,13 +139,25 @@ static void mc146818_get_time_callback(unsigned char seconds, void *param_in)
p->ctrl = CMOS_READ(RTC_CONTROL);
}

-int mc146818_get_time(struct rtc_time *time)
+/**
+ * mc146818_get_time - Get the current time from the RTC
+ * @time: pointer to struct rtc_time to store the current time
+ * @timeout: timeout value in ms
+ *
+ * This function reads the current time from the RTC and stores it in the
+ * provided struct rtc_time. The timeout parameter specifies the maximum
+ * time to wait for the RTC to become ready.
+ *
+ * Return: 0 on success, -ETIMEDOUT if the RTC did not become ready within
+ * the specified timeout, or another error code if an error occurred.
+ */
+int mc146818_get_time(struct rtc_time *time, int timeout)
{
struct mc146818_get_time_callback_param p = {
.time = time
};

- if (!mc146818_avoid_UIP(mc146818_get_time_callback, &p)) {
+ if (!mc146818_avoid_UIP(mc146818_get_time_callback, timeout, &p)) {
memset(time, 0, sizeof(*time));
return -ETIMEDOUT;
}
diff --git a/include/linux/mc146818rtc.h b/include/linux/mc146818rtc.h
index b0da04fe087b..34dfcc77f505 100644
--- a/include/linux/mc146818rtc.h
+++ b/include/linux/mc146818rtc.h
@@ -126,10 +126,11 @@ struct cmos_rtc_board_info {
#endif /* ARCH_RTC_LOCATION */

bool mc146818_does_rtc_work(void);
-int mc146818_get_time(struct rtc_time *time);
+int mc146818_get_time(struct rtc_time *time, int timeout);
int mc146818_set_time(struct rtc_time *time);

bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param),
+ int timeout,
void *param);

#endif /* _MC146818RTC_H */
--
2.34.1

2023-12-03 21:38:13

by Mateusz Jończyk

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] rtc: Add support for configuring the UIP timeout for RTC reads

// resending - first message was rejected as HTML-formatted

W dniu 28.11.2023 o 06:36, Mario Limonciello pisze:
> The UIP timeout is hardcoded to 10ms for all RTC reads, but in some
> contexts this might not be enough time. Add a timeout parameter to
> mc146818_get_time() and mc146818_get_time_callback().
>
> If UIP timeout is configured by caller to be >=100 ms and a call
> takes this long, log a warning.
>
> Make all callers use 10ms to ensure no functional changes.
>
> Cc: [email protected] # 6.1.y
> Fixes: ec5895c0f2d8 ("rtc: mc146818-lib: extract mc146818_avoid_UIP")
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
> v3->v4:
> * Change label
> * Logic adjustment for for loop
> * Fix pr_warn variable
> v2->v3:
> * Logic adjustments
> * Clarify warning message
> v1->v2:
> * Add a warning if 100ms or more
> * Add stable and fixes tags
>
Hello,

Tested-by: Mateusz Jończyk <[email protected]>
Reviewed-by: Mateusz Jończyk <[email protected]>
Acked-by: Mateusz Jończyk <[email protected]>

I have retested the whole series on 2 computers
(not affected by the bug, resp. with / without HPET IRQ emulation).

Thank you for this patch series.

@Alexandre would it be possible to submit it for Linux 6.7?
There were several problem reports and it is marked for stable.

Greetings,

Mateusz

2023-12-17 21:39:43

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] Extend time to wait for UIP for some callers


On Mon, 27 Nov 2023 23:36:49 -0600, Mario Limonciello wrote:
> A number of users have reported their system will have a failure reading
> the RTC around s2idle entry or exit.
>
> This failure manifests as UIP clear taking longer than 10ms.
>
> By a debugging patch provided by Mateusz Jończyk it is shown that this
> has taken upwards of 300ms in some cases.
>
> [...]

Applied, thanks!

[1/4] rtc: mc146818-lib: Adjust failure return code for mc146818_get_time()
commit: af838635a3eb9b1bc0d98599c101ebca98f31311
[2/4] rtc: Adjust failure return code for cmos_set_alarm()
commit: 1311a8f0d4b23f58bbababa13623aa40b8ad4e0c
[3/4] rtc: Add support for configuring the UIP timeout for RTC reads
commit: 120931db07b49252aba2073096b595482d71857c
[4/4] rtc: Extend timeout for waiting for UIP to clear to 1s
commit: cef9ecc8e938dd48a560f7dd9be1246359248d20


--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com