2021-02-18 14:26:14

by Bruno Thomsen

[permalink] [raw]
Subject: watchdog: pcf2127: systemd fails on 5.11

Hi,

After updating the kernel from 5.8.17 to 5.11 systemd (246.6) is
unable to init watchdog in pcf2127 during boot. Kernel option
CONFIG_WATCHDOG_OPEN_TIMEOUT=300 is working as expected.
It's possible to get watchdog from userspace working in
the following 2 ways.
1) Disable watchdog in systemd and use busybox watchdog.
2) Restart systemd after boot with "kill 1".

During boot setting the system clock from RTC is working.
RTC read/write from userland with hwclock is also working.

DTS: imx7d-flex-concentrator-mfg.dts
SOC: NXP i.MX7D
Drivers: rtc-pcf2127, spi-imx
Communication: SPI

There are no patches applied to the kernel.

When systemd changes watchdog timeout it receives an
error that to our best knowledge comes from spi-imx[1].

We suspect it's a race condition between drivers or
incompatible error handling.

Any help in investigating the issue is appreciated.

/Bruno

[1] https://elixir.bootlin.com/linux/v5.11/source/drivers/spi/spi-imx.c#L1424

Relevant boot trace of pcf2127 and systemd:

Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: pcf2127_probe
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
pcf2127_rtc_read_time: raw data is cr3=08, sec=37, min=26, hr=09,
mday=18, wday=04, mon=02, year=21
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
pcf2127_rtc_read_time: tm is secs=37, mins=26, hours=9, mday=18,
mon=1, year=121, wday=4
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: char device (252:0)
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: registered as rtc0
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
pcf2127_rtc_read_time: raw data is cr3=08, sec=37, min=26, hr=09,
mday=18, wday=04, mon=02, year=21
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
pcf2127_rtc_read_time: tm is secs=37, mins=26, hours=9, mday=18,
mon=1, year=121, wday=4
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: setting system
clock to 2021-02-18T09:26:37 UTC (1613640397)
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
message from queue
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
message from queue
Feb 18 09:26:46 tqma7 systemd[1]: Hardware watchdog 'NXP
PCF2127/PCF2129 Watchdog', version 0
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: new watchdog
timeout: 120s (old: 60s)
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
message from queue
Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
pcf2127_wdt_active_ping: watchdog restart failed, ret=-110
Feb 18 09:26:46 tqma7 systemd[1]: Failed to set timeout to 120s:
Connection timed out


Relevant trace after systemd restart (kill 1):

Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: nowayout prevents
watchdog being stopped!
Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: nowayout prevents
watchdog being stopped!
Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: watchdog did not stop!
Feb 02 10:42:39 tqma7 systemd[1]: systemd 246.6 running in system
mode. (-PAM -AUDIT -SELINUX -IMA -APPARMOR -SMACK -SYSVINIT -UTMP
-LIBCRYPTSETUP -GCRYPT -GNUTLS -ACL +XZ -LZ4 +ZSTD +SECCOMP +BLKID
+ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=unified)
Feb 02 10:42:39 tqma7 systemd[1]: Detected architecture arm.
Feb 02 10:42:40 tqma7 systemd[1]: Hardware watchdog 'NXP
PCF2127/PCF2129 Watchdog', version 0
Feb 02 10:42:40 tqma7 kernel: rtc-pcf2127-spi spi1.0: new watchdog
timeout: 30s (old: 30s)
Feb 02 10:42:40 tqma7 systemd[1]: Set hardware watchdog to 30s.


2021-02-22 22:46:40

by Guenter Roeck

[permalink] [raw]
Subject: Re: watchdog: pcf2127: systemd fails on 5.11

On Thu, Feb 18, 2021 at 01:35:36PM +0100, Bruno Thomsen wrote:
> Hi,
>
> After updating the kernel from 5.8.17 to 5.11 systemd (246.6) is
> unable to init watchdog in pcf2127 during boot. Kernel option
> CONFIG_WATCHDOG_OPEN_TIMEOUT=300 is working as expected.
> It's possible to get watchdog from userspace working in
> the following 2 ways.
> 1) Disable watchdog in systemd and use busybox watchdog.
> 2) Restart systemd after boot with "kill 1".
>
> During boot setting the system clock from RTC is working.
> RTC read/write from userland with hwclock is also working.
>
> DTS: imx7d-flex-concentrator-mfg.dts
> SOC: NXP i.MX7D
> Drivers: rtc-pcf2127, spi-imx
> Communication: SPI
>
> There are no patches applied to the kernel.
>
> When systemd changes watchdog timeout it receives an
> error that to our best knowledge comes from spi-imx[1].
>
> We suspect it's a race condition between drivers or
> incompatible error handling.
>
> Any help in investigating the issue is appreciated.
>
Difficult to say without access to hardware. The code does have a
potential problem, though: It calls pcf2127_wdt_ping not only from
watchdog code but also from various rtc related functions, but there
is not access protection. This is even more concerning because the ping
function is called from an interrupt handler. At the same time, the
watchdog initialization sets min_hw_heartbeat_ms to 500, which suggests
that there may be a minimum time between heartbeats (which is clearly
violated by the current code).

Guenter

> /Bruno
>
> [1] https://elixir.bootlin.com/linux/v5.11/source/drivers/spi/spi-imx.c#L1424
>
> Relevant boot trace of pcf2127 and systemd:
>
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: pcf2127_probe
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: raw data is cr3=08, sec=37, min=26, hr=09,
> mday=18, wday=04, mon=02, year=21
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: tm is secs=37, mins=26, hours=9, mday=18,
> mon=1, year=121, wday=4
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: char device (252:0)
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: registered as rtc0
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: raw data is cr3=08, sec=37, min=26, hr=09,
> mday=18, wday=04, mon=02, year=21
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: tm is secs=37, mins=26, hours=9, mday=18,
> mon=1, year=121, wday=4
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: setting system
> clock to 2021-02-18T09:26:37 UTC (1613640397)
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
> Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
> message from queue
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
> Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
> message from queue
> Feb 18 09:26:46 tqma7 systemd[1]: Hardware watchdog 'NXP
> PCF2127/PCF2129 Watchdog', version 0
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: new watchdog
> timeout: 120s (old: 60s)
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
> Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
> message from queue
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_wdt_active_ping: watchdog restart failed, ret=-110
> Feb 18 09:26:46 tqma7 systemd[1]: Failed to set timeout to 120s:
> Connection timed out
>
>
> Relevant trace after systemd restart (kill 1):
>
> Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: nowayout prevents
> watchdog being stopped!
> Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: nowayout prevents
> watchdog being stopped!
> Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: watchdog did not stop!
> Feb 02 10:42:39 tqma7 systemd[1]: systemd 246.6 running in system
> mode. (-PAM -AUDIT -SELINUX -IMA -APPARMOR -SMACK -SYSVINIT -UTMP
> -LIBCRYPTSETUP -GCRYPT -GNUTLS -ACL +XZ -LZ4 +ZSTD +SECCOMP +BLKID
> +ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=unified)
> Feb 02 10:42:39 tqma7 systemd[1]: Detected architecture arm.
> Feb 02 10:42:40 tqma7 systemd[1]: Hardware watchdog 'NXP
> PCF2127/PCF2129 Watchdog', version 0
> Feb 02 10:42:40 tqma7 kernel: rtc-pcf2127-spi spi1.0: new watchdog
> timeout: 30s (old: 30s)
> Feb 02 10:42:40 tqma7 systemd[1]: Set hardware watchdog to 30s.

2021-02-24 15:36:37

by Bruno Thomsen

[permalink] [raw]
Subject: Re: watchdog: pcf2127: systemd fails on 5.11

Den man. 22. feb. 2021 kl. 23.43 skrev Guenter Roeck <[email protected]>:
>
> On Thu, Feb 18, 2021 at 01:35:36PM +0100, Bruno Thomsen wrote:
> > Hi,
> >
> > After updating the kernel from 5.8.17 to 5.11 systemd (246.6) is
> > unable to init watchdog in pcf2127 during boot. Kernel option
> > CONFIG_WATCHDOG_OPEN_TIMEOUT=300 is working as expected.
> > It's possible to get watchdog from userspace working in
> > the following 2 ways.
> > 1) Disable watchdog in systemd and use busybox watchdog.
> > 2) Restart systemd after boot with "kill 1".
> >
> > During boot setting the system clock from RTC is working.
> > RTC read/write from userland with hwclock is also working.
> >
> > DTS: imx7d-flex-concentrator-mfg.dts
> > SOC: NXP i.MX7D
> > Drivers: rtc-pcf2127, spi-imx
> > Communication: SPI
> >
> > There are no patches applied to the kernel.
> >
> > When systemd changes watchdog timeout it receives an
> > error that to our best knowledge comes from spi-imx[1].
> >
> > We suspect it's a race condition between drivers or
> > incompatible error handling.
> >
> > Any help in investigating the issue is appreciated.
> >
> Difficult to say without access to hardware. The code does have a
> potential problem, though: It calls pcf2127_wdt_ping not only from
> watchdog code but also from various rtc related functions, but there
> is not access protection. This is even more concerning because the ping
> function is called from an interrupt handler. At the same time, the
> watchdog initialization sets min_hw_heartbeat_ms to 500, which suggests
> that there may be a minimum time between heartbeats (which is clearly
> violated by the current code).

Hi Guenter

Thanks for input.

You could be right about that, I don't think the watchdog feature should
be available for use if the alarm feature is enabled due to how CTRL2
register behaves.

The hardware I am testing on is a custom board, but it's actually
possible to get a Raspberry Pi module called RasClock that has
the chip.

I will test some locking around WD_VAL register access as that is used
in pcf2127_wdt_ping function.

My initial test shows that spin_lock_irqsave around regmap calls are not
a good idea as it result in:
BUG: scheduling while atomic: watchdog/70/0x00000002
BUG: scheduling while atomic: systemd/1/0x00000002

/Bruno

2021-02-24 15:38:27

by Alexandre Belloni

[permalink] [raw]
Subject: Re: watchdog: pcf2127: systemd fails on 5.11

Hi,

On 24/02/2021 15:55:00+0100, Bruno Thomsen wrote:
> You could be right about that, I don't think the watchdog feature should
> be available for use if the alarm feature is enabled due to how CTRL2
> register behaves.
>
> The hardware I am testing on is a custom board, but it's actually
> possible to get a Raspberry Pi module called RasClock that has
> the chip.
>

I have an eval board for the PCF2127 (and PCF2129), the OM13513.

> I will test some locking around WD_VAL register access as that is used
> in pcf2127_wdt_ping function.
>
> My initial test shows that spin_lock_irqsave around regmap calls are not
> a good idea as it result in:
> BUG: scheduling while atomic: watchdog/70/0x00000002
> BUG: scheduling while atomic: systemd/1/0x00000002
>

The issue is not only regmap but the fact that i2C and spi accesses are
allowed to sleep.


--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2021-02-24 15:46:47

by Guenter Roeck

[permalink] [raw]
Subject: Re: watchdog: pcf2127: systemd fails on 5.11

On Wed, Feb 24, 2021 at 04:00:14PM +0100, Alexandre Belloni wrote:
> Hi,
>
> On 24/02/2021 15:55:00+0100, Bruno Thomsen wrote:
> > You could be right about that, I don't think the watchdog feature should
> > be available for use if the alarm feature is enabled due to how CTRL2
> > register behaves.
> >
> > The hardware I am testing on is a custom board, but it's actually
> > possible to get a Raspberry Pi module called RasClock that has
> > the chip.
> >
>
> I have an eval board for the PCF2127 (and PCF2129), the OM13513.
>
> > I will test some locking around WD_VAL register access as that is used
> > in pcf2127_wdt_ping function.
> >
> > My initial test shows that spin_lock_irqsave around regmap calls are not
> > a good idea as it result in:
> > BUG: scheduling while atomic: watchdog/70/0x00000002
> > BUG: scheduling while atomic: systemd/1/0x00000002
> >
>
> The issue is not only regmap but the fact that i2C and spi accesses are
> allowed to sleep.
>
Correct, those would have to be mutexes.

Thanks,
Guenter