2023-12-01 16:15:03

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH] tty: max310x: work around regmap->regcache data corruption

(I like your analysis and a workaround, however there are quite a few
style minors, see below)

On Fri, Dec 1, 2023 at 5:56 PM Jan Kundrát <[email protected]> wrote:
>
> The TL;DR summary is that the regmap_noinc_write spills over the data

regmap_noinc_write()
(we refer to function as func() in the text and code comments)

> that are correctly written to the HW also to the following registers in
> the regcache. As a result, regcache then contains user-controlled
> garbage which will be used later for bit updates on unrelated registers.
>
> This patch is a "wrong" fix; a real fix would involve fixing regmap
> and/or regcache, but that code has too many indirections for my little
> mind.
>
> I was investigating a regression that happened somewhere between 5.12.4

v5.12.4
(you even used v later)

> (plus 14 of our patches) and v6.5.9 (plus 7 of our patches). Our
> MAX14830 UART would work fine the first time, but when our application
> opens the UART the second time it just wouldn't send anything over the
> physical TX pin. With the help of a logical analyzer, I found out that
> the kernel was sending value 0xcd to the MODE1 register, which on this
> chip is a request to set the UART's TX pin to the Hi-Z mode and to
> switch off RX completely. That's certainly not the intention of the
> code, but that's what I was seeing on the physical SPI bus, and also in
> the log when I instrumented the regmap layer.
>
> It turned out that one of the *data* bytes which were sent over the UART
> was 0xdd, and that this *data byte* somehow ended up in the regcache's
> idea about the value within the MODE1 register. When the UART is opened
> up the next time and max310x_startup updates a single unrelated bit in
> MODE1, that code consults the regcache, notices the 0xdd data byte in
> there, and ends up sending 0xcd over SPI.
>
> Here's what dump_stack() shows:

According to the Submitting Patches, this can be reduced to show only
important lines.

> max310x spi1.2: regcache_write: reg 0x9 value 0xdd
> max310x spi1.2: PWNED

> CPU: 1 PID: 26 Comm: kworker/1:1 Not tainted 6.5.9-7-g9e090fe75fd8 #7

This is not important...

> Hardware name: Marvell Armada 380/385 (Device Tree)
> Workqueue: events max310x_tx_proc

> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from regcache_write+0xc0/0xc4

...nor these...

> regcache_write from _regmap_raw_write_impl+0x178/0x828
> _regmap_raw_write_impl from _regmap_raw_write+0xb8/0x134
> _regmap_raw_write from regmap_noinc_write+0x130/0x178
> regmap_noinc_write from max310x_tx_proc+0xd4/0x1a4
> max310x_tx_proc from process_one_work+0x21c/0x4e4

> process_one_work from worker_thread+0x50/0x54c
> worker_thread from kthread+0xe0/0xfc
> kthread from ret_from_fork+0x14/0x28

...neither of these.

> Clearly, regmap_noinc_write of a register 0x00 (that's the TX FIFO on

()

> this chip) has no business updating register 0x09, but that's what was
> happening here. The regmap_config is already set up in a way that
> register 0x00 is marked precious and volatile, so it has no business
> going through the cache at all. Also, the documentation for
> regmap_noinc_write suggests that this driver was using the regmap

()

> infrastructure correctly, and that the real bug is somewhere in
> regmap/regcache where a call to regmap_noinc_write end up updating an

()
ends

> unrelated register in regcache.
>
> Until regmap/regcache is fixed, let's just use an adapted version of the
> old code that bypasses regmap altogether, and just sends out an SPI
> transaction.
>
> This is related to my commit 3f42b142ea1171967e40e10e4b0241c0d6d28d41
> ("serial: max310x: fix IO data corruption in batched operations") which

12 characters of the SHA-1 is enough (for now).

> introduced usage of regmap_noinc_write() to this driver. That commit is
> a fixup of commit 285e76fc049c4d32c772eea9460a7ef28a193802 ("serial:
> max310x: use regmap methods for SPI batch operations") which started

Ditto.

> using regmap_raw_write(), which was however also a wrong function.

Oh, cool! Here you used () :-)

> Fixes: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations")
> Fixes: 285e76fc049c ("serial: max310x: use regmap methods for SPI batch operations")
> Signed-off-by: Jan Kundrát <[email protected]>

> To: Mark Brown <[email protected]>
> To: Cosmin Tanislav <[email protected]>
> To: [email protected]
> Cc: Andy Shevchenko <[email protected]>
> Cc: [email protected]

"To:" lines are so unusual.

I have kinda "smart" script [1] that has a heuristics that works in
99.9+% cases well. Consider using it or taking an idea from it.

[1]: https://github.com/andy-shev/home-bin-tools/blob/master/ge2maintainer.sh

...

> static void max310x_batch_write(struct uart_port *port, u8 *txbuf, unsigned int len)
> {
> - struct max310x_one *one = to_max310x_port(port);
> -
> - regmap_noinc_write(one->regmap, MAX310X_THR_REG, txbuf, len);
> + const u8 header = (port->iobase * 0x20 + MAX310X_THR_REG) | MAX310X_WRITE_BIT;
> + struct spi_transfer xfer[] = {
> + {
> + .tx_buf = &header,
> + .len = 1,
> + },
> + {
> + .tx_buf = txbuf,
> + .len = len,
> + },
> + };
> + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer));

Can you add a comment like FIXME explaining why regmap is not used
here, otherwise I will expect some clever guy to "fix" this again.

> }
>
> static void max310x_batch_read(struct uart_port *port, u8 *rxbuf, unsigned int len)
> {
> - struct max310x_one *one = to_max310x_port(port);
> -
> - regmap_noinc_read(one->regmap, MAX310X_RHR_REG, rxbuf, len);
> + const u8 header = port->iobase * 0x20 + MAX310X_RHR_REG;
> + struct spi_transfer xfer[] = {
> + {
> + .tx_buf = &header,
> + .len = 1,
> + },
> + {
> + .rx_buf = rxbuf,
> + .len = len,
> + },
> + };
> + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer));

Ditto.

> }

--
With Best Regards,
Andy Shevchenko


2023-12-01 16:23:27

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH] tty: max310x: work around regmap->regcache data corruption

On Fri, Dec 01, 2023 at 06:14:02PM +0200, Andy Shevchenko wrote:

> > + };
> > + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer));

> Can you add a comment like FIXME explaining why regmap is not used
> here, otherwise I will expect some clever guy to "fix" this again.

Or just do the fix in the right place rather than throwing a bodge
about...


Attachments:
(No filename) (388.00 B)
signature.asc (499.00 B)
Download all attachments