2017-09-27 21:35:53

by Shawn N

[permalink] [raw]
Subject: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

For host commands that take a long time to process, cros ec can return
early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
status with EC_CMD_GET_COMMS_STATUS until completion of the command.

None of the above applies when data link errors are encountered. When
errors such as EC_SPI_PAST_END are encountered during command
transmission, it usually means the command was not received by the EC.
Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
almost always the wrong decision, and can result in host commands
silently being lost.

Reported-and-tested-by: Jon Hunter <[email protected]>
Signed-off-by: Shawn Nematbakhsh <[email protected]>
---
drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
index c9714072e224..d019b5b00b67 100644
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -377,6 +377,7 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
u8 *ptr;
u8 *rx_buf;
u8 sum;
+ u8 rx_byte;
int ret = 0, final_ret;

len = cros_ec_prepare_tx(ec_dev, ec_msg);
@@ -421,25 +422,22 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
if (!ret) {
/* Verify that EC can process command */
for (i = 0; i < len; i++) {
- switch (rx_buf[i]) {
- case EC_SPI_PAST_END:
- case EC_SPI_RX_BAD_DATA:
- case EC_SPI_NOT_READY:
- ret = -EAGAIN;
- ec_msg->result = EC_RES_IN_PROGRESS;
- default:
+ rx_byte = rx_buf[i];
+ if (rx_byte == EC_SPI_PAST_END ||
+ rx_byte == EC_SPI_RX_BAD_DATA ||
+ rx_byte == EC_SPI_NOT_READY) {
+ ret = -EREMOTEIO;
break;
}
- if (ret)
- break;
}
- if (!ret)
- ret = cros_ec_spi_receive_packet(ec_dev,
- ec_msg->insize + sizeof(*response));
- } else {
- dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
}

+ if (!ret)
+ ret = cros_ec_spi_receive_packet(ec_dev,
+ ec_msg->insize + sizeof(*response));
+ else
+ dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
final_ret = terminate_request(ec_dev);

spi_bus_unlock(ec_spi->spi->master);
@@ -508,6 +506,7 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
int i, len;
u8 *ptr;
u8 *rx_buf;
+ u8 rx_byte;
int sum;
int ret = 0, final_ret;

@@ -544,25 +543,22 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
if (!ret) {
/* Verify that EC can process command */
for (i = 0; i < len; i++) {
- switch (rx_buf[i]) {
- case EC_SPI_PAST_END:
- case EC_SPI_RX_BAD_DATA:
- case EC_SPI_NOT_READY:
- ret = -EAGAIN;
- ec_msg->result = EC_RES_IN_PROGRESS;
- default:
+ rx_byte = rx_buf[i];
+ if (rx_byte == EC_SPI_PAST_END ||
+ rx_byte == EC_SPI_RX_BAD_DATA ||
+ rx_byte == EC_SPI_NOT_READY) {
+ ret = -EREMOTEIO;
break;
}
- if (ret)
- break;
}
- if (!ret)
- ret = cros_ec_spi_receive_response(ec_dev,
- ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
- } else {
- dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
}

+ if (!ret)
+ ret = cros_ec_spi_receive_response(ec_dev,
+ ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
+ else
+ dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
final_ret = terminate_request(ec_dev);

spi_bus_unlock(ec_spi->spi->master);
--
2.12.2


2017-09-27 22:19:14

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
> For host commands that take a long time to process, cros ec can return
> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>
> None of the above applies when data link errors are encountered. When
> errors such as EC_SPI_PAST_END are encountered during command
> transmission, it usually means the command was not received by the EC.
> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
> almost always the wrong decision, and can result in host commands
> silently being lost.
>
> Reported-and-tested-by: Jon Hunter <[email protected]>
> Signed-off-by: Shawn Nematbakhsh <[email protected]>
> ---
> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
> 1 file changed, 24 insertions(+), 28 deletions(-)

I'm still not sure I understand the full extent of the
originally-reported error (it's still likely a SPI transport issue?),
but I believe this patch is good anyway:

Reviewed-by: Brian Norris <[email protected]>

I wonder if we should tone down the BUG_ON()'s in drivers/mfd/cros_ec*
and drivers/platform/chrome/* too. That's basically a no-no these days,
as all of these type of things should be able to gracefully propagate
errors, no matter how "unlikely" it should be to see a crazy protocol
version number or a bad message length.

2018-03-26 16:49:29

by Enric Balletbo Serra

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

Dear all,

Cc'ing some more chromium folks.

2017-11-29 13:11 GMT+01:00 Lee Jones <[email protected]>:
> On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:
>
>> For host commands that take a long time to process, cros ec can return
>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>
>> None of the above applies when data link errors are encountered. When
>> errors such as EC_SPI_PAST_END are encountered during command
>> transmission, it usually means the command was not received by the EC.
>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>> almost always the wrong decision, and can result in host commands
>> silently being lost.
>>
>> Reported-and-tested-by: Jon Hunter <[email protected]>
>> Signed-off-by: Shawn Nematbakhsh <[email protected]>
>> ---
>> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>> 1 file changed, 24 insertions(+), 28 deletions(-)
>
> Applied, thanks.
>

This patch is a bit old and is already applied but I would like to
discuss an issue that Tomeu found playing with kernelci and a Veyron
Jaq attached to our lab.

Seems that after this patch, the veyron jaq spits out lots of spi
tranfer error messages. See full log here [1]

cros-ec-spi spi0.0: spi transfer failed: -121
cros-ec-spi spi0.0: Command xfer error (err:-121)
cros-ec-i2c-tunnel ff110000.spi:ec@0:i2c-tunnel: Error transferring
EC i2c message -121

The issue is random, not always happens, but when happens makes
cros-ec unusable. Reproduce the issue is easier if CONFIG_SMP is not
set.

What happens is that the master starts the transmission and the
cros-ec returns an spi error event (EC_SPI_RX_BAD_DATA - data is
0xfb). The difference between after and before the patch is that now
the cros-ec does not recover, whereas without this patch after some
tries it succeeds (note that before the patch the affected code
returned -EAGAIN whereas now returns -EREMOTEIO)

I think that reverting this patch is NOT the solution, but I don't
have enough knowledge yet to understand why the cros-ec fails. I need
to look at the cros-ec firmware to understand better what is
happening, but meanwhile, thoughts? clues?

An important note is that I did not reproduce the issue on a Veyron
Minnie, even with CONFIG_SMP disabled.

[1] https://lava.collabora.co.uk/scheduler/job/1085493#L905

Best regards,
Enric

> --
> Lee Jones
> Linaro STMicroelectronics Landing Team Lead
> Linaro.org │ Open source software for ARM SoCs
> Follow Linaro: Facebook | Twitter | Blog

2018-03-26 17:28:19

by Alexandru M Stan

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

Hello Enric

On Mon, Mar 26, 2018 at 9:48 AM, Enric Balletbo Serra
<[email protected]> wrote:
> Dear all,
>
> Cc'ing some more chromium folks.
>
> 2017-11-29 13:11 GMT+01:00 Lee Jones <[email protected]>:
>> On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:
>>
>>> For host commands that take a long time to process, cros ec can return
>>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>>
>>> None of the above applies when data link errors are encountered. When
>>> errors such as EC_SPI_PAST_END are encountered during command
>>> transmission, it usually means the command was not received by the EC.
>>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>>> almost always the wrong decision, and can result in host commands
>>> silently being lost.
>>>
>>> Reported-and-tested-by: Jon Hunter <[email protected]>
>>> Signed-off-by: Shawn Nematbakhsh <[email protected]>
>>> ---
>>> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>> 1 file changed, 24 insertions(+), 28 deletions(-)
>>
>> Applied, thanks.
>>
>
> This patch is a bit old and is already applied but I would like to
> discuss an issue that Tomeu found playing with kernelci and a Veyron
> Jaq attached to our lab.
>
> Seems that after this patch, the veyron jaq spits out lots of spi
> tranfer error messages. See full log here [1]
>
> cros-ec-spi spi0.0: spi transfer failed: -121
> cros-ec-spi spi0.0: Command xfer error (err:-121)
> cros-ec-i2c-tunnel ff110000.spi:ec@0:i2c-tunnel: Error transferring
> EC i2c message -121
>
> The issue is random, not always happens, but when happens makes
> cros-ec unusable. Reproduce the issue is easier if CONFIG_SMP is not
> set.
>
> What happens is that the master starts the transmission and the
> cros-ec returns an spi error event (EC_SPI_RX_BAD_DATA - data is
> 0xfb). The difference between after and before the patch is that now
> the cros-ec does not recover, whereas without this patch after some
> tries it succeeds (note that before the patch the affected code
> returned -EAGAIN whereas now returns -EREMOTEIO)
>
> I think that reverting this patch is NOT the solution, but I don't
> have enough knowledge yet to understand why the cros-ec fails. I need
> to look at the cros-ec firmware to understand better what is
> happening, but meanwhile, thoughts? clues?
>
> An important note is that I did not reproduce the issue on a Veyron
> Minnie, even with CONFIG_SMP disabled.
>
> [1] https://lava.collabora.co.uk/scheduler/job/1085493#L905
>
> Best regards,
> Enric
>
>> --
>> Lee Jones
>> Linaro STMicroelectronics Landing Team Lead
>> Linaro.org │ Open source software for ARM SoCs
>> Follow Linaro: Facebook | Twitter | Blog

Would it be possible for you to run "ectool version" on both your
machines? Based on the code the EC is running we might have some hints
on what the differences are.

You can find both ectool and the code the ec runs on
https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-veyron-6588.B.
Though I would use ectool from the master branch.

One thing I suspect is different is that veyron_minnie regularly polls
an accelerometer, depending on the userspace you're running it's
possible it unwedged itself with a few accelerometer requests.

2018-03-27 10:51:49

by Enric Balletbo Serra

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

Hi Alexandru

2018-03-26 19:26 GMT+02:00 Alexandru M Stan <[email protected]>:
> Hello Enric
>
> On Mon, Mar 26, 2018 at 9:48 AM, Enric Balletbo Serra
> <[email protected]> wrote:
>> Dear all,
>>
>> Cc'ing some more chromium folks.
>>
>> 2017-11-29 13:11 GMT+01:00 Lee Jones <[email protected]>:
>>> On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:
>>>
>>>> For host commands that take a long time to process, cros ec can return
>>>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>>>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>>>
>>>> None of the above applies when data link errors are encountered. When
>>>> errors such as EC_SPI_PAST_END are encountered during command
>>>> transmission, it usually means the command was not received by the EC.
>>>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>>>> almost always the wrong decision, and can result in host commands
>>>> silently being lost.
>>>>
>>>> Reported-and-tested-by: Jon Hunter <[email protected]>
>>>> Signed-off-by: Shawn Nematbakhsh <[email protected]>
>>>> ---
>>>> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>>> 1 file changed, 24 insertions(+), 28 deletions(-)
>>>
>>> Applied, thanks.
>>>
>>
>> This patch is a bit old and is already applied but I would like to
>> discuss an issue that Tomeu found playing with kernelci and a Veyron
>> Jaq attached to our lab.
>>
>> Seems that after this patch, the veyron jaq spits out lots of spi
>> tranfer error messages. See full log here [1]
>>
>> cros-ec-spi spi0.0: spi transfer failed: -121
>> cros-ec-spi spi0.0: Command xfer error (err:-121)
>> cros-ec-i2c-tunnel ff110000.spi:ec@0:i2c-tunnel: Error transferring
>> EC i2c message -121
>>
>> The issue is random, not always happens, but when happens makes
>> cros-ec unusable. Reproduce the issue is easier if CONFIG_SMP is not
>> set.
>>
>> What happens is that the master starts the transmission and the
>> cros-ec returns an spi error event (EC_SPI_RX_BAD_DATA - data is
>> 0xfb). The difference between after and before the patch is that now
>> the cros-ec does not recover, whereas without this patch after some
>> tries it succeeds (note that before the patch the affected code
>> returned -EAGAIN whereas now returns -EREMOTEIO)
>>
>> I think that reverting this patch is NOT the solution, but I don't
>> have enough knowledge yet to understand why the cros-ec fails. I need
>> to look at the cros-ec firmware to understand better what is
>> happening, but meanwhile, thoughts? clues?
>>
>> An important note is that I did not reproduce the issue on a Veyron
>> Minnie, even with CONFIG_SMP disabled.
>>
>> [1] https://lava.collabora.co.uk/scheduler/job/1085493#L905
>>
>> Best regards,
>> Enric
>>
>>> --
>>> Lee Jones
>>> Linaro STMicroelectronics Landing Team Lead
>>> Linaro.org │ Open source software for ARM SoCs
>>> Follow Linaro: Facebook | Twitter | Blog
>
> Would it be possible for you to run "ectool version" on both your
> machines? Based on the code the EC is running we might have some hints
> on what the differences are.
>

I think that accessing to the ec console should give the same result, right?

In such case here is:

Veyron Minnie ( ASUS Chromebook Flip C100PA )
-------------------------------------------------------------------
Chip: stm stm32f07x
Board: 0
RO: minnie_v1.1.2697-faafaa5
RW: minnie_v1.1.2712-242f6bd
Build: minnie_v1.1.2712-242f6bd 2016-11-03 00:34:41 @build196-m2

Veyron Jaq ( Haier Mighty MP )
--------------------------------------------------------------------
Chip: stm stm32f07x
Board: 0
RO: mighty_v1.1.2680-6727e1d
RW: mighty_v1.1.2712-242f6bd
Build: mighty_v1.1.2680-6727e1d 2015-03-24 01:12:48 @build290-m2

We're running the RW firmware and I just discovered that our jaq is a
mighty (but I guess it's the same?)

Thanks,
Enric

> You can find both ectool and the code the ec runs on
> https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-veyron-6588.B.
> Though I would use ectool from the master branch.
>
> One thing I suspect is different is that veyron_minnie regularly polls
> an accelerometer, depending on the userspace you're running it's
> possible it unwedged itself with a few accelerometer requests.

2018-03-29 22:10:41

by Alexandru M Stan

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

On Tue, Mar 27, 2018 at 3:49 AM, Enric Balletbo Serra
<[email protected]> wrote:
> I think that accessing to the ec console should give the same result, right?

Yep, even better.

>
> In such case here is:
>
> Veyron Minnie ( ASUS Chromebook Flip C100PA )
> -------------------------------------------------------------------
> Chip: stm stm32f07x
> Board: 0
> RO: minnie_v1.1.2697-faafaa5
> RW: minnie_v1.1.2712-242f6bd
> Build: minnie_v1.1.2712-242f6bd 2016-11-03 00:34:41 @build196-m2
>
> Veyron Jaq ( Haier Mighty MP )
> --------------------------------------------------------------------
> Chip: stm stm32f07x
> Board: 0
> RO: mighty_v1.1.2680-6727e1d
> RW: mighty_v1.1.2712-242f6bd
> Build: mighty_v1.1.2680-6727e1d 2015-03-24 01:12:48 @build290-m2

Looks like your mighty is running the RO firmware, whereas your minnie
runs RW. Is it possible you have the 0x200 bit in the gbb flags set on
mighty? That would prevent the RO->RW transition, and give you an
older firmware.

6727e1d..242f6bd is quite the change. I see some spi changes too,
though i believe it's mostly at power state transitions
(suspend/resume). Other changes include battery settings (yeah.. you
should really avoid running that RO if you can avoid it) and a ton of
accelerometer stuff for minnie.

If it's not the gbb flags, and we can't figure it out why you're stuck
in RO, you can also use "sysjump RW" to force the RW copy on mighty.
See if there's any behavior changes in what you care about.

>
> We're running the RW firmware and I just discovered that our jaq is a
> mighty (but I guess it's the same?)

They're essentially the same, but they're running slightly different
firmware. In practice the only difference is that mighty's firmware
reads an extra gpio for the battery presence.

Feel free to diff the board/{jaq,mighty} ec folders for yourself for
more details/assurances.

>
> Thanks,
> Enric

All in all I'm not sure that the version differences are enough to
explain the spi errors you see in the kernel.

My bet is back to the accelerometer stuff:
Are you running chromeos ui on this device (is there a chromeos-chrome
process constantly polling the accelerometer, so asking the cros-ec
driver for transfers)?
Another thing to make sure accelerometer is disabled is to run
"accelrate 0 0" on the minnie EC.
If none of that accelerometer stuff is enabled, minnie should
essentially act like a mighty/jaq.

2017-11-29 12:12:19

by Lee Jones

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:

> For host commands that take a long time to process, cros ec can return
> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>
> None of the above applies when data link errors are encountered. When
> errors such as EC_SPI_PAST_END are encountered during command
> transmission, it usually means the command was not received by the EC.
> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
> almost always the wrong decision, and can result in host commands
> silently being lost.
>
> Reported-and-tested-by: Jon Hunter <[email protected]>
> Signed-off-by: Shawn Nematbakhsh <[email protected]>
> ---
> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
> 1 file changed, 24 insertions(+), 28 deletions(-)

Applied, thanks.

--
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

From 1585401129148958358@xxx Wed Nov 29 11:52:18 +0000 2017
X-GM-THRID: 1579730367073760625
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-29 11:52:17

by Jon Hunter

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

Hi Lee,

On 14/11/17 17:00, Shawn N wrote:
> On Wed, Sep 27, 2017 at 3:19 PM, Brian Norris <[email protected]> wrote:
>> On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
>>> For host commands that take a long time to process, cros ec can return
>>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>>
>>> None of the above applies when data link errors are encountered. When
>>> errors such as EC_SPI_PAST_END are encountered during command
>>> transmission, it usually means the command was not received by the EC.
>>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>>> almost always the wrong decision, and can result in host commands
>>> silently being lost.
>>>
>>> Reported-and-tested-by: Jon Hunter <[email protected]>
>>> Signed-off-by: Shawn Nematbakhsh <[email protected]>
>>> ---
>>> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>> 1 file changed, 24 insertions(+), 28 deletions(-)
>>
>> I'm still not sure I understand the full extent of the
>> originally-reported error (it's still likely a SPI transport issue?),
>> but I believe this patch is good anyway:
>>
>> Reviewed-by: Brian Norris <[email protected]>
>
> Jon tracked down the root cause of the originally-reported error, but
> we should still land this patch, as it fixes error signaling that was
> previously broken.

Can you also queue this one as a fix for v4.15?

Cheers
Jon

--
nvpublic

From 1584736922386945423@xxx Wed Nov 22 03:55:01 +0000 2017
X-GM-THRID: 1579730367073760625
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-22 03:55:01

by Benson Leung

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

Hi Shawn,

On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
> For host commands that take a long time to process, cros ec can return
> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>
> None of the above applies when data link errors are encountered. When
> errors such as EC_SPI_PAST_END are encountered during command
> transmission, it usually means the command was not received by the EC.
> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
> almost always the wrong decision, and can result in host commands
> silently being lost.
>
> Reported-and-tested-by: Jon Hunter <[email protected]>
> Signed-off-by: Shawn Nematbakhsh <[email protected]>

Acked-by: Benson Leung <[email protected]>

> ---
> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
> 1 file changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
> index c9714072e224..d019b5b00b67 100644
> --- a/drivers/mfd/cros_ec_spi.c
> +++ b/drivers/mfd/cros_ec_spi.c
> @@ -377,6 +377,7 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
> u8 *ptr;
> u8 *rx_buf;
> u8 sum;
> + u8 rx_byte;
> int ret = 0, final_ret;
>
> len = cros_ec_prepare_tx(ec_dev, ec_msg);
> @@ -421,25 +422,22 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
> if (!ret) {
> /* Verify that EC can process command */
> for (i = 0; i < len; i++) {
> - switch (rx_buf[i]) {
> - case EC_SPI_PAST_END:
> - case EC_SPI_RX_BAD_DATA:
> - case EC_SPI_NOT_READY:
> - ret = -EAGAIN;
> - ec_msg->result = EC_RES_IN_PROGRESS;
> - default:
> + rx_byte = rx_buf[i];
> + if (rx_byte == EC_SPI_PAST_END ||
> + rx_byte == EC_SPI_RX_BAD_DATA ||
> + rx_byte == EC_SPI_NOT_READY) {
> + ret = -EREMOTEIO;
> break;
> }
> - if (ret)
> - break;
> }
> - if (!ret)
> - ret = cros_ec_spi_receive_packet(ec_dev,
> - ec_msg->insize + sizeof(*response));
> - } else {
> - dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> }
>
> + if (!ret)
> + ret = cros_ec_spi_receive_packet(ec_dev,
> + ec_msg->insize + sizeof(*response));
> + else
> + dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> +
> final_ret = terminate_request(ec_dev);
>
> spi_bus_unlock(ec_spi->spi->master);
> @@ -508,6 +506,7 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
> int i, len;
> u8 *ptr;
> u8 *rx_buf;
> + u8 rx_byte;
> int sum;
> int ret = 0, final_ret;
>
> @@ -544,25 +543,22 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
> if (!ret) {
> /* Verify that EC can process command */
> for (i = 0; i < len; i++) {
> - switch (rx_buf[i]) {
> - case EC_SPI_PAST_END:
> - case EC_SPI_RX_BAD_DATA:
> - case EC_SPI_NOT_READY:
> - ret = -EAGAIN;
> - ec_msg->result = EC_RES_IN_PROGRESS;
> - default:
> + rx_byte = rx_buf[i];
> + if (rx_byte == EC_SPI_PAST_END ||
> + rx_byte == EC_SPI_RX_BAD_DATA ||
> + rx_byte == EC_SPI_NOT_READY) {
> + ret = -EREMOTEIO;
> break;
> }
> - if (ret)
> - break;
> }
> - if (!ret)
> - ret = cros_ec_spi_receive_response(ec_dev,
> - ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
> - } else {
> - dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> }
>
> + if (!ret)
> + ret = cros_ec_spi_receive_response(ec_dev,
> + ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
> + else
> + dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> +
> final_ret = terminate_request(ec_dev);
>
> spi_bus_unlock(ec_spi->spi->master);
> --
> 2.12.2
>

--
Benson Leung
Staff Software Engineer
Chrome OS Kernel
Google Inc.
[email protected]
Chromium OS Project
[email protected]


Attachments:
(No filename) (3.91 kB)
signature.asc (836.00 B)
Digital signature
Download all attachments

2017-11-14 17:39:51

by Shawn N

[permalink] [raw]
Subject: Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

On Wed, Sep 27, 2017 at 3:19 PM, Brian Norris <[email protected]> wrote:
> On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
>> For host commands that take a long time to process, cros ec can return
>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>
>> None of the above applies when data link errors are encountered. When
>> errors such as EC_SPI_PAST_END are encountered during command
>> transmission, it usually means the command was not received by the EC.
>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>> almost always the wrong decision, and can result in host commands
>> silently being lost.
>>
>> Reported-and-tested-by: Jon Hunter <[email protected]>
>> Signed-off-by: Shawn Nematbakhsh <[email protected]>
>> ---
>> drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>> 1 file changed, 24 insertions(+), 28 deletions(-)
>
> I'm still not sure I understand the full extent of the
> originally-reported error (it's still likely a SPI transport issue?),
> but I believe this patch is good anyway:
>
> Reviewed-by: Brian Norris <[email protected]>

Jon tracked down the root cause of the originally-reported error, but
we should still land this patch, as it fixes error signaling that was
previously broken.

>
> I wonder if we should tone down the BUG_ON()'s in drivers/mfd/cros_ec*
> and drivers/platform/chrome/* too. That's basically a no-no these days,
> as all of these type of things should be able to gracefully propagate
> errors, no matter how "unlikely" it should be to see a crazy protocol
> version number or a bad message length.

From 1579733012745858952@xxx Wed Sep 27 22:20:01 +0000 2017
X-GM-THRID: 1579730367073760625
X-Gmail-Labels: Inbox,Category Forums