2020-07-03 07:20:03

by Yu-Hsuan Hsu

[permalink] [raw]
Subject: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Log results of failed EC commands to identify a problem more easily.

Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
has already been checked in this function. The wrapper is not needed.

Signed-off-by: Yu-Hsuan Hsu <[email protected]>
---
sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
index 8d45c628e988e..a4ab62f59efa6 100644
--- a/sound/soc/codecs/cros_ec_codec.c
+++ b/sound/soc/codecs/cros_ec_codec.c
@@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
if (outsize)
memcpy(msg->data, out, outsize);

- ret = cros_ec_cmd_xfer_status(ec_dev, msg);
+ ret = cros_ec_cmd_xfer(ec_dev, msg);
if (ret < 0)
goto error;

+ if (msg->result != EC_RES_SUCCESS) {
+ dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
+ msg->result);
+ ret = -EPROTO;
+ goto error;
+ }
+
if (insize)
memcpy(in, msg->data, insize);

--
2.27.0.212.ge8ba1cc988-goog


2020-07-03 07:33:23

by Tzung-Bi Shih

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

On Fri, Jul 3, 2020 at 3:19 PM Yu-Hsuan Hsu <[email protected]> wrote:
> Log results of failed EC commands to identify a problem more easily.
>
> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> has already been checked in this function. The wrapper is not needed.

Alternatively, you can still use cros_ec_cmd_xfer_status( ). I guess
it is okay to have 2 logs for an error.

> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> index 8d45c628e988e..a4ab62f59efa6 100644
> --- a/sound/soc/codecs/cros_ec_codec.c
> +++ b/sound/soc/codecs/cros_ec_codec.c
> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> if (outsize)
> memcpy(msg->data, out, outsize);
>
> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> if (ret < 0)
I am thinking of if it is a better solution to print msg->result here.

> goto error;
>
> + if (msg->result != EC_RES_SUCCESS) {
> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> + msg->result);
> + ret = -EPROTO;
> + goto error;
> + }
So that you don't need this block.

2020-07-03 08:11:46

by Yu-Hsuan Hsu

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Tzung-Bi Shih <[email protected]> 於 2020年7月3日 週五 下午3:32寫道:
>
> On Fri, Jul 3, 2020 at 3:19 PM Yu-Hsuan Hsu <[email protected]> wrote:
> > Log results of failed EC commands to identify a problem more easily.
> >
> > Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> > has already been checked in this function. The wrapper is not needed.
>
> Alternatively, you can still use cros_ec_cmd_xfer_status( ). I guess
> it is okay to have 2 logs for an error.
>
> > diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> > index 8d45c628e988e..a4ab62f59efa6 100644
> > --- a/sound/soc/codecs/cros_ec_codec.c
> > +++ b/sound/soc/codecs/cros_ec_codec.c
> > @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> > if (outsize)
> > memcpy(msg->data, out, outsize);
> >
> > - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > + ret = cros_ec_cmd_xfer(ec_dev, msg);
> > if (ret < 0)
> I am thinking of if it is a better solution to print msg->result here.
The problem is the msg->result is not always meaningful.
In cros_ec_cmd_xfer_status, we know that the msg->result is meaningful
only when ret == 0. Therefore, we can not print the msg->result
directly here.

In addition, adding a conditional operator here to check whether ret
is -EPROTO is not a good way, either.
We should consider the situation that cros_ec_cmd_xfer may return
-EPROTO directly.

>
> > goto error;
> >
> > + if (msg->result != EC_RES_SUCCESS) {
> > + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> > + msg->result);
> > + ret = -EPROTO;
> > + goto error;
> > + }
> So that you don't need this block.

2020-07-03 08:39:52

by Enric Balletbo i Serra

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Hi Yu-Hsuan,

Thank you for your patch

On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> Log results of failed EC commands to identify a problem more easily.
>
> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> has already been checked in this function. The wrapper is not needed.
>

Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
on top of the upcoming changes.

> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> ---
> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> index 8d45c628e988e..a4ab62f59efa6 100644
> --- a/sound/soc/codecs/cros_ec_codec.c
> +++ b/sound/soc/codecs/cros_ec_codec.c
> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> if (outsize)
> memcpy(msg->data, out, outsize);
>
> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> if (ret < 0)
> goto error;
>
> + if (msg->result != EC_RES_SUCCESS) {
> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> + msg->result);
> + ret = -EPROTO;
> + goto error;
> + }
> +
> if (insize)
> memcpy(in, msg->data, insize);
>
>

2020-07-03 08:49:39

by Yu-Hsuan Hsu

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
>
> Hi Yu-Hsuan,
>
> Thank you for your patch
>
> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> > Log results of failed EC commands to identify a problem more easily.
> >
> > Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> > has already been checked in this function. The wrapper is not needed.
> >
>
> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
> on top of the upcoming changes.
Thanks! But I have a question about implementing it. Does it look like
the one below?
ret = cros_ec_cmd_xfer_status(ec_dev, msg);
if (ret < 0) {
if (ret == -EPROTO)
dev_err(..., msg->result)
goto error;
}
I'm not sure whether it makes sense to check ret == -EPROTO here.

>
> > Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> > ---
> > sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> > index 8d45c628e988e..a4ab62f59efa6 100644
> > --- a/sound/soc/codecs/cros_ec_codec.c
> > +++ b/sound/soc/codecs/cros_ec_codec.c
> > @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> > if (outsize)
> > memcpy(msg->data, out, outsize);
> >
> > - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > + ret = cros_ec_cmd_xfer(ec_dev, msg);
> > if (ret < 0)
> > goto error;
> >
> > + if (msg->result != EC_RES_SUCCESS) {
> > + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> > + msg->result);
> > + ret = -EPROTO;
> > + goto error;
> > + }
> > +
> > if (insize)
> > memcpy(in, msg->data, insize);
> >
> >

2020-07-03 09:20:30

by Enric Balletbo i Serra

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Hi Yu-Hsuan,

On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
>>
>> Hi Yu-Hsuan,
>>
>> Thank you for your patch
>>
>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
>>> Log results of failed EC commands to identify a problem more easily.
>>>
>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
>>> has already been checked in this function. The wrapper is not needed.
>>>
>>
>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
>> on top of the upcoming changes.
> Thanks! But I have a question about implementing it. Does it look like
> the one below?
> ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> if (ret < 0) {

In this case will already print an error.

What are you trying to achieve?

If the only reason is of this patch is print a message you should either, or
enable dynamic printk and enable dev_dbg or event better use the kernel trace
functionality. There is no need to be more verbose.

Example:
$ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
$ cat /sys/kernel/debug/tracing/trace

369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
369.420528: cros_ec_request_done: version: 0, command:
EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16

Cheers,
Enric

> if (ret == -EPROTO)
> dev_err(..., msg->result)
> goto error;
> }
> I'm not sure whether it makes sense to check ret == -EPROTO here.
>
>>
>>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
>>> ---
>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
>>> index 8d45c628e988e..a4ab62f59efa6 100644
>>> --- a/sound/soc/codecs/cros_ec_codec.c
>>> +++ b/sound/soc/codecs/cros_ec_codec.c
>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
>>> if (outsize)
>>> memcpy(msg->data, out, outsize);
>>>
>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
>>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
>>> if (ret < 0)
>>> goto error;
>>>
>>> + if (msg->result != EC_RES_SUCCESS) {
>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
>>> + msg->result);
>>> + ret = -EPROTO;
>>> + goto error;
>>> + }
>>> +
>>> if (insize)
>>> memcpy(in, msg->data, insize);
>>>
>>>

2020-07-03 09:42:40

by Yu-Hsuan Hsu

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午5:19寫道:
>
> Hi Yu-Hsuan,
>
> On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
> > Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
> >>
> >> Hi Yu-Hsuan,
> >>
> >> Thank you for your patch
> >>
> >> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> >>> Log results of failed EC commands to identify a problem more easily.
> >>>
> >>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> >>> has already been checked in this function. The wrapper is not needed.
> >>>
> >>
> >> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
> >> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
> >> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
> >> on top of the upcoming changes.
> > Thanks! But I have a question about implementing it. Does it look like
> > the one below?
> > ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > if (ret < 0) {
>
> In this case will already print an error.
>
> What are you trying to achieve?
>
> If the only reason is of this patch is print a message you should either, or
> enable dynamic printk and enable dev_dbg or event better use the kernel trace
> functionality. There is no need to be more verbose.
>
> Example:
> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
> $ cat /sys/kernel/debug/tracing/trace
>
> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
> 369.420528: cros_ec_request_done: version: 0, command:
> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16
>
> Cheers,
> Enric
>
Thank Enric,

The situation is that some users encountered errors on ChromeBook.
From their feedback reports, we only get the message like
'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'.
We know that -71 is -EPROTO but it is not clear enough for us to find
out the root cause. That's why we want the detail of the result.
Because the situation happens on users' side, it is not possible for
them to enable kernel trace (ChromeOS does not allow users to touch
kernel).

The other way we thought is changing dev_dbg to dev_err in
cros_ec_cmd_xfer_status. But we are not sure whether it is also an
error for other usages.

> > if (ret == -EPROTO)
> > dev_err(..., msg->result)
> > goto error;
> > }
> > I'm not sure whether it makes sense to check ret == -EPROTO here.
> >
> >>
> >>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> >>> ---
> >>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> >>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> >>> index 8d45c628e988e..a4ab62f59efa6 100644
> >>> --- a/sound/soc/codecs/cros_ec_codec.c
> >>> +++ b/sound/soc/codecs/cros_ec_codec.c
> >>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> >>> if (outsize)
> >>> memcpy(msg->data, out, outsize);
> >>>
> >>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> >>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> >>> if (ret < 0)
> >>> goto error;
> >>>
> >>> + if (msg->result != EC_RES_SUCCESS) {
> >>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> >>> + msg->result);
> >>> + ret = -EPROTO;
> >>> + goto error;
> >>> + }
> >>> +
> >>> if (insize)
> >>> memcpy(in, msg->data, insize);
> >>>
> >>>

2020-07-03 10:59:37

by Enric Balletbo i Serra

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Hi Yu-Hsuan,

On 3/7/20 11:40, Yu-Hsuan Hsu wrote:
> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午5:19寫道:
>>
>> Hi Yu-Hsuan,
>>
>> On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
>>> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
>>>>
>>>> Hi Yu-Hsuan,
>>>>
>>>> Thank you for your patch
>>>>
>>>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
>>>>> Log results of failed EC commands to identify a problem more easily.
>>>>>
>>>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
>>>>> has already been checked in this function. The wrapper is not needed.
>>>>>
>>>>
>>>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
>>>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
>>>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
>>>> on top of the upcoming changes.
>>> Thanks! But I have a question about implementing it. Does it look like
>>> the one below?
>>> ret = cros_ec_cmd_xfer_status(ec_dev, msg);
>>> if (ret < 0) {
>>
>> In this case will already print an error.
>>
>> What are you trying to achieve?
>>
>> If the only reason is of this patch is print a message you should either, or
>> enable dynamic printk and enable dev_dbg or event better use the kernel trace
>> functionality. There is no need to be more verbose.
>>
>> Example:
>> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
>> $ cat /sys/kernel/debug/tracing/trace
>>
>> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
>> 369.420528: cros_ec_request_done: version: 0, command:
>> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16
>>
>> Cheers,
>> Enric
>>
> Thank Enric,
>
> The situation is that some users encountered errors on ChromeBook.

And, aren't you able to reproduce the issue?


> From their feedback reports, we only get the message like
> 'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'.
> We know that -71 is -EPROTO but it is not clear enough for us to find
> out the root cause. That's why we want the detail of the result.


If I am not mistaken this ends calling i2s_rx_set_daifmt() into the EC firmware,
if the result is -EPROTO that means is not returning EC_RES_SUCCESS, so there
are few options:

if (i2s_rx_enabled)
return EC_RES_BUSY;

if (daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT)
return EC_RES_INVALID_PARAM;

if (audio_codec_i2s_rx_set_daifmt(daifmt) != EC_SUCCESS)
return EC_RES_ERROR;

> Because the situation happens on users' side, it is not possible for
> them to enable kernel trace (ChromeOS does not allow users to touch
> kernel).
>

Are you sure that when you know the error code you'll find the root cause
(without adding more prints)? There is only three possibilities? You can't start
adding prints just to debug a user issue because you don't allow to be more
verbose. I understand that might help you but is not the way to go.

You should really reproduce the issue yourself an use actual debug
tools/prints./traces.

Cheers,
Enric

> The other way we thought is changing dev_dbg to dev_err in
> cros_ec_cmd_xfer_status. But we are not sure whether it is also an
> error for other usages.
>
>>> if (ret == -EPROTO)
>>> dev_err(..., msg->result)
>>> goto error;
>>> }
>>> I'm not sure whether it makes sense to check ret == -EPROTO here.
>>>
>>>>
>>>>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
>>>>> ---
>>>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
>>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
>>>>> index 8d45c628e988e..a4ab62f59efa6 100644
>>>>> --- a/sound/soc/codecs/cros_ec_codec.c
>>>>> +++ b/sound/soc/codecs/cros_ec_codec.c
>>>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
>>>>> if (outsize)
>>>>> memcpy(msg->data, out, outsize);
>>>>>
>>>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
>>>>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
>>>>> if (ret < 0)
>>>>> goto error;
>>>>>
>>>>> + if (msg->result != EC_RES_SUCCESS) {
>>>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
>>>>> + msg->result);
>>>>> + ret = -EPROTO;
>>>>> + goto error;
>>>>> + }
>>>>> +
>>>>> if (insize)
>>>>> memcpy(in, msg->data, insize);
>>>>>
>>>>>

2020-07-03 16:01:17

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

On Fri, Jul 3, 2020 at 3:56 AM Enric Balletbo i Serra
<[email protected]> wrote:
>
> Hi Yu-Hsuan,
>
> On 3/7/20 11:40, Yu-Hsuan Hsu wrote:
> > Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午5:19寫道:
> >>
> >> Hi Yu-Hsuan,
> >>
> >> On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
> >>> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
> >>>>
> >>>> Hi Yu-Hsuan,
> >>>>
> >>>> Thank you for your patch
> >>>>
> >>>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> >>>>> Log results of failed EC commands to identify a problem more easily.
> >>>>>
> >>>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> >>>>> has already been checked in this function. The wrapper is not needed.
> >>>>>
> >>>>
> >>>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
> >>>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
> >>>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
> >>>> on top of the upcoming changes.
> >>> Thanks! But I have a question about implementing it. Does it look like
> >>> the one below?
> >>> ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> >>> if (ret < 0) {
> >>
> >> In this case will already print an error.
> >>
> >> What are you trying to achieve?
> >>
> >> If the only reason is of this patch is print a message you should either, or
> >> enable dynamic printk and enable dev_dbg or event better use the kernel trace
> >> functionality. There is no need to be more verbose.
> >>
> >> Example:
> >> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
> >> $ cat /sys/kernel/debug/tracing/trace
> >>
> >> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
> >> 369.420528: cros_ec_request_done: version: 0, command:
> >> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16
> >>
> >> Cheers,
> >> Enric
> >>
> > Thank Enric,
> >
> > The situation is that some users encountered errors on ChromeBook.
>
> And, aren't you able to reproduce the issue?
>
>
> > From their feedback reports, we only get the message like
> > 'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'.
> > We know that -71 is -EPROTO but it is not clear enough for us to find
> > out the root cause. That's why we want the detail of the result.
>
>
> If I am not mistaken this ends calling i2s_rx_set_daifmt() into the EC firmware,
> if the result is -EPROTO that means is not returning EC_RES_SUCCESS, so there
> are few options:
>
> if (i2s_rx_enabled)
> return EC_RES_BUSY;
>
> if (daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT)
> return EC_RES_INVALID_PARAM;
>
> if (audio_codec_i2s_rx_set_daifmt(daifmt) != EC_SUCCESS)
> return EC_RES_ERROR;
>
> > Because the situation happens on users' side, it is not possible for
> > them to enable kernel trace (ChromeOS does not allow users to touch
> > kernel).
> >
>
> Are you sure that when you know the error code you'll find the root cause
> (without adding more prints)? There is only three possibilities? You can't start
> adding prints just to debug a user issue because you don't allow to be more
> verbose. I understand that might help you but is not the way to go.
>
> You should really reproduce the issue yourself an use actual debug
> tools/prints./traces.
>

Another possibility would be to change cros_ec_cmd_xfer_status() to
return a more granular error code, such as -EINVAL for
EC_RES_INVALID_PARAM, -EBUSY for EC_RES_BUSY, -EINPROGRESS for
EC_RES_IN_PROGRESS, -ETIMEDOUT for EC_RES_TIMEOUT, -EOVERFLOW for
EC_RES_OVERFLOW, -ENODATA for EC_RES_UNAVAILABLE, and so on.

However, it appears that the various low level functions already
replace various EC error codes with a blank EC_RES_ERROR. No amount of
logging will tell us what exactly went wrong in those functions. Lucky
for us, audio_codec_i2s_rx_set_daifmt() only ever returns EC_SUCCESS,
so we know that the problem is either that i2s_rx_enabled is true or
that daifmt is too large. None of those really warrants more verbose
logging.

From the context, my personal bet is that i2s_rx_enabled is true: I
don't immediately see how disabling it is enforced before trying to
set the DAI format, and I don't see how "daifmt >=
EC_CODEC_I2S_RX_DAIFMT_COUNT" can ever be true.

Guenter


Guenter

> Cheers,
> Enric
>
> > The other way we thought is changing dev_dbg to dev_err in
> > cros_ec_cmd_xfer_status. But we are not sure whether it is also an
> > error for other usages.
> >
> >>> if (ret == -EPROTO)
> >>> dev_err(..., msg->result)
> >>> goto error;
> >>> }
> >>> I'm not sure whether it makes sense to check ret == -EPROTO here.
> >>>
> >>>>
> >>>>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> >>>>> ---
> >>>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> >>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> >>>>> index 8d45c628e988e..a4ab62f59efa6 100644
> >>>>> --- a/sound/soc/codecs/cros_ec_codec.c
> >>>>> +++ b/sound/soc/codecs/cros_ec_codec.c
> >>>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> >>>>> if (outsize)
> >>>>> memcpy(msg->data, out, outsize);
> >>>>>
> >>>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> >>>>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> >>>>> if (ret < 0)
> >>>>> goto error;
> >>>>>
> >>>>> + if (msg->result != EC_RES_SUCCESS) {
> >>>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> >>>>> + msg->result);
> >>>>> + ret = -EPROTO;
> >>>>> + goto error;
> >>>>> + }
> >>>>> +
> >>>>> if (insize)
> >>>>> memcpy(in, msg->data, insize);
> >>>>>
> >>>>>

2020-07-03 19:11:55

by Yu-Hsuan Hsu

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Guenter Roeck <[email protected]> 於 2020年7月3日 週五 下午11:58寫道:
>
> On Fri, Jul 3, 2020 at 3:56 AM Enric Balletbo i Serra
> <[email protected]> wrote:
> >
> > Hi Yu-Hsuan,
> >
> > On 3/7/20 11:40, Yu-Hsuan Hsu wrote:
> > > Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午5:19寫道:
> > >>
> > >> Hi Yu-Hsuan,
> > >>
> > >> On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
> > >>> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
> > >>>>
> > >>>> Hi Yu-Hsuan,
> > >>>>
> > >>>> Thank you for your patch
> > >>>>
> > >>>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> > >>>>> Log results of failed EC commands to identify a problem more easily.
> > >>>>>
> > >>>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> > >>>>> has already been checked in this function. The wrapper is not needed.
> > >>>>>
> > >>>>
> > >>>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
> > >>>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
> > >>>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
> > >>>> on top of the upcoming changes.
> > >>> Thanks! But I have a question about implementing it. Does it look like
> > >>> the one below?
> > >>> ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > >>> if (ret < 0) {
> > >>
> > >> In this case will already print an error.
> > >>
> > >> What are you trying to achieve?
> > >>
> > >> If the only reason is of this patch is print a message you should either, or
> > >> enable dynamic printk and enable dev_dbg or event better use the kernel trace
> > >> functionality. There is no need to be more verbose.
> > >>
> > >> Example:
> > >> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
> > >> $ cat /sys/kernel/debug/tracing/trace
> > >>
> > >> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
> > >> 369.420528: cros_ec_request_done: version: 0, command:
> > >> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16
> > >>
> > >> Cheers,
> > >> Enric
> > >>
> > > Thank Enric,
> > >
> > > The situation is that some users encountered errors on ChromeBook.
> >
> > And, aren't you able to reproduce the issue?
> >
> >
> > > From their feedback reports, we only get the message like
> > > 'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'.
> > > We know that -71 is -EPROTO but it is not clear enough for us to find
> > > out the root cause. That's why we want the detail of the result.
> >
> >
> > If I am not mistaken this ends calling i2s_rx_set_daifmt() into the EC firmware,
> > if the result is -EPROTO that means is not returning EC_RES_SUCCESS, so there
> > are few options:
> >
> > if (i2s_rx_enabled)
> > return EC_RES_BUSY;
> >
> > if (daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT)
> > return EC_RES_INVALID_PARAM;
> >
> > if (audio_codec_i2s_rx_set_daifmt(daifmt) != EC_SUCCESS)
> > return EC_RES_ERROR;
> >
> > > Because the situation happens on users' side, it is not possible for
> > > them to enable kernel trace (ChromeOS does not allow users to touch
> > > kernel).
> > >
> >
> > Are you sure that when you know the error code you'll find the root cause
> > (without adding more prints)? There is only three possibilities? You can't start
> > adding prints just to debug a user issue because you don't allow to be more
> > verbose. I understand that might help you but is not the way to go.

Hi Enric and Guenter,

Thanks for your inspiring comments.
I'm not sure whether we will find the root cause if I know the error
code. But I think it's not a point.
We wanted to add this error log because we found that the current one
is not enough. Since it is a real error, it would be better if we can
make it more detailed, right?
In addition, we thought it would be helpful in the future as well.
That's why we chose to upstream instead of merging into our source
tree only.

> >
> > You should really reproduce the issue yourself an use actual debug
> > tools/prints./traces
We are trying but still unable to reproduce this issue.
However, as I maintained above, it is not a main concern of this change.
> >
>
> Another possibility would be to change cros_ec_cmd_xfer_status() to
> return a more granular error code, such as -EINVAL for
> EC_RES_INVALID_PARAM, -EBUSY for EC_RES_BUSY, -EINPROGRESS for
> EC_RES_IN_PROGRESS, -ETIMEDOUT for EC_RES_TIMEOUT, -EOVERFLOW for
> EC_RES_OVERFLOW, -ENODATA for EC_RES_UNAVAILABLE, and so on.
Since there are many kinds of results from EC, why not just make users
able to check on their own?
For example, users can wait and try again if the result is EC_RES_BUSY.

>
> However, it appears that the various low level functions already
> replace various EC error codes with a blank EC_RES_ERROR. No amount of
> logging will tell us what exactly went wrong in those functions. Lucky
> for us, audio_codec_i2s_rx_set_daifmt() only ever returns EC_SUCCESS,
> so we know that the problem is either that i2s_rx_enabled is true or
> that daifmt is too large. None of those really warrants more verbose
> logging.
>
> From the context, my personal bet is that i2s_rx_enabled is true: I
> don't immediately see how disabling it is enforced before trying to
> set the DAI format, and I don't see how "daifmt >=
> EC_CODEC_I2S_RX_DAIFMT_COUNT" can ever be true.
I totally agree. According to the source, it seems that both path are
impossible. I'm not really understand the whole path but is it
possible for EC to return other results? I will do more tests and look
carefully into the source. Really thanks for your suggestions.

Cheers,
Yu-Hsuan
>
> Guenter
>
>
> Guenter
>
> > Cheers,
> > Enric
> >
> > > The other way we thought is changing dev_dbg to dev_err in
> > > cros_ec_cmd_xfer_status. But we are not sure whether it is also an
> > > error for other usages.
> > >
> > >>> if (ret == -EPROTO)
> > >>> dev_err(..., msg->result)
> > >>> goto error;
> > >>> }
> > >>> I'm not sure whether it makes sense to check ret == -EPROTO here.
> > >>>
> > >>>>
> > >>>>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> > >>>>> ---
> > >>>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> > >>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> > >>>>>
> > >>>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> > >>>>> index 8d45c628e988e..a4ab62f59efa6 100644
> > >>>>> --- a/sound/soc/codecs/cros_ec_codec.c
> > >>>>> +++ b/sound/soc/codecs/cros_ec_codec.c
> > >>>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> > >>>>> if (outsize)
> > >>>>> memcpy(msg->data, out, outsize);
> > >>>>>
> > >>>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > >>>>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> > >>>>> if (ret < 0)
> > >>>>> goto error;
> > >>>>>
> > >>>>> + if (msg->result != EC_RES_SUCCESS) {
> > >>>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> > >>>>> + msg->result);
> > >>>>> + ret = -EPROTO;
> > >>>>> + goto error;
> > >>>>> + }
> > >>>>> +
> > >>>>> if (insize)
> > >>>>> memcpy(in, msg->data, insize);
> > >>>>>
> > >>>>>

2020-07-03 19:29:57

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

On Fri, Jul 3, 2020 at 12:11 PM Yu-Hsuan Hsu <[email protected]> wrote:
>
> Guenter Roeck <[email protected]> 於 2020年7月3日 週五 下午11:58寫道:
> >
> > On Fri, Jul 3, 2020 at 3:56 AM Enric Balletbo i Serra
> > <[email protected]> wrote:
> > >
> > > Hi Yu-Hsuan,
> > >
> > > On 3/7/20 11:40, Yu-Hsuan Hsu wrote:
> > > > Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午5:19寫道:
> > > >>
> > > >> Hi Yu-Hsuan,
> > > >>
> > > >> On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
> > > >>> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
> > > >>>>
> > > >>>> Hi Yu-Hsuan,
> > > >>>>
> > > >>>> Thank you for your patch
> > > >>>>
> > > >>>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> > > >>>>> Log results of failed EC commands to identify a problem more easily.
> > > >>>>>
> > > >>>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> > > >>>>> has already been checked in this function. The wrapper is not needed.
> > > >>>>>
> > > >>>>
> > > >>>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
> > > >>>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
> > > >>>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
> > > >>>> on top of the upcoming changes.
> > > >>> Thanks! But I have a question about implementing it. Does it look like
> > > >>> the one below?
> > > >>> ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > > >>> if (ret < 0) {
> > > >>
> > > >> In this case will already print an error.
> > > >>
> > > >> What are you trying to achieve?
> > > >>
> > > >> If the only reason is of this patch is print a message you should either, or
> > > >> enable dynamic printk and enable dev_dbg or event better use the kernel trace
> > > >> functionality. There is no need to be more verbose.
> > > >>
> > > >> Example:
> > > >> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
> > > >> $ cat /sys/kernel/debug/tracing/trace
> > > >>
> > > >> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
> > > >> 369.420528: cros_ec_request_done: version: 0, command:
> > > >> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16
> > > >>
> > > >> Cheers,
> > > >> Enric
> > > >>
> > > > Thank Enric,
> > > >
> > > > The situation is that some users encountered errors on ChromeBook.
> > >
> > > And, aren't you able to reproduce the issue?
> > >
> > >
> > > > From their feedback reports, we only get the message like
> > > > 'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'.
> > > > We know that -71 is -EPROTO but it is not clear enough for us to find
> > > > out the root cause. That's why we want the detail of the result.
> > >
> > >
> > > If I am not mistaken this ends calling i2s_rx_set_daifmt() into the EC firmware,
> > > if the result is -EPROTO that means is not returning EC_RES_SUCCESS, so there
> > > are few options:
> > >
> > > if (i2s_rx_enabled)
> > > return EC_RES_BUSY;
> > >
> > > if (daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT)
> > > return EC_RES_INVALID_PARAM;
> > >
> > > if (audio_codec_i2s_rx_set_daifmt(daifmt) != EC_SUCCESS)
> > > return EC_RES_ERROR;
> > >
> > > > Because the situation happens on users' side, it is not possible for
> > > > them to enable kernel trace (ChromeOS does not allow users to touch
> > > > kernel).
> > > >
> > >
> > > Are you sure that when you know the error code you'll find the root cause
> > > (without adding more prints)? There is only three possibilities? You can't start
> > > adding prints just to debug a user issue because you don't allow to be more
> > > verbose. I understand that might help you but is not the way to go.
>
> Hi Enric and Guenter,
>
> Thanks for your inspiring comments.
> I'm not sure whether we will find the root cause if I know the error
> code. But I think it's not a point.
> We wanted to add this error log because we found that the current one
> is not enough. Since it is a real error, it would be better if we can
> make it more detailed, right?
> In addition, we thought it would be helpful in the future as well.
> That's why we chose to upstream instead of merging into our source
> tree only.
>

Kernel log messages are almost never useful for actual users. In most
cases, they just clog up the kernel log, making it useless for when it
actually counts (such as when the system crashes). On top of that,
there already is an error message, only the error code doesn't mean
much because it is "universalized". I don't see how adding another
error message would improve that, even more so if that error message
is only added in one place. On the other side, converting EC error
codes to Linux kernel error codes would help every caller of
cros_ec_cmd_xfer_status without adding more logging noise.

> > >
> > > You should really reproduce the issue yourself an use actual debug
> > > tools/prints./traces
> We are trying but still unable to reproduce this issue.
> However, as I maintained above, it is not a main concern of this change.
> > >
> >
> > Another possibility would be to change cros_ec_cmd_xfer_status() to
> > return a more granular error code, such as -EINVAL for
> > EC_RES_INVALID_PARAM, -EBUSY for EC_RES_BUSY, -EINPROGRESS for
> > EC_RES_IN_PROGRESS, -ETIMEDOUT for EC_RES_TIMEOUT, -EOVERFLOW for
> > EC_RES_OVERFLOW, -ENODATA for EC_RES_UNAVAILABLE, and so on.
> Since there are many kinds of results from EC, why not just make users
> able to check on their own?
> For example, users can wait and try again if the result is EC_RES_BUSY.
>

That is exactly what -EBUSY is for: It lets the user space application
decide what to do about it. A kernel log message can not and will
never achieve that.

> >
> > However, it appears that the various low level functions already
> > replace various EC error codes with a blank EC_RES_ERROR. No amount of
> > logging will tell us what exactly went wrong in those functions. Lucky
> > for us, audio_codec_i2s_rx_set_daifmt() only ever returns EC_SUCCESS,
> > so we know that the problem is either that i2s_rx_enabled is true or
> > that daifmt is too large. None of those really warrants more verbose
> > logging.
> >
> > From the context, my personal bet is that i2s_rx_enabled is true: I
> > don't immediately see how disabling it is enforced before trying to
> > set the DAI format, and I don't see how "daifmt >=
> > EC_CODEC_I2S_RX_DAIFMT_COUNT" can ever be true.
> I totally agree. According to the source, it seems that both path are

You are effectively saying that there is code to ensure that
i2s_rx_enabled is false. Granted, the code is too complex to easily
understand, and I may not have seen that flow. Per the same logic,
though, it might well be possible that _because_ the code is not easy
to understand there may well be a data path where i2s_rx_enabled is
set. One would need to ensure that the sequence of <disable rx> - <set
dai format> - <re-enable rx> is guaranteed, and that it is always
executed under a kernel lock. Maybe you can point me to that code, for
my education.

On the other side, it is much easier to verify that "daifmt >=
EC_CODEC_I2S_RX_DAIFMT_COUNT" is never true.

Thanks,
Guenter

> impossible. I'm not really understand the whole path but is it
> possible for EC to return other results? I will do more tests and look
> carefully into the source. Really thanks for your suggestions.
>
> Cheers,
> Yu-Hsuan
> >
> > Guenter
> >
> >
> > Guenter
> >
> > > Cheers,
> > > Enric
> > >
> > > > The other way we thought is changing dev_dbg to dev_err in
> > > > cros_ec_cmd_xfer_status. But we are not sure whether it is also an
> > > > error for other usages.
> > > >
> > > >>> if (ret == -EPROTO)
> > > >>> dev_err(..., msg->result)
> > > >>> goto error;
> > > >>> }
> > > >>> I'm not sure whether it makes sense to check ret == -EPROTO here.
> > > >>>
> > > >>>>
> > > >>>>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> > > >>>>> ---
> > > >>>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> > > >>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> > > >>>>>
> > > >>>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> > > >>>>> index 8d45c628e988e..a4ab62f59efa6 100644
> > > >>>>> --- a/sound/soc/codecs/cros_ec_codec.c
> > > >>>>> +++ b/sound/soc/codecs/cros_ec_codec.c
> > > >>>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> > > >>>>> if (outsize)
> > > >>>>> memcpy(msg->data, out, outsize);
> > > >>>>>
> > > >>>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > > >>>>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> > > >>>>> if (ret < 0)
> > > >>>>> goto error;
> > > >>>>>
> > > >>>>> + if (msg->result != EC_RES_SUCCESS) {
> > > >>>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> > > >>>>> + msg->result);
> > > >>>>> + ret = -EPROTO;
> > > >>>>> + goto error;
> > > >>>>> + }
> > > >>>>> +
> > > >>>>> if (insize)
> > > >>>>> memcpy(in, msg->data, insize);
> > > >>>>>
> > > >>>>>

2020-07-04 11:35:33

by Yu-Hsuan Hsu

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Guenter Roeck <[email protected]> 於 2020年7月4日 週六 上午3:28寫道:
>
> On Fri, Jul 3, 2020 at 12:11 PM Yu-Hsuan Hsu <[email protected]> wrote:
> >
> > Guenter Roeck <[email protected]> 於 2020年7月3日 週五 下午11:58寫道:
> > >
> > > On Fri, Jul 3, 2020 at 3:56 AM Enric Balletbo i Serra
> > > <[email protected]> wrote:
> > > >
> > > > Hi Yu-Hsuan,
> > > >
> > > > On 3/7/20 11:40, Yu-Hsuan Hsu wrote:
> > > > > Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午5:19寫道:
> > > > >>
> > > > >> Hi Yu-Hsuan,
> > > > >>
> > > > >> On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
> > > > >>> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
> > > > >>>>
> > > > >>>> Hi Yu-Hsuan,
> > > > >>>>
> > > > >>>> Thank you for your patch
> > > > >>>>
> > > > >>>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> > > > >>>>> Log results of failed EC commands to identify a problem more easily.
> > > > >>>>>
> > > > >>>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> > > > >>>>> has already been checked in this function. The wrapper is not needed.
> > > > >>>>>
> > > > >>>>
> > > > >>>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
> > > > >>>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
> > > > >>>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
> > > > >>>> on top of the upcoming changes.
> > > > >>> Thanks! But I have a question about implementing it. Does it look like
> > > > >>> the one below?
> > > > >>> ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > > > >>> if (ret < 0) {
> > > > >>
> > > > >> In this case will already print an error.
> > > > >>
> > > > >> What are you trying to achieve?
> > > > >>
> > > > >> If the only reason is of this patch is print a message you should either, or
> > > > >> enable dynamic printk and enable dev_dbg or event better use the kernel trace
> > > > >> functionality. There is no need to be more verbose.
> > > > >>
> > > > >> Example:
> > > > >> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
> > > > >> $ cat /sys/kernel/debug/tracing/trace
> > > > >>
> > > > >> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
> > > > >> 369.420528: cros_ec_request_done: version: 0, command:
> > > > >> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16
> > > > >>
> > > > >> Cheers,
> > > > >> Enric
> > > > >>
> > > > > Thank Enric,
> > > > >
> > > > > The situation is that some users encountered errors on ChromeBook.
> > > >
> > > > And, aren't you able to reproduce the issue?
> > > >
> > > >
> > > > > From their feedback reports, we only get the message like
> > > > > 'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'.
> > > > > We know that -71 is -EPROTO but it is not clear enough for us to find
> > > > > out the root cause. That's why we want the detail of the result.
> > > >
> > > >
> > > > If I am not mistaken this ends calling i2s_rx_set_daifmt() into the EC firmware,
> > > > if the result is -EPROTO that means is not returning EC_RES_SUCCESS, so there
> > > > are few options:
> > > >
> > > > if (i2s_rx_enabled)
> > > > return EC_RES_BUSY;
> > > >
> > > > if (daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT)
> > > > return EC_RES_INVALID_PARAM;
> > > >
> > > > if (audio_codec_i2s_rx_set_daifmt(daifmt) != EC_SUCCESS)
> > > > return EC_RES_ERROR;
> > > >
> > > > > Because the situation happens on users' side, it is not possible for
> > > > > them to enable kernel trace (ChromeOS does not allow users to touch
> > > > > kernel).
> > > > >
> > > >
> > > > Are you sure that when you know the error code you'll find the root cause
> > > > (without adding more prints)? There is only three possibilities? You can't start
> > > > adding prints just to debug a user issue because you don't allow to be more
> > > > verbose. I understand that might help you but is not the way to go.
> >
> > Hi Enric and Guenter,
> >
> > Thanks for your inspiring comments.
> > I'm not sure whether we will find the root cause if I know the error
> > code. But I think it's not a point.
> > We wanted to add this error log because we found that the current one
> > is not enough. Since it is a real error, it would be better if we can
> > make it more detailed, right?
> > In addition, we thought it would be helpful in the future as well.
> > That's why we chose to upstream instead of merging into our source
> > tree only.
> >
>
> Kernel log messages are almost never useful for actual users. In most
> cases, they just clog up the kernel log, making it useless for when it
> actually counts (such as when the system crashes). On top of that,
> there already is an error message, only the error code doesn't mean
> much because it is "universalized". I don't see how adding another
> error message would improve that, even more so if that error message
> is only added in one place. On the other side, converting EC error
> codes to Linux kernel error codes would help every caller of
> cros_ec_cmd_xfer_status without adding more logging noise.

Thanks for the explanation. If I understand correctly, the mean point
is this log is noisy.
In this case, the log will only appear one time during the card
probing(if it fails). But I do agree that it may create some potential
noises.
Converting EC error codes to Linux kernel error codes also work for
me. Is it possible for you to send the corresponding patch? If not, I
can try to work on it.
In addition, is it possible that one error code can come from linux or from EC?

>
> > > >
> > > > You should really reproduce the issue yourself an use actual debug
> > > > tools/prints./traces
> > We are trying but still unable to reproduce this issue.
> > However, as I maintained above, it is not a main concern of this change.
> > > >
> > >
> > > Another possibility would be to change cros_ec_cmd_xfer_status() to
> > > return a more granular error code, such as -EINVAL for
> > > EC_RES_INVALID_PARAM, -EBUSY for EC_RES_BUSY, -EINPROGRESS for
> > > EC_RES_IN_PROGRESS, -ETIMEDOUT for EC_RES_TIMEOUT, -EOVERFLOW for
> > > EC_RES_OVERFLOW, -ENODATA for EC_RES_UNAVAILABLE, and so on.
> > Since there are many kinds of results from EC, why not just make users
> > able to check on their own?
> > For example, users can wait and try again if the result is EC_RES_BUSY.
> >
>
> That is exactly what -EBUSY is for: It lets the user space application
> decide what to do about it. A kernel log message can not and will
> never achieve that.
>
> > >
> > > However, it appears that the various low level functions already
> > > replace various EC error codes with a blank EC_RES_ERROR. No amount of
> > > logging will tell us what exactly went wrong in those functions. Lucky
> > > for us, audio_codec_i2s_rx_set_daifmt() only ever returns EC_SUCCESS,
> > > so we know that the problem is either that i2s_rx_enabled is true or
> > > that daifmt is too large. None of those really warrants more verbose
> > > logging.
> > >
> > > From the context, my personal bet is that i2s_rx_enabled is true: I
> > > don't immediately see how disabling it is enforced before trying to
> > > set the DAI format, and I don't see how "daifmt >=
> > > EC_CODEC_I2S_RX_DAIFMT_COUNT" can ever be true.
> > I totally agree. According to the source, it seems that both path are
>
> You are effectively saying that there is code to ensure that
> i2s_rx_enabled is false. Granted, the code is too complex to easily
> understand, and I may not have seen that flow. Per the same logic,
> though, it might well be possible that _because_ the code is not easy
> to understand there may well be a data path where i2s_rx_enabled is
> set. One would need to ensure that the sequence of <disable rx> - <set
> dai format> - <re-enable rx> is guaranteed, and that it is always
> executed under a kernel lock. Maybe you can point me to that code, for
> my education.
>
> On the other side, it is much easier to verify that "daifmt >=
> EC_CODEC_I2S_RX_DAIFMT_COUNT" is never true.
>
> Thanks,
> Guenter
The complete errors are:
ERR kernel: [ 8.568171] cros-ec-codec GOOG0013:00: ASoC: error at
snd_soc_dai_set_fmt on GOOG0013:00: -71
WARNING kernel: [ 8.583312] cros-ec-codec GOOG0013:00: ASoC: Failed
to set DAI format: -71
ERR kernel: [ 8.629388] acp3x-alc5682-max98357 AMDI5682:00:
devm_snd_soc_register_card(acp3xalc5682m98357) failed: -71
WARNING kernel: [ 8.645306] acp3x-alc5682-max98357: probe of
AMDI5682:00 failed with error -71

This issue happened when probing the card while the i2s_rx is only
enabled when the device is used. Therefore, the i2s_rx should not be
enabled before the card is probed. However, I may miss something. I
will check the entire path later.
For "daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT", it is obviously not to
be true by looking into i2s_rx_set_fmt function.

Thanks,
Yu-Hsuan

>
> > impossible. I'm not really understand the whole path but is it
> > possible for EC to return other results? I will do more tests and look
> > carefully into the source. Really thanks for your suggestions.
> >
> > Cheers,
> > Yu-Hsuan
> > >
> > > Guenter
> > >
> > >
> > > Guenter
> > >
> > > > Cheers,
> > > > Enric
> > > >
> > > > > The other way we thought is changing dev_dbg to dev_err in
> > > > > cros_ec_cmd_xfer_status. But we are not sure whether it is also an
> > > > > error for other usages.
> > > > >
> > > > >>> if (ret == -EPROTO)
> > > > >>> dev_err(..., msg->result)
> > > > >>> goto error;
> > > > >>> }
> > > > >>> I'm not sure whether it makes sense to check ret == -EPROTO here.
> > > > >>>
> > > > >>>>
> > > > >>>>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> > > > >>>>> ---
> > > > >>>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> > > > >>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> > > > >>>>>
> > > > >>>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> > > > >>>>> index 8d45c628e988e..a4ab62f59efa6 100644
> > > > >>>>> --- a/sound/soc/codecs/cros_ec_codec.c
> > > > >>>>> +++ b/sound/soc/codecs/cros_ec_codec.c
> > > > >>>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> > > > >>>>> if (outsize)
> > > > >>>>> memcpy(msg->data, out, outsize);
> > > > >>>>>
> > > > >>>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > > > >>>>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> > > > >>>>> if (ret < 0)
> > > > >>>>> goto error;
> > > > >>>>>
> > > > >>>>> + if (msg->result != EC_RES_SUCCESS) {
> > > > >>>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> > > > >>>>> + msg->result);
> > > > >>>>> + ret = -EPROTO;
> > > > >>>>> + goto error;
> > > > >>>>> + }
> > > > >>>>> +
> > > > >>>>> if (insize)
> > > > >>>>> memcpy(in, msg->data, insize);
> > > > >>>>>
> > > > >>>>>

2020-07-06 09:49:25

by Yu-Hsuan Hsu

[permalink] [raw]
Subject: Re: [PATCH v2] ASoC: cros_ec_codec: Log results when EC commands fail

Yu-Hsuan Hsu <[email protected]> 於 2020年7月4日 週六 下午7:34寫道:
>
> Guenter Roeck <[email protected]> 於 2020年7月4日 週六 上午3:28寫道:
> >
> > On Fri, Jul 3, 2020 at 12:11 PM Yu-Hsuan Hsu <[email protected]> wrote:
> > >
> > > Guenter Roeck <[email protected]> 於 2020年7月3日 週五 下午11:58寫道:
> > > >
> > > > On Fri, Jul 3, 2020 at 3:56 AM Enric Balletbo i Serra
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi Yu-Hsuan,
> > > > >
> > > > > On 3/7/20 11:40, Yu-Hsuan Hsu wrote:
> > > > > > Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午5:19寫道:
> > > > > >>
> > > > > >> Hi Yu-Hsuan,
> > > > > >>
> > > > > >> On 3/7/20 10:48, Yu-Hsuan Hsu wrote:
> > > > > >>> Enric Balletbo i Serra <[email protected]> 於 2020年7月3日 週五 下午4:38寫道:
> > > > > >>>>
> > > > > >>>> Hi Yu-Hsuan,
> > > > > >>>>
> > > > > >>>> Thank you for your patch
> > > > > >>>>
> > > > > >>>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote:
> > > > > >>>>> Log results of failed EC commands to identify a problem more easily.
> > > > > >>>>>
> > > > > >>>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result
> > > > > >>>>> has already been checked in this function. The wrapper is not needed.
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in
> > > > > >>>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do
> > > > > >>>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build
> > > > > >>>> on top of the upcoming changes.
> > > > > >>> Thanks! But I have a question about implementing it. Does it look like
> > > > > >>> the one below?
> > > > > >>> ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > > > > >>> if (ret < 0) {
> > > > > >>
> > > > > >> In this case will already print an error.
> > > > > >>
> > > > > >> What are you trying to achieve?
> > > > > >>
> > > > > >> If the only reason is of this patch is print a message you should either, or
> > > > > >> enable dynamic printk and enable dev_dbg or event better use the kernel trace
> > > > > >> functionality. There is no need to be more verbose.
> > > > > >>
> > > > > >> Example:
> > > > > >> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable
> > > > > >> $ cat /sys/kernel/debug/tracing/trace
> > > > > >>
> > > > > >> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO
> > > > > >> 369.420528: cros_ec_request_done: version: 0, command:
> > > > > >> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16
> > > > > >>
> > > > > >> Cheers,
> > > > > >> Enric
> > > > > >>
> > > > > > Thank Enric,
> > > > > >
> > > > > > The situation is that some users encountered errors on ChromeBook.
> > > > >
> > > > > And, aren't you able to reproduce the issue?
> > > > >
> > > > >
> > > > > > From their feedback reports, we only get the message like
> > > > > > 'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'.
> > > > > > We know that -71 is -EPROTO but it is not clear enough for us to find
> > > > > > out the root cause. That's why we want the detail of the result.
> > > > >
> > > > >
> > > > > If I am not mistaken this ends calling i2s_rx_set_daifmt() into the EC firmware,
> > > > > if the result is -EPROTO that means is not returning EC_RES_SUCCESS, so there
> > > > > are few options:
> > > > >
> > > > > if (i2s_rx_enabled)
> > > > > return EC_RES_BUSY;
> > > > >
> > > > > if (daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT)
> > > > > return EC_RES_INVALID_PARAM;
> > > > >
> > > > > if (audio_codec_i2s_rx_set_daifmt(daifmt) != EC_SUCCESS)
> > > > > return EC_RES_ERROR;
> > > > >
> > > > > > Because the situation happens on users' side, it is not possible for
> > > > > > them to enable kernel trace (ChromeOS does not allow users to touch
> > > > > > kernel).
> > > > > >
> > > > >
> > > > > Are you sure that when you know the error code you'll find the root cause
> > > > > (without adding more prints)? There is only three possibilities? You can't start
> > > > > adding prints just to debug a user issue because you don't allow to be more
> > > > > verbose. I understand that might help you but is not the way to go.
> > >
> > > Hi Enric and Guenter,
> > >
> > > Thanks for your inspiring comments.
> > > I'm not sure whether we will find the root cause if I know the error
> > > code. But I think it's not a point.
> > > We wanted to add this error log because we found that the current one
> > > is not enough. Since it is a real error, it would be better if we can
> > > make it more detailed, right?
> > > In addition, we thought it would be helpful in the future as well.
> > > That's why we chose to upstream instead of merging into our source
> > > tree only.
> > >
> >
> > Kernel log messages are almost never useful for actual users. In most
> > cases, they just clog up the kernel log, making it useless for when it
> > actually counts (such as when the system crashes). On top of that,
> > there already is an error message, only the error code doesn't mean
> > much because it is "universalized". I don't see how adding another
> > error message would improve that, even more so if that error message
> > is only added in one place. On the other side, converting EC error
> > codes to Linux kernel error codes would help every caller of
> > cros_ec_cmd_xfer_status without adding more logging noise.
>
> Thanks for the explanation. If I understand correctly, the mean point
> is this log is noisy.
> In this case, the log will only appear one time during the card
> probing(if it fails). But I do agree that it may create some potential
> noises.
> Converting EC error codes to Linux kernel error codes also work for
> me. Is it possible for you to send the corresponding patch? If not, I
> can try to work on it.
> In addition, is it possible that one error code can come from linux or from EC?
>
> >
> > > > >
> > > > > You should really reproduce the issue yourself an use actual debug
> > > > > tools/prints./traces
> > > We are trying but still unable to reproduce this issue.
> > > However, as I maintained above, it is not a main concern of this change.
> > > > >
> > > >
> > > > Another possibility would be to change cros_ec_cmd_xfer_status() to
> > > > return a more granular error code, such as -EINVAL for
> > > > EC_RES_INVALID_PARAM, -EBUSY for EC_RES_BUSY, -EINPROGRESS for
> > > > EC_RES_IN_PROGRESS, -ETIMEDOUT for EC_RES_TIMEOUT, -EOVERFLOW for
> > > > EC_RES_OVERFLOW, -ENODATA for EC_RES_UNAVAILABLE, and so on.
> > > Since there are many kinds of results from EC, why not just make users
> > > able to check on their own?
> > > For example, users can wait and try again if the result is EC_RES_BUSY.
> > >
> >
> > That is exactly what -EBUSY is for: It lets the user space application
> > decide what to do about it. A kernel log message can not and will
> > never achieve that.
> >
> > > >
> > > > However, it appears that the various low level functions already
> > > > replace various EC error codes with a blank EC_RES_ERROR. No amount of
> > > > logging will tell us what exactly went wrong in those functions. Lucky
> > > > for us, audio_codec_i2s_rx_set_daifmt() only ever returns EC_SUCCESS,
> > > > so we know that the problem is either that i2s_rx_enabled is true or
> > > > that daifmt is too large. None of those really warrants more verbose
> > > > logging.
> > > >
> > > > From the context, my personal bet is that i2s_rx_enabled is true: I
> > > > don't immediately see how disabling it is enforced before trying to
> > > > set the DAI format, and I don't see how "daifmt >=
> > > > EC_CODEC_I2S_RX_DAIFMT_COUNT" can ever be true.
> > > I totally agree. According to the source, it seems that both path are
> >
> > You are effectively saying that there is code to ensure that
> > i2s_rx_enabled is false. Granted, the code is too complex to easily
> > understand, and I may not have seen that flow. Per the same logic,
> > though, it might well be possible that _because_ the code is not easy
> > to understand there may well be a data path where i2s_rx_enabled is
> > set. One would need to ensure that the sequence of <disable rx> - <set
> > dai format> - <re-enable rx> is guaranteed, and that it is always
> > executed under a kernel lock. Maybe you can point me to that code, for
> > my education.
> >
> > On the other side, it is much easier to verify that "daifmt >=
> > EC_CODEC_I2S_RX_DAIFMT_COUNT" is never true.
> >
> > Thanks,
> > Guenter
> The complete errors are:
> ERR kernel: [ 8.568171] cros-ec-codec GOOG0013:00: ASoC: error at
> snd_soc_dai_set_fmt on GOOG0013:00: -71
> WARNING kernel: [ 8.583312] cros-ec-codec GOOG0013:00: ASoC: Failed
> to set DAI format: -71
> ERR kernel: [ 8.629388] acp3x-alc5682-max98357 AMDI5682:00:
> devm_snd_soc_register_card(acp3xalc5682m98357) failed: -71
> WARNING kernel: [ 8.645306] acp3x-alc5682-max98357: probe of
> AMDI5682:00 failed with error -71
>
> This issue happened when probing the card while the i2s_rx is only
> enabled when the device is used. Therefore, the i2s_rx should not be
> enabled before the card is probed. However, I may miss something. I
> will check the entire path later.
I finally found a possible root cause. EC does not restart when we
reboots kernel. It means if kernel crashes while i2s_rx is enabled, we
will trigger this issue. In addition, the issue will stay until the
next time rebooting EC.

I will work on the fix. Thank you for very helpful information.

> For "daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT", it is obviously not to
> be true by looking into i2s_rx_set_fmt function.
>
> Thanks,
> Yu-Hsuan
>
> >
> > > impossible. I'm not really understand the whole path but is it
> > > possible for EC to return other results? I will do more tests and look
> > > carefully into the source. Really thanks for your suggestions.
> > >
> > > Cheers,
> > > Yu-Hsuan
> > > >
> > > > Guenter
> > > >
> > > >
> > > > Guenter
> > > >
> > > > > Cheers,
> > > > > Enric
> > > > >
> > > > > > The other way we thought is changing dev_dbg to dev_err in
> > > > > > cros_ec_cmd_xfer_status. But we are not sure whether it is also an
> > > > > > error for other usages.
> > > > > >
> > > > > >>> if (ret == -EPROTO)
> > > > > >>> dev_err(..., msg->result)
> > > > > >>> goto error;
> > > > > >>> }
> > > > > >>> I'm not sure whether it makes sense to check ret == -EPROTO here.
> > > > > >>>
> > > > > >>>>
> > > > > >>>>> Signed-off-by: Yu-Hsuan Hsu <[email protected]>
> > > > > >>>>> ---
> > > > > >>>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++-
> > > > > >>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> > > > > >>>>>
> > > > > >>>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> > > > > >>>>> index 8d45c628e988e..a4ab62f59efa6 100644
> > > > > >>>>> --- a/sound/soc/codecs/cros_ec_codec.c
> > > > > >>>>> +++ b/sound/soc/codecs/cros_ec_codec.c
> > > > > >>>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> > > > > >>>>> if (outsize)
> > > > > >>>>> memcpy(msg->data, out, outsize);
> > > > > >>>>>
> > > > > >>>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> > > > > >>>>> + ret = cros_ec_cmd_xfer(ec_dev, msg);
> > > > > >>>>> if (ret < 0)
> > > > > >>>>> goto error;
> > > > > >>>>>
> > > > > >>>>> + if (msg->result != EC_RES_SUCCESS) {
> > > > > >>>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> > > > > >>>>> + msg->result);
> > > > > >>>>> + ret = -EPROTO;
> > > > > >>>>> + goto error;
> > > > > >>>>> + }
> > > > > >>>>> +
> > > > > >>>>> if (insize)
> > > > > >>>>> memcpy(in, msg->data, insize);
> > > > > >>>>>
> > > > > >>>>>