As with cros_ec_cmd_xfer_status(), etc., it's not enough to simply check
for the return status of send_command() -- that only covers transport or
other similarly-fatal errors. One must also check the ->result field, to
see whether the command really succeeded. If not, we can't use the data
it returns.
The caller of cros_ec_get_host_event_wake_mask() ignores this, and so
for example, on EC's where the command is not implemented, we're using
junk (or in practice, all zeros) for our wake-mask. We should be using a
non-zero default (currently, it's supposed to be all-1's).
Fix this by checking the ->result field and returning -EPROTO for
errors.
I might label this as fixing commit 29d99b966d60 ("cros_ec: Don't signal
wake event for non-wake host events"), except that this fix alone
actually may make things worse, as it now allows for a lot more spurious
wakeups. The patch "platform/chrome: cros_ec_proto: ignore battery/AC
wakeups on old ECs" helps to mitigate this.
Signed-off-by: Brian Norris <[email protected]>
---
drivers/platform/chrome/cros_ec_proto.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
index e93024b55ce8..01a74abe4191 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -208,6 +208,12 @@ static int cros_ec_get_host_event_wake_mask(struct cros_ec_device *ec_dev,
msg->insize = sizeof(*r);
ret = send_command(ec_dev, msg);
+ if (ret >= 0) {
+ if (msg->result == EC_RES_INVALID_COMMAND)
+ return -ENOTSUPP;
+ if (msg->result != EC_RES_SUCCESS)
+ return -EPROTO;
+ }
if (ret > 0) {
r = (struct ec_response_host_event_mask *)msg->data;
*mask = r->mask;
@@ -488,6 +494,13 @@ int cros_ec_query_all(struct cros_ec_device *ec_dev)
BIT(EC_HOST_EVENT_BATTERY_CRITICAL) |
BIT(EC_HOST_EVENT_PD_MCU) |
BIT(EC_HOST_EVENT_BATTERY_STATUS));
+ /*
+ * Old ECs may not support this command. Complain about all
+ * other errors.
+ */
+ if (ret != -ENOTSUPP)
+ dev_err(ec_dev->dev,
+ "failed to retrieve wake mask: %d\n", ret);
}
ret = 0;
--
2.28.0.rc0.105.gf9edc3c819-goog
Hi Brian,
Thank you for your patch, I'll take a look soon but I'd like to ask if you can
join the discussion with this patchset [1], specially this one [2]. We're trying
to match EC errors with standard linux kernel errors because we think can be
helpful.
[1] https://lore.kernel.org/patchwork/cover/1276734/
[2] https://lore.kernel.org/patchwork/patch/1276738/
Thanks,
Enric
On 22/7/20 3:57, Brian Norris wrote:
> As with cros_ec_cmd_xfer_status(), etc., it's not enough to simply check
> for the return status of send_command() -- that only covers transport or
> other similarly-fatal errors. One must also check the ->result field, to
> see whether the command really succeeded. If not, we can't use the data
> it returns.
>
> The caller of cros_ec_get_host_event_wake_mask() ignores this, and so
> for example, on EC's where the command is not implemented, we're using
> junk (or in practice, all zeros) for our wake-mask. We should be using a
> non-zero default (currently, it's supposed to be all-1's).
>
> Fix this by checking the ->result field and returning -EPROTO for
> errors.
>
> I might label this as fixing commit 29d99b966d60 ("cros_ec: Don't signal
> wake event for non-wake host events"), except that this fix alone
> actually may make things worse, as it now allows for a lot more spurious
> wakeups. The patch "platform/chrome: cros_ec_proto: ignore battery/AC
> wakeups on old ECs" helps to mitigate this.
>
> Signed-off-by: Brian Norris <[email protected]>
> ---
> drivers/platform/chrome/cros_ec_proto.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
> index e93024b55ce8..01a74abe4191 100644
> --- a/drivers/platform/chrome/cros_ec_proto.c
> +++ b/drivers/platform/chrome/cros_ec_proto.c
> @@ -208,6 +208,12 @@ static int cros_ec_get_host_event_wake_mask(struct cros_ec_device *ec_dev,
> msg->insize = sizeof(*r);
>
> ret = send_command(ec_dev, msg);
> + if (ret >= 0) {
> + if (msg->result == EC_RES_INVALID_COMMAND)
> + return -ENOTSUPP;
> + if (msg->result != EC_RES_SUCCESS)
> + return -EPROTO;
> + }
> if (ret > 0) {
> r = (struct ec_response_host_event_mask *)msg->data;
> *mask = r->mask;
> @@ -488,6 +494,13 @@ int cros_ec_query_all(struct cros_ec_device *ec_dev)
> BIT(EC_HOST_EVENT_BATTERY_CRITICAL) |
> BIT(EC_HOST_EVENT_PD_MCU) |
> BIT(EC_HOST_EVENT_BATTERY_STATUS));
> + /*
> + * Old ECs may not support this command. Complain about all
> + * other errors.
> + */
> + if (ret != -ENOTSUPP)
> + dev_err(ec_dev->dev,
> + "failed to retrieve wake mask: %d\n", ret);
> }
>
> ret = 0;
>
On Wed, Jul 22, 2020 at 3:19 AM Enric Balletbo i Serra
<[email protected]> wrote:
>
> Hi Brian,
>
> Thank you for your patch, I'll take a look soon but I'd like to ask if you can
> join the discussion with this patchset [1], specially this one [2]. We're trying
> to match EC errors with standard linux kernel errors because we think can be
> helpful.
>
> [1] https://lore.kernel.org/patchwork/cover/1276734/
> [2] https://lore.kernel.org/patchwork/patch/1276738/
Hi Enric,
Thanks, I'll do that. I do wonder sometimes how non-maintainers should
best support "community" around these things, for subsystems that
don't have a dedicated mailing list and are therefore sent only to
maintainers + LKML-fire-hose. I could probably subscribe to LKML and
filter it, but something tells me my mailbox will still manage to
explode somehow... Anyway, I digress.
Other than perhaps taking a lesson not to propagate -ENOTSUPP, I don't
think this series should block on that, as this is a bugfix IMO.
Regards,
Brian
On Wed, Jul 22, 2020 at 1:50 PM Brian Norris <[email protected]> wrote:
>
> On Wed, Jul 22, 2020 at 3:19 AM Enric Balletbo i Serra
> <[email protected]> wrote:
> >
> > Hi Brian,
> >
> > Thank you for your patch, I'll take a look soon but I'd like to ask if you can
> > join the discussion with this patchset [1], specially this one [2]. We're trying
> > to match EC errors with standard linux kernel errors because we think can be
> > helpful.
> >
> > [1] https://lore.kernel.org/patchwork/cover/1276734/
> > [2] https://lore.kernel.org/patchwork/patch/1276738/
>
> Hi Enric,
>
> Thanks, I'll do that. I do wonder sometimes how non-maintainers should
> best support "community" around these things, for subsystems that
> don't have a dedicated mailing list and are therefore sent only to
> maintainers + LKML-fire-hose. I could probably subscribe to LKML and
> filter it, but something tells me my mailbox will still manage to
> explode somehow... Anyway, I digress.
>
> Other than perhaps taking a lesson not to propagate -ENOTSUPP, I don't
> think this series should block on that, as this is a bugfix IMO.
>
My patch will return -EOPNOTSUPP for EC_RES_INVALID_COMMAND, so maybe
you could do the same. In my latest version (not yet submitted) I
extracted the conversion into a separate function, so if your patch is
accepted now I can just add another patch on top of it to start using
that function.
Thanks,
Guenter
On Wed, Jul 22, 2020 at 2:13 PM Guenter Roeck <[email protected]> wrote:
> On Wed, Jul 22, 2020 at 1:50 PM Brian Norris <[email protected]> wrote:
> > Other than perhaps taking a lesson not to propagate -ENOTSUPP, I don't
> > think this series should block on that, as this is a bugfix IMO.
>
> My patch will return -EOPNOTSUPP for EC_RES_INVALID_COMMAND, so maybe
> you could do the same. In my latest version (not yet submitted) I
> extracted the conversion into a separate function, so if your patch is
> accepted now I can just add another patch on top of it to start using
> that function.
Sure, I can use EOPNOTSUPP in v2.
BTW, the error code is completely internal to cros_ec_proto.c in my
patch, so it seems even less-related to your series, unless I got
refactor cros_ec_get_host_event_wake_mask() to use
cros_ec_cmd_xfer_status() instead of send_command(). I'm actually not
sure why we don't do that, now that I think about it...
So WDYT? Should I rebase on your eventual v3 and refactor to
cros_ec_cmd_xfer_status()? Or (re)submit this first, and add one more
cros_ec_cmd_xfer_status() usage for you to tweak in your series?
I don't mind a lot either way, except that I would like to port this
to older kernels soon.
Brian
On Wed, Jul 22, 2020 at 5:43 PM Brian Norris <[email protected]> wrote:
> unless I got
> refactor cros_ec_get_host_event_wake_mask() to use
> cros_ec_cmd_xfer_status() instead of send_command(). I'm actually not
> sure why we don't do that, now that I think about it...
Ah, that would appear to be recursion (cros_ec_query_all() ->
cros_ec_get_host_event_wake_mask() -> cros_ec_cmd_xfer_status() -> ...
cros_ec_query_all()), although that could only happen if the first
cros_ec_query_all() doesn't initialize 'proto_version' to something
besides EC_PROTO_VERSION_UNKNOWN. That is only possible if the EC
reports '0' back to us.
I might skip out on that particular refactor for the moment.
Brian
On Wed, Jul 22, 2020 at 5:50 PM Brian Norris <[email protected]> wrote:
>
> On Wed, Jul 22, 2020 at 5:43 PM Brian Norris <[email protected]> wrote:
> > unless I got
> > refactor cros_ec_get_host_event_wake_mask() to use
> > cros_ec_cmd_xfer_status() instead of send_command(). I'm actually not
> > sure why we don't do that, now that I think about it...
>
> Ah, that would appear to be recursion (cros_ec_query_all() ->
> cros_ec_get_host_event_wake_mask() -> cros_ec_cmd_xfer_status() -> ...
> cros_ec_query_all()), although that could only happen if the first
> cros_ec_query_all() doesn't initialize 'proto_version' to something
> besides EC_PROTO_VERSION_UNKNOWN. That is only possible if the EC
> reports '0' back to us.
>
> I might skip out on that particular refactor for the moment.
>
Agreed, better not touch that. As for the order of changes, I agree
that they are independent. Best approach might be to submit yours, and
then we can clean up things a bit more later after my series is in the
tree.
Thanks,
Guenter
> Brian
Hi Brian,
On 22/7/20 22:50, Brian Norris wrote:
> On Wed, Jul 22, 2020 at 3:19 AM Enric Balletbo i Serra
> <[email protected]> wrote:
>>
>> Hi Brian,
>>
>> Thank you for your patch, I'll take a look soon but I'd like to ask if you can
>> join the discussion with this patchset [1], specially this one [2]. We're trying
>> to match EC errors with standard linux kernel errors because we think can be
>> helpful.
>>
>> [1] https://lore.kernel.org/patchwork/cover/1276734/
>> [2] https://lore.kernel.org/patchwork/patch/1276738/
>
> Hi Enric,
>
> Thanks, I'll do that. I do wonder sometimes how non-maintainers should
> best support "community" around these things, for subsystems that
> don't have a dedicated mailing list and are therefore sent only to
> maintainers + LKML-fire-hose. I could probably subscribe to LKML and
> filter it, but something tells me my mailbox will still manage to
> explode somehow... Anyway, I digress.
>
We talked sometimes on having a dedicated ML, but for one reason or the other,
and as we don't have a lot of volume, we didn't set up (could be an option).
Another thing that we can do (although this is specific for you and doesn't
solve the problem with people like you that are interested on this), is add you
as a reviewer in the MAINTAINERS file. The CrOS EC has a lot of subtleties, and
having more ChromeOS people involved in the reviewing upstream is more than welcome.
Regards,
Enric
> Other than perhaps taking a lesson not to propagate -ENOTSUPP, I don't
> think this series should block on that, as this is a bugfix IMO.
>
> Regards,
> Brian
>
Hi Brian,
On 23/7/20 2:43, Brian Norris wrote:
> On Wed, Jul 22, 2020 at 2:13 PM Guenter Roeck <[email protected]> wrote:
>> On Wed, Jul 22, 2020 at 1:50 PM Brian Norris <[email protected]> wrote:
>>> Other than perhaps taking a lesson not to propagate -ENOTSUPP, I don't
>>> think this series should block on that, as this is a bugfix IMO.
>>
>> My patch will return -EOPNOTSUPP for EC_RES_INVALID_COMMAND, so maybe
>> you could do the same. In my latest version (not yet submitted) I
>> extracted the conversion into a separate function, so if your patch is
>> accepted now I can just add another patch on top of it to start using
>> that function.
>
> Sure, I can use EOPNOTSUPP in v2.
>
Yes, please, can you send a v2 using EOPNOTSUPP
> BTW, the error code is completely internal to cros_ec_proto.c in my
> patch, so it seems even less-related to your series, unless I got
> refactor cros_ec_get_host_event_wake_mask() to use
> cros_ec_cmd_xfer_status() instead of send_command(). I'm actually not
> sure why we don't do that, now that I think about it...
>
> So WDYT? Should I rebase on your eventual v3 and refactor to
> cros_ec_cmd_xfer_status()? Or (re)submit this first, and add one more
> cros_ec_cmd_xfer_status() usage for you to tweak in your series?
>
No need to rebase on top of Guenter patches, as I plan to pick your patches first.
Regards,
Enric
> I don't mind a lot either way, except that I would like to port this
> to older kernels soon.
>
> Brian
>
On Thu, Jul 23, 2020 at 1:04 AM Enric Balletbo i Serra
<[email protected]> wrote:
> Another thing that we can do (although this is specific for you and doesn't
> solve the problem with people like you that are interested on this), is add you
> as a reviewer in the MAINTAINERS file. The CrOS EC has a lot of subtleties, and
> having more ChromeOS people involved in the reviewing upstream is more than welcome.
I'll think about that. If my lore.kernel.org searches over the last 4
months are correct, there's been about 50 patches a month, which is a
bit much as a side project. (I don't regularly work on EC stuff these
days.) It still might be better to subscribe and filter on my own
("opt in"), rather than pretend that I'm going to pay any attention to
50 patches a month.
Brian