2021-02-09 17:16:25

by Eddie James

[permalink] [raw]
Subject: [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC

In the event that the OCC is not initialized when the driver sends a poll
command, the driver may receive an invalid response. This isn't an error
condition unless there is no valid response before the timeout expires. So
change the starting sequence number and check for the un-initialized OCC
state before returning the response in order to detect this condition and
continue waiting if necessary.

Eddie James (4):
fsi: occ: Don't accept response from un-initialized OCC
fsi: occ: Log error for checksum failure
hwmon: (occ) Start sequence number at one
hwmon: (occ) Print response status in first poll error message

drivers/fsi/fsi-occ.c | 11 ++++++++---
drivers/hwmon/occ/common.c | 7 +++++--
2 files changed, 13 insertions(+), 5 deletions(-)

--
2.27.0


2021-02-09 17:16:49

by Eddie James

[permalink] [raw]
Subject: [PATCH 2/4] fsi: occ: Log error for checksum failure

Log an error if the response checksum doesn't match the
calculated checksum.

Signed-off-by: Eddie James <[email protected]>
---
drivers/fsi/fsi-occ.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index cb05b6dacc9d..524460995465 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -223,7 +223,8 @@ static const struct file_operations occ_fops = {
.release = occ_release,
};

-static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
+static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
+ u16 data_length)
{
/* Fetch the two bytes after the data for the checksum. */
u16 checksum_resp = get_unaligned_be16(&resp->data[data_length]);
@@ -238,8 +239,11 @@ static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
for (i = 0; i < data_length; ++i)
checksum += resp->data[i];

- if (checksum != checksum_resp)
+ if (checksum != checksum_resp) {
+ dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
+ checksum_resp);
return -EBADMSG;
+ }

return 0;
}
@@ -533,7 +537,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
}

*resp_len = resp_data_length + 7;
- rc = occ_verify_checksum(resp, resp_data_length);
+ rc = occ_verify_checksum(occ, resp, resp_data_length);

done:
mutex_unlock(&occ->occ_lock);
--
2.27.0

2021-02-09 17:17:25

by Eddie James

[permalink] [raw]
Subject: [PATCH 4/4] hwmon: (occ) Print response status in first poll error message

In order to better debug problems starting up the driver, print
the response status from the OCC in the error logged when the first
poll command fails.

Signed-off-by: Eddie James <[email protected]>
---
drivers/hwmon/occ/common.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index ee0c5d12dfdf..f71d62b57468 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -1161,8 +1161,9 @@ int occ_setup(struct occ *occ, const char *name)
dev_info(occ->bus_dev, "host is not ready\n");
return rc;
} else if (rc < 0) {
- dev_err(occ->bus_dev, "failed to get OCC poll response: %d\n",
- rc);
+ dev_err(occ->bus_dev,
+ "failed to get OCC poll response=%02x: %d\n",
+ occ->resp.return_status, rc);
return rc;
}

--
2.27.0

2021-02-09 17:18:31

by Eddie James

[permalink] [raw]
Subject: [PATCH 3/4] hwmon: (occ) Start sequence number at one

Initialize the sequence number at one, rather than zero, in order
to prevent false matches with the zero-initialized OCC SRAM
buffer before the OCC is fully initialized.

Signed-off-by: Eddie James <[email protected]>
---
drivers/hwmon/occ/common.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index 7a5e539b567b..ee0c5d12dfdf 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -1150,6 +1150,8 @@ int occ_setup(struct occ *occ, const char *name)
{
int rc;

+ /* start with 1 to avoid false match with zero-initialized SRAM buffer */
+ occ->seq_no = 1;
mutex_init(&occ->lock);
occ->groups[0] = &occ->group;

--
2.27.0

2021-02-09 20:23:25

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4/4] hwmon: (occ) Print response status in first poll error message

On Tue, Feb 09, 2021 at 11:12:35AM -0600, Eddie James wrote:
> In order to better debug problems starting up the driver, print
> the response status from the OCC in the error logged when the first
> poll command fails.
>
> Signed-off-by: Eddie James <[email protected]>

Acked-by: Guenter Roeck <[email protected]>

> ---
> drivers/hwmon/occ/common.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index ee0c5d12dfdf..f71d62b57468 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -1161,8 +1161,9 @@ int occ_setup(struct occ *occ, const char *name)
> dev_info(occ->bus_dev, "host is not ready\n");
> return rc;
> } else if (rc < 0) {
> - dev_err(occ->bus_dev, "failed to get OCC poll response: %d\n",
> - rc);
> + dev_err(occ->bus_dev,
> + "failed to get OCC poll response=%02x: %d\n",
> + occ->resp.return_status, rc);
> return rc;
> }
>
> --
> 2.27.0
>

2021-02-10 07:53:46

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 3/4] hwmon: (occ) Start sequence number at one

On Tue, Feb 09, 2021 at 11:12:34AM -0600, Eddie James wrote:
> Initialize the sequence number at one, rather than zero, in order
> to prevent false matches with the zero-initialized OCC SRAM
> buffer before the OCC is fully initialized.
>
> Signed-off-by: Eddie James <[email protected]>

Acked-by: Guenter Roeck <[email protected]>

For now I'll assume that the series has to be submitted together,
and that this won't happen through the hwmon branch.

Guenter

> ---
> drivers/hwmon/occ/common.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 7a5e539b567b..ee0c5d12dfdf 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -1150,6 +1150,8 @@ int occ_setup(struct occ *occ, const char *name)
> {
> int rc;
>
> + /* start with 1 to avoid false match with zero-initialized SRAM buffer */
> + occ->seq_no = 1;
> mutex_init(&occ->lock);
> occ->groups[0] = &occ->group;
>
> --
> 2.27.0
>

2021-04-05 23:54:33

by Eddie James

[permalink] [raw]
Subject: Re: [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC

On Tue, 2021-02-09 at 11:12 -0600, Eddie James wrote:
> In the event that the OCC is not initialized when the driver sends a
> poll
> command, the driver may receive an invalid response. This isn't an
> error
> condition unless there is no valid response before the timeout
> expires. So
> change the starting sequence number and check for the un-initialized
> OCC
> state before returning the response in order to detect this condition
> and
> continue waiting if necessary.

Hi Joel,

Do you have any comments on the FSI side of this series?

Thanks,
Eddie

>
> Eddie James (4):
> fsi: occ: Don't accept response from un-initialized OCC
> fsi: occ: Log error for checksum failure
> hwmon: (occ) Start sequence number at one
> hwmon: (occ) Print response status in first poll error message
>
> drivers/fsi/fsi-occ.c | 11 ++++++++---
> drivers/hwmon/occ/common.c | 7 +++++--
> 2 files changed, 13 insertions(+), 5 deletions(-)
>

2021-04-06 16:13:04

by Joel Stanley

[permalink] [raw]
Subject: Re: [PATCH 2/4] fsi: occ: Log error for checksum failure

On Tue, 9 Feb 2021 at 17:13, Eddie James <[email protected]> wrote:
>
> Log an error if the response checksum doesn't match the
> calculated checksum.

Reviewed-by: Joel Stanley <[email protected]>

>
> Signed-off-by: Eddie James <[email protected]>
> ---
> drivers/fsi/fsi-occ.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
> index cb05b6dacc9d..524460995465 100644
> --- a/drivers/fsi/fsi-occ.c
> +++ b/drivers/fsi/fsi-occ.c
> @@ -223,7 +223,8 @@ static const struct file_operations occ_fops = {
> .release = occ_release,
> };
>
> -static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
> +static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
> + u16 data_length)
> {
> /* Fetch the two bytes after the data for the checksum. */
> u16 checksum_resp = get_unaligned_be16(&resp->data[data_length]);
> @@ -238,8 +239,11 @@ static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
> for (i = 0; i < data_length; ++i)
> checksum += resp->data[i];
>
> - if (checksum != checksum_resp)
> + if (checksum != checksum_resp) {
> + dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
> + checksum_resp);

Just confirming that this is unexpected, we won't see this eg. if the
system is booting or when the BMC is reset while the host is running?

> return -EBADMSG;
> + }
>
> return 0;
> }
> @@ -533,7 +537,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
> }
>
> *resp_len = resp_data_length + 7;
> - rc = occ_verify_checksum(resp, resp_data_length);
> + rc = occ_verify_checksum(occ, resp, resp_data_length);
>
> done:
> mutex_unlock(&occ->occ_lock);
> --
> 2.27.0
>

2021-04-06 16:40:23

by Joel Stanley

[permalink] [raw]
Subject: Re: [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC

On Mon, 5 Apr 2021 at 15:34, Eddie James <[email protected]> wrote:
>
> On Tue, 2021-02-09 at 11:12 -0600, Eddie James wrote:
> > In the event that the OCC is not initialized when the driver sends a
> > poll
> > command, the driver may receive an invalid response. This isn't an
> > error
> > condition unless there is no valid response before the timeout
> > expires. So
> > change the starting sequence number and check for the un-initialized
> > OCC
> > state before returning the response in order to detect this condition
> > and
> > continue waiting if necessary.
>
> Hi Joel,
>
> Do you have any comments on the FSI side of this series?

They look fine to me. Guenter, if you want to take the series through
the hwmon tree I do not anticipate any conflicts.

Cheers,

Joel