2020-09-07 16:28:37

by Markus Boehme

Subject: [PATCH 2/3] ipmi: Add timeout waiting for device GUID

We have observed hosts with misbehaving BMCs that receive a Get Device
GUID command but don't respond. This leads to an indefinite wait in the
ipmi_msghandler's __get_guid function, showing up as hung task messages
for modprobe.

According to IPMI 2.0 specification chapter 20, the implementation of
the Get Device GUID command is optional. Therefore, add a timeout when
waiting for its response and treat a missing response the same as a
missing device GUID.

Signed-off-by: Stefan Nuernberger <[email protected]>
Signed-off-by: Markus Boehme <[email protected]>
---
drivers/char/ipmi/ipmi_msghandler.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 2b213c9..2a2e8b2 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -3184,18 +3184,26 @@ static void guid_handler(struct ipmi_smi *intf, struct ipmi_recv_msg *msg)

 static void __get_guid(struct ipmi_smi *intf)
 {
-	int rv;
+	long rv;
 	struct bmc_device *bmc = intf->bmc;
 
 	bmc->dyn_guid_set = 2;
 	intf->null_user_handler = guid_handler;
 	rv = send_guid_cmd(intf, 0);
-	if (rv)
+	if (rv) {
 		/* Send failed, no GUID available. */
 		bmc->dyn_guid_set = 0;
-	else
-		wait_event(intf->waitq, bmc->dyn_guid_set != 2);
+		goto out;
+	}
 
+	rv = wait_event_timeout(intf->waitq, bmc->dyn_guid_set != 2, 5 * HZ);
+	if (rv == 0) {
+		dev_warn_once(intf->si_dev,
+			      "Timed out waiting for GUID. Assuming GUID is not available.\n");
+		bmc->dyn_guid_set = 0;
+	}
+
+out:
 	/* dyn_guid_set makes the guid data available. */
 	smp_rmb();

--
2.7.4


2020-09-08 00:11:48

by Corey Minyard

Subject: Re: [PATCH 2/3] ipmi: Add timeout waiting for device GUID

On Mon, Sep 07, 2020 at 06:25:36PM +0200, Markus Boehme wrote:
> We have observed hosts with misbehaving BMCs that receive a Get Device
> GUID command but don't respond. This leads to an indefinite wait in the
> ipmi_msghandler's __get_guid function, showing up as hung task messages
> for modprobe.
>
> According to IPMI 2.0 specification chapter 20, the implementation of
> the Get Device GUID command is optional. Therefore, add a timeout when
> waiting for its response and treat a missing response the same as a
> missing device GUID.

This patch looks good. It's a little bit of a rewrite, but the reasons
are obvious.

-corey

>
> Signed-off-by: Stefan Nuernberger <[email protected]>
> Signed-off-by: Markus Boehme <[email protected]>
> ---
> drivers/char/ipmi/ipmi_msghandler.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 2b213c9..2a2e8b2 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -3184,18 +3184,26 @@ static void guid_handler(struct ipmi_smi *intf, struct ipmi_recv_msg *msg)
>
>  static void __get_guid(struct ipmi_smi *intf)
>  {
> -	int rv;
> +	long rv;
>  	struct bmc_device *bmc = intf->bmc;
>  
>  	bmc->dyn_guid_set = 2;
>  	intf->null_user_handler = guid_handler;
>  	rv = send_guid_cmd(intf, 0);
> -	if (rv)
> +	if (rv) {
>  		/* Send failed, no GUID available. */
>  		bmc->dyn_guid_set = 0;
> -	else
> -		wait_event(intf->waitq, bmc->dyn_guid_set != 2);
> +		goto out;
> +	}
>  
> +	rv = wait_event_timeout(intf->waitq, bmc->dyn_guid_set != 2, 5 * HZ);
> +	if (rv == 0) {
> +		dev_warn_once(intf->si_dev,
> +			      "Timed out waiting for GUID. Assuming GUID is not available.\n");
> +		bmc->dyn_guid_set = 0;
> +	}
> +
> +out:
>  	/* dyn_guid_set makes the guid data available. */
>  	smp_rmb();
>
> --
> 2.7.4
>