2010-07-27 16:01:44

by Matthew Garrett

Subject: [PATCH] ipmi: Run a dummy command before submitting a new command

Newer firmware revisions on HP's ILO3 (1.05 and later) generate state
machine errors with the current IPMI code. Running through the IPMI
timeout handler once before submitting the command avoids this.

Signed-off-by: Matthew Garrett <[email protected]>
Cc: Corey Minyard <[email protected]>
---
drivers/char/ipmi/ipmi_si_intf.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index e39a744..3f06199 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -317,6 +317,7 @@ static int unload_when_empty = 1;
 static int add_smi(struct smi_info *smi);
 static int try_smi_init(struct smi_info *smi);
 static void cleanup_one_si(struct smi_info *to_clean);
+static void smi_timeout(unsigned long data);
 
 static ATOMIC_NOTIFIER_HEAD(xaction_notifier_list);
 static int register_xaction_notifier(struct notifier_block *nb)
@@ -897,6 +898,7 @@ static void sender(void *send_info,
 #endif
 
 	mod_timer(&smi_info->si_timer, jiffies + SI_TIMEOUT_JIFFIES);
+	smi_timeout((unsigned long)smi_info);
 
 	if (smi_info->thread)
 		wake_up_process(smi_info->thread);
--
1.7.1.1


2010-07-27 17:07:23

by Corey Minyard

Subject: Re: [PATCH] ipmi: Run a dummy command before submitting a new command

I don't think this is the right way to handle the problem. Though it's
not going to break anything, this change is just a hack. We need to
figure out why these machines exhibit this behavior. If it's a bug in
the driver, then we need to fix the driver. If it's a bug in the HP
firmware, then we need to document it well as such, get HP to fix their
firmware, and possibly tie it into the xaction handler that's already in
start_next_msg.

The only interaction with the device that this change should cause is
one read from the status register, since the device should be idle at
this point. If that's the case, and it's not a driver bug, you can try
adding an xaction that calls smi_info->handlers->event(smi_info->si_sm, 0).
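
For illustration only, the xaction idea might be sketched roughly as
below. This is a userspace mock, not driver code: the structs are
cut-down stand-ins for the real ones, and mock_event and
ilo3_quirk_xaction are hypothetical names. In the driver the hook would
presumably be registered on the xaction notifier list and run from
start_next_msg.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the driver's types (assumptions, not the real layout). */
enum si_sm_result { SI_SM_IDLE, SI_SM_CALL_WITHOUT_DELAY };

struct si_sm_data {
	int status_reads;	/* counts mock status register reads */
};

struct si_sm_handlers {
	enum si_sm_result (*event)(struct si_sm_data *smi, long time);
};

struct smi_info {
	struct si_sm_data *si_sm;
	const struct si_sm_handlers *handlers;
};

/* Mock state machine step: with the device idle, one pass costs one status read. */
enum si_sm_result mock_event(struct si_sm_data *sm, long time)
{
	(void)time;
	sm->status_reads++;
	return SI_SM_IDLE;
}

/*
 * Hypothetical quirk hook: step the state machine once, with zero elapsed
 * time, before the next command is started.
 */
int ilo3_quirk_xaction(struct smi_info *smi_info)
{
	assert(smi_info != NULL && smi_info->handlers != NULL);
	assert(smi_info->handlers->event != NULL);

	smi_info->handlers->event(smi_info->si_sm, 0);
	return 0;
}
```

Calling ilo3_quirk_xaction once on an idle mock device bumps
status_reads by exactly one, which matches the "one read from the status
register" expectation above.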

There are debugging flags in the state machines that might help debug
this, too.

-corey

On 07/27/2010 11:01 AM, Matthew Garrett wrote:
> Newer firmware revisions on HP's ILO3 (1.05 and later) generate state
> machine errors with the current IPMI code. Running through the IPMI
> timeout handler once before submitting the command avoids this.
>
> Signed-off-by: Matthew Garrett<[email protected]>
> Cc: Corey Minyard<[email protected]>
> ---
> drivers/char/ipmi/ipmi_si_intf.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index e39a744..3f06199 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -317,6 +317,7 @@ static int unload_when_empty = 1;
> static int add_smi(struct smi_info *smi);
> static int try_smi_init(struct smi_info *smi);
> static void cleanup_one_si(struct smi_info *to_clean);
> +static void smi_timeout(unsigned long data);
>
> static ATOMIC_NOTIFIER_HEAD(xaction_notifier_list);
> static int register_xaction_notifier(struct notifier_block *nb)
> @@ -897,6 +898,7 @@ static void sender(void *send_info,
> #endif
>
> mod_timer(&smi_info->si_timer, jiffies + SI_TIMEOUT_JIFFIES);
> + smi_timeout((unsigned long)smi_info);
>
> if (smi_info->thread)
> wake_up_process(smi_info->thread);
>

2010-07-27 17:21:10

by Matthew Garrett

Subject: Re: [PATCH] ipmi: Run a dummy command before submitting a new command

On Tue, Jul 27, 2010 at 12:07:11PM -0500, Corey Minyard wrote:
> I don't think this is the right way to handle the problem. Though it's
> not going to break anything, this change is just a hack. We need to
> figure out why these machines exhibit this behavior. If it's a bug in
> the driver, then we need to fix the driver. If it's a bug in the HP
> firmware, then we need to document it well as such, get HP to fix their
> firmware, and possibly tie it into the xaction handler that's already in
> start_next_msg.

Yeah, I agree that this isn't the optimal approach. I'm waiting to hear
from HP if they have any idea what happened between 1.01 (which worked)
and 1.05 (which is broken), which might give some more insight into what
we're doing wrong.

> The only interaction with the device that this change should cause is
> one read from the status register, since the device should be idle at
> this point. If that's the case, and it's not a driver bug, you can try
> adding an xaction that calls smi_info->handlers->event(smi_info->si_sm,
> 0).

I'll try to see what's going on.

--
Matthew Garrett | [email protected]