2004-06-04 21:03:12

by Corey Minyard

Subject: Re: IPMI hangup in 2.6.6-rc3

Strange. I don't know what changed to cause this.

The best bet is to use printks to trace this back to see where the
message is being lost (handle_bmc_rsp, handle_new_recv_msg, and so forth).

-Corey

Philipp Matthias Hahn wrote:

>Hi Holger, Corey, LKML!
>
>On Tue, May 04, 2004 at 07:05:12AM +0200, Holger Kiehl wrote:
>
>
>>When compiling in IPMI (not as modules) my system hangs just after
>>it prints out detection of IPMI. 2.6.5 did work fine. Compiling
>>it as a module and inserting it with modprobe causes modprobe
>>to hang in D state; there is nothing unusual in /var/log/messages:
>>
>>May 4 08:46:34 apollo kernel: ipmi message handler version v31
>>May 4 08:46:34 apollo kernel: IPMI System Interface driver version v31, KCS version v31, SMIC version v31, BT version v31
>>May 4 08:46:34 apollo kernel: ipmi_si: Found SMBIOS-specified state machine at I/O address 0xca2
>>May 4 08:54:14 apollo kernel: ipmi device interface version v31
>>
>>
>
>Same for me on one of our single-Xeon machines with 2.6.7-rc1. Using SysRq-T I
>was able to track it down, more or less, to the following situation:
>
>modprobe D C201AD20 0 3735 2415 (NOTLB)
>f6b51f0c 00000082 00000000 c201ad20 c201a0a0 00008124 00000002 00000000
> f7c03000 f6b51ee4 f8bda434 c04b5dc0 c201ad20 00000000 00000000 8ce9c0c0
> 000f431a c03b8d80 f75fccd0 f75fce80 00000246 00000003 f6b50000 f7c03000
>Call Trace:
> [<f8bdb46d>] ipmi_register_smi+0x22a/0x386 [ipmi_msghandler]
> [<f8b570a6>] init_one_smi+0x1e6/0x4c2 [ipmi_si]
> [<f8b270c2>] init_ipmi_si+0xc2/0x203 [ipmi_si]
> [<c0137910>] sys_init_module+0x116/0x24d
> [<c0106053>] syscall_call+0x7/0xb
>
>modprobe hangs at linux-2.6.7-rc1/drivers/char/ipmi/ipmi_msghandler.c:1727
> wait_event((*intf)->waitq, ((*intf)->curr_channel>=IPMI_MAX_CHANNELS));
>
>This event should be fired by channel_handler(), but isn't, for some
>unknown reason. I verified this by adding some printk()s there, which
>weren't shown.
>
>When I tried a 2.4 kernel with the patches from openipmi.sf.net, I
>was able to use IPMI to some extent, but ran into problems later.
>
>Any idea what I can do to track the problem down further?
>
>BYtE
>Philipp
>
>
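The hang Philipp traced comes down to a predicate wait: wait_event() sleeps until curr_channel >= IPMI_MAX_CHANNELS, and only the channel handler, which runs once per channel-information response, advances curr_channel and calls wake_up(). The userspace sketch below models that logic under stated assumptions: the struct, the sim_ function names and the channel count of 8 are hypothetical stand-ins rather than the driver's code, and the bounded loop replaces wait_event(), which has no timeout. It shows that the waiter is released only after a response has been handled for every channel; if responses stop arriving, the predicate never becomes true.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed channel count for this sketch only. */
#define MAX_CHANNELS 8

/* Hypothetical stand-in for the interface state used by the channel scan. */
struct sim_intf {
	int curr_channel;   /* next channel to query */
	int wakeups;        /* number of wake_up() calls */
};

/* The predicate the registering task waits on, mirroring
   curr_channel >= IPMI_MAX_CHANNELS. */
static bool scan_done(const struct sim_intf *intf)
{
	return intf->curr_channel >= MAX_CHANNELS;
}

/* Called once per response: move on to the next channel and wake the
   waiter once the last channel has been handled. */
static void sim_channel_handler(struct sim_intf *intf)
{
	intf->curr_channel++;
	if (scan_done(intf))
		intf->wakeups++;
}

/* Bounded stand-in for wait_event(): deliver up to `responses`
   responses, then report whether the predicate became true.  The real
   wait has no bound, so missing responses leave the task in D state. */
static bool sim_wait_for_scan(struct sim_intf *intf, int responses)
{
	for (int i = 0; i < responses && !scan_done(intf); i++)
		sim_channel_handler(intf);
	return scan_done(intf);
}
```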



2004-06-10 17:12:20

by Alex Williamson

Subject: Re: IPMI hangup in 2.6.6-rc3


I was seeing a hang on an hp rx8620 ia64 system as well. I'd get up
to here:

ipmi message handler version v31
ipmi device interface version v31
IPMI System Interface driver version v31, KCS version v31, SMIC version v31, BT version v31
ipmi_si: ACPI/SPMI specifies "bt" memory SI @ 0xffc30040000
ipmi_si: ipmi_si unable to claim interrupt 17, running polled

Then it hangs forever. The hang appears to happen because the polling timer
is set up after the call to ipmi_register_smi(). If I add the following
chunk of code just before the call to ipmi_register_smi(), I get past
the hang:

new_smi->timer_stopped = 0;
init_timer(&(new_smi->si_timer));
new_smi->si_timer.data = (long) new_smi;
new_smi->si_timer.function = smi_timeout;
new_smi->last_timeout_jiffies = jiffies;
new_smi->si_timer.expires = jiffies + SI_TIMEOUT_JIFFIES;
add_timer(&(new_smi->si_timer));

(I commented out the corresponding lines towards the end of
init_one_smi().) I'm not sure the IPMI interface actually works (it does
with the v30 patch on a 2.4 kernel), but at least the driver doesn't
hang the box. Anyway, it looks like the polling handler needs to be
set up earlier, or we'll wait forever. Thanks,

Alex
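Alex's diagnosis is an ordering problem: with no interrupt ("running polled"), the driver relies on the poll timer to keep its state machine moving, but the timer was armed only after ipmi_register_smi(), which itself waits for responses. A minimal userspace model, assuming a hypothetical sim_smi struct and treating each loop iteration as one timer tick, shows that arming the timer first lets registration finish while the original order never makes progress. The tick bound exists only so the sketch terminates; the real wait has none.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHANNELS 8   /* assumed channel count for the sketch */

/* Hypothetical model of a polled (no-IRQ) system interface. */
struct sim_smi {
	bool timer_armed;   /* stands in for add_timer(&si_timer) */
	int curr_channel;   /* channel-scan progress */
};

/* One poll-timer tick: with no interrupt, a response is only
   processed on this path, and only if the timer is running. */
static void sim_tick(struct sim_smi *smi)
{
	if (smi->timer_armed && smi->curr_channel < MAX_CHANNELS)
		smi->curr_channel++;
}

/* Stand-in for ipmi_register_smi(): blocks until the channel scan
   completes; max_ticks bounds what would otherwise be an endless wait. */
static int sim_register(struct sim_smi *smi, int max_ticks)
{
	for (int t = 0; t < max_ticks; t++) {
		if (smi->curr_channel >= MAX_CHANNELS)
			return 0;
		sim_tick(smi);
	}
	return smi->curr_channel >= MAX_CHANNELS ? 0 : -1;
}

/* Stand-in for init_one_smi(); arm_first selects the ordering. */
static int sim_init_one(struct sim_smi *smi, bool arm_first, int max_ticks)
{
	int rv;

	smi->timer_armed = false;
	smi->curr_channel = 0;
	if (arm_first)
		smi->timer_armed = true;   /* order used by the workaround */
	rv = sim_register(smi, max_ticks);
	if (!arm_first)
		smi->timer_armed = true;   /* original order: too late */
	return rv;
}
```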

On Fri, 2004-06-04 at 14:59, Corey Minyard wrote:
> Strange. I don't know what changed to cause this.
>
> The best bet is to use printks to trace this back to see where the
> message is being lost (handle_bmc_rsp, handle_new_recv_msg, and so forth).
>
> -Corey
>
> Philipp Matthias Hahn wrote:
>
> >Hi Holger, Corey, LKML!
> >
> >On Tue, May 04, 2004 at 07:05:12AM +0200, Holger Kiehl wrote:
> >
> >
> >>When compiling in IPMI (not as modules) my system hangs just after
> >>it prints out detection of IPMI. 2.6.5 did work fine. Compiling
> >>it as a module and inserting it with modprobe causes modprobe
> >>to hang in D state; there is nothing unusual in /var/log/messages:
> >>
> >>May 4 08:46:34 apollo kernel: ipmi message handler version v31
> >>May 4 08:46:34 apollo kernel: IPMI System Interface driver version v31, KCS version v31, SMIC version v31, BT version v31
> >>May 4 08:46:34 apollo kernel: ipmi_si: Found SMBIOS-specified state machine at I/O address 0xca2
> >>May 4 08:54:14 apollo kernel: ipmi device interface version v31
> >>
> >>
> >
> >Same for me on one of our single-Xeon machines with 2.6.7-rc1. Using SysRq-T I
> >was able to track it down, more or less, to the following situation:
> >
> >modprobe D C201AD20 0 3735 2415 (NOTLB)
> >f6b51f0c 00000082 00000000 c201ad20 c201a0a0 00008124 00000002 00000000
> > f7c03000 f6b51ee4 f8bda434 c04b5dc0 c201ad20 00000000 00000000 8ce9c0c0
> > 000f431a c03b8d80 f75fccd0 f75fce80 00000246 00000003 f6b50000 f7c03000
> >Call Trace:
> > [<f8bdb46d>] ipmi_register_smi+0x22a/0x386 [ipmi_msghandler]
> > [<f8b570a6>] init_one_smi+0x1e6/0x4c2 [ipmi_si]
> > [<f8b270c2>] init_ipmi_si+0xc2/0x203 [ipmi_si]
> > [<c0137910>] sys_init_module+0x116/0x24d
> > [<c0106053>] syscall_call+0x7/0xb
> >
> >modprobe hangs at linux-2.6.7-rc1/drivers/char/ipmi/ipmi_msghandler.c:1727
> > wait_event((*intf)->waitq, ((*intf)->curr_channel>=IPMI_MAX_CHANNELS));
> >
> >This event should be fired by channel_handler(), but isn't, for some
> >unknown reason. I verified this by adding some printk()s there, which
> >weren't shown.
> >
> >When I tried a 2.4 kernel with the patches from openipmi.sf.net, I
> >was able to use IPMI to some extent, but ran into problems later.
> >
> >Any idea what I can do to track the problem down further?
> >
> >BYtE
> >Philipp
> >
> >
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2004-06-16 12:32:21

by Holger Kiehl

Subject: Re: IPMI hangup in 2.6.6-rc3

Hello

This patch fixes the hang for me too, using 2.6.7. However, I cannot verify
whether it runs properly, since ipmitool crashes; I think that is because the
kernel interface has changed. With the ipmi driver from 2.6.5, ipmitool
works fine.

Holger

On Thu, 10 Jun 2004, Alex Williamson wrote:

>
> I was seeing a hang on an hp rx8620 ia64 system as well. I'd get up
> to here:
>
> ipmi message handler version v31
> ipmi device interface version v31
> IPMI System Interface driver version v31, KCS version v31, SMIC version v31, BT version v31
> ipmi_si: ACPI/SPMI specifies "bt" memory SI @ 0xffc30040000
> ipmi_si: ipmi_si unable to claim interrupt 17, running polled
>
> Then it hangs forever. The hang appears to happen because the polling timer
> is set up after the call to ipmi_register_smi(). If I add the following
> chunk of code just before the call to ipmi_register_smi(), I get past
> the hang:
>
> new_smi->timer_stopped = 0;
> init_timer(&(new_smi->si_timer));
> new_smi->si_timer.data = (long) new_smi;
> new_smi->si_timer.function = smi_timeout;
> new_smi->last_timeout_jiffies = jiffies;
> new_smi->si_timer.expires = jiffies + SI_TIMEOUT_JIFFIES;
> add_timer(&(new_smi->si_timer));
>
> (I commented out the corresponding lines towards the end of
> init_one_smi().) I'm not sure the IPMI interface actually works (it does
> with the v30 patch on a 2.4 kernel), but at least the driver doesn't
> hang the box. Anyway, it looks like the polling handler needs to be
> set up earlier, or we'll wait forever. Thanks,
>
> Alex
>
> On Fri, 2004-06-04 at 14:59, Corey Minyard wrote:
> > Strange. I don't know what changed to cause this.
> >
> > The best bet is to use printks to trace this back to see where the
> > message is being lost (handle_bmc_rsp, handle_new_recv_msg, and so forth).
> >
> > -Corey
> >
> > Philipp Matthias Hahn wrote:
> >
> > >Hi Holger, Corey, LKML!
> > >
> > >On Tue, May 04, 2004 at 07:05:12AM +0200, Holger Kiehl wrote:
> > >
> > >
> > >>When compiling in IPMI (not as modules) my system hangs just after
> > >>it prints out detection of IPMI. 2.6.5 did work fine. Compiling
> > >>it as a module and inserting it with modprobe causes modprobe
> > >>to hang in D state; there is nothing unusual in /var/log/messages:
> > >>
> > >>May 4 08:46:34 apollo kernel: ipmi message handler version v31
> > >>May 4 08:46:34 apollo kernel: IPMI System Interface driver version v31, KCS version v31, SMIC version v31, BT version v31
> > >>May 4 08:46:34 apollo kernel: ipmi_si: Found SMBIOS-specified state machine at I/O address 0xca2
> > >>May 4 08:54:14 apollo kernel: ipmi device interface version v31
> > >>
> > >>
> > >
> > >Same for me on one of our single-Xeon machines with 2.6.7-rc1. Using SysRq-T I
> > >was able to track it down, more or less, to the following situation:
> > >
> > >modprobe D C201AD20 0 3735 2415 (NOTLB)
> > >f6b51f0c 00000082 00000000 c201ad20 c201a0a0 00008124 00000002 00000000
> > > f7c03000 f6b51ee4 f8bda434 c04b5dc0 c201ad20 00000000 00000000 8ce9c0c0
> > > 000f431a c03b8d80 f75fccd0 f75fce80 00000246 00000003 f6b50000 f7c03000
> > >Call Trace:
> > > [<f8bdb46d>] ipmi_register_smi+0x22a/0x386 [ipmi_msghandler]
> > > [<f8b570a6>] init_one_smi+0x1e6/0x4c2 [ipmi_si]
> > > [<f8b270c2>] init_ipmi_si+0xc2/0x203 [ipmi_si]
> > > [<c0137910>] sys_init_module+0x116/0x24d
> > > [<c0106053>] syscall_call+0x7/0xb
> > >
> > >modprobe hangs at linux-2.6.7-rc1/drivers/char/ipmi/ipmi_msghandler.c:1727
> > > wait_event((*intf)->waitq, ((*intf)->curr_channel>=IPMI_MAX_CHANNELS));
> > >
> > >This event should be fired by channel_handler(), but isn't, for some
> > >unknown reason. I verified this by adding some printk()s there, which
> > >weren't shown.
> > >
> > >When I tried a 2.4 kernel with the patches from openipmi.sf.net, I
> > >was able to use IPMI to some extent, but ran into problems later.
> > >
> > >Any idea what I can do to track the problem down further?
> > >
> > >BYtE
> > >Philipp
> > >
> > >
> >
> >
> >
>

2004-06-16 14:19:25

by Corey Minyard

Subject: Re: IPMI hangup in 2.6.6-rc3

diff -ur linux.orig/drivers/char/ipmi/ipmi_msghandler.c linux/drivers/char/ipmi/ipmi_msghandler.c
--- linux.orig/drivers/char/ipmi/ipmi_msghandler.c 2004-06-14 22:32:03.000000000 -0500
+++ linux/drivers/char/ipmi/ipmi_msghandler.c 2004-06-16 09:13:07.000000000 -0500
@@ -1648,6 +1648,22 @@
/* It's the one we want */
if (msg->rsp[2] != 0) {
/* Got an error from the channel, just go on. */
+
+ if (msg->rsp[2] == IPMI_INVALID_COMMAND_ERR) {
+ /* If the MC does not support this
+ command, that is legal. We just
+ assume it has one IPMB at channel
+ zero. */
+ intf->channels[0].medium
+ = IPMI_CHANNEL_MEDIUM_IPMB;
+ intf->channels[0].protocol
+ = IPMI_CHANNEL_PROTOCOL_IPMB;
+ rv = -ENOSYS;
+
+ intf->curr_channel = IPMI_MAX_CHANNELS;
+ wake_up(&intf->waitq);
+ goto out;
+ }
goto next_channel;
}
if (msg->rsp_size < 6) {
@@ -1671,10 +1687,12 @@
wake_up(&intf->waitq);

printk(KERN_WARNING "ipmi_msghandler: Error sending"
- "channel information: 0x%x\n",
+ "channel information: %d\n",
rv);
}
}
+ out:
+ return;
}

int ipmi_register_smi(struct ipmi_smi_handlers *handlers,
diff -ur linux.orig/drivers/char/ipmi/ipmi_si_intf.c linux/drivers/char/ipmi/ipmi_si_intf.c
--- linux.orig/drivers/char/ipmi/ipmi_si_intf.c 2004-06-14 22:32:03.000000000 -0500
+++ linux/drivers/char/ipmi/ipmi_si_intf.c 2004-06-16 08:51:29.000000000 -0500
@@ -1848,6 +1848,21 @@
atomic_set(&new_smi->req_events, 0);
new_smi->run_to_completion = 0;

+ new_smi->interrupt_disabled = 0;
+ new_smi->timer_stopped = 0;
+ new_smi->stop_operation = 0;
+
+ /* The ipmi_register_smi() code does some operations to
+ determine the channel information, so we must be ready to
+ handle operations before it is called. This means we have
+ to stop the timer if we get an error after this point. */
+ init_timer(&(new_smi->si_timer));
+ new_smi->si_timer.data = (long) new_smi;
+ new_smi->si_timer.function = smi_timeout;
+ new_smi->last_timeout_jiffies = jiffies;
+ new_smi->si_timer.expires = jiffies + SI_TIMEOUT_JIFFIES;
+ add_timer(&(new_smi->si_timer));
+
rv = ipmi_register_smi(&handlers,
new_smi,
new_smi->ipmi_version_major,
@@ -1857,7 +1872,7 @@
printk(KERN_ERR
"ipmi_si: Unable to register device: error %d\n",
rv);
- goto out_err;
+ goto out_err_stop_timer;
}

rv = ipmi_smi_add_proc_entry(new_smi->intf, "type",
@@ -1867,7 +1882,7 @@
printk(KERN_ERR
"ipmi_si: Unable to create proc entry: %d\n",
rv);
- goto out_err;
+ goto out_err_stop_timer;
}

rv = ipmi_smi_add_proc_entry(new_smi->intf, "si_stats",
@@ -1877,7 +1892,7 @@
printk(KERN_ERR
"ipmi_si: Unable to create proc entry: %d\n",
rv);
- goto out_err;
+ goto out_err_stop_timer;
}

start_clear_flags(new_smi);
@@ -1886,34 +1901,40 @@
if (new_smi->irq)
new_smi->si_state = SI_CLEARING_FLAGS_THEN_SET_IRQ;

- new_smi->interrupt_disabled = 0;
- new_smi->timer_stopped = 0;
- new_smi->stop_operation = 0;
-
- init_timer(&(new_smi->si_timer));
- new_smi->si_timer.data = (long) new_smi;
- new_smi->si_timer.function = smi_timeout;
- new_smi->last_timeout_jiffies = jiffies;
- new_smi->si_timer.expires = jiffies + SI_TIMEOUT_JIFFIES;
- add_timer(&(new_smi->si_timer));
-
*smi = new_smi;

printk(" IPMI %s interface initialized\n", si_type[intf_num]);

return 0;

+ out_err_stop_timer:
+ new_smi->stop_operation = 1;
+
+ /* Wait for the timer to stop. This avoids problems with race
+ conditions removing the timer here. */
+ while (!new_smi->timer_stopped) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(1);
+ }
+
out_err:
if (new_smi->intf)
ipmi_unregister_smi(new_smi->intf);

new_smi->irq_cleanup(new_smi);
+
+ /* Wait until we know that we are out of any interrupt
+ handlers might have been running before we freed the
+ interrupt. */
+ synchronize_kernel();
+
if (new_smi->si_sm) {
if (new_smi->handlers)
new_smi->handlers->cleanup(new_smi->si_sm);
kfree(new_smi->si_sm);
}
new_smi->io_cleanup(new_smi);
+
return rv;
}


Attachments:
ipmi-fixhang.diff (4.07 kB)
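Moving the timer earlier means every error path after that point must shut the timer down, which is what out_err_stop_timer does: it sets stop_operation and then sleeps until timer_stopped is set. The patch comment only says this avoids races when removing the timer; the sketch below assumes the timer callback checks stop_operation, acknowledges by setting timer_stopped, and otherwise re-arms itself. The sim_ names are hypothetical, and a loop that fires the pending callback stands in for schedule_timeout(1).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fields the error path relies on. */
struct sim_smi {
	int stop_operation;   /* set by the cleanup path */
	int timer_stopped;    /* set by the timer when it quits */
	bool timer_pending;   /* models a queued si_timer */
	int ticks_run;        /* polling work done so far */
};

/* Models the timer callback: if asked to stop, acknowledge and do not
   re-arm; otherwise do one unit of polling work and re-arm. */
static void sim_timeout(struct sim_smi *smi)
{
	smi->timer_pending = false;
	if (smi->stop_operation) {
		smi->timer_stopped = 1;
		return;
	}
	smi->ticks_run++;
	smi->timer_pending = true;   /* stands in for add_timer() */
}

/* Models out_err_stop_timer: request a stop, then keep "sleeping"
   (here: letting the pending timer fire) until it acknowledges. */
static int sim_stop_timer(struct sim_smi *smi, int max_waits)
{
	smi->stop_operation = 1;
	for (int i = 0; i < max_waits && !smi->timer_stopped; i++) {
		if (smi->timer_pending)
			sim_timeout(smi);   /* a tick happens while we sleep */
	}
	return smi->timer_stopped ? 0 : -1;
}
```

Waiting for the acknowledgement, rather than deleting the timer from outside, means the cleanup code never races a callback that is about to re-arm itself.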

2004-06-16 14:22:48

by Corey Minyard

Subject: Re: IPMI hangup in 2.6.6-rc3

diff -ur linux.orig/drivers/char/ipmi/ipmi_msghandler.c linux/drivers/char/ipmi/ipmi_msghandler.c
--- linux.orig/drivers/char/ipmi/ipmi_msghandler.c 2004-06-14 22:32:03.000000000 -0500
+++ linux/drivers/char/ipmi/ipmi_msghandler.c 2004-06-16 09:13:07.000000000 -0500
@@ -1648,6 +1648,22 @@
/* It's the one we want */
if (msg->rsp[2] != 0) {
/* Got an error from the channel, just go on. */
+
+ if (msg->rsp[2] == IPMI_INVALID_COMMAND_ERR) {
+ /* If the MC does not support this
+ command, that is legal. We just
+ assume it has one IPMB at channel
+ zero. */
+ intf->channels[0].medium
+ = IPMI_CHANNEL_MEDIUM_IPMB;
+ intf->channels[0].protocol
+ = IPMI_CHANNEL_PROTOCOL_IPMB;
+ rv = -ENOSYS;
+
+ intf->curr_channel = IPMI_MAX_CHANNELS;
+ wake_up(&intf->waitq);
+ goto out;
+ }
goto next_channel;
}
if (msg->rsp_size < 6) {
@@ -1671,10 +1687,12 @@
wake_up(&intf->waitq);

printk(KERN_WARNING "ipmi_msghandler: Error sending"
- "channel information: 0x%x\n",
+ "channel information: %d\n",
rv);
}
}
+ out:
+ return;
}

int ipmi_register_smi(struct ipmi_smi_handlers *handlers,
diff -ur linux.orig/drivers/char/ipmi/ipmi_si_intf.c linux/drivers/char/ipmi/ipmi_si_intf.c
--- linux.orig/drivers/char/ipmi/ipmi_si_intf.c 2004-06-14 22:32:03.000000000 -0500
+++ linux/drivers/char/ipmi/ipmi_si_intf.c 2004-06-16 08:51:29.000000000 -0500
@@ -1848,6 +1848,21 @@
atomic_set(&new_smi->req_events, 0);
new_smi->run_to_completion = 0;

+ new_smi->interrupt_disabled = 0;
+ new_smi->timer_stopped = 0;
+ new_smi->stop_operation = 0;
+
+ /* The ipmi_register_smi() code does some operations to
+ determine the channel information, so we must be ready to
+ handle operations before it is called. This means we have
+ to stop the timer if we get an error after this point. */
+ init_timer(&(new_smi->si_timer));
+ new_smi->si_timer.data = (long) new_smi;
+ new_smi->si_timer.function = smi_timeout;
+ new_smi->last_timeout_jiffies = jiffies;
+ new_smi->si_timer.expires = jiffies + SI_TIMEOUT_JIFFIES;
+ add_timer(&(new_smi->si_timer));
+
rv = ipmi_register_smi(&handlers,
new_smi,
new_smi->ipmi_version_major,
@@ -1857,7 +1872,7 @@
printk(KERN_ERR
"ipmi_si: Unable to register device: error %d\n",
rv);
- goto out_err;
+ goto out_err_stop_timer;
}

rv = ipmi_smi_add_proc_entry(new_smi->intf, "type",
@@ -1867,7 +1882,7 @@
printk(KERN_ERR
"ipmi_si: Unable to create proc entry: %d\n",
rv);
- goto out_err;
+ goto out_err_stop_timer;
}

rv = ipmi_smi_add_proc_entry(new_smi->intf, "si_stats",
@@ -1877,7 +1892,7 @@
printk(KERN_ERR
"ipmi_si: Unable to create proc entry: %d\n",
rv);
- goto out_err;
+ goto out_err_stop_timer;
}

start_clear_flags(new_smi);
@@ -1886,34 +1901,40 @@
if (new_smi->irq)
new_smi->si_state = SI_CLEARING_FLAGS_THEN_SET_IRQ;

- new_smi->interrupt_disabled = 0;
- new_smi->timer_stopped = 0;
- new_smi->stop_operation = 0;
-
- init_timer(&(new_smi->si_timer));
- new_smi->si_timer.data = (long) new_smi;
- new_smi->si_timer.function = smi_timeout;
- new_smi->last_timeout_jiffies = jiffies;
- new_smi->si_timer.expires = jiffies + SI_TIMEOUT_JIFFIES;
- add_timer(&(new_smi->si_timer));
-
*smi = new_smi;

printk(" IPMI %s interface initialized\n", si_type[intf_num]);

return 0;

+ out_err_stop_timer:
+ new_smi->stop_operation = 1;
+
+ /* Wait for the timer to stop. This avoids problems with race
+ conditions removing the timer here. */
+ while (!new_smi->timer_stopped) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(1);
+ }
+
out_err:
if (new_smi->intf)
ipmi_unregister_smi(new_smi->intf);

new_smi->irq_cleanup(new_smi);
+
+ /* Wait until we know that we are out of any interrupt
+ handlers might have been running before we freed the
+ interrupt. */
+ synchronize_kernel();
+
if (new_smi->si_sm) {
if (new_smi->handlers)
new_smi->handlers->cleanup(new_smi->si_sm);
kfree(new_smi->si_sm);
}
new_smi->io_cleanup(new_smi);
+
return rv;
}

--- linux.orig/include/linux/ipmi_msgdefs.h 2004-05-21 11:49:05.000000000 -0500
+++ linux/include/linux/ipmi_msgdefs.h 2004-06-16 09:04:26.000000000 -0500
@@ -71,6 +71,7 @@

#define IPMI_CC_NO_ERROR 0x00
#define IPMI_NODE_BUSY_ERR 0xc0
+#define IPMI_INVALID_COMMAND_ERR 0xc1
#define IPMI_ERR_MSG_TRUNCATED 0xc6
#define IPMI_LOST_ARBITRATION_ERR 0x81
#define IPMI_ERR_UNSPECIFIED 0xff


Attachments:
ipmi-fixhang.diff (4.46 kB)
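The msghandler half of the patch changes how the channel scan treats completion code 0xc1 (IPMI_INVALID_COMMAND_ERR, added to ipmi_msgdefs.h above): instead of skipping to the next channel, it assumes a single IPMB at channel 0, sets curr_channel to IPMI_MAX_CHANNELS and wakes the waiter. A hypothetical userspace model of that decision, assuming 8 channels and reducing each channel's description to a flag, might look like this; it returns 1 at the point where the real code would call wake_up().

```c
#include <assert.h>

#define MAX_CHANNELS 8            /* assumed channel count for the sketch */
#define CC_INVALID_COMMAND 0xc1   /* IPMI_INVALID_COMMAND_ERR in the patch */

/* Hypothetical, simplified view of the interface's channel table. */
struct sim_intf {
	int curr_channel;               /* channel currently being queried */
	int channel_ok[MAX_CHANNELS];   /* 1 if the MC described this channel */
	int assumed_ipmb0;              /* 1 after the "one IPMB at channel 0" fallback */
};

/* Handle the completion code of one channel-information response.
   Returns 1 when the scan is finished, else 0. */
static int sim_handle_channel_rsp(struct sim_intf *intf, unsigned char cc)
{
	if (intf->curr_channel >= MAX_CHANNELS)
		return 1;                   /* scan already over */
	if (cc == CC_INVALID_COMMAND) {
		/* The MC does not support the command, which is legal:
		   assume one IPMB at channel 0 and end the scan. */
		intf->assumed_ipmb0 = 1;
		intf->curr_channel = MAX_CHANNELS;
		return 1;
	}
	if (cc == 0)
		intf->channel_ok[intf->curr_channel] = 1;
	/* Any other error: just go on to the next channel. */
	intf->curr_channel++;
	return intf->curr_channel >= MAX_CHANNELS;
}
```

Without the 0xc1 branch, an MC that rejects the command is simply stepped past channel by channel, which still depends on a response arriving for every channel.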

2004-06-16 14:44:31

by Alex Williamson

Subject: Re: IPMI hangup in 2.6.6-rc3


That appears to do the trick on my rx8620:

ipmi message handler version v31
ipmi device interface version v31
IPMI System Interface driver version v31, KCS version v31, SMIC version v31, BT version v31
ipmi_si: ACPI/SPMI specifies "bt" memory SI @ 0xffc30040000
ipmi_si: ipmi_si unable to claim interrupt 17, running polled
IPMI bt interface initialized

I still can't confirm whether or not the interface works, but this is
definitely better than before. Thanks,

Alex


On Wed, 2004-06-16 at 08:21, Corey Minyard wrote:
> I missed a part of the patch; here is a new one with the include file
> changes.
>
> -Corey
>
> Corey Minyard wrote:
>
> > Unfortunately, that fix has some problems, but it was on the right
> > track. I have a new patch attached; can you try it out? Also, the
> > kernel interface has not changed. It should be exactly the same as
> > before.
> >
> > -Corey
>
>
> ______________________________________________________________________


2004-06-16 18:31:06

by Holger Kiehl

Subject: Re: IPMI hangup in 2.6.6-rc3

On Wed, 16 Jun 2004, Corey Minyard wrote:

> I missed a part of the patch; here is a new one with the include file changes.
>
> -Corey
>
> Corey Minyard wrote:
>
> > Unfortunately, that fix has some problems, but it was on the right track. I
> > have a new patch attached; can you try it out? Also, the kernel interface
> > has not changed. It should be exactly the same as before.
> >
> > -Corey
>
Yes, with this patch it no longer hangs. But ipmitool still crashes:

root@apollo:~# ipmitool -I open sdr list
Segmentation fault

I will try to contact the author of ipmitool.

Holger

2004-06-16 18:56:32

by Holger Kiehl

Subject: Re: IPMI hangup in 2.6.6-rc3

On Wed, 16 Jun 2004, Holger Kiehl wrote:

> On Wed, 16 Jun 2004, Corey Minyard wrote:
>
> > I missed a part of the patch; here is a new one with the include file changes.
> >
> > -Corey
> >
> > Corey Minyard wrote:
> >
> > > Unfortunately, that fix has some problems, but it was on the right track. I
> > > have a new patch attached; can you try it out? Also, the kernel interface
> > > has not changed. It should be exactly the same as before.
> > >
> > > -Corey
> >
> Yes, with this patch it no longer hangs. But ipmitool still crashes:
>
> root@apollo:~# ipmitool -I open sdr list
> Segmentation fault
>
Just as I sent the mail, I noticed that each time I call ipmitool I get
an oops:

Jun 16 18:43:40 apollo kernel: <1>Unable to handle kernel paging request at virtual address 00100104
Jun 16 18:43:40 apollo kernel: printing eip:
Jun 16 18:43:40 apollo kernel: c013ba4a
Jun 16 18:43:40 apollo kernel: *pde = 00000000
Jun 16 18:43:40 apollo kernel: Oops: 0000 [#54]
Jun 16 18:43:40 apollo kernel: SMP
Jun 16 18:43:40 apollo kernel: Modules linked in: bonding
Jun 16 18:43:40 apollo kernel: CPU: 1
Jun 16 18:43:40 apollo kernel: EIP: 0060:[<c013ba4a>] Not tainted
Jun 16 18:43:40 apollo kernel: EFLAGS: 00010086 (2.6.7)
Jun 16 18:43:40 apollo kernel: EIP is at kfree+0x37/0x66
Jun 16 18:43:40 apollo kernel: eax: 00000001 ebx: ffffffea ecx: 00100100 edx: c1000000
Jun 16 18:43:40 apollo kernel: esi: 000015ec edi: 00000202 ebp: ffffffff esp: f6dcce84
Jun 16 18:43:40 apollo kernel: ds: 007b es: 007b ss: 0068
Jun 16 18:43:40 apollo kernel: Process ipmitool (pid: 19786, threadinfo=f6dcc000 task=c2269360)
Jun 16 18:43:40 apollo kernel: Stack: ffffffea f6dccf6c f6dcceb0 c01dedca f6dccf78 00000000 00000000 ffffffff
Jun 16 18:43:40 apollo kernel: 00000000 f7b8f7c8 000015ec 00000500 c000000f f61ebd80 00000282 f7c8be2c
Jun 16 18:43:40 apollo kernel: c02e9600 f6e8b008 f6e8b008 00000000 00000004 f6dccf6c bffff2c0 f6dccf6c
Jun 16 18:43:40 apollo kernel: Call Trace:
Jun 16 18:43:40 apollo kernel: [<c01dedca>] handle_send_req+0xd3/0xe7
Jun 16 18:43:40 apollo kernel: [<c01df205>] ipmi_ioctl+0x427/0x474
Jun 16 18:43:40 apollo kernel: [<c015ece7>] sys_select+0x21c/0x482
Jun 16 18:43:40 apollo kernel: [<c015e036>] sys_ioctl+0xef/0x223
Jun 16 18:43:40 apollo kernel: [<c0104cf1>] sysenter_past_esp+0x52/0x71
Jun 16 18:43:40 apollo kernel:
Jun 16 18:43:40 apollo kernel: Code: 8b 1c 81 8b 03 3b 43 04 73 18 89 74 83 10 83 03 01 57 9d 8b


>>EIP; c013ba4a No symbols available <=====

Trace; c01dedca No symbols available
Trace; c01df205 No symbols available
Trace; c015ece7 No symbols available
Trace; c015e036 No symbols available
Trace; c0104cf1 No symbols available

Code; c013ba4a No symbols available
00000000 <_EIP>:
Code; c013ba4a No symbols available <=====
0: 8b 1c 81 mov (%ecx,%eax,4),%ebx <=====
Code; c013ba4d No symbols available
3: 8b 03 mov (%ebx),%eax
Code; c013ba4f No symbols available
5: 3b 43 04 cmp 0x4(%ebx),%eax
Code; c013ba52 No symbols available
8: 73 18 jae 22 <_EIP+0x22>
Code; c013ba54 No symbols available
a: 89 74 83 10 mov %esi,0x10(%ebx,%eax,4)
Code; c013ba58 No symbols available
e: 83 03 01 addl $0x1,(%ebx)
Code; c013ba5b No symbols available
11: 57 push %edi
Code; c013ba5c No symbols available
12: 9d popf
Code; c013ba5d No symbols available
13: 8b 00 mov (%eax),%eax

This must be the reason why ipmitool crashes.

Holger
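The faulting address can be checked against the decoded instruction. EIP points at `8b 1c 81`, i.e. `mov (%ecx,%eax,4),%ebx`: ModRM byte 0x1c selects ebx as the destination with a SIB byte following, and SIB byte 0x81 encodes scale 4, index eax, base ecx. With ecx = 00100100 and eax = 00000001 from the register dump, the load reads from 0x00100100 + 1 * 4 = 0x00100104, exactly the address in "Unable to handle kernel paging request". 0x00100100 is also the value of LIST_POISON1 in 2.6 kernels, which hints that kfree() was working from a bogus object pointer; that is an interpretation, not something the trace proves. The check as a small C sketch (decode_sib, effective_address and reg32 are hypothetical helper names):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit register numbering used in ModRM/SIB encodings. */
static const char *const reg32[8] = {
	"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

struct sib {
	unsigned scale;   /* 1, 2, 4 or 8 */
	unsigned index;   /* register number of the index */
	unsigned base;    /* register number of the base */
};

/* Split a SIB byte into its ss:index:base fields. */
static struct sib decode_sib(uint8_t b)
{
	struct sib s = { 1u << (b >> 6), (b >> 3) & 7u, b & 7u };
	return s;
}

/* Effective address of a base + index * scale memory operand. */
static uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale)
{
	return base + index * scale;
}
```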

2004-06-16 20:43:26

by Corey Minyard

Subject: Re: IPMI hangup in 2.6.6-rc3

I cannot figure out from the traceback what is wrong, and I haven't been
able to reproduce this, even with ipmitool.

What kernel version are you running? Can you verify that the attached
patch is in your code?

--- linux-2.6.7-rc3-full/drivers/char/ipmi/ipmi_devintf.c.orig Wed Jun 9 12:08:23 2004
+++ linux-2.6.7-rc3-full/drivers/char/ipmi/ipmi_devintf.c Wed Jun 9 12:07:09 2004
@@ -199,7 +199,7 @@ static int handle_send_req(ipmi_user_t
goto out;
}

- if (copy_from_user(&msgdata,
+ if (copy_from_user(msgdata,
req->msg.data,
req->msg.data_len))
{


If that doesn't help, can you upgrade to 2.6.7-rc3-mm2 and re-try this
patch?

If that doesn't help, can you turn on frame pointers and try again?
This will give a cleaner backtrace.

-Corey
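The one-character fix matters because, as the change implies, msgdata in handle_send_req() is a pointer to an allocated buffer: copy_from_user(&msgdata, ...) copies the user's bytes over the pointer variable itself (and, for longer messages, whatever follows it on the stack), so the kfree() in handle_send_req() (it appears in the oops call trace) is handed garbage. With an array declaration, &msgdata and msgdata would name the same address and both spellings would behave identically; whether the code used an array earlier is not shown in this thread. A userspace sketch, using malloc() and memcpy() in place of kmalloc() and copy_from_user(); copy_request, array_and_address_match and MSG_LEN are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MSG_LEN 16   /* hypothetical buffer size for the sketch */

/* Copy `len` bytes of request data into a heap buffer the way the
   fixed code does: the destination is the buffer (msgdata), not the
   address of the pointer variable (&msgdata). */
static unsigned char *copy_request(const unsigned char *src, size_t len)
{
	unsigned char *msgdata = malloc(MSG_LEN);

	if (!msgdata || len > MSG_LEN) {
		free(msgdata);
		return NULL;
	}
	/* memcpy(&msgdata, src, len) would overwrite the pointer itself. */
	memcpy(msgdata, src, len);
	return msgdata;
}

/* For an array, &arr and arr refer to the same address, so the two
   spellings are interchangeable only in that case. */
static int array_and_address_match(void)
{
	unsigned char arr[MSG_LEN];
	return (void *)&arr == (void *)arr;
}
```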

Holger Kiehl wrote:

>On Wed, 16 Jun 2004, Holger Kiehl wrote:
>
>
>
>>On Wed, 16 Jun 2004, Corey Minyard wrote:
>>
>>
>>
>>>I missed a part of the patch; here is a new one with the include file changes.
>>>
>>>-Corey
>>>
>>>Corey Minyard wrote:
>>>
>>>
>>>
>>>>Unfortunately, that fix has some problems, but it was on the right track. I
>>>>have a new patch attached; can you try it out? Also, the kernel interface
>>>>has not changed. It should be exactly the same as before.
>>>>
>>>>-Corey
>>>>
>>>>
>>Yes, with this patch it no longer hangs. But ipmitool still crashes:
>>
>> root@apollo:~# ipmitool -I open sdr list
>> Segmentation fault
>>
>>
>>
>Just as I sent the mail, I noticed that each time I call ipmitool I get
>an oops:
>
>Jun 16 18:43:40 apollo kernel: <1>Unable to handle kernel paging request at virtual address 00100104
>Jun 16 18:43:40 apollo kernel: printing eip:
>Jun 16 18:43:40 apollo kernel: c013ba4a
>Jun 16 18:43:40 apollo kernel: *pde = 00000000
>Jun 16 18:43:40 apollo kernel: Oops: 0000 [#54]
>Jun 16 18:43:40 apollo kernel: SMP
>Jun 16 18:43:40 apollo kernel: Modules linked in: bonding
>Jun 16 18:43:40 apollo kernel: CPU: 1
>Jun 16 18:43:40 apollo kernel: EIP: 0060:[<c013ba4a>] Not tainted
>Jun 16 18:43:40 apollo kernel: EFLAGS: 00010086 (2.6.7)
>Jun 16 18:43:40 apollo kernel: EIP is at kfree+0x37/0x66
>Jun 16 18:43:40 apollo kernel: eax: 00000001 ebx: ffffffea ecx: 00100100 edx: c1000000
>Jun 16 18:43:40 apollo kernel: esi: 000015ec edi: 00000202 ebp: ffffffff esp: f6dcce84
>Jun 16 18:43:40 apollo kernel: ds: 007b es: 007b ss: 0068
>Jun 16 18:43:40 apollo kernel: Process ipmitool (pid: 19786, threadinfo=f6dcc000 task=c2269360)
>Jun 16 18:43:40 apollo kernel: Stack: ffffffea f6dccf6c f6dcceb0 c01dedca f6dccf78 00000000 00000000 ffffffff
>Jun 16 18:43:40 apollo kernel: 00000000 f7b8f7c8 000015ec 00000500 c000000f f61ebd80 00000282 f7c8be2c
>Jun 16 18:43:40 apollo kernel: c02e9600 f6e8b008 f6e8b008 00000000 00000004 f6dccf6c bffff2c0 f6dccf6c
>Jun 16 18:43:40 apollo kernel: Call Trace:
>Jun 16 18:43:40 apollo kernel: [<c01dedca>] handle_send_req+0xd3/0xe7
>Jun 16 18:43:40 apollo kernel: [<c01df205>] ipmi_ioctl+0x427/0x474
>Jun 16 18:43:40 apollo kernel: [<c015ece7>] sys_select+0x21c/0x482
>Jun 16 18:43:40 apollo kernel: [<c015e036>] sys_ioctl+0xef/0x223
>Jun 16 18:43:40 apollo kernel: [<c0104cf1>] sysenter_past_esp+0x52/0x71
>Jun 16 18:43:40 apollo kernel:
>Jun 16 18:43:40 apollo kernel: Code: 8b 1c 81 8b 03 3b 43 04 73 18 89 74 83 10 83 03 01 57 9d 8b
>
>
>
>
>>>EIP; c013ba4a No symbols available <=====
>>>
>>>
>
>Trace; c01dedca No symbols available
>Trace; c01df205 No symbols available
>Trace; c015ece7 No symbols available
>Trace; c015e036 No symbols available
>Trace; c0104cf1 No symbols available
>
>Code; c013ba4a No symbols available
>00000000 <_EIP>:
>Code; c013ba4a No symbols available <=====
> 0: 8b 1c 81 mov (%ecx,%eax,4),%ebx <=====
>Code; c013ba4d No symbols available
> 3: 8b 03 mov (%ebx),%eax
>Code; c013ba4f No symbols available
> 5: 3b 43 04 cmp 0x4(%ebx),%eax
>Code; c013ba52 No symbols available
> 8: 73 18 jae 22 <_EIP+0x22>
>Code; c013ba54 No symbols available
> a: 89 74 83 10 mov %esi,0x10(%ebx,%eax,4)
>Code; c013ba58 No symbols available
> e: 83 03 01 addl $0x1,(%ebx)
>Code; c013ba5b No symbols available
> 11: 57 push %edi
>Code; c013ba5c No symbols available
> 12: 9d popf
>Code; c013ba5d No symbols available
> 13: 8b 00 mov (%eax),%eax
>
>This must be the reason why ipmitool crashes.
>
>Holger
>
>


2004-06-16 20:58:34

by Alex Williamson

Subject: Re: IPMI hangup in 2.6.6-rc3

On Wed, 2004-06-16 at 14:42, Corey Minyard wrote:
> I cannot figure out from the traceback what is wrong, and I haven't been
> able to reproduce this, even with ipmitool.
>
> What kernel version are you running? Can you verify that the attached
> patch is in your code?
>

This fixed it for me. I was running on stock 2.6.7, which does not
include the patch below. My test program now works. Thanks,

Alex

> --- linux-2.6.7-rc3-full/drivers/char/ipmi/ipmi_devintf.c.orig Wed Jun 9 12:08:23 2004
> +++ linux-2.6.7-rc3-full/drivers/char/ipmi/ipmi_devintf.c Wed Jun 9 12:07:09 2004
> @@ -199,7 +199,7 @@ static int handle_send_req(ipmi_user_t
> goto out;
> }
>
> - if (copy_from_user(&msgdata,
> + if (copy_from_user(msgdata,
> req->msg.data,
> req->msg.data_len))
> {
>
>
> If that doesn't help, can you upgrade to 2.6.7-rc3-mm2 and re-try this
> patch?
>
> If that doesn't help, can you turn on frame pointers and try again?
> This will give a cleaner backtrace.
>
> -Corey


2004-06-17 07:04:33

by Holger Kiehl

Subject: Re: IPMI hangup in 2.6.6-rc3

On Wed, 16 Jun 2004, Corey Minyard wrote:

> I cannot figure out from the traceback what is wrong, and I haven't been able
> to reproduce this, even with ipmitool.
>
> What kernel version are you running? Can you verify that the attached patch
> is in your code?
>
> --- linux-2.6.7-rc3-full/drivers/char/ipmi/ipmi_devintf.c.orig Wed Jun 9 12:08:23 2004
> +++ linux-2.6.7-rc3-full/drivers/char/ipmi/ipmi_devintf.c Wed Jun 9 12:07:09 2004
> @@ -199,7 +199,7 @@ static int handle_send_req(ipmi_user_t
> goto out;
> }
>
> - if (copy_from_user(&msgdata,
> + if (copy_from_user(msgdata,
> req->msg.data,
> req->msg.data_len))
> {
>
Hurray! Now it works!!! Many thanks for the quick patch! As Alex Williamson
already mentioned, this was missing in stock 2.6.7, which is what I am using.

Here are the results:

ipmitool -I open sdr list
Baseboard 1.2V | 1.20 Volts | ok
Baseboard 1.25V | 1.25 Volts | ok
Baseboard 1.8V | 1.79 Volts | ok
Baseboard 1.8VSB | 1.80 Volts | ok
Baseboard 2.5V | 2.49 Volts | ok
Baseboard 3.3V | 3.37 Volts | ok
Baseboard 3.3AUX | 3.33 Volts | ok
Baseboard 5.0V | 5.07 Volts | ok
Baseboard 5VSB | 4.97 Volts | ok
Baseboard 12V | 12.03 Volts | ok
Baseboard 12VRM | 12.01 Volts | ok
Baseboard -12V | -11.91 Volts | ok
Baseboard VBAT | 3.17 Volts | ok
Baseboard Temp | 28 degrees C | ok
Front Panel Temp | 18 degrees C | ok
Basebrd FanBoost | 28 degrees C | ok
FP Amb FanBoost | 18 degrees C | ok
Sys Fan 1 | 3726 RPM | ok
Sys Fan 2 | 3933 RPM | ok
Sys Fan 3 | 3174 RPM | ok
Sys Fan 4 | 3795 RPM | ok
Sys Fan 5 | 3036 RPM | ok
SCSI A Term Pwr | 4.78 Volts | ok
SCSI B Term Pwr | 4.78 Volts | ok
Power Cage Fan 1 | 4260 RPM | ok
Power Cage Fan 2 | 4320 RPM | ok
Power Cage Temp | 25 degrees C | ok
Processor 1 Temp | 26 degrees C | ok
Processor 2 Temp | 24 degrees C | ok
Proc1 FanBoost | 26 degrees C | ok
Proc2 FanBoost | 24 degrees C | ok
Processor Vccp | 1.50 Volts | ok
HSBP A Temp | 0 degrees C | ok
HSBP B Temp | 0 degrees C | ok
Pwr Unit Status | 0x00 | ok
Pwr Unit Redund | 0x01 | ok
BMC Watchdog | 0x00 | ok
Scrty Violation | 0x00 | ok
Physical Scrty | 0x00 | ok
POST Error | 0x00 | ok
Critical Int | 0x00 | ok
Memory | 0x00 | ok
Logging Disabled | 0x00 | ok
Power Supply 1 | 0x01 | ok
Power Supply 2 | 0x01 | ok
Power Supply 3 | 0x01 | ok
Proc Missing | 0x00 | ok
ACPI State | 0x01 | ok
System Event | 0x00 | ok
Button | 0x00 | ok
SMI Timeout | 0x00 | ok
Sensor Failure | 0x00 | ok
NMI State | 0x00 | ok
SMI State | 0x00 | ok
FSB Mismatch | 0x00 | ok
Processor 1 Stat | Present | ok
Processor 2 Stat | Present | ok
CPU Therm Ctrl | 0x01 | ok
Fan Redundancy | 0x01 | ok
Fan1 Presence | Installed | ok
Fan2 Presence | Installed | ok
Fan3 Presence | Installed | ok
Fan4 Presence | Installed | ok
Fan5 Presence | Installed | ok
DIMM 1 | Installed | ok
DIMM 2 | Installed | ok
DIMM 3 | Not Installed | ok
DIMM 4 | Not Installed | ok
DIMM 5 | Not Installed | ok
DIMM 6 | Not Installed | ok
HSBP A Drv Stat | 0x01 | ok
HSBP A Drv Pres | 0x00 | ok
HSBP B Drv Stat | 0x01 | ok
HSBP B Drv Pres | 0x00 | ok
Power Cage FRU | Phy FRU @01h 15.1 | ok
Pwr Supply 1 FRU | Phy FRU @02h 0a.1 | ok
Pwr Supply 2 FRU | Phy FRU @03h 0a.2 | ok
Pwr Supply 3 FRU | Phy FRU @04h 0a.3 | ok
DIMM 1 SPD | Phy FRU @05h 20.1 | ok
DIMM 2 SPD | Phy FRU @06h 20.2 | ok
DIMM 3 SPD | Phy FRU @07h 20.3 | ok
DIMM 4 SPD | Phy FRU @08h 20.4 | ok
DIMM 5 SPD | Phy FRU @09h 20.4 | ok
DIMM 6 SPD | Phy FRU @0Ah 20.4 | ok
Basbrd Mgmt Ctlr | Dynamic MC @ 10h | ok
Chs Bridge Ctlr | Static MC @ 14h | ok
HSBP A | Dynamic MC @ 60h | ok
HSBP B | Dynamic MC @ 61h | ok

Again many thanks for the quick help!

Regards,
Holger
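For scripting checks against output like the `ipmitool -I open sdr list` listing above, each line splits into three '|'-separated fields: sensor name, reading and status. A small C sketch, assuming exactly that three-column layout (ipmitool may pad the columns differently; the trimming tolerates extra spaces and a trailing newline); struct sdr_row, copy_trimmed and parse_sdr_line are hypothetical names:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* One parsed line; field sizes are arbitrary for this sketch. */
struct sdr_row {
	char name[32];
	char reading[32];
	char status[16];
};

/* Copy [start, end) into dst with surrounding whitespace removed,
   truncating if it does not fit. */
static void copy_trimmed(char *dst, size_t dstlen, const char *start, const char *end)
{
	while (start < end && isspace((unsigned char)*start))
		start++;
	while (end > start && isspace((unsigned char)end[-1]))
		end--;
	size_t n = (size_t)(end - start);
	if (n >= dstlen)
		n = dstlen - 1;
	memcpy(dst, start, n);
	dst[n] = '\0';
}

/* Parse "name | reading | status"; returns 0 on success, or -1 if the
   line does not contain two '|' separators. */
static int parse_sdr_line(const char *line, struct sdr_row *row)
{
	const char *bar1 = strchr(line, '|');
	const char *bar2 = bar1 ? strchr(bar1 + 1, '|') : NULL;

	if (!bar2)
		return -1;
	copy_trimmed(row->name, sizeof row->name, line, bar1);
	copy_trimmed(row->reading, sizeof row->reading, bar1 + 1, bar2);
	copy_trimmed(row->status, sizeof row->status, bar2 + 1,
		     bar2 + 1 + strlen(bar2 + 1));
	return 0;
}
```

A caller could read the command's output line by line and flag any row whose status is not "ok".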