2009-03-19 16:28:04

by Martin Wilck

[permalink] [raw]
Subject: [PATCH] limit CPU time spent in kipmid

Hello Corey, hi everyone,

here is a patch that limits the CPU time spent in kipmid. I know that it
was previously stated that current kipmid "works as designed" (e.g.
http://lists.us.dell.com/pipermail/linux-poweredge/2008-October/037636.html),
yet users are irritated by the high amount of CPU time kipmid may use up
on current servers with many sensors, even though it is "nice" CPU time.
Moreover, kipmid busy-waiting for the KCS interface to become ready also
prevents CPUs from sleeping.

The attached patch was developed and tested on an enterprise
distribution kernel where it caused the CPU load of kipmid to drop to
essentially 0 while still delivering reliable IPMI communication.

I am looking forward for comments.
Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html


Attachments:
ipmi_si_max_busy-2.6.29-rc8.diff (3.41 kB)

2009-03-19 21:31:43

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid

Martin, thanks for the patch. I had actually implemented something like
this before, and it didn't really help very much with the hardware I
had, so I had abandoned this method. There's even a comment about it in
si_sm_result smi_event_handler(). Maybe making it tunable is better, I
don't know. But I'm afraid this will kill performance on a lot of systems.

Did you test throughput on this? The main problem people had without
kipmid was that things like firmware upgrades took a *long* time; adding
kipmid improved speeds by an order of magnitude or more.

It's my opinion that if you want this interface to work efficiently with
good performance, you should design the hardware to be used efficiently
by using interrupts (which are supported and disable kipmid). With the
way the hardware is defined, you cannot have both good performance and
low CPU usage without interrupts.

It may be possible to add an option to choose between performance and
efficiency, but it will have to default to performance.

-corey

Martin Wilck wrote:
> Hello Corey, hi everyone,
>
> here is a patch that limits the CPU time spent in kipmid. I know that
> it was previously stated that current kipmid "works as designed" (e.g.
> http://lists.us.dell.com/pipermail/linux-poweredge/2008-October/037636.html),
> yet users are irritated by the high amount of CPU time kipmid may use
> up on current servers with many sensors, even though it is "nice" CPU
> time. Moreover, kipmid busy-waiting for the KCS interface to become
> ready also prevents CPUs from sleeping.
>
> The attached patch was developed and tested on an enterprise
> distribution kernel where it caused the CPU load of kipmid to drop to
> essentially 0 while still delivering reliable IPMI communication.
>
> I am looking forward for comments.
> Martin
>

2009-03-19 22:41:37

by Bela Lubkin

[permalink] [raw]
Subject: RE: [Openipmi-developer] [PATCH] limit CPU time spent in kipmid

I'll give this a try, we've certainly had plenty of Fun with the
cost of running kipmid...

Why O Why are essentially all modern "managed" machines designed
with KCS? Even when you bless it with an identifiable interrupt
(as HP have done), it still cannot perform anywhere near as well
as BT, with its whole-packet DMA transfers.

I've been assuming that some silicon designer put out a KCS chip
early in the game and ramped production up to the point where it
was practically free, while other protocols were still tacking a
few cents onto the cost of every box. Thus, millions of hobbled
"server" computers. Grumble snort.

>Bela<

> -----Original Message-----
> From: Martin Wilck [mailto:[email protected]]
> Sent: Thursday, March 19, 2009 9:28 AM
> To: [email protected]; Corey Minyard;
> [email protected]
> Subject: [Openipmi-developer] [PATCH] limit CPU time spent in kipmid
>
> Hello Corey, hi everyone,
>
> here is a patch that limits the CPU time spent in kipmid. I
> know that it
> was previously stated that current kipmid "works as designed" (e.g.
> http://lists.us.dell.com/pipermail/linux-poweredge/2008-Octobe
r/037636.html),
> yet users are irritated by the high amount of CPU time kipmid
> may use up
> on current servers with many sensors, even though it is
> "nice" CPU time.
> Moreover, kipmid busy-waiting for the KCS interface to become
> ready also
> prevents CPUs from sleeping.
>
> The attached patch was developed and tested on an enterprise
> distribution kernel where it caused the CPU load of kipmid to drop to
> essentially 0 while still delivering reliable IPMI communication.
>
> I am looking forward for comments.
> Martin
>
> --
> Martin Wilck
> PRIMERGY System Software Engineer
> FSC IP ESP DEV 6
>
> Fujitsu Siemens Computers GmbH
> Heinz-Nixdorf-Ring 1
> 33106 Paderborn
> Germany
>
> Tel: ++49 5251 525 2796
> Fax: ++49 5251 525 2820
> Email: mailto:[email protected]
> Internet: http://www.fujitsu-siemens.com
> Company Details: http://www.fujitsu-siemens.com/imprint.html
> -

2009-03-20 00:16:20

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid

On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
> Martin, thanks for the patch. I had actually implemented something like
> this before, and it didn't really help very much with the hardware I had,
> so I had abandoned this method. There's even a comment about it in
> si_sm_result smi_event_handler(). Maybe making it tunable is better, I
> don't know. But I'm afraid this will kill performance on a lot of systems.
>
> Did you test throughput on this? The main problem people had without
> kipmid was that things like firmware upgrades took a *long* time; adding
> kipmid improved speeds by an order of magnitude or more.
>
> It's my opinion that if you want this interface to work efficiently with
> good performance, you should design the hardware to be used efficiently by
> using interrupts (which are supported and disable kipmid). With the way
> the hardware is defined, you cannot have both good performance and low CPU
> usage without interrupts.
>
> It may be possible to add an option to choose between performance and
> efficiency, but it will have to default to performance.

I would think that very infrequent things, like firmware upgrades, would
not take priority over a long-term "keep the cpu busy" type system, like
what we currently have.

Is there any way to switch between the different modes dynamically?

I like the idea of this change, as I have got a lot of complaints lately
about kipmi taking way too much cpu time up on idle systems, messing up
some user's process accounting rules in their management systems. But I
worry about making it a module parameter, why can't this be a
"self-tunable" thing?

thanks,

greg k-h

2009-03-20 15:31:17

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid

Greg KH wrote:
> On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
>
>> Martin, thanks for the patch. I had actually implemented something like
>> this before, and it didn't really help very much with the hardware I had,
>> so I had abandoned this method. There's even a comment about it in
>> si_sm_result smi_event_handler(). Maybe making it tunable is better, I
>> don't know. But I'm afraid this will kill performance on a lot of systems.
>>
>> Did you test throughput on this? The main problem people had without
>> kipmid was that things like firmware upgrades took a *long* time; adding
>> kipmid improved speeds by an order of magnitude or more.
>>
>> It's my opinion that if you want this interface to work efficiently with
>> good performance, you should design the hardware to be used efficiently by
>> using interrupts (which are supported and disable kipmid). With the way
>> the hardware is defined, you cannot have both good performance and low CPU
>> usage without interrupts.
>>
>> It may be possible to add an option to choose between performance and
>> efficiency, but it will have to default to performance.
>>
>
> I would think that very infrequent things, like firmware upgrades, would
> not take priority over a long-term "keep the cpu busy" type system, like
> what we currently have.
>
> Is there any way to switch between the different modes dynamically?
>
> I like the idea of this change, as I have got a lot of complaints lately
> about kipmi taking way too much cpu time up on idle systems, messing up
> some user's process accounting rules in their management systems. But I
> worry about making it a module parameter, why can't this be a
> "self-tunable" thing?
>
It's actually already sort of self-tuning. kipmid sleeps unless there
is IPMI activity. It only spins if it is expecting something from the
controller.

I've been thinking about this a little more. Assuming that the
self-tuning is working (and it appears to be working fine on my
systems), that means that something is causing the IPMI driver to
constantly talk to the management controller. I can think of three things:

1. The user is constantly sending messages to management controller.
2. There is something wrong with the hardware, like the ATTN bit is
stuck high, causing the driver to constantly poll the management
controller.
3. The driver either has a bug or needs some more work to account for
something the hardware needs it to do to clear the ATTN bit.

If it's #1 above, then I don't know if there is anything we can do about
it. The patch Martin sent will simply slow things down.

#2 and #3 will require someone to do some debugging. If the ATTN bit is
stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats
constantly going up. Actually, the contents of that file would be
helpful, along with /proc/ipmi/0/stats.

-corey

2009-03-20 17:51:47

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid

On Fri, Mar 20, 2009 at 10:30:45AM -0500, Corey Minyard wrote:
> Greg KH wrote:
>> On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
>>
>>> Martin, thanks for the patch. I had actually implemented something like
>>> this before, and it didn't really help very much with the hardware I had,
>>> so I had abandoned this method. There's even a comment about it in
>>> si_sm_result smi_event_handler(). Maybe making it tunable is better, I
>>> don't know. But I'm afraid this will kill performance on a lot of
>>> systems.
>>>
>>> Did you test throughput on this? The main problem people had without
>>> kipmid was that things like firmware upgrades took a *long* time; adding
>>> kipmid improved speeds by an order of magnitude or more.
>>>
>>> It's my opinion that if you want this interface to work efficiently with
>>> good performance, you should design the hardware to be used efficiently
>>> by using interrupts (which are supported and disable kipmid). With the
>>> way the hardware is defined, you cannot have both good performance and
>>> low CPU usage without interrupts.
>>>
>>> It may be possible to add an option to choose between performance and
>>> efficiency, but it will have to default to performance.
>>>
>>
>> I would think that very infrequent things, like firmware upgrades, would
>> not take priority over a long-term "keep the cpu busy" type system, like
>> what we currently have.
>>
>> Is there any way to switch between the different modes dynamically?
>> I like the idea of this change, as I have got a lot of complaints lately
>> about kipmi taking way too much cpu time up on idle systems, messing up
>> some user's process accounting rules in their management systems. But I
>> worry about making it a module parameter, why can't this be a
>> "self-tunable" thing?
>>
> It's actually already sort of self-tuning. kipmid sleeps unless there is
> IPMI activity. It only spins if it is expecting something from the
> controller.
>
> I've been thinking about this a little more. Assuming that the self-tuning
> is working (and it appears to be working fine on my systems), that means
> that something is causing the IPMI driver to constantly talk to the
> management controller. I can think of three things:
>
> 1. The user is constantly sending messages to management controller.
> 2. There is something wrong with the hardware, like the ATTN bit is
> stuck high, causing the driver to constantly poll the management
> controller.
> 3. The driver either has a bug or needs some more work to account for
> something the hardware needs it to do to clear the ATTN bit.
>
> If it's #1 above, then I don't know if there is anything we can do about
> it. The patch Martin sent will simply slow things down.

Does the "normal" ipmi userspace tools do #1?

For #2, this might make sense, as I have had reports of some hardware
working just fine, while others have the load issue. Both were
different hardware manufacturers.

> #2 and #3 will require someone to do some debugging. If the ATTN bit is
> stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats
> constantly going up. Actually, the contents of that file would be helpful,
> along with /proc/ipmi/0/stats.

Martin has one of these machines, right? If not, I can dig and try to
get some information as well.

thanks,

greg k-h

2009-03-20 19:28:48

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid

Greg KH wrote:
> On Fri, Mar 20, 2009 at 10:30:45AM -0500, Corey Minyard wrote:
>
>> Greg KH wrote:
>>
>>> On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
>>>
>>>
>>>> Martin, thanks for the patch. I had actually implemented something like
>>>> this before, and it didn't really help very much with the hardware I had,
>>>> so I had abandoned this method. There's even a comment about it in
>>>> si_sm_result smi_event_handler(). Maybe making it tunable is better, I
>>>> don't know. But I'm afraid this will kill performance on a lot of
>>>> systems.
>>>>
>>>> Did you test throughput on this? The main problem people had without
>>>> kipmid was that things like firmware upgrades took a *long* time; adding
>>>> kipmid improved speeds by an order of magnitude or more.
>>>>
>>>> It's my opinion that if you want this interface to work efficiently with
>>>> good performance, you should design the hardware to be used efficiently
>>>> by using interrupts (which are supported and disable kipmid). With the
>>>> way the hardware is defined, you cannot have both good performance and
>>>> low CPU usage without interrupts.
>>>>
>>>> It may be possible to add an option to choose between performance and
>>>> efficiency, but it will have to default to performance.
>>>>
>>>>
>>> I would think that very infrequent things, like firmware upgrades, would
>>> not take priority over a long-term "keep the cpu busy" type system, like
>>> what we currently have.
>>>
>>> Is there any way to switch between the different modes dynamically?
>>> I like the idea of this change, as I have got a lot of complaints lately
>>> about kipmi taking way too much cpu time up on idle systems, messing up
>>> some user's process accounting rules in their management systems. But I
>>> worry about making it a module parameter, why can't this be a
>>> "self-tunable" thing?
>>>
>>>
>> It's actually already sort of self-tuning. kipmid sleeps unless there is
>> IPMI activity. It only spins if it is expecting something from the
>> controller.
>>
>> I've been thinking about this a little more. Assuming that the self-tuning
>> is working (and it appears to be working fine on my systems), that means
>> that something is causing the IPMI driver to constantly talk to the
>> management controller. I can think of three things:
>>
>> 1. The user is constantly sending messages to management controller.
>> 2. There is something wrong with the hardware, like the ATTN bit is
>> stuck high, causing the driver to constantly poll the management
>> controller.
>> 3. The driver either has a bug or needs some more work to account for
>> something the hardware needs it to do to clear the ATTN bit.
>>
>> If it's #1 above, then I don't know if there is anything we can do about
>> it. The patch Martin sent will simply slow things down.
>>
>
> Does the "normal" ipmi userspace tools do #1?
>
That depends how they are used and configured. If you make them
constantly poll for events or grab sensor values, then they will just
use CPU. By default they shouldn't do anything.

> For #2, this might make sense, as I have had reports of some hardware
> working just fine, while others have the load issue. Both were
> different hardware manufacturers.
>
>
>> #2 and #3 will require someone to do some debugging. If the ATTN bit is
>> stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats
>> constantly going up. Actually, the contents of that file would be helpful,
>> along with /proc/ipmi/0/stats.
>>
>
> Martin has one of these machines, right? If not, I can dig and try to
> get some information as well.
>
I'll wait for Martin, hopefully he can get the info.

Thanks,

-corey

2009-03-23 13:17:36

by Martin Wilck

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)

Hi Corey, hi Greg, hi all,

first of all I need to apologize, because _the first patch I sent was
broken_. The attached patch should work better.

I did some benchmarking with this patch. In short:

1. The kipmid_max_busy parameter is a tunable that behaves reasonably.
2. Low values of this parameter use up almost as little CPU as the
"force_kipmid=0" case, but perform better.
3. It is important to distinguish cases with and without CPU load.
4. To offer this tunable to make a balance between max. CPU load of
kipmid and performance appears to be worthwhile for many users.

Now the details ... The following tables are in CSV format. The
benchmark used was a script using ipmitool to read all SDRs and all SEL
events from the BMC 10x in a loop. This takes 22s with the default
driver (using nearly 100% CPU), and almost 30x longer without kipmid
(force_kipmid=off). The "busy cycles" in the table were calculated from
oprofile CPU_CLK_UNHALTED counts; the "kipmid CPU%" are output from "ps
-eo pcpu". The tested kernel was an Enterprise Linux kernel with HZ=1000.

"Results without load"
"elapsed(s)" "elapsed (rel.)" "kipmid CPU% (ps)"
"CPU busy cycles (%)"
"default " 22 1 32 103.15
"force_kipmid=0" 621 28.23 0 12.7
"kipmid_max_busy=5000" 21 0.95 34 100.16
"kipmid_max_busy=2000" 22 1 34 94.04
"kipmid_max_busy=1000" 27 1.23 25 26.89
"kipmid_max_busy=500" 24 1.09 0 69.44
"kipmid_max_busy=200" 42 1.91 0 46.72
"kipmid_max_busy=100" 68 3.09 0 17.87
"kipmid_max_busy=50" 101 4.59 0 22.91
"kipmid_max_busy=20" 163 7.41 0 19.98
"kipmid_max_busy=10" 213 9.68 0 13.19

As expected, kipmid_max_busy > 1000 has almost no effect (with HZ=1000).
kipmid_max_busy=500 saves 30% busy time losing only 10% performance.
With kipmid_max_busy=10, the performance result is 3x better than just
switching kipmid totally off, with almost the same amount of CPU busy
cycles. Note that the %CPU displayed by "ps", "top" etc drops to 0 for
kipmid_max_busy < HZ. This effect is an artefact caused by the CPU time
being measured only at timer interrupts. But it will also make user
complains about kipmid drop to 0 - think about it ;-)

I took another run with a system under 100% CPU load by other processes.
Now there is hardly any performance difference any more. As expected,
the kipmid runs are all only slightly faster than the interrupt-driven
run which isn't affected by the CPU load. In this case, recording the
CPU load from kipmid makes no sense (it is ~0 anyway).

"elapsed(s)" "elapsed (rel.)" "kipmid CPU% (ps)"
"Results with 100% CPU load"
"default " 500 22.73
"force_kipmid=0" 620 28.18
"kipmid_max_busy=1000" 460 20.91
"kipmid_max_busy=500" 500 22.73
"kipmid_max_busy=200" 530 24.09
"kipmid_max_busy=100" 570 25.91


As I said initially, these are results taken on a single system. On this
system the KCS response times (from start to end of the
SI_SM_CALL_WITH_DELAY loop) are between 200 and 2000 us:

us %wait finished until
200 0%
400 21%
600 39%
800 44%
1000 55%
1200 89%
1400 94%
1600 97%

This may well be different on other systems, depending on the BMC,
number of sensors, etc. Therefore I think this should remain a tunable,
because finding an optimal value for arbitrary systems will be hard. Of
course, the impi driver could implement some sort of self-tuning logic,
but that would be overengineered to my taste. kipmid_max_busy would give
HW vendors a chance to determine an optimal value for a given system and
give a respective recommendation to users.

Best regards
Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html


Attachments:
ipmi_si_max_busy-fixed-2.6.29-rc8.diff (3.22 kB)

2009-03-23 13:25:52

by Martin Wilck

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid

Corey Minyard wrote:

>>> It's actually already sort of self-tuning. kipmid sleeps unless there is
>>> IPMI activity. It only spins if it is expecting something from the
>>> controller.

The self-tuning is fine (as long as there is no CPU load, which may slow
down stuff a lot, see my other posting). But on systems with many
sensors it will lead to considerable CPU time shown in "top" and other
tools for kipmid. And this confuses users. Users think that this is the
hardware vendor's fault - that's why I am sending this patch (if you so
wish, it is indeed the vendor's fault to use the outdated KCS interface
- but that's a different discussion, please let's keep it separate).

>>> I've been thinking about this a little more. Assuming that the self-tuning
>>> is working (and it appears to be working fine on my systems), that means
>>> that something is causing the IPMI driver to constantly talk to the
>>> management controller. I can think of three things:
>>>
>>> 1. The user is constantly sending messages to management controller.

This is what I did in my benchmark, of course. But also in real systems,
there are now many sensors (think dozens of DIMMs with several sensors
on each DIMM) and many events, causing constant IPMI traffic.

>>> 2. There is something wrong with the hardware, like the ATTN bit is
>>> stuck high, causing the driver to constantly poll the management
>>> controller.
>>> 3. The driver either has a bug or needs some more work to account for
>>> something the hardware needs it to do to clear the ATTN bit.

I think both 2.) and 3.) is not the case here.

>>> If it's #1 above, then I don't know if there is anything we can do about
>>> it. The patch Martin sent will simply slow things down.

True, but only a little bit. Please look at the numbers in my other posting.

Best regards
Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html

2009-03-23 15:59:28

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)

On Mon, Mar 23, 2009 at 02:17:20PM +0100, Martin Wilck wrote:
> +module_param(kipmid_max_busy, uint, 0);

Can you make this modifiable at runtime by setting the mode to 0644?
That way people would not have to unload and then reload the module in
order to be able to fix their machines, as they usually don't know they
have a problem until they run their system for a while.

thanks,

greg k-h

2009-03-23 16:20:37

by Martin Wilck

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)

Greg KH wrote:

> Can you make this modifiable at runtime by setting the mode to 0644?
> That way people would not have to unload and then reload the module in
> order to be able to fix their machines, as they usually don't know they
> have a problem until they run their system for a while.

Good point, but let's first sort out if there's agreement about the
general idea, and do the fine-tuning later. I may actually make sense to
make this an array parameter so that different IPMI interfaces can have
different settings (like force_kipmid).

Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html

2009-03-23 20:39:47

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)

I've done some experimenting with your patch and some more thinking.
(BTW, setting the permissions of kipmid_max_busy to 0644 as Greg
suggested makes changing the value for testing a lot easier :).

Results are not so good with the system I was working with. I have a
tool that measures latency of individual messages, averaging over a
number of messages. It's part of the openipmi library, if you want to
grab it.

For a message that requires almost no CPU from the management controller
(a Get MC ID command), it takes around 5ms per message round-trip and
uses about 10% of a CPU. Setting the the max busy to 500 causes it to
take about 23ms per message round trip and the CPU usage is not measurable.

Fetching SDRs (sensor data repository items), which will require more
work on the management controller, is a bit different. Each message
takes 22ms with max_busy disabled using about 50% of the CPU. Setting
it to 500 changes the value to 44ms per message, no measurable CPU.
Still not great, but not 5 times worse, either. (The reason you are
seeing 100% CPU and I'm not is because ipmitool issues more than one
fetch to the driver at a time so the next command is ready to go as soon
as the driver finishes one, so the driver will not do a 1-tick sleep
between messages).

I'm guessing that the difference is that there is a long delay between
receiving the command and issuing the result in the SDR fetch command.
With your patch, this puts the driver to sleep for a tick when this
happens. The individual byte transfers are short, so the tick-long
sleep doesn't happen in that case.

I'm also pretty sure I know what is going on in general. You are using
ipmitool to fetch sensors with a short poll time and your management
controller does not implement a useful feature.

The reason that some systems doing this use a lot of CPU and other
systems do not has do with the management controller design. Some
management controllers implement a UUID and a timestamp on the SDR data.
ipmitool will locally cache the data and if the UUID and timestamp are
the same it will not fetch the SDRs. Just fetching the sensor values
will be very efficient, much like the Get MC ID command. If this is
not implemented in the management controller, ipmitool will fetch all
the SDRs every time you run it, which is terribly inefficient. I'm
guessing that's your situation.

I'm ok with the patch with the feature disabled by default. I'd prefer
for it to be disabled by default because I prefer to reward vendors that
make our lives better and punish vendors that make our lives worse :).
You should run it through checkpatch; there were one or two coding style
violations.

I also have a few suggestions for solving this problem outside of this
patch:

1. Get your vendor to implement UUIDs and timestamps. This will make
things run more than an order of magnitude faster and more
efficient. Even better than interrupts.
2. If that's not possible, don't use ipmitool. Instead, write a
program with the openipmi library that stays up all the time (so
the SDR fetch is only done once at startup) and dumps the sensors
periodically.
3. If that's not feasible, poll less often and use events to catch
critical changes. Of course, this being IPMI, some vendors don't
properly implement events on their sensors, so that may not work.

-corey


Martin Wilck wrote:
> Hi Corey, hi Greg, hi all,
>
> first of all I need to apologize, because _the first patch I sent was
> broken_. The attached patch should work better.
>
> I did some benchmarking with this patch. In short:
>
> 1. The kipmid_max_busy parameter is a tunable that behaves reasonably.
> 2. Low values of this parameter use up almost as little CPU as the
> "force_kipmid=0" case, but perform better.
> 3. It is important to distinguish cases with and without CPU load.
> 4. To offer this tunable to make a balance between max. CPU load of
> kipmid and performance appears to be worthwhile for many users.
>
> Now the details ... The following tables are in CSV format. The
> benchmark used was a script using ipmitool to read all SDRs and all
> SEL events from the BMC 10x in a loop. This takes 22s with the default
> driver (using nearly 100% CPU), and almost 30x longer without kipmid
> (force_kipmid=off). The "busy cycles" in the table were calculated
> from oprofile CPU_CLK_UNHALTED counts; the "kipmid CPU%" are output
> from "ps -eo pcpu". The tested kernel was an Enterprise Linux kernel
> with HZ=1000.
>
> "Results without load"
> "elapsed(s)" "elapsed (rel.)" "kipmid CPU% (ps)"
> "CPU busy cycles (%)"
> "default " 22 1 32 103.15
> "force_kipmid=0" 621 28.23 0 12.7
> "kipmid_max_busy=5000" 21 0.95 34 100.16
> "kipmid_max_busy=2000" 22 1 34 94.04
> "kipmid_max_busy=1000" 27 1.23 25 26.89
> "kipmid_max_busy=500" 24 1.09 0 69.44
> "kipmid_max_busy=200" 42 1.91 0 46.72
> "kipmid_max_busy=100" 68 3.09 0 17.87
> "kipmid_max_busy=50" 101 4.59 0 22.91
> "kipmid_max_busy=20" 163 7.41 0 19.98
> "kipmid_max_busy=10" 213 9.68 0 13.19
>
> As expected, kipmid_max_busy > 1000 has almost no effect (with
> HZ=1000). kipmid_max_busy=500 saves 30% busy time losing only 10%
> performance. With kipmid_max_busy=10, the performance result is 3x
> better than just switching kipmid totally off, with almost the same
> amount of CPU busy cycles. Note that the %CPU displayed by "ps", "top"
> etc drops to 0 for kipmid_max_busy < HZ. This effect is an artefact
> caused by the CPU time being measured only at timer interrupts. But it
> will also make user complains about kipmid drop to 0 - think about it ;-)
>
> I took another run with a system under 100% CPU load by other
> processes. Now there is hardly any performance difference any more. As
> expected,
> the kipmid runs are all only slightly faster than the interrupt-driven
> run which isn't affected by the CPU load. In this case, recording the
> CPU load from kipmid makes no sense (it is ~0 anyway).
>
> "elapsed(s)" "elapsed (rel.)" "kipmid CPU% (ps)"
> "Results with 100% CPU load"
> "default " 500 22.73
> "force_kipmid=0" 620 28.18
> "kipmid_max_busy=1000" 460 20.91
> "kipmid_max_busy=500" 500 22.73
> "kipmid_max_busy=200" 530 24.09
> "kipmid_max_busy=100" 570 25.91
>
>
> As I said initially, these are results taken on a single system. On
> this system the KCS response times (from start to end of the
> SI_SM_CALL_WITH_DELAY loop) are between 200 and 2000 us:
>
> us %wait finished until
> 200 0%
> 400 21%
> 600 39%
> 800 44%
> 1000 55%
> 1200 89%
> 1400 94%
> 1600 97%
>
> This may well be different on other systems, depending on the BMC,
> number of sensors, etc. Therefore I think this should remain a
> tunable, because finding an optimal value for arbitrary systems will
> be hard. Of course, the ipmi driver could implement some sort of
> self-tuning logic, but that would be overengineered for my taste.
> kipmid_max_busy would give HW vendors a chance to determine an optimal
> value for a given system and recommend it to users.
>
> Best regards
> Martin
>

2009-03-24 09:22:51

by Martin Wilck

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)

Hi Corey,

thanks a lot for your comments.

> I've done some experimenting with your patch and some more thinking.
> (BTW, setting the permissions of kipmid_max_busy to 0644 as Greg
> suggested makes changing the value for testing a lot easier :).
>
> Results are not so good with the system I was working with.

I expected things to differ between various systems.

> I have a
> tool that measures latency of individual messages, averaging over a
> number of messages. It's part of the openipmi library, if you want to
> grab it.

I will check it out.

> I'm also pretty sure I know what is going on in general. You are using
> ipmitool to fetch sensors with a short poll time and your management
> controller does not implement a useful feature.

Some of the SDRs were useful temperature sensors, and the SEL data were
also useful. I wanted a simple benchmark that produces a load similar to
a real situation. I would be grateful for a suggestion of a better tool.

> The reason that some systems doing this use a lot of CPU and other
> systems do not has do with the management controller design. Some
> management controllers implement a UUID and a timestamp on the SDR data.
> ipmitool will locally cache the data and if the UUID and timestamp are
> the same it will not fetch the SDRs. Just fetching the sensor values
> will be very efficient, much like the Get MC ID command. If this is
> not implemented in the management controller, ipmitool will fetch all
> the SDRs every time you run it, which is terribly inefficient. I'm
> guessing that's your situation.

In the benchmark case, this is what I intended to do (I wanted to
measure the KCS driver performance and load, after all). In the
real-life situation, we aren't using ipmitool at all; we have our own
server management daemon.

> I'm ok with the patch with the feature disabled by default. I'd prefer
> for it to be disabled by default because I prefer to reward vendors that
> make our lives better and punish vendors that make our lives worse :).
I agree the current behavior is a good default on the user side.

> You should run it through checkpatch; there were one or two coding style
> violations.

Weird; I did run it through checkpatch, though perhaps not the latest
version. I will resend soon.

> I also have a few suggestions for solving this problem outside of this
> patch:
>
> 1. Get your vendor to implement UUIDs and timestamps. This will make
> things run more than an order of magnitude faster and more
> efficient. Even better than interrupts.
> 2. If that's not possible, don't use ipmitool. Instead, write a
> program with the openipmi library that stays up all the time (so
> the SDR fetch is only done once at startup) and dumps the sensors
> periodically.

This is what we are doing in real life. I need to check how much caching
the user space tool does, though.

> 3. If that's not feasible, poll less often and use events to catch
> critical changes. Of course, this being IPMI, some vendors don't
> properly implement events on their sensors, so that may not work.

You forgot to suggest "implement BT" :-) That is still the best thing
to do, IMO.

Best regards,
Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html

2009-03-24 09:30:27

by Martin Wilck

[permalink] [raw]
Subject: Improving IPMI performance under load

Hi Corey,

yesterday I posted some results about the IPMI performance under CPU
load, which can be up to 25 times slower than in an idle system. I think
it might be worthwhile to try to improve that behavior as well.

I made a variation of my patch which introduces a second parameter
(kipmid_min_busy) that prevents kipmid from calling schedule() for a
certain amount of time. Thus, if there is IPMI traffic pending, kipmid
will busy-loop for kipmid_min_busy microseconds, then start calling
schedule() in each loop iteration as it does now, and finally go to
sleep when kipmid_max_busy is reached. At the same time, I changed the
nice value of kipmid from 19 to 0.

With this patch and e.g. min_busy=100 and max_busy=200, there is no
noticeable difference any more between IPMI performance with and without
CPU load.

The patch + results still need cleanup, therefore I am not sending it
right now. Just wanted to hear what you think.

Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html

2009-03-24 13:08:52

by Corey Minyard

[permalink] [raw]
Subject: Re: [Openipmi-developer] Improving IPMI performance under load

Martin Wilck wrote:
> Hi Corey,
>
> yesterday I posted some results about the IPMI performance under CPU
> load, which can be up to 25 times slower than in an idle system. I think
> it might be worthwhile to try to improve that behavior as well.
>
Yes, that would be expected, as kipmid would never be scheduled in a
busy system, and it would just be the timer driving things.

> I made a variation of my patch which introduces a second parameter
> (kipmid_min_busy) that prevents kipmid from calling schedule() for a
> certain amount of time. Thus, if there is IPMI traffic pending, kipmid
> will busy-loop for kipmid_min_busy microseconds, then start calling
> schedule() in each loop iteration as it does now, and finally go to
> sleep when kipmid_max_busy is reached. At the same time, I changed the
> nice value of kipmid from 19 to 0.
>
I would guess that changing the nice value is the main thing that caused
the difference. The other changes probably didn't make as big a difference.

> With this patch and e.g. min_busy=100 and max_busy=200, there is no
> noticeable difference any more between IPMI performance with and without
> CPU load.
>
> The patch + results still need cleanup, therefore I am not sending it
> right now. Just wanted to hear what you think.
>
I'm ok with tuning like this, but most users are probably not going to
want this type of behavior.

-corey

2009-03-24 13:21:51

by Martin Wilck

[permalink] [raw]
Subject: Re: [Openipmi-developer] Improving IPMI performance under load

Corey Minyard wrote:

> I would guess that changing the nice value is the main thing that caused
> the difference. The other changes probably didn't make as big a difference.

That's true, but setting the nice level to 0 isn't "nice" without
kipmid_max_busy. The two parameters help to make sure that kipmid
doesn't use excessive CPU time.

I am not sure about your reasons for calling schedule() in every loop
iteration. If there is no other process that needs to run, it will just
waste cycles figuring that out. If there are other processes, you say
yourself that "kipmid would never be scheduled in a busy system". Does
it really make sense to call schedule() every microsecond? That is
effectively what kipmid does while waiting for the KCS interface,
because it does a port_inb() in every iteration, which takes about 1 us.

> I'm ok with tuning like this, but most users are probably not going to
> want this type of behavior.

Let's wait and see :-)

Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html

2009-03-24 15:51:52

by Matt Domsch

[permalink] [raw]
Subject: Re: [Openipmi-developer] Improving IPMI performance under load

On Tue, Mar 24, 2009 at 08:08:36AM -0500, Corey Minyard wrote:
> Martin Wilck wrote:
> >Hi Corey,
> >
> >yesterday I posted some results about the IPMI performance under CPU
> >load, which can be up to 25 times slower than in an idle system. I think
> >it might be worthwhile to try to improve that behavior as well.
> >
> Yes, that would be expected, as kipmid would never be scheduled in a
> busy system, and it would just be the timer driving things.
>
> >I made a variation of my patch which introduces a second parameter
> >(kipmid_min_busy) that prevents kipmid from calling schedule() for a
> >certain amount of time. Thus, if there is IPMI traffic pending, kipmid
> >will busy-loop for kipmid_min_busy microseconds, then start calling
> >schedule() in each loop iteration as it does now, and finally go to
> >sleep when kipmid_max_busy is reached. At the same time, I changed the
> >nice value of kipmid from 19 to 0.
> >
> I would guess that changing the nice value is the main thing that caused
> the difference. The other changes probably didn't make as big a difference.
>
> >With this patch and e.g. min_busy=100 and max_busy=200, there is no
> >noticeable difference any more between IPMI performance with and without
> >CPU load.
> >
> >The patch + results still need cleanup, therefore I am not sending it
> >right now. Just wanted to hear what you think.
> >
> I'm ok with tuning like this, but most users are probably not going to
> want this type of behavior.

I still get complaints from users who see their CPU utilization spike
attributed to kipmi0 when userspace throws a lot of requests down to
the controller. I've seen them want to limit kipmi0 even further, not
speed it up.

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2009-03-24 17:16:04

by Martin Wilck

[permalink] [raw]
Subject: Re: [Openipmi-developer] Improving IPMI performance under load

Matt Domsch wrote:

> I still get complaints from users who see their CPU utilization spike
> attributed to kipmi0 when userspace throws a lot of requests down to
> the controller. I've seen them want to limit kipmi0 even further, not
> speed it up.

My first patch ("limit CPU time spent in kipmi") is targeted at exactly
those users that you are talking about.

These users make their observations on idle systems. You'll never see
kipmid use up CPU under "top" in systems with high CPU load. With the
new patch I have in mind, kipmid can be tuned to be faster under load
and at the same time use less cycles (at the cost of slightly decreased
speed) when idle.

Martin

--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: mailto:[email protected]
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html

2009-04-06 13:58:49

by Martin Wilck

[permalink] [raw]
Subject: Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)

Hello Corey,

> I've done some experimenting with your patch and some more thinking.
> (BTW, setting the permissions of kipmid_max_busy to 0644 as Greg
> suggested makes changing the value for testing a lot easier :).

I apologize for the long delay - busy with other stuff.

Here is the patch with modifiable, per-interface module parameter.
Feedback is welcome.
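
For illustration, usage might look like the following. This is a sketch
that assumes the parameter name kipmid_max_busy_us from the attached
patch, the per-interface comma-separated array syntax, and the writable
0644 sysfs permissions suggested earlier; it is not verified against the
final merged code:

```shell
# Cap busy-waiting at 500 us on the first interface, 100 us on a second:
modprobe ipmi_si kipmid_max_busy_us=500,100
# Adjust at runtime via sysfs (requires the 0644 permissions):
echo 500 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us
```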

Best regards,
Martin (*note changed email address!*)

--
Dr. Martin Wilck
PRIMERGY System Software Engineer
x86 Server Engineering

Fujitsu Technology Solutions GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn, Germany

Phone: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: [email protected]
Internet: http://ts.fujitsu.com
Company Details: http://de.ts.fujitsu.com/imprint.html


Attachments:
kipmid_max_busy_arr_2.6.29.diff (3.23 kB)

2009-06-04 18:50:11

by Martin Wilck

[permalink] [raw]
Subject: [PATCH] limit CPU time spent in kipmid (version 4)

Hi all,

I am sorry for the long silence. I am sending a new version of my
patch which takes into account Bela's suggestions (well, most of them).
I compiled and tested it with 2.6.29.4; the results are similar to
before. By setting kipmid_max_busy_us to a value between 100 and 500, it
is possible to bring kipmid CPU load down to practically 0 without
losing too much IPMI throughput.

Please give me some feedback on whether this patch will get merged and,
if not, what improvements are needed.

Regards
Martin

--
Dr. Martin Wilck
PRIMERGY System Software Engineer
x86 Server Engineering

Fujitsu Technology Solutions GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn, Germany

Phone: ++49 5251 525 2796
Fax: ++49 5251 525 2820
Email: [email protected]
Internet: http://ts.fujitsu.com
Company Details: http://de.ts.fujitsu.com/imprint.html


Attachments:
kipmid_max_busy_arr2_2.6.29.diff (2.69 kB)