2022-03-27 20:54:58

by chenchacha

[permalink] [raw]
Subject: [PATCH 0/3] ipmi: msghandler: check the users and msgs causing the system to block

We have found a scenario in which too many ipmi messages are sent in a short
period of time, so that a large number of users and messages are blocked in
the ipmi modules. As a result, ipmi occupies a large amount of system memory
and ipmi communication always fails.

Frequent ipmi calls combined with a failure of hardware communication cause
this condition. ipmi has no way to detect or report the problem, so it cannot
be located on a live system.

This patch series provides a way to view the current number of users and
messages in ipmi, and introduces a simple interface to clear the message queue.

Chen Guanqiao (3):
ipmi: Get the number of user through sysfs
ipmi: Get the number of message through sysfs
ipmi: add a interface to clean message queue in sysfs

drivers/char/ipmi/ipmi_msghandler.c | 159 ++++++++++++++++++++++++++++
1 file changed, 159 insertions(+)

--
2.25.1


2022-03-28 12:00:46

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH 0/3] ipmi: msghandler: check the users and msgs causing the system to block

On Mon, Mar 28, 2022 at 12:47:41AM +0800, Chen Guanqiao wrote:
> We have found a scenario in which too many ipmi messages are sent in a short
> period of time, so that a large number of users and messages are blocked in
> the ipmi modules. As a result, ipmi occupies a large amount of system memory
> and ipmi communication always fails.
>
> Frequent ipmi calls combined with a failure of hardware communication cause
> this condition. ipmi has no way to detect or report the problem, so it cannot
> be located on a live system.

Hmm. So you have an application that just keeps sending IPMI messages
and not waiting for responses? I think the first order of business
would be to fix your applications to not do that.

The ipmi driver will eventually clean things out, but the timeouts are
pretty long. In the 5 second range per message.

However, as you say, there are no limits on users or messages, and that
is perhaps a problem. I mean, only root can send IPMI messages, and root
can do a lot more harm than that. But it's probably bad in principle.
Nobody has ever reported this problem before.

Anyway, a better solution for the kernel side of things, I think, would
be to add limits on the number of users and the number of messages per
user. That's more in line with what other kernel things do. I know of
nothing else in the kernel that does what you are proposing.

Does that make sense?

-corey

>
> This patch series provides a way to view the current number of users and
> messages in ipmi, and introduces a simple interface to clear the message queue.
>
> Chen Guanqiao (3):
> ipmi: Get the number of user through sysfs
> ipmi: Get the number of message through sysfs
> ipmi: add a interface to clean message queue in sysfs
>
> drivers/char/ipmi/ipmi_msghandler.c | 159 ++++++++++++++++++++++++++++
> 1 file changed, 159 insertions(+)
>
> --
> 2.25.1
>

2022-03-28 22:21:24

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH 0/3] ipmi: msghandler: check the users and msgs causing the system to block

On Mon, Mar 28, 2022 at 11:27:06PM +0800, chenchacha wrote:
>
> > Anyway, a better solution for the kernel side of things, I think, would
> > be to add limits on the number of users and the number of messages per
> > user. That's more in line with what other kernel things do. I know of
> > nothing else in the kernel that does what you are proposing.
>
> The precondition for adding limits is that people know that too many ipmi
> users and messages cause problems; this patch is meant to let the
> administrator know that.
>
> In addition, different machines have different limits. My server may block
> 700,000 messages and be fine, while my NAS PC went OOM when it had probably
> blocked about 10,000 messages. So, can limiting the number of users and
> messages wait until we have accumulated some online experience?

I don't mean a limit on the total number of messages, but a limit on the
total number of outstanding messages, and a limit on the total number of
users. No user should have more than a handful of outstanding messages,
and limiting the number of users to 20 or 30 should be more than enough
for any system.

Having those limits in place would probably help you trace down your
problem, as you would hit the limits and it should report it at the
source of the problem.
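
As a rough illustration of the two limits described above (a cap on the
number of users and a cap on outstanding messages per user), here is a
minimal userspace sketch in C. It is not the ipmi_msghandler code. The names
demo_intf, demo_user, demo_create_user, demo_send and demo_complete, and the
values 30 and 8, are hypothetical and only mirror the numbers mentioned in
this thread.

```c
#include <errno.h>

/* Hypothetical limits; the real values would be a policy decision. */
#define MAX_USERS          30
#define MAX_MSGS_PER_USER  8

struct demo_user {
	int in_use;
	int outstanding;	/* requests sent but not yet answered or timed out */
};

struct demo_intf {
	struct demo_user users[MAX_USERS];
	int nr_users;
};

/* Register a user. Returns a user id, or -EBUSY once the user limit is hit. */
int demo_create_user(struct demo_intf *intf)
{
	if (intf->nr_users >= MAX_USERS)
		return -EBUSY;

	for (int i = 0; i < MAX_USERS; i++) {
		if (!intf->users[i].in_use) {
			intf->users[i].in_use = 1;
			intf->users[i].outstanding = 0;
			intf->nr_users++;
			return i;
		}
	}
	return -EBUSY;
}

/* Queue one request. The caller sees -EBUSY at the point where it floods. */
int demo_send(struct demo_intf *intf, int uid)
{
	struct demo_user *u;

	if (uid < 0 || uid >= MAX_USERS || !intf->users[uid].in_use)
		return -EINVAL;

	u = &intf->users[uid];
	if (u->outstanding >= MAX_MSGS_PER_USER)
		return -EBUSY;

	u->outstanding++;
	return 0;
}

/* A response arrived or the request timed out: free one slot. */
void demo_complete(struct demo_intf *intf, int uid)
{
	if (uid < 0 || uid >= MAX_USERS || !intf->users[uid].in_use)
		return;
	if (intf->users[uid].outstanding > 0)
		intf->users[uid].outstanding--;
}
```

With limits like these, a sender that never waits for responses gets an
error on its own send call instead of silently growing the queue, which is
what makes the offending application visible.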

-corey

>
> >
> > Does that make sense?
> >
> > -corey
> >
>
> thanks
> --
>
> Chen Guanqiao
> > >
> > > This patch series provides a way to view the current number of users and
> > > messages in ipmi, and introduces a simple interface to clear the message queue.
> > >
> > > Chen Guanqiao (3):
> > > ipmi: Get the number of user through sysfs
> > > ipmi: Get the number of message through sysfs
> > > ipmi: add a interface to clean message queue in sysfs
> > >
> > > drivers/char/ipmi/ipmi_msghandler.c | 159 ++++++++++++++++++++++++++++
> > > 1 file changed, 159 insertions(+)
> > >
> > > --
> > > 2.25.1
> > >
>

2022-03-29 19:45:55

by chenchacha

[permalink] [raw]
Subject: Re: [PATCH 0/3] ipmi: msghandler: check the users and msgs causing the system to block



On 2022/3/28 23:45, Corey Minyard wrote:
> On Mon, Mar 28, 2022 at 11:27:06PM +0800, chenchacha wrote:
>>
>>> Anyway, a better solution for the kernel side of things, I think, would
>>> be to add limits on the number of users and the number of messages per
>>> user. That's more in line with what other kernel things do. I know of
>>> nothing else in the kernel that does what you are proposing.
>>
>> The precondition for adding limits is that people know that too many ipmi
>> users and messages cause problems; this patch is meant to let the
>> administrator know that.
>>
>> In addition, different machines have different limits. My server may block
>> 700,000 messages and be fine, while my NAS PC went OOM when it had probably
>> blocked about 10,000 messages. So, can limiting the number of users and
>> messages wait until we have accumulated some online experience?
>
> I don't mean a limit on the total number of messages, but a limit on the
> total number of outstanding messages, and a limit on the total number of
> users. No user should have more than a handful of outstanding messages,
> and limiting the number of users to 20 or 30 should be more than enough
> for any system.
>
> Having those limits in place would probably help you trace down your
> problem, as you would hit the limits and it should report it at the
> source of the problem.
>
> -corey

Hi Corey:

Following your suggestion, I have done some tests. After adding the
limits, even if the BMC hardware fails, ipmi no longer occupies a large
amount of system memory.

The modifications are in the next version of the patch.

Thanks
--
Chen Guanqiao