2022-10-07 09:38:34

by Zhang Yuchen

[permalink] [raw]
Subject: [PATCH 2/3] ipmi: fix long wait in unload when IPMI disconnect

When fixing the problem mentioned in PATCH1, we also found
the following problem:

If the IPMI is disconnected and in the sending process, the
uninstallation driver will be stuck for a long time.

The main problem is that uninstalling the driver waits for curr_msg to
be sent or HOSED. After stopping tasklet, the only place to trigger the
timeout mechanism is the circular poll in shutdown_smi.

The poll function delays 10us and calls smi_event_handler(smi_info,10).
Smi_event_handler deducts 10us from kcs->ibf_timeout.

But the poll func is followed by schedule_timeout_uninterruptible(1).
The time consumed here is not counted in kcs->ibf_timeout.

So when 10us is deducted from kcs->ibf_timeout, at least 1 jiffies has
actually passed. The waiting time has increased by more than a
hundredfold.

Now instead of calling poll(). call smi_event_handler() directly and
calculate the elapsed time.

For verification, you can directly use ebpf to check the kcs->
ibf_timeout for each call to kcs_event() when IPMI is disconnected.
Decrement at normal rate before unloading. The decrement rate becomes
very slow after unloading.

$ bpftrace -e 'kprobe:kcs_event {printf("kcs->ibftimeout : %d\n",
*(arg0+584));}'

Signed-off-by: Zhang Yuchen <[email protected]>
---
drivers/char/ipmi/ipmi_si_intf.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 6e357ad76f2e..abddd7e43a9a 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -2153,6 +2153,20 @@ static int __init init_ipmi_si(void)
}
module_init(init_ipmi_si);

+static void wait_msg_processed(struct smi_info *smi_info)
+{
+ unsigned long jiffies_now;
+ long time_diff;
+
+ while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
+ jiffies_now = jiffies;
+ time_diff = (((long)jiffies_now - (long)smi_info->last_timeout_jiffies)
+ * SI_USEC_PER_JIFFY);
+ smi_event_handler(smi_info, time_diff);
+ schedule_timeout_uninterruptible(1);
+ }
+}
+
static void shutdown_smi(void *send_info)
{
struct smi_info *smi_info = send_info;
@@ -2187,16 +2201,13 @@ static void shutdown_smi(void *send_info)
* in the BMC. Note that timers and CPU interrupts are off,
* so no need for locks.
*/
- while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
- poll(smi_info);
- schedule_timeout_uninterruptible(1);
- }
+ wait_msg_processed(smi_info);
+
if (smi_info->handlers)
disable_si_irq(smi_info);
- while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
- poll(smi_info);
- schedule_timeout_uninterruptible(1);
- }
+
+ wait_msg_processed(smi_info);
+
if (smi_info->handlers)
smi_info->handlers->cleanup(smi_info->si_sm);

--
2.30.2


2022-10-07 20:19:49

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH 2/3] ipmi: fix long wait in unload when IPMI disconnect

On Fri, Oct 07, 2022 at 05:26:16PM +0800, Zhang Yuchen wrote:
> When fixing the problem mentioned in PATCH1, we also found
> the following problem:
>
> If the IPMI is disconnected and in the sending process, the
> uninstallation driver will be stuck for a long time.
>
> The main problem is that uninstalling the driver waits for curr_msg to
> be sent or HOSED. After stopping tasklet, the only place to trigger the
> timeout mechanism is the circular poll in shutdown_smi.
>
> The poll function delays 10us and calls smi_event_handler(smi_info,10).
> Smi_event_handler deducts 10us from kcs->ibf_timeout.
>
> But the poll func is followed by schedule_timeout_uninterruptible(1).
> The time consumed here is not counted in kcs->ibf_timeout.
>
> So when 10us is deducted from kcs->ibf_timeout, at least 1 jiffies has
> actually passed. The waiting time has increased by more than a
> hundredfold.
>
> Now instead of calling poll(). call smi_event_handler() directly and
> calculate the elapsed time.

Yes, you are correct.

I've included this for 6.2, and added:

Cc: [email protected]

I would like it to soak for a bit.

-corey

>
> For verification, you can directly use ebpf to check the kcs->
> ibf_timeout for each call to kcs_event() when IPMI is disconnected.
> Decrement at normal rate before unloading. The decrement rate becomes
> very slow after unloading.
>
> $ bpftrace -e 'kprobe:kcs_event {printf("kcs->ibftimeout : %d\n",
> *(arg0+584));}'
>
> Signed-off-by: Zhang Yuchen <[email protected]>
> ---
> drivers/char/ipmi/ipmi_si_intf.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index 6e357ad76f2e..abddd7e43a9a 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -2153,6 +2153,20 @@ static int __init init_ipmi_si(void)
> }
> module_init(init_ipmi_si);
>
> +static void wait_msg_processed(struct smi_info *smi_info)
> +{
> + unsigned long jiffies_now;
> + long time_diff;
> +
> + while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
> + jiffies_now = jiffies;
> + time_diff = (((long)jiffies_now - (long)smi_info->last_timeout_jiffies)
> + * SI_USEC_PER_JIFFY);
> + smi_event_handler(smi_info, time_diff);
> + schedule_timeout_uninterruptible(1);
> + }
> +}
> +
> static void shutdown_smi(void *send_info)
> {
> struct smi_info *smi_info = send_info;
> @@ -2187,16 +2201,13 @@ static void shutdown_smi(void *send_info)
> * in the BMC. Note that timers and CPU interrupts are off,
> * so no need for locks.
> */
> - while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
> - poll(smi_info);
> - schedule_timeout_uninterruptible(1);
> - }
> + wait_msg_processed(smi_info);
> +
> if (smi_info->handlers)
> disable_si_irq(smi_info);
> - while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
> - poll(smi_info);
> - schedule_timeout_uninterruptible(1);
> - }
> +
> + wait_msg_processed(smi_info);
> +
> if (smi_info->handlers)
> smi_info->handlers->cleanup(smi_info->si_sm);
>
> --
> 2.30.2
>