2024-05-07 13:35:01

by Rik van Riel

Subject: [PATCH] fs/proc: fix softlockup in __read_vmcore

While taking a kernel core dump with makedumpfile on a larger system,
softlockup messages often appear.

While softlockup warnings can be harmless, they can also interfere
with things like RCU freeing memory, which can be problematic when
the kdump kexec image is configured with as little memory as possible.

Avoid the softlockup, and give things like work items and RCU a
chance to do their thing during __read_vmcore by adding a cond_resched.

Signed-off-by: Rik van Riel <[email protected]>
---
fs/proc/vmcore.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 1fb213f379a5..d06607a1f137 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -383,6 +383,8 @@ static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
 		/* leave now if filled buffer already */
 		if (!iov_iter_count(iter))
 			return acc;
+
+		cond_resched();
 	}
 
 	list_for_each_entry(m, &vmcore_list, list) {
--
2.42.0
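For readers unfamiliar with the pattern, here is a hedged, userspace-only sketch of what the hunk does. It shows a long copy loop that returns early once the destination buffer is full and otherwise offers a reschedule point after every chunk. The names cond_resched_stub(), chunked_copy() and CHUNK are hypothetical and exist only for illustration; this is not the vmcore.c code. The real kernel cond_resched() yields only when the scheduler has flagged the current task as needing to reschedule, so the stub below just counts how often such a point is reached.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-iteration chunk size, for this sketch only. */
#define CHUNK 4096

/* Hypothetical stand-in for the kernel's cond_resched(). The real function
 * yields the CPU only when the scheduler has flagged the current task; this
 * userspace stub just counts how many reschedule points were offered. */
static unsigned long resched_points;

static void cond_resched_stub(void)
{
	resched_points++;
}

/* Hypothetical long-running copy loop. Like the hunk above, it returns
 * early once the destination is full and otherwise offers a reschedule
 * point after every chunk, so one large read cannot hog the CPU. */
static size_t chunked_copy(char *dst, size_t dst_len,
			   const char *src, size_t src_len)
{
	size_t acc = 0;

	while (acc < src_len) {
		size_t tsz = src_len - acc;

		if (tsz > CHUNK)
			tsz = CHUNK;
		if (tsz > dst_len - acc)
			tsz = dst_len - acc;
		memcpy(dst + acc, src + acc, tsz);
		acc += tsz;

		/* leave now if filled buffer already */
		if (acc == dst_len)
			return acc;

		cond_resched_stub();
	}
	return acc;
}
```

Placing the call after the early return, as the hunk does, means a read that has already filled the caller's buffer returns without an extra scheduling check. A read that keeps going gives other tasks, such as the work items and RCU processing mentioned in the commit message, a chance to run on every pass.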




2024-05-09 03:53:17

by Baoquan He

Subject: Re: [PATCH] fs/proc: fix softlockup in __read_vmcore

Hi,

On 05/07/24 at 09:18am, Rik van Riel wrote:
> While taking a kernel core dump with makedumpfile on a larger system,
> softlockup messages often appear.
>
> While softlockup warnings can be harmless, they can also interfere
> with things like RCU freeing memory, which can be problematic when
> the kdump kexec image is configured with as little memory as possible.
>
> Avoid the softlockup, and give things like work items and RCU a
> chance to do their thing during __read_vmcore by adding a cond_resched.

Thanks for fixing this.

By the way, is it easy to reproduce? And should we add some trace of the
softlockup into log so that people can search for it and confirm when
encountering it?

Thanks
Baoquan

> ---
> fs/proc/vmcore.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 1fb213f379a5..d06607a1f137 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -383,6 +383,8 @@ static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
>  		/* leave now if filled buffer already */
>  		if (!iov_iter_count(iter))
>  			return acc;
> +
> +		cond_resched();
>  	}
>
>  	list_for_each_entry(m, &vmcore_list, list) {
> --
> 2.42.0
>
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
>


2024-05-09 13:42:48

by Rik van Riel

Subject: Re: [PATCH] fs/proc: fix softlockup in __read_vmcore

On Thu, 2024-05-09 at 11:52 +0800, Baoquan He wrote:
> Hi,
>
> On 05/07/24 at 09:18am, Rik van Riel wrote:
> > While taking a kernel core dump with makedumpfile on a larger system,
> > softlockup messages often appear.
> >
> > While softlockup warnings can be harmless, they can also interfere
> > with things like RCU freeing memory, which can be problematic when
> > the kdump kexec image is configured with as little memory as possible.
> >
> > Avoid the softlockup, and give things like work items and RCU a
> > chance to do their thing during __read_vmcore by adding a cond_resched.
>
> Thanks for fixing this.
>
> By the way, is it easy to reproduce? And should we add some trace of the
> softlockup into log so that people can search for it and confirm when
> encountering it?

It is pretty easy to reproduce, but it does not happen all the time.
With millions of systems, even rare errors are common :)

However, we have been running with this fix for long enough (we
deployed it in order to test it) that I don't think we have the
warning stored any more. Those logs were rotated out long ago.

kind regards,

Rik
--
All Rights Reversed.

2024-05-09 15:30:44

by Baoquan He

Subject: Re: [PATCH] fs/proc: fix softlockup in __read_vmcore

On 05/09/24 at 09:41am, Rik van Riel wrote:
> On Thu, 2024-05-09 at 11:52 +0800, Baoquan He wrote:
> > Hi,
> >
> > On 05/07/24 at 09:18am, Rik van Riel wrote:
> > > While taking a kernel core dump with makedumpfile on a larger system,
> > > softlockup messages often appear.
> > >
> > > While softlockup warnings can be harmless, they can also interfere
> > > with things like RCU freeing memory, which can be problematic when
> > > the kdump kexec image is configured with as little memory as possible.
> > >
> > > Avoid the softlockup, and give things like work items and RCU a
> > > chance to do their thing during __read_vmcore by adding a cond_resched.
> >
> > Thanks for fixing this.
> >
> > By the way, is it easy to reproduce? And should we add some trace of the
> > softlockup into log so that people can search for it and confirm when
> > encountering it?
>
> It is pretty easy to reproduce, but it does not happen all the time.
> With millions of systems, even rare errors are common :)
>
> However, we have been running with this fix for long enough (we
> deployed it in order to test it) that I don't think we have the
> warning stored any more. Those logs were rotated out long ago.

OK, thanks for the explanation.

Acked-by: Baoquan He <[email protected]>