LinuxLists.cc - kdump always hangs in rcu_barrier() -> wait_for

2020-11-25 05:01:05

Subject: kdump always hangs in rcu_barrier() -> wait_for_completion()

Hi,
I happened to hit a kdump hang issue in a Linux VM running on some
Hyper-V host. Please see the attached log: the kdump kernel always hangs,
even if I configure only 1 virtual CPU to the VM.

I firstly hit the issue in RHEL 8.3's 4.18.x kernel, but later I found that
the latest upstream v5.10-rc5 also has the same issue (at least the
symptom is exactly the same), so I dug into v5.10-rc5 and found that the
kdump kernel always hangs in kernel_init() -> mark_readonly() ->
rcu_barrier() -> wait_for_completion(&rcu_state.barrier_completion).

Let's take the 1-vCPU case for example (refer to the attached log): in the
below code, somehow rcu_segcblist_n_cbs() returns true, so the call
smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1) increases
the counter by 1, and hence later the counter is still 1 after the
atomic_sub_and_test(), and the complete() is not called.

static void rcu_barrier_func(void *cpu_in)
{
...
if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
atomic_inc(&rcu_state.barrier_cpu_count);
} else {
...
}

void rcu_barrier(void)
{
...
atomic_set(&rcu_state.barrier_cpu_count, 2);
...
for_each_possible_cpu(cpu) {
rdp = per_cpu_ptr(&rcu_data, cpu);
...
if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
...
smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
...
}
}
...
if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
complete(&rcu_state.barrier_completion);

...
wait_for_completion(&rcu_state.barrier_completion);

Sorry for my ignorance of RCU -- I'm not sure why the rcu_segcblist_n_cbs()
returns 1 here. In the normal kernel, it returns 0, so the normal kernel does not
hang.

Note: in the case of kdump kernel, if I remove the kernel parameter
console=ttyS0 OR if I build the kernel with CONFIG_HZ=250, the issue can
no longer reproduce. Currently my kernel uses CONFIG_HZ=1000 and I use
console=ttyS0, so I'm able to reproduce the isue every time.

Note: the same kernel binary can not reproduce the issue when the VM
runs on another Hyper-V host.

It looks there is some kind of race condition?

Looking forward to your insights!

I'm happy to test any patch or enable more tracing, if necessary. Thanks!

Thanks,
-- Dexuan

Attachments:

bad-hz-1000.log (76.99 kB)
bad-hz-1000.log

2020-11-26 15:50:01

by Paul E. McKenney

[permalink] [raw]

Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion()

On Wed, Nov 25, 2020 at 04:56:33AM +0000, Dexuan Cui wrote:
> Hi,
> I happened to hit a kdump hang issue in a Linux VM running on some
> Hyper-V host. Please see the attached log: the kdump kernel always hangs,
> even if I configure only 1 virtual CPU to the VM.
>
> I firstly hit the issue in RHEL 8.3's 4.18.x kernel, but later I found that
> the latest upstream v5.10-rc5 also has the same issue (at least the
> symptom is exactly the same), so I dug into v5.10-rc5 and found that the
> kdump kernel always hangs in kernel_init() -> mark_readonly() ->
> rcu_barrier() -> wait_for_completion(&rcu_state.barrier_completion).
>
> Let's take the 1-vCPU case for example (refer to the attached log): in the
> below code, somehow rcu_segcblist_n_cbs() returns true, so the call
> smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1) increases
> the counter by 1, and hence later the counter is still 1 after the
> atomic_sub_and_test(), and the complete() is not called.
>
> static void rcu_barrier_func(void *cpu_in)
> {
> ...
> if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
> atomic_inc(&rcu_state.barrier_cpu_count);
> } else {
> ...
> }
>
> void rcu_barrier(void)
> {
> ...
> atomic_set(&rcu_state.barrier_cpu_count, 2);
> ...
> for_each_possible_cpu(cpu) {
> rdp = per_cpu_ptr(&rcu_data, cpu);
> ...
> if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
> ...
> smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
> ...
> }
> }
> ...
> if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
> complete(&rcu_state.barrier_completion);
>
> ...
> wait_for_completion(&rcu_state.barrier_completion);
>
> Sorry for my ignorance of RCU -- I'm not sure why the rcu_segcblist_n_cbs()
> returns 1 here. In the normal kernel, it returns 0, so the normal kernel does not
> hang.

The rcu_segcblist_n_cbs() function returns non-zero because something
invoked call_rcu() some time previously. The ftrace facility (or just
a printk) should help you work out where that call_rcu() is located.

> Note: in the case of kdump kernel, if I remove the kernel parameter
> console=ttyS0 OR if I build the kernel with CONFIG_HZ=250, the issue can
> no longer reproduce. Currently my kernel uses CONFIG_HZ=1000 and I use
> console=ttyS0, so I'm able to reproduce the isue every time.
>
> Note: the same kernel binary can not reproduce the issue when the VM
> runs on another Hyper-V host.
>
> It looks there is some kind of race condition?

Quite possibly. Perhaps there is some code somewhere in the kernel that
invokes call_rcu() after some number of jiffies, in which case reducing
CONFIG_HZ would delay that call_rcu() invocation until after this call
to rcu_barrier(). Perhaps that call_rcu() is also associated with the
console=ttyS0, so that it is delayed sufficiently with reduced CONFIG_HZ
and missing entirely without console=ttyS0.

> Looking forward to your insights!
>
> I'm happy to test any patch or enable more tracing, if necessary. Thanks!

My best guess is that the underlying bug is that you are invoking
rcu_barrier() before the RCU grace-period kthread has been created.
This means that RCU grace periods cannot complete, which in turn means
that if there has been even one invocation of call_rcu() since boot,
rcu_barrier() cannot complete, which is what you are in fact seeing.
Please note that it is perfectly legal to invoke call_rcu() very early in
the boot process, as in even before the call to rcu_init(). Therefore,
if this is the case, the bug is the early call to rcu_barrier(), not
the early calls to call_rcu().

To check this, at the beginning of rcu_barrier(), check the value of
rcu_state.gp_kthread. If my guess is correct, it will be NULL.

Another possibility is that rcu_state.gp_kthread is non-NULL, but that
something else is preventing RCU grace periods from completing, but in
that case you should see RCU CPU stall warnings. Unless of course they
have been disabled.

Please let me know what you find!

Thanx, Paul

2020-11-27 08:45:07

by Paul E. McKenney

[permalink] [raw]

Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion()

On Thu, Nov 26, 2020 at 09:25:28PM +0000, Dexuan Cui wrote:
> > From: Paul E. McKenney <[email protected]>
> > Sent: Thursday, November 26, 2020 7:47 AM
> > ...
> > The rcu_segcblist_n_cbs() function returns non-zero because something
> > invoked call_rcu() some time previously. The ftrace facility (or just
> > a printk) should help you work out where that call_rcu() is located.
>
> call_rcu() is indeed called multiple times, but as you said, this should
> be normal.

Good to know, thank you!

> > My best guess is that the underlying bug is that you are invoking
> > rcu_barrier() before the RCU grace-period kthread has been created.
> > This means that RCU grace periods cannot complete, which in turn means
> > that if there has been even one invocation of call_rcu() since boot,
> > rcu_barrier() cannot complete, which is what you are in fact seeing.
> > Please note that it is perfectly legal to invoke call_rcu() very early in
> > the boot process, as in even before the call to rcu_init(). Therefore,
> > if this is the case, the bug is the early call to rcu_barrier(), not
> > the early calls to call_rcu().
> >
> > To check this, at the beginning of rcu_barrier(), check the value of
> > rcu_state.gp_kthread. If my guess is correct, it will be NULL.
>
> Unluckily, it's not NULL here. :-)

You can't have everything! ;-)

> > Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> > something else is preventing RCU grace periods from completing, but in
>
> It looks like somehow the scheduling is not working here: in rcu_barrier()
> , if I replace the wait_for_completion() with
> wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ), the
> issue persists.

Have you tried using sysreq-t to see what the various tasks are doing?

One way that this can happen is if whatever task is currently running
has managed to enter long loop with interrupts disabled.

> > that case you should see RCU CPU stall warnings. Unless of course they
> > have been disabled.
> > Thanx, Paul
>
> I guess I didn't disable the wanrings (I don't even know how to do that :)

Having interrupts disabled on all CPUs would have the effect of disabling
the RCU CPU stall warnings.

The intended method is in Documentation/admin-guide/kernel-parameters.txt.
Search for rcu_cpu_stall_suppress. Not that it seems important at this
point.

Thanx, Paul

> grep RCU .config
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU_GENERIC=y
> CONFIG_TASKS_RUDE_RCU=y
> CONFIG_TASKS_TRACE_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> CONFIG_RCU_NOCB_CPU=y
> # end of RCU Subsystem
> CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
> # RCU Debugging
> # CONFIG_RCU_SCALE_TEST is not set
> # CONFIG_RCU_TORTURE_TEST is not set
> # CONFIG_RCU_REF_SCALE_TEST is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=30
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_EQS_DEBUG=y
> # end of RCU Debugging
>
> Thanks,
> -- Dexuan
>

2020-11-27 08:46:56

by Dexuan Cui

[permalink] [raw]

Subject: RE: kdump always hangs in rcu_barrier() -> wait_for_completion()

> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, November 26, 2020 7:47 AM
> ...
> The rcu_segcblist_n_cbs() function returns non-zero because something
> invoked call_rcu() some time previously. The ftrace facility (or just
> a printk) should help you work out where that call_rcu() is located.

call_rcu() is indeed called multiple times, but as you said, this should
be normal.

> My best guess is that the underlying bug is that you are invoking
> rcu_barrier() before the RCU grace-period kthread has been created.
> This means that RCU grace periods cannot complete, which in turn means
> that if there has been even one invocation of call_rcu() since boot,
> rcu_barrier() cannot complete, which is what you are in fact seeing.
> Please note that it is perfectly legal to invoke call_rcu() very early in
> the boot process, as in even before the call to rcu_init(). Therefore,
> if this is the case, the bug is the early call to rcu_barrier(), not
> the early calls to call_rcu().
>
> To check this, at the beginning of rcu_barrier(), check the value of
> rcu_state.gp_kthread. If my guess is correct, it will be NULL.

Unluckily, it's not NULL here. :-)

>
> Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> something else is preventing RCU grace periods from completing, but in

It looks like somehow the scheduling is not working here: in rcu_barrier()
, if I replace the wait_for_completion() with
wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ), the
issue persists.

> that case you should see RCU CPU stall warnings. Unless of course they
> have been disabled.
> Thanx, Paul

I guess I didn't disable the wanrings (I don't even know how to do that :)

grep RCU .config
# RCU Subsystem
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# end of RCU Subsystem
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
# RCU Debugging
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=30
CONFIG_RCU_TRACE=y
CONFIG_RCU_EQS_DEBUG=y
# end of RCU Debugging

Thanks,
-- Dexuan

2020-11-27 08:47:17

by Dexuan Cui

[permalink] [raw]

Subject: RE: kdump always hangs in rcu_barrier() -> wait_for_completion()

> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, November 26, 2020 1:42 PM
>
> > > Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> > > something else is preventing RCU grace periods from completing, but in
> >
> > It looks like somehow the scheduling is not working here: in rcu_barrier()
> > , if I replace the wait_for_completion() with
> > wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ), the
> > issue persists.
>
> Have you tried using sysreq-t to see what the various tasks are doing?

Will try it.

BTW, this is a "Generation 2" VM on Hyper-V, meaning sysrq only starts to
work after the Hyper-V para-virtualized keyboard driver loads... So, at this
early point, sysrq is not working. :-( I'll have to hack the code and use a
virtual NMI interrupt to force the sysrq handler to be called.

> Having interrupts disabled on all CPUs would have the effect of disabling
> the RCU CPU stall warnings.
> Thanx, Paul

I'm sure the interrupts are not disabled. Here the VM only has 1 virtual CPU,
and when the hang issue happens the virtual serial console is still responding
when I press Enter (it prints a new line) or Ctrl+C (it prints ^C).

Here the VM does not use the "legacy timers" (PIT, Local APIC timer, etc.) at all.
Instead, the VM uses the Hyper-V para-virtualized timers. It looks the Hyper-V
timer never fires in the kdump kernel when the hang issue happens. I'm
looking into this... I suspect this hang issue may only be specific to Hyper-V.

Thanks,
-- Dexuan

2020-11-27 08:47:34

by Paul E. McKenney

[permalink] [raw]

Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion()

On Thu, Nov 26, 2020 at 10:59:19PM +0000, Dexuan Cui wrote:
> > From: Paul E. McKenney <[email protected]>
> > Sent: Thursday, November 26, 2020 1:42 PM
> >
> > > > Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> > > > something else is preventing RCU grace periods from completing, but in
> > >
> > > It looks like somehow the scheduling is not working here: in rcu_barrier()
> > > , if I replace the wait_for_completion() with
> > > wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ), the
> > > issue persists.
> >
> > Have you tried using sysreq-t to see what the various tasks are doing?
>
> Will try it.
>
> BTW, this is a "Generation 2" VM on Hyper-V, meaning sysrq only starts to
> work after the Hyper-V para-virtualized keyboard driver loads... So, at this
> early point, sysrq is not working. :-( I'll have to hack the code and use a
> virtual NMI interrupt to force the sysrq handler to be called.

Whatever works!

> > Having interrupts disabled on all CPUs would have the effect of disabling
> > the RCU CPU stall warnings.
> > Thanx, Paul
>
> I'm sure the interrupts are not disabled. Here the VM only has 1 virtual CPU,
> and when the hang issue happens the virtual serial console is still responding
> when I press Enter (it prints a new line) or Ctrl+C (it prints ^C).
>
> Here the VM does not use the "legacy timers" (PIT, Local APIC timer, etc.) at all.
> Instead, the VM uses the Hyper-V para-virtualized timers. It looks the Hyper-V
> timer never fires in the kdump kernel when the hang issue happens. I'm
> looking into this... I suspect this hang issue may only be specific to Hyper-V.

Fair enough, given that timers not working can also suppress RCU CPU
stall warnings. ;-)

Thanx, Paul

2021-01-28 07:35:04

by Dexuan Cui

[permalink] [raw]

Subject: RE: kdump always hangs in rcu_barrier() -> wait_for_completion()

> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, November 26, 2020 3:55 PM
> To: Dexuan Cui <[email protected]>
> Cc: [email protected]; Ingo Molnar <[email protected]>;
> [email protected]; vkuznets <[email protected]>; Michael Kelley
> <[email protected]>; [email protected]
> Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion()
>
> On Thu, Nov 26, 2020 at 10:59:19PM +0000, Dexuan Cui wrote:
> > > From: Paul E. McKenney <[email protected]>
> > > Sent: Thursday, November 26, 2020 1:42 PM
> > >
> > > > > Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> > > > > something else is preventing RCU grace periods from completing, but in
> > > >
> > > > It looks like somehow the scheduling is not working here: in rcu_barrier()
> > > > , if I replace the wait_for_completion() with
> > > > wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ),
> the
> > > > issue persists.
> > >
> > > Have you tried using sysreq-t to see what the various tasks are doing?
> >
> > Will try it.
> >
> > BTW, this is a "Generation 2" VM on Hyper-V, meaning sysrq only starts to
> > work after the Hyper-V para-virtualized keyboard driver loads... So, at this
> > early point, sysrq is not working. :-( I'll have to hack the code and use a
> > virtual NMI interrupt to force the sysrq handler to be called.
>
> Whatever works!
>
> > > Having interrupts disabled on all CPUs would have the effect of disabling
> > > the RCU CPU stall warnings.
> > > Thanx, Paul
> >
> > I'm sure the interrupts are not disabled. Here the VM only has 1 virtual CPU,
> > and when the hang issue happens the virtual serial console is still responding
> > when I press Enter (it prints a new line) or Ctrl+C (it prints ^C).
> >
> > Here the VM does not use the "legacy timers" (PIT, Local APIC timer, etc.) at
> all.
> > Instead, the VM uses the Hyper-V para-virtualized timers. It looks the
> Hyper-V
> > timer never fires in the kdump kernel when the hang issue happens. I'm
> > looking into this... I suspect this hang issue may only be specific to Hyper-V.
>
> Fair enough, given that timers not working can also suppress RCU CPU
> stall warnings. ;-)
>
> Thanx, Paul

FYI: the issue has been fixed by this fix:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fff7b5e6ee63c5d20406a131b260c619cdd24fd1

Thanks,
-- Dexuan

2021-01-28 15:00:55

by Paul E. McKenney

[permalink] [raw]

Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion()

On Thu, Jan 28, 2021 at 07:28:20AM +0000, Dexuan Cui wrote:
> > From: Paul E. McKenney <[email protected]>
> > Sent: Thursday, November 26, 2020 3:55 PM
> > To: Dexuan Cui <[email protected]>
> > Cc: [email protected]; Ingo Molnar <[email protected]>;
> > [email protected]; vkuznets <[email protected]>; Michael Kelley
> > <[email protected]>; [email protected]
> > Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion()
> >
> > On Thu, Nov 26, 2020 at 10:59:19PM +0000, Dexuan Cui wrote:
> > > > From: Paul E. McKenney <[email protected]>
> > > > Sent: Thursday, November 26, 2020 1:42 PM
> > > >
> > > > > > Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> > > > > > something else is preventing RCU grace periods from completing, but in
> > > > >
> > > > > It looks like somehow the scheduling is not working here: in rcu_barrier()
> > > > > , if I replace the wait_for_completion() with
> > > > > wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ),
> > the
> > > > > issue persists.
> > > >
> > > > Have you tried using sysreq-t to see what the various tasks are doing?
> > >
> > > Will try it.
> > >
> > > BTW, this is a "Generation 2" VM on Hyper-V, meaning sysrq only starts to
> > > work after the Hyper-V para-virtualized keyboard driver loads... So, at this
> > > early point, sysrq is not working. :-( I'll have to hack the code and use a
> > > virtual NMI interrupt to force the sysrq handler to be called.
> >
> > Whatever works!
> >
> > > > Having interrupts disabled on all CPUs would have the effect of disabling
> > > > the RCU CPU stall warnings.
> > > > Thanx, Paul
> > >
> > > I'm sure the interrupts are not disabled. Here the VM only has 1 virtual CPU,
> > > and when the hang issue happens the virtual serial console is still responding
> > > when I press Enter (it prints a new line) or Ctrl+C (it prints ^C).
> > >
> > > Here the VM does not use the "legacy timers" (PIT, Local APIC timer, etc.) at
> > all.
> > > Instead, the VM uses the Hyper-V para-virtualized timers. It looks the
> > Hyper-V
> > > timer never fires in the kdump kernel when the hang issue happens. I'm
> > > looking into this... I suspect this hang issue may only be specific to Hyper-V.
> >
> > Fair enough, given that timers not working can also suppress RCU CPU
> > stall warnings. ;-)
> >
> > Thanx, Paul
>
> FYI: the issue has been fixed by this fix:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fff7b5e6ee63c5d20406a131b260c619cdd24fd1

Thank you for the update!

Thanx, Paul