2018-11-29 08:50:31

by He, Bo

Subject: rcu_preempt caused oom

Hi,
We are testing kernel 4.19.0 on Android. After running a monkey stress test for more than 24 hours, we see an OOM on 1 out of 10 boards with 2 GB of memory; the issue is not seen on the 4.14 kernel.
We have done some debugging:
1. The OOM is due to filp allocations consuming too much memory: roughly 300 MB on a 2 GB board.
2. With the 120s hung-task detector, most of the tasks block in __wait_rcu_gp at wait_for_completion(&rs_array[i].completion); a sketch of the underlying pattern follows the trace in point 3.
[47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
[47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O 4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
[47571.887603] Call Trace:
[47571.890547] dump_stack+0x70/0xa5
[47571.894456] panic+0xe3/0x241
[47571.897977] ? wait_for_completion_timeout+0x72/0x1b0
[47571.903830] __wait_rcu_gp+0x17b/0x180
[47571.908226] synchronize_rcu.part.76+0x38/0x50
[47571.913393] ? __call_rcu.constprop.79+0x3a0/0x3a0
[47571.918948] ? __bpf_trace_rcu_invoke_callback+0x10/0x10
[47571.925094] synchronize_rcu+0x43/0x50
[47571.929487] evdev_detach_client+0x59/0x60
[47571.934264] evdev_release+0x4e/0xd0
[47571.938464] __fput+0xfa/0x1f0
[47571.942072] ____fput+0xe/0x10
[47571.945683] task_work_run+0x90/0xc0
[47571.949884] exit_to_usermode_loop+0x9f/0xb0
[47571.954855] do_syscall_64+0xfa/0x110
[47571.959151] entry_SYSCALL_64_after_hwframe+0x49/0xbe
3. After enabling the RCU trace events, we don't see any rcu_quiescent_state_report trace for a long time; we see rcu_callback events for rcu_preempt that are never followed by a matching rcu_invoke_callback.
[47572.040668] ps-12388 1d..1 47566097572us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
[47572.040707] ps-12388 1d... 47566097621us : rcu_callback: rcu_preempt rhp=00000000783a728b func=file_free_rcu 4354/82824
[47572.040734] ps-12388 1d..1 47566097622us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.040756] ps-12388 1d..1 47566097623us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.040778] ps-12388 1d..1 47566097623us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
[47572.040802] ps-12388 1d... 47566097674us : rcu_callback: rcu_preempt rhp=0000000042c76521 func=file_free_rcu 4354/82825
[47572.040824] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.040847] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.040868] ps-12388 1d..1 47566097676us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
[47572.040895] ps-12388 1d..1 47566097716us : rcu_callback: rcu_preempt rhp=000000005e40fde2 func=avc_node_free 4354/82826
[47572.040919] ps-12388 1d..1 47566097735us : rcu_callback: rcu_preempt rhp=00000000f80fe353 func=avc_node_free 4354/82827
[47572.040943] ps-12388 1d..1 47566097758us : rcu_callback: rcu_preempt rhp=000000007486f400 func=avc_node_free 4354/82828
[47572.040967] ps-12388 1d..1 47566097760us : rcu_callback: rcu_preempt rhp=00000000b87872a8 func=avc_node_free 4354/82829
[47572.040990] ps-12388 1d... 47566097789us : rcu_callback: rcu_preempt rhp=000000008c656343 func=file_free_rcu 4354/82830
[47572.041013] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.041036] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.041057] ps-12388 1d..1 47566097791us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
[47572.041081] ps-12388 1d... 47566097871us : rcu_callback: rcu_preempt rhp=000000007e6c898c func=file_free_rcu 4354/82831
[47572.041103] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
[47572.041126] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
[47572.041147] ps-12388 1d..1 47566097873us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
[47572.041170] ps-12388 1d... 47566097945us : rcu_callback: rcu_preempt rhp=0000000032f4f174 func=file_free_rcu 4354/82832
[47572.041193] ps-12388 1d..1 47566097946us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
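
To make points 1 and 2 concrete: the filp growth and the blocked tasks are two sides of the same deferred-free pattern. Here is a rough sketch of the two paths involved; this is illustrative only, not the exact VFS/evdev code, and struct file_obj and the function names are made up:

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Illustrative sketch, not the exact kernel code.  Path 1: the final
 * fput() defers freeing the file object through call_rcu().  If grace
 * periods stop completing, these file_free_rcu-style callbacks pile up
 * and the filp slab grows without bound -- matching the ~300 MB seen
 * at OOM time. */
struct file_obj {
	struct rcu_head f_rcuhead;
	/* ... */
};

static void file_free_cb(struct rcu_head *head)
{
	kfree(container_of(head, struct file_obj, f_rcuhead));
}

static void file_free(struct file_obj *f)
{
	call_rcu(&f->f_rcuhead, file_free_cb); /* freed after a grace period */
}

/* Path 2: an evdev-style detach waits synchronously for a grace period,
 * so every close() of the device blocks in __wait_rcu_gp() when the
 * grace period stalls -- matching the hung tasks above. */
static DEFINE_SPINLOCK(client_lock);

static void detach_client(struct list_head *client)
{
	spin_lock(&client_lock);
	list_del_rcu(client);	/* unlink from the reader-visible list */
	spin_unlock(&client_lock);
	synchronize_rcu();	/* wait for pre-existing readers to finish */
}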

Do you have any suggestions to debug the issue?


2018-11-29 13:09:13

by Paul E. McKenney

Subject: Re: rcu_preempt caused oom

On Thu, Nov 29, 2018 at 08:49:35AM +0000, He, Bo wrote:
> Hi,
> We are testing kernel 4.19.0 on Android. After running a monkey stress test for more than 24 hours, we see an OOM on 1 out of 10 boards with 2 GB of memory; the issue is not seen on the 4.14 kernel.
> We have done some debugging:
> 1. The OOM is due to filp allocations consuming too much memory: roughly 300 MB on a 2 GB board.
> 2. With the 120s hung-task detector, most of the tasks block in __wait_rcu_gp at wait_for_completion(&rs_array[i].completion);
> [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O 4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> [47571.887603] Call Trace:
> [47571.890547] dump_stack+0x70/0xa5
> [47571.894456] panic+0xe3/0x241
> [47571.897977] ? wait_for_completion_timeout+0x72/0x1b0
> [47571.903830] __wait_rcu_gp+0x17b/0x180
> [47571.908226] synchronize_rcu.part.76+0x38/0x50
> [47571.913393] ? __call_rcu.constprop.79+0x3a0/0x3a0
> [47571.918948] ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> [47571.925094] synchronize_rcu+0x43/0x50
> [47571.929487] evdev_detach_client+0x59/0x60
> [47571.934264] evdev_release+0x4e/0xd0
> [47571.938464] __fput+0xfa/0x1f0
> [47571.942072] ____fput+0xe/0x10
> [47571.945683] task_work_run+0x90/0xc0
> [47571.949884] exit_to_usermode_loop+0x9f/0xb0
> [47571.954855] do_syscall_64+0xfa/0x110
> [47571.959151] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 3. After enabling the RCU trace events, we don't see any rcu_quiescent_state_report trace for a long time; we see rcu_callback events for rcu_preempt that are never followed by a matching rcu_invoke_callback.
> [47572.040668] ps-12388 1d..1 47566097572us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> [47572.040707] ps-12388 1d... 47566097621us : rcu_callback: rcu_preempt rhp=00000000783a728b func=file_free_rcu 4354/82824
> [47572.040734] ps-12388 1d..1 47566097622us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.040756] ps-12388 1d..1 47566097623us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.040778] ps-12388 1d..1 47566097623us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> [47572.040802] ps-12388 1d... 47566097674us : rcu_callback: rcu_preempt rhp=0000000042c76521 func=file_free_rcu 4354/82825
> [47572.040824] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.040847] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.040868] ps-12388 1d..1 47566097676us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> [47572.040895] ps-12388 1d..1 47566097716us : rcu_callback: rcu_preempt rhp=000000005e40fde2 func=avc_node_free 4354/82826
> [47572.040919] ps-12388 1d..1 47566097735us : rcu_callback: rcu_preempt rhp=00000000f80fe353 func=avc_node_free 4354/82827
> [47572.040943] ps-12388 1d..1 47566097758us : rcu_callback: rcu_preempt rhp=000000007486f400 func=avc_node_free 4354/82828
> [47572.040967] ps-12388 1d..1 47566097760us : rcu_callback: rcu_preempt rhp=00000000b87872a8 func=avc_node_free 4354/82829
> [47572.040990] ps-12388 1d... 47566097789us : rcu_callback: rcu_preempt rhp=000000008c656343 func=file_free_rcu 4354/82830
> [47572.041013] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.041036] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.041057] ps-12388 1d..1 47566097791us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> [47572.041081] ps-12388 1d... 47566097871us : rcu_callback: rcu_preempt rhp=000000007e6c898c func=file_free_rcu 4354/82831
> [47572.041103] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> [47572.041126] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> [47572.041147] ps-12388 1d..1 47566097873us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> [47572.041170] ps-12388 1d... 47566097945us : rcu_callback: rcu_preempt rhp=0000000032f4f174 func=file_free_rcu 4354/82832
> [47572.041193] ps-12388 1d..1 47566097946us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
>
> Do you have any suggestions to debug the issue?

If you do not already have CONFIG_RCU_BOOST=y set, could you please
rebuild with that?

Could you also please send your .config file?

Thanx, Paul


2018-11-29 14:28:13

by Paul E. McKenney

Subject: Re: rcu_preempt caused oom

On Thu, Nov 29, 2018 at 05:06:47AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 29, 2018 at 08:49:35AM +0000, He, Bo wrote:
> > Hi,
> > We are testing kernel 4.19.0 on Android. After running a monkey stress test for more than 24 hours, we see an OOM on 1 out of 10 boards with 2 GB of memory; the issue is not seen on the 4.14 kernel.
> > We have done some debugging:
> > 1. The OOM is due to filp allocations consuming too much memory: roughly 300 MB on a 2 GB board.
> > 2. With the 120s hung-task detector, most of the tasks block in __wait_rcu_gp at wait_for_completion(&rs_array[i].completion);

Did you see any RCU CPU stall warnings? Or have those been disabled?
If they have been disabled, could you please rerun with them enabled?

> > [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> > [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O 4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> > [47571.887603] Call Trace:
> > [47571.890547] dump_stack+0x70/0xa5
> > [47571.894456] panic+0xe3/0x241
> > [47571.897977] ? wait_for_completion_timeout+0x72/0x1b0
> > [47571.903830] __wait_rcu_gp+0x17b/0x180
> > [47571.908226] synchronize_rcu.part.76+0x38/0x50
> > [47571.913393] ? __call_rcu.constprop.79+0x3a0/0x3a0
> > [47571.918948] ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> > [47571.925094] synchronize_rcu+0x43/0x50
> > [47571.929487] evdev_detach_client+0x59/0x60
> > [47571.934264] evdev_release+0x4e/0xd0
> > [47571.938464] __fput+0xfa/0x1f0
> > [47571.942072] ____fput+0xe/0x10
> > [47571.945683] task_work_run+0x90/0xc0
> > [47571.949884] exit_to_usermode_loop+0x9f/0xb0
> > [47571.954855] do_syscall_64+0xfa/0x110
> > [47571.959151] entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is indeed a task waiting on synchronize_rcu().

> > 3. After enabling the RCU trace events, we don't see any rcu_quiescent_state_report trace for a long time; we see rcu_callback events for rcu_preempt that are never followed by a matching rcu_invoke_callback.
> > [47572.040668] ps-12388 1d..1 47566097572us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040707] ps-12388 1d... 47566097621us : rcu_callback: rcu_preempt rhp=00000000783a728b func=file_free_rcu 4354/82824
> > [47572.040734] ps-12388 1d..1 47566097622us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040756] ps-12388 1d..1 47566097623us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040778] ps-12388 1d..1 47566097623us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040802] ps-12388 1d... 47566097674us : rcu_callback: rcu_preempt rhp=0000000042c76521 func=file_free_rcu 4354/82825
> > [47572.040824] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040847] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040868] ps-12388 1d..1 47566097676us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040895] ps-12388 1d..1 47566097716us : rcu_callback: rcu_preempt rhp=000000005e40fde2 func=avc_node_free 4354/82826
> > [47572.040919] ps-12388 1d..1 47566097735us : rcu_callback: rcu_preempt rhp=00000000f80fe353 func=avc_node_free 4354/82827
> > [47572.040943] ps-12388 1d..1 47566097758us : rcu_callback: rcu_preempt rhp=000000007486f400 func=avc_node_free 4354/82828
> > [47572.040967] ps-12388 1d..1 47566097760us : rcu_callback: rcu_preempt rhp=00000000b87872a8 func=avc_node_free 4354/82829
> > [47572.040990] ps-12388 1d... 47566097789us : rcu_callback: rcu_preempt rhp=000000008c656343 func=file_free_rcu 4354/82830
> > [47572.041013] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041036] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041057] ps-12388 1d..1 47566097791us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.041081] ps-12388 1d... 47566097871us : rcu_callback: rcu_preempt rhp=000000007e6c898c func=file_free_rcu 4354/82831
> > [47572.041103] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041126] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041147] ps-12388 1d..1 47566097873us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.041170] ps-12388 1d... 47566097945us : rcu_callback: rcu_preempt rhp=0000000032f4f174 func=file_free_rcu 4354/82832
> > [47572.041193] ps-12388 1d..1 47566097946us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf

Callbacks are being queued and future grace periods to handle them are being
requested, but as you say, no progress on the current grace period.

Is it possible to start the trace earlier?

> > Do you have any suggestions to debug the issue?
>
> If you do not already have CONFIG_RCU_BOOST=y set, could you please
> rebuild with that?
>
> Could you also please send your .config file?

So, to summarize:

1. If you don't have RCU CPU stall warnings enabled,
please enable them. For example, please remove
rcupdate.rcu_cpu_stall_suppress from the kernel boot
parameters if it is there.

Getting an RCU CPU stall warning would be extremely
helpful. It contains many useful diagnostics.

2. If possible, please start the trace before the last
grace period starts.

3. If CONFIG_RCU_BOOST=y is not set, please try setting it.

4. Please send me your .config file.

Thanx, Paul


2018-11-30 08:06:25

by He, Bo

Subject: RE: rcu_preempt caused oom

Thanks for your great suggestions.
After enabling CONFIG_RCU_BOOST=y, we have not reproduced the issue so far; we will keep the test running and update you with the results.

The enclosed is the kernel config; below are the RCU-related options grepped from it. We don't enable CONFIG_RCU_BOOST in our build.
# RCU Subsystem
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# RCU Debugging
CONFIG_RCU_PERF_TEST=m
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_TRACE=y
CONFIG_RCU_EQS_DEBUG=y


-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Thursday, November 29, 2018 10:27 PM
To: He, Bo <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
Subject: Re: rcu_preempt caused oom

On Thu, Nov 29, 2018 at 05:06:47AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 29, 2018 at 08:49:35AM +0000, He, Bo wrote:
> > Hi,
> > We are testing kernel 4.19.0 on Android. After running a monkey stress test for more than 24 hours, we see an OOM on 1 out of 10 boards with 2 GB of memory; the issue is not seen on the 4.14 kernel.
> > We have done some debugging:
> > 1. The OOM is due to filp allocations consuming too much memory: roughly 300 MB on a 2 GB board.
> > 2. With the 120s hung-task detector, most of the tasks block in __wait_rcu_gp at wait_for_completion(&rs_array[i].completion);

Did you see any RCU CPU stall warnings? Or have those been disabled?
If they have been disabled, could you please rerun with them enabled?

> > [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> > [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O 4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> > [47571.887603] Call Trace:
> > [47571.890547] dump_stack+0x70/0xa5
> > [47571.894456] panic+0xe3/0x241
> > [47571.897977] ? wait_for_completion_timeout+0x72/0x1b0
> > [47571.903830] __wait_rcu_gp+0x17b/0x180
> > [47571.908226] synchronize_rcu.part.76+0x38/0x50
> > [47571.913393] ? __call_rcu.constprop.79+0x3a0/0x3a0
> > [47571.918948] ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> > [47571.925094] synchronize_rcu+0x43/0x50
> > [47571.929487] evdev_detach_client+0x59/0x60
> > [47571.934264] evdev_release+0x4e/0xd0
> > [47571.938464] __fput+0xfa/0x1f0
> > [47571.942072] ____fput+0xe/0x10
> > [47571.945683] task_work_run+0x90/0xc0
> > [47571.949884] exit_to_usermode_loop+0x9f/0xb0
> > [47571.954855] do_syscall_64+0xfa/0x110
> > [47571.959151] entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is indeed a task waiting on synchronize_rcu().

> > 3. After enabling the RCU trace events, we don't see any rcu_quiescent_state_report trace for a long time; we see rcu_callback events for rcu_preempt that are never followed by a matching rcu_invoke_callback.
> > [47572.040668] ps-12388 1d..1 47566097572us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040707] ps-12388 1d... 47566097621us : rcu_callback: rcu_preempt rhp=00000000783a728b func=file_free_rcu 4354/82824
> > [47572.040734] ps-12388 1d..1 47566097622us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040756] ps-12388 1d..1 47566097623us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040778] ps-12388 1d..1 47566097623us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040802] ps-12388 1d... 47566097674us : rcu_callback: rcu_preempt rhp=0000000042c76521 func=file_free_rcu 4354/82825
> > [47572.040824] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040847] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040868] ps-12388 1d..1 47566097676us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040895] ps-12388 1d..1 47566097716us : rcu_callback: rcu_preempt rhp=000000005e40fde2 func=avc_node_free 4354/82826
> > [47572.040919] ps-12388 1d..1 47566097735us : rcu_callback: rcu_preempt rhp=00000000f80fe353 func=avc_node_free 4354/82827
> > [47572.040943] ps-12388 1d..1 47566097758us : rcu_callback: rcu_preempt rhp=000000007486f400 func=avc_node_free 4354/82828
> > [47572.040967] ps-12388 1d..1 47566097760us : rcu_callback: rcu_preempt rhp=00000000b87872a8 func=avc_node_free 4354/82829
> > [47572.040990] ps-12388 1d... 47566097789us : rcu_callback: rcu_preempt rhp=000000008c656343 func=file_free_rcu 4354/82830
> > [47572.041013] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041036] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041057] ps-12388 1d..1 47566097791us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.041081] ps-12388 1d... 47566097871us : rcu_callback: rcu_preempt rhp=000000007e6c898c func=file_free_rcu 4354/82831
> > [47572.041103] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041126] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041147] ps-12388 1d..1 47566097873us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.041170] ps-12388 1d... 47566097945us : rcu_callback: rcu_preempt rhp=0000000032f4f174 func=file_free_rcu 4354/82832
> > [47572.041193] ps-12388 1d..1 47566097946us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf

Callbacks are being queued and future grace periods to handle them are being requested, but as you say, no progress on the current grace period.

Is it possible to start the trace earlier?

> > Do you have any suggestions to debug the issue?
>
> If you do not already have CONFIG_RCU_BOOST=y set, could you please
> rebuild with that?
>
> Could you also please send your .config file?

So, to summarize:

1. If you don't have RCU CPU stall warnings enabled,
please enable them. For example, please remove
rcupdate.rcu_cpu_stall_suppress from the kernel boot
parameters if it is there.

Getting an RCU CPU stall warning would be extremely
helpful. It contains many useful diagnostics.

2. If possible, please start the trace before the last
grace period starts.

3. If CONFIG_RCU_BOOST=y is not set, please try setting it.

4. Please send me your .config file.

Thanx, Paul


Attachments:
kernel_config (165.00 kB)

2018-11-30 14:44:28

by Paul E. McKenney

Subject: Re: rcu_preempt caused oom

On Fri, Nov 30, 2018 at 08:03:38AM +0000, He, Bo wrote:
> Thanks for your great suggestions.
> After enabling CONFIG_RCU_BOOST=y, we have not reproduced the issue so far; we will keep the test running and update you with the results.
>
> The enclosed is the kernel config; below are the RCU-related options grepped from it. We don't enable CONFIG_RCU_BOOST in our build.
> # RCU Subsystem
> CONFIG_PREEMPT_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> # RCU Debugging
> CONFIG_RCU_PERF_TEST=m
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_EQS_DEBUG=y

Thank you!

What likely happened is that a low-priority RCU reader was preempted
indefinitely. Though I would have expected an RCU CPU stall warning
in that case, so it might well be that something else is going on.
Could you please send me your list of kernel boot parameters? They
usually appear near the start of your console output.
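
Schematically, the suspected scenario is an ordinary preemptible reader like the sketch below (illustrative only; struct item and global_list are made-up names). Under CONFIG_PREEMPT_RCU, a low-priority task preempted between rcu_read_lock() and rcu_read_unlock() holds up grace-period completion until it runs again; CONFIG_RCU_BOOST=y would let RCU priority-boost it through an rt_mutex so the grace period can end:

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>

/* Illustrative sketch; struct item and global_list are made-up names. */
struct item {
	struct list_head node;
	int data;
};

static LIST_HEAD(global_list);

static int reader(void)
{
	struct item *p;
	int sum = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(p, &global_list, node)
		sum += p->data;	/* preemption here can stall the grace period */
	rcu_read_unlock();	/* the reader is accounted done only here */
	return sum;
}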

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, November 29, 2018 10:27 PM
> To: He, Bo <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Thu, Nov 29, 2018 at 05:06:47AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 29, 2018 at 08:49:35AM +0000, He, Bo wrote:
> > > Hi,
> > > We are testing kernel 4.19.0 on Android. After running a monkey stress test for more than 24 hours, we see an OOM on 1 out of 10 boards with 2 GB of memory; the issue is not seen on the 4.14 kernel.
> > > We have done some debugging:
> > > 1. The OOM is due to filp allocations consuming too much memory: roughly 300 MB on a 2 GB board.
> > > 2. With the 120s hung-task detector, most of the tasks block in __wait_rcu_gp at wait_for_completion(&rs_array[i].completion);
>
> Did you see any RCU CPU stall warnings? Or have those been disabled?
> If they have been disabled, could you please rerun with them enabled?
>
> > > [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> > > [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O 4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> > > [47571.887603] Call Trace:
> > > [47571.890547] dump_stack+0x70/0xa5
> > > [47571.894456] panic+0xe3/0x241
> > > [47571.897977] ? wait_for_completion_timeout+0x72/0x1b0
> > > [47571.903830] __wait_rcu_gp+0x17b/0x180
> > > [47571.908226] synchronize_rcu.part.76+0x38/0x50
> > > [47571.913393] ? __call_rcu.constprop.79+0x3a0/0x3a0
> > > [47571.918948] ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> > > [47571.925094] synchronize_rcu+0x43/0x50
> > > [47571.929487] evdev_detach_client+0x59/0x60
> > > [47571.934264] evdev_release+0x4e/0xd0
> > > [47571.938464] __fput+0xfa/0x1f0
> > > [47571.942072] ____fput+0xe/0x10
> > > [47571.945683] task_work_run+0x90/0xc0
> > > [47571.949884] exit_to_usermode_loop+0x9f/0xb0
> > > [47571.954855] do_syscall_64+0xfa/0x110
> > > [47571.959151] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> This is indeed a task waiting on synchronize_rcu().
>
> > > 3. After enabling the RCU trace events, we don't see any rcu_quiescent_state_report trace for a long time; we see rcu_callback events for rcu_preempt that are never followed by a matching rcu_invoke_callback.
> > > [47572.040668] ps-12388 1d..1 47566097572us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > > [47572.040707] ps-12388 1d... 47566097621us : rcu_callback: rcu_preempt rhp=00000000783a728b func=file_free_rcu 4354/82824
> > > [47572.040734] ps-12388 1d..1 47566097622us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > > [47572.040756] ps-12388 1d..1 47566097623us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > > [47572.040778] ps-12388 1d..1 47566097623us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > > [47572.040802] ps-12388 1d... 47566097674us : rcu_callback: rcu_preempt rhp=0000000042c76521 func=file_free_rcu 4354/82825
> > > [47572.040824] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > > [47572.040847] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > > [47572.040868] ps-12388 1d..1 47566097676us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > > [47572.040895] ps-12388 1d..1 47566097716us : rcu_callback: rcu_preempt rhp=000000005e40fde2 func=avc_node_free 4354/82826
> > > [47572.040919] ps-12388 1d..1 47566097735us : rcu_callback: rcu_preempt rhp=00000000f80fe353 func=avc_node_free 4354/82827
> > > [47572.040943] ps-12388 1d..1 47566097758us : rcu_callback: rcu_preempt rhp=000000007486f400 func=avc_node_free 4354/82828
> > > [47572.040967] ps-12388 1d..1 47566097760us : rcu_callback: rcu_preempt rhp=00000000b87872a8 func=avc_node_free 4354/82829
> > > [47572.040990] ps-12388 1d... 47566097789us : rcu_callback: rcu_preempt rhp=000000008c656343 func=file_free_rcu 4354/82830
> > > [47572.041013] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > > [47572.041036] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > > [47572.041057] ps-12388 1d..1 47566097791us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > > [47572.041081] ps-12388 1d... 47566097871us : rcu_callback: rcu_preempt rhp=000000007e6c898c func=file_free_rcu 4354/82831
> > > [47572.041103] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > > [47572.041126] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > > [47572.041147] ps-12388 1d..1 47566097873us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > > [47572.041170] ps-12388 1d... 47566097945us : rcu_callback: rcu_preempt rhp=0000000032f4f174 func=file_free_rcu 4354/82832
> > > [47572.041193] ps-12388 1d..1 47566097946us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
>
> Callbacks are being queued and future grace periods to handle them are being requested, but as you say, no progress on the current grace period.
>
> Is it possible to start the trace earlier?
>
> > > Do you have any suggestions to debug the issue?
> >
> > If you do not already have CONFIG_RCU_BOOST=y set, could you please
> > rebuild with that?
> >
> > Could you also please send your .config file?
>
> So, to summarize:
>
> 1. If you don't have RCU CPU stall warnings enabled,
> please enable them. For example, please remove
> rcupdate.rcu_cpu_stall_suppress from the kernel boot
> parameters if it is there.
>
> Getting an RCU CPU stall warning would be extremely
> helpful. It contains many useful diagnostics.
>
> 2. If possible, please start the trace before the last
> grace period starts.
>
> 3. If CONFIG_RCU_BOOST=y is not set, please try setting it.
>
> 4. Please send me your .config file.
>
> Thanx, Paul
>



2018-11-30 15:18:33

by Steven Rostedt

Subject: Re: rcu_preempt caused oom

On Fri, 30 Nov 2018 06:43:17 -0800
"Paul E. McKenney" <[email protected]> wrote:

> Could you please send me your list of kernel boot parameters? They
> usually appear near the start of your console output.

Or just: cat /proc/cmdline

-- Steve

2018-11-30 15:20:54

by He, Bo

Subject: RE: rcu_preempt caused oom

Here is the kernel cmdline:

Kernel command line: androidboot.acpio_idx=0 androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt loglevel=4 androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/ pstore.backend=ramoops memmap=0x1400000$0x50000000 ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000 ramoops.record_size=0x4000 ramoops.console_size=0x1000000 ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=

-----Original Message-----
From: Steven Rostedt <[email protected]>
Sent: Friday, November 30, 2018 11:17 PM
To: Paul E. McKenney <[email protected]>
Cc: He, Bo <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
Subject: Re: rcu_preempt caused oom

On Fri, 30 Nov 2018 06:43:17 -0800
"Paul E. McKenney" <[email protected]> wrote:

> Could you please send me your list of kernel boot parameters? They
> usually appear near the start of your console output.

Or just: cat /proc/cmdline

-- Steve

2018-11-30 16:50:20

by Paul E. McKenney

Subject: Re: rcu_preempt caused oom

On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> Here is the kernel cmdline:

Thank you!

> Kernel command line: androidboot.acpio_idx=0 androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt loglevel=4 androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/ pstore.backend=ramoops memmap=0x1400000$0x50000000 ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000 ramoops.record_size=0x4000 ramoops.console_size=0x1000000 ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
> i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=

And no sign of any suppression of RCU CPU stall warnings. Hmmm...
Does it take more than 21 seconds to OOM? Or do things happen faster
than that? If they do happen faster than that, then one approach would
be to add something like this to the kernel command line:

rcupdate.rcu_cpu_stall_timeout=7

This would set the stall timeout to seven seconds. Note that timeouts
less than three seconds are silently interpreted as three seconds.

Thanx, Paul

> -----Original Message-----
> From: Steven Rostedt <[email protected]>
> Sent: Friday, November 30, 2018 11:17 PM
> To: Paul E. McKenney <[email protected]>
> Cc: He, Bo <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Fri, 30 Nov 2018 06:43:17 -0800
> "Paul E. McKenney" <[email protected]> wrote:
>
> > Could you please send me your list of kernel boot parameters? They
> > usually appear near the start of your console output.
>
> Or just: cat /proc/cmdline
>
> -- Steve
>


2018-12-03 07:46:12

by He, Bo

Subject: RE: rcu_preempt caused oom

Thanks. We have run the test over the whole weekend without reproducing the issue, so we confirm that CONFIG_RCU_BOOST fixes it.

We have enabled rcupdate.rcu_cpu_stall_timeout=7 and also set panic on RCU stall; we will see whether we can catch the panic and will keep you posted with the test results.
echo 1 > /proc/sys/kernel/panic_on_rcu_stall

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Saturday, December 1, 2018 12:49 AM
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
Subject: Re: rcu_preempt caused oom

On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> Here is the kernel cmdline:

Thank you!

> Kernel command line: androidboot.acpio_idx=0 androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt loglevel=4 androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/ pstore.backend=ramoops memmap=0x1400000$0x50000000 ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000 ramoops.record_size=0x4000 ramoops.console_size=0x1000000 ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
> i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=

And no sign of any suppression of RCU CPU stall warnings. Hmmm...
Does it take more than 21 seconds to OOM? Or do things happen faster than that? If they do happen faster than that, then one approach would be to add something like this to the kernel command line:

rcupdate.rcu_cpu_stall_timeout=7

This would set the stall timeout to seven seconds. Note that timeouts less than three seconds are silently interpreted as three seconds.

Thanx, Paul

> -----Original Message-----
> From: Steven Rostedt <[email protected]>
> Sent: Friday, November 30, 2018 11:17 PM
> To: Paul E. McKenney <[email protected]>
> Cc: He, Bo <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin
> <[email protected]>; Zhang, Yanmin <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Fri, 30 Nov 2018 06:43:17 -0800
> "Paul E. McKenney" <[email protected]> wrote:
>
> > Could you please send me your list of kernel boot parameters? They
> > usually appear near the start of your console output.
>
> Or just: cat /proc/cmdline
>
> -- Steve
>


2018-12-03 13:58:48

by Paul E. McKenney

Subject: Re: rcu_preempt caused oom

On Mon, Dec 03, 2018 at 07:44:03AM +0000, He, Bo wrote:
> Thanks. We have run the test over the whole weekend without reproducing the issue, so we confirm that CONFIG_RCU_BOOST fixes it.

Very good, that is encouraging. Perhaps I should think about making
CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least
for architectures for which rt_mutexes are implemented.

> We have enabled rcupdate.rcu_cpu_stall_timeout=7 and also set panic on RCU stall; we will see whether we can catch the panic and will keep you posted with the test results.
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall

Looking forward to seeing what is going on! Of course, to reproduce, you
will need to again build with CONFIG_RCU_BOOST=n.

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Saturday, December 1, 2018 12:49 AM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> > Here is the kernel cmdline:
>
> Thank you!
>
> > Kernel command line: androidboot.acpio_idx=0 androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt loglevel=4 androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/ pstore.backend=ramoops memmap=0x1400000$0x50000000 ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000 ramoops.record_size=0x4000 ramoops.console_size=0x1000000 ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
> > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=
>
> And no sign of any suppression of RCU CPU stall warnings. Hmmm...
> Does it take more than 21 seconds to OOM? Or do things happen faster than that? If they do happen faster than that, then one approach would be to add something like this to the kernel command line:
>
> rcupdate.rcu_cpu_stall_timeout=7
>
> This would set the stall timeout to seven seconds. Note that timeouts less than three seconds are silently interpreted as three seconds.
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Steven Rostedt <[email protected]>
> > Sent: Friday, November 30, 2018 11:17 PM
> > To: Paul E. McKenney <[email protected]>
> > Cc: He, Bo <[email protected]>; [email protected];
> > [email protected]; [email protected];
> > [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin
> > <[email protected]>; Zhang, Yanmin <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Fri, 30 Nov 2018 06:43:17 -0800
> > "Paul E. McKenney" <[email protected]> wrote:
> >
> > > Could you please send me your list of kernel boot parameters? They
> > > usually appear near the start of your console output.
> >
> > Or just: cat /proc/cmdline
> >
> > -- Steve
> >
>


2018-12-04 07:52:14

by He, Bo

Subject: RE: rcu_preempt caused oom

Hi, Paul:
The enclosed log shows the 120s hung_task_panic triggering without any other debug patches. The hung task is blocked at __wait_rcu_gp, which means the RCU CPU stall detector cannot catch this scenario:
echo 1 > /proc/sys/kernel/panic_on_rcu_stall
echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout


-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Monday, December 3, 2018 9:57 PM
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
Subject: Re: rcu_preempt caused oom

On Mon, Dec 03, 2018 at 07:44:03AM +0000, He, Bo wrote:
> Thanks. We have run the test over the whole weekend without reproducing the issue, so we confirm that CONFIG_RCU_BOOST fixes it.

Very good, that is encouraging. Perhaps I should think about making CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for architectures for which rt_mutexes are implemented.

> We have enabled rcupdate.rcu_cpu_stall_timeout=7 and also set panic on RCU stall; we will see whether we can catch the panic and will keep you posted with the test results.
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall

Looking forward to seeing what is going on! Of course, to reproduce, you will need to again build with CONFIG_RCU_BOOST=n.

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Saturday, December 1, 2018 12:49 AM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; Zhang, Jun
> <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> > Here is the kernel cmdline:
>
> Thank you!
>
> > Kernel command line: androidboot.acpio_idx=0 androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt loglevel=4 androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/ pstore.backend=ramoops memmap=0x1400000$0x50000000 ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000 ramoops.record_size=0x4000 ramoops.console_size=0x1000000 ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
> > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=
>
> And no sign of any suppression of RCU CPU stall warnings. Hmmm...
> > Does it take more than 21 seconds to OOM? Or do things happen faster than that? If they do happen faster than that, then one approach would be to add something like this to the kernel command line:
>
> rcupdate.rcu_cpu_stall_timeout=7
>
> This would set the stall timeout to seven seconds. Note that timeouts less than three seconds are silently interpreted as three seconds.
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Steven Rostedt <[email protected]>
> > Sent: Friday, November 30, 2018 11:17 PM
> > To: Paul E. McKenney <[email protected]>
> > Cc: He, Bo <[email protected]>; [email protected];
> > [email protected]; [email protected];
> > [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin
> > <[email protected]>; Zhang, Yanmin <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Fri, 30 Nov 2018 06:43:17 -0800
> > "Paul E. McKenney" <[email protected]> wrote:
> >
> > > Could you please send me your list of kernel boot parameters?
> > > They usually appear near the start of your console output.
> >
> > Or just: cat /proc/cmdline
> >
> > -- Steve
> >
>


Attachments:
apanic_console (26.67 kB)

2018-12-04 19:50:56

by Paul E. McKenney

Subject: Re: rcu_preempt caused oom

On Tue, Dec 04, 2018 at 07:50:04AM +0000, He, Bo wrote:
> Hi, Paul:
> The enclosed log shows the 120s hung_task_panic triggering without any other debug patches. The hung task is blocked at __wait_rcu_gp, which means the RCU CPU stall detector cannot catch this scenario:
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

Not necessarily. If there is an RCU CPU stall warning, blocking within
__wait_rcu_gp() is expected behavior. It is possible that the problem is
that although the grace period is completing as required, the callbacks
are not being invoked in a timely fashion. And that could happen if you
had CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively,
callback offloading enabled. But I don't see these in your previous
emails. Another possible cause is that the grace-period kthread is being
delayed, so that the grace period never starts. This seems unlikely,
but it is the only thing thus far that matches the symptoms.

CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to
be run at SCHED_FIFO priority 1, and that would help in the case where
RCU's grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks,
all of which execute in the rcu_gp_kthread() function) was being starved
of CPU time.

Does that sound likely?

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Monday, December 3, 2018 9:57 PM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Mon, Dec 03, 2018 at 07:44:03AM +0000, He, Bo wrote:
> > Thanks. We have run the test over the whole weekend without reproducing the issue, so we confirm that CONFIG_RCU_BOOST fixes it.
>
> Very good, that is encouraging. Perhaps I should think about making CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for architectures for which rt_mutexes are implemented.
>
> > We have enabled rcupdate.rcu_cpu_stall_timeout=7 and also set panic on RCU stall; we will see whether we can catch the panic and will keep you posted with the test results.
> > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
>
> Looking forward to seeing what is going on! Of course, to reproduce, you will need to again build with CONFIG_RCU_BOOST=n.
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Saturday, December 1, 2018 12:49 AM
> > To: He, Bo <[email protected]>
> > Cc: Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Zhang, Jun
> > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> > > Here is the kernel cmdline:
> >
> > Thank you!
> >
> > > Kernel command line: androidboot.acpio_idx=0 androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt loglevel=4 androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/ pstore.backend=ramoops memmap=0x1400000$0x50000000 ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000 ramoops.record_size=0x4000 ramoops.console_size=0x1000000 ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
> > > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=
> >
> > And no sign of any suppression of RCU CPU stall warnings. Hmmm...
> > Does it take more than 21 seconds to OOM? Or do things happen faster than that? If they do happen faster than that, then one approach would be to add something like this to the kernel command line:
> >
> > rcupdate.rcu_cpu_stall_timeout=7
> >
> > This would set the stall timeout to seven seconds. Note that timeouts less than three seconds are silently interpreted as three seconds.
> >
> > Thanx, Paul
> >
> > > -----Original Message-----
> > > From: Steven Rostedt <[email protected]>
> > > Sent: Friday, November 30, 2018 11:17 PM
> > > To: Paul E. McKenney <[email protected]>
> > > Cc: He, Bo <[email protected]>; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin
> > > <[email protected]>; Zhang, Yanmin <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Fri, 30 Nov 2018 06:43:17 -0800
> > > "Paul E. McKenney" <[email protected]> wrote:
> > >
> > > > Could you please send me your list of kernel boot parameters?
> > > > They usually appear near the start of your console output.
> > >
> > > Or just: cat /proc/cmdline
> > >
> > > -- Steve
> > >
> >
>



2018-12-05 08:43:47

by He, Bo

Subject: RE: rcu_preempt caused oom

I double-checked the .config; we don't enable CONFIG_NO_HZ_FULL.
Our previous logs dump all the task backtraces, and the kthreads (the rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state, not "R" state. My understanding is that if the problem were RCU's kthreads being starved of CPU time (which running them at SCHED_FIFO priority 1 would fix), the kthreads should be in "R" state.

I will do more experiments and keep you updated once we have more findings:
1. Set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see whether the issue reproduces (a userspace sketch follows this list).
2. Collect more ftrace data to confirm why there is no trace_rcu_quiescent_state_report and why most of the trace_rcu_grace_period events are "AccWaitCB".
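
For experiment 1, a minimal userspace sketch of what we plan to try (assuming we take the PID of the rcu_preempt kthread from ps; this mimics from userspace the SCHED_FIFO priority-1 side effect of CONFIG_RCU_BOOST=y, and is equivalent to "chrt -f -p 1 <pid>"):

#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	struct sched_param sp = { .sched_priority = 1 };

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid-of-rcu_preempt>\n", argv[0]);
		return 1;
	}
	/* Needs root; the target PID is assumed to be an RCU kthread. */
	if (sched_setscheduler(atoi(argv[1]), SCHED_FIFO, &sp)) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}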

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Wednesday, December 5, 2018 3:50 AM
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
Subject: Re: rcu_preempt caused oom

On Tue, Dec 04, 2018 at 07:50:04AM +0000, He, Bo wrote:
> Hi, Paul:
> The enclosed log shows the 120s hung_task_panic triggering without any other debug patches. The hung task is blocked at __wait_rcu_gp, which means the RCU CPU stall detector cannot catch this scenario:
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

Not necessarily. If there is an RCU CPU stall warning, blocking within
__wait_rcu_gp() is expected behavior. It is possible that the problem is that although the grace period is completing as required, the callbacks are not being invoked in a timely fashion. And that could happen if you had CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback offloading enabled. But I don't see these in your previous emails. Another possible cause is that the grace-period kthread is being delayed, so that the grace period never starts. This seems unlikely, but it is the only thing thus far that matches the symptoms.

CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at SCHED_FIFO priority 1, and that would help in the case where RCU's grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of which execute in the rcu_gp_kthread() function) was being starved of CPU time.

Does that sound likely?

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Monday, December 3, 2018 9:57 PM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; Zhang, Jun
> <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Mon, Dec 03, 2018 at 07:44:03AM +0000, He, Bo wrote:
> > Thanks. We have run the test over the whole weekend without reproducing the issue, so we confirm that CONFIG_RCU_BOOST fixes it.
>
> Very good, that is encouraging. Perhaps I should think about making CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for architectures for which rt_mutexes are implemented.
>
> > We have enabled rcupdate.rcu_cpu_stall_timeout=7 and also set panic on RCU stall; we will see whether we can catch the panic and will keep you posted with the test results.
> > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
>
> Looking forward to seeing what is going on! Of course, to reproduce, you will need to again build with CONFIG_RCU_BOOST=n.
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Saturday, December 1, 2018 12:49 AM
> > To: He, Bo <[email protected]>
> > Cc: Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Zhang, Jun
> > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> > > Here is the kernel cmdline:
> >
> > Thank you!
> >
> > > Kernel command line: androidboot.acpio_idx=0 androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug androidboot.diskbus=00.0 androidboot.verifiedbootstate=green androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt loglevel=4 androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/ pstore.backend=ramoops memmap=0x1400000$0x50000000 ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000 ramoops.record_size=0x4000 ramoops.console_size=0x1000000 ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
> > > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=
> >
> > And no sign of any suppression of RCU CPU stall warnings. Hmmm...
> > Does it take more than 21 seconds to OOM? Or do things happen faster than that? If they do happen faster than that, then one approach would be to add something like this to the kernel command line:
> >
> > rcupdate.rcu_cpu_stall_timeout=7
> >
> > This would set the stall timeout to seven seconds. Note that timeouts less than three seconds are silently interpreted as three seconds.
> >
> > Thanx, Paul
> >
> > > -----Original Message-----
> > > From: Steven Rostedt <[email protected]>
> > > Sent: Friday, November 30, 2018 11:17 PM
> > > To: Paul E. McKenney <[email protected]>
> > > Cc: He, Bo <[email protected]>; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; Zhang, Jun <[email protected]>; Xiao,
> > > Jin <[email protected]>; Zhang, Yanmin <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Fri, 30 Nov 2018 06:43:17 -0800 "Paul E. McKenney"
> > > <[email protected]> wrote:
> > >
> > > > Could you please send me your list of kernel boot parameters?
> > > > They usually appear near the start of your console output.
> > >
> > > Or just: cat /proc/cmdline
> > >
> > > -- Steve
> > >
> >
>



2018-12-05 17:45:45

by Paul E. McKenney

Subject: Re: rcu_preempt caused oom

On Wed, Dec 05, 2018 at 08:42:54AM +0000, He, Bo wrote:
> I double-checked the .config; we don't enable CONFIG_NO_HZ_FULL.
> Our previous logs dump all the task backtraces, and the kthreads (the rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state, not "R" state. My understanding is that if the problem were RCU's kthreads being starved of CPU time (which running them at SCHED_FIFO priority 1 would fix), the kthreads should be in "R" state.

Hmmm... Well, the tasks could in theory be waiting on a blocking mutex.
But in practice the grace-period kthreads wait on events, so that makes
no sense.

Is it possible for you to dump out the grace-period kthread's stack,
for example, with sysrq-t? (Steve might know a better way to do this.)
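
If sysrq-t is too heavy (it dumps every task on the system), a lighter-weight alternative is to read the kthread's kernel stack from procfs; a minimal sketch, assuming CONFIG_STACKTRACE=y, root privileges, and the rcu_preempt PID taken from ps:

#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid-of-rcu_preempt>\n", argv[0]);
		return 1;
	}
	/* /proc/<pid>/stack shows the task's current kernel stack trace. */
	snprintf(path, sizeof(path), "/proc/%s/stack", argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}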

> I will do more experiments and keep you updated once we have more findings:
> 1. Set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see whether the issue reproduces.

That sounds like a most excellent experiment!

> 2. Collect more ftrace data to confirm why there is no trace_rcu_quiescent_state_report and why most of the trace_rcu_grace_period events are "AccWaitCB".

As noted earlier, to see something interesting, you will need to start
the ftrace before the grace period starts. This would probably mean
having ftrace running before starting the test. Starting the ftrace
after the hang commences is unlikely to produce useful information.
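
For example, a possible tracefs setup to have in place before starting
the test (assuming the usual /sys/kernel/debug/tracing mount point):

cd /sys/kernel/debug/tracing
echo 1 > events/rcu/enable       # enable all rcu:* tracepoints
echo 1 > tracing_on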

Thanx, Paul



2018-12-06 17:40:43

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Thu, Dec 06, 2018 at 01:23:01PM +0000, He, Bo wrote:
> 1. The test is positive after setting the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST; the issue has not reproduced so far.
> 2. Here is an earlier log with ftrace_dump enabled, from which we can get 4 seconds of ftrace. The panic was triggered with the enclosed debug patch, which replaced wait_for_completion(&rs_array[i].completion) with wait_for_completion_timeout(&rs_array[i].completion, 3*HZ) in __wait_rcu_gp(). The logs also enabled lockdep to dump the held locks, and dumped all task backtraces.
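
(For concreteness, that debug change amounts to something like the
following sketch in __wait_rcu_gp(); the panic message here is
illustrative, not the exact one from the patch:

	/* was: wait_for_completion(&rs_array[i].completion); */
	if (!wait_for_completion_timeout(&rs_array[i].completion, 3 * HZ))
		panic("__wait_rcu_gp: grace period hang");

so a stalled grace period now panics with backtraces instead of
hanging silently until the OOM.)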

Thank you for collecting this information!

(By the way, the usual downside of the priority increase is increased
context-switch rate and thus CPU overhead.)

And all three grace-period kthreads are blocked apparently in their
top-level loops (though inlining and all that). There are quite a few
preemptions ("72738.702815: rcu_preempt_task: rcu_preempt"), but they
are all blocking the next grace period (29041008), not the current one
(29041004). And the "rcu_unlock_preempted_task" trace records flag the
current grace-period sequence number as 29041004, which means that there
is no grace period in progress, that is, RCU is idle.
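
(For reference, the low two bits of these gp_seq numbers carry the
grace-period state; a sketch of the relevant helpers from
kernel/rcu/rcu.h:

#define RCU_SEQ_CTR_SHIFT 2
#define RCU_SEQ_STATE_MASK ((1 << RCU_SEQ_CTR_SHIFT) - 1)

static inline int rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;
}

Here 29041004 & 3 == 0, hence no grace period in flight.)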

Which explains why there is no RCU CPU stall warning -- after all, if
there is no grace period in flight, it is not possible to stall that
non-existent grace period.

That also could explain why increasing the priority of the grace-period
kthreads gets things going again. There have been a great number of
requests for a new grace period (for example, "rcu_future_grace_period:
rcu_preempt 29041004 29041008 0 0 3 Startleaf"), so as soon as the
grace-period kthread wakes up, a new grace period will start.

Except that the rcu_preempt task says "I" rather than "R", as you noted
in an earlier email.

And there should have been multiple attempts to wake up the grace-period
kthread, because there are lots of callbacks queued as in 136,045 of
them ("rcu_callback: rcu_preempt rhp=0000000066f735c9 func=file_free_rcu
2811/136045"). Which is of course why you are seeing the OOM.

So the question becomes "Why is the grace-period kthread being awakened
so many times, but not actually waking up?" In the past, there was a
scheduler bug that could cause that, but that was -way- before the v4.19
that you are running. More recently, there have been timer-related
problems, but those only happened while a grace period was active,
and were also long before v4.19.

Hmmm... One possibility is that you have somehow managed to invoke
call_rcu() with interrupts disabled, which would in turn disable the
extra wakeups that RCU sends when it sees excessive numbers of callbacks.
Except that in that case, boosting the priority wouldn't help. Besides,
the scheduling-clock interrupt should also check for this, and should
push things forward if need be.

If RCU managed to put all of its callbacks into the RCU_NEXT_READY_TAIL
bucket on all CPUs, that would defeat the wakeup-if-no-grace-period
checks (RCU is supposed to have started the relevant grace period before
putting callbacks into that bucket). But that cannot be the case here,
because new callbacks are being enqueued throughout, and these would
then trigger RCU's start-a-new-grace-period checks.
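
(The buckets mentioned here are the segments of RCU's per-CPU
segmented callback list; roughly, from include/linux/rcu_segcblist.h:

#define RCU_DONE_TAIL        0  /* Grace period done, ready to invoke.   */
#define RCU_WAIT_TAIL        1  /* Waiting on the current grace period.  */
#define RCU_NEXT_READY_TAIL  2  /* Waiting on the next grace period.     */
#define RCU_NEXT_TAIL        3  /* Not yet assigned to a grace period.   */

New callbacks land in RCU_NEXT_TAIL and advance toward RCU_DONE_TAIL as
grace periods are assigned and complete.)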

But it would be good to confirm that this is actually working like I would
expect it to. Could you please add scheduler wakeup to your tracing,
if possible, only displaying those sent to the rcu_preempt task?
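
For example, something along these lines, assuming tracefs is mounted
in the usual place:

cd /sys/kernel/debug/tracing
echo 'comm == "rcu_preempt"' > events/sched/sched_wakeup/filter
echo 1 > events/sched/sched_wakeup/enable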

Thanx, Paul




2018-12-07 14:12:49

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Fri, Dec 07, 2018 at 08:25:09AM +0000, He, Bo wrote:
> Bad news: the issue still reproduced after a 61-hour monkey test on 1 of 6 boards with CONFIG_RCU_BOOST=y, and the issue is not seen on kernel 4.14; CONFIG_RCU_BOOST is also not enabled in our kernel 4.14 config.
> Enclosed are the logs.
>
> > So the question becomes "Why is the grace-period kthread being awakened so many times, but not actually waking up?"
> maybe it's not a scheduling issue; I have two suspects:
> we can see tons of grace-period events for 117392312:
> [219346.919405, 0] showmap-31232 [000] d..1 219346.136035: rcu_future_grace_period: rcu_preempt 117392312 117392316 0 0 3 Startleaf
> [219346.919417, 0] showmap-31232 [000] d..1 219346.136036: rcu_future_grace_period: rcu_preempt 117392312 117392316 0 0 3 Prestarted
> [219346.919429, 0] showmap-31232 [000] d..1 219346.136036: rcu_grace_period: rcu_preempt 117392312 AccWaitCB
>
> "Startleaf" means start the grace period, "Prestarted" means the grace period is already started or other conditions blocked, RCU_GP_FLAG_INIT should follow the "Startedroot", then the kthread can be wakeup.

Yes, when "Startleaf" is followed by "Prestarted", that means that we
reached an rcu_node structure that is already aware that the requested
grace period is needed. Breaking down the relevant "if" statement in
rcu_start_this_gp():

	if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req) ||
	    // A. GP already requested at this rcu_node
	    rcu_seq_started(&rnp->gp_seq, gp_seq_req) ||
	    // B. The requested grace period already started
	    (rnp != rnp_start &&
	     rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))) {
		// C. Leaf rcu_node recorded request, and
		//    some grace period is in progress

A: In this case, the "Startedroot" should be taken care of by some
other thread, or one of B or C held earlier.

B: This cannot be the case, because your earlier trace showed that
the requested grace period had not started.

C: This cannot be the case because both traces above are on the
leaf rcu_node structure. If it were the case, the currently
running grace period would notice the need for the requested
grace period when it ended, and would start the grace period
at that time.

So you are saying that your trace goes back far enough to capture the
very first "Startleaf" for this new grace period, and you don't ever see a
"Startedroot"? This would be OK if the later "Startedleaf" showed up at
that point. If you do have such a trace, could you please send it to me
(or post it somewhere and send me the URL)?

In any case, this code has been reworked recently, so I will take a closer
look, which will take some time. Please feel free to continue to do so
as well, of course!

> I did an experiment to dump the backtrace; rcu_quiescent_state_report is called in softirq context:
> <idle>-0 [000] dNs2 24471.669280: rcu_quiescent_state_report: rcu_preempt 3562401 1>0 0 0 3 0
> <idle>-0 [000] dNs2 24471.669293: <stack trace>
> => rcu_report_qs_rnp+0x1e2/0x2a0
> => rcu_process_callbacks+0x2f1/0x3c0
> => __do_softirq+0x12a/0x386
> => irq_exit+0xb1/0xc0
> => smp_apic_timer_interrupt+0xd4/0x1e0
> => apic_timer_interrupt+0xf/0x20
> => cpuidle_enter_state+0xb1/0x340
> => cpuidle_enter+0x17/0x20
> => call_cpuidle+0x23/0x40
> => do_idle+0x1ed/0x250
> => cpu_startup_entry+0x73/0x80
> => rest_init+0xf3/0x100
> => start_kernel+0x46f/0x490
> => x86_64_start_reservations+0x2a/0x2c
> => x86_64_start_kernel+0x72/0x75
> => secondary_startup_64+0xa4/0xb0
> rcu_report_qs_rnp=>rcu_report_qs_rdp
>
> and in rcu_report_qs_rdp(), rcu_report_qs_rnp() follows rcu_accelerate_cbs(); we can see the AccWaitCB log but not rcu_quiescent_state_report, so most likely the (rnp->qsmask & mask) condition blocked it.
>
> static void
> rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
> {
> ...
> 	if ((rnp->qsmask & mask) == 0) {
> 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> 	} else {
> 		rdp->core_needs_qs = false;
> 		needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> 		rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
>
> 		if (needwake)
> 			rcu_gp_kthread_wake(rsp);
> 	}
> }

This is a completely different code path. The rcu_start_this_gp()
function is trying to start a new grace period. In contrast, this
rcu_report_qs_rdp() function reports a quiescent state for a currently
running grace period. In your earlier trace, there was no currently
running grace period, so rcu_report_qs_rdp() exiting early is expected
behavior.

Thanx, Paul




2018-12-09 19:56:58

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Fri, Dec 07, 2018 at 06:11:31AM -0800, Paul E. McKenney wrote:
> In any case, this code has been reworked recently, so I will take a closer
> look, which will take some time. Please feel free to continue to do so
> as well, of course!

Hmmm... Could you please build with CONFIG_PROVE_RCU=y and rerun the
original configuration (for example, CONFIG_RCU_BOOST=n)? I would expect this to
trigger the warning in rcu_check_gp_start_stall(). Of course, if it
does not trigger, that would be valuable information as well.
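
In a v4.19 .config that would look something like the following (note
that CONFIG_PROVE_RCU is selected by CONFIG_PROVE_LOCKING):

CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RCU=y
# CONFIG_RCU_BOOST is not set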

Thanx, Paul



2018-12-10 07:10:20

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

Hi,
We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s check to detect the rcu_preempt hang; hopefully we can get more useful logs tomorrow.
I have also enclosed the config and the debug patches for your review.

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Monday, December 10, 2018 3:56 AM
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
Subject: Re: rcu_preempt caused oom

On Fri, Dec 07, 2018 at 06:11:31AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 07, 2018 at 08:25:09AM +0000, He, Bo wrote:
> > Bad news, the issue is still reproduced after 61 Hours monkey test on 1/6 boards with the CONFIG_RCU_BOOST=y, and the issue is not seen on kernel 4.14, the CONFIG_RCU_BOOST is also not enabled in our kernel 4.14 config.
> > Here enclosed is the logs.
> >
> > > So the question becomes "Why is the grace-period kthread being awakened so many times, but not actually waking up?"
> > maybe it's not schedule issue, I have two suspects:
> > we can see tons of grace_period with 117392312:
> > [219346.919405, 0] showmap-31232 [000] d..1 219346.136035:
> > rcu_future_grace_period: rcu_preempt 117392312 117392316 0 0 3
> > Startleaf [219346.919417, 0] showmap-31232 [000] d..1
> > 219346.136036: rcu_future_grace_period: rcu_preempt 117392312
> > 117392316 0 0 3 Prestarted [219346.919429, 0] showmap-31232 [000]
> > d..1 219346.136036: rcu_grace_period: rcu_preempt 117392312
> > AccWaitCB
> >
> > "Startleaf" means start the grace period, "Prestarted" means the grace period is already started or other conditions blocked, RCU_GP_FLAG_INIT should follow the "Startedroot", then the kthread can be wakeup.
>
> Yes, when "Startleaf" is followed by "Prestarted", that means that we
> reached an rcu_node structure that is already aware that the requested
> grace period is needed. Breaking down the relevant "if" statement in
> rcu_start_this_gp():
>
> if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req) ||
> // A. GP already requested at this rcu_node
> rcu_seq_started(&rnp->gp_seq, gp_seq_req) ||
> // B. The requested grace period already started
> (rnp != rnp_start &&
> rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))) {
> // C. Leaf rcu_node recorded request, and
> // some grace period is in progress
>
> A: In this case, the "Startedroot" should be taken care of by some
> other thread, or one of B or C held earlier.
>
> B: This cannot be the case, because your earlier trace showed that
> the requested grace period had not started.
>
> C: This cannot be the case because both traces above are on the
> leaf rcu_node structure. If it were the case, the currently
> running grace period would notice the need for the requested
> grace period when it ended, and would start the grace period
> at that time.
>
> So you are saying that your trace goes back far enough to capture the
> very first "Startleaf" for this new grace period, and you don't ever
> see a "Startedroot"? This would be OK if the later "Startedleaf"
> showed up at that point. If you do have such a trace, could you
> please send it to me (or post it somewhere and send me the URL)?
>
> In any case, this code has bee reworked recently, so I will take a
> closer look, which will take some time. Please feel free to continue
> to do so as well, of course!

Hmmm... Could you please build with CONFIG_PROVE_RCU=y and run the original (for example, CONFIG_RCU_BOOST=n)? I would expect this to trigger the warning in rcu_check_gp_start_stall(). Of course, if it does not trigger, that would be valuable information as well.

Thanx, Paul

> > I did an experiment to dump the backtrace; rcu_quiescent_state_report is called in softirq context:
> > <idle>-0 [000] dNs2 24471.669280: rcu_quiescent_state_report: rcu_preempt 3562401 1>0 0 0 3 0
> > <idle>-0 [000] dNs2 24471.669293: <stack trace>
> > => rcu_report_qs_rnp+0x1e2/0x2a0
> > => rcu_process_callbacks+0x2f1/0x3c0
> > => __do_softirq+0x12a/0x386
> > => irq_exit+0xb1/0xc0
> > => smp_apic_timer_interrupt+0xd4/0x1e0
> > => apic_timer_interrupt+0xf/0x20
> > => cpuidle_enter_state+0xb1/0x340
> > => cpuidle_enter+0x17/0x20
> > => call_cpuidle+0x23/0x40
> > => do_idle+0x1ed/0x250
> > => cpu_startup_entry+0x73/0x80
> > => rest_init+0xf3/0x100
> > => start_kernel+0x46f/0x490
> > => x86_64_start_reservations+0x2a/0x2c
> > => x86_64_start_kernel+0x72/0x75
> > => secondary_startup_64+0xa4/0xb0
> > (rcu_report_qs_rnp here is reached via rcu_report_qs_rdp)
> >
> > In rcu_report_qs_rdp(), the call to rcu_report_qs_rnp() follows rcu_accelerate_cbs(), and we can see the AccWaitCB log but no rcu_quiescent_state_report, so most likely the (rnp->qsmask & mask) condition is what blocked it.
> >
> > static void
> > rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
> > {
> >         ...
> >         if ((rnp->qsmask & mask) == 0) {
> >                 raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >         } else {
> >                 rdp->core_needs_qs = false;
> >                 needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> >                 rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
> >
> >                 if (needwake)
> >                         rcu_gp_kthread_wake(rsp);
> >         }
> > }
>
> This is a completely different code path. The rcu_start_this_gp()
> function is trying to start a new grace period. In contrast, this
> rcu_report_qs_rdp() function reports a quiescent state for a currently
> running grace period. In your earlier trace, there was no currently
> running grace period, so rcu_report_qs_rdp() exiting early is expected
> behavior.
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Friday, December 7, 2018 1:38 AM
> > To: He, Bo <[email protected]>
> > Cc: Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Zhang, Jun
> > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > <[email protected]>; Bai, Jie A <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Thu, Dec 06, 2018 at 01:23:01PM +0000, He, Bo wrote:
> > > 1. The test is positive after setting the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST; the issue has not reproduced so far.
> > > 2. Here is the previous log with ftrace_dump enabled, giving us 4 seconds of ftrace. The panic was triggered with the enclosed debug patch, which replaces wait_for_completion(&rs_array[i].completion) with wait_for_completion_timeout(&rs_array[i].completion, 3*HZ) in __wait_rcu_gp(). The logs have lockdep enabled to dump the held locks, and dump all task backtraces.
> >
> > Thank you for collecting this information!
> >
> > (By the way, the usual downside of the priority increase is
> > increased context-switch rate and thus CPU overhead.)
> >
> > And all three grace-period kthreads are blocked apparently in their top-level loops (though inlining and all that). There are quite a few preemptions ("72738.702815: rcu_preempt_task: rcu_preempt"), but they are all blocking the next grace period (29041008), not the current one (29041004). And the "rcu_unlock_preempted_task" trace records flag the current grace-period sequence number as 29041004, which means that there is no grace period in progress, that is, RCU is idle.
> >
> > Which explains why there is no RCU CPU stall warning -- after all, if there is no grace period in flight, it is not possible to stall that non-existent grace period.
> >
> > That also could explain why increasing the priority of the grace-period kthreads gets things going again. There have been a great number of requests for a new grace period (for example, "rcu_future_grace_period:
> > rcu_preempt 29041004 29041008 0 0 3 Startleaf"), so as soon as the grace-period kthread wakes up, a new grace period will start.
> >
> > Except that the rcu_preempt task says "I" rather than "R", as you noted in an earlier email.
> >
> > And there should have been multiple attempts to wake up the grace-period kthread, because there are lots of callbacks queued as in 136,045 of them ("rcu_callback: rcu_preempt rhp=0000000066f735c9 func=file_free_rcu 2811/136045"). Which is of course why you are seeing the OOM.
> >
> > So the question becomes "Why is the grace-period kthread being awakened so many times, but not actually waking up?" In the past, there was a scheduler bug that could cause that, but that was -way- before the v4.19 that you are running. More recently, there have been timer-related problems, but those only happened while a grace period was active, and were also long before v4.19.
> >
> > Hmmm... One possibility is that you have somehow managed to invoke
> > call_rcu() with interrupts disabled, which would in turn disable the extra wakeups that RCU sends when it sees excessive numbers of callbacks.
> > Except that in that case, boosting the priority wouldn't help. Besides, the scheduling-clock interrupt should also check for this, and should push things forward if need be.
> >
> > If RCU managed to put all of its callbacks into the RCU_NEXT_READY_TAIL bucket on all CPUs, that would defeat the wakeup-if-no-grace-period checks (RCU is supposed to have started the relevant grace period before putting callbacks into that bucket). But that cannot be the case here, because new callbacks are being enqueued throughout, and these would then trigger RCU's start-a-new-grace-period checks.
> >
> > But it would be good to confirm that this is actually working like I would expect it to. Could you please add scheduler wakeup to your tracing, if possible, only displaying those sent to the rcu_preempt task?
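
For reference, one way to capture exactly those wakeups with stock ftrace; a minimal sketch, assuming the usual tracefs mount at /sys/kernel/debug/tracing:

cd /sys/kernel/debug/tracing
echo 'comm == "rcu_preempt"' > events/sched/sched_wakeup/filter
echo 1 > events/sched/sched_wakeup/enable
echo 1 > tracing_on

This logs each sched_wakeup event whose target task is rcu_preempt alongside the rcu:* events already being traced.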
> >
> > Thanx, Paul
> >
> > > -----Original Message-----
> > > From: Paul E. McKenney <[email protected]>
> > > Sent: Thursday, December 6, 2018 1:45 AM
> > > To: He, Bo <[email protected]>
> > > Cc: Steven Rostedt <[email protected]>;
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; Zhang, Jun
> > > <[email protected]>; Xiao, Jin <[email protected]>; Zhang,
> > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Wed, Dec 05, 2018 at 08:42:54AM +0000, He, Bo wrote:
> > > > I double-checked the .config; we don't enable CONFIG_NO_HZ_FULL.
> > > > Our previous logs dump all the task backtraces, and the kthreads (the rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state, not "R" state. My understanding is that if the fix were a side effect of running RCU's kthreads at SCHED_FIFO priority 1, the kthreads should be in "R" state.
> > >
> > > Hmmm... Well, the tasks could in theory be waiting on a blocking mutex.
> > > But in practice the grace-period kthreads wait on events, so that makes no sense.
> > >
> > > Is it possible for you to dump out the grace-period kthread's
> > > stack, for example, with sysreq-t? (Steve might know a better way
> > > to do
> > > this.)
> > >
> > > > I will do more experiments and keep you updated once we have more findings:
> > > > 1. Set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see if the issue reproduces (see the sketch below).
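
A minimal sketch of that first experiment, assuming the priority is set from userspace after boot rather than via the rcutree.kthread_prio= boot parameter:

chrt -f -p 1 $(pgrep -x rcu_preempt)   # SCHED_FIFO, priority 1
chrt -f -p 1 $(pgrep -x rcu_sched)
chrt -f -p 1 $(pgrep -x rcu_bh)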
> > >
> > > That sounds like a most excellent experiment!
> > >
> > > > 2. Check more ftrace to confirm why there is no trace_rcu_quiescent_state_report and why most of the trace_rcu_grace_period events are "AccWaitCB".
> > >
> > > As noted earlier, to see something interesting, you will need to start the ftrace before the grace period starts. This would probably mean having ftrace running before starting the test. Starting the ftrace after the hang commences is unlikely to produce useful information.
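
For reference, the rcu trace events can be made active from boot, well before any hang develops, via the standard trace_event= kernel parameter; a sketch using the events already seen in this thread:

trace_event=rcu:rcu_grace_period,rcu:rcu_future_grace_period,rcu:rcu_quiescent_state_report trace_buf_size=1024K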
> > >
> > > Thanx, Paul
> > >
> > > > -----Original Message-----
> > > > From: Paul E. McKenney <[email protected]>
> > > > Sent: Wednesday, December 5, 2018 3:50 AM
> > > > To: He, Bo <[email protected]>
> > > > Cc: Steven Rostedt <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected]; Zhang,
> > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Tue, Dec 04, 2018 at 07:50:04AM +0000, He, Bo wrote:
> > > > > Hi, Paul:
> > > > > enclosed is the log that triggered the 120s hung_task panic without other debug patches. The hung task is blocked at __wait_rcu_gp, which means RCU CPU stall detection does not catch this scenario:
> > > > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > > > > echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> > > >
> > > > Not necessarily. If there is an RCU CPU stall warning, blocking
> > > > within
> > > > __wait_rcu_gp() is expected behavior. It is possible that the problem is that although the grace period is completing as required, the callbacks are not being invoked in a timely fashion. And that could happen if you had CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback offloading enabled. But I don't see these in your previous emails. Another possible cause is that the grace-period kthread is being delayed, so that the grace period never starts. This seems unlikely, but it is the only thing thus far that matches the symptoms.
> > > >
> > > > CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at SCHED_FIFO priority 1, and that would help in the case where RCU's grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of which execute in the rcu_gp_kthread() function) was being starved of CPU time.
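
For reference, whether the grace-period kthreads actually ended up at SCHED_FIFO priority 1 can be verified from userspace; a minimal sketch (scheduling class "FF" denotes SCHED_FIFO):

ps -eo pid,class,rtprio,comm | grep -E 'rcu_(preempt|sched|bh)'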
> > > >
> > > > Does that sound likely?
> > > >
> > > > Thanx, Paul
> > > >
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney <[email protected]>
> > > > > Sent: Monday, December 3, 2018 9:57 PM
> > > > > To: He, Bo <[email protected]>
> > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > [email protected]; [email protected];
> > > > > [email protected]; [email protected]; Zhang,
> > > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > > Zhang, Yanmin <[email protected]>
> > > > > Subject: Re: rcu_preempt caused oom
> > > > >
> > > > > On Mon, Dec 03, 2018 at 07:44:03AM +0000, He, Bo wrote:
> > > > > > Thanks, we have run the test for the whole weekend without reproducing the issue, so we confirm that CONFIG_RCU_BOOST fixes it.
> > > > >
> > > > > Very good, that is encouraging. Perhaps I should think about making CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for architectures for which rt_mutexes are implemented.
> > > > >
> > > > > > We have enabled rcupdate.rcu_cpu_stall_timeout=7 and also set panic on RCU stall, and will see if we can catch the panic; we will keep you posted with the test results.
> > > > > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > > > >
> > > > > Looking forward to seeing what is going on! Of course, to reproduce, you will need to again build with CONFIG_RCU_BOOST=n.
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Paul E. McKenney <[email protected]>
> > > > > > Sent: Saturday, December 1, 2018 12:49 AM
> > > > > > To: He, Bo <[email protected]>
> > > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > > [email protected]; [email protected];
> > > > > > [email protected]; [email protected];
> > > > > > Zhang, Jun <[email protected]>; Xiao, Jin
> > > > > > <[email protected]>; Zhang, Yanmin <[email protected]>
> > > > > > Subject: Re: rcu_preempt caused oom
> > > > > >
> > > > > > On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> > > > > > > Here is the kernel cmdline:
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > > > Kernel command line: androidboot.acpio_idx=0
> > > > > > > androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug
> > > > > > > androidboot.diskbus=00.0 androidboot.verifiedbootstate=green
> > > > > > > androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb
> > > > > > > g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves
> > > > > > > reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti
> > > > > > > ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt
> > > > > > > loglevel=4 androidboot.hardware=gordon_peak
> > > > > > > firmware_class.path=/vendor/firmware relative_sleep_states=1
> > > > > > > enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10
> > > > > > > androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/
> > > > > > > pstore.backend=ramoops memmap=0x1400000$0x50000000
> > > > > > > ramoops.mem_address=0x50000000 ramoops.mem_size=0x1400000
> > > > > > > ramoops.record_size=0x4000 ramoops.console_size=0x1000000
> > > > > > > ramoops.ftrace_size=0x10000 ramoops.dump_oops=1 vga=current
> > > > > > > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1
> > > > > > > drm.vblankoffdelay=
> > > > > >
> > > > > > And no sign of any suppression of RCU CPU stall warnings. Hmmm...
> > > > > > Does it take more than 21 seconds to OOM? Or do things happen faster than that? If they do happen faster, then one approach would be to add something like this to the kernel command line:
> > > > > >
> > > > > > rcupdate.rcu_cpu_stall_timeout=7
> > > > > >
> > > > > > This would set the stall timeout to seven seconds. Note that timeouts less than three seconds are silently interpreted as three seconds.
> > > > > >
> > > > > > Thanx, Paul
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Steven Rostedt <[email protected]>
> > > > > > > Sent: Friday, November 30, 2018 11:17 PM
> > > > > > > To: Paul E. McKenney <[email protected]>
> > > > > > > Cc: He, Bo <[email protected]>;
> > > > > > > [email protected]; [email protected];
> > > > > > > [email protected]; [email protected];
> > > > > > > Zhang, Jun <[email protected]>; Xiao, Jin
> > > > > > > <[email protected]>; Zhang, Yanmin
> > > > > > > <[email protected]>
> > > > > > > Subject: Re: rcu_preempt caused oom
> > > > > > >
> > > > > > > On Fri, 30 Nov 2018 06:43:17 -0800 "Paul E. McKenney"
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Could you please send me your list of kernel boot parameters?
> > > > > > > > They usually appear near the start of your console output.
> > > > > > >
> > > > > > > Or just: cat /proc/cmdline
> > > > > > >
> > > > > > > -- Steve
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>
>


Attachments:
0001-add-rcu-hung-task-detect.patch (1.04 kB)
0002-rcu-detect-the-preempt_rcu-hang.patch (3.66 kB)
config.gz (36.58 kB)

2018-12-11 01:08:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Mon, Dec 10, 2018 at 06:56:18AM +0000, He, Bo wrote:
> Hi,
> We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> I also enclosed the config and the debug patches for your review.

I instead suggest the (lightly tested) debug patch shown below, which
tracks wakeups of RCU's grace-period kthreads and dumps them out if a
given requested grace period fails to start. Again, it is necessary to
build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.

Thanx, Paul

------------------------------------------------------------------------

commit 2a3826f15adaf92d046c80e38d090ecff5403807
Author: Paul E. McKenney <[email protected]>
Date: Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time. This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..7daaef57d905 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);

+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+ if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+ return "???";
+ return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+ return &rsp->node[0];
+}
+
/*
* Show the state of the grace-period kthreads.
*/
void show_rcu_gp_kthreads(void)
{
int cpu;
+ unsigned long j;
+ unsigned long ja;
+ unsigned long jr;
+ unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;

+ j = jiffies;
for_each_rcu_flavor(rsp) {
- pr_info("%s: wait state: %d ->state: %#lx\n",
- rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+ ja = j - READ_ONCE(rsp->gp_activity);
+ jr = j - READ_ONCE(rsp->gp_req_activity);
+ jw = j - READ_ONCE(rsp->gp_wake_time);
+ pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+ rsp->name, gp_state_getname(rsp->gp_state),
+ rsp->gp_state,
+ rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
+ ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+ (long)READ_ONCE(rsp->gp_seq),
+ (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+ READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
- pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
- rnp->grplo, rnp->grphi, rnp->gp_seq,
- rnp->gp_seq_needed);
+ pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+ rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+ (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
rdp->gp_seq_needed))
continue;
- pr_info("\tcpu %d ->gp_seq_needed %lu\n",
- cpu, rdp->gp_seq_needed);
+ pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+ cpu, (long)rdp->gp_seq_needed);
}
}
/* sched_show_task(rsp->gp_kthread); */
@@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

-/*
- * Return the root node of the specified rcu_state structure.
- */
-static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
-{
- return &rsp->node[0];
-}
-
/*
* Enter an RCU extended quiescent state, which can be either the
* idle loop or adaptive-tickless usermode execution.
@@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
}

-/*
- * Convert a ->gp_state value to a character string.
- */
-static const char *gp_state_getname(short gs)
-{
- if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
- return "???";
- return gp_state_names[gs];
-}
-
/*
* Complain about starvation of grace-period kthread.
*/
@@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
* Don't do a self-awaken, and don't bother awakening when there is
* nothing for the grace-period kthread to do (as in several CPUs
* raced to awaken, and we lost), and finally don't try to awaken
- * a kthread that has not yet been created.
+ * a kthread that has not yet been created. If all those checks are
+ * passed, track some debug information and awaken.
*/
static void rcu_gp_kthread_wake(struct rcu_state *rsp)
{
@@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
+ WRITE_ONCE(rsp->gp_wake_time, jiffies);
+ WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
swake_up_one(&rsp->gp_wq);
}

@@ -1774,8 +1791,8 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_state *rsp,
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
- if (needwake)
- rcu_gp_kthread_wake(rsp);
+ /* if (needwake)
+ rcu_gp_kthread_wake(rsp); */
}

/*
@@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
- pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
- __func__, (long)READ_ONCE(rsp->gp_seq),
- (long)READ_ONCE(rnp_root->gp_seq_needed),
- j - rsp->gp_req_activity, j - rsp->gp_activity,
- rsp->gp_flags, rsp->gp_state, rsp->name,
- rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
WARN_ON(1);
if (rnp_root != rnp)
raw_spin_unlock_rcu_node(rnp_root);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ show_rcu_gp_kthreads();
}

/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..0e051d9b5f1a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -327,6 +327,8 @@ struct rcu_state {
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
short gp_state; /* GP kthread sleep state. */
+ unsigned long gp_wake_time; /* Last GP kthread wake. */
+ unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */

/* End of fields guarded by root rcu_node's lock. */



2018-12-11 06:27:36

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 10, 2018 at 06:56:18AM +0000, He, Bo wrote:
> > Hi,
> > We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> > I also enclosed the config and the debug patches for your review.
>
> I instead suggest the (lightly tested) debug patch shown below, which
> tracks wakeups of RCU's grace-period kthreads and dumps them out if a
> given requested grace period fails to start. Again, it is necessary to
> build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.

Right. This time without commenting out the wakeup as a test of the
diagnostic. :-/

Please use the patch below instead of the one that I sent in my
previous email.

Thanx, Paul

------------------------------------------------------------------------

commit adfc7dff659495a3433d5084256be59eee0ac6df
Author: Paul E. McKenney <[email protected]>
Date: Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time. This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..4bcd8753e293 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);

+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+ if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+ return "???";
+ return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+ return &rsp->node[0];
+}
+
/*
* Show the state of the grace-period kthreads.
*/
void show_rcu_gp_kthreads(void)
{
int cpu;
+ unsigned long j;
+ unsigned long ja;
+ unsigned long jr;
+ unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;

+ j = jiffies;
for_each_rcu_flavor(rsp) {
- pr_info("%s: wait state: %d ->state: %#lx\n",
- rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+ ja = j - READ_ONCE(rsp->gp_activity);
+ jr = j - READ_ONCE(rsp->gp_req_activity);
+ jw = j - READ_ONCE(rsp->gp_wake_time);
+ pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+ rsp->name, gp_state_getname(rsp->gp_state),
+ rsp->gp_state,
+ rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
+ ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+ (long)READ_ONCE(rsp->gp_seq),
+ (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+ READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
- pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
- rnp->grplo, rnp->grphi, rnp->gp_seq,
- rnp->gp_seq_needed);
+ pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+ rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+ (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
rdp->gp_seq_needed))
continue;
- pr_info("\tcpu %d ->gp_seq_needed %lu\n",
- cpu, rdp->gp_seq_needed);
+ pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+ cpu, (long)rdp->gp_seq_needed);
}
}
/* sched_show_task(rsp->gp_kthread); */
@@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

-/*
- * Return the root node of the specified rcu_state structure.
- */
-static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
-{
- return &rsp->node[0];
-}
-
/*
* Enter an RCU extended quiescent state, which can be either the
* idle loop or adaptive-tickless usermode execution.
@@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
}

-/*
- * Convert a ->gp_state value to a character string.
- */
-static const char *gp_state_getname(short gs)
-{
- if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
- return "???";
- return gp_state_names[gs];
-}
-
/*
* Complain about starvation of grace-period kthread.
*/
@@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
* Don't do a self-awaken, and don't bother awakening when there is
* nothing for the grace-period kthread to do (as in several CPUs
* raced to awaken, and we lost), and finally don't try to awaken
- * a kthread that has not yet been created.
+ * a kthread that has not yet been created. If all those checks are
+ * passed, track some debug information and awaken.
*/
static void rcu_gp_kthread_wake(struct rcu_state *rsp)
{
@@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
+ WRITE_ONCE(rsp->gp_wake_time, jiffies);
+ WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
swake_up_one(&rsp->gp_wq);
}

@@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
- pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
- __func__, (long)READ_ONCE(rsp->gp_seq),
- (long)READ_ONCE(rnp_root->gp_seq_needed),
- j - rsp->gp_req_activity, j - rsp->gp_activity,
- rsp->gp_flags, rsp->gp_state, rsp->name,
- rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
WARN_ON(1);
if (rnp_root != rnp)
raw_spin_unlock_rcu_node(rnp_root);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ show_rcu_gp_kthreads();
}

/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..0e051d9b5f1a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -327,6 +327,8 @@ struct rcu_state {
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
short gp_state; /* GP kthread sleep state. */
+ unsigned long gp_wake_time; /* Last GP kthread wake. */
+ unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */

/* End of fields guarded by root rcu_node's lock. */



2018-12-11 06:30:15

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

Sure, we will update to the new patch and run the test.

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Tuesday, December 11, 2018 12:47 PM
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
Subject: Re: rcu_preempt caused oom

On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 10, 2018 at 06:56:18AM +0000, He, Bo wrote:
> > Hi,
> > We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> > I also enclosed the config and the debug patches for your review.
>
> I instead suggest the (lightly tested) debug patch shown below, which
> tracks wakeups of RCU's grace-period kthreads and dumps them out if a
> given requested grace period fails to start. Again, it is necessary
> to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.

Right. This time without commenting out the wakeup as a test of the diagnostic. :-/

Please use the patch below instead of the one that I sent in my previous email.

Thanx, Paul

------------------------------------------------------------------------

commit adfc7dff659495a3433d5084256be59eee0ac6df
Author: Paul E. McKenney <[email protected]>
Date: Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time. This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..4bcd8753e293 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);

+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+ if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+ return "???";
+ return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+ return &rsp->node[0];
+}
+
/*
* Show the state of the grace-period kthreads.
*/
void show_rcu_gp_kthreads(void)
{
int cpu;
+ unsigned long j;
+ unsigned long ja;
+ unsigned long jr;
+ unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;

+ j = jiffies;
for_each_rcu_flavor(rsp) {
- pr_info("%s: wait state: %d ->state: %#lx\n",
- rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+ ja = j - READ_ONCE(rsp->gp_activity);
+ jr = j - READ_ONCE(rsp->gp_req_activity);
+ jw = j - READ_ONCE(rsp->gp_wake_time);
+ pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+ rsp->name, gp_state_getname(rsp->gp_state),
+ rsp->gp_state,
+ rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
+ ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+ (long)READ_ONCE(rsp->gp_seq),
+ (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+ READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
- pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
- rnp->grplo, rnp->grphi, rnp->gp_seq,
- rnp->gp_seq_needed);
+ pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+ rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+ (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
rdp->gp_seq_needed))
continue;
- pr_info("\tcpu %d ->gp_seq_needed %lu\n",
- cpu, rdp->gp_seq_needed);
+ pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+ cpu, (long)rdp->gp_seq_needed);
}
}
/* sched_show_task(rsp->gp_kthread); */
@@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

-/*
- * Return the root node of the specified rcu_state structure.
- */
-static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
-{
- return &rsp->node[0];
-}
-
/*
* Enter an RCU extended quiescent state, which can be either the
* idle loop or adaptive-tickless usermode execution.
@@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
}

-/*
- * Convert a ->gp_state value to a character string.
- */
-static const char *gp_state_getname(short gs)
-{
- if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
- return "???";
- return gp_state_names[gs];
-}
-
/*
* Complain about starvation of grace-period kthread.
*/
@@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
* Don't do a self-awaken, and don't bother awakening when there is
* nothing for the grace-period kthread to do (as in several CPUs
* raced to awaken, and we lost), and finally don't try to awaken
- * a kthread that has not yet been created.
+ * a kthread that has not yet been created. If all those checks are
+ * passed, track some debug information and awaken.
*/
static void rcu_gp_kthread_wake(struct rcu_state *rsp)
{
@@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
+ WRITE_ONCE(rsp->gp_wake_time, jiffies);
+ WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
swake_up_one(&rsp->gp_wq);
}

@@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
- pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
- __func__, (long)READ_ONCE(rsp->gp_seq),
- (long)READ_ONCE(rnp_root->gp_seq_needed),
- j - rsp->gp_req_activity, j - rsp->gp_activity,
- rsp->gp_flags, rsp->gp_state, rsp->name,
- rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
WARN_ON(1);
if (rnp_root != rnp)
raw_spin_unlock_rcu_node(rnp_root);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ show_rcu_gp_kthreads();
}

/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..0e051d9b5f1a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -327,6 +327,8 @@ struct rcu_state {
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
short gp_state; /* GP kthread sleep state. */
+ unsigned long gp_wake_time; /* Last GP kthread wake. */
+ unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */

/* End of fields guarded by root rcu_node's lock. */



2018-12-12 01:39:31

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

We reproduced the hung_task panic with the patch "Improve diagnostics for failed RCU grace-period start", but unfortunately show_rcu_gp_kthreads() did not print any logs, perhaps due to the loglevel; we will improve the build and rerun the test to double-check.
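
For reference, show_rcu_gp_kthreads() prints at KERN_INFO via pr_info(), which the console suppresses at loglevel=4 (the value on this kernel command line), although the messages should still land in the dmesg buffer; a minimal sketch of raising the console loglevel before the test:

echo 8 > /proc/sys/kernel/printk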

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Tuesday, December 11, 2018 12:47 PM
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
Subject: Re: rcu_preempt caused oom

On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 10, 2018 at 06:56:18AM +0000, He, Bo wrote:
> > Hi,
> > We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> > I also enclosed the config and the debug patches for your review.
>
> I instead suggest the (lightly tested) debug patch shown below, which
> tracks wakeups of RCU's grace-period kthreads and dumps them out if a
> given requested grace period fails to start. Again, it is necessary
> to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.

Right. This time without commenting out the wakeup as a test of the diagnostic. :-/

Please use the patch below instead of the one that I sent in my previous email.

Thanx, Paul

------------------------------------------------------------------------

commit adfc7dff659495a3433d5084256be59eee0ac6df
Author: Paul E. McKenney <[email protected]>
Date: Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time. This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..4bcd8753e293 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);

+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+ if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+ return "???";
+ return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+ return &rsp->node[0];
+}
+
/*
* Show the state of the grace-period kthreads.
*/
void show_rcu_gp_kthreads(void)
{
int cpu;
+ unsigned long j;
+ unsigned long ja;
+ unsigned long jr;
+ unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;

+ j = jiffies;
for_each_rcu_flavor(rsp) {
- pr_info("%s: wait state: %d ->state: %#lx\n",
- rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+ ja = j - READ_ONCE(rsp->gp_activity);
+ jr = j - READ_ONCE(rsp->gp_req_activity);
+ jw = j - READ_ONCE(rsp->gp_wake_time);
+ pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+ rsp->name, gp_state_getname(rsp->gp_state),
+ rsp->gp_state,
+ rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
+ ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+ (long)READ_ONCE(rsp->gp_seq),
+ (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+ READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
- pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
- rnp->grplo, rnp->grphi, rnp->gp_seq,
- rnp->gp_seq_needed);
+ pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+ rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+ (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
rdp->gp_seq_needed))
continue;
- pr_info("\tcpu %d ->gp_seq_needed %lu\n",
- cpu, rdp->gp_seq_needed);
+ pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+ cpu, (long)rdp->gp_seq_needed);
}
}
/* sched_show_task(rsp->gp_kthread); */
@@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

-/*
- * Return the root node of the specified rcu_state structure.
- */
-static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
-{
- return &rsp->node[0];
-}
-
/*
* Enter an RCU extended quiescent state, which can be either the
* idle loop or adaptive-tickless usermode execution.
@@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
}

-/*
- * Convert a ->gp_state value to a character string.
- */
-static const char *gp_state_getname(short gs)
-{
- if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
- return "???";
- return gp_state_names[gs];
-}
-
/*
* Complain about starvation of grace-period kthread.
*/
@@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
* Don't do a self-awaken, and don't bother awakening when there is
* nothing for the grace-period kthread to do (as in several CPUs
* raced to awaken, and we lost), and finally don't try to awaken
- * a kthread that has not yet been created.
+ * a kthread that has not yet been created. If all those checks are
+ * passed, track some debug information and awaken.
*/
static void rcu_gp_kthread_wake(struct rcu_state *rsp)
{
@@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
+ WRITE_ONCE(rsp->gp_wake_time, jiffies);
+ WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
swake_up_one(&rsp->gp_wq);
}

@@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
- pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
- __func__, (long)READ_ONCE(rsp->gp_seq),
- (long)READ_ONCE(rnp_root->gp_seq_needed),
- j - rsp->gp_req_activity, j - rsp->gp_activity,
- rsp->gp_flags, rsp->gp_state, rsp->name,
- rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
WARN_ON(1);
if (rnp_root != rnp)
raw_spin_unlock_rcu_node(rnp_root);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ show_rcu_gp_kthreads();
}

/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..0e051d9b5f1a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -327,6 +327,8 @@ struct rcu_state {
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
short gp_state; /* GP kthread sleep state. */
+ unsigned long gp_wake_time; /* Last GP kthread wake. */
+ unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */

/* End of fields guarded by root rcu_node's lock. */



Attachments:
console-ramoops_20111111154601.txt.gz (156.82 kB)

2018-12-12 02:26:03

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 01:37:40AM +0000, He, Bo wrote:
> We reproduced the hung_task panic with the patch "Improve diagnostics for failed RCU grace-period start", but unfortunately show_rcu_gp_kthreads() did not print any logs, perhaps due to the loglevel; we will improve the build and rerun the test to double-check.

Well, at least the diagnostics didn't prevent the problem from happening. ;-)

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Tuesday, December 11, 2018 12:47 PM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 10, 2018 at 06:56:18AM +0000, He, Bo wrote:
> > > Hi,
> > > We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> > > I also enclosed the config and the debug patches for your review.
> >
> > I instead suggest the (lightly tested) debug patch shown below, which
> > tracks wakeups of RCU's grace-period kthreads and dumps them out if a
> > given requested grace period fails to start. Again, it is necessary
> > to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.
>
> Right. This time without commenting out the wakeup as a test of the diagnostic. :-/
>
> Please use the patch below instead of the one that I sent in my previous email.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit adfc7dff659495a3433d5084256be59eee0ac6df
> Author: Paul E. McKenney <[email protected]>
> Date: Mon Dec 10 16:33:59 2018 -0800
>
> rcu: Improve diagnostics for failed RCU grace-period start
>
> Backported from v4.21/v5.0
>
> If a grace period fails to start (for example, because you commented
> out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
> will invoke rcu_check_gp_start_stall(), which will notice and complain.
> However, this complaint is lacking crucial debugging information such
> as when the last wakeup executed and what the value of ->gp_seq was at
> that time. This commit therefore removes the current pr_alert() from
> rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
> which has been updated to print the needed information, which is collected
> by rcu_gp_kthread_wake().
>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1369f7..4bcd8753e293 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
> }
> EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
>
> +/*
> + * Convert a ->gp_state value to a character string.
> + */
> +static const char *gp_state_getname(short gs)
> +{
> + if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
> + return "???";
> + return gp_state_names[gs];
> +}
> +
> +/*
> + * Return the root node of the specified rcu_state structure.
> + */
> +static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
> +{
> + return &rsp->node[0];
> +}
> +
> /*
> * Show the state of the grace-period kthreads.
> */
> void show_rcu_gp_kthreads(void)
> {
> int cpu;
> + unsigned long j;
> + unsigned long ja;
> + unsigned long jr;
> + unsigned long jw;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> struct rcu_state *rsp;
>
> + j = jiffies;
> for_each_rcu_flavor(rsp) {
> - pr_info("%s: wait state: %d ->state: %#lx\n",
> - rsp->name, rsp->gp_state, rsp->gp_kthread->state);
> + ja = j - READ_ONCE(rsp->gp_activity);
> + jr = j - READ_ONCE(rsp->gp_req_activity);
> + jw = j - READ_ONCE(rsp->gp_wake_time);
> + pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
> + rsp->name, gp_state_getname(rsp->gp_state),
> + rsp->gp_state,
> + rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
> + ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
> + (long)READ_ONCE(rsp->gp_seq),
> + (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
> + READ_ONCE(rsp->gp_flags));
> rcu_for_each_node_breadth_first(rsp, rnp) {
> if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
> continue;
> - pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
> - rnp->grplo, rnp->grphi, rnp->gp_seq,
> - rnp->gp_seq_needed);
> + pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
> + rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
> + (long)rnp->gp_seq_needed);
> if (!rcu_is_leaf_node(rnp))
> continue;
> for_each_leaf_node_possible_cpu(rnp, cpu) {
> @@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
> ULONG_CMP_GE(rsp->gp_seq,
> rdp->gp_seq_needed))
> continue;
> - pr_info("\tcpu %d ->gp_seq_needed %lu\n",
> - cpu, rdp->gp_seq_needed);
> + pr_info("\tcpu %d ->gp_seq_needed %ld\n",
> + cpu, (long)rdp->gp_seq_needed);
> }
> }
> /* sched_show_task(rsp->gp_kthread); */
> @@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
> }
> EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
>
> -/*
> - * Return the root node of the specified rcu_state structure.
> - */
> -static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
> -{
> - return &rsp->node[0];
> -}
> -
> /*
> * Enter an RCU extended quiescent state, which can be either the
> * idle loop or adaptive-tickless usermode execution.
> @@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
> rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
> }
>
> -/*
> - * Convert a ->gp_state value to a character string.
> - */
> -static const char *gp_state_getname(short gs)
> -{
> - if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
> - return "???";
> - return gp_state_names[gs];
> -}
> -
> /*
> * Complain about starvation of grace-period kthread.
> */
> @@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> * Don't do a self-awaken, and don't bother awakening when there is
> * nothing for the grace-period kthread to do (as in several CPUs
> * raced to awaken, and we lost), and finally don't try to awaken
> - * a kthread that has not yet been created.
> + * a kthread that has not yet been created. If all those checks are
> + * passed, track some debug information and awaken.
> */
> static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> {
> @@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> !READ_ONCE(rsp->gp_flags) ||
> !rsp->gp_kthread)
> return;
> + WRITE_ONCE(rsp->gp_wake_time, jiffies);
> + WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
> swake_up_one(&rsp->gp_wq);
> }
>
> @@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> return;
> }
> - pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
> - __func__, (long)READ_ONCE(rsp->gp_seq),
> - (long)READ_ONCE(rnp_root->gp_seq_needed),
> - j - rsp->gp_req_activity, j - rsp->gp_activity,
> - rsp->gp_flags, rsp->gp_state, rsp->name,
> - rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
> WARN_ON(1);
> if (rnp_root != rnp)
> raw_spin_unlock_rcu_node(rnp_root);
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + show_rcu_gp_kthreads();
> }
>
> /*
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 4e74df768c57..0e051d9b5f1a 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -327,6 +327,8 @@ struct rcu_state {
> struct swait_queue_head gp_wq; /* Where GP task waits. */
> short gp_flags; /* Commands for GP task. */
> short gp_state; /* GP kthread sleep state. */
> + unsigned long gp_wake_time; /* Last GP kthread wake. */
> + unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */
>
> /* End of fields guarded by root rcu_node's lock. */
>
>



2018-12-12 15:44:47

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> We reproduced on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch cannot catch the scenario.
> I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as extracted from /proc/config.gz.

Strange.

Are the systems responsive to sysrq keys once failure occurs? If so, I will
provide you a sysrq-R or some such to dump out the RCU state.
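
For reference, if a shell is still alive after the failure, sysrq can also be exercised through procfs; a minimal sketch using the task-dump key:

echo 1 > /proc/sys/kernel/sysrq      # enable all sysrq functions
echo t > /proc/sysrq-trigger         # dump all task states to the kernel log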

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Wednesday, December 12, 2018 10:25 AM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Wed, Dec 12, 2018 at 01:37:40AM +0000, He, Bo wrote:
> > We reproduced the hung_task panic with the patch "Improve diagnostics for failed RCU grace-period start", but unfortunately show_rcu_gp_kthreads() did not print any logs, perhaps due to the loglevel; we will improve the build and rerun the test to double-check.
>
> Well, at least the diagnostics didn't prevent the problem from happening. ;-)
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Tuesday, December 11, 2018 12:47 PM
> > To: He, Bo <[email protected]>
> > Cc: Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Zhang, Jun
> > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > <[email protected]>; Bai, Jie A <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> > > On Mon, Dec 10, 2018 at 06:56:18AM +0000, He, Bo wrote:
> > > > Hi,
> > > > We have started the test with CONFIG_PROVE_RCU=y, and also added a 2s timeout to detect the preempt RCU hang; hopefully we can get more useful logs tomorrow.
> > > > I also enclosed the config and the debug patches for your review.
> > >
> > > I instead suggest the (lightly tested) debug patch shown below,
> > > which tracks wakeups of RCU's grace-period kthreads and dumps them
> > > out if a given requested grace period fails to start. Again, it is
> > > necessary to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.
> >
> > Right. This time without commenting out the wakeup as a test of the
> > diagnostic. :-/
> >
> > Please use the patch below instead of the one that I sent in my previous email.
> >
> > Thanx, Paul
> >
> > ----------------------------------------------------------------------
> > --
> >
> > commit adfc7dff659495a3433d5084256be59eee0ac6df
> > Author: Paul E. McKenney <[email protected]>
> > Date: Mon Dec 10 16:33:59 2018 -0800
> >
> > rcu: Improve diagnostics for failed RCU grace-period start
> >
> > Backported from v4.21/v5.0
> >
> > If a grace period fails to start (for example, because you commented
> > out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
> > will invoke rcu_check_gp_start_stall(), which will notice and complain.
> > However, this complaint is lacking crucial debugging information such
> > as when the last wakeup executed and what the value of ->gp_seq was at
> > that time. This commit therefore removes the current pr_alert() from
> > rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
> > which has been updated to print the needed information, which is collected
> > by rcu_gp_kthread_wake().
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0b760c1369f7..4bcd8753e293 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
> > }
> > EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
> >
> > +/*
> > + * Convert a ->gp_state value to a character string.
> > + */
> > +static const char *gp_state_getname(short gs)
> > +{
> > + if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
> > + return "???";
> > + return gp_state_names[gs];
> > +}
> > +
> > +/*
> > + * Return the root node of the specified rcu_state structure.
> > + */
> > +static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
> > +{
> > + return &rsp->node[0];
> > +}
> > +
> > /*
> > * Show the state of the grace-period kthreads.
> > */
> > void show_rcu_gp_kthreads(void)
> > {
> > int cpu;
> > + unsigned long j;
> > + unsigned long ja;
> > + unsigned long jr;
> > + unsigned long jw;
> > struct rcu_data *rdp;
> > struct rcu_node *rnp;
> > struct rcu_state *rsp;
> >
> > + j = jiffies;
> > for_each_rcu_flavor(rsp) {
> > - pr_info("%s: wait state: %d ->state: %#lx\n",
> > - rsp->name, rsp->gp_state, rsp->gp_kthread->state);
> > + ja = j - READ_ONCE(rsp->gp_activity);
> > + jr = j - READ_ONCE(rsp->gp_req_activity);
> > + jw = j - READ_ONCE(rsp->gp_wake_time);
> > + pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
> > + rsp->name, gp_state_getname(rsp->gp_state),
> > + rsp->gp_state,
> > + rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
> > + ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
> > + (long)READ_ONCE(rsp->gp_seq),
> > + (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
> > + READ_ONCE(rsp->gp_flags));
> > rcu_for_each_node_breadth_first(rsp, rnp) {
> > if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
> > continue;
> > - pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
> > - rnp->grplo, rnp->grphi, rnp->gp_seq,
> > - rnp->gp_seq_needed);
> > + pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
> > + rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
> > + (long)rnp->gp_seq_needed);
> > if (!rcu_is_leaf_node(rnp))
> > continue;
> > for_each_leaf_node_possible_cpu(rnp, cpu) {
> > @@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
> > ULONG_CMP_GE(rsp->gp_seq,
> > rdp->gp_seq_needed))
> > continue;
> > - pr_info("\tcpu %d ->gp_seq_needed %lu\n",
> > - cpu, rdp->gp_seq_needed);
> > + pr_info("\tcpu %d ->gp_seq_needed %ld\n",
> > + cpu, (long)rdp->gp_seq_needed);
> > }
> > }
> > /* sched_show_task(rsp->gp_kthread); */
> > @@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
> > }
> > EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
> >
> > -/*
> > - * Return the root node of the specified rcu_state structure.
> > - */
> > -static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
> > -{
> > - return &rsp->node[0];
> > -}
> > -
> > /*
> > * Enter an RCU extended quiescent state, which can be either the
> > * idle loop or adaptive-tickless usermode execution.
> > @@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
> > rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
> > }
> >
> > -/*
> > - * Convert a ->gp_state value to a character string.
> > - */
> > -static const char *gp_state_getname(short gs)
> > -{
> > - if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
> > - return "???";
> > - return gp_state_names[gs];
> > -}
> > -
> > /*
> > * Complain about starvation of grace-period kthread.
> > */
> > @@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> > * Don't do a self-awaken, and don't bother awakening when there is
> > * nothing for the grace-period kthread to do (as in several CPUs
> > * raced to awaken, and we lost), and finally don't try to awaken
> > - * a kthread that has not yet been created.
> > + * a kthread that has not yet been created. If all those checks are
> > + * passed, track some debug information and awaken.
> > */
> > static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> > {
> > @@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> > !READ_ONCE(rsp->gp_flags) ||
> > !rsp->gp_kthread)
> > return;
> > + WRITE_ONCE(rsp->gp_wake_time, jiffies);
> > + WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
> > swake_up_one(&rsp->gp_wq);
> > }
> >
> > @@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > return;
> > }
> > - pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
> > - __func__, (long)READ_ONCE(rsp->gp_seq),
> > - (long)READ_ONCE(rnp_root->gp_seq_needed),
> > - j - rsp->gp_req_activity, j - rsp->gp_activity,
> > - rsp->gp_flags, rsp->gp_state, rsp->name,
> > - rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
> > WARN_ON(1);
> > if (rnp_root != rnp)
> > raw_spin_unlock_rcu_node(rnp_root);
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > + show_rcu_gp_kthreads();
> > }
> >
> > /*
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 4e74df768c57..0e051d9b5f1a 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -327,6 +327,8 @@ struct rcu_state {
> > struct swait_queue_head gp_wq; /* Where GP task waits. */
> > short gp_flags; /* Commands for GP task. */
> > short gp_state; /* GP kthread sleep state. */
> > + unsigned long gp_wake_time; /* Last GP kthread wake. */
> > + unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */
> >
> > /* End of fields guarded by root rcu_node's lock. */
> >
> >
>
>





2018-12-12 21:07:37

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > we reproduced it on two boards, but I still don't see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > I double-checked that CONFIG_PROVE_RCU=y is enabled in the config extracted from /proc/config.gz.
>
> Strange.
>
> Are the systems responsive to sysrq keys once failure occurs? If so, I will
> provide you a sysrq-R or some such to dump out the RCU state.

Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using
the patch below. Only lightly tested.

Thanx, Paul

------------------------------------------------------------------------

commit adfc7dff659495a3433d5084256be59eee0ac6df
Author: Paul E. McKenney <[email protected]>
Date: Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time. This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..4bcd8753e293 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);

+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+ if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+ return "???";
+ return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+ return &rsp->node[0];
+}
+
/*
* Show the state of the grace-period kthreads.
*/
void show_rcu_gp_kthreads(void)
{
int cpu;
+ unsigned long j;
+ unsigned long ja;
+ unsigned long jr;
+ unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;

+ j = jiffies;
for_each_rcu_flavor(rsp) {
- pr_info("%s: wait state: %d ->state: %#lx\n",
- rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+ ja = j - READ_ONCE(rsp->gp_activity);
+ jr = j - READ_ONCE(rsp->gp_req_activity);
+ jw = j - READ_ONCE(rsp->gp_wake_time);
+ pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+ rsp->name, gp_state_getname(rsp->gp_state),
+ rsp->gp_state,
+ rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
+ ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+ (long)READ_ONCE(rsp->gp_seq),
+ (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+ READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
- pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
- rnp->grplo, rnp->grphi, rnp->gp_seq,
- rnp->gp_seq_needed);
+ pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+ rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+ (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
rdp->gp_seq_needed))
continue;
- pr_info("\tcpu %d ->gp_seq_needed %lu\n",
- cpu, rdp->gp_seq_needed);
+ pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+ cpu, (long)rdp->gp_seq_needed);
}
}
/* sched_show_task(rsp->gp_kthread); */
@@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

-/*
- * Return the root node of the specified rcu_state structure.
- */
-static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
-{
- return &rsp->node[0];
-}
-
/*
* Enter an RCU extended quiescent state, which can be either the
* idle loop or adaptive-tickless usermode execution.
@@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
}

-/*
- * Convert a ->gp_state value to a character string.
- */
-static const char *gp_state_getname(short gs)
-{
- if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
- return "???";
- return gp_state_names[gs];
-}
-
/*
* Complain about starvation of grace-period kthread.
*/
@@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
* Don't do a self-awaken, and don't bother awakening when there is
* nothing for the grace-period kthread to do (as in several CPUs
* raced to awaken, and we lost), and finally don't try to awaken
- * a kthread that has not yet been created.
+ * a kthread that has not yet been created. If all those checks are
+ * passed, track some debug information and awaken.
*/
static void rcu_gp_kthread_wake(struct rcu_state *rsp)
{
@@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
+ WRITE_ONCE(rsp->gp_wake_time, jiffies);
+ WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
swake_up_one(&rsp->gp_wq);
}

@@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
- pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
- __func__, (long)READ_ONCE(rsp->gp_seq),
- (long)READ_ONCE(rnp_root->gp_seq_needed),
- j - rsp->gp_req_activity, j - rsp->gp_activity,
- rsp->gp_flags, rsp->gp_state, rsp->name,
- rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
WARN_ON(1);
if (rnp_root != rnp)
raw_spin_unlock_rcu_node(rnp_root);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ show_rcu_gp_kthreads();
}

/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..0e051d9b5f1a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -327,6 +327,8 @@ struct rcu_state {
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
short gp_state; /* GP kthread sleep state. */
+ unsigned long gp_wake_time; /* Last GP kthread wake. */
+ unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */

/* End of fields guarded by root rcu_node's lock. */



2018-12-12 23:14:45

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6, and don't see sysrq_rcu there either.
Please correct me if I have gotten something wrong.

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Thursday, December 13, 2018 5:03 AM
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > we reproduced it on two boards, but I still don't see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > I double-checked that CONFIG_PROVE_RCU=y is enabled in the config extracted from /proc/config.gz.
>
> Strange.
>
> Are the systems responsive to sysrq keys once failure occurs? If so,
> I will provide you a sysrq-R or some such to dump out the RCU state.

Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.

Thanx, Paul

------------------------------------------------------------------------

commit adfc7dff659495a3433d5084256be59eee0ac6df
Author: Paul E. McKenney <[email protected]>
Date: Mon Dec 10 16:33:59 2018 -0800

rcu: Improve diagnostics for failed RCU grace-period start

Backported from v4.21/v5.0

If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time. This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..4bcd8753e293 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);

+/*
+ * Convert a ->gp_state value to a character string.
+ */
+static const char *gp_state_getname(short gs)
+{
+ if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
+ return "???";
+ return gp_state_names[gs];
+}
+
+/*
+ * Return the root node of the specified rcu_state structure.
+ */
+static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
+{
+ return &rsp->node[0];
+}
+
/*
* Show the state of the grace-period kthreads.
*/
void show_rcu_gp_kthreads(void)
{
int cpu;
+ unsigned long j;
+ unsigned long ja;
+ unsigned long jr;
+ unsigned long jw;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;

+ j = jiffies;
for_each_rcu_flavor(rsp) {
- pr_info("%s: wait state: %d ->state: %#lx\n",
- rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+ ja = j - READ_ONCE(rsp->gp_activity);
+ jr = j - READ_ONCE(rsp->gp_req_activity);
+ jw = j - READ_ONCE(rsp->gp_wake_time);
+ pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
+ rsp->name, gp_state_getname(rsp->gp_state),
+ rsp->gp_state,
+ rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
+ ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
+ (long)READ_ONCE(rsp->gp_seq),
+ (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
+ READ_ONCE(rsp->gp_flags));
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
continue;
- pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
- rnp->grplo, rnp->grphi, rnp->gp_seq,
- rnp->gp_seq_needed);
+ pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
+ rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
+ (long)rnp->gp_seq_needed);
if (!rcu_is_leaf_node(rnp))
continue;
for_each_leaf_node_possible_cpu(rnp, cpu) {
@@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
ULONG_CMP_GE(rsp->gp_seq,
rdp->gp_seq_needed))
continue;
- pr_info("\tcpu %d ->gp_seq_needed %lu\n",
- cpu, rdp->gp_seq_needed);
+ pr_info("\tcpu %d ->gp_seq_needed %ld\n",
+ cpu, (long)rdp->gp_seq_needed);
}
}
/* sched_show_task(rsp->gp_kthread); */
@@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

-/*
- * Return the root node of the specified rcu_state structure.
- */
-static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
-{
- return &rsp->node[0];
-}
-
/*
* Enter an RCU extended quiescent state, which can be either the
* idle loop or adaptive-tickless usermode execution.
@@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
}

-/*
- * Convert a ->gp_state value to a character string.
- */
-static const char *gp_state_getname(short gs)
-{
- if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
- return "???";
- return gp_state_names[gs];
-}
-
/*
* Complain about starvation of grace-period kthread.
*/
@@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
* Don't do a self-awaken, and don't bother awakening when there is
* nothing for the grace-period kthread to do (as in several CPUs
* raced to awaken, and we lost), and finally don't try to awaken
- * a kthread that has not yet been created.
+ * a kthread that has not yet been created. If all those checks are
+ * passed, track some debug information and awaken.
*/
static void rcu_gp_kthread_wake(struct rcu_state *rsp)
{
@@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
+ WRITE_ONCE(rsp->gp_wake_time, jiffies);
+ WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
swake_up_one(&rsp->gp_wq);
}

@@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
- pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
- __func__, (long)READ_ONCE(rsp->gp_seq),
- (long)READ_ONCE(rnp_root->gp_seq_needed),
- j - rsp->gp_req_activity, j - rsp->gp_activity,
- rsp->gp_flags, rsp->gp_state, rsp->name,
- rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
WARN_ON(1);
if (rnp_root != rnp)
raw_spin_unlock_rcu_node(rnp_root);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ show_rcu_gp_kthreads();
}

/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..0e051d9b5f1a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -327,6 +327,8 @@ struct rcu_state {
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
short gp_state; /* GP kthread sleep state. */
+ unsigned long gp_wake_time; /* Last GP kthread wake. */
+ unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */

/* End of fields guarded by root rcu_node's lock. */



2018-12-13 00:34:28

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6, and don't see sysrq_rcu there either.
> Please correct me if I have gotten something wrong.

That would be because I sent you the wrong patch, apologies! :-/

Please instead see the one below, which does add sysrq_rcu.

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, December 13, 2018 5:03 AM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > we reproduced it on two boards, but I still don't see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > I double-checked that CONFIG_PROVE_RCU=y is enabled in the config extracted from /proc/config.gz.
> >
> > Strange.
> >
> > Are the systems responsive to sysrq keys once failure occurs? If so,
> > I will provide you a sysrq-R or some such to dump out the RCU state.
>
> Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.

------------------------------------------------------------------------

commit 04b6245c8458e8725f4169e62912c1fadfdf8141
Author: Paul E. McKenney <[email protected]>
Date: Wed Dec 12 16:10:09 2018 -0800

rcu: Add sysrq rcu_node-dump capability

Backported from v4.21/v5.0

Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period. This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..e9392a9d6291 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -61,6 +61,7 @@
#include <linux/trace_events.h>
#include <linux/suspend.h>
#include <linux/ftrace.h>
+#include <linux/sysrq.h>

#include "tree.h"
#include "rcu.h"
@@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
/* panic() on RCU Stall sysctl. */
int sysctl_panic_on_rcu_stall __read_mostly;
+/* Commandeer a sysrq key to dump RCU's tree. */
+static bool sysrq_rcu;
+module_param(sysrq_rcu, bool, 0444);

/*
* The rcu_scheduler_active variable is initialized to the value
@@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
}
EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);

+/* Dump grace-period-request information due to commandeered sysrq. */
+static void sysrq_show_rcu(int key)
+{
+ show_rcu_gp_kthreads();
+}
+
+static struct sysrq_key_op sysrq_rcudump_op = {
+ .handler = sysrq_show_rcu,
+ .help_msg = "show-rcu(y)",
+ .action_msg = "Show RCU tree",
+ .enable_mask = SYSRQ_ENABLE_DUMP,
+};
+
+static int __init rcu_sysrq_init(void)
+{
+ if (sysrq_rcu)
+ return register_sysrq_key('y', &sysrq_rcudump_op);
+ return 0;
+}
+early_initcall(rcu_sysrq_init);
+
/*
* Send along grace-period-related data for rcutorture diagnostics.
*/


2018-12-13 02:17:55

by Zhang, Jun

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

Hello, Paul

I think the following patch is better: ULONG_CMP_GE can cause a redundant write, which risks writing back a stale value.
Please help review.
I haven't tested it; if you agree, we will test it.
Thanks!


diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..c00f34e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
zero_cpu_stall_ticks(rdp);
}
rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
- if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
+ if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
rdp->gp_seq_needed = rnp->gp_seq_needed;
WRITE_ONCE(rdp->gpwrap, false);
rcu_gpnum_ovf(rnp, rdp);
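
For reference, the two comparison macros are defined in kernel/rcu/rcu.h
(v4.19) as copied below, and the only case where the old and new
conditions differ is equality -- exactly the case where the redundant
store, and hence the overwrite window, would otherwise occur. A minimal
userspace sketch:

#include <limits.h>
#include <stdio.h>

/* Wrap-safe unsigned comparisons, as defined in kernel/rcu/rcu.h (v4.19). */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))	/* a >= b, modulo wrap */
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))	/* a < b, modulo wrap */

int main(void)
{
	unsigned long rnp_need = 21808196;	/* rnp->gp_seq_needed */
	unsigned long rdp_need = 21808196;	/* rdp->gp_seq_needed, already equal */

	/* Old condition: true on equality, so the store still executes. */
	printf("ULONG_CMP_GE(rnp, rdp) = %d\n", ULONG_CMP_GE(rnp_need, rdp_need));
	/* New condition: false on equality, so the redundant store is skipped. */
	printf("ULONG_CMP_LT(rdp, rnp) = %d\n", ULONG_CMP_LT(rdp_need, rnp_need));
	return 0;
}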


-----Original Message-----
From: Paul E. McKenney [mailto:[email protected]]
Sent: Thursday, December 13, 2018 08:12
To: He, Bo <[email protected]>
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6, and don't see sysrq_rcu there either.
> > Please correct me if I have gotten something wrong.

That would be because I sent you the wrong patch, apologies! :-/

Please instead see the one below, which does add sysrq_rcu.

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, December 13, 2018 5:03 AM
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; Zhang, Jun
> <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> <[email protected]>; Bai, Jie A <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > we reproduced it on two boards, but I still don't see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > I double-checked that CONFIG_PROVE_RCU=y is enabled in the config extracted from /proc/config.gz.
> >
> > Strange.
> >
> > Are the systems responsive to sysrq keys once failure occurs? If
> > so, I will provide you a sysrq-R or some such to dump out the RCU state.
>
> Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.

------------------------------------------------------------------------

commit 04b6245c8458e8725f4169e62912c1fadfdf8141
Author: Paul E. McKenney <[email protected]>
Date: Wed Dec 12 16:10:09 2018 -0800

rcu: Add sysrq rcu_node-dump capability

Backported from v4.21/v5.0

Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period. This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..e9392a9d6291 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -61,6 +61,7 @@
#include <linux/trace_events.h>
#include <linux/suspend.h>
#include <linux/ftrace.h>
+#include <linux/sysrq.h>

#include "tree.h"
#include "rcu.h"
@@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
/* panic() on RCU Stall sysctl. */
int sysctl_panic_on_rcu_stall __read_mostly;
+/* Commandeer a sysrq key to dump RCU's tree. */
+static bool sysrq_rcu;
+module_param(sysrq_rcu, bool, 0444);

/*
* The rcu_scheduler_active variable is initialized to the value
@@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
}
EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);

+/* Dump grace-period-request information due to commandeered sysrq. */
+static void sysrq_show_rcu(int key)
+{
+ show_rcu_gp_kthreads();
+}
+
+static struct sysrq_key_op sysrq_rcudump_op = {
+ .handler = sysrq_show_rcu,
+ .help_msg = "show-rcu(y)",
+ .action_msg = "Show RCU tree",
+ .enable_mask = SYSRQ_ENABLE_DUMP,
+};
+
+static int __init rcu_sysrq_init(void)
+{
+ if (sysrq_rcu)
+ return register_sysrq_key('y', &sysrq_rcudump_op);
+ return 0;
+}
+early_initcall(rcu_sysrq_init);
+
/*
* Send along grace-period-related data for rcutorture diagnostics.
*/


2018-12-13 02:44:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> Hello, Paul
>
> I think the following patch is better: ULONG_CMP_GE can cause a redundant write, which risks writing back a stale value.
> Please help review.
> I haven't tested it; if you agree, we will test it.

Just to make sure that I understand, you are worried about something like
the following, correct?

o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
and finds them equal.

o At just this time something like rcu_start_this_gp() assigns a new
(larger) value to rdp->gp_seq_needed.

o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
old value.

This cannot happen because __note_gp_changes() runs with interrupts
disabled on the CPU corresponding to the rcu_data structure referenced
by the rdp pointer. So there is no way for rcu_start_this_gp() to be
invoked on the same CPU during this "if" statement.
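
For reference, here is the v4.19 caller, lightly abridged from my reading
of the source; the local_irq_save() taken before the rcu_node lock is what
pins __note_gp_changes() to the CPU and closes the window:

static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp)
{
	unsigned long flags;
	bool needwake;
	struct rcu_node *rnp;

	local_irq_save(flags);	/* Irqs off: no preemption, no migration. */
	rnp = rdp->mynode;
	if ((rdp->gp_seq == rcu_seq_current(&rnp->gp_seq) &&
	     !unlikely(READ_ONCE(rdp->gpwrap))) || /* Nothing changed... */
	    !raw_spin_trylock_rcu_node(rnp)) {     /* ...or lock contended. */
		local_irq_restore(flags);
		return;
	}
	needwake = __note_gp_changes(rsp, rnp, rdp); /* Runs with irqs off. */
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
	if (needwake)
		rcu_gp_kthread_wake(rsp);
}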

Of course, there could be bugs. For example:

o __note_gp_changes() might be called on a different CPU than that
corresponding to rdp. You can check this with something like:

WARN_ON_ONCE(rdp->cpu != smp_processor_id());

o The same things could happen with rcu_start_this_gp(), and the
above WARN_ON_ONCE() would work there as well.

o rcutree_prepare_cpu() is a special case, but is irrelevant unless
you are doing CPU-hotplug operations. (It can run on a CPU other
than rdp->cpu, but only at times when rdp->cpu is offline.)

o Interrupts might not really be disabled.

That said, your patch could reduce overhead slightly, given that the
two values will be equal much of the time. So it might be worth testing
just for that reason.

So why not just test it anyway? If it makes the bug go away, I will be
surprised, but it would not be the first surprise for me. ;-)

Thanx, Paul

> Thanks!
>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1..c00f34e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> zero_cpu_stall_ticks(rdp);
> }
> rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> - if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> + if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> rdp->gp_seq_needed = rnp->gp_seq_needed;
> WRITE_ONCE(rdp->gpwrap, false);
> rcu_gpnum_ovf(rnp, rdp);
>
>
> -----Original Message-----
> From: Paul E. McKenney [mailto:[email protected]]
> Sent: Thursday, December 13, 2018 08:12
> To: He, Bo <[email protected]>
> Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Jun <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6, and don't see sysrq_rcu there either.
> > Please correct me if I have gotten something wrong.
>
> That would be because I sent you the wrong patch, apologies! :-/
>
> Please instead see the one below, which does add sysrq_rcu.
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Thursday, December 13, 2018 5:03 AM
> > To: He, Bo <[email protected]>
> > Cc: Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Zhang, Jun
> > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > <[email protected]>; Bai, Jie A <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > we reproduced it on two boards, but I still don't see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > I double-checked that CONFIG_PROVE_RCU=y is enabled in the config extracted from /proc/config.gz.
> > >
> > > Strange.
> > >
> > > Are the systems responsive to sysrq keys once failure occurs? If
> > > so, I will provide you a sysrq-R or some such to dump out the RCU state.
> >
> > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
>
> ------------------------------------------------------------------------
>
> commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> Author: Paul E. McKenney <[email protected]>
> Date: Wed Dec 12 16:10:09 2018 -0800
>
> rcu: Add sysrq rcu_node-dump capability
>
> Backported from v4.21/v5.0
>
> Life is hard if RCU manages to get stuck without triggering RCU CPU
> stall warnings or triggering the rcu_check_gp_start_stall() checks
> for failing to start a grace period. This commit therefore adds a
> boot-time-selectable sysrq key (commandeering "y") that allows manually
> dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> must be set for this sysrq to be available.
>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1369f7..e9392a9d6291 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -61,6 +61,7 @@
> #include <linux/trace_events.h>
> #include <linux/suspend.h>
> #include <linux/ftrace.h>
> +#include <linux/sysrq.h>
>
> #include "tree.h"
> #include "rcu.h"
> @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> /* panic() on RCU Stall sysctl. */
> int sysctl_panic_on_rcu_stall __read_mostly;
> +/* Commandeer a sysrq key to dump RCU's tree. */
> +static bool sysrq_rcu;
> +module_param(sysrq_rcu, bool, 0444);
>
> /*
> * The rcu_scheduler_active variable is initialized to the value
> @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> }
> EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
>
> +/* Dump grace-period-request information due to commandeered sysrq. */
> +static void sysrq_show_rcu(int key)
> +{
> + show_rcu_gp_kthreads();
> +}
> +
> +static struct sysrq_key_op sysrq_rcudump_op = {
> + .handler = sysrq_show_rcu,
> + .help_msg = "show-rcu(y)",
> + .action_msg = "Show RCU tree",
> + .enable_mask = SYSRQ_ENABLE_DUMP,
> +};
> +
> +static int __init rcu_sysrq_init(void)
> +{
> + if (sysrq_rcu)
> + return register_sysrq_key('y', &sysrq_rcudump_op);
> + return 0;
> +}
> +early_initcall(rcu_sysrq_init);
> +
> /*
> * Send along grace-period-related data for rcutorture diagnostics.
> */
>


2018-12-13 04:42:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> Ok, we will test it, thanks!

But please also try the sysrq-y with the earlier patch after a hang!

Thanx, Paul

> -----Original Message-----
> From: Paul E. McKenney [mailto:[email protected]]
> Sent: Thursday, December 13, 2018 10:43
> To: Zhang, Jun <[email protected]>
> Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > Hello, Paul
> >
> > > I think the following patch is better: ULONG_CMP_GE can cause a redundant write, which risks writing back a stale value.
> > > Please help review.
> > > I haven't tested it; if you agree, we will test it.
>
> Just to make sure that I understand, you are worried about something like the following, correct?
>
> o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> and finds them equal.
>
> o At just this time something like rcu_start_this_gp() assigns a new
> (larger) value to rdp->gp_seq_needed.
>
> o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> old value.
>
> This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
>
> Of course, there could be bugs. For example:
>
> o __note_gp_changes() might be called on a different CPU than that
> corresponding to rdp. You can check this with something like:
>
> WARN_ON_ONCE(rdp->cpu != smp_processor_id());
>
> o The same things could happen with rcu_start_this_gp(), and the
> above WARN_ON_ONCE() would work there as well.
>
> o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> you are doing CPU-hotplug operations. (It can run on a CPU other
> than rdp->cpu, but only at times when rdp->cpu is offline.)
>
> o Interrupts might not really be disabled.
>
> That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
>
> So why not just test it anyway? If it makes the bug go away, I will be surprised, but it would not be the first surprise for me. ;-)
>
> Thanx, Paul
>
> > Thanks!
> >
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0b760c1..c00f34e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > zero_cpu_stall_ticks(rdp);
> > }
> > rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > - if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > + if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > rdp->gp_seq_needed = rnp->gp_seq_needed;
> > WRITE_ONCE(rdp->gpwrap, false);
> > rcu_gpnum_ovf(rnp, rdp);
> >
> >
> > -----Original Message-----
> > From: Paul E. McKenney [mailto:[email protected]]
> > Sent: Thursday, December 13, 2018 08:12
> > To: He, Bo <[email protected]>
> > Cc: Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Zhang, Jun
> > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J
> > <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6, and don't see sysrq_rcu there either.
> > > Please correct me if I have gotten something wrong.
> >
> > That would be because I sent you the wrong patch, apologies! :-/
> >
> > Please instead see the one below, which does add sysrq_rcu.
> >
> > Thanx, Paul
> >
> > > -----Original Message-----
> > > From: Paul E. McKenney <[email protected]>
> > > Sent: Thursday, December 13, 2018 5:03 AM
> > > To: He, Bo <[email protected]>
> > > Cc: Steven Rostedt <[email protected]>;
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; Zhang, Jun
> > > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > > <[email protected]>; Bai, Jie A <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > we reproduced it on two boards, but I still don't see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > I double-checked that CONFIG_PROVE_RCU=y is enabled in the config extracted from /proc/config.gz.
> > > >
> > > > Strange.
> > > >
> > > > Are the systems responsive to sysrq keys once failure occurs? If
> > > > so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > >
> > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> >
> > ------------------------------------------------------------------------
> >
> > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > Author: Paul E. McKenney <[email protected]>
> > Date: Wed Dec 12 16:10:09 2018 -0800
> >
> > rcu: Add sysrq rcu_node-dump capability
> >
> > Backported from v4.21/v5.0
> >
> > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > for failing to start a grace period. This commit therefore adds a
> > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > must be set for this sysrq to be available.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0b760c1369f7..e9392a9d6291 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -61,6 +61,7 @@
> > #include <linux/trace_events.h>
> > #include <linux/suspend.h>
> > #include <linux/ftrace.h>
> > +#include <linux/sysrq.h>
> >
> > #include "tree.h"
> > #include "rcu.h"
> > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> > int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> > /* panic() on RCU Stall sysctl. */
> > int sysctl_panic_on_rcu_stall __read_mostly;
> > +/* Commandeer a sysrq key to dump RCU's tree. */
> > +static bool sysrq_rcu;
> > +module_param(sysrq_rcu, bool, 0444);
> >
> > /*
> > * The rcu_scheduler_active variable is initialized to the value
> > @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> > }
> > EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> >
> > +/* Dump grace-period-request information due to commandeered sysrq. */
> > +static void sysrq_show_rcu(int key)
> > +{
> > + show_rcu_gp_kthreads();
> > +}
> > +
> > +static struct sysrq_key_op sysrq_rcudump_op = {
> > + .handler = sysrq_show_rcu,
> > + .help_msg = "show-rcu(y)",
> > + .action_msg = "Show RCU tree",
> > + .enable_mask = SYSRQ_ENABLE_DUMP,
> > +};
> > +
> > +static int __init rcu_sysrq_init(void)
> > +{
> > + if (sysrq_rcu)
> > + return register_sysrq_key('y', &sysrq_rcudump_op);
> > + return 0;
> > +}
> > +early_initcall(rcu_sysrq_init);
> > +
> > /*
> > * Send along grace-period-related data for rcutorture diagnostics.
> > */
> >
>


2018-12-13 18:28:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Thu, Dec 13, 2018 at 03:26:08PM +0000, He, Bo wrote:
> one of the boards reproduced the issue with show_rcu_gp_kthreads() applied; I also enclosed the logs as an attachment.
>
> [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258 ->gp_wake_seq 21808189 ->gp_seq 21808192 ->gp_seq_needed 21808196 ->gp_flags 0x1

This is quite helpful, thank you!

The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y,
which is good. The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task
is waiting for a new grace-period request. The "->state: 0x402" means
that it is sleeping, neither running nor in the process of waking up.
The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time
308258" means that it has been more than 300,000 jiffies since the
rcu_preempt task did anything or was requested to do anything.
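
In case it helps with decoding: a sketch of the relevant constants, taken
from my reading of the v4.19 sources (include/linux/sched.h); treat the
exact values as version-dependent:

/* ->state bits (include/linux/sched.h, v4.19): */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_NOLOAD		0x0400
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/*
 * ->state 0x402 == TASK_IDLE: an uninterruptible sleep that is excluded
 * from the load average, that is, the kthread is parked in its wait loop.
 * The "(1)" in RCU_GP_WAIT_GPS(1) is rsp->gp_state, printed by way of the
 * gp_state_names[] lookup that the debug patch's gp_state_getname() does.
 */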

The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt
to awaken the rcu_preempt task happened during the last grace period.
The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that
someone requested a new grace period. So if the rcu_preempt task were
to wake up, it would process the new grace period. Note again also
the ->gp_req_activity 308256, which indicates that ->gp_flags was set
more than 300,000 jiffies ago, just after the last recorded activity
of the rcu_preempt task.
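
The arithmetic behind that reading: a ->gp_seq-style counter carries the
grace-period number in its upper bits and a phase in its low two bits.
A sketch with the helpers as defined in kernel/rcu/rcu.h (v4.19):

#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)

static inline unsigned long rcu_seq_ctr(unsigned long s)
{
	return s >> RCU_SEQ_CTR_SHIFT;	/* Grace-period number. */
}

static inline unsigned long rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;	/* 0 = idle, nonzero = GP in flight. */
}

/*
 * ->gp_wake_seq   21808189 -> ctr 5452047, state 1: wakeup sent during that GP.
 * ->gp_seq        21808192 -> ctr 5452048, state 0: that GP has completed.
 * ->gp_seq_needed 21808196 -> ctr 5452049, state 0: one further GP requested.
 */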

But this is exactly the situation that rcu_check_gp_start_stall() is
designed to warn about (and does warn about for me when I comment
out the wakeup code). So why is rcu_check_gp_start_stall() not being
called? Here are a couple of possibilities:

1. Because rcu_check_gp_start_stall() is only ever invoked from
RCU_SOFTIRQ, it is possible that softirqs are stalled for
whatever reason.

2. Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
interrupt handler, it is possible that the scheduler tick has
somehow been disabled. Traces from earlier runs showed a great
deal of RCU callbacks queued, which would have caused RCU to
refuse to allow the scheduler tick to be disabled, even if the
corresponding CPU was idle.

3. You have CONFIG_FAST_NO_HZ=y (which you probably do, given
that you are building for a battery-powered device) and all of the
CPU's callbacks are lazy. Except that your earlier traces showed
lots of non-lazy callbacks. Besides, even if all callbacks were
lazy, there would still be a scheduling-clock interrupt every
six seconds, and there are quite a few six-second intervals
in a two-minute watchdog timeout.

But if we cannot find the problem quickly, I will likely ask
you to try reproducing with CONFIG_FAST_NO_HZ=n. This could
be thought of as bisecting the RCU code looking for the bug.

The first two of these seem unlikely given that the watchdog timer was
still firing. Still, I don't see how 300,000 jiffies elapsed with a grace
period requested and not started otherwise. Could you please check?
One way to do so would be to enable ftrace on rcu_check_callbacks(),
__rcu_process_callbacks(), and rcu_check_gp_start_stall(). It might
be necessary to mark rcu_check_gp_start_stall() noinline. You might have
better ways to collect this information.

Without this information, the only workaround patch I can give you will
degrade battery lifetime, which might not be what you want.

You do have a lockdep complaint early at boot. Although I don't
immediately see how this self-deadlock would affect RCU, please do get
it fixed. Sometimes the consequences of this sort of deadlock can
propagate to unexpected places.

Regardless of why rcu_check_gp_start_stall() failed to complain, it looks
like this was set after the rcu_preempt task slept for the last time,
and so there should have been a wakeup the last time that ->gp_flags
was set. Perhaps there is some code path that drops the wakeup.
I did check this in current -rcu, but you are instead running v4.19,
so I should also check there.

The ->gp_flags has its RCU_GP_FLAG_INIT bit set in rcu_start_this_gp()
and in rcu_gp_cleanup(). We can eliminate rcu_gp_cleanup() from
consideration because only the rcu_preempt task will execute that code,
and we know that this task was asleep at the last time this bit was set.
Now rcu_start_this_gp() returns a flag indicating whether or not a wakeup
is needed, and the caller must do the wakeup once it is safe to do so,
that is, after the various rcu_node locks have been released (doing a
wakeup while holding any of those locks results in deadlock).

The following functions invoke rcu_start_this_gp: rcu_accelerate_cbs()
and rcu_nocb_wait_gp(). We can eliminate rcu_nocb_wait_gp() because you
are building with CONFIG_RCU_NOCB_CPU=n. Then rcu_accelerate_cbs()
is invoked from:

o rcu_accelerate_cbs_unlocked(), which does the following, thus
properly awakening the rcu_preempt task when needed:

needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
if (needwake)
rcu_gp_kthread_wake(rsp);

o rcu_advance_cbs(), which returns the value returned by
rcu_accelerate_cbs(), thus pushing the problem off to its
callers, which are called out below.

o __note_gp_changes(), which also returns the value returned by
rcu_accelerate_cbs(), thus pushing the problem off to its callers,
which are called out below.

o rcu_gp_cleanup(), which is only ever invoked by RCU grace-period
kthreads such as the rcu_preempt task. Therefore, this function
never needs to awaken the rcu_preempt task, because the fact
that this function is executing means that this task is already
awake. (Also, as noted above, we can eliminate this code from
consideration because this task is known to have been sleeping
at the last time that the RCU_GP_FLAG_INIT bit was set.)

o rcu_report_qs_rdp(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = rcu_accelerate_cbs(rsp, rnp, rdp);

rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
/* ^^^ Released rnp->lock */
if (needwake)
rcu_gp_kthread_wake(rsp);

o rcu_prepare_for_idle(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
if (needwake)
rcu_gp_kthread_wake(rsp);

Now for rcu_advance_cbs():

o __note_gp_changes(), which also returns the value returned
by rcu_advance_cbs(), thus pushing the problem off to its callers,
which are called out below.

o rcu_migrate_callbacks(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
rcu_advance_cbs(rsp, rnp_root, my_rdp);
rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
!rcu_segcblist_n_cbs(&my_rdp->cblist));
raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
if (needwake)
rcu_gp_kthread_wake(rsp);

Now for __note_gp_changes():

o note_gp_changes(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = __note_gp_changes(rsp, rnp, rdp);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
if (needwake)
rcu_gp_kthread_wake(rsp);

o rcu_gp_init() which is only ever invoked by RCU grace-period
kthreads such as the rcu_preempt task, which makes wakeups
unnecessary, just as for rcu_gp_cleanup() above.

o rcu_gp_cleanup(), ditto.

So I am not seeing how I am losing a wakeup, but please do feel free to
double-check my analysis. One way to do that is using event tracing.

Thanx, Paul

------------------------------------------------------------------------
lockdep complaint:
------------------------------------------------------------------------

[ 2.895507] ======================================================
[ 2.895511] WARNING: possible circular locking dependency detected
[ 2.895517] 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1 Tainted: G U
[ 2.895521] ------------------------------------------------------
[ 2.895525] earlyEvs/1839 is trying to acquire lock:
[ 2.895530] 00000000ff344115 (&asd->mutex){+.+.}, at: ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895546]
[ 2.895546] but task is already holding lock:
[ 2.895550] 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
[ 2.895561]
[ 2.895561] which lock already depends on the new lock.
[ 2.895561]
[ 2.895566]
[ 2.895566] the existing dependency chain (in reverse order) is:
[ 2.895570]
[ 2.895570] -> #1 (&mdev->graph_mutex){+.+.}:
[ 2.895583] __mutex_lock+0x80/0x9a0
[ 2.895588] mutex_lock_nested+0x1b/0x20
[ 2.895593] media_device_register_entity+0x92/0x1e0
[ 2.895598] v4l2_device_register_subdev+0xc2/0x1b0
[ 2.895604] ipu_isys_csi2_init+0x22c/0x520
[ 2.895608] isys_probe+0x6cb/0xed0
[ 2.895613] ipu_bus_probe+0xfd/0x2e0
[ 2.895620] really_probe+0x268/0x3d0
[ 2.895625] driver_probe_device+0x11a/0x130
[ 2.895630] __device_attach_driver+0x86/0x100
[ 2.895635] bus_for_each_drv+0x6e/0xb0
[ 2.895640] __device_attach+0xdf/0x160
[ 2.895645] device_initial_probe+0x13/0x20
[ 2.895650] bus_probe_device+0xa6/0xc0
[ 2.895655] deferred_probe_work_func+0x88/0xe0
[ 2.895661] process_one_work+0x220/0x5c0
[ 2.895665] worker_thread+0x1da/0x3b0
[ 2.895670] kthread+0x12c/0x150
[ 2.895675] ret_from_fork+0x3a/0x50
[ 2.895678]
[ 2.895678] -> #0 (&asd->mutex){+.+.}:
[ 2.895688] lock_acquire+0x95/0x1a0
[ 2.895693] __mutex_lock+0x80/0x9a0
[ 2.895698] mutex_lock_nested+0x1b/0x20
[ 2.895703] ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895708] ipu_isys_csi2_get_fmt+0x14/0x30
[ 2.895713] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
[ 2.895718] v4l2_subdev_link_validate_one+0x67/0x120
[ 2.895723] v4l2_subdev_link_validate+0x246/0x490
[ 2.895728] csi2_link_validate+0xc6/0x220
[ 2.895733] __media_pipeline_start+0x15b/0x2f0
[ 2.895738] media_pipeline_start+0x33/0x50
[ 2.895743] ipu_isys_video_prepare_streaming+0x1e0/0x610
[ 2.895748] start_streaming+0x186/0x3a0
[ 2.895753] vb2_start_streaming+0x6d/0x130
[ 2.895758] vb2_core_streamon+0x108/0x140
[ 2.895762] vb2_streamon+0x29/0x50
[ 2.895767] vb2_ioctl_streamon+0x42/0x50
[ 2.895772] v4l_streamon+0x20/0x30
[ 2.895776] __video_do_ioctl+0x1af/0x3c0
[ 2.895781] video_usercopy+0x27e/0x7e0
[ 2.895785] video_ioctl2+0x15/0x20
[ 2.895789] v4l2_ioctl+0x49/0x50
[ 2.895794] do_video_ioctl+0x93c/0x2360
[ 2.895799] v4l2_compat_ioctl32+0x93/0xe0
[ 2.895806] __ia32_compat_sys_ioctl+0x73a/0x1c90
[ 2.895813] do_fast_syscall_32+0x9a/0x2d6
[ 2.895818] entry_SYSENTER_compat+0x6d/0x7c
[ 2.895821]
[ 2.895821] other info that might help us debug this:
[ 2.895821]
[ 2.895826] Possible unsafe locking scenario:
[ 2.895826]
[ 2.895830]        CPU0                    CPU1
[ 2.895833]        ----                    ----
[ 2.895836]   lock(&mdev->graph_mutex);
[ 2.895842]                                lock(&asd->mutex);
[ 2.895847]                                lock(&mdev->graph_mutex);
[ 2.895852]   lock(&asd->mutex);
[ 2.895857]
[ 2.895857] *** DEADLOCK ***
[ 2.895857]
[ 2.895863] 3 locks held by earlyEvs/1839:
[ 2.895866] #0: 00000000ed860090 (&av->mutex){+.+.}, at: __video_do_ioctl+0xbf/0x3c0
[ 2.895876] #1: 000000000cb253e7 (&isys->stream_mutex){+.+.}, at: start_streaming+0x5c/0x3a0
[ 2.895886] #2: 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
[ 2.895896]
[ 2.895896] stack backtrace:
[ 2.895903] CPU: 0 PID: 1839 Comm: earlyEvs Tainted: G U 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1
[ 2.895907] Call Trace:
[ 2.895915] dump_stack+0x70/0xa5
[ 2.895921] print_circular_bug.isra.35+0x1d8/0x1e6
[ 2.895927] __lock_acquire+0x1284/0x1340
[ 2.895931] ? __lock_acquire+0x2b5/0x1340
[ 2.895940] lock_acquire+0x95/0x1a0
[ 2.895945] ? lock_acquire+0x95/0x1a0
[ 2.895950] ? ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895956] ? ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895961] __mutex_lock+0x80/0x9a0
[ 2.895966] ? ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895971] ? crlmodule_get_format+0x43/0x50
[ 2.895979] mutex_lock_nested+0x1b/0x20
[ 2.895984] ? mutex_lock_nested+0x1b/0x20
[ 2.895989] ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895995] ipu_isys_csi2_get_fmt+0x14/0x30
[ 2.896001] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
[ 2.896006] v4l2_subdev_link_validate_one+0x67/0x120
[ 2.896011] ? crlmodule_get_format+0x2a/0x50
[ 2.896018] ? find_held_lock+0x35/0xa0
[ 2.896023] ? crlmodule_get_format+0x43/0x50
[ 2.896030] v4l2_subdev_link_validate+0x246/0x490
[ 2.896035] ? __mutex_unlock_slowpath+0x58/0x2f0
[ 2.896042] ? mutex_unlock+0x12/0x20
[ 2.896046] ? crlmodule_get_format+0x43/0x50
[ 2.896052] ? v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
[ 2.896057] ? v4l2_subdev_link_validate_one+0x67/0x120
[ 2.896065] ? __is_insn_slot_addr+0xad/0x120
[ 2.896070] ? kernel_text_address+0xc4/0x100
[ 2.896078] ? v4l2_subdev_link_validate+0x246/0x490
[ 2.896085] ? kernel_text_address+0xc4/0x100
[ 2.896092] ? __lock_acquire+0x1106/0x1340
[ 2.896096] ? __lock_acquire+0x1169/0x1340
[ 2.896103] csi2_link_validate+0xc6/0x220
[ 2.896110] ? __lock_is_held+0x5a/0xa0
[ 2.896115] ? mark_held_locks+0x58/0x80
[ 2.896122] ? __kmalloc+0x207/0x2e0
[ 2.896127] ? __lock_is_held+0x5a/0xa0
[ 2.896134] ? rcu_read_lock_sched_held+0x81/0x90
[ 2.896139] ? __kmalloc+0x2a3/0x2e0
[ 2.896144] ? media_pipeline_start+0x28/0x50
[ 2.896150] ? __media_entity_enum_init+0x33/0x70
[ 2.896155] ? csi2_has_route+0x18/0x20
[ 2.896160] ? media_graph_walk_next.part.9+0xac/0x290
[ 2.896166] __media_pipeline_start+0x15b/0x2f0
[ 2.896173] ? rcu_read_lock_sched_held+0x81/0x90
[ 2.896179] media_pipeline_start+0x33/0x50
[ 2.896186] ipu_isys_video_prepare_streaming+0x1e0/0x610
[ 2.896191] ? __lock_acquire+0x132e/0x1340
[ 2.896198] ? __lock_acquire+0x2b5/0x1340
[ 2.896204] ? lock_acquire+0x95/0x1a0
[ 2.896209] ? start_streaming+0x5c/0x3a0
[ 2.896215] ? start_streaming+0x5c/0x3a0
[ 2.896221] ? __mutex_lock+0x391/0x9a0
[ 2.896226] ? v4l_enable_media_source+0x2d/0x70
[ 2.896233] ? find_held_lock+0x35/0xa0
[ 2.896238] ? v4l_enable_media_source+0x57/0x70
[ 2.896245] start_streaming+0x186/0x3a0
[ 2.896250] ? __mutex_unlock_slowpath+0x58/0x2f0
[ 2.896257] vb2_start_streaming+0x6d/0x130
[ 2.896262] ? vb2_start_streaming+0x6d/0x130
[ 2.896267] vb2_core_streamon+0x108/0x140
[ 2.896273] vb2_streamon+0x29/0x50
[ 2.896278] vb2_ioctl_streamon+0x42/0x50
[ 2.896284] v4l_streamon+0x20/0x30
[ 2.896288] __video_do_ioctl+0x1af/0x3c0
[ 2.896296] ? __might_fault+0x85/0x90
[ 2.896302] video_usercopy+0x27e/0x7e0
[ 2.896307] ? copy_overflow+0x20/0x20
[ 2.896313] ? find_held_lock+0x35/0xa0
[ 2.896319] ? __might_fault+0x3e/0x90
[ 2.896325] video_ioctl2+0x15/0x20
[ 2.896330] v4l2_ioctl+0x49/0x50
[ 2.896335] do_video_ioctl+0x93c/0x2360
[ 2.896343] v4l2_compat_ioctl32+0x93/0xe0
[ 2.896349] __ia32_compat_sys_ioctl+0x73a/0x1c90
[ 2.896354] ? lockdep_hardirqs_on+0xef/0x180
[ 2.896359] ? do_fast_syscall_32+0x3b/0x2d6
[ 2.896364] do_fast_syscall_32+0x9a/0x2d6
[ 2.896370] entry_SYSENTER_compat+0x6d/0x7c
[ 2.896377] RIP: 0023:0xf7e79b79
[ 2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
[ 2.896387] RSP: 002b:00000000f76816bc EFLAGS: 00000292 ORIG_RAX: 0000000000000036
[ 2.896393] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 0000000040045612
[ 2.896396] RDX: 00000000f768172c RSI: 00000000f7d42d9c RDI: 00000000f768172c
[ 2.896400] RBP: 00000000f7681708 R08: 0000000000000000 R09: 0000000000000000
[ 2.896404] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2.896408] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

------------------------------------------------------------------------

> [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 21808196
> [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 ->gp_wake_seq 1357 ->gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
> [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 4312486108 ->gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 ->gp_flags 0x0
>
> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, December 13, 2018 12:40 PM
> To: Zhang, Jun <[email protected]>
> Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> > Ok, we will test it, thanks!
>
> But please also try the sysrq-y with the earlier patch after a hang!
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney [mailto:[email protected]]
> > Sent: Thursday, December 13, 2018 10:43
> > To: Zhang, Jun <[email protected]>
> > Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Xiao, Jin
> > <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie
> > A <[email protected]>; Sun, Yi J <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > > Hello, Paul
> > >
> > > I think the next patch is better.
> > > Because ULONG_CMP_GE could cause a double write, which risks writing back an old value.
> > > Please help review.
> > > I have not tested it. If you agree, we will test it.
> >
> > Just to make sure that I understand, you are worried about something like the following, correct?
> >
> > o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> > and finds them equal.
> >
> > o At just this time something like rcu_start_this_gp() assigns a new
> > (larger) value to rdp->gp_seq_needed.
> >
> > o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> > old value.
> >
> > This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
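> >
> > Concretely, in v4.19 the note_gp_changes() path brackets that call
> > roughly like this (a sketch from memory, not the verbatim source):
> >
> >	local_irq_save(flags);
> >	rnp = rdp->mynode;
> >	if ((rdp->gp_seq == rcu_seq_current(&rnp->gp_seq) &&
> >	     !unlikely(READ_ONCE(rdp->gpwrap))) ||
> >	    !raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
> >		local_irq_restore(flags);
> >		return;
> >	}
> >	needwake = __note_gp_changes(rsp, rnp, rdp);
> >	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);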
> >
> > Of course, there could be bugs. For example:
> >
> > o __note_gp_changes() might be called on a different CPU than that
> > corresponding to rdp. You can check this with something like:
> >
> > WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> >
> > o The same things could happen with rcu_start_this_gp(), and the
> > above WARN_ON_ONCE() would work there as well.
> >
> > o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> > you are doing CPU-hotplug operations. (It can run on a CPU other
> > than rdp->cpu, but only at times when rdp->cpu is offline.)
> >
> > o Interrupts might not really be disabled.
> >
> > That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
> >
> > So why not just test it anyway? If it makes the bug go away, I will
> > be surprised, but it would not be the first surprise for me. ;-)
> >
> > Thanx, Paul
> >
> > > Thanks!
> > >
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 0b760c1..c00f34e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > > zero_cpu_stall_ticks(rdp);
> > > }
> > > rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > > - if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > > + if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > > rdp->gp_seq_needed = rnp->gp_seq_needed;
> > > WRITE_ONCE(rdp->gpwrap, false);
> > > rcu_gpnum_ovf(rnp, rdp);
> > >
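> > > For reference, the two comparison macros are defined in kernel/rcu/rcu.h
> > > along these lines (paraphrased from v4.19; exact formatting may differ):
> > >
> > >	#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
> > >	#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))
> > >
> > > Both are wraparound-safe, but ULONG_CMP_GE() is also true when the two
> > > values are equal, so the old condition rewrites rdp->gp_seq_needed even
> > > when nothing has changed, while ULONG_CMP_LT() stores only when rnp's
> > > value is strictly newer.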
> > >
> > > -----Original Message-----
> > > From: Paul E. McKenney [mailto:[email protected]]
> > > Sent: Thursday, December 13, 2018 08:12
> > > To: He, Bo <[email protected]>
> > > Cc: Steven Rostedt <[email protected]>;
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; Zhang, Jun
> > > <[email protected]>; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > > <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi
> > > J <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel. I also checked the latest kernel and the latest tag v4.20-rc6, and do not see sysrq_rcu there either.
> > > > Please correct me if I have something wrong.
> > >
> > > That would be because I sent you the wrong patch, apologies! :-/
> > >
> > > Please instead see the one below, which does add sysrq_rcu.
> > >
> > > Thanx, Paul
> > >
> > > > -----Original Message-----
> > > > From: Paul E. McKenney <[email protected]>
> > > > Sent: Thursday, December 13, 2018 5:03 AM
> > > > To: He, Bo <[email protected]>
> > > > Cc: Steven Rostedt <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected]; Zhang, Jun
> > > > <[email protected]>; Xiao, Jin <[email protected]>; Zhang,
> > > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > > we reproduced on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > > I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as extracted from /proc/config.gz.
> > > > >
> > > > > Strange.
> > > > >
> > > > > Are the systems responsive to sysrq keys once failure occurs?
> > > > > If so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > > >
> > > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > > Author: Paul E. McKenney <[email protected]>
> > > Date: Wed Dec 12 16:10:09 2018 -0800
> > >
> > > rcu: Add sysrq rcu_node-dump capability
> > >
> > > Backported from v4.21/v5.0
> > >
> > > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > > for failing to start a grace period. This commit therefore adds a
> > > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > > must be set for this sysrq to be available.
> > >
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 0b760c1369f7..e9392a9d6291 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -61,6 +61,7 @@
> > > #include <linux/trace_events.h>
> > > #include <linux/suspend.h>
> > > #include <linux/ftrace.h>
> > > +#include <linux/sysrq.h>
> > >
> > > #include "tree.h"
> > > #include "rcu.h"
> > > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> > > int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> > > /* panic() on RCU Stall sysctl. */
> > > int sysctl_panic_on_rcu_stall __read_mostly;
> > > +/* Commandeer a sysrq key to dump RCU's tree. */
> > > +static bool sysrq_rcu;
> > > +module_param(sysrq_rcu, bool, 0444);
> > >
> > > /*
> > > * The rcu_scheduler_active variable is initialized to the value
> > > @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> > > }
> > > EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> > >
> > > +/* Dump grace-period-request information due to commandeered sysrq. */
> > > +static void sysrq_show_rcu(int key)
> > > +{
> > > +	show_rcu_gp_kthreads();
> > > +}
> > > +
> > > +static struct sysrq_key_op sysrq_rcudump_op = {
> > > +	.handler = sysrq_show_rcu,
> > > +	.help_msg = "show-rcu(y)",
> > > +	.action_msg = "Show RCU tree",
> > > +	.enable_mask = SYSRQ_ENABLE_DUMP,
> > > +};
> > > +
> > > +static int __init rcu_sysrq_init(void)
> > > +{
> > > +	if (sysrq_rcu)
> > > +		return register_sysrq_key('y', &sysrq_rcudump_op);
> > > +	return 0;
> > > +}
> > > +early_initcall(rcu_sysrq_init);
> > > +
> > > /*
> > > * Send along grace-period-related data for rcutorture diagnostics.
> > > */
> > >
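> > > To use it: boot with rcutree.sysrq_rcu=1, then once the hang occurs,
> > > trigger the dump with sysrq-y, for example "echo y > /proc/sysrq-trigger"
> > > if a shell is still responsive.
> > >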
> >
>



2018-12-14 01:31:36

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

as you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I double-checked, and there is no FAST_NO_HZ in .config:

Here is the grep from .config:
egrep "HZ|RCU" .config
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
# RCU Subsystem
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_MACHZ_WDT is not set
# RCU Debugging
CONFIG_PROVE_RCU=y
CONFIG_RCU_PERF_TEST=m
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=7
CONFIG_RCU_TRACE=y
CONFIG_RCU_EQS_DEBUG=y

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Friday, December 14, 2018 2:12 AM
To: He, Bo <[email protected]>
Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
Subject: Re: rcu_preempt caused oom

On Thu, Dec 13, 2018 at 03:26:08PM +0000, He, Bo wrote:
> one of the boards reproduced the issue with show_rcu_gp_kthreads(); I also enclosed the logs as an attachment.
>
> [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258 ->gp_wake_seq 21808189 ->gp_seq 21808192 ->gp_seq_needed 21808196 ->gp_flags 0x1

This is quite helpful, thank you!

The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is good. The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting for a new grace-period request. The "->state: 0x402" means that it is sleeping, neither running nor in the process of waking up.
The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258" means that it has been more than 300,000 jiffies since the rcu_preempt task did anything or was requested to do anything.

The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to awaken the rcu_preempt task happened during the last grace period.
The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone requested a new grace period. So if the rcu_preempt task were to wake up, it would process the new grace period. Note again also the ->gp_req_activity 308256, which indicates that ->gp_flags was set more than 300,000 jiffies ago, just after the last recorded activity of the rcu_preempt task.
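
To decode those numbers (assuming the stock v4.19 definitions): ->state 0x402
is TASK_UNINTERRUPTIBLE | TASK_NOLOAD, that is, TASK_IDLE. And the low two
bits of a gp_seq value are grace-period state with the rest a counter, so
->gp_seq 21808192 >> 2 = 5452048 with state 0 (idle), while ->gp_seq_needed
21808196 >> 2 = 5452049, a request for exactly the next grace period.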

But this is exactly the situation that rcu_check_gp_start_stall() is designed to warn about (and does warn about for me when I comment out the wakeup code). So why is rcu_check_gp_start_stall() not being called? Here are a couple of possibilities:

1. Because rcu_check_gp_start_stall() is only ever invoked from
RCU_SOFTIRQ, it is possible that softirqs are stalled for
whatever reason.

2. Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
interrupt handler, it is possible that the scheduler tick has
somehow been disabled. Traces from earlier runs showed a great
deal of RCU callbacks queued, which would have caused RCU to
refuse to allow the scheduler tick to be disabled, even if the
corresponding CPU was idle.

3. You have CONFIG_FAST_NO_HZ=y (which you probably do, given
that you are building for a battery-powered device) and all of the
CPU's callbacks are lazy. Except that your earlier traces showed
lots of non-lazy callbacks. Besides, even if all callbacks were
lazy, there would still be a scheduling-clock interrupt every
six seconds, and there are quite a few six-second intervals
in a two-minute watchdog timeout.

But if we cannot find the problem quickly, I will likely ask
you to try reproducing with CONFIG_FAST_NO_HZ=n. This could
be thought of as bisecting the RCU code looking for the bug.

The first two of these seem unlikely given that the watchdog timer was still firing. Still, I don't see how 300,000 jiffies elapsed with a grace period requested and not started otherwise. Could you please check?
One way to do so would be to enable ftrace on rcu_check_callbacks(), __rcu_process_callbacks(), and rcu_check_gp_start_stall(). It might be necessary to mark rcu_check_gp_start_stall() noinline, since an inlined static function is invisible to the function tracer. You might have better ways to collect this information.
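
For example, via the stock tracefs interface (this assumes debugfs is
mounted at /sys/kernel/debug; the log destination is only an illustration):

	cd /sys/kernel/debug/tracing
	echo rcu_check_callbacks __rcu_process_callbacks rcu_check_gp_start_stall > set_ftrace_filter
	echo function > current_tracer
	cat trace_pipe > /data/rcu-ftrace.log &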

Without this information, the only workaround patch I can give you will degrade battery lifetime, which might not be what you want.

You do have a lockdep complaint early at boot. Although I don't immediately see how this self-deadlock would affect RCU, please do get it fixed. Sometimes the consequences of this sort of deadlock can propagate to unexpected places.

Regardless of why rcu_check_gp_start_stall() failed to complain, it looks like this was set after the rcu_preempt task slept for the last time, and so there should have been a wakeup the last time that ->gp_flags was set. Perhaps there is some code path that drops the wakeup.
I did check this in current -rcu, but you are instead running v4.19, so I should also check there.

The ->gp_flags has its RCU_GP_FLAG_INIT bit set in rcu_start_this_gp() and in rcu_gp_cleanup(). We can eliminate rcu_gp_cleanup() from consideration because only the rcu_preempt task will execute that code, and we know that this task was asleep at the last time this bit was set.
Now rcu_start_this_gp() returns a flag indicating whether or not a wakeup is needed, and the caller must do the wakeup once it is safe to do so, that is, after the various rcu_node locks have been released (doing a wakeup while holding any of those locks results in deadlock).

The following functions invoke rcu_start_this_gp: rcu_accelerate_cbs() and rcu_nocb_wait_gp(). We can eliminate rcu_nocb_wait_gp() because you are building with CONFIG_RCU_NOCB_CPU=n. Then rcu_accelerate_cbs() is invoked from:

o rcu_accelerate_cbs_unlocked(), which does the following, thus
properly awakening the rcu_preempt task when needed:

needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
if (needwake)
rcu_gp_kthread_wake(rsp);

o rcu_advance_cbs(), which returns the value returned by
rcu_accelerate_cbs(), thus pushing the problem off to its
callers, which are called out below.

o __note_gp_changes(), which also returns the value returned by
rcu_accelerate_cbs(), thus pushing the problem off to its callers,
which are called out below.

o rcu_gp_cleanup(), which is only ever invoked by RCU grace-period
kthreads such as the rcu_preempt task. Therefore, this function
never needs to awaken the rcu_preempt task, because the fact
that this function is executing means that this task is already
awake. (Also, as noted above, we can eliminate this code from
consideration because this task is known to have been sleeping
at the last time that the RCU_GP_FLAG_INIT bit was set.)

o rcu_report_qs_rdp(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = rcu_accelerate_cbs(rsp, rnp, rdp);

rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
/* ^^^ Released rnp->lock */
if (needwake)
rcu_gp_kthread_wake(rsp);

o rcu_prepare_for_idle(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
if (needwake)
rcu_gp_kthread_wake(rsp);

Now for rcu_advance_cbs():

o __note_gp_changes(), which also returns the value returned
by rcu_advance_cbs(), thus pushing the problem off to its callers,
which are called out below.

o rcu_migrate_callbacks(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
rcu_advance_cbs(rsp, rnp_root, my_rdp);
rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
!rcu_segcblist_n_cbs(&my_rdp->cblist));
raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
if (needwake)
rcu_gp_kthread_wake(rsp);

Now for __note_gp_changes():

o note_gp_changes(), which does the following, thus properly
awakening the rcu_preempt task when needed:

needwake = __note_gp_changes(rsp, rnp, rdp);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
if (needwake)
rcu_gp_kthread_wake(rsp);

o rcu_gp_init() which is only ever invoked by RCU grace-period
kthreads such as the rcu_preempt task, which makes wakeups
unnecessary, just as for rcu_gp_cleanup() above.

o rcu_gp_cleanup(), ditto.

So I am not seeing how I am losing a wakeup, but please do feel free to double-check my analysis. One way to do that is using event tracing.
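
For example, again via the stock tracefs interface (assuming a kernel with
CONFIG_RCU_TRACE=y, so the rcu trace events are available):

	echo 1 > /sys/kernel/debug/tracing/events/rcu/enable
	cat /sys/kernel/debug/tracing/trace_pipe

The rcu_grace_period and rcu_future_grace_period events record each
grace-period request and state change, so a request that never gets a
matching wakeup should stand out.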

Thanx, Paul

------------------------------------------------------------------------
lockdep complaint:
------------------------------------------------------------------------

[ 2.895507] ======================================================
[ 2.895511] WARNING: possible circular locking dependency detected
[ 2.895517] 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1 Tainted: G U
[ 2.895521] ------------------------------------------------------
[ 2.895525] earlyEvs/1839 is trying to acquire lock:
[ 2.895530] 00000000ff344115 (&asd->mutex){+.+.}, at: ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895546]
[ 2.895546] but task is already holding lock:
[ 2.895550] 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
[ 2.895561]
[ 2.895561] which lock already depends on the new lock.
[ 2.895561]
[ 2.895566]
[ 2.895566] the existing dependency chain (in reverse order) is:
[ 2.895570]
[ 2.895570] -> #1 (&mdev->graph_mutex){+.+.}:
[ 2.895583] __mutex_lock+0x80/0x9a0
[ 2.895588] mutex_lock_nested+0x1b/0x20
[ 2.895593] media_device_register_entity+0x92/0x1e0
[ 2.895598] v4l2_device_register_subdev+0xc2/0x1b0
[ 2.895604] ipu_isys_csi2_init+0x22c/0x520
[ 2.895608] isys_probe+0x6cb/0xed0
[ 2.895613] ipu_bus_probe+0xfd/0x2e0
[ 2.895620] really_probe+0x268/0x3d0
[ 2.895625] driver_probe_device+0x11a/0x130
[ 2.895630] __device_attach_driver+0x86/0x100
[ 2.895635] bus_for_each_drv+0x6e/0xb0
[ 2.895640] __device_attach+0xdf/0x160
[ 2.895645] device_initial_probe+0x13/0x20
[ 2.895650] bus_probe_device+0xa6/0xc0
[ 2.895655] deferred_probe_work_func+0x88/0xe0
[ 2.895661] process_one_work+0x220/0x5c0
[ 2.895665] worker_thread+0x1da/0x3b0
[ 2.895670] kthread+0x12c/0x150
[ 2.895675] ret_from_fork+0x3a/0x50
[ 2.895678]
[ 2.895678] -> #0 (&asd->mutex){+.+.}:
[ 2.895688] lock_acquire+0x95/0x1a0
[ 2.895693] __mutex_lock+0x80/0x9a0
[ 2.895698] mutex_lock_nested+0x1b/0x20
[ 2.895703] ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895708] ipu_isys_csi2_get_fmt+0x14/0x30
[ 2.895713] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
[ 2.895718] v4l2_subdev_link_validate_one+0x67/0x120
[ 2.895723] v4l2_subdev_link_validate+0x246/0x490
[ 2.895728] csi2_link_validate+0xc6/0x220
[ 2.895733] __media_pipeline_start+0x15b/0x2f0
[ 2.895738] media_pipeline_start+0x33/0x50
[ 2.895743] ipu_isys_video_prepare_streaming+0x1e0/0x610
[ 2.895748] start_streaming+0x186/0x3a0
[ 2.895753] vb2_start_streaming+0x6d/0x130
[ 2.895758] vb2_core_streamon+0x108/0x140
[ 2.895762] vb2_streamon+0x29/0x50
[ 2.895767] vb2_ioctl_streamon+0x42/0x50
[ 2.895772] v4l_streamon+0x20/0x30
[ 2.895776] __video_do_ioctl+0x1af/0x3c0
[ 2.895781] video_usercopy+0x27e/0x7e0
[ 2.895785] video_ioctl2+0x15/0x20
[ 2.895789] v4l2_ioctl+0x49/0x50
[ 2.895794] do_video_ioctl+0x93c/0x2360
[ 2.895799] v4l2_compat_ioctl32+0x93/0xe0
[ 2.895806] __ia32_compat_sys_ioctl+0x73a/0x1c90
[ 2.895813] do_fast_syscall_32+0x9a/0x2d6
[ 2.895818] entry_SYSENTER_compat+0x6d/0x7c
[ 2.895821]
[ 2.895821] other info that might help us debug this:
[ 2.895821]
[ 2.895826] Possible unsafe locking scenario:
[ 2.895826]
[ 2.895830]        CPU0                    CPU1
[ 2.895833]        ----                    ----
[ 2.895836]   lock(&mdev->graph_mutex);
[ 2.895842]                                lock(&asd->mutex);
[ 2.895847]                                lock(&mdev->graph_mutex);
[ 2.895852]   lock(&asd->mutex);
[ 2.895857]
[ 2.895857] *** DEADLOCK ***
[ 2.895857]
[ 2.895863] 3 locks held by earlyEvs/1839:
[ 2.895866] #0: 00000000ed860090 (&av->mutex){+.+.}, at: __video_do_ioctl+0xbf/0x3c0
[ 2.895876] #1: 000000000cb253e7 (&isys->stream_mutex){+.+.}, at: start_streaming+0x5c/0x3a0
[ 2.895886] #2: 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
[ 2.895896]
[ 2.895896] stack backtrace:
[ 2.895903] CPU: 0 PID: 1839 Comm: earlyEvs Tainted: G U 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1
[ 2.895907] Call Trace:
[ 2.895915] dump_stack+0x70/0xa5
[ 2.895921] print_circular_bug.isra.35+0x1d8/0x1e6
[ 2.895927] __lock_acquire+0x1284/0x1340
[ 2.895931] ? __lock_acquire+0x2b5/0x1340
[ 2.895940] lock_acquire+0x95/0x1a0
[ 2.895945] ? lock_acquire+0x95/0x1a0
[ 2.895950] ? ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895956] ? ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895961] __mutex_lock+0x80/0x9a0
[ 2.895966] ? ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895971] ? crlmodule_get_format+0x43/0x50
[ 2.895979] mutex_lock_nested+0x1b/0x20
[ 2.895984] ? mutex_lock_nested+0x1b/0x20
[ 2.895989] ipu_isys_subdev_get_ffmt+0x32/0x90
[ 2.895995] ipu_isys_csi2_get_fmt+0x14/0x30
[ 2.896001] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
[ 2.896006] v4l2_subdev_link_validate_one+0x67/0x120
[ 2.896011] ? crlmodule_get_format+0x2a/0x50
[ 2.896018] ? find_held_lock+0x35/0xa0
[ 2.896023] ? crlmodule_get_format+0x43/0x50
[ 2.896030] v4l2_subdev_link_validate+0x246/0x490
[ 2.896035] ? __mutex_unlock_slowpath+0x58/0x2f0
[ 2.896042] ? mutex_unlock+0x12/0x20
[ 2.896046] ? crlmodule_get_format+0x43/0x50
[ 2.896052] ? v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
[ 2.896057] ? v4l2_subdev_link_validate_one+0x67/0x120
[ 2.896065] ? __is_insn_slot_addr+0xad/0x120
[ 2.896070] ? kernel_text_address+0xc4/0x100
[ 2.896078] ? v4l2_subdev_link_validate+0x246/0x490
[ 2.896085] ? kernel_text_address+0xc4/0x100
[ 2.896092] ? __lock_acquire+0x1106/0x1340
[ 2.896096] ? __lock_acquire+0x1169/0x1340
[ 2.896103] csi2_link_validate+0xc6/0x220
[ 2.896110] ? __lock_is_held+0x5a/0xa0
[ 2.896115] ? mark_held_locks+0x58/0x80
[ 2.896122] ? __kmalloc+0x207/0x2e0
[ 2.896127] ? __lock_is_held+0x5a/0xa0
[ 2.896134] ? rcu_read_lock_sched_held+0x81/0x90
[ 2.896139] ? __kmalloc+0x2a3/0x2e0
[ 2.896144] ? media_pipeline_start+0x28/0x50
[ 2.896150] ? __media_entity_enum_init+0x33/0x70
[ 2.896155] ? csi2_has_route+0x18/0x20
[ 2.896160] ? media_graph_walk_next.part.9+0xac/0x290
[ 2.896166] __media_pipeline_start+0x15b/0x2f0
[ 2.896173] ? rcu_read_lock_sched_held+0x81/0x90
[ 2.896179] media_pipeline_start+0x33/0x50
[ 2.896186] ipu_isys_video_prepare_streaming+0x1e0/0x610
[ 2.896191] ? __lock_acquire+0x132e/0x1340
[ 2.896198] ? __lock_acquire+0x2b5/0x1340
[ 2.896204] ? lock_acquire+0x95/0x1a0
[ 2.896209] ? start_streaming+0x5c/0x3a0
[ 2.896215] ? start_streaming+0x5c/0x3a0
[ 2.896221] ? __mutex_lock+0x391/0x9a0
[ 2.896226] ? v4l_enable_media_source+0x2d/0x70
[ 2.896233] ? find_held_lock+0x35/0xa0
[ 2.896238] ? v4l_enable_media_source+0x57/0x70
[ 2.896245] start_streaming+0x186/0x3a0
[ 2.896250] ? __mutex_unlock_slowpath+0x58/0x2f0
[ 2.896257] vb2_start_streaming+0x6d/0x130
[ 2.896262] ? vb2_start_streaming+0x6d/0x130
[ 2.896267] vb2_core_streamon+0x108/0x140
[ 2.896273] vb2_streamon+0x29/0x50
[ 2.896278] vb2_ioctl_streamon+0x42/0x50
[ 2.896284] v4l_streamon+0x20/0x30
[ 2.896288] __video_do_ioctl+0x1af/0x3c0
[ 2.896296] ? __might_fault+0x85/0x90
[ 2.896302] video_usercopy+0x27e/0x7e0
[ 2.896307] ? copy_overflow+0x20/0x20
[ 2.896313] ? find_held_lock+0x35/0xa0
[ 2.896319] ? __might_fault+0x3e/0x90
[ 2.896325] video_ioctl2+0x15/0x20
[ 2.896330] v4l2_ioctl+0x49/0x50
[ 2.896335] do_video_ioctl+0x93c/0x2360
[ 2.896343] v4l2_compat_ioctl32+0x93/0xe0
[ 2.896349] __ia32_compat_sys_ioctl+0x73a/0x1c90
[ 2.896354] ? lockdep_hardirqs_on+0xef/0x180
[ 2.896359] ? do_fast_syscall_32+0x3b/0x2d6
[ 2.896364] do_fast_syscall_32+0x9a/0x2d6
[ 2.896370] entry_SYSENTER_compat+0x6d/0x7c
[ 2.896377] RIP: 0023:0xf7e79b79
[ 2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
[ 2.896387] RSP: 002b:00000000f76816bc EFLAGS: 00000292 ORIG_RAX: 0000000000000036
[ 2.896393] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 0000000040045612
[ 2.896396] RDX: 00000000f768172c RSI: 00000000f7d42d9c RDI: 00000000f768172c
[ 2.896400] RBP: 00000000f7681708 R08: 0000000000000000 R09: 0000000000000000
[ 2.896404] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2.896408] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

------------------------------------------------------------------------

> [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 21808196
> [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 ->gp_wake_seq 1357 ->gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
> [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 4312486108 ->gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 ->gp_flags 0x0
>
> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Thursday, December 13, 2018 12:40 PM
> To: Zhang, Jun <[email protected]>
> Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; Xiao, Jin
> <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie
> A <[email protected]>; Sun, Yi J <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> > Ok, we will test it, thanks!
>
> But please also try the sysrq-y with the earlier patch after a hang!
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney [mailto:[email protected]]
> > Sent: Thursday, December 13, 2018 10:43
> > To: Zhang, Jun <[email protected]>
> > Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Xiao, Jin
> > <[email protected]>; Zhang, Yanmin <[email protected]>; Bai,
> > Jie A <[email protected]>; Sun, Yi J <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > > Hello, Paul
> > >
> > > I think the next patch is better.
> > > Because ULONG_CMP_GE could cause a double write, which risks writing back an old value.
> > > Please help review.
> > > I have not tested it. If you agree, we will test it.
> >
> > Just to make sure that I understand, you are worried about something like the following, correct?
> >
> > o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> > and finds them equal.
> >
> > o At just this time something like rcu_start_this_gp() assigns a new
> > (larger) value to rdp->gp_seq_needed.
> >
> > o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> > old value.
> >
> > This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
> >
> > Of course, there could be bugs. For example:
> >
> > o __note_gp_changes() might be called on a different CPU than that
> > corresponding to rdp. You can check this with something like:
> >
> > WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> >
> > o The same things could happen with rcu_start_this_gp(), and the
> > above WARN_ON_ONCE() would work there as well.
> >
> > o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> > you are doing CPU-hotplug operations. (It can run on a CPU other
> > than rdp->cpu, but only at times when rdp->cpu is offline.)
> >
> > o Interrupts might not really be disabled.
> >
> > That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
> >
> > So why not just test it anyway? If it makes the bug go away, I will
> > be surprised, but it would not be the first surprise for me. ;-)
> >
> > Thanx, Paul
> >
> > > Thanks!
> > >
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 0b760c1..c00f34e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > > zero_cpu_stall_ticks(rdp);
> > > }
> > > rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > > - if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > > + if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > > rdp->gp_seq_needed = rnp->gp_seq_needed;
> > > WRITE_ONCE(rdp->gpwrap, false);
> > > rcu_gpnum_ovf(rnp, rdp);
> > >
> > >
> > > -----Original Message-----
> > > From: Paul E. McKenney [mailto:[email protected]]
> > > Sent: Thursday, December 13, 2018 08:12
> > > To: He, Bo <[email protected]>
> > > Cc: Steven Rostedt <[email protected]>;
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; Zhang, Jun
> > > <[email protected]>; Xiao, Jin <[email protected]>; Zhang,
> > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>;
> > > Sun, Yi J <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel. I also checked the latest kernel and the latest tag v4.20-rc6, and do not see sysrq_rcu there either.
> > > > Please correct me if I have something wrong.
> > >
> > > That would be because I sent you the wrong patch, apologies! :-/
> > >
> > > Please instead see the one below, which does add sysrq_rcu.
> > >
> > > Thanx, Paul
> > >
> > > > -----Original Message-----
> > > > From: Paul E. McKenney <[email protected]>
> > > > Sent: Thursday, December 13, 2018 5:03 AM
> > > > To: He, Bo <[email protected]>
> > > > Cc: Steven Rostedt <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected]; Zhang,
> > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > > we reproduced on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > > I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as extracted from /proc/config.gz.
> > > > >
> > > > > Strange.
> > > > >
> > > > > Are the systems responsive to sysrq keys once failure occurs?
> > > > > If so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > > >
> > > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > > Author: Paul E. McKenney <[email protected]>
> > > Date: Wed Dec 12 16:10:09 2018 -0800
> > >
> > > rcu: Add sysrq rcu_node-dump capability
> > >
> > > Backported from v4.21/v5.0
> > >
> > > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > > for failing to start a grace period. This commit therefore adds a
> > > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > > must be set for this sysrq to be available.
> > >
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 0b760c1369f7..e9392a9d6291 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -61,6 +61,7 @@
> > > #include <linux/trace_events.h>
> > > #include <linux/suspend.h>
> > > #include <linux/ftrace.h>
> > > +#include <linux/sysrq.h>
> > >
> > > #include "tree.h"
> > > #include "rcu.h"
> > > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> > > int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> > > /* panic() on RCU Stall sysctl. */
> > > int sysctl_panic_on_rcu_stall __read_mostly;
> > > +/* Commandeer a sysrq key to dump RCU's tree. */
> > > +static bool sysrq_rcu;
> > > +module_param(sysrq_rcu, bool, 0444);
> > >
> > > /*
> > > * The rcu_scheduler_active variable is initialized to the value
> > > @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> > > }
> > > EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> > >
> > > +/* Dump grace-period-request information due to commandeered sysrq. */
> > > +static void sysrq_show_rcu(int key)
> > > +{
> > > +	show_rcu_gp_kthreads();
> > > +}
> > > +
> > > +static struct sysrq_key_op sysrq_rcudump_op = {
> > > +	.handler = sysrq_show_rcu,
> > > +	.help_msg = "show-rcu(y)",
> > > +	.action_msg = "Show RCU tree",
> > > +	.enable_mask = SYSRQ_ENABLE_DUMP,
> > > +};
> > > +
> > > +static int __init rcu_sysrq_init(void)
> > > +{
> > > +	if (sysrq_rcu)
> > > +		return register_sysrq_key('y', &sysrq_rcudump_op);
> > > +	return 0;
> > > +}
> > > +early_initcall(rcu_sysrq_init);
> > > +
> > > /*
> > > * Send along grace-period-related data for rcutorture diagnostics.
> > > */
> > >
> >
>



2018-12-14 02:17:58

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Fri, Dec 14, 2018 at 01:30:04AM +0000, He, Bo wrote:
> as you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I double-checked, and there is no FAST_NO_HZ in .config:

Yes, you are correct, CONFIG_RCU_FAST_NO_HZ. OK, you do not have it set,
which means several code paths (for example, the rcu_prepare_for_idle()
laziness logic) can be ignored. Also CONFIG_HZ=1000, so the delta of
roughly 300,000 jiffies above works out to about a 300-second delay.

Thanx, Paul

> Here is the grep from .config:
> egrep "HZ|RCU" .config
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> # RCU Subsystem
> CONFIG_PREEMPT_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> # CONFIG_HZ_100 is not set
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
> # CONFIG_MACHZ_WDT is not set
> # RCU Debugging
> CONFIG_PROVE_RCU=y
> CONFIG_RCU_PERF_TEST=m
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=7
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_EQS_DEBUG=y
>
> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Friday, December 14, 2018 2:12 AM
> To: He, Bo <[email protected]>
> Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Thu, Dec 13, 2018 at 03:26:08PM +0000, He, Bo wrote:
> > one of the boards reproduced the issue with show_rcu_gp_kthreads(); I also enclosed the logs as an attachment.
> >
> > [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258 ->gp_wake_seq 21808189 ->gp_seq 21808192 ->gp_seq_needed 21808196 ->gp_flags 0x1
>
> This is quite helpful, thank you!
>
> The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is good. The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting for a new grace-period request. The "->state: 0x402" means that it is sleeping, neither running nor in the process of waking up.
> The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258" means that it has been more than 300,000 jiffies since the rcu_preempt task did anything or was requested to do anything.
>
> The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to awaken the rcu_preempt task happened during the last grace period.
> The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone requested a new grace period. So if the rcu_preempt task were to wake up, it would process the new grace period. Note again also the ->gp_req_activity 308256, which indicates that ->gp_flags was set more than 300,000 jiffies ago, just after the last recorded activity of the rcu_preempt task.
>
> But this is exactly the situation that rcu_check_gp_start_stall() is designed to warn about (and does warn about for me when I comment out the wakeup code). So why is rcu_check_gp_start_stall() not being called? Here are a couple of possibilities:
>
> 1. Because rcu_check_gp_start_stall() is only ever invoked from
> RCU_SOFTIRQ, it is possible that softirqs are stalled for
> whatever reason.
>
> 2. Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
> interrupt handler, it is possible that the scheduler tick has
> somehow been disabled. Traces from earlier runs showed a great
> deal of RCU callbacks queued, which would have caused RCU to
> refuse to allow the scheduler tick to be disabled, even if the
> corresponding CPU was idle.
>
> 3. You have CONFIG_FAST_NO_HZ=y (which you probably do, given
> that you are building for a battery-powered device) and all of the
> CPU's callbacks are lazy. Except that your earlier traces showed
> lots of non-lazy callbacks. Besides, even if all callbacks were
> lazy, there would still be a scheduling-clock interrupt every
> six seconds, and there are quite a few six-second intervals
> in a two-minute watchdog timeout.
>
> But if we cannot find the problem quickly, I will likely ask
> you to try reproducing with CONFIG_FAST_NO_HZ=n. This could
> be thought of as bisecting the RCU code looking for the bug.
>
> The first two of these seem unlikely given that the watchdog timer was still firing. Still, I don't see how 300,000 jiffies elapsed with a grace period requested and not started otherwise. Could you please check?
> One way to do so would be to enable ftrace on rcu_check_callbacks(), __rcu_process_callbacks(), and rcu_check_gp_start_stall(). It might be necessary to mark rcu_check_gp_start_stall() noinline, since an inlined static function is invisible to the function tracer. You might have better ways to collect this information.
>
> Without this information, the only workaround patch I can give you will degrade battery lifetime, which might not be what you want.
>
> You do have a lockdep complaint early at boot. Although I don't immediately see how this self-deadlock would affect RCU, please do get it fixed. Sometimes the consequences of this sort of deadlock can propagate to unexpected places.
>
> Regardless of why rcu_check_gp_start_stall() failed to complain, it looks like this was set after the rcu_preempt task slept for the last time, and so there should have been a wakeup the last time that ->gp_flags was set. Perhaps there is some code path that drops the wakeup.
> I did check this in current -rcu, but you are instead running v4.19, so I should also check there.
>
> The ->gp_flags has its RCU_GP_FLAG_INIT bit set in rcu_start_this_gp() and in rcu_gp_cleanup(). We can eliminate rcu_gp_cleanup() from consideration because only the rcu_preempt task will execute that code, and we know that this task was asleep at the last time this bit was set.
> Now rcu_start_this_gp() returns a flag indicating whether or not a wakeup is needed, and the caller must do the wakeup once it is safe to do so, that is, after the various rcu_node locks have been released (doing a wakeup while holding any of those locks results in deadlock).
>
> The following functions invoke rcu_start_this_gp: rcu_accelerate_cbs() and rcu_nocb_wait_gp(). We can eliminate rcu_nocb_wait_gp() because you are building with CONFIG_RCU_NOCB_CPU=n. Then rcu_accelerate_cbs() is invoked from:
>
> o rcu_accelerate_cbs_unlocked(), which does the following, thus
> properly awakening the rcu_preempt task when needed:
>
> needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> o rcu_advance_cbs(), which returns the value returned by
> rcu_accelerate_cbs(), thus pushing the problem off to its
> callers, which are called out below.
>
> o __note_gp_changes(), which also returns the value returned by
> rcu_accelerate_cbs(), thus pushing the problem off to its callers,
> which are called out below.
>
> o rcu_gp_cleanup(), which is only ever invoked by RCU grace-period
> kthreads such as the rcu_preempt task. Therefore, this function
> never needs to awaken the rcu_preempt task, because the fact
> that this function is executing means that this task is already
> awake. (Also, as noted above, we can eliminate this code from
> consideration because this task is known to have been sleeping
> at the last time that the RCU_GP_FLAG_INIT bit was set.)
>
> o rcu_report_qs_rdp(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
>
> rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
> /* ^^^ Released rnp->lock */
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> o rcu_prepare_for_idle(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> Now for rcu_advance_cbs():
>
> o __note_gp_changes(), which also returns the value returned
> by rcu_advance_cbs(), thus pushing the problem off to its callers,
> which are called out below.
>
> o rcu_migrate_callbacks(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
> rcu_advance_cbs(rsp, rnp_root, my_rdp);
> rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
> WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
> !rcu_segcblist_n_cbs(&my_rdp->cblist));
> raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> Now for __note_gp_changes():
>
> o note_gp_changes(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = __note_gp_changes(rsp, rnp, rdp);
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> o rcu_gp_init() which is only ever invoked by RCU grace-period
> kthreads such as the rcu_preempt task, which makes wakeups
> unnecessary, just as for rcu_gp_cleanup() above.
>
> o rcu_gp_cleanup(), ditto.
>
> So I am not seeing how I am losing a wakeup, but please do feel free to double-check my analysis. One way to do that is using event tracing.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
> lockdep complaint:
> ------------------------------------------------------------------------
>
> [ 2.895507] ======================================================
> [ 2.895511] WARNING: possible circular locking dependency detected
> [ 2.895517] 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1 Tainted: G U
> [ 2.895521] ------------------------------------------------------
> [ 2.895525] earlyEvs/1839 is trying to acquire lock:
> [ 2.895530] 00000000ff344115 (&asd->mutex){+.+.}, at: ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895546]
> [ 2.895546] but task is already holding lock:
> [ 2.895550] 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> [ 2.895561]
> [ 2.895561] which lock already depends on the new lock.
> [ 2.895561]
> [ 2.895566]
> [ 2.895566] the existing dependency chain (in reverse order) is:
> [ 2.895570]
> [ 2.895570] -> #1 (&mdev->graph_mutex){+.+.}:
> [ 2.895583] __mutex_lock+0x80/0x9a0
> [ 2.895588] mutex_lock_nested+0x1b/0x20
> [ 2.895593] media_device_register_entity+0x92/0x1e0
> [ 2.895598] v4l2_device_register_subdev+0xc2/0x1b0
> [ 2.895604] ipu_isys_csi2_init+0x22c/0x520
> [ 2.895608] isys_probe+0x6cb/0xed0
> [ 2.895613] ipu_bus_probe+0xfd/0x2e0
> [ 2.895620] really_probe+0x268/0x3d0
> [ 2.895625] driver_probe_device+0x11a/0x130
> [ 2.895630] __device_attach_driver+0x86/0x100
> [ 2.895635] bus_for_each_drv+0x6e/0xb0
> [ 2.895640] __device_attach+0xdf/0x160
> [ 2.895645] device_initial_probe+0x13/0x20
> [ 2.895650] bus_probe_device+0xa6/0xc0
> [ 2.895655] deferred_probe_work_func+0x88/0xe0
> [ 2.895661] process_one_work+0x220/0x5c0
> [ 2.895665] worker_thread+0x1da/0x3b0
> [ 2.895670] kthread+0x12c/0x150
> [ 2.895675] ret_from_fork+0x3a/0x50
> [ 2.895678]
> [ 2.895678] -> #0 (&asd->mutex){+.+.}:
> [ 2.895688] lock_acquire+0x95/0x1a0
> [ 2.895693] __mutex_lock+0x80/0x9a0
> [ 2.895698] mutex_lock_nested+0x1b/0x20
> [ 2.895703] ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895708] ipu_isys_csi2_get_fmt+0x14/0x30
> [ 2.895713] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> [ 2.895718] v4l2_subdev_link_validate_one+0x67/0x120
> [ 2.895723] v4l2_subdev_link_validate+0x246/0x490
> [ 2.895728] csi2_link_validate+0xc6/0x220
> [ 2.895733] __media_pipeline_start+0x15b/0x2f0
> [ 2.895738] media_pipeline_start+0x33/0x50
> [ 2.895743] ipu_isys_video_prepare_streaming+0x1e0/0x610
> [ 2.895748] start_streaming+0x186/0x3a0
> [ 2.895753] vb2_start_streaming+0x6d/0x130
> [ 2.895758] vb2_core_streamon+0x108/0x140
> [ 2.895762] vb2_streamon+0x29/0x50
> [ 2.895767] vb2_ioctl_streamon+0x42/0x50
> [ 2.895772] v4l_streamon+0x20/0x30
> [ 2.895776] __video_do_ioctl+0x1af/0x3c0
> [ 2.895781] video_usercopy+0x27e/0x7e0
> [ 2.895785] video_ioctl2+0x15/0x20
> [ 2.895789] v4l2_ioctl+0x49/0x50
> [ 2.895794] do_video_ioctl+0x93c/0x2360
> [ 2.895799] v4l2_compat_ioctl32+0x93/0xe0
> [ 2.895806] __ia32_compat_sys_ioctl+0x73a/0x1c90
> [ 2.895813] do_fast_syscall_32+0x9a/0x2d6
> [ 2.895818] entry_SYSENTER_compat+0x6d/0x7c
> [ 2.895821]
> [ 2.895821] other info that might help us debug this:
> [ 2.895821]
> [ 2.895826] Possible unsafe locking scenario:
> [ 2.895826]
> [ 2.895830]        CPU0                    CPU1
> [ 2.895833]        ----                    ----
> [ 2.895836]   lock(&mdev->graph_mutex);
> [ 2.895842]                                lock(&asd->mutex);
> [ 2.895847]                                lock(&mdev->graph_mutex);
> [ 2.895852]   lock(&asd->mutex);
> [ 2.895857]
> [ 2.895857] *** DEADLOCK ***
> [ 2.895857]
> [ 2.895863] 3 locks held by earlyEvs/1839:
> [ 2.895866] #0: 00000000ed860090 (&av->mutex){+.+.}, at: __video_do_ioctl+0xbf/0x3c0
> [ 2.895876] #1: 000000000cb253e7 (&isys->stream_mutex){+.+.}, at: start_streaming+0x5c/0x3a0
> [ 2.895886] #2: 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> [ 2.895896]
> [ 2.895896] stack backtrace:
> [ 2.895903] CPU: 0 PID: 1839 Comm: earlyEvs Tainted: G U 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1
> [ 2.895907] Call Trace:
> [ 2.895915] dump_stack+0x70/0xa5
> [ 2.895921] print_circular_bug.isra.35+0x1d8/0x1e6
> [ 2.895927] __lock_acquire+0x1284/0x1340
> [ 2.895931] ? __lock_acquire+0x2b5/0x1340
> [ 2.895940] lock_acquire+0x95/0x1a0
> [ 2.895945] ? lock_acquire+0x95/0x1a0
> [ 2.895950] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895956] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895961] __mutex_lock+0x80/0x9a0
> [ 2.895966] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895971] ? crlmodule_get_format+0x43/0x50
> [ 2.895979] mutex_lock_nested+0x1b/0x20
> [ 2.895984] ? mutex_lock_nested+0x1b/0x20
> [ 2.895989] ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895995] ipu_isys_csi2_get_fmt+0x14/0x30
> [ 2.896001] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> [ 2.896006] v4l2_subdev_link_validate_one+0x67/0x120
> [ 2.896011] ? crlmodule_get_format+0x2a/0x50
> [ 2.896018] ? find_held_lock+0x35/0xa0
> [ 2.896023] ? crlmodule_get_format+0x43/0x50
> [ 2.896030] v4l2_subdev_link_validate+0x246/0x490
> [ 2.896035] ? __mutex_unlock_slowpath+0x58/0x2f0
> [ 2.896042] ? mutex_unlock+0x12/0x20
> [ 2.896046] ? crlmodule_get_format+0x43/0x50
> [ 2.896052] ? v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> [ 2.896057] ? v4l2_subdev_link_validate_one+0x67/0x120
> [ 2.896065] ? __is_insn_slot_addr+0xad/0x120
> [ 2.896070] ? kernel_text_address+0xc4/0x100
> [ 2.896078] ? v4l2_subdev_link_validate+0x246/0x490
> [ 2.896085] ? kernel_text_address+0xc4/0x100
> [ 2.896092] ? __lock_acquire+0x1106/0x1340
> [ 2.896096] ? __lock_acquire+0x1169/0x1340
> [ 2.896103] csi2_link_validate+0xc6/0x220
> [ 2.896110] ? __lock_is_held+0x5a/0xa0
> [ 2.896115] ? mark_held_locks+0x58/0x80
> [ 2.896122] ? __kmalloc+0x207/0x2e0
> [ 2.896127] ? __lock_is_held+0x5a/0xa0
> [ 2.896134] ? rcu_read_lock_sched_held+0x81/0x90
> [ 2.896139] ? __kmalloc+0x2a3/0x2e0
> [ 2.896144] ? media_pipeline_start+0x28/0x50
> [ 2.896150] ? __media_entity_enum_init+0x33/0x70
> [ 2.896155] ? csi2_has_route+0x18/0x20
> [ 2.896160] ? media_graph_walk_next.part.9+0xac/0x290
> [ 2.896166] __media_pipeline_start+0x15b/0x2f0
> [ 2.896173] ? rcu_read_lock_sched_held+0x81/0x90
> [ 2.896179] media_pipeline_start+0x33/0x50
> [ 2.896186] ipu_isys_video_prepare_streaming+0x1e0/0x610
> [ 2.896191] ? __lock_acquire+0x132e/0x1340
> [ 2.896198] ? __lock_acquire+0x2b5/0x1340
> [ 2.896204] ? lock_acquire+0x95/0x1a0
> [ 2.896209] ? start_streaming+0x5c/0x3a0
> [ 2.896215] ? start_streaming+0x5c/0x3a0
> [ 2.896221] ? __mutex_lock+0x391/0x9a0
> [ 2.896226] ? v4l_enable_media_source+0x2d/0x70
> [ 2.896233] ? find_held_lock+0x35/0xa0
> [ 2.896238] ? v4l_enable_media_source+0x57/0x70
> [ 2.896245] start_streaming+0x186/0x3a0
> [ 2.896250] ? __mutex_unlock_slowpath+0x58/0x2f0
> [ 2.896257] vb2_start_streaming+0x6d/0x130
> [ 2.896262] ? vb2_start_streaming+0x6d/0x130
> [ 2.896267] vb2_core_streamon+0x108/0x140
> [ 2.896273] vb2_streamon+0x29/0x50
> [ 2.896278] vb2_ioctl_streamon+0x42/0x50
> [ 2.896284] v4l_streamon+0x20/0x30
> [ 2.896288] __video_do_ioctl+0x1af/0x3c0
> [ 2.896296] ? __might_fault+0x85/0x90
> [ 2.896302] video_usercopy+0x27e/0x7e0
> [ 2.896307] ? copy_overflow+0x20/0x20
> [ 2.896313] ? find_held_lock+0x35/0xa0
> [ 2.896319] ? __might_fault+0x3e/0x90
> [ 2.896325] video_ioctl2+0x15/0x20
> [ 2.896330] v4l2_ioctl+0x49/0x50
> [ 2.896335] do_video_ioctl+0x93c/0x2360
> [ 2.896343] v4l2_compat_ioctl32+0x93/0xe0
> [ 2.896349] __ia32_compat_sys_ioctl+0x73a/0x1c90
> [ 2.896354] ? lockdep_hardirqs_on+0xef/0x180
> [ 2.896359] ? do_fast_syscall_32+0x3b/0x2d6
> [ 2.896364] do_fast_syscall_32+0x9a/0x2d6
> [ 2.896370] entry_SYSENTER_compat+0x6d/0x7c
> [ 2.896377] RIP: 0023:0xf7e79b79
> [ 2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
> [ 2.896387] RSP: 002b:00000000f76816bc EFLAGS: 00000292 ORIG_RAX: 0000000000000036
> [ 2.896393] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 0000000040045612
> [ 2.896396] RDX: 00000000f768172c RSI: 00000000f7d42d9c RDI: 00000000f768172c
> [ 2.896400] RBP: 00000000f7681708 R08: 0000000000000000 R09: 0000000000000000
> [ 2.896404] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 2.896408] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>
> ------------------------------------------------------------------------
>
> > [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 21808196
> > [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 ->gp_wake_seq 1357 ->gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
> > [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 4312486108 ->gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 ->gp_flags 0x0
> >
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Thursday, December 13, 2018 12:40 PM
> > To: Zhang, Jun <[email protected]>
> > Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Xiao, Jin
> > <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie
> > A <[email protected]>; Sun, Yi J <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> > > Ok, we will test it, thanks!
> >
> > But please also try the sysrq-y with the earlier patch after a hang!
> >
> > Thanx, Paul
> >
> > > -----Original Message-----
> > > From: Paul E. McKenney [mailto:[email protected]]
> > > Sent: Thursday, December 13, 2018 10:43
> > > To: Zhang, Jun <[email protected]>
> > > Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; Xiao, Jin
> > > <[email protected]>; Zhang, Yanmin <[email protected]>; Bai,
> > > Jie A <[email protected]>; Sun, Yi J <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > > > Hello, Paul
> > > >
> > > > I think the next patch is better.
> > > > Because ULONG_CMP_GE could cause a double write, which risks writing back an old value.
> > > > Please help review.
> > > > I have not tested it. If you agree, we will test it.
> > >
> > > Just to make sure that I understand, you are worried about something like the following, correct?
> > >
> > > o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> > > and finds them equal.
> > >
> > > o At just this time something like rcu_start_this_gp() assigns a new
> > > (larger) value to rdp->gp_seq_needed.
> > >
> > > o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> > > old value.
> > >
> > > This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
> > >
> > > Of course, there could be bugs. For example:
> > >
> > > o __note_gp_changes() might be called on a different CPU than that
> > > corresponding to rdp. You can check this with something like:
> > >
> > > WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> > >
> > > o The same things could happen with rcu_start_this_gp(), and the
> > > above WARN_ON_ONCE() would work there as well.
> > >
> > > o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> > > you are doing CPU-hotplug operations. (It can run on a CPU other
> > > than rdp->cpu, but only at times when rdp->cpu is offline.)
> > >
> > > o Interrupts might not really be disabled.
> > >
> > > That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
> > >
> > > So why not just test it anyway? If it makes the bug go away, I will
> > > be surprised, but it would not be the first surprise for me. ;-)
> > >
> > > Thanx, Paul
> > >
> > > > Thanks!
> > > >
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 0b760c1..c00f34e 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > > > zero_cpu_stall_ticks(rdp);
> > > > }
> > > > rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > > > - if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > > > + if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > > > rdp->gp_seq_needed = rnp->gp_seq_needed;
> > > > WRITE_ONCE(rdp->gpwrap, false);
> > > > rcu_gpnum_ovf(rnp, rdp);
> > > >
> > > >
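For readers following the ULONG_CMP_GE()/ULONG_CMP_LT() distinction above: a minimal standalone sketch, reusing the wrap-safe comparison macros from include/linux/rcupdate.h (copied here on the assumption that they match the v4.19 definitions), shows that the two tests differ only when the two sequence numbers are equal, which is exactly the case where the proposed patch skips the redundant store:

#include <limits.h>
#include <stdio.h>

/* Wrap-safe sequence comparisons, as in include/linux/rcupdate.h. */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

int main(void)
{
	unsigned long rnp_need = 23716092, rdp_need = 23716092;

	/* Equal values: the old test stores, the new test does not. */
	printf("equal: GE(rnp, rdp)=%d LT(rdp, rnp)=%d\n",
	       ULONG_CMP_GE(rnp_need, rdp_need),
	       ULONG_CMP_LT(rdp_need, rnp_need));

	/* rnp ahead of rdp, even across wraparound: both tests store. */
	rnp_need = 2;
	rdp_need = ULONG_MAX - 1;
	printf("ahead: GE(rnp, rdp)=%d LT(rdp, rnp)=%d\n",
	       ULONG_CMP_GE(rnp_need, rdp_need),
	       ULONG_CMP_LT(rdp_need, rnp_need));
	return 0;
}

With these definitions, ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) is true for equal values and so rewrites rdp->gp_seq_needed even when nothing changed, while ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) is false and leaves it alone.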
> > > > -----Original Message-----
> > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > Sent: Thursday, December 13, 2018 08:12
> > > > To: He, Bo <[email protected]>
> > > > Cc: Steven Rostedt <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected]; Zhang, Jun
> > > > <[email protected]>; Xiao, Jin <[email protected]>; Zhang,
> > > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>;
> > > > Sun, Yi J <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6 and do not see sysrq_rcu.
> > > > > Please correct me if I am missing something.
> > > >
> > > > That would be because I sent you the wrong patch, apologies! :-/
> > > >
> > > > Please instead see the one below, which does add sysrq_rcu.
> > > >
> > > > Thanx, Paul
> > > >
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney <[email protected]>
> > > > > Sent: Thursday, December 13, 2018 5:03 AM
> > > > > To: He, Bo <[email protected]>
> > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > [email protected]; [email protected];
> > > > > [email protected]; [email protected]; Zhang,
> > > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > > <[email protected]>
> > > > > Subject: Re: rcu_preempt caused oom
> > > > >
> > > > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > > > we reproduced it on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > > > I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as it is extracted from /proc/config.gz.
> > > > > >
> > > > > > Strange.
> > > > > >
> > > > > > Are the systems responsive to sysrq keys once failure occurs?
> > > > > > If so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > > > >
> > > > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > > > Author: Paul E. McKenney <[email protected]>
> > > > Date: Wed Dec 12 16:10:09 2018 -0800
> > > >
> > > > rcu: Add sysrq rcu_node-dump capability
> > > >
> > > > Backported from v4.21/v5.0
> > > >
> > > > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > > > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > > > for failing to start a grace period. This commit therefore adds a
> > > > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > > > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > > > must be set for this sysrq to be available.
> > > >
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 0b760c1369f7..e9392a9d6291 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -61,6 +61,7 @@
> > > > #include <linux/trace_events.h>
> > > > #include <linux/suspend.h>
> > > > #include <linux/ftrace.h>
> > > > +#include <linux/sysrq.h>
> > > >
> > > > #include "tree.h"
> > > > #include "rcu.h"
> > > > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> > > > int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> > > > /* panic() on RCU Stall sysctl. */
> > > > int sysctl_panic_on_rcu_stall __read_mostly;
> > > > +/* Commandeer a sysrq key to dump RCU's tree. */
> > > > +static bool sysrq_rcu;
> > > > +module_param(sysrq_rcu, bool, 0444);
> > > >
> > > > /*
> > > > * The rcu_scheduler_active variable is initialized to the value
> > > > @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> > > > }
> > > > EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> > > >
> > > > +/* Dump grace-period-request information due to commandeered sysrq. */
> > > > +static void sysrq_show_rcu(int key)
> > > > +{
> > > > +	show_rcu_gp_kthreads();
> > > > +}
> > > > +
> > > > +static struct sysrq_key_op sysrq_rcudump_op = {
> > > > +	.handler = sysrq_show_rcu,
> > > > +	.help_msg = "show-rcu(y)",
> > > > +	.action_msg = "Show RCU tree",
> > > > +	.enable_mask = SYSRQ_ENABLE_DUMP,
> > > > +};
> > > > +
> > > > +static int __init rcu_sysrq_init(void)
> > > > +{
> > > > +	if (sysrq_rcu)
> > > > +		return register_sysrq_key('y', &sysrq_rcudump_op);
> > > > +	return 0;
> > > > +}
> > > > +early_initcall(rcu_sysrq_init);
> > > > +
> > > > /*
> > > > * Send along grace-period-related data for rcutorture diagnostics.
> > > > */
> > > >
> > >
> >
>
>


2018-12-14 02:42:02

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

We have run another experiment with the enclosed debug patch, with more rcu trace events enabled but without the CONFIG_RCU_BOOST config. We have not reproduced the issue after 90 hours so far on 10 boards (per previous experience, the issue should reproduce within one night).

The purpose is to capture more rcu event traces close to the point where the issue happens: I checked that __wait_rcu_gp is not always running, so we think that even though the panic triggers on a 3s timeout, the issue has already happened before those 3s.

And actually rsp->gp_flags = 1, but the wait state is RCU_GP_WAIT_GPS(1) with ->state: 0x402, which means the kthread has not been scheduled for 300s even though RCU_GP_FLAG_INIT is set. What are your ideas?
---------------------------------------------------------------------------------------------------------------------------------
-	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-				   RCU_GP_FLAG_INIT);
+	if (current->pid != rcu_preempt_pid) {
+		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+					   RCU_GP_FLAG_INIT);
+	} else {
+		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq,
+				READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT, 2*HZ);
+
+		if (!ret) {
+			show_rcu_gp_kthreads();
+			panic("hung_task: blocked in rcu_gp_kthread init");
+		}
+	}
--------------------------------------------------------------------------------------
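On the "->state: 0x402" in the question above: a minimal sketch, assuming the task-state bit values from v4.19's include/linux/sched.h, showing that 0x402 decodes to TASK_IDLE (TASK_UNINTERRUPTIBLE | TASK_NOLOAD), i.e. the uninterruptible sleep that does not contribute to the load average, which is the state swait_event_idle_exclusive() parks the kthread in:

#include <stdio.h>

/* Task-state bits as in v4.19 include/linux/sched.h (assumed values). */
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_NOLOAD		0x0400
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

int main(void)
{
	/* Prints 0x402, matching the "->state: 0x402" in the dump. */
	printf("TASK_IDLE = 0x%x\n", TASK_IDLE);
	return 0;
}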
-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Friday, December 14, 2018 10:15 AM
To: He, Bo <[email protected]>
Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
Subject: Re: rcu_preempt caused oom

On Fri, Dec 14, 2018 at 01:30:04AM +0000, He, Bo wrote:
> as you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I double checked there is no FAST_NO_HZ in .config:

Yes, you are correct, CONFIG_RCU_FAST_NO_HZ. OK, you do not have it set, which means several code paths can be ignored. Also CONFIG_HZ=1000, so the roughly 300,000-jiffy delta corresponds to a 300-second delay.

Thanx, Paul

> Here is the grep from .config:
> egrep "HZ|RCU" .config
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> # RCU Subsystem
> CONFIG_PREEMPT_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> # CONFIG_HZ_100 is not set
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
> # CONFIG_MACHZ_WDT is not set
> # RCU Debugging
> CONFIG_PROVE_RCU=y
> CONFIG_RCU_PERF_TEST=m
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=7
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_EQS_DEBUG=y
>
> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Friday, December 14, 2018 2:12 AM
> To: He, Bo <[email protected]>
> Cc: Zhang, Jun <[email protected]>; Steven Rostedt
> <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin
> <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J
> <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Thu, Dec 13, 2018 at 03:26:08PM +0000, He, Bo wrote:
> > one of the boards reproduced the issue with show_rcu_gp_kthreads(); I also enclosed the logs as an attachment.
> >
> > [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258 ->gp_wake_seq 21808189 ->gp_seq 21808192 ->gp_seq_needed 21808196 ->gp_flags 0x1
>
> This is quite helpful, thank you!
>
> The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is good. The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting for a new grace-period request. The "->state: 0x402" means that it is sleeping, neither running nor in the process of waking up.
> The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258" means that it has been more than 300,000 jiffies since the rcu_preempt task did anything or was requested to do anything.
>
> The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to awaken the rcu_preempt task happened during the last grace period.
> The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone requested a new grace period. So if the rcu_preempt task were to wake up, it would process the new grace period. Note again also the ->gp_req_activity 308256, which indicates that ->gp_flags was set more than 300,000 jiffies ago, just after the last recorded activity of the rcu_preempt task.
>
> But this is exactly the situation that rcu_check_gp_start_stall() is designed to warn about (and does warn about for me when I comment out the wakeup code). So why is rcu_check_gp_start_stall() not being called? Here are a couple of possibilities:
>
> 1. Because rcu_check_gp_start_stall() is only ever invoked from
> RCU_SOFTIRQ, it is possible that softirqs are stalled for
> whatever reason.
>
> 2. Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
> interrupt handler, it is possible that the scheduler tick has
> somehow been disabled. Traces from earlier runs showed a great
> deal of RCU callbacks queued, which would have caused RCU to
> refuse to allow the scheduler tick to be disabled, even if the
> corresponding CPU was idle.
>
> 3. You have CONFIG_FAST_NO_HZ=y (which you probably do, given
> that you are building for a battery-powered device) and all of the
> CPU's callbacks are lazy. Except that your earlier traces showed
> lots of non-lazy callbacks. Besides, even if all callbacks were
> lazy, there would still be a scheduling-clock interrupt every
> six seconds, and there are quite a few six-second intervals
> in a two-minute watchdog timeout.
>
> But if we cannot find the problem quickly, I will likely ask
> you to try reproducing with CONFIG_FAST_NO_HZ=n. This could
> be thought of as bisecting the RCU code looking for the bug.
>
> The first two of these seem unlikely given that the watchdog timer was still firing. Still, I don't see how 300,000 jiffies elapsed with a grace period requested and not started otherwise. Could you please check?
> One way to do so would be to enable ftrace on rcu_check_callbacks(), __rcu_process_callbacks(), and rcu_check_gp_start_stall(). It might be necessary to mark rcu_check_gp_start_stall() as noinline. You might have better ways to collect this information.
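As an aside, a hedged sketch of setting up that tracing (assuming the usual tracefs mount point at /sys/kernel/debug/tracing; the same writes can be done with echo from a shell):

#include <stdio.h>

static int write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fputs(val, f);
	fclose(f);
	return 0;
}

int main(void)
{
	const char *dir = "/sys/kernel/debug/tracing";
	char path[256];

	/* Trace only the three functions of interest. */
	snprintf(path, sizeof(path), "%s/set_ftrace_filter", dir);
	write_file(path, "rcu_check_callbacks\n__rcu_process_callbacks\nrcu_check_gp_start_stall\n");
	snprintf(path, sizeof(path), "%s/current_tracer", dir);
	write_file(path, "function");
	snprintf(path, sizeof(path), "%s/tracing_on", dir);
	write_file(path, "1");
	return 0;
}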
>
> Without this information, the only workaround patch I can give you will degrade battery lifetime, which might not be what you want.
>
> You do have a lockdep complaint early at boot. Although I don't immediately see how this self-deadlock would affect RCU, please do get it fixed. Sometimes the consequences of this sort of deadlock can propagate to unexpected places.
>
> Regardless of why rcu_check_gp_start_stall() failed to complain, it looks like this was set after the rcu_preempt task slept for the last time, and so there should have been a wakeup the last time that ->gp_flags was set. Perhaps there is some code path that drops the wakeup.
> I did check this in current -rcu, but you are instead running v4.19, so I should also check there.
>
> The ->gp_flags has its RCU_GP_FLAG_INIT bit set in rcu_start_this_gp() and in rcu_gp_cleanup(). We can eliminate rcu_gp_cleanup() from consideration because only the rcu_preempt task will execute that code, and we know that this task was asleep at the last time this bit was set.
> Now rcu_start_this_gp() returns a flag indicating whether or not a wakeup is needed, and the caller must do the wakeup once it is safe to do so, that is, after the various rcu_node locks have been released (doing a wakeup while holding any of those locks results in deadlock).
>
> The following functions invoke rcu_start_this_gp: rcu_accelerate_cbs() and rcu_nocb_wait_gp(). We can eliminate rcu_nocb_wait_gp() because you are building with CONFIG_RCU_NOCB_CPU=n. Then rcu_accelerate_cbs() is invoked from:
>
> o rcu_accelerate_cbs_unlocked(), which does the following, thus
> properly awakening the rcu_preempt task when needed:
>
> needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> o rcu_advance_cbs(), which returns the value returned by
> rcu_accelerate_cbs(), thus pushing the problem off to its
> callers, which are called out below.
>
> o __note_gp_changes(), which also returns the value returned by
> rcu_accelerate_cbs(), thus pushing the problem off to its callers,
> which are called out below.
>
> o rcu_gp_cleanup(), which is only ever invoked by RCU grace-period
> kthreads such as the rcu_preempt task. Therefore, this function
> never needs to awaken the rcu_preempt task, because the fact
> that this function is executing means that this task is already
> awake. (Also, as noted above, we can eliminate this code from
> consideration because this task is known to have been sleeping
> at the last time that the RCU_GP_FLAG_INIT bit was set.)
>
> o rcu_report_qs_rdp(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
>
> rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
> /* ^^^ Released rnp->lock */
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> o rcu_prepare_for_idle(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> Now for rcu_advance_cbs():
>
> o __note_gp_changes(), which also returns the value returned
> by rcu_advance_cbs(), thus pushing the problem off to its callers,
> which are called out below.
>
> o rcu_migrate_callbacks(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
> rcu_advance_cbs(rsp, rnp_root, my_rdp);
> rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
> WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
> !rcu_segcblist_n_cbs(&my_rdp->cblist));
> raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> Now for __note_gp_changes():
>
> o note_gp_changes(), which does the following, thus properly
> awakening the rcu_preempt task when needed:
>
> needwake = __note_gp_changes(rsp, rnp, rdp);
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> if (needwake)
> rcu_gp_kthread_wake(rsp);
>
> o rcu_gp_init() which is only ever invoked by RCU grace-period
> kthreads such as the rcu_preempt task, which makes wakeups
> unnecessary, just as for rcu_gp_cleanup() above.
>
> o rcu_gp_cleanup(), ditto.
>
> So I am not seeing how I am losing a wakeup, but please do feel free to double-check my analysis. One way to do that is using event tracing.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
> lockdep complaint:
> ------------------------------------------------------------------------
>
> [ 2.895507] ======================================================
> [ 2.895511] WARNING: possible circular locking dependency detected
> [ 2.895517] 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1 Tainted: G U
> [ 2.895521] ------------------------------------------------------
> [ 2.895525] earlyEvs/1839 is trying to acquire lock:
> [ 2.895530] 00000000ff344115 (&asd->mutex){+.+.}, at: ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895546]
> [ 2.895546] but task is already holding lock:
> [ 2.895550] 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> [ 2.895561]
> [ 2.895561] which lock already depends on the new lock.
> [ 2.895561]
> [ 2.895566]
> [ 2.895566] the existing dependency chain (in reverse order) is:
> [ 2.895570]
> [ 2.895570] -> #1 (&mdev->graph_mutex){+.+.}:
> [ 2.895583] __mutex_lock+0x80/0x9a0
> [ 2.895588] mutex_lock_nested+0x1b/0x20
> [ 2.895593] media_device_register_entity+0x92/0x1e0
> [ 2.895598] v4l2_device_register_subdev+0xc2/0x1b0
> [ 2.895604] ipu_isys_csi2_init+0x22c/0x520
> [ 2.895608] isys_probe+0x6cb/0xed0
> [ 2.895613] ipu_bus_probe+0xfd/0x2e0
> [ 2.895620] really_probe+0x268/0x3d0
> [ 2.895625] driver_probe_device+0x11a/0x130
> [ 2.895630] __device_attach_driver+0x86/0x100
> [ 2.895635] bus_for_each_drv+0x6e/0xb0
> [ 2.895640] __device_attach+0xdf/0x160
> [ 2.895645] device_initial_probe+0x13/0x20
> [ 2.895650] bus_probe_device+0xa6/0xc0
> [ 2.895655] deferred_probe_work_func+0x88/0xe0
> [ 2.895661] process_one_work+0x220/0x5c0
> [ 2.895665] worker_thread+0x1da/0x3b0
> [ 2.895670] kthread+0x12c/0x150
> [ 2.895675] ret_from_fork+0x3a/0x50
> [ 2.895678]
> [ 2.895678] -> #0 (&asd->mutex){+.+.}:
> [ 2.895688] lock_acquire+0x95/0x1a0
> [ 2.895693] __mutex_lock+0x80/0x9a0
> [ 2.895698] mutex_lock_nested+0x1b/0x20
> [ 2.895703] ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895708] ipu_isys_csi2_get_fmt+0x14/0x30
> [ 2.895713] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> [ 2.895718] v4l2_subdev_link_validate_one+0x67/0x120
> [ 2.895723] v4l2_subdev_link_validate+0x246/0x490
> [ 2.895728] csi2_link_validate+0xc6/0x220
> [ 2.895733] __media_pipeline_start+0x15b/0x2f0
> [ 2.895738] media_pipeline_start+0x33/0x50
> [ 2.895743] ipu_isys_video_prepare_streaming+0x1e0/0x610
> [ 2.895748] start_streaming+0x186/0x3a0
> [ 2.895753] vb2_start_streaming+0x6d/0x130
> [ 2.895758] vb2_core_streamon+0x108/0x140
> [ 2.895762] vb2_streamon+0x29/0x50
> [ 2.895767] vb2_ioctl_streamon+0x42/0x50
> [ 2.895772] v4l_streamon+0x20/0x30
> [ 2.895776] __video_do_ioctl+0x1af/0x3c0
> [ 2.895781] video_usercopy+0x27e/0x7e0
> [ 2.895785] video_ioctl2+0x15/0x20
> [ 2.895789] v4l2_ioctl+0x49/0x50
> [ 2.895794] do_video_ioctl+0x93c/0x2360
> [ 2.895799] v4l2_compat_ioctl32+0x93/0xe0
> [ 2.895806] __ia32_compat_sys_ioctl+0x73a/0x1c90
> [ 2.895813] do_fast_syscall_32+0x9a/0x2d6
> [ 2.895818] entry_SYSENTER_compat+0x6d/0x7c
> [ 2.895821]
> [ 2.895821] other info that might help us debug this:
> [ 2.895821]
> [ 2.895826] Possible unsafe locking scenario:
> [ 2.895826]
> [ 2.895830] CPU0 CPU1
> [ 2.895833] ---- ----
> [ 2.895836] lock(&mdev->graph_mutex);
> [ 2.895842] lock(&asd->mutex);
> [ 2.895847] lock(&mdev->graph_mutex);
> [ 2.895852] lock(&asd->mutex);
> [ 2.895857]
> [ 2.895857] *** DEADLOCK ***
> [ 2.895857]
> [ 2.895863] 3 locks held by earlyEvs/1839:
> [ 2.895866] #0: 00000000ed860090 (&av->mutex){+.+.}, at: __video_do_ioctl+0xbf/0x3c0
> [ 2.895876] #1: 000000000cb253e7 (&isys->stream_mutex){+.+.}, at: start_streaming+0x5c/0x3a0
> [ 2.895886] #2: 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> [ 2.895896]
> [ 2.895896] stack backtrace:
> [ 2.895903] CPU: 0 PID: 1839 Comm: earlyEvs Tainted: G U 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1
> [ 2.895907] Call Trace:
> [ 2.895915] dump_stack+0x70/0xa5
> [ 2.895921] print_circular_bug.isra.35+0x1d8/0x1e6
> [ 2.895927] __lock_acquire+0x1284/0x1340
> [ 2.895931] ? __lock_acquire+0x2b5/0x1340
> [ 2.895940] lock_acquire+0x95/0x1a0
> [ 2.895945] ? lock_acquire+0x95/0x1a0
> [ 2.895950] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895956] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895961] __mutex_lock+0x80/0x9a0
> [ 2.895966] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895971] ? crlmodule_get_format+0x43/0x50
> [ 2.895979] mutex_lock_nested+0x1b/0x20
> [ 2.895984] ? mutex_lock_nested+0x1b/0x20
> [ 2.895989] ipu_isys_subdev_get_ffmt+0x32/0x90
> [ 2.895995] ipu_isys_csi2_get_fmt+0x14/0x30
> [ 2.896001] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> [ 2.896006] v4l2_subdev_link_validate_one+0x67/0x120
> [ 2.896011] ? crlmodule_get_format+0x2a/0x50
> [ 2.896018] ? find_held_lock+0x35/0xa0
> [ 2.896023] ? crlmodule_get_format+0x43/0x50
> [ 2.896030] v4l2_subdev_link_validate+0x246/0x490
> [ 2.896035] ? __mutex_unlock_slowpath+0x58/0x2f0
> [ 2.896042] ? mutex_unlock+0x12/0x20
> [ 2.896046] ? crlmodule_get_format+0x43/0x50
> [ 2.896052] ? v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> [ 2.896057] ? v4l2_subdev_link_validate_one+0x67/0x120
> [ 2.896065] ? __is_insn_slot_addr+0xad/0x120
> [ 2.896070] ? kernel_text_address+0xc4/0x100
> [ 2.896078] ? v4l2_subdev_link_validate+0x246/0x490
> [ 2.896085] ? kernel_text_address+0xc4/0x100
> [ 2.896092] ? __lock_acquire+0x1106/0x1340
> [ 2.896096] ? __lock_acquire+0x1169/0x1340
> [ 2.896103] csi2_link_validate+0xc6/0x220
> [ 2.896110] ? __lock_is_held+0x5a/0xa0
> [ 2.896115] ? mark_held_locks+0x58/0x80
> [ 2.896122] ? __kmalloc+0x207/0x2e0
> [ 2.896127] ? __lock_is_held+0x5a/0xa0
> [ 2.896134] ? rcu_read_lock_sched_held+0x81/0x90
> [ 2.896139] ? __kmalloc+0x2a3/0x2e0
> [ 2.896144] ? media_pipeline_start+0x28/0x50
> [ 2.896150] ? __media_entity_enum_init+0x33/0x70
> [ 2.896155] ? csi2_has_route+0x18/0x20
> [ 2.896160] ? media_graph_walk_next.part.9+0xac/0x290
> [ 2.896166] __media_pipeline_start+0x15b/0x2f0
> [ 2.896173] ? rcu_read_lock_sched_held+0x81/0x90
> [ 2.896179] media_pipeline_start+0x33/0x50
> [ 2.896186] ipu_isys_video_prepare_streaming+0x1e0/0x610
> [ 2.896191] ? __lock_acquire+0x132e/0x1340
> [ 2.896198] ? __lock_acquire+0x2b5/0x1340
> [ 2.896204] ? lock_acquire+0x95/0x1a0
> [ 2.896209] ? start_streaming+0x5c/0x3a0
> [ 2.896215] ? start_streaming+0x5c/0x3a0
> [ 2.896221] ? __mutex_lock+0x391/0x9a0
> [ 2.896226] ? v4l_enable_media_source+0x2d/0x70
> [ 2.896233] ? find_held_lock+0x35/0xa0
> [ 2.896238] ? v4l_enable_media_source+0x57/0x70
> [ 2.896245] start_streaming+0x186/0x3a0
> [ 2.896250] ? __mutex_unlock_slowpath+0x58/0x2f0
> [ 2.896257] vb2_start_streaming+0x6d/0x130
> [ 2.896262] ? vb2_start_streaming+0x6d/0x130
> [ 2.896267] vb2_core_streamon+0x108/0x140
> [ 2.896273] vb2_streamon+0x29/0x50
> [ 2.896278] vb2_ioctl_streamon+0x42/0x50
> [ 2.896284] v4l_streamon+0x20/0x30
> [ 2.896288] __video_do_ioctl+0x1af/0x3c0
> [ 2.896296] ? __might_fault+0x85/0x90
> [ 2.896302] video_usercopy+0x27e/0x7e0
> [ 2.896307] ? copy_overflow+0x20/0x20
> [ 2.896313] ? find_held_lock+0x35/0xa0
> [ 2.896319] ? __might_fault+0x3e/0x90
> [ 2.896325] video_ioctl2+0x15/0x20
> [ 2.896330] v4l2_ioctl+0x49/0x50
> [ 2.896335] do_video_ioctl+0x93c/0x2360
> [ 2.896343] v4l2_compat_ioctl32+0x93/0xe0
> [ 2.896349] __ia32_compat_sys_ioctl+0x73a/0x1c90
> [ 2.896354] ? lockdep_hardirqs_on+0xef/0x180
> [ 2.896359] ? do_fast_syscall_32+0x3b/0x2d6
> [ 2.896364] do_fast_syscall_32+0x9a/0x2d6
> [ 2.896370] entry_SYSENTER_compat+0x6d/0x7c
> [ 2.896377] RIP: 0023:0xf7e79b79
> [ 2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
> [ 2.896387] RSP: 002b:00000000f76816bc EFLAGS: 00000292 ORIG_RAX: 0000000000000036
> [ 2.896393] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 0000000040045612
> [ 2.896396] RDX: 00000000f768172c RSI: 00000000f7d42d9c RDI: 00000000f768172c
> [ 2.896400] RBP: 00000000f7681708 R08: 0000000000000000 R09: 0000000000000000
> [ 2.896404] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 2.896408] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>
> ----------------------------------------------------------------------
> --
>
> > [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 21808196
> > [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 ->gp_wake_seq 1357 - >gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
> > [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 4312486108 - >gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 ->gp_flags 0x0
> >
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Thursday, December 13, 2018 12:40 PM
> > To: Zhang, Jun <[email protected]>
> > Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> > [email protected]; [email protected];
> > [email protected]; [email protected]; Xiao, Jin
> > <[email protected]>; Zhang, Yanmin <[email protected]>; Bai,
> > Jie A <[email protected]>; Sun, Yi J <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> > > Ok, we will test it, thanks!
> >
> > But please also try the sysrq-y with the earlier patch after a hang!
> >
> > Thanx, Paul
> >
> > > -----Original Message-----
> > > From: Paul E. McKenney [mailto:[email protected]]
> > > Sent: Thursday, December 13, 2018 10:43
> > > To: Zhang, Jun <[email protected]>
> > > Cc: He, Bo <[email protected]>; Steven Rostedt
> > > <[email protected]>; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; Xiao, Jin <[email protected]>; Zhang,
> > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>;
> > > Sun, Yi J <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > > > Hello, Paul
> > > >
> > > > I think the next patch is better.
> > > > > Because ULONG_CMP_GE could cause a double write, which carries the risk of writing back an old value.
> > > > > Please help review.
> > > > > I have not tested it. If you agree, we will test it.
> > >
> > > Just to make sure that I understand, you are worried about something like the following, correct?
> > >
> > > o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> > > and finds them equal.
> > >
> > > o At just this time something like rcu_start_this_gp() assigns a new
> > > (larger) value to rdp->gp_seq_needed.
> > >
> > > o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> > > old value.
> > >
> > > This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
> > >
> > > Of course, there could be bugs. For example:
> > >
> > > o __note_gp_changes() might be called on a different CPU than that
> > > corresponding to rdp. You can check this with something like:
> > >
> > > WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> > >
> > > o The same things could happen with rcu_start_this_gp(), and the
> > > above WARN_ON_ONCE() would work there as well.
> > >
> > > o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> > > you are doing CPU-hotplug operations. (It can run on a CPU other
> > > than rdp->cpu, but only at times when rdp->cpu is offline.)
> > >
> > > o Interrupts might not really be disabled.
> > >
> > > That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
> > >
> > > So why not just test it anyway? If it makes the bug go away, I
> > > will be surprised, but it would not be the first surprise for me.
> > > ;-)
> > >
> > > Thanx, Paul
> > >
> > > > Thanks!
> > > >
> > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 0b760c1..c00f34e 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > > > zero_cpu_stall_ticks(rdp);
> > > > }
> > > > rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > > > - if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > > > > + if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > > > rdp->gp_seq_needed = rnp->gp_seq_needed;
> > > > WRITE_ONCE(rdp->gpwrap, false);
> > > > rcu_gpnum_ovf(rnp, rdp);
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > Sent: Thursday, December 13, 2018 08:12
> > > > To: He, Bo <[email protected]>
> > > > Cc: Steven Rostedt <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected]; Zhang,
> > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > <[email protected]>; Sun, Yi J <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > > > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6 and do not see sysrq_rcu.
> > > > > > Please correct me if I am missing something.
> > > >
> > > > That would be because I sent you the wrong patch, apologies!
> > > > :-/
> > > >
> > > > Please instead see the one below, which does add sysrq_rcu.
> > > >
> > > > Thanx, Paul
> > > >
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney <[email protected]>
> > > > > Sent: Thursday, December 13, 2018 5:03 AM
> > > > > To: He, Bo <[email protected]>
> > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > [email protected]; [email protected];
> > > > > [email protected]; [email protected]; Zhang,
> > > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > > <[email protected]>
> > > > > Subject: Re: rcu_preempt caused oom
> > > > >
> > > > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > > > > we reproduced it on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > > > > I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as it is extracted from /proc/config.gz.
> > > > > >
> > > > > > Strange.
> > > > > >
> > > > > > Are the systems responsive to sysrq keys once failure occurs?
> > > > > > If so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > > > >
> > > > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> > > >
> > > > ----------------------------------------------------------------
> > > > --
> > > > --
> > > > --
> > > > --
> > > >
> > > > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > > > Author: Paul E. McKenney <[email protected]>
> > > > Date: Wed Dec 12 16:10:09 2018 -0800
> > > >
> > > > rcu: Add sysrq rcu_node-dump capability
> > > >
> > > > Backported from v4.21/v5.0
> > > >
> > > > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > > > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > > > for failing to start a grace period. This commit therefore adds a
> > > > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > > > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > > > must be set for this sysrq to be available.
> > > >
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 0b760c1369f7..e9392a9d6291 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -61,6 +61,7 @@
> > > > > #include <linux/trace_events.h>
> > > > > #include <linux/suspend.h>
> > > > #include <linux/ftrace.h>
> > > > +#include <linux/sysrq.h>
> > > >
> > > > #include "tree.h"
> > > > #include "rcu.h"
> > > > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT; int
> > > > rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total #
> > > > rcu_nodes in use. */
> > > > /* panic() on RCU Stall sysctl. */ int
> > > > sysctl_panic_on_rcu_stall __read_mostly;
> > > > +/* Commandeer a sysrq key to dump RCU's tree. */ static bool
> > > > +sysrq_rcu; module_param(sysrq_rcu, bool, 0444);
> > > >
> > > > /*
> > > > * The rcu_scheduler_active variable is initialized to the
> > > > value @@
> > > > -662,6 +666,27 @@ void show_rcu_gp_kthreads(void) }
> > > > EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> > > >
> > > > +/* Dump grace-period-request information due to commandeered sysrq.
> > > > +*/ static void sysrq_show_rcu(int key) {
> > > > + show_rcu_gp_kthreads();
> > > > +}
> > > > +
> > > > +static struct sysrq_key_op sysrq_rcudump_op = {
> > > > + .handler = sysrq_show_rcu,
> > > > + .help_msg = "show-rcu(y)",
> > > > + .action_msg = "Show RCU tree",
> > > > + .enable_mask = SYSRQ_ENABLE_DUMP, };
> > > > +
> > > > +static int __init rcu_sysrq_init(void) {
> > > > + if (sysrq_rcu)
> > > > + return register_sysrq_key('y', &sysrq_rcudump_op);
> > > > + return 0;
> > > > +}
> > > > +early_initcall(rcu_sysrq_init);
> > > > +
> > > > /*
> > > > * Send along grace-period-related data for rcutorture diagnostics.
> > > > */
> > > >
> > >
> >
>
>


Attachments:
0001-rcu-detect-the-preempt_rcu-hang.patch (3.65 kB)

2018-12-14 05:11:37

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Fri, Dec 14, 2018 at 02:40:50AM +0000, He, Bo wrote:
> We have run another experiment with the enclosed debug patch, with more rcu trace events enabled but without the CONFIG_RCU_BOOST config. We have not reproduced the issue after 90 hours so far on 10 boards (per previous experience, the issue should reproduce within one night).

That certainly supports the hypothesis that a wakeup is either not
being sent or is being lost. Your patch is great for debugging (thank
you!), but the real solution of course needs to avoid the extra wakeups,
especially on battery-powered systems.
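As a side illustration, the suspected failure mode and the timeout-based detection reduce to a condition variable whose signal is skipped. The following is a self-contained userspace analogue only, not kernel code; the requester/waiter split and the gp_flags variable are invented for the sketch:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static int gp_flags;	/* stands in for rsp->gp_flags & RCU_GP_FLAG_INIT */

/* Sets the flag but "forgets" the pthread_cond_signal(&wake). */
static void *requester(void *arg)
{
	(void)arg;
	sleep(1);			/* let the waiter block first */
	pthread_mutex_lock(&lock);
	gp_flags = 1;
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;
	struct timespec deadline;

	pthread_create(&t, NULL, requester, NULL);
	pthread_mutex_lock(&lock);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 2;		/* mirrors the 2*HZ timeout */
	while (!gp_flags) {
		if (pthread_cond_timedwait(&wake, &lock, &deadline) == ETIMEDOUT)
			break;
	}
	if (gp_flags)
		printf("request pending but no wakeup delivered: lost-wakeup suspect\n");
	else
		printf("timed out with nothing pending: legitimately idle\n");
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);
	return 0;
}

Run as written, the waiter blocks, the flag is set without a signal, and the two-second timeout is what surfaces the problem, just as the swait timeout does in the debug patch.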

One suggested change below, to get rid of potential false positives.

> The purpose is to capture more rcu event traces close to the point where the issue happens: I checked that __wait_rcu_gp is not always running, so we think that even though the panic triggers on a 3s timeout, the issue has already happened before those 3s.

Agreed, it would be really good to have trace information from the cause.
In the case you sent yesterday, it would be good to have trace information
from 308.256 seconds prior to the sysrq-v, for example, by collecting the
same event traces you did a few days ago. It would also be good to know
whether the scheduler tick is providing interrupts, and if so, why
rcu_check_gp_start_stall() isn't being invoked. ;-)

If collecting this information with your setup is not feasible (for
example, you might need a large trace buffer to capture five minutes
of traces), please let me know and I can provide additional debug
code. Or you could add "rcu_ftrace_dump(DUMP_ALL);" just before the
"show_rcu_gp_kthreads();" in your patch below.

> And actually rsp->gp_flags = 1, but the wait state is RCU_GP_WAIT_GPS(1) with ->state: 0x402, which means the kthread has not been scheduled for 300s even though RCU_GP_FLAG_INIT is set. What are your ideas?

The most likely possibility is that my analysis below is confused and
there really is some way that the code can set the RCU_GP_FLAG_INIT
bit without later doing a wakeup. The trace data above could help
unconfuse me.

Thanx, Paul

> ---------------------------------------------------------------------------------------------------------------------------------
> -	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> -				   RCU_GP_FLAG_INIT);
> +	if (current->pid != rcu_preempt_pid) {
> +		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> +					   RCU_GP_FLAG_INIT);
> +	} else {

wait_again:

> +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq,
> +				READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT, 2*HZ);
> +
> +		if (!ret) {

This would avoid complaining if RCU was legitimately idle for a long time:

	if (!ret && !READ_ONCE(rsp->gp_flags)) {
		rcu_ftrace_dump(DUMP_ALL);
		show_rcu_gp_kthreads();
		panic("hung_task: blocked in rcu_gp_kthread init");
	} else if (!ret) {
		goto wait_again;
	}

> +			show_rcu_gp_kthreads();
> +			panic("hung_task: blocked in rcu_gp_kthread init");
> +		}
> +	}
> --------------------------------------------------------------------------------------
> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Friday, December 14, 2018 10:15 AM
> To: He, Bo <[email protected]>
> Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
> Subject: Re: rcu_preempt caused oom
>
> On Fri, Dec 14, 2018 at 01:30:04AM +0000, He, Bo wrote:
> > as you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I double checked there is no FAST_NO_HZ in .config:
>
> Yes, you are correct, CONFIG_RCU_FAST_NO_HZ. OK, you do not have it set, which means several code paths can be ignored. Also CONFIG_HZ=1000, so the roughly 300,000-jiffy delta corresponds to a 300-second delay.
>
> Thanx, Paul
>
> > Here is the grep from .config:
> > egrep "HZ|RCU" .config
> > CONFIG_NO_HZ_COMMON=y
> > # CONFIG_HZ_PERIODIC is not set
> > CONFIG_NO_HZ_IDLE=y
> > # CONFIG_NO_HZ_FULL is not set
> > CONFIG_NO_HZ=y
> > # RCU Subsystem
> > CONFIG_PREEMPT_RCU=y
> > # CONFIG_RCU_EXPERT is not set
> > CONFIG_SRCU=y
> > CONFIG_TREE_SRCU=y
> > CONFIG_TASKS_RCU=y
> > CONFIG_RCU_STALL_COMMON=y
> > CONFIG_RCU_NEED_SEGCBLIST=y
> > # CONFIG_HZ_100 is not set
> > # CONFIG_HZ_250 is not set
> > # CONFIG_HZ_300 is not set
> > CONFIG_HZ_1000=y
> > CONFIG_HZ=1000
> > # CONFIG_MACHZ_WDT is not set
> > # RCU Debugging
> > CONFIG_PROVE_RCU=y
> > CONFIG_RCU_PERF_TEST=m
> > CONFIG_RCU_TORTURE_TEST=m
> > CONFIG_RCU_CPU_STALL_TIMEOUT=7
> > CONFIG_RCU_TRACE=y
> > CONFIG_RCU_EQS_DEBUG=y
> >
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Friday, December 14, 2018 2:12 AM
> > To: He, Bo <[email protected]>
> > Cc: Zhang, Jun <[email protected]>; Steven Rostedt
> > <[email protected]>; [email protected];
> > [email protected]; [email protected];
> > [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J
> > <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Thu, Dec 13, 2018 at 03:26:08PM +0000, He, Bo wrote:
> > > one of the boards reproduced the issue with show_rcu_gp_kthreads(); I also enclosed the logs as an attachment.
> > >
> > > [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258 ->gp_wake_seq 21808189 ->gp_seq 21808192 ->gp_seq_needed 21808196 ->gp_flags 0x1
> >
> > This is quite helpful, thank you!
> >
> > The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is good. The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting for a new grace-period request. The "->state: 0x402" means that it is sleeping, neither running nor in the process of waking up.
> > The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258" means that it has been more than 300,000 jiffies since the rcu_preempt task did anything or was requested to do anything.
> >
> > The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to awaken the rcu_preempt task happened during the last grace period.
> > The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone requested a new grace period. So if the rcu_preempt task were to wake up, it would process the new grace period. Note again also the ->gp_req_activity 308256, which indicates that ->gp_flags was set more than 300,000 jiffies ago, just after the last recorded activity of the rcu_preempt task.
> >
> > But this is exactly the situation that rcu_check_gp_start_stall() is designed to warn about (and does warn about for me when I comment out the wakeup code). So why is rcu_check_gp_start_stall() not being called? Here are a couple of possibilities:
> >
> > 1. Because rcu_check_gp_start_stall() is only ever invoked from
> > RCU_SOFTIRQ, it is possible that softirqs are stalled for
> > whatever reason.
> >
> > 2. Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
> > interrupt handler, it is possible that the scheduler tick has
> > somehow been disabled. Traces from earlier runs showed a great
> > deal of RCU callbacks queued, which would have caused RCU to
> > refuse to allow the scheduler tick to be disabled, even if the
> > corresponding CPU was idle.
> >
> > 3. You have CONFIG_FAST_NO_HZ=y (which you probably do, given
> > that you are building for a battery-powered device) and all of the
> > CPU's callbacks are lazy. Except that your earlier traces showed
> > lots of non-lazy callbacks. Besides, even if all callbacks were
> > lazy, there would still be a scheduling-clock interrupt every
> > six seconds, and there are quite a few six-second intervals
> > in a two-minute watchdog timeout.
> >
> > But if we cannot find the problem quickly, I will likely ask
> > you to try reproducing with CONFIG_FAST_NO_HZ=n. This could
> > be thought of as bisecting the RCU code looking for the bug.
> >
> > The first two of these seem unlikely given that the watchdog timer was still firing. Still, I don't see how 300,000 jiffies elapsed with a grace period requested and not started otherwise. Could you please check?
> > One way to do so would be to enable ftrace on rcu_check_callbacks(), __rcu_process_callbacks(), and rcu_check_gp_start_stall(). It might be necessary to mark rcu_check_gp_start_stall() as noinline. You might have better ways to collect this information.
> >
> > Without this information, the only workaround patch I can give you will degrade battery lifetime, which might not be what you want.
> >
> > You do have a lockdep complaint early at boot. Although I don't immediately see how this self-deadlock would affect RCU, please do get it fixed. Sometimes the consequences of this sort of deadlock can propagate to unexpected places.
> >
> > Regardless of why rcu_check_gp_start_stall() failed to complain, it looks like this was set after the rcu_preempt task slept for the last time, and so there should have been a wakeup the last time that ->gp_flags was set. Perhaps there is some code path that drops the wakeup.
> > I did check this in current -rcu, but you are instead running v4.19, so I should also check there.
> >
> > The ->gp_flags has its RCU_GP_FLAG_INIT bit set in rcu_start_this_gp() and in rcu_gp_cleanup(). We can eliminate rcu_gp_cleanup() from consideration because only the rcu_preempt task will execute that code, and we know that this task was asleep at the last time this bit was set.
> > Now rcu_start_this_gp() returns a flag indicating whether or not a wakeup is needed, and the caller must do the wakeup once it is safe to do so, that is, after the various rcu_node locks have been released (doing a wakeup while holding any of those locks results in deadlock).
> >
> > The following functions invoke rcu_start_this_gp: rcu_accelerate_cbs() and rcu_nocb_wait_gp(). We can eliminate rcu_nocb_wait_gp() because you are building with CONFIG_RCU_NOCB_CPU=n. Then rcu_accelerate_cbs() is invoked from:
> >
> > o rcu_accelerate_cbs_unlocked(), which does the following, thus
> > properly awakening the rcu_preempt task when needed:
> >
> > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > if (needwake)
> > rcu_gp_kthread_wake(rsp);
> >
> > o rcu_advance_cbs(), which returns the value returned by
> > rcu_accelerate_cbs(), thus pushing the problem off to its
> > callers, which are called out below.
> >
> > o __note_gp_changes(), which also returns the value returned by
> > rcu_accelerate_cbs(), thus pushing the problem off to its callers,
> > which are called out below.
> >
> > o rcu_gp_cleanup(), which is only ever invoked by RCU grace-period
> > kthreads such as the rcu_preempt task. Therefore, this function
> > never needs to awaken the rcu_preempt task, because the fact
> > that this function is executing means that this task is already
> > awake. (Also, as noted above, we can eliminate this code from
> > consideration because this task is known to have been sleeping
> > at the last time that the RCU_GP_FLAG_INIT bit was set.)
> >
> > o rcu_report_qs_rdp(), which does the following, thus properly
> > awakening the rcu_preempt task when needed:
> >
> > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> >
> > rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
> > /* ^^^ Released rnp->lock */
> > if (needwake)
> > rcu_gp_kthread_wake(rsp);
> >
> > o rcu_prepare_for_idle(), which does the following, thus properly
> > awakening the rcu_preempt task when needed:
> >
> > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > if (needwake)
> > rcu_gp_kthread_wake(rsp);
> >
> > Now for rcu_advance_cbs():
> >
> > o __note_gp_changes(), which also returns the value returned
> > by rcu_advance_cbs(), thus pushing the problem off to its callers,
> > which are called out below.
> >
> > o rcu_migrate_callbacks(), which does the following, thus properly
> > awakening the rcu_preempt task when needed:
> >
> > needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
> > rcu_advance_cbs(rsp, rnp_root, my_rdp);
> > rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
> > WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
> > !rcu_segcblist_n_cbs(&my_rdp->cblist));
> > raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
> > if (needwake)
> > rcu_gp_kthread_wake(rsp);
> >
> > Now for __note_gp_changes():
> >
> > o note_gp_changes(), which does the following, thus properly
> > awakening the rcu_preempt task when needed:
> >
> > needwake = __note_gp_changes(rsp, rnp, rdp);
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > if (needwake)
> > rcu_gp_kthread_wake(rsp);
> >
> > o rcu_gp_init() which is only ever invoked by RCU grace-period
> > kthreads such as the rcu_preempt task, which makes wakeups
> > unnecessary, just as for rcu_gp_cleanup() above.
> >
> > o rcu_gp_cleanup(), ditto.
> >
> > So I am not seeing how I am losing a wakeup, but please do feel free to double-check my analysis. One way to do that is using event tracing.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> > lockdep complaint:
> > ------------------------------------------------------------------------
> >
> > [ 2.895507] ======================================================
> > [ 2.895511] WARNING: possible circular locking dependency detected
> > [ 2.895517] 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1 Tainted: G U
> > [ 2.895521] ------------------------------------------------------
> > [ 2.895525] earlyEvs/1839 is trying to acquire lock:
> > [ 2.895530] 00000000ff344115 (&asd->mutex){+.+.}, at: ipu_isys_subdev_get_ffmt+0x32/0x90
> > [ 2.895546]
> > [ 2.895546] but task is already holding lock:
> > [ 2.895550] 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> > [ 2.895561]
> > [ 2.895561] which lock already depends on the new lock.
> > [ 2.895561]
> > [ 2.895566]
> > [ 2.895566] the existing dependency chain (in reverse order) is:
> > [ 2.895570]
> > [ 2.895570] -> #1 (&mdev->graph_mutex){+.+.}:
> > [ 2.895583] __mutex_lock+0x80/0x9a0
> > [ 2.895588] mutex_lock_nested+0x1b/0x20
> > [ 2.895593] media_device_register_entity+0x92/0x1e0
> > [ 2.895598] v4l2_device_register_subdev+0xc2/0x1b0
> > [ 2.895604] ipu_isys_csi2_init+0x22c/0x520
> > [ 2.895608] isys_probe+0x6cb/0xed0
> > [ 2.895613] ipu_bus_probe+0xfd/0x2e0
> > [ 2.895620] really_probe+0x268/0x3d0
> > [ 2.895625] driver_probe_device+0x11a/0x130
> > [ 2.895630] __device_attach_driver+0x86/0x100
> > [ 2.895635] bus_for_each_drv+0x6e/0xb0
> > [ 2.895640] __device_attach+0xdf/0x160
> > [ 2.895645] device_initial_probe+0x13/0x20
> > [ 2.895650] bus_probe_device+0xa6/0xc0
> > [ 2.895655] deferred_probe_work_func+0x88/0xe0
> > [ 2.895661] process_one_work+0x220/0x5c0
> > [ 2.895665] worker_thread+0x1da/0x3b0
> > [ 2.895670] kthread+0x12c/0x150
> > [ 2.895675] ret_from_fork+0x3a/0x50
> > [ 2.895678]
> > [ 2.895678] -> #0 (&asd->mutex){+.+.}:
> > [ 2.895688] lock_acquire+0x95/0x1a0
> > [ 2.895693] __mutex_lock+0x80/0x9a0
> > [ 2.895698] mutex_lock_nested+0x1b/0x20
> > [ 2.895703] ipu_isys_subdev_get_ffmt+0x32/0x90
> > [ 2.895708] ipu_isys_csi2_get_fmt+0x14/0x30
> > [ 2.895713] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > [ 2.895718] v4l2_subdev_link_validate_one+0x67/0x120
> > [ 2.895723] v4l2_subdev_link_validate+0x246/0x490
> > [ 2.895728] csi2_link_validate+0xc6/0x220
> > [ 2.895733] __media_pipeline_start+0x15b/0x2f0
> > [ 2.895738] media_pipeline_start+0x33/0x50
> > [ 2.895743] ipu_isys_video_prepare_streaming+0x1e0/0x610
> > [ 2.895748] start_streaming+0x186/0x3a0
> > [ 2.895753] vb2_start_streaming+0x6d/0x130
> > [ 2.895758] vb2_core_streamon+0x108/0x140
> > [ 2.895762] vb2_streamon+0x29/0x50
> > [ 2.895767] vb2_ioctl_streamon+0x42/0x50
> > [ 2.895772] v4l_streamon+0x20/0x30
> > [ 2.895776] __video_do_ioctl+0x1af/0x3c0
> > [ 2.895781] video_usercopy+0x27e/0x7e0
> > [ 2.895785] video_ioctl2+0x15/0x20
> > [ 2.895789] v4l2_ioctl+0x49/0x50
> > [ 2.895794] do_video_ioctl+0x93c/0x2360
> > [ 2.895799] v4l2_compat_ioctl32+0x93/0xe0
> > [ 2.895806] __ia32_compat_sys_ioctl+0x73a/0x1c90
> > [ 2.895813] do_fast_syscall_32+0x9a/0x2d6
> > [ 2.895818] entry_SYSENTER_compat+0x6d/0x7c
> > [ 2.895821]
> > [ 2.895821] other info that might help us debug this:
> > [ 2.895821]
> > [ 2.895826] Possible unsafe locking scenario:
> > [ 2.895826]
> > [ 2.895830] CPU0 CPU1
> > [ 2.895833] ---- ----
> > [ 2.895836] lock(&mdev->graph_mutex);
> > [ 2.895842] lock(&asd->mutex);
> > [ 2.895847] lock(&mdev->graph_mutex);
> > [ 2.895852] lock(&asd->mutex);
> > [ 2.895857]
> > [ 2.895857] *** DEADLOCK ***
> > [ 2.895857]
> > [ 2.895863] 3 locks held by earlyEvs/1839:
> > [ 2.895866] #0: 00000000ed860090 (&av->mutex){+.+.}, at: __video_do_ioctl+0xbf/0x3c0
> > [ 2.895876] #1: 000000000cb253e7 (&isys->stream_mutex){+.+.}, at: start_streaming+0x5c/0x3a0
> > [ 2.895886] #2: 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> > [ 2.895896]
> > [ 2.895896] stack backtrace:
> > [ 2.895903] CPU: 0 PID: 1839 Comm: earlyEvs Tainted: G U 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1
> > [ 2.895907] Call Trace:
> > [ 2.895915] dump_stack+0x70/0xa5
> > [ 2.895921] print_circular_bug.isra.35+0x1d8/0x1e6
> > [ 2.895927] __lock_acquire+0x1284/0x1340
> > [ 2.895931] ? __lock_acquire+0x2b5/0x1340
> > [ 2.895940] lock_acquire+0x95/0x1a0
> > [ 2.895945] ? lock_acquire+0x95/0x1a0
> > [ 2.895950] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > [ 2.895956] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > [ 2.895961] __mutex_lock+0x80/0x9a0
> > [ 2.895966] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > [ 2.895971] ? crlmodule_get_format+0x43/0x50
> > [ 2.895979] mutex_lock_nested+0x1b/0x20
> > [ 2.895984] ? mutex_lock_nested+0x1b/0x20
> > [ 2.895989] ipu_isys_subdev_get_ffmt+0x32/0x90
> > [ 2.895995] ipu_isys_csi2_get_fmt+0x14/0x30
> > [ 2.896001] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > [ 2.896006] v4l2_subdev_link_validate_one+0x67/0x120
> > [ 2.896011] ? crlmodule_get_format+0x2a/0x50
> > [ 2.896018] ? find_held_lock+0x35/0xa0
> > [ 2.896023] ? crlmodule_get_format+0x43/0x50
> > [ 2.896030] v4l2_subdev_link_validate+0x246/0x490
> > [ 2.896035] ? __mutex_unlock_slowpath+0x58/0x2f0
> > [ 2.896042] ? mutex_unlock+0x12/0x20
> > [ 2.896046] ? crlmodule_get_format+0x43/0x50
> > [ 2.896052] ? v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > [ 2.896057] ? v4l2_subdev_link_validate_one+0x67/0x120
> > [ 2.896065] ? __is_insn_slot_addr+0xad/0x120
> > [ 2.896070] ? kernel_text_address+0xc4/0x100
> > [ 2.896078] ? v4l2_subdev_link_validate+0x246/0x490
> > [ 2.896085] ? kernel_text_address+0xc4/0x100
> > [ 2.896092] ? __lock_acquire+0x1106/0x1340
> > [ 2.896096] ? __lock_acquire+0x1169/0x1340
> > [ 2.896103] csi2_link_validate+0xc6/0x220
> > [ 2.896110] ? __lock_is_held+0x5a/0xa0
> > [ 2.896115] ? mark_held_locks+0x58/0x80
> > [ 2.896122] ? __kmalloc+0x207/0x2e0
> > [ 2.896127] ? __lock_is_held+0x5a/0xa0
> > [ 2.896134] ? rcu_read_lock_sched_held+0x81/0x90
> > [ 2.896139] ? __kmalloc+0x2a3/0x2e0
> > [ 2.896144] ? media_pipeline_start+0x28/0x50
> > [ 2.896150] ? __media_entity_enum_init+0x33/0x70
> > [ 2.896155] ? csi2_has_route+0x18/0x20
> > [ 2.896160] ? media_graph_walk_next.part.9+0xac/0x290
> > [ 2.896166] __media_pipeline_start+0x15b/0x2f0
> > [ 2.896173] ? rcu_read_lock_sched_held+0x81/0x90
> > [ 2.896179] media_pipeline_start+0x33/0x50
> > [ 2.896186] ipu_isys_video_prepare_streaming+0x1e0/0x610
> > [ 2.896191] ? __lock_acquire+0x132e/0x1340
> > [ 2.896198] ? __lock_acquire+0x2b5/0x1340
> > [ 2.896204] ? lock_acquire+0x95/0x1a0
> > [ 2.896209] ? start_streaming+0x5c/0x3a0
> > [ 2.896215] ? start_streaming+0x5c/0x3a0
> > [ 2.896221] ? __mutex_lock+0x391/0x9a0
> > [ 2.896226] ? v4l_enable_media_source+0x2d/0x70
> > [ 2.896233] ? find_held_lock+0x35/0xa0
> > [ 2.896238] ? v4l_enable_media_source+0x57/0x70
> > [ 2.896245] start_streaming+0x186/0x3a0
> > [ 2.896250] ? __mutex_unlock_slowpath+0x58/0x2f0
> > [ 2.896257] vb2_start_streaming+0x6d/0x130
> > [ 2.896262] ? vb2_start_streaming+0x6d/0x130
> > [ 2.896267] vb2_core_streamon+0x108/0x140
> > [ 2.896273] vb2_streamon+0x29/0x50
> > [ 2.896278] vb2_ioctl_streamon+0x42/0x50
> > [ 2.896284] v4l_streamon+0x20/0x30
> > [ 2.896288] __video_do_ioctl+0x1af/0x3c0
> > [ 2.896296] ? __might_fault+0x85/0x90
> > [ 2.896302] video_usercopy+0x27e/0x7e0
> > [ 2.896307] ? copy_overflow+0x20/0x20
> > [ 2.896313] ? find_held_lock+0x35/0xa0
> > [ 2.896319] ? __might_fault+0x3e/0x90
> > [ 2.896325] video_ioctl2+0x15/0x20
> > [ 2.896330] v4l2_ioctl+0x49/0x50
> > [ 2.896335] do_video_ioctl+0x93c/0x2360
> > [ 2.896343] v4l2_compat_ioctl32+0x93/0xe0
> > [ 2.896349] __ia32_compat_sys_ioctl+0x73a/0x1c90
> > [ 2.896354] ? lockdep_hardirqs_on+0xef/0x180
> > [ 2.896359] ? do_fast_syscall_32+0x3b/0x2d6
> > [ 2.896364] do_fast_syscall_32+0x9a/0x2d6
> > [ 2.896370] entry_SYSENTER_compat+0x6d/0x7c
> > [ 2.896377] RIP: 0023:0xf7e79b79
> > [ 2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
> > [ 2.896387] RSP: 002b:00000000f76816bc EFLAGS: 00000292 ORIG_RAX: 0000000000000036
> > [ 2.896393] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 0000000040045612
> > [ 2.896396] RDX: 00000000f768172c RSI: 00000000f7d42d9c RDI: 00000000f768172c
> > [ 2.896400] RBP: 00000000f7681708 R08: 0000000000000000 R09: 0000000000000000
> > [ 2.896404] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [ 2.896408] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >
> > ------------------------------------------------------------------------
> >
> > > [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 21808196
> > > [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 ->gp_wake_seq 1357 ->gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
> > > [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 4312486108 ->gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 ->gp_flags 0x0
> > >
> > > -----Original Message-----
> > > From: Paul E. McKenney <[email protected]>
> > > Sent: Thursday, December 13, 2018 12:40 PM
> > > To: Zhang, Jun <[email protected]>
> > > Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; Xiao, Jin
> > > <[email protected]>; Zhang, Yanmin <[email protected]>; Bai,
> > > Jie A <[email protected]>; Sun, Yi J <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> > > > Ok, we will test it, thanks!
> > >
> > > But please also try the sysrq-y with the earlier patch after a hang!
> > >
> > > Thanx, Paul
> > >
> > > > -----Original Message-----
> > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > Sent: Thursday, December 13, 2018 10:43
> > > > To: Zhang, Jun <[email protected]>
> > > > Cc: He, Bo <[email protected]>; Steven Rostedt
> > > > <[email protected]>; [email protected];
> > > > [email protected]; [email protected];
> > > > [email protected]; Xiao, Jin <[email protected]>; Zhang,
> > > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>;
> > > > Sun, Yi J <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > > > > Hello, Paul
> > > > >
> > > > > I think the next patch is better.
> > > > > Because ULONG_CMP_GE could cause a double write, which carries the risk of writing back an old value.
> > > > > Please help review.
> > > > > I have not tested it. If you agree, we will test it.
> > > >
> > > > Just to make sure that I understand, you are worried about something like the following, correct?
> > > >
> > > > o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> > > > and finds them equal.
> > > >
> > > > o At just this time something like rcu_start_this_gp() assigns a new
> > > > (larger) value to rdp->gp_seq_needed.
> > > >
> > > > o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> > > > old value.
> > > >
> > > > This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
> > > >
> > > > Of course, there could be bugs. For example:
> > > >
> > > > o __note_gp_changes() might be called on a different CPU than that
> > > > corresponding to rdp. You can check this with something like:
> > > >
> > > > WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> > > >
> > > > o The same things could happen with rcu_start_this_gp(), and the
> > > > above WARN_ON_ONCE() would work there as well.
> > > >
> > > > o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> > > > you are doing CPU-hotplug operations. (It can run on a CPU other
> > > > than rdp->cpu, but only at times when rdp->cpu is offline.)
> > > >
> > > > o Interrupts might not really be disabled.
> > > >
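For illustration, a minimal sketch of where the suggested check could be dropped in (hypothetical placement at the top of __note_gp_changes(); a debug aid for the first item above, not a fix):

	static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
				      struct rcu_data *rdp)
	{
		/* Catch the cross-CPU case called out above. */
		WARN_ON_ONCE(rdp->cpu != smp_processor_id());
		/* ... existing body unchanged ... */
	}
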
> > > > That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
> > > >
> > > > So why not just test it anyway? If it makes the bug go away, I
> > > > will be surprised, but it would not be the first surprise for me.
> > > > ;-)
> > > >
> > > > Thanx, Paul
> > > >
> > > > > Thanks!
> > > > >
> > > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 0b760c1..c00f34e 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > > > >  		zero_cpu_stall_ticks(rdp);
> > > > >  	}
> > > > >  	rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > > > > -	if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > > > > +	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > > > >  		rdp->gp_seq_needed = rnp->gp_seq_needed;
> > > > >  	WRITE_ONCE(rdp->gpwrap, false);
> > > > >  	rcu_gpnum_ovf(rnp, rdp);
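For reference, the wrap-safe comparison macros involved (as defined in kernel/rcu/rcu.h in v4.19) make Jun's concern concrete: ULONG_CMP_GE() is also true when the two sequence numbers are equal, so the original condition rewrites rdp->gp_seq_needed even when nothing changed, while ULONG_CMP_LT() stores only when rnp's value is strictly newer:

	/* Wrap-tolerant sequence comparisons (kernel/rcu/rcu.h, v4.19). */
	#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
	#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))
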
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > > Sent: Thursday, December 13, 2018 08:12
> > > > > To: He, Bo <[email protected]>
> > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > [email protected]; [email protected];
> > > > > [email protected]; [email protected]; Zhang,
> > > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > > <[email protected]>; Sun, Yi J <[email protected]>
> > > > > Subject: Re: rcu_preempt caused oom
> > > > >
> > > > > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > > > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6 and do not see sysrq_rcu there either.
> > > > > > Please correct me if I have something wrong.
> > > > >
> > > > > That would be because I sent you the wrong patch, apologies!
> > > > > :-/
> > > > >
> > > > > Please instead see the one below, which does add sysrq_rcu.
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Paul E. McKenney <[email protected]>
> > > > > > Sent: Thursday, December 13, 2018 5:03 AM
> > > > > > To: He, Bo <[email protected]>
> > > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > > [email protected]; [email protected];
> > > > > > [email protected]; [email protected]; Zhang,
> > > > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > > > <[email protected]>
> > > > > > Subject: Re: rcu_preempt caused oom
> > > > > >
> > > > > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > > > > We reproduced on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > > > > I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as it was extracted from /proc/config.gz.
> > > > > > >
> > > > > > > Strange.
> > > > > > >
> > > > > > > Are the systems responsive to sysrq keys once failure occurs?
> > > > > > > If so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > > > > >
> > > > > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > > > > Author: Paul E. McKenney <[email protected]>
> > > > > Date: Wed Dec 12 16:10:09 2018 -0800
> > > > >
> > > > > rcu: Add sysrq rcu_node-dump capability
> > > > >
> > > > > Backported from v4.21/v5.0
> > > > >
> > > > > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > > > > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > > > > for failing to start a grace period. This commit therefore adds a
> > > > > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > > > > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > > > > must be set for this sysrq to be available.
> > > > >
> > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 0b760c1369f7..e9392a9d6291 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -61,6 +61,7 @@
> > > > >  #include <linux/trace_events.h>
> > > > >  #include <linux/suspend.h>
> > > > >  #include <linux/ftrace.h>
> > > > > +#include <linux/sysrq.h>
> > > > >
> > > > >  #include "tree.h"
> > > > >  #include "rcu.h"
> > > > > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> > > > >  int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> > > > >  /* panic() on RCU Stall sysctl. */
> > > > >  int sysctl_panic_on_rcu_stall __read_mostly;
> > > > > +/* Commandeer a sysrq key to dump RCU's tree. */
> > > > > +static bool sysrq_rcu;
> > > > > +module_param(sysrq_rcu, bool, 0444);
> > > > >
> > > > >  /*
> > > > >   * The rcu_scheduler_active variable is initialized to the value
> > > > > @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> > > > >
> > > > > +/* Dump grace-period-request information due to commandeered sysrq. */
> > > > > +static void sysrq_show_rcu(int key)
> > > > > +{
> > > > > +	show_rcu_gp_kthreads();
> > > > > +}
> > > > > +
> > > > > +static struct sysrq_key_op sysrq_rcudump_op = {
> > > > > +	.handler = sysrq_show_rcu,
> > > > > +	.help_msg = "show-rcu(y)",
> > > > > +	.action_msg = "Show RCU tree",
> > > > > +	.enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > +};
> > > > > +
> > > > > +static int __init rcu_sysrq_init(void)
> > > > > +{
> > > > > +	if (sysrq_rcu)
> > > > > +		return register_sysrq_key('y', &sysrq_rcudump_op);
> > > > > +	return 0;
> > > > > +}
> > > > > +early_initcall(rcu_sysrq_init);
> > > > > +
> > > > >  /*
> > > > >   * Send along grace-period-related data for rcutorture diagnostics.
> > > > >   */
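For reference, once a kernel carrying this patch is booted with rcutree.sysrq_rcu=1, the dump can be triggered either with the keyboard sysrq-y or, from a shell, by writing "y" to /proc/sysrq-trigger (standard sysrq behavior).
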
> > > > >
> > > >
> > >
> >
> >
>



2018-12-14 05:40:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Thu, Dec 13, 2018 at 09:10:12PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 14, 2018 at 02:40:50AM +0000, He, Bo wrote:
> > We have done another experiment with the enclosed debug patch, also with more rcu trace events enabled but without the CONFIG_RCU_BOOST config; we have not reproduced the issue for more than 90 hours so far on 10 boards (per previous experience the issue should reproduce within one night).
>
> That certainly supports the hypothesis that a wakeup is either not
> being sent or is being lost. Your patch is great for debugging (thank
> you!), but the real solution of course needs to avoid the extra wakeups,
> especially on battery-powered systems.
>
> One suggested change below, to get rid of potential false positives.
>
> > The purpose is to capture more rcu event traces close to when the issue happens. Because I see that __wait_rcu_gp is not always running, we think that even when it triggers the panic on the 3s timeout, the issue had already happened before those 3s.
>
> Agreed, it would be really good to have trace information from the cause.
> In the case you sent yesterday, it would be good to have trace information
> from 308.256 seconds prior to the sysrq-v, for example, by collecting the
> same event traces you did a few days ago. It would also be good to know
> whether the scheduler tick is providing interrupts, and if so, why
> rcu_check_gp_start_stall() isn't being invoked. ;-)
>
> If collecting this information with your setup is not feasible (for
> example, you might need a large trace buffer to capture five minutes
> of traces), please let me know and I can provide additional debug
> code. Or you could add "rcu_ftrace_dump(DUMP_ALL);" just before the
> "show_rcu_gp_kthreads();" in your patch below.
>
> > And actually rsp->gp_flags = 1, but RCU_GP_WAIT_GPS(1) ->state: 0x402 means the kthread has not been scheduled for 300s even though RCU_GP_FLAG_INIT is set. What are your ideas?
>
> The most likely possibility is that my analysis below is confused and
> there really is some way that the code can set the RCU_GP_FLAG_INIT
> bit without later doing a wakeup. The trace data above could help
> unconfuse me.
>
> Thanx, Paul
>
> > ---------------------------------------------------------------------------------------------------------------------------------
> > -	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > -				   RCU_GP_FLAG_INIT);
> > +	if (current->pid != rcu_preempt_pid) {
> > +		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > +					   RCU_GP_FLAG_INIT);
> > +	} else {
>
> wait_again:
>
> > +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > +							 RCU_GP_FLAG_INIT, 2*HZ);
> > +
> > +		if (!ret) {
>
> This would avoid complaining if RCU was legitimately idle for a long time:

Let's try this again. Unless I am confused (quite possible), your original
would panic if RCU was idle for more than two seconds. What we instead
want is to panic if we time out but end up with RCU_GP_FLAG_INIT set.

So something like this:

	if (ret == 1) {
		/* Timed out with RCU_GP_FLAG_INIT. */
		rcu_ftrace_dump(DUMP_ALL);
		show_rcu_gp_kthreads();
		panic("hung_task: blocked in rcu_gp_kthread init");
	} else if (!ret) {
		/* Timed out w/out RCU_GP_FLAG_INIT. */
		goto wait_again;
	}

Thanx, Paul

> > +			show_rcu_gp_kthreads();
> > +			panic("hung_task: blocked in rcu_gp_kthread init");
> > +		}
> > +	}
> > --------------------------------------------------------------------------------------
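Putting the quoted debug patch together with the corrected retry logic above gives, roughly, the following shape for the modified rcu_gp_kthread() wait (a sketch only: rcu_preempt_pid and the 2*HZ timeout come from the debug patch, a local long ret is assumed, and this has not been tested):

	if (current->pid != rcu_preempt_pid) {
		swait_event_idle_exclusive(rsp->gp_wq,
				READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT);
	} else {
wait_again:
		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq,
				READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT,
				2 * HZ);
		if (ret == 1) {
			/* Timed out, yet RCU_GP_FLAG_INIT was set: the
			 * expected wakeup never arrived. */
			rcu_ftrace_dump(DUMP_ALL);
			show_rcu_gp_kthreads();
			panic("hung_task: blocked in rcu_gp_kthread init");
		} else if (!ret) {
			/* Timed out with no grace period requested:
			 * RCU was legitimately idle, so keep waiting. */
			goto wait_again;
		}
		/* ret > 1: normal wakeup with RCU_GP_FLAG_INIT set. */
	}
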
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Friday, December 14, 2018 10:15 AM
> > To: He, Bo <[email protected]>
> > Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Fri, Dec 14, 2018 at 01:30:04AM +0000, He, Bo wrote:
> > > As you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I double-checked and there is no FAST_NO_HZ in .config:
> >
> > Yes, you are correct, CONFIG_RCU_FAST_NO_HZ. OK, you do not have it set, which means several code paths can be ignored. Also CONFIG_HZ=1000, so
> > 300 second delay.
> >
> > Thanx, Paul
> >
> > > Here is the grep from .config:
> > > egrep "HZ|RCU" .config
> > > CONFIG_NO_HZ_COMMON=y
> > > # CONFIG_HZ_PERIODIC is not set
> > > CONFIG_NO_HZ_IDLE=y
> > > # CONFIG_NO_HZ_FULL is not set
> > > CONFIG_NO_HZ=y
> > > # RCU Subsystem
> > > CONFIG_PREEMPT_RCU=y
> > > # CONFIG_RCU_EXPERT is not set
> > > CONFIG_SRCU=y
> > > CONFIG_TREE_SRCU=y
> > > CONFIG_TASKS_RCU=y
> > > CONFIG_RCU_STALL_COMMON=y
> > > CONFIG_RCU_NEED_SEGCBLIST=y
> > > # CONFIG_HZ_100 is not set
> > > # CONFIG_HZ_250 is not set
> > > # CONFIG_HZ_300 is not set
> > > CONFIG_HZ_1000=y
> > > CONFIG_HZ=1000
> > > # CONFIG_MACHZ_WDT is not set
> > > # RCU Debugging
> > > CONFIG_PROVE_RCU=y
> > > CONFIG_RCU_PERF_TEST=m
> > > CONFIG_RCU_TORTURE_TEST=m
> > > CONFIG_RCU_CPU_STALL_TIMEOUT=7
> > > CONFIG_RCU_TRACE=y
> > > CONFIG_RCU_EQS_DEBUG=y
> > >
> > > -----Original Message-----
> > > From: Paul E. McKenney <[email protected]>
> > > Sent: Friday, December 14, 2018 2:12 AM
> > > To: He, Bo <[email protected]>
> > > Cc: Zhang, Jun <[email protected]>; Steven Rostedt
> > > <[email protected]>; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin
> > > <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J
> > > <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Thu, Dec 13, 2018 at 03:26:08PM +0000, He, Bo wrote:
> > > > One of the boards reproduced the issue with show_rcu_gp_kthreads(); I also enclosed the logs as an attachment.
> > > >
> > > > [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258 ->gp_wake_seq 21808189 ->gp_seq 21808192 ->gp_seq_needed 21808196 ->gp_flags 0x1
> > >
> > > This is quite helpful, thank you!
> > >
> > > The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is good. The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting for a new grace-period request. The "->state: 0x402" means that it is sleeping, neither running nor in the process of waking up.
> > > The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258" means that it has been more than 300,000 jiffies since the rcu_preempt task did anything or was requested to do anything.
> > >
> > > The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to awaken the rcu_preempt task happened during the last grace period.
> > > The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone requested a new grace period. So if the rcu_preempt task were to wake up, it would process the new grace period. Note again also the ->gp_req_activity 308256, which indicates that ->gp_flags was set more than 300,000 jiffies ago, just after the last recorded activity of the rcu_preempt task.
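As an aside for readers decoding these dumps: ->state 0x402 is TASK_UNINTERRUPTIBLE (0x0002) | TASK_NOLOAD (0x0400), which together form TASK_IDLE, and the grace-period sequence numbers carry two low-order state bits. A sketch of the v4.19 helpers (from kernel/rcu/rcu.h) that decode them:

	#define RCU_SEQ_CTR_SHIFT	2
	#define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)

	/* Grace-period count, i.e., the sequence number sans state bits. */
	static inline unsigned long rcu_seq_ctr(unsigned long s)
	{
		return s >> RCU_SEQ_CTR_SHIFT;
	}

	/* Bottom bits: zero when idle, nonzero while a grace period runs. */
	static inline int rcu_seq_state(unsigned long s)
	{
		return s & RCU_SEQ_STATE_MASK;
	}

So ->gp_seq 21808192 versus ->gp_seq_needed 21808196 is a difference of 4, i.e., exactly one grace period (1 << RCU_SEQ_CTR_SHIFT) has been requested and has not started.
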
> > >
> > > But this is exactly the situation that rcu_check_gp_start_stall() is designed to warn about (and does warn about for me when I comment out the wakeup code). So why is rcu_check_gp_start_stall() not being called? Here are a couple of possibilities:
> > >
> > > 1. Because rcu_check_gp_start_stall() is only ever invoked from
> > > RCU_SOFTIRQ, it is possible that softirqs are stalled for
> > > whatever reason.
> > >
> > > 2. Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
> > > interrupt handler, it is possible that the scheduler tick has
> > > somehow been disabled. Traces from earlier runs showed a great
> > > deal of RCU callbacks queued, which would have caused RCU to
> > > refuse to allow the scheduler tick to be disabled, even if the
> > > corresponding CPU was idle.
> > >
> > > 3. You have CONFIG_FAST_NO_HZ=y (which you probably do, given
> > > that you are building for a battery-powered device) and all of the
> > > CPU's callbacks are lazy. Except that your earlier traces showed
> > > lots of non-lazy callbacks. Besides, even if all callbacks were
> > > lazy, there would still be a scheduling-clock interrupt every
> > > six seconds, and there are quite a few six-second intervals
> > > in a two-minute watchdog timeout.
> > >
> > > But if we cannot find the problem quickly, I will likely ask
> > > you to try reproducing with CONFIG_FAST_NO_HZ=n. This could
> > > be thought of as bisecting the RCU code looking for the bug.
> > >
> > > The first two of these seem unlikely given that the watchdog timer was still firing. Still, I don't see how 300,000 jiffies elapsed with a grace period requested and not started otherwise. Could you please check?
> > > One way to do so would be to enable ftrace on rcu_check_callbacks(), __rcu_process_callbacks(), and rcu_check_gp_start_stall(). It might be necessary to no-inline rcu_check_gp_start_stall(). You might have better ways to collect this information.
> > >
> > > Without this information, the only workaround patch I can give you will degrade battery lifetime, which might not be what you want.
> > >
> > > You do have a lockdep complaint early at boot. Although I don't immediately see how this self-deadlock would affect RCU, please do get it fixed. Sometimes the consequences of this sort of deadlock can propagate to unexpected places.
> > >
> > > Regardless of why rcu_check_gp_start_stall() failed to complain, it looks like this was set after the rcu_preempt task slept for the last time, and so there should have been a wakeup the last time that ->gp_flags was set. Perhaps there is some code path that drops the wakeup.
> > > I did check this in current -rcu, but you are instead running v4.19, so I should also check there.
> > >
> > > The ->gp_flags has its RCU_GP_FLAG_INIT bit set in rcu_start_this_gp() and in rcu_gp_cleanup(). We can eliminate rcu_gp_cleanup() from consideration because only the rcu_preempt task will execute that code, and we know that this task was asleep at the last time this bit was set.
> > > Now rcu_start_this_gp() returns a flag indicating whether or not a wakeup is needed, and the caller must do the wakeup once it is safe to do so, that is, after the various rcu_node locks have been released (doing a wakeup while holding any of those locks results in deadlock).
> > >
> > > The following functions invoke rcu_start_this_gp: rcu_accelerate_cbs() and rcu_nocb_wait_gp(). We can eliminate rcu_nocb_wait_gp() because you are building with CONFIG_RCU_NOCB_CPU=n. Then rcu_accelerate_cbs() is invoked from:
> > >
> > > o rcu_accelerate_cbs_unlocked(), which does the following, thus
> > > properly awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > o rcu_advance_cbs(), which returns the value returned by
> > > rcu_accelerate_cbs(), thus pushing the problem off to its
> > > callers, which are called out below.
> > >
> > > o __note_gp_changes(), which also returns the value returned by
> > > rcu_accelerate_cbs(), thus pushing the problem off to its callers,
> > > which are called out below.
> > >
> > > o rcu_gp_cleanup(), which is only ever invoked by RCU grace-period
> > > kthreads such as the rcu_preempt task. Therefore, this function
> > > never needs to awaken the rcu_preempt task, because the fact
> > > that this function is executing means that this task is already
> > > awake. (Also, as noted above, we can eliminate this code from
> > > consideration because this task is known to have been sleeping
> > > at the last time that the RCU_GP_FLAG_INIT bit was set.)
> > >
> > > o rcu_report_qs_rdp(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > >
> > > rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
> > > /* ^^^ Released rnp->lock */
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > o rcu_prepare_for_idle(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > Now for rcu_advance_cbs():
> > >
> > > o __note_gp_changes(), which also returns the value returned
> > > by rcu_advance_cbs(), thus pushing the problem off to its callers,
> > > which are called out below.
> > >
> > > o rcu_migrate_callbacks(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
> > > rcu_advance_cbs(rsp, rnp_root, my_rdp);
> > > rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
> > > WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
> > > !rcu_segcblist_n_cbs(&my_rdp->cblist));
> > > raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > Now for __note_gp_changes():
> > >
> > > o note_gp_changes(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = __note_gp_changes(rsp, rnp, rdp);
> > > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > o rcu_gp_init() which is only ever invoked by RCU grace-period
> > > kthreads such as the rcu_preempt task, which makes wakeups
> > > unnecessary, just as for rcu_gp_cleanup() above.
> > >
> > > o rcu_gp_cleanup(), ditto.
> > >
> > > So I am not seeing how I am losing a wakeup, but please do feel free to double-check my analysis. One way to do that is using event tracing.
> > >
> > > Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > > lockdep complaint:
> > > ------------------------------------------------------------------------
> > >
> > > [ 2.895507] ======================================================
> > > [ 2.895511] WARNING: possible circular locking dependency detected
> > > [ 2.895517] 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1 Tainted: G U
> > > [ 2.895521] ------------------------------------------------------
> > > [ 2.895525] earlyEvs/1839 is trying to acquire lock:
> > > [ 2.895530] 00000000ff344115 (&asd->mutex){+.+.}, at: ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895546]
> > > [ 2.895546] but task is already holding lock:
> > > [ 2.895550] 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> > > [ 2.895561]
> > > [ 2.895561] which lock already depends on the new lock.
> > > [ 2.895561]
> > > [ 2.895566]
> > > [ 2.895566] the existing dependency chain (in reverse order) is:
> > > [ 2.895570]
> > > [ 2.895570] -> #1 (&mdev->graph_mutex){+.+.}:
> > > [ 2.895583] __mutex_lock+0x80/0x9a0
> > > [ 2.895588] mutex_lock_nested+0x1b/0x20
> > > [ 2.895593] media_device_register_entity+0x92/0x1e0
> > > [ 2.895598] v4l2_device_register_subdev+0xc2/0x1b0
> > > [ 2.895604] ipu_isys_csi2_init+0x22c/0x520
> > > [ 2.895608] isys_probe+0x6cb/0xed0
> > > [ 2.895613] ipu_bus_probe+0xfd/0x2e0
> > > [ 2.895620] really_probe+0x268/0x3d0
> > > [ 2.895625] driver_probe_device+0x11a/0x130
> > > [ 2.895630] __device_attach_driver+0x86/0x100
> > > [ 2.895635] bus_for_each_drv+0x6e/0xb0
> > > [ 2.895640] __device_attach+0xdf/0x160
> > > [ 2.895645] device_initial_probe+0x13/0x20
> > > [ 2.895650] bus_probe_device+0xa6/0xc0
> > > [ 2.895655] deferred_probe_work_func+0x88/0xe0
> > > [ 2.895661] process_one_work+0x220/0x5c0
> > > [ 2.895665] worker_thread+0x1da/0x3b0
> > > [ 2.895670] kthread+0x12c/0x150
> > > [ 2.895675] ret_from_fork+0x3a/0x50
> > > [ 2.895678]
> > > [ 2.895678] -> #0 (&asd->mutex){+.+.}:
> > > [ 2.895688] lock_acquire+0x95/0x1a0
> > > [ 2.895693] __mutex_lock+0x80/0x9a0
> > > [ 2.895698] mutex_lock_nested+0x1b/0x20
> > > [ 2.895703] ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895708] ipu_isys_csi2_get_fmt+0x14/0x30
> > > [ 2.895713] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > > [ 2.895718] v4l2_subdev_link_validate_one+0x67/0x120
> > > [ 2.895723] v4l2_subdev_link_validate+0x246/0x490
> > > [ 2.895728] csi2_link_validate+0xc6/0x220
> > > [ 2.895733] __media_pipeline_start+0x15b/0x2f0
> > > [ 2.895738] media_pipeline_start+0x33/0x50
> > > [ 2.895743] ipu_isys_video_prepare_streaming+0x1e0/0x610
> > > [ 2.895748] start_streaming+0x186/0x3a0
> > > [ 2.895753] vb2_start_streaming+0x6d/0x130
> > > [ 2.895758] vb2_core_streamon+0x108/0x140
> > > [ 2.895762] vb2_streamon+0x29/0x50
> > > [ 2.895767] vb2_ioctl_streamon+0x42/0x50
> > > [ 2.895772] v4l_streamon+0x20/0x30
> > > [ 2.895776] __video_do_ioctl+0x1af/0x3c0
> > > [ 2.895781] video_usercopy+0x27e/0x7e0
> > > [ 2.895785] video_ioctl2+0x15/0x20
> > > [ 2.895789] v4l2_ioctl+0x49/0x50
> > > [ 2.895794] do_video_ioctl+0x93c/0x2360
> > > [ 2.895799] v4l2_compat_ioctl32+0x93/0xe0
> > > [ 2.895806] __ia32_compat_sys_ioctl+0x73a/0x1c90
> > > [ 2.895813] do_fast_syscall_32+0x9a/0x2d6
> > > [ 2.895818] entry_SYSENTER_compat+0x6d/0x7c
> > > [ 2.895821]
> > > [ 2.895821] other info that might help us debug this:
> > > [ 2.895821]
> > > [ 2.895826] Possible unsafe locking scenario:
> > > [ 2.895826]
> > > [ 2.895830] CPU0 CPU1
> > > [ 2.895833] ---- ----
> > > [ 2.895836] lock(&mdev->graph_mutex);
> > > [ 2.895842] lock(&asd->mutex);
> > > [ 2.895847] lock(&mdev->graph_mutex);
> > > [ 2.895852] lock(&asd->mutex);
> > > [ 2.895857]
> > > [ 2.895857] *** DEADLOCK ***
> > > [ 2.895857]
> > > [ 2.895863] 3 locks held by earlyEvs/1839:
> > > [ 2.895866] #0: 00000000ed860090 (&av->mutex){+.+.}, at: __video_do_ioctl+0xbf/0x3c0
> > > [ 2.895876] #1: 000000000cb253e7 (&isys->stream_mutex){+.+.}, at: start_streaming+0x5c/0x3a0
> > > [ 2.895886] #2: 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> > > [ 2.895896]
> > > [ 2.895896] stack backtrace:
> > > [ 2.895903] CPU: 0 PID: 1839 Comm: earlyEvs Tainted: G U 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1
> > > [ 2.895907] Call Trace:
> > > [ 2.895915] dump_stack+0x70/0xa5
> > > [ 2.895921] print_circular_bug.isra.35+0x1d8/0x1e6
> > > [ 2.895927] __lock_acquire+0x1284/0x1340
> > > [ 2.895931] ? __lock_acquire+0x2b5/0x1340
> > > [ 2.895940] lock_acquire+0x95/0x1a0
> > > [ 2.895945] ? lock_acquire+0x95/0x1a0
> > > [ 2.895950] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895956] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895961] __mutex_lock+0x80/0x9a0
> > > [ 2.895966] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895971] ? crlmodule_get_format+0x43/0x50
> > > [ 2.895979] mutex_lock_nested+0x1b/0x20
> > > [ 2.895984] ? mutex_lock_nested+0x1b/0x20
> > > [ 2.895989] ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895995] ipu_isys_csi2_get_fmt+0x14/0x30
> > > [ 2.896001] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > > [ 2.896006] v4l2_subdev_link_validate_one+0x67/0x120
> > > [ 2.896011] ? crlmodule_get_format+0x2a/0x50
> > > [ 2.896018] ? find_held_lock+0x35/0xa0
> > > [ 2.896023] ? crlmodule_get_format+0x43/0x50
> > > [ 2.896030] v4l2_subdev_link_validate+0x246/0x490
> > > [ 2.896035] ? __mutex_unlock_slowpath+0x58/0x2f0
> > > [ 2.896042] ? mutex_unlock+0x12/0x20
> > > [ 2.896046] ? crlmodule_get_format+0x43/0x50
> > > [ 2.896052] ? v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > > [ 2.896057] ? v4l2_subdev_link_validate_one+0x67/0x120
> > > [ 2.896065] ? __is_insn_slot_addr+0xad/0x120
> > > [ 2.896070] ? kernel_text_address+0xc4/0x100
> > > [ 2.896078] ? v4l2_subdev_link_validate+0x246/0x490
> > > [ 2.896085] ? kernel_text_address+0xc4/0x100
> > > [ 2.896092] ? __lock_acquire+0x1106/0x1340
> > > [ 2.896096] ? __lock_acquire+0x1169/0x1340
> > > [ 2.896103] csi2_link_validate+0xc6/0x220
> > > [ 2.896110] ? __lock_is_held+0x5a/0xa0
> > > [ 2.896115] ? mark_held_locks+0x58/0x80
> > > [ 2.896122] ? __kmalloc+0x207/0x2e0
> > > [ 2.896127] ? __lock_is_held+0x5a/0xa0
> > > [ 2.896134] ? rcu_read_lock_sched_held+0x81/0x90
> > > [ 2.896139] ? __kmalloc+0x2a3/0x2e0
> > > [ 2.896144] ? media_pipeline_start+0x28/0x50
> > > [ 2.896150] ? __media_entity_enum_init+0x33/0x70
> > > [ 2.896155] ? csi2_has_route+0x18/0x20
> > > [ 2.896160] ? media_graph_walk_next.part.9+0xac/0x290
> > > [ 2.896166] __media_pipeline_start+0x15b/0x2f0
> > > [ 2.896173] ? rcu_read_lock_sched_held+0x81/0x90
> > > [ 2.896179] media_pipeline_start+0x33/0x50
> > > [ 2.896186] ipu_isys_video_prepare_streaming+0x1e0/0x610
> > > [ 2.896191] ? __lock_acquire+0x132e/0x1340
> > > [ 2.896198] ? __lock_acquire+0x2b5/0x1340
> > > [ 2.896204] ? lock_acquire+0x95/0x1a0
> > > [ 2.896209] ? start_streaming+0x5c/0x3a0
> > > [ 2.896215] ? start_streaming+0x5c/0x3a0
> > > [ 2.896221] ? __mutex_lock+0x391/0x9a0
> > > [ 2.896226] ? v4l_enable_media_source+0x2d/0x70
> > > [ 2.896233] ? find_held_lock+0x35/0xa0
> > > [ 2.896238] ? v4l_enable_media_source+0x57/0x70
> > > [ 2.896245] start_streaming+0x186/0x3a0
> > > [ 2.896250] ? __mutex_unlock_slowpath+0x58/0x2f0
> > > [ 2.896257] vb2_start_streaming+0x6d/0x130
> > > [ 2.896262] ? vb2_start_streaming+0x6d/0x130
> > > [ 2.896267] vb2_core_streamon+0x108/0x140
> > > [ 2.896273] vb2_streamon+0x29/0x50
> > > [ 2.896278] vb2_ioctl_streamon+0x42/0x50
> > > [ 2.896284] v4l_streamon+0x20/0x30
> > > [ 2.896288] __video_do_ioctl+0x1af/0x3c0
> > > [ 2.896296] ? __might_fault+0x85/0x90
> > > [ 2.896302] video_usercopy+0x27e/0x7e0
> > > [ 2.896307] ? copy_overflow+0x20/0x20
> > > [ 2.896313] ? find_held_lock+0x35/0xa0
> > > [ 2.896319] ? __might_fault+0x3e/0x90
> > > [ 2.896325] video_ioctl2+0x15/0x20
> > > [ 2.896330] v4l2_ioctl+0x49/0x50
> > > [ 2.896335] do_video_ioctl+0x93c/0x2360
> > > [ 2.896343] v4l2_compat_ioctl32+0x93/0xe0
> > > [ 2.896349] __ia32_compat_sys_ioctl+0x73a/0x1c90
> > > [ 2.896354] ? lockdep_hardirqs_on+0xef/0x180
> > > [ 2.896359] ? do_fast_syscall_32+0x3b/0x2d6
> > > [ 2.896364] do_fast_syscall_32+0x9a/0x2d6
> > > [ 2.896370] entry_SYSENTER_compat+0x6d/0x7c
> > > [ 2.896377] RIP: 0023:0xf7e79b79
> > > [ 2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
> > > [ 2.896387] RSP: 002b:00000000f76816bc EFLAGS: 00000292 ORIG_RAX: 0000000000000036
> > > [ 2.896393] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 0000000040045612
> > > [ 2.896396] RDX: 00000000f768172c RSI: 00000000f7d42d9c RDI: 00000000f768172c
> > > [ 2.896400] RBP: 00000000f7681708 R08: 0000000000000000 R09: 0000000000000000
> > > [ 2.896404] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > [ 2.896408] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > >
> > > ------------------------------------------------------------------------
> > >
> > > > [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 21808196
> > > > [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 ->gp_wake_seq 1357 ->gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
> > > > [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 4312486108 ->gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 ->gp_flags 0x0
> > > >
> > > > -----Original Message-----
> > > > From: Paul E. McKenney <[email protected]>
> > > > Sent: Thursday, December 13, 2018 12:40 PM
> > > > To: Zhang, Jun <[email protected]>
> > > > Cc: He, Bo <[email protected]>; Steven Rostedt <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected]; Xiao, Jin
> > > > <[email protected]>; Zhang, Yanmin <[email protected]>; Bai,
> > > > Jie A <[email protected]>; Sun, Yi J <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> > > > > Ok, we will test it, thanks!
> > > >
> > > > But please also try the sysrq-y with the earlier patch after a hang!
> > > >
> > > > Thanx, Paul
> > > >
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > > Sent: Thursday, December 13, 2018 10:43
> > > > > To: Zhang, Jun <[email protected]>
> > > > > Cc: He, Bo <[email protected]>; Steven Rostedt
> > > > > <[email protected]>; [email protected];
> > > > > [email protected]; [email protected];
> > > > > [email protected]; Xiao, Jin <[email protected]>; Zhang,
> > > > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>;
> > > > > Sun, Yi J <[email protected]>
> > > > > Subject: Re: rcu_preempt caused oom
> > > > >
> > > > > On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > > > > > Hello, Paul
> > > > > >
> > > > > > I think the next patch is better.
> > > > > > Because ULONG_CMP_GE could cause a double write, which carries the risk of writing back an old value.
> > > > > > Please help review.
> > > > > > I have not tested it. If you agree, we will test it.
> > > > >
> > > > > Just to make sure that I understand, you are worried about something like the following, correct?
> > > > >
> > > > > o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> > > > > and finds them equal.
> > > > >
> > > > > o At just this time something like rcu_start_this_gp() assigns a new
> > > > > (larger) value to rdp->gp_seq_needed.
> > > > >
> > > > > o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> > > > > old value.
> > > > >
> > > > > This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
> > > > >
> > > > > Of course, there could be bugs. For example:
> > > > >
> > > > > o __note_gp_changes() might be called on a different CPU than that
> > > > > corresponding to rdp. You can check this with something like:
> > > > >
> > > > > WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> > > > >
> > > > > o The same things could happen with rcu_start_this_gp(), and the
> > > > > above WARN_ON_ONCE() would work there as well.
> > > > >
> > > > > o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> > > > > you are doing CPU-hotplug operations. (It can run on a CPU other
> > > > > than rdp->cpu, but only at times when rdp->cpu is offline.)
> > > > >
> > > > > o Interrupts might not really be disabled.
> > > > >
> > > > > That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
> > > > >
> > > > > So why not just test it anyway? If it makes the bug go away, I
> > > > > will be surprised, but it would not be the first surprise for me.
> > > > > ;-)
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 0b760c1..c00f34e 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > > > > >  		zero_cpu_stall_ticks(rdp);
> > > > > >  	}
> > > > > >  	rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > > > > > -	if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > > > > > +	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > > > > >  		rdp->gp_seq_needed = rnp->gp_seq_needed;
> > > > > >  	WRITE_ONCE(rdp->gpwrap, false);
> > > > > >  	rcu_gpnum_ovf(rnp, rdp);
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > > > Sent: Thursday, December 13, 2018 08:12
> > > > > > To: He, Bo <[email protected]>
> > > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > > [email protected]; [email protected];
> > > > > > [email protected]; [email protected]; Zhang,
> > > > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > > > <[email protected]>; Sun, Yi J <[email protected]>
> > > > > > Subject: Re: rcu_preempt caused oom
> > > > > >
> > > > > > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > > > > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6 and do not see sysrq_rcu there either.
> > > > > > > Please correct me if I have something wrong.
> > > > > >
> > > > > > That would be because I sent you the wrong patch, apologies!
> > > > > > :-/
> > > > > >
> > > > > > Please instead see the one below, which does add sysrq_rcu.
> > > > > >
> > > > > > Thanx, Paul
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Paul E. McKenney <[email protected]>
> > > > > > > Sent: Thursday, December 13, 2018 5:03 AM
> > > > > > > To: He, Bo <[email protected]>
> > > > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > > > [email protected]; [email protected];
> > > > > > > [email protected]; [email protected]; Zhang,
> > > > > > > Jun <[email protected]>; Xiao, Jin <[email protected]>;
> > > > > > > Zhang, Yanmin <[email protected]>; Bai, Jie A
> > > > > > > <[email protected]>
> > > > > > > Subject: Re: rcu_preempt caused oom
> > > > > > >
> > > > > > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > > > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > > > > > We reproduced on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > > > > > I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as it was extracted from /proc/config.gz.
> > > > > > > >
> > > > > > > > Strange.
> > > > > > > >
> > > > > > > > Are the systems responsive to sysrq keys once failure occurs?
> > > > > > > > If so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > > > > > >
> > > > > > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > > > > > Author: Paul E. McKenney <[email protected]>
> > > > > > Date: Wed Dec 12 16:10:09 2018 -0800
> > > > > >
> > > > > > rcu: Add sysrq rcu_node-dump capability
> > > > > >
> > > > > > Backported from v4.21/v5.0
> > > > > >
> > > > > > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > > > > > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > > > > > for failing to start a grace period. This commit therefore adds a
> > > > > > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > > > > > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > > > > > must be set for this sysrq to be available.
> > > > > >
> > > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 0b760c1369f7..e9392a9d6291 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -61,6 +61,7 @@
> > > > > >  #include <linux/trace_events.h>
> > > > > >  #include <linux/suspend.h>
> > > > > >  #include <linux/ftrace.h>
> > > > > > +#include <linux/sysrq.h>
> > > > > >
> > > > > >  #include "tree.h"
> > > > > >  #include "rcu.h"
> > > > > > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> > > > > >  int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> > > > > >  /* panic() on RCU Stall sysctl. */
> > > > > >  int sysctl_panic_on_rcu_stall __read_mostly;
> > > > > > +/* Commandeer a sysrq key to dump RCU's tree. */
> > > > > > +static bool sysrq_rcu;
> > > > > > +module_param(sysrq_rcu, bool, 0444);
> > > > > >
> > > > > >  /*
> > > > > >   * The rcu_scheduler_active variable is initialized to the value
> > > > > > @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> > > > > >
> > > > > > +/* Dump grace-period-request information due to commandeered sysrq. */
> > > > > > +static void sysrq_show_rcu(int key)
> > > > > > +{
> > > > > > +	show_rcu_gp_kthreads();
> > > > > > +}
> > > > > > +
> > > > > > +static struct sysrq_key_op sysrq_rcudump_op = {
> > > > > > +	.handler = sysrq_show_rcu,
> > > > > > +	.help_msg = "show-rcu(y)",
> > > > > > +	.action_msg = "Show RCU tree",
> > > > > > +	.enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > > +};
> > > > > > +
> > > > > > +static int __init rcu_sysrq_init(void)
> > > > > > +{
> > > > > > +	if (sysrq_rcu)
> > > > > > +		return register_sysrq_key('y', &sysrq_rcudump_op);
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +early_initcall(rcu_sysrq_init);
> > > > > > +
> > > > > >  /*
> > > > > >   * Send along grace-period-related data for rcutorture diagnostics.
> > > > > >   */
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>


2018-12-17 04:05:37

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

To double-confirm that the issue stops reproducing (it had not reproduced after 90 hours), we tried adding only the enclosed patch on top of the build that reproduces easily; the issue did not reproduce for 63 hours over the whole weekend on 16 boards.
So the current conclusion is that the debug patch has a decisive effect on the rcu issue.

For comparison: swait_event_idle_timeout_exclusive() checks the condition three times, while swait_event_idle_exclusive() checks it only twice.
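(Roughly, and glossing over the swait-queue bookkeeping, the two macros differ as sketched below; this is a simplification of the v4.19 include/linux/swait.h expansions, not the literal code:)

	/* swait_event_idle_exclusive(wq, cond): check, sleep, re-check. */
	while (!cond)
		schedule();

	/*
	 * swait_event_idle_timeout_exclusive(wq, cond, timeout): as above,
	 * but cond is also re-evaluated once more when the timeout expires;
	 * returns 0 (timed out, cond false), 1 (timed out, cond true), or
	 * the remaining jiffies (woken with cond true before the timeout).
	 */
	while (!cond && timeout)
		timeout = schedule_timeout(timeout);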

So today I will run another experiment with only the change below:
-	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-				   RCU_GP_FLAG_INIT);
+	ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+						 RCU_GP_FLAG_INIT, MAX_SCHEDULE_TIMEOUT);
+

Can you get any clues from this experiment?

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Friday, December 14, 2018 1:39 PM
To: He, Bo <[email protected]>
Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>
Subject: Re: rcu_preempt caused oom

On Thu, Dec 13, 2018 at 09:10:12PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 14, 2018 at 02:40:50AM +0000, He, Bo wrote:
> > We have done another experiment with the enclosed debug patch, also with more rcu trace events enabled but without the CONFIG_RCU_BOOST config; we have not reproduced the issue for more than 90 hours so far on 10 boards (per previous experience the issue should reproduce within one night).
>
> That certainly supports the hypothesis that a wakeup is either not
> being sent or is being lost. Your patch is great for debugging (thank
> you!), but the real solution of course needs to avoid the extra
> wakeups, especially on battery-powered systems.
>
> One suggested change below, to get rid of potential false positives.
>
> > The purpose is to capture more rcu event traces close to when the issue happens. Because I see that __wait_rcu_gp is not always running, we think that even when it triggers the panic on the 3s timeout, the issue had already happened before those 3s.
>
> Agreed, it would be really good to have trace information from the cause.
> In the case you sent yesterday, it would be good to have trace
> information from 308.256 seconds prior to the sysrq-v, for example, by
> collecting the same event traces you did a few days ago. It would
> also be good to know whether the scheduler tick is providing
> interrupts, and if so, why
> rcu_check_gp_start_stall() isn't being invoked. ;-)
>
> If collecting this information with your setup is not feasible (for
> example, you might need a large trace buffer to capture five minutes
> of traces), please let me know and I can provide additional debug
> code. Or you could add "rcu_ftrace_dump(DUMP_ALL);" just before the
> "show_rcu_gp_kthreads();" in your patch below.
>
> > And actually rsp->gp_flags = 1, but RCU_GP_WAIT_GPS(1) ->state: 0x402 means the kthread has not been scheduled for 300s even though RCU_GP_FLAG_INIT is set. What are your ideas?
>
> The most likely possibility is that my analysis below is confused and
> there really is some way that the code can set the RCU_GP_FLAG_INIT
> bit without later doing a wakeup. The trace data above could help
> unconfuse me.
>
> Thanx, Paul
>
> > ---------------------------------------------------------------------------------------------------------------------------------
> > -	swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > -				   RCU_GP_FLAG_INIT);
> > +	if (current->pid != rcu_preempt_pid) {
> > +		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > +					   RCU_GP_FLAG_INIT);
> > +	} else {
>
> wait_again:
>
> > +		ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> > +							 RCU_GP_FLAG_INIT, 2*HZ);
> > +
> > +		if (!ret) {
>
> This would avoid complaining if RCU was legitimately idle for a long time:

Let's try this again. Unless I am confused (quite possible), your original would panic if RCU was idle for more than two seconds. What we instead want is to panic if we time out but end up with RCU_GP_FLAG_INIT set.

So something like this:

	if (ret == 1) {
		/* Timed out with RCU_GP_FLAG_INIT. */
		rcu_ftrace_dump(DUMP_ALL);
		show_rcu_gp_kthreads();
		panic("hung_task: blocked in rcu_gp_kthread init");
	} else if (!ret) {
		/* Timed out w/out RCU_GP_FLAG_INIT. */
		goto wait_again;
	}

Thanx, Paul

> > +			show_rcu_gp_kthreads();
> > +			panic("hung_task: blocked in rcu_gp_kthread init");
> > +		}
> > +	}
> > ------------------------------------------------------------------------
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Friday, December 14, 2018 10:15 AM
> > To: He, Bo <[email protected]>
> > Cc: Zhang, Jun <[email protected]>; Steven Rostedt
> > <[email protected]>; [email protected];
> > [email protected]; [email protected];
> > [email protected]; Xiao, Jin <[email protected]>; Zhang,
> > Yanmin <[email protected]>; Bai, Jie A <[email protected]>;
> > Sun, Yi J <[email protected]>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Fri, Dec 14, 2018 at 01:30:04AM +0000, He, Bo wrote:
> > > As you mentioned CONFIG_FAST_NO_HZ, do you mean CONFIG_RCU_FAST_NO_HZ? I double-checked and there is no FAST_NO_HZ in .config:
> >
> > Yes, you are correct, CONFIG_RCU_FAST_NO_HZ. OK, you do not have it
> > set, which means several code paths can be ignored. Also
> > CONFIG_HZ=1000, so
> > 300 second delay.
> >
> > Thanx, Paul
> >
> > > Here is the grep from .config:
> > > egrep "HZ|RCU" .config
> > > CONFIG_NO_HZ_COMMON=y
> > > # CONFIG_HZ_PERIODIC is not set
> > > CONFIG_NO_HZ_IDLE=y
> > > # CONFIG_NO_HZ_FULL is not set
> > > CONFIG_NO_HZ=y
> > > # RCU Subsystem
> > > CONFIG_PREEMPT_RCU=y
> > > # CONFIG_RCU_EXPERT is not set
> > > CONFIG_SRCU=y
> > > CONFIG_TREE_SRCU=y
> > > CONFIG_TASKS_RCU=y
> > > CONFIG_RCU_STALL_COMMON=y
> > > CONFIG_RCU_NEED_SEGCBLIST=y
> > > # CONFIG_HZ_100 is not set
> > > # CONFIG_HZ_250 is not set
> > > # CONFIG_HZ_300 is not set
> > > CONFIG_HZ_1000=y
> > > CONFIG_HZ=1000
> > > # CONFIG_MACHZ_WDT is not set
> > > # RCU Debugging
> > > CONFIG_PROVE_RCU=y
> > > CONFIG_RCU_PERF_TEST=m
> > > CONFIG_RCU_TORTURE_TEST=m
> > > CONFIG_RCU_CPU_STALL_TIMEOUT=7
> > > CONFIG_RCU_TRACE=y
> > > CONFIG_RCU_EQS_DEBUG=y
> > >
> > > -----Original Message-----
> > > From: Paul E. McKenney <[email protected]>
> > > Sent: Friday, December 14, 2018 2:12 AM
> > > To: He, Bo <[email protected]>
> > > Cc: Zhang, Jun <[email protected]>; Steven Rostedt
> > > <[email protected]>; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; Xiao, Jin <[email protected]>; Zhang,
> > > Yanmin <[email protected]>; Bai, Jie A <[email protected]>;
> > > Sun, Yi J <[email protected]>
> > > Subject: Re: rcu_preempt caused oom
> > >
> > > On Thu, Dec 13, 2018 at 03:26:08PM +0000, He, Bo wrote:
> > > > One of the boards reproduced the issue with show_rcu_gp_kthreads(); I also enclosed the logs as an attachment.
> > > >
> > > > [17818.936032] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258 ->gp_wake_seq 21808189 ->gp_seq 21808192 ->gp_seq_needed 21808196 ->gp_flags 0x1
> > >
> > > This is quite helpful, thank you!
> > >
> > > The "RCU lockdep checking is enabled" says that CONFIG_PROVE_RCU=y, which is good. The "RCU_GP_WAIT_GPS(1)" means that the rcu_preempt task is waiting for a new grace-period request. The "->state: 0x402" means that it is sleeping, neither running nor in the process of waking up.
> > > The "delta ->gp_activity 308257 ->gp_req_activity 308256 ->gp_wake_time 308258" means that it has been more than 300,000 jiffies since the rcu_preempt task did anything or was requested to do anything.
> > >
> > > The "->gp_wake_seq 21808189 ->gp_seq 21808192" says that the last attempt to awaken the rcu_preempt task happened during the last grace period.
> > > The "->gp_seq_needed 21808196 ->gp_flags 0x1" nevertheless says that someone requested a new grace period. So if the rcu_preempt task were to wake up, it would process the new grace period. Note again also the ->gp_req_activity 308256, which indicates that ->gp_flags was set more than 300,000 jiffies ago, just after the last recorded activity of the rcu_preempt task.
> > >
> > > But this is exactly the situation that rcu_check_gp_start_stall() is designed to warn about (and does warn about for me when I comment out the wakeup code). So why is rcu_check_gp_start_stall() not being called? Here are a couple of possibilities:
> > >
> > > 1. Because rcu_check_gp_start_stall() is only ever invoked from
> > > RCU_SOFTIRQ, it is possible that softirqs are stalled for
> > > whatever reason.
> > >
> > > 2. Because RCU_SOFTIRQ is invoked primarily from the scheduler-clock
> > > interrupt handler, it is possible that the scheduler tick has
> > > somehow been disabled. Traces from earlier runs showed a great
> > > deal of RCU callbacks queued, which would have caused RCU to
> > > refuse to allow the scheduler tick to be disabled, even if the
> > > corresponding CPU was idle.
> > >
> > > 3. You have CONFIG_FAST_NO_HZ=y (which you probably do, given
> > > that you are building for a battery-powered device) and all of the
> > > CPU's callbacks are lazy. Except that your earlier traces showed
> > > lots of non-lazy callbacks. Besides, even if all callbacks were
> > > lazy, there would still be a scheduling-clock interrupt every
> > > six seconds, and there are quite a few six-second intervals
> > > in a two-minute watchdog timeout.
> > >
> > > But if we cannot find the problem quickly, I will likely ask
> > > you to try reproducing with CONFIG_FAST_NO_HZ=n. This could
> > > be thought of as bisecting the RCU code looking for the bug.
> > >
> > > The first two of these seem unlikely given that the watchdog timer was still firing. Still, I don't see how 300,000 jiffies elapsed with a grace period requested and not started otherwise. Could you please check?
> > > One way to do so would be to enable ftrace on rcu_check_callbacks(), __rcu_process_callbacks(), and rcu_check_gp_start_stall(). It might be necessary to no-inline rcu_check_gp_start_stall(). You might have better ways to collect this information.
> > >
> > > Without this information, the only workaround patch I can give you will degrade battery lifetime, which might not be what you want.
> > >
> > > You do have a lockdep complaint early at boot. Although I don't immediately see how this self-deadlock would affect RCU, please do get it fixed. Sometimes the consequences of this sort of deadlock can propagate to unexpected places.
> > >
> > > Regardless of why rcu_check_gp_start_stall() failed to complain, it looks like this was set after the rcu_preempt task slept for the last time, and so there should have been a wakeup the last time that ->gp_flags was set. Perhaps there is some code path that drops the wakeup.
> > > I did check this in current -rcu, but you are instead running v4.19, so I should also check there.
> > >
> > > The ->gp_flags has its RCU_GP_FLAG_INIT bit set in rcu_start_this_gp() and in rcu_gp_cleanup(). We can eliminate rcu_gp_cleanup() from consideration because only the rcu_preempt task will execute that code, and we know that this task was asleep at the last time this bit was set.
> > > Now rcu_start_this_gp() returns a flag indicating whether or not a wakeup is needed, and the caller must do the wakeup once it is safe to do so, that is, after the various rcu_node locks have been released (doing a wakeup while holding any of those locks results in deadlock).
> > >
> > > The following functions invoke rcu_start_this_gp: rcu_accelerate_cbs() and rcu_nocb_wait_gp(). We can eliminate rcu_nocb_wait_gp() because you are building with CONFIG_RCU_NOCB_CPU=n. Then rcu_accelerate_cbs() is invoked from:
> > >
> > > o rcu_accelerate_cbs_unlocked(), which does the following, thus
> > > properly awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > o rcu_advance_cbs(), which returns the value returned by
> > > rcu_accelerate_cbs(), thus pushing the problem off to its
> > > callers, which are called out below.
> > >
> > > o __note_gp_changes(), which also returns the value returned by
> > > rcu_accelerate_cbs(), thus pushing the problem off to its callers,
> > > which are called out below.
> > >
> > > o rcu_gp_cleanup(), which is only ever invoked by RCU grace-period
> > > kthreads such as the rcu_preempt task. Therefore, this function
> > > never needs to awaken the rcu_preempt task, because the fact
> > > that this function is executing means that this task is already
> > > awake. (Also, as noted above, we can eliminate this code from
> > > consideration because this task is known to have been sleeping
> > > at the last time that the RCU_GP_FLAG_INIT bit was set.)
> > >
> > > o rcu_report_qs_rdp(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > >
> > > rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
> > > /* ^^^ Released rnp->lock */
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > o rcu_prepare_for_idle(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
> > > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > Now for rcu_advance_cbs():
> > >
> > > o __note_gp_changes(), which also returns the value returned
> > > by rcu_advance_cbs(), thus pushing the problem off to its callers,
> > > which are called out below.
> > >
> > > o rcu_migrate_callbacks(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = rcu_advance_cbs(rsp, rnp_root, rdp) ||
> > > rcu_advance_cbs(rsp, rnp_root, my_rdp);
> > > rcu_segcblist_merge(&my_rdp->cblist, &rdp->cblist);
> > > WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) !=
> > > !rcu_segcblist_n_cbs(&my_rdp->cblist));
> > > raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > Now for __note_gp_changes():
> > >
> > > o note_gp_changes(), which does the following, thus properly
> > > awakening the rcu_preempt task when needed:
> > >
> > > needwake = __note_gp_changes(rsp, rnp, rdp);
> > > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > if (needwake)
> > > rcu_gp_kthread_wake(rsp);
> > >
> > > o rcu_gp_init(), which is only ever invoked by RCU grace-period
> > > kthreads such as the rcu_preempt task, which makes wakeups
> > > unnecessary, just as for rcu_gp_cleanup() above.
> > >
> > > o rcu_gp_cleanup(), ditto.
> > >
> > > So I am not seeing how I am losing a wakeup, but please do feel free to double-check my analysis. One way to do that is using event tracing.
> > >
> > > Thanx, Paul
> > >
> > > ----------------------------------------------------------------------
> > > lockdep complaint:
> > > ----------------------------------------------------------------------
> > >
> > > [ 2.895507] ======================================================
> > > [ 2.895511] WARNING: possible circular locking dependency detected
> > > [ 2.895517] 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1 Tainted: G U
> > > [ 2.895521] ------------------------------------------------------
> > > [ 2.895525] earlyEvs/1839 is trying to acquire lock:
> > > [ 2.895530] 00000000ff344115 (&asd->mutex){+.+.}, at: ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895546]
> > > [ 2.895546] but task is already holding lock:
> > > [ 2.895550] 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> > > [ 2.895561]
> > > [ 2.895561] which lock already depends on the new lock.
> > > [ 2.895561]
> > > [ 2.895566]
> > > [ 2.895566] the existing dependency chain (in reverse order) is:
> > > [ 2.895570]
> > > [ 2.895570] -> #1 (&mdev->graph_mutex){+.+.}:
> > > [ 2.895583] __mutex_lock+0x80/0x9a0
> > > [ 2.895588] mutex_lock_nested+0x1b/0x20
> > > [ 2.895593] media_device_register_entity+0x92/0x1e0
> > > [ 2.895598] v4l2_device_register_subdev+0xc2/0x1b0
> > > [ 2.895604] ipu_isys_csi2_init+0x22c/0x520
> > > [ 2.895608] isys_probe+0x6cb/0xed0
> > > [ 2.895613] ipu_bus_probe+0xfd/0x2e0
> > > [ 2.895620] really_probe+0x268/0x3d0
> > > [ 2.895625] driver_probe_device+0x11a/0x130
> > > [ 2.895630] __device_attach_driver+0x86/0x100
> > > [ 2.895635] bus_for_each_drv+0x6e/0xb0
> > > [ 2.895640] __device_attach+0xdf/0x160
> > > [ 2.895645] device_initial_probe+0x13/0x20
> > > [ 2.895650] bus_probe_device+0xa6/0xc0
> > > [ 2.895655] deferred_probe_work_func+0x88/0xe0
> > > [ 2.895661] process_one_work+0x220/0x5c0
> > > [ 2.895665] worker_thread+0x1da/0x3b0
> > > [ 2.895670] kthread+0x12c/0x150
> > > [ 2.895675] ret_from_fork+0x3a/0x50
> > > [ 2.895678]
> > > [ 2.895678] -> #0 (&asd->mutex){+.+.}:
> > > [ 2.895688] lock_acquire+0x95/0x1a0
> > > [ 2.895693] __mutex_lock+0x80/0x9a0
> > > [ 2.895698] mutex_lock_nested+0x1b/0x20
> > > [ 2.895703] ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895708] ipu_isys_csi2_get_fmt+0x14/0x30
> > > [ 2.895713] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > > [ 2.895718] v4l2_subdev_link_validate_one+0x67/0x120
> > > [ 2.895723] v4l2_subdev_link_validate+0x246/0x490
> > > [ 2.895728] csi2_link_validate+0xc6/0x220
> > > [ 2.895733] __media_pipeline_start+0x15b/0x2f0
> > > [ 2.895738] media_pipeline_start+0x33/0x50
> > > [ 2.895743] ipu_isys_video_prepare_streaming+0x1e0/0x610
> > > [ 2.895748] start_streaming+0x186/0x3a0
> > > [ 2.895753] vb2_start_streaming+0x6d/0x130
> > > [ 2.895758] vb2_core_streamon+0x108/0x140
> > > [ 2.895762] vb2_streamon+0x29/0x50
> > > [ 2.895767] vb2_ioctl_streamon+0x42/0x50
> > > [ 2.895772] v4l_streamon+0x20/0x30
> > > [ 2.895776] __video_do_ioctl+0x1af/0x3c0
> > > [ 2.895781] video_usercopy+0x27e/0x7e0
> > > [ 2.895785] video_ioctl2+0x15/0x20
> > > [ 2.895789] v4l2_ioctl+0x49/0x50
> > > [ 2.895794] do_video_ioctl+0x93c/0x2360
> > > [ 2.895799] v4l2_compat_ioctl32+0x93/0xe0
> > > [ 2.895806] __ia32_compat_sys_ioctl+0x73a/0x1c90
> > > [ 2.895813] do_fast_syscall_32+0x9a/0x2d6
> > > [ 2.895818] entry_SYSENTER_compat+0x6d/0x7c
> > > [ 2.895821]
> > > [ 2.895821] other info that might help us debug this:
> > > [ 2.895821]
> > > [ 2.895826] Possible unsafe locking scenario:
> > > [ 2.895826]
> > > [ 2.895830] CPU0 CPU1
> > > [ 2.895833] ---- ----
> > > [ 2.895836] lock(&mdev->graph_mutex);
> > > [ 2.895842] lock(&asd->mutex);
> > > [ 2.895847] lock(&mdev->graph_mutex);
> > > [ 2.895852] lock(&asd->mutex);
> > > [ 2.895857]
> > > [ 2.895857] *** DEADLOCK ***
> > > [ 2.895857]
> > > [ 2.895863] 3 locks held by earlyEvs/1839:
> > > [ 2.895866] #0: 00000000ed860090 (&av->mutex){+.+.}, at: __video_do_ioctl+0xbf/0x3c0
> > > [ 2.895876] #1: 000000000cb253e7 (&isys->stream_mutex){+.+.}, at: start_streaming+0x5c/0x3a0
> > > [ 2.895886] #2: 0000000069562e72 (&mdev->graph_mutex){+.+.}, at: media_pipeline_start+0x28/0x50
> > > [ 2.895896]
> > > [ 2.895896] stack backtrace:
> > > [ 2.895903] CPU: 0 PID: 1839 Comm: earlyEvs Tainted: G U 4.19.5-quilt-2e5dc0ac-g4d59bbd0fd1a #1
> > > [ 2.895907] Call Trace:
> > > [ 2.895915] dump_stack+0x70/0xa5
> > > [ 2.895921] print_circular_bug.isra.35+0x1d8/0x1e6
> > > [ 2.895927] __lock_acquire+0x1284/0x1340
> > > [ 2.895931] ? __lock_acquire+0x2b5/0x1340
> > > [ 2.895940] lock_acquire+0x95/0x1a0
> > > [ 2.895945] ? lock_acquire+0x95/0x1a0
> > > [ 2.895950] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895956] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895961] __mutex_lock+0x80/0x9a0
> > > [ 2.895966] ? ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895971] ? crlmodule_get_format+0x43/0x50
> > > [ 2.895979] mutex_lock_nested+0x1b/0x20
> > > [ 2.895984] ? mutex_lock_nested+0x1b/0x20
> > > [ 2.895989] ipu_isys_subdev_get_ffmt+0x32/0x90
> > > [ 2.895995] ipu_isys_csi2_get_fmt+0x14/0x30
> > > [ 2.896001] v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > > [ 2.896006] v4l2_subdev_link_validate_one+0x67/0x120
> > > [ 2.896011] ? crlmodule_get_format+0x2a/0x50
> > > [ 2.896018] ? find_held_lock+0x35/0xa0
> > > [ 2.896023] ? crlmodule_get_format+0x43/0x50
> > > [ 2.896030] v4l2_subdev_link_validate+0x246/0x490
> > > [ 2.896035] ? __mutex_unlock_slowpath+0x58/0x2f0
> > > [ 2.896042] ? mutex_unlock+0x12/0x20
> > > [ 2.896046] ? crlmodule_get_format+0x43/0x50
> > > [ 2.896052] ? v4l2_subdev_link_validate_get_format.isra.6+0x52/0x80
> > > [ 2.896057] ? v4l2_subdev_link_validate_one+0x67/0x120
> > > [ 2.896065] ? __is_insn_slot_addr+0xad/0x120
> > > [ 2.896070] ? kernel_text_address+0xc4/0x100
> > > [ 2.896078] ? v4l2_subdev_link_validate+0x246/0x490
> > > [ 2.896085] ? kernel_text_address+0xc4/0x100
> > > [ 2.896092] ? __lock_acquire+0x1106/0x1340
> > > [ 2.896096] ? __lock_acquire+0x1169/0x1340
> > > [ 2.896103] csi2_link_validate+0xc6/0x220
> > > [ 2.896110] ? __lock_is_held+0x5a/0xa0
> > > [ 2.896115] ? mark_held_locks+0x58/0x80
> > > [ 2.896122] ? __kmalloc+0x207/0x2e0
> > > [ 2.896127] ? __lock_is_held+0x5a/0xa0
> > > [ 2.896134] ? rcu_read_lock_sched_held+0x81/0x90
> > > [ 2.896139] ? __kmalloc+0x2a3/0x2e0
> > > [ 2.896144] ? media_pipeline_start+0x28/0x50
> > > [ 2.896150] ? __media_entity_enum_init+0x33/0x70
> > > [ 2.896155] ? csi2_has_route+0x18/0x20
> > > [ 2.896160] ? media_graph_walk_next.part.9+0xac/0x290
> > > [ 2.896166] __media_pipeline_start+0x15b/0x2f0
> > > [ 2.896173] ? rcu_read_lock_sched_held+0x81/0x90
> > > [ 2.896179] media_pipeline_start+0x33/0x50
> > > [ 2.896186] ipu_isys_video_prepare_streaming+0x1e0/0x610
> > > [ 2.896191] ? __lock_acquire+0x132e/0x1340
> > > [ 2.896198] ? __lock_acquire+0x2b5/0x1340
> > > [ 2.896204] ? lock_acquire+0x95/0x1a0
> > > [ 2.896209] ? start_streaming+0x5c/0x3a0
> > > [ 2.896215] ? start_streaming+0x5c/0x3a0
> > > [ 2.896221] ? __mutex_lock+0x391/0x9a0
> > > [ 2.896226] ? v4l_enable_media_source+0x2d/0x70
> > > [ 2.896233] ? find_held_lock+0x35/0xa0
> > > [ 2.896238] ? v4l_enable_media_source+0x57/0x70
> > > [ 2.896245] start_streaming+0x186/0x3a0
> > > [ 2.896250] ? __mutex_unlock_slowpath+0x58/0x2f0
> > > [ 2.896257] vb2_start_streaming+0x6d/0x130
> > > [ 2.896262] ? vb2_start_streaming+0x6d/0x130
> > > [ 2.896267] vb2_core_streamon+0x108/0x140
> > > [ 2.896273] vb2_streamon+0x29/0x50
> > > [ 2.896278] vb2_ioctl_streamon+0x42/0x50
> > > [ 2.896284] v4l_streamon+0x20/0x30
> > > [ 2.896288] __video_do_ioctl+0x1af/0x3c0
> > > [ 2.896296] ? __might_fault+0x85/0x90
> > > [ 2.896302] video_usercopy+0x27e/0x7e0
> > > [ 2.896307] ? copy_overflow+0x20/0x20
> > > [ 2.896313] ? find_held_lock+0x35/0xa0
> > > [ 2.896319] ? __might_fault+0x3e/0x90
> > > [ 2.896325] video_ioctl2+0x15/0x20
> > > [ 2.896330] v4l2_ioctl+0x49/0x50
> > > [ 2.896335] do_video_ioctl+0x93c/0x2360
> > > [ 2.896343] v4l2_compat_ioctl32+0x93/0xe0
> > > [ 2.896349] __ia32_compat_sys_ioctl+0x73a/0x1c90
> > > [ 2.896354] ? lockdep_hardirqs_on+0xef/0x180
> > > [ 2.896359] ? do_fast_syscall_32+0x3b/0x2d6
> > > [ 2.896364] do_fast_syscall_32+0x9a/0x2d6
> > > [ 2.896370] entry_SYSENTER_compat+0x6d/0x7c
> > > [ 2.896377] RIP: 0023:0xf7e79b79
> > > [ 2.896382] Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
> > > [ 2.896387] RSP: 002b:00000000f76816bc EFLAGS: 00000292 ORIG_RAX: 0000000000000036
> > > [ 2.896393] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 0000000040045612
> > > [ 2.896396] RDX: 00000000f768172c RSI: 00000000f7d42d9c RDI: 00000000f768172c
> > > [ 2.896400] RBP: 00000000f7681708 R08: 0000000000000000 R09: 0000000000000000
> > > [ 2.896404] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > [ 2.896408] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > >
> > > ----------------------------------------------------------------------
> > >
> > > > [17818.936039] rcu: rcu_node 0:3 ->gp_seq 21808192 ->gp_seq_needed 21808196
> > > > [17818.936048] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 101730 ->gp_req_activity 101732 ->gp_wake_time 101730 ->gp_wake_seq 1357 ->gp_seq 1360 ->gp_seq_needed 1360 ->gp_flags 0x0
> > > > [17818.936056] rcu: rcu_bh: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x402 delta ->gp_activity 4312486108 ->gp_req_activity 4312486108 ->gp_wake_time 4312486108 ->gp_wake_seq 0 ->gp_seq -1200 ->gp_seq_needed -1200 ->gp_flags 0x0
> > > >
> > > > -----Original Message-----
> > > > From: Paul E. McKenney <[email protected]>
> > > > Sent: Thursday, December 13, 2018 12:40 PM
> > > > To: Zhang, Jun <[email protected]>
> > > > Cc: He, Bo <[email protected]>; Steven Rostedt
> > > > <[email protected]>; [email protected];
> > > > [email protected]; [email protected];
> > > > [email protected]; Xiao, Jin <[email protected]>; Zhang,
> > > > Yanmin <[email protected]>; Bai, Jie A
> > > > <[email protected]>; Sun, Yi J <[email protected]>
> > > > Subject: Re: rcu_preempt caused oom
> > > >
> > > > On Thu, Dec 13, 2018 at 03:28:46AM +0000, Zhang, Jun wrote:
> > > > > Ok, we will test it, thanks!
> > > >
> > > > But please also try the sysrq-y with the earlier patch after a hang!
> > > >
> > > > Thanx, Paul
> > > >
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > > Sent: Thursday, December 13, 2018 10:43
> > > > > To: Zhang, Jun <[email protected]>
> > > > > Cc: He, Bo <[email protected]>; Steven Rostedt
> > > > > <[email protected]>; [email protected];
> > > > > [email protected]; [email protected];
> > > > > [email protected]; Xiao, Jin <[email protected]>; Zhang,
> > > > > Yanmin <[email protected]>; Bai, Jie A
> > > > > <[email protected]>; Sun, Yi J <[email protected]>
> > > > > Subject: Re: rcu_preempt caused oom
> > > > >
> > > > > On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> > > > > > Hello, Paul
> > > > > >
> > > > > > I think the next patch is better.
> > > > > > Because ULONG_CMP_GE could cause double write, which has risk that write back old value.
> > > > > > Please help review.
> > > > > > I don't test it. If you agree, we will test it.
> > > > >
> > > > > Just to make sure that I understand, you are worried about something like the following, correct?
> > > > >
> > > > > o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
> > > > > and finds them equal.
> > > > >
> > > > > o At just this time something like rcu_start_this_gp() assigns a new
> > > > > (larger) value to rdp->gp_seq_needed.
> > > > >
> > > > > o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
> > > > > old value.
> > > > >
> > > > > This cannot happen because __note_gp_changes() runs with interrupts disabled on the CPU corresponding to the rcu_data structure referenced by the rdp pointer. So there is no way for rcu_start_this_gp() to be invoked on the same CPU during this "if" statement.
> > > > >
> > > > > Of course, there could be bugs. For example:
> > > > >
> > > > > o __note_gp_changes() might be called on a different CPU than that
> > > > > corresponding to rdp. You can check this with something like:
> > > > >
> > > > > WARN_ON_ONCE(rdp->cpu != smp_processor_id());
> > > > >
> > > > > o The same things could happen with rcu_start_this_gp(), and the
> > > > > above WARN_ON_ONCE() would work there as well.
> > > > >
> > > > > o rcutree_prepare_cpu() is a special case, but is irrelevant unless
> > > > > you are doing CPU-hotplug operations. (It can run on a CPU other
> > > > > than rdp->cpu, but only at times when rdp->cpu is offline.)
> > > > >
> > > > > o Interrupts might not really be disabled.
> > > > >
> > > > > That said, your patch could reduce overhead slightly, given that the two values will be equal much of the time. So it might be worth testing just for that reason.
> > > > >
> > > > > So why not just test it anyway? If it makes the bug go away,
> > > > > I will be surprised, but it would not be the first surprise for me.
> > > > > ;-)
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 0b760c1..c00f34e 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> > > > > >  		zero_cpu_stall_ticks(rdp);
> > > > > >  	}
> > > > > >  	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
> > > > > > -	if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> > > > > > +	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> > > > > >  		rdp->gp_seq_needed = rnp->gp_seq_needed;
> > > > > >  	WRITE_ONCE(rdp->gpwrap, false);
> > > > > >  	rcu_gpnum_ovf(rnp, rdp);
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Paul E. McKenney [mailto:[email protected]]
> > > > > > Sent: Thursday, December 13, 2018 08:12
> > > > > > To: He, Bo <[email protected]>
> > > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > > [email protected]; [email protected];
> > > > > > [email protected]; [email protected];
> > > > > > Zhang, Jun <[email protected]>; Xiao, Jin
> > > > > > <[email protected]>; Zhang, Yanmin
> > > > > > <[email protected]>; Bai, Jie A <[email protected]>;
> > > > > > Sun, Yi J <[email protected]>
> > > > > > Subject: Re: rcu_preempt caused oom
> > > > > >
> > > > > > On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > > > > > > I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel; I also checked the latest kernel and the latest tag v4.20-rc6 and do not see sysrq_rcu there either.
> > > > > > > Please correct me if I have something wrong.
> > > > > >
> > > > > > That would be because I sent you the wrong patch, apologies!
> > > > > > :-/
> > > > > >
> > > > > > Please instead see the one below, which does add sysrq_rcu.
> > > > > >
> > > > > > Thanx, Paul
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Paul E. McKenney <[email protected]>
> > > > > > > Sent: Thursday, December 13, 2018 5:03 AM
> > > > > > > To: He, Bo <[email protected]>
> > > > > > > Cc: Steven Rostedt <[email protected]>;
> > > > > > > [email protected]; [email protected];
> > > > > > > [email protected]; [email protected];
> > > > > > > Zhang, Jun <[email protected]>; Xiao, Jin
> > > > > > > <[email protected]>; Zhang, Yanmin
> > > > > > > <[email protected]>; Bai, Jie A <[email protected]>
> > > > > > > Subject: Re: rcu_preempt caused oom
> > > > > > >
> > > > > > > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > > > > > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > > > > > We reproduced it on two boards, but I still do not see the show_rcu_gp_kthreads() dump logs; it seems the patch can't catch the scenario.
> > > > > > > > I double-confirmed that CONFIG_PROVE_RCU=y is enabled in the config, as extracted from /proc/config.gz.
> > > > > > > >
> > > > > > > > Strange.
> > > > > > > >
> > > > > > > > Are the systems responsive to sysrq keys once failure occurs?
> > > > > > > > If so, I will provide you a sysrq-R or some such to dump out the RCU state.
> > > > > > >
> > > > > > > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> > > > > > Author: Paul E. McKenney <[email protected]>
> > > > > > Date: Wed Dec 12 16:10:09 2018 -0800
> > > > > >
> > > > > > rcu: Add sysrq rcu_node-dump capability
> > > > > >
> > > > > > Backported from v4.21/v5.0
> > > > > >
> > > > > > Life is hard if RCU manages to get stuck without triggering RCU CPU
> > > > > > stall warnings or triggering the rcu_check_gp_start_stall() checks
> > > > > > for failing to start a grace period. This commit therefore adds a
> > > > > > boot-time-selectable sysrq key (commandeering "y") that allows manually
> > > > > > dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> > > > > > must be set for this sysrq to be available.
> > > > > >
> > > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 0b760c1369f7..e9392a9d6291 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -61,6 +61,7 @@
> > > > > >  #include <linux/trace_events.h>
> > > > > >  #include <linux/suspend.h>
> > > > > >  #include <linux/ftrace.h>
> > > > > > +#include <linux/sysrq.h>
> > > > > > 
> > > > > >  #include "tree.h"
> > > > > >  #include "rcu.h"
> > > > > > @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
> > > > > >  int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> > > > > >  /* panic() on RCU Stall sysctl. */
> > > > > >  int sysctl_panic_on_rcu_stall __read_mostly;
> > > > > > +/* Commandeer a sysrq key to dump RCU's tree. */
> > > > > > +static bool sysrq_rcu;
> > > > > > +module_param(sysrq_rcu, bool, 0444);
> > > > > > 
> > > > > >  /*
> > > > > >   * The rcu_scheduler_active variable is initialized to the value
> > > > > > @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
> > > > > > 
> > > > > > +/* Dump grace-period-request information due to commandeered sysrq. */
> > > > > > +static void sysrq_show_rcu(int key)
> > > > > > +{
> > > > > > +	show_rcu_gp_kthreads();
> > > > > > +}
> > > > > > +
> > > > > > +static struct sysrq_key_op sysrq_rcudump_op = {
> > > > > > +	.handler = sysrq_show_rcu,
> > > > > > +	.help_msg = "show-rcu(y)",
> > > > > > +	.action_msg = "Show RCU tree",
> > > > > > +	.enable_mask = SYSRQ_ENABLE_DUMP,
> > > > > > +};
> > > > > > +
> > > > > > +static int __init rcu_sysrq_init(void)
> > > > > > +{
> > > > > > +	if (sysrq_rcu)
> > > > > > +		return register_sysrq_key('y', &sysrq_rcudump_op);
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +early_initcall(rcu_sysrq_init);
> > > > > > +
> > > > > >  /*
> > > > > >   * Send along grace-period-related data for rcutorture diagnostics.
> > > > > >   */
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>


Attachments:
0001-rcu-detect-the-preempt_rcu-hang-for-triage-jing-s-bo.patch (1.65 kB)
0002-rcu-v2-detect-the-preempt_rcu-hang-for-triage-jing-s.patch (1.04 kB)

2018-12-17 05:01:50

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Mon, Dec 17, 2018 at 03:15:42AM +0000, He, Bo wrote:
> To double-confirm, since the issue had not reproduced after 90 hours, we tried adding only the enclosed patch on the easily-reproduced build; the issue was then not reproduced after 63 hours over the whole weekend on 16 boards.
> So the current conclusion is that the debug patch strongly suppresses the rcu issue.

This is not a surprise. (Please see the end of this email for a
replacement patch that won't suppress the bug.)

To see why this is not a surprise, let's take a closer look at your patch,
in light of the comment header for swait_event_idle_timeout_exclusive():

* Returns:
* 0 if the @condition evaluated to %false after the @timeout elapsed,
* 1 if the @condition evaluated to %true after the @timeout elapsed,
* or the remaining jiffies (at least 1) if the @condition evaluated
* to %true before the @timeout elapsed.

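Schematically, those three cases can be told apart as follows (an
illustrative sketch only, not kernel code; "wq", "condition", and
"timeout" are placeholders):

	long ret = swait_event_idle_timeout_exclusive(wq, condition, timeout);

	if (ret == 0) {
		/* Condition still false: we simply timed out. */
	} else if (ret == 1) {
		/* Condition became true, but was noticed only at timeout
		 * expiry -- the lost-wakeup case at issue here. */
	} else {
		/* ret > 1: condition became true with ret jiffies to spare,
		 * that is, a normal, timely wakeup. */
	}
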
The situation we are seeing is that the RCU_GP_FLAG_INIT is set, but
the rcu_preempt task does not wake up. This would correspond to
the second case above, that is, a return value of 1. Looking now
at your patch, with comments interspersed below:

------------------------------------------------------------------------

From e8b583aa685b3b4f304f72398a80461bba09389c Mon Sep 17 00:00:00 2001
From: "he, bo" <[email protected]>
Date: Sun, 9 Dec 2018 18:11:33 +0800
Subject: [PATCH] rcu: detect the preempt_rcu hang for triage jing's board

Change-Id: I2ffceec2ae4847867753609e45c99afc66956003
Tracked-On:
Signed-off-by: he, bo <[email protected]>
---
kernel/rcu/tree.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78c0cf2..d6de363 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2192,8 +2192,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	int ret;
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	pid_t rcu_preempt_pid;
 
 	rcu_bind_gp_kthread();
+	if(!strcmp(rsp->name, "rcu_preempt")) {
+		rcu_preempt_pid = rsp->gp_kthread->pid;
+	}
+
 	for (;;) {
 
 		/* Handle grace-period start. */
@@ -2202,8 +2207,19 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gp_seq),
 					       TPS("reqwait"));
 		rsp->gp_state = RCU_GP_WAIT_GPS;
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					   RCU_GP_FLAG_INIT);
+		if (current->pid != rcu_preempt_pid) {
+			swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+						   RCU_GP_FLAG_INIT);
+		} else {
+			ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+								 RCU_GP_FLAG_INIT, 2*HZ);
+
+			if(!ret) {

We get here if ret==0. Therefore, the above "if" statement needs to
instead be "if (ret == 1) {".

In addition, in order to get event traces dumped, we also need:

rcu_ftrace_dump(DUMP_ALL);

+				show_rcu_gp_kthreads();
+				panic("hung_task: blocked in rcu_gp_kthread init");
+			}
+		}
+
 		rsp->gp_state = RCU_GP_DONE_GPS;
 		/* Locking provides needed memory barrier. */
 		if (rcu_gp_init(rsp))
--
2.7.4

------------------------------------------------------------------------

So, again, please change the "if(!ret) {" to "if (ret == 1) {", and
please add "rcu_ftrace_dump(DUMP_ALL);" right after this "if" statement,
as shown above.

With that change, I bet that you will again see failures.

> By comparison, swait_event_idle_timeout_exclusive() checks the condition three times, while swait_event_idle_exclusive() checks it only twice.
>
> so today I will do another experiment, only change as below:
> - swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> - RCU_GP_FLAG_INIT);
> + ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> + RCU_GP_FLAG_INIT, MAX_SCHEDULE_TIMEOUT);
> +
>
> Can you get some clues from the experiment?

Again, please instead make the changes that I called out above, with
the replacement for your patch 0001 shown below.

Thanx, Paul

PS. I have been testing for quite some time, but am still unable
to reproduce this. So we must depend on you to reproduce it.

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..86152af1a580 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2153,8 +2153,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
 	int ret;
 	struct rcu_state *rsp = arg;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	pid_t rcu_preempt_pid;
 
 	rcu_bind_gp_kthread();
+	if(!strcmp(rsp->name, "rcu_preempt")) {
+		rcu_preempt_pid = rsp->gp_kthread->pid;
+	}
+
 	for (;;) {
 
 		/* Handle grace-period start. */
@@ -2163,8 +2168,20 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gp_seq),
 					       TPS("reqwait"));
 		rsp->gp_state = RCU_GP_WAIT_GPS;
-		swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-					   RCU_GP_FLAG_INIT);
+		if (current->pid != rcu_preempt_pid) {
+			swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+						   RCU_GP_FLAG_INIT);
+		} else {
+			ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+								 RCU_GP_FLAG_INIT, 2*HZ);
+
+			if (ret == 1) {
+				rcu_ftrace_dump(DUMP_ALL);
+				show_rcu_gp_kthreads();
+				panic("hung_task: blocked in rcu_gp_kthread init");
+			}
+		}
+
 		rsp->gp_state = RCU_GP_DONE_GPS;
 		/* Locking provides needed memory barrier. */
 		if (rcu_gp_init(rsp))


2018-12-18 02:49:06

by Zhang, Jun

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

Hello, Paul

In softirq context, when current is rcu_preempt-10, rcu_gp_kthread_wake() does not wake up rcu_preempt.
The following patch might fix it. Please help review.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..98f5b40 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1697,7 +1697,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-	if (current == rsp->gp_kthread ||
+	if (((current == rsp->gp_kthread) && !in_softirq()) ||
 	    !READ_ONCE(rsp->gp_flags) ||
 	    !rsp->gp_kthread)
 		return;

[44932.311439, 0][ rcu_preempt] rcu_preempt-10 [001] .n.. 44929.401037: rcu_grace_period: rcu_preempt 19063548 reqwait
......
[44932.311517, 0][ rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402234: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startleaf
[44932.311536, 0][ rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402237: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startedroot


-----Original Message-----
From: He, Bo
Sent: Tuesday, December 18, 2018 07:16
To: [email protected]
Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>; Chang, Junxiao <[email protected]>; Mei, Paul <[email protected]>
Subject: RE: rcu_preempt caused oom

Thanks for your comments. With the change to "if (ret == 1)", the issue does hit the panic. Here enclosed are the logs.

-----Original Message-----
From: Paul E. McKenney <[email protected]>
Sent: Monday, December 17, 2018 12:26 PM
To: He, Bo <[email protected]>
Cc: Zhang, Jun <[email protected]>; Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>; Chang, Junxiao <[email protected]>; Mei, Paul <[email protected]>
Subject: Re: rcu_preempt caused oom

On Mon, Dec 17, 2018 at 03:15:42AM +0000, He, Bo wrote:
> for double confirm the issue is not reproduce after 90 hours, we tried only add the enclosed patch on the easy reproduced build, the issue is not reproduced after 63 hours in the whole weekend on 16 boards.
> so current conclusion is the debug patch has extreme effect on the rcu issue.

This is not a surprise. (Please see the end of this email for a replacement patch that won't suppress the bug.)

To see why this is not a surprise, let's take a closer look at your patch, in light of the comment header for wait_event_idle_timeout_exclusive():

* Returns:
* 0 if the @condition evaluated to %false after the @timeout elapsed,
* 1 if the @condition evaluated to %true after the @timeout elapsed,
* or the remaining jiffies (at least 1) if the @condition evaluated
* to %true before the @timeout elapsed.

The situation we are seeing is that the RCU_GP_FLAG_INIT is set, but the rcu_preempt task does not wake up. This would correspond to the second case above, that is, a return value of 1. Looking now at your patch, with comments interspersed below:

------------------------------------------------------------------------

From e8b583aa685b3b4f304f72398a80461bba09389c Mon Sep 17 00:00:00 2001
From: "he, bo" <[email protected]>
Date: Sun, 9 Dec 2018 18:11:33 +0800
Subject: [PATCH] rcu: detect the preempt_rcu hang for triage jing's board

Change-Id: I2ffceec2ae4847867753609e45c99afc66956003
Tracked-On:
Signed-off-by: he, bo <[email protected]>
---
kernel/rcu/tree.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 78c0cf2..d6de363 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2192,8 +2192,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
int ret;
struct rcu_state *rsp = arg;
struct rcu_node *rnp = rcu_get_root(rsp);
+ pid_t rcu_preempt_pid;

rcu_bind_gp_kthread();
+ if(!strcmp(rsp->name, "rcu_preempt")) {
+ rcu_preempt_pid = rsp->gp_kthread->pid;
+ }
+
for (;;) {

/* Handle grace-period start. */
@@ -2202,8 +2207,19 @@ static int __noreturn rcu_gp_kthread(void *arg)
READ_ONCE(rsp->gp_seq),
TPS("reqwait"));
rsp->gp_state = RCU_GP_WAIT_GPS;
- swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
- RCU_GP_FLAG_INIT);
+ if (current->pid != rcu_preempt_pid) {
+ swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+ RCU_GP_FLAG_INIT);
+ } else {
+ ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+ RCU_GP_FLAG_INIT, 2*HZ);
+
+ if(!ret) {

We get here if ret==0. Therefore, the above "if" statement needs to instead be "if (ret == 1) {".

In addition, in order to get event traces dumped, we also need:

rcu_ftrace_dump(DUMP_ALL);

+ show_rcu_gp_kthreads();
+ panic("hung_task: blocked in rcu_gp_kthread init");
+ }
+ }
+
rsp->gp_state = RCU_GP_DONE_GPS;
/* Locking provides needed memory barrier. */
if (rcu_gp_init(rsp))
--
2.7.4

------------------------------------------------------------------------

So, again, please change the "if(!ret) {" to "if (ret == 1) {", and please add "rcu_ftrace_dump(DUMP_ALL);" right after this "if" statement, as shown above.

With that change, I bet that you will again see failures.

> Compared with the swait_event_idle_timeout_exclusive will do 3 times to check the condition, while swait_event_idle_ exclusive will do 2 times check the condition.
>
> so today I will do another experiment, only change as below:
> - swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> - RCU_GP_FLAG_INIT);
> + ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
> + RCU_GP_FLAG_INIT, MAX_SCHEDULE_TIMEOUT);
> +
>
> Can you get some clues from the experiment?

Again, please instead make the changes that I called out above, with the replacement for your patch 0001 shown below.

Thanx, Paul

PS. I have been testing for quite some time, but am still unable
to reproduce this. So we must depend on you to reproduce it.

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 0b760c1369f7..86152af1a580 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2153,8 +2153,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
int ret;
struct rcu_state *rsp = arg;
struct rcu_node *rnp = rcu_get_root(rsp);
+ pid_t rcu_preempt_pid;

rcu_bind_gp_kthread();
+ if(!strcmp(rsp->name, "rcu_preempt")) {
+ rcu_preempt_pid = rsp->gp_kthread->pid;
+ }
+
for (;;) {

/* Handle grace-period start. */
@@ -2163,8 +2168,20 @@ static int __noreturn rcu_gp_kthread(void *arg)
READ_ONCE(rsp->gp_seq),
TPS("reqwait"));
rsp->gp_state = RCU_GP_WAIT_GPS;
- swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
- RCU_GP_FLAG_INIT);
+ if (current->pid != rcu_preempt_pid) {
+ swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+ RCU_GP_FLAG_INIT);
+ } else {
+ ret = swait_event_idle_timeout_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+ RCU_GP_FLAG_INIT, 2*HZ);
+
+ if (ret == 1) {
+ rcu_ftrace_dump(DUMP_ALL);
+ show_rcu_gp_kthreads();
+ panic("hung_task: blocked in rcu_gp_kthread init");
+ }
+ }
+
rsp->gp_state = RCU_GP_DONE_GPS;
/* Locking provides needed memory barrier. */
if (rcu_gp_init(rsp))


2018-12-18 03:13:53

by He, Bo

[permalink] [raw]
Subject: RE: rcu_preempt caused oom

I checked with Jun; the scenario is more like this:

@@@ rcu_start_this_gp() runs after ___swait_event() has checked the condition, but before schedule() @@@
rcu_gp_kthread --> swait_event_idle_exclusive --> __swait_event_idle --> ___swait_event ---------> schedule
@@@ rcu_gp_kthread_wake() then skips the wakeup, because it is called from within rcu_gp_kthread @@@

so rcu_gp_kthread will sleep and can never be woken up.

Jun's patch can work around it. What are your ideas?


-----Original Message-----
From: Zhang, Jun
Sent: Tuesday, December 18, 2018 10:47 AM
To: He, Bo <[email protected]>; [email protected]
Cc: Steven Rostedt <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; Xiao, Jin <[email protected]>; Zhang, Yanmin <[email protected]>; Bai, Jie A <[email protected]>; Sun, Yi J <[email protected]>; Chang, Junxiao <[email protected]>; Mei, Paul <[email protected]>
Subject: RE: rcu_preempt caused oom



2018-12-18 05:36:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: rcu_preempt caused oom

On Tue, Dec 18, 2018 at 02:46:43AM +0000, Zhang, Jun wrote:
> Hello, paul
>
> In softirq context, and current is rcu_preempt-10, rcu_gp_kthread_wake don't wakeup rcu_preempt.
> Maybe next patch could fix it. Please help review.
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1..98f5b40 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1697,7 +1697,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> */
> static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> {
> - if (current == rsp->gp_kthread ||
> + if (((current == rsp->gp_kthread) && !in_softirq()) ||

Close, but not quite. Please see below.

> !READ_ONCE(rsp->gp_flags) ||
> !rsp->gp_kthread)
> return;
>
> [44932.311439, 0][ rcu_preempt] rcu_preempt-10 [001] .n.. 44929.401037: rcu_grace_period: rcu_preempt 19063548 reqwait
> ......
> [44932.311517, 0][ rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402234: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startleaf
> [44932.311536, 0][ rcu_preempt] rcu_preempt-10 [001] d.s2 44929.402237: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 Startedroot

Good catch! If the rcu_preempt kthread had just entered the function
swait_event_idle_exclusive(), which had just called __swait_event_idle()
which had just called ___swait_event(), which had just gotten done
checking the "condition", then yes, the rcu_preempt kthread could
sleep forever. This is a very narrow race window, but that matches
your experience with its not happening often -- and my experience with
it not happening at all.

However, for this to happen, the wakeup must happen within a softirq
handler that executes upon return from an interrupt that interrupted
___swait_event() just after the "if (condition)". For this, we don't want
in_softirq() but rather in_serving_softirq(), as shown in the patch below.
The patch you have above could result in spurious wakeups, as it is
checking for bottom halves being disabled, not just executing within a
softirq handler. Which might be better than not having enough wakeups,
but let's please try for just the right number. ;-)

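To make the window concrete, here is a simplified sketch of the wait
loop (modeled loosely on ___swait_event(); illustrative only, not the
verbatim kernel code):

	DECLARE_SWAITQUEUE(wait);

	for (;;) {
		prepare_to_swait_exclusive(&rsp->gp_wq, &wait, TASK_IDLE);
		if (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT)
			break;	/* Final pre-sleep check of the condition. */
		/*
		 * <-- An interrupt arriving here can run a softirq handler
		 *     that sets RCU_GP_FLAG_INIT and calls
		 *     rcu_gp_kthread_wake().  The old "current ==
		 *     rsp->gp_kthread" test drops that wakeup, so the
		 *     schedule() below can then sleep forever.
		 */
		schedule();
	}
	finish_swait(&rsp->gp_wq, &wait);
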
So could you please instead test the patch below?

And if it works, could I please have your Signed-off-by so that I can
queue it? My patch is quite clearly derived from yours, after all!
And you should get credit for finding the problem and arriving at an
approximate fix, after all.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e9392a9d6291..b9205b40b621 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1722,7 +1722,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-	if (current == rsp->gp_kthread ||
+	if ((current == rsp->gp_kthread && !in_serving_softirq()) ||
 	    !READ_ONCE(rsp->gp_flags) ||
 	    !rsp->gp_kthread)
 		return;


Subject: [tip:core/rcu] rcu: Do RCU GP kthread self-wakeup from softirq and interrupt

Commit-ID: 1d1f898df6586c5ea9aeaf349f13089c6fa37903
Gitweb: https://git.kernel.org/tip/1d1f898df6586c5ea9aeaf349f13089c6fa37903
Author: Zhang, Jun <[email protected]>
AuthorDate: Tue, 18 Dec 2018 06:55:01 -0800
Committer: Paul E. McKenney <[email protected]>
CommitDate: Fri, 25 Jan 2019 15:29:59 -0800

rcu: Do RCU GP kthread self-wakeup from softirq and interrupt

The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread. Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.

Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call. In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context. Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.

This race window is quite narrow, but it actually did happen during real
testing. It would of course need to be fixed even if it was strictly
theoretical in nature.

This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.

Fixes: 48a7639ce80c ("rcu: Make callers awaken grace-period kthread")
Reported-by: "He, Bo" <[email protected]>
Co-developed-by: "Zhang, Jun" <[email protected]>
Co-developed-by: "He, Bo" <[email protected]>
Co-developed-by: "xiao, jin" <[email protected]>
Co-developed-by: Bai, Jie A <[email protected]>
Signed-off: "Zhang, Jun" <[email protected]>
Signed-off: "He, Bo" <[email protected]>
Signed-off: "xiao, jin" <[email protected]>
Signed-off: Bai, Jie A <[email protected]>
Signed-off-by: "Zhang, Jun" <[email protected]>
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
!in_serving_softirq()" to avoid redundant wakeups and to also handle the
interrupt-handler scenario as well as the softirq-handler scenario that
actually occurred in testing. ]
Signed-off-by: Paul E. McKenney <[email protected]>
Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com
---
kernel/rcu/tree.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9ceb93f848cd..21775eebb8f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1593,15 +1593,23 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
 }
 
 /*
- * Awaken the grace-period kthread.  Don't do a self-awaken, and don't
- * bother awakening when there is nothing for the grace-period kthread
- * to do (as in several CPUs raced to awaken, and we lost), and finally
- * don't try to awaken a kthread that has not yet been created.  If
- * all those checks are passed, track some debug information and awaken.
+ * Awaken the grace-period kthread.  Don't do a self-awaken (unless in
+ * an interrupt or softirq handler), and don't bother awakening when there
+ * is nothing for the grace-period kthread to do (as in several CPUs raced
+ * to awaken, and we lost), and finally don't try to awaken a kthread that
+ * has not yet been created.  If all those checks are passed, track some
+ * debug information and awaken.
+ *
+ * So why do the self-wakeup when in an interrupt or softirq handler
+ * in the grace-period kthread's context?  Because the kthread might have
+ * been interrupted just as it was going to sleep, and just after the final
+ * pre-sleep check of the awaken condition.  In this case, a wakeup really
+ * is required, and is therefore supplied.
  */
 static void rcu_gp_kthread_wake(void)
 {
-	if (current == rcu_state.gp_kthread ||
+	if ((current == rcu_state.gp_kthread &&
+	     !in_interrupt() && !in_serving_softirq()) ||
 	    !READ_ONCE(rcu_state.gp_flags) ||
 	    !rcu_state.gp_kthread)
 		return;

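As a side note on the predicates chosen above, roughly paraphrasing include/linux/preempt.h (this summary is mine, not the kernel's actual code):

	/*
	 * in_serving_softirq(): true only while a softirq handler is
	 *                       actually executing.
	 * in_softirq():         also true when bottom halves are merely
	 *                       disabled (e.g. under spin_lock_bh()), which
	 *                       is why the earlier !in_softirq() proposal
	 *                       could have produced spurious self-wakeups.
	 * in_interrupt():       true in hardirq, softirq, and NMI context;
	 *                       checking it as well covers wakeups from
	 *                       interrupt handlers.
	 */
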
Subject: [tip:core/rcu] rcu: Prevent needless ->gp_seq_needed update in __note_gp_changes()

Commit-ID: 13dc7d0c7a2ed438f0ec8e9fb365a1256d87cf87
Gitweb: https://git.kernel.org/tip/13dc7d0c7a2ed438f0ec8e9fb365a1256d87cf87
Author: Zhang, Jun <[email protected]>
AuthorDate: Wed, 19 Dec 2018 10:37:34 -0800
Committer: Paul E. McKenney <[email protected]>
CommitDate: Fri, 25 Jan 2019 15:30:00 -0800

rcu: Prevent needless ->gp_seq_needed update in __note_gp_changes()

Currently, __note_gp_changes() checks to see if the rcu_node structure's
->gp_seq_needed is greater than or equal to that of the rcu_data
structure, and if so, updates the rcu_data structure's ->gp_seq_needed
field. This results in a useless store in the case where the two fields
are equal.

This commit therefore carries out this store only in the case where the
rcu_node structure's ->gp_seq_needed is strictly greater than that of
the rcu_data structure.

Signed-off-by: "Zhang, Jun" <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Link: https://lkml.kernel.org/r/88DC34334CA3444C85D647DBFA962C2735AD5F77@SHSMSX104.ccr.corp.intel.com
---
kernel/rcu/tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 21775eebb8f0..9d0e2ac9356e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1758,7 +1758,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 		zero_cpu_stall_ticks(rdp);
 	}
 	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
-	if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
+	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
 		rdp->gp_seq_needed = rnp->gp_seq_needed;
 	WRITE_ONCE(rdp->gpwrap, false);
 	rcu_gpnum_ovf(rnp, rdp);
rcu_gpnum_ovf(rnp, rdp);