Date: Thu, 4 Jun 2020 20:20:14 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@kernel.org
To: Vladimir Oltean
Cc: linux-rt-users@vger.kernel.org, lkml, rcu@vger.kernel.org, Colin King
Subject: Re: stress-ng --hrtimers hangs system
Message-ID: <20200605032014.GA4455@paulmck-ThinkPad-P72>
In-Reply-To: <4781d250-9a29-cef3-268d-7d83c98bf16a@gmail.com>

On Fri, Jun 05, 2020 at 04:47:51AM +0300, Vladimir Oltean wrote:
> Hi,
>
> I was testing stress-ng on an ARM64 box and I found that it can be
> killed instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and
> I get this rcu_preempt splat after 21 seconds, letting me know that the
> grace-period kthread could not run:
>
> # stress-ng --hrtimers 1
> stress-ng: info: [320] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
> stress-ng: info: [320] dispatching hogs: 1 hrtimers
> stress-ng: info: [320] cache allocate: using defaults, can't determine cache details from sysfs
> [ 85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 85.833656]  (detected by 1, t=6502 jiffies, g=1789, q=12)
> [ 85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [ 85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 85.871025] rcu: RCU grace-period kthread stack dump:
> [ 85.876094] rcu_preempt     R  running task        0    10      2 0x00000028
> [ 85.883173] Call trace:
> [ 85.885636]  __switch_to+0xf8/0x148
> [ 85.889137]  __schedule+0x2d8/0x808
> [ 85.892636]  schedule+0x48/0x100
> [ 85.895875]  schedule_timeout+0x1c8/0x420
> [ 85.899900]  rcu_gp_kthread+0x738/0x1b78
> [ 85.903836]  kthread+0x158/0x168
> [ 85.907075]  ret_from_fork+0x10/0x18
> [ 93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 33s!
> [ 93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 33s!
> [ 93.299105] Showing busy workqueues and worker pools:
> [ 93.304189] workqueue events: flags=0x0
> [ 93.308116]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 93.315031]     pending: vmstat_shepherd
> [ 93.318990] workqueue events_unbound: flags=0x2
> [ 93.323577]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
> [ 93.330309]     pending: flush_to_ldisc
> [ 93.334161]
> [ 93.334164] ======================================================
> [ 93.334165] WARNING: possible circular locking dependency detected
> [ 93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
> [ 93.334168] ------------------------------------------------------
> [ 93.334169] stress-ng-hrtim/326 is trying to acquire lock:
> [ 93.334171] ffffb236f6aee1a0 (console_owner){-.-.}-{0:0}, at: console_unlock+0x21c/0x678
> [ 93.334176]
> [ 93.334177] but task is already holding lock:
> [ 93.334178] ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
> [ 93.334184]
> [ 93.334186] which lock already depends on the new lock.
> [ 93.334187]
> [ 93.334188]
> [ 93.334189] the existing dependency chain (in reverse order) is:
> [ 93.334190]
> [ 93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
> [ 93.334197]        _raw_spin_lock+0x5c/0x78
> [ 93.334198]        __queue_work+0x124/0x7c8
> [ 93.334199]        queue_work_on+0xd0/0xf0
> [ 93.334200]        tty_flip_buffer_push+0x3c/0x48
> [ 93.334202]        serial8250_rx_chars+0x74/0x88
> [ 93.334203]        fsl8250_handle_irq+0x15c/0x1a0
> [ 93.334204]        serial8250_interrupt+0x70/0xb8
> [ 93.334206]        __handle_irq_event_percpu+0xe0/0x478
> [ 93.334207]        handle_irq_event_percpu+0x40/0x98
> [ 93.334208]        handle_irq_event+0x4c/0xd0
> [ 93.334209]        handle_fasteoi_irq+0xb4/0x158
> [ 93.334211]        generic_handle_irq+0x3c/0x58
> [ 93.334212]        __handle_domain_irq+0x68/0xc0
> [ 93.334213]        gic_handle_irq+0x6c/0x160
> [ 93.334214]        el1_irq+0xbc/0x180
> [ 93.334216]        cpuidle_enter_state+0xb4/0x4f8
> [ 93.334217]        cpuidle_enter+0x3c/0x50
> [ 93.334218]        call_cpuidle+0x44/0x78
> [ 93.334219]        do_idle+0x228/0x2c8
> [ 93.334220]        cpu_startup_entry+0x2c/0x48
> [ 93.334222]        rest_init+0x1ac/0x280
> [ 93.334223]        arch_call_rest_init+0x14/0x1c
> [ 93.334224]        start_kernel+0x4ec/0x524
> [ 93.334225]
> [ 93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
> [ 93.334232]        _raw_spin_lock_irqsave+0x78/0xa0
> [ 93.334233]        serial8250_console_write+0x1f4/0x348
> [ 93.334234]        univ8250_console_write+0x44/0x58
> [ 93.334235]        console_unlock+0x480/0x678
> [ 93.334237]        vprintk_emit+0x188/0x370
> [ 93.334238]        vprintk_default+0x48/0x58
> [ 93.334239]        vprintk_func+0xf0/0x238
> [ 93.334240]        printk+0x74/0x94
> [ 93.334241]        register_console+0x1a0/0x300
> [ 93.334243]        uart_add_one_port+0x4a0/0x4e0
> [ 93.334244]        serial8250_register_8250_port+0x2d0/0x4b0
> [ 93.334245]        of_platform_serial_probe+0x318/0x650
> [ 93.334246]        platform_drv_probe+0x58/0xa8
> [ 93.334248]        really_probe+0x10c/0x358
> [ 93.334249]        driver_probe_device+0x5c/0x100
> [ 93.334250]        device_driver_attach+0x74/0x98
> [ 93.334252]        __driver_attach+0x98/0xe0
> [ 93.334253]        bus_for_each_dev+0x84/0xd8
> [ 93.334254]        driver_attach+0x30/0x40
> [ 93.334255]        bus_add_driver+0x160/0x208
> [ 93.334256]        driver_register+0x64/0x110
> [ 93.334258]        __platform_driver_register+0x58/0x68
> [ 93.334259]        of_platform_serial_driver_init+0x20/0x28
> [ 93.334260]        do_one_initcall+0x94/0x438
> [ 93.334261]        kernel_init_freeable+0x2cc/0x338
> [ 93.334263]        kernel_init+0x18/0x110
> [ 93.334264]        ret_from_fork+0x10/0x18
> [ 93.334265]
> [ 93.334266] -> #0 (console_owner){-.-.}-{0:0}:
> [ 93.334271]        __lock_acquire+0x1170/0x1510
> [ 93.334272]        lock_acquire+0xf8/0x458
> [ 93.334273]        console_unlock+0x274/0x678
> [ 93.334274]        vprintk_emit+0x188/0x370
> [ 93.334275]        vprintk_default+0x48/0x58
> [ 93.334277]        vprintk_func+0xf0/0x238
> [ 93.334278]        printk+0x74/0x94
> [ 93.334279]        show_pwq+0x80/0x2c0
> [ 93.334280]        show_workqueue_state+0x2ac/0x3a8
> [ 93.334281]        wq_watchdog_timer_fn+0x194/0x2d8
> [ 93.334283]        call_timer_fn+0xcc/0x400
> [ 93.334284]        run_timer_softirq+0x250/0x6e0
> [ 93.334285]        __do_softirq+0x12c/0x5b4
> [ 93.334286]        irq_exit+0x13c/0x148
> [ 93.334287]        __handle_domain_irq+0x6c/0xc0
> [ 93.334289]        gic_handle_irq+0x6c/0x160
> [ 93.334290]        el1_irq+0xbc/0x180
> [ 93.334291]        lock_acquire+0x110/0x458
> [ 93.334292]        __might_fault+0x84/0xa8
> [ 93.334293]        __arm64_sys_rt_sigreturn+0x154/0x980
> [ 93.334295]        el0_svc_common.constprop.2+0x7c/0x178
> [ 93.334296]        do_el0_svc+0x34/0xa0
> [ 93.334297]        el0_sync_handler+0x160/0x1f0
> [ 93.334298]        el0_sync+0x140/0x180
> [ 93.334299]
> [ 93.334301] other info that might help us debug this:
> [ 93.334301]
> [ 93.334303] Chain exists of:
> [ 93.334303]   console_owner --> &port->lock#2 --> &pool->lock/1
> [ 93.334311]
> [ 93.334313]  Possible unsafe locking scenario:
> [ 93.334314]
> [ 93.334315]        CPU0                    CPU1
> [ 93.334316]        ----                    ----
> [ 93.334317]   lock(&pool->lock/1);
> [ 93.334321]                                lock(&port->lock#2);
> [ 93.334325]                                lock(&pool->lock/1);
> [ 93.334328]   lock(console_owner);
> [ 93.334331]
> [ 93.334332]  *** DEADLOCK ***
> [ 93.334333]
> [ 93.334334] 5 locks held by stress-ng-hrtim/326:
> [ 93.334335]  #0: ffff002077cf9a18 (&mm->mmap_sem){++++}-{3:3}, at: __might_fault+0x60/0xa8
> [ 93.334341]  #1: ffff800010003de0 ((&wq_watchdog_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
> [ 93.334347]  #2: ffffb236f6aefd60 (rcu_read_lock){....}-{1:2}, at: show_workqueue_state+0x0/0x3a8
> [ 93.334353]  #3: ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
> [ 93.334360]  #4: ffffb236f6aee080 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x180/0x370
> [ 93.334366]
> [ 93.334367] stack backtrace:
> [ 93.334369] CPU: 0 PID: 326 Comm: stress-ng-hrtim Not tainted 5.7.0-08604-g7dcedf8c04c0 #118
> [ 93.334370] Hardware name: LS1028A RDB Board (DT)
> [ 93.334371] Call trace:
> [ 93.334372]  dump_backtrace+0x0/0x1a8
> [ 93.334373]  show_stack+0x24/0x30
> [ 93.334374]  dump_stack+0xe8/0x150
> [ 93.334376]  print_circular_bug.isra.41+0x228/0x280
> [ 93.334377]  check_noncircular+0x14c/0x1b0
> [ 93.334378]  __lock_acquire+0x1170/0x1510
> [ 93.334379]  lock_acquire+0xf8/0x458
> [ 93.334381]  console_unlock+0x274/0x678
> [ 93.334382]  vprintk_emit+0x188/0x370
> [ 93.334383]  vprintk_default+0x48/0x58
> [ 93.334384]  vprintk_func+0xf0/0x238
> [ 93.334385]  printk+0x74/0x94
> [ 93.334386]  show_pwq+0x80/0x2c0
> [ 93.334387]  show_workqueue_state+0x2ac/0x3a8
> [ 93.334389]  wq_watchdog_timer_fn+0x194/0x2d8
> [ 93.334390]  call_timer_fn+0xcc/0x400
> [ 93.334391]  run_timer_softirq+0x250/0x6e0
> [ 93.334392]  __do_softirq+0x12c/0x5b4
> [ 93.334393]  irq_exit+0x13c/0x148
> [ 93.334394]  __handle_domain_irq+0x6c/0xc0
> [ 93.334396]  gic_handle_irq+0x6c/0x160
> [ 93.334397]  el1_irq+0xbc/0x180
> [ 93.334398]  lock_acquire+0x110/0x458
> [ 93.334399]  __might_fault+0x84/0xa8
> [ 93.334400]  __arm64_sys_rt_sigreturn+0x154/0x980
> [ 93.334402]  el0_svc_common.constprop.2+0x7c/0x178
> [ 93.334403]  do_el0_svc+0x34/0xa0
> [ 93.334404]  el0_sync_handler+0x160/0x1f0
> [ 93.334405]  el0_sync+0x140/0x180
> [ 93.984979] workqueue events_freezable: flags=0x4
> [ 93.989705]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 93.996522]     pending: mmc_rescan
> [ 94.000032] workqueue events_power_efficient: flags=0x82
> [ 94.005369]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=4/256 refcnt=6
> [ 94.011748]     pending: phy_state_machine, phy_state_machine, neigh_periodic_work, do_cache_clean
> [ 94.020761] workqueue pm: flags=0x4
> [ 94.024266]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 94.031081]     pending: pm_runtime_work
> [ 94.035025]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 94.041840]     pending: pm_runtime_work
> [ 94.045782] workqueue writeback: flags=0x4a
> [ 94.049984]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=3
> [ 94.056362]     pending: wb_workfn
> [ 94.059784] workqueue kblockd: flags=0x18
> [ 94.063812]   pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
> [ 94.070802]     pending: blk_mq_timeout_work
>
> Looks like there's another issue in fsl8250_handle_irq, but I think
> that's unrelated, so I'll ignore it.
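For context on what the stressor referenced above is doing: stress-hrtimers.c essentially keeps a POSIX timer re-arming itself with very short expirations, so the CPU is saturated with hrtimer interrupts and signal delivery. Below is a minimal userspace sketch of that idea -- an illustration only, not the actual stress-ng code; the program shape and the 1 us period are assumptions made here for clarity:

	/*
	 * Sketch of an hrtimer storm (NOT the actual stress-ng code):
	 * a POSIX timer that re-arms itself ~1 us into the future from
	 * its own signal handler, so the CPU spends nearly all of its
	 * time processing timer expirations and signal delivery.
	 *
	 * Build: gcc -o hrtimer-storm hrtimer-storm.c -lrt
	 */
	#include <signal.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>
	#include <unistd.h>

	static timer_t timerid;

	static void rearm(void)
	{
		struct itimerspec its = {
			/* one-shot, 1 us from now (all zeros would disarm) */
			.it_value = { .tv_sec = 0, .tv_nsec = 1000 },
		};

		/* timer_settime() is async-signal-safe per POSIX */
		timer_settime(timerid, 0, &its, NULL);
	}

	static void handler(int sig, siginfo_t *si, void *uc)
	{
		(void)sig; (void)si; (void)uc;
		rearm();	/* immediately schedule the next expiry */
	}

	int main(void)
	{
		struct sigaction sa = { .sa_flags = SA_SIGINFO };
		struct sigevent sev = {
			.sigev_notify = SIGEV_SIGNAL,
			.sigev_signo  = SIGRTMIN,
		};

		sa.sa_sigaction = handler;
		sigemptyset(&sa.sa_mask);
		if (sigaction(SIGRTMIN, &sa, NULL) < 0 ||
		    timer_create(CLOCK_MONOTONIC, &sev, &timerid) < 0) {
			perror("setup");
			return EXIT_FAILURE;
		}

		rearm();
		for (;;)
			pause();	/* all the work happens in the handler */
	}

A load of this shape keeps one CPU busy servicing back-to-back hrtimer expirations, which is consistent with the rcu_preempt kthread starvation reported in the splats above.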
> So it looks like CPU 1 detects that the rcu_preempt kthread isn't
> scheduled, despite the fact that all quiescent states were seen. So
> there is no RCU usage bug here, but a legitimate CPU starvation issue
> as far as I can see.

That agrees with my interpretation of the stall warning message.

On the rest, I must defer to others.

							Thanx, Paul

> I have some noob questions:
>
> - Is my system (hardware, drivers, kernel config) misconfigured? This
>   can be reproduced at least on the NXP LS1028A running today's master
>   branch with just the arm64 defconfig. On other hardware, however
>   (NXP T1040 PowerPC), RT throttling kicks in faster and the system
>   doesn't hang (more below).
>
> - What is the expected behavior of a system under this hrtimer test?
>   Intuitively I would say the kernel shouldn't hang, but I don't know
>   exactly what the mechanisms to avoid that are. Below are 2 things I
>   tried.
>
> - I tried to enable CONFIG_RCU_BOOST and had limited success. I could
>   get rid of the RCU stall messages, but the system would still hang
>   after a longer while. When it hung, it didn't even respond to ping
>   from the outside, no shell; only magic sysrq could restart the
>   kernel.
>
> - I read a little bit of the "Limiting the CPU usage of real-time and
>   deadline processes" chapter from man 7 sched. Is this what I should
>   be doing? My /proc/sys/kernel/sched_rt_period_us and
>   /proc/sys/kernel/sched_rt_runtime_us are set to their default values
>   (RT period 1 second, RT runtime .95 seconds), so why isn't that
>   enough? If I reduce the RT runtime to .8 seconds while keeping the
>   period at 1 second, RT throttling (finally!) kicks in and the system
>   does not die. The RCU stall warning does, however, keep getting
>   printed continuously.
>
> [ 385.732074] sched: RT throttling activated
> [ 385.883960] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 385.890077]  (detected by 1, t=6503 jiffies, g=1801, q=68)
> [ 385.895582] rcu: All QSes seen, last rcu_preempt kthread activity 6504 (4294988738-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 385.908062] rcu: rcu_preempt kthread starved for 6508 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [ 385.918273] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 385.927437] rcu: RCU grace-period kthread stack dump:
> [ 385.932503] rcu_preempt     R  running task        0    11      2 0x00000028
> [ 385.939579] Call trace:
> [ 385.942040]  __switch_to+0xf8/0x148
> [ 385.945539]  __schedule+0x2d8/0x808
> [ 385.949036]  schedule+0x48/0x100
> [ 385.952273]  schedule_timeout+0x1c8/0x420
> [ 385.956295]  rcu_gp_kthread+0x704/0x1870
> [ 385.960228]  kthread+0x158/0x168
> [ 385.963464]  ret_from_fork+0x10/0x18
> (...)
> [ 463.903959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 463.910077]  (detected by 1, t=26008 jiffies, g=1801, q=148)
> [ 463.915756] rcu: All QSes seen, last rcu_preempt kthread activity 26009 (4295008243-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 463.928324] rcu: rcu_preempt kthread starved for 26013 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [ 463.938622] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 463.947786] rcu: RCU grace-period kthread stack dump:
> [ 463.952852] rcu_preempt     R  running task        0    11      2 0x00000028
> [ 463.959928] Call trace:
> [ 463.962389]  __switch_to+0xf8/0x148
> [ 463.965888]  __schedule+0x2d8/0x808
> [ 463.969385]  schedule+0x48/0x100
> [ 463.972622]  schedule_timeout+0x1c8/0x420
> [ 463.976644]  rcu_gp_kthread+0x704/0x1870
> [ 463.980578]  kthread+0x158/0x168
> [ 463.983814]  ret_from_fork+0x10/0x18
> (...)
> [ 541.923959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 541.930076]  (detected by 1, t=45513 jiffies, g=1801, q=230)
> [ 541.935754] rcu: All QSes seen, last rcu_preempt kthread activity 45514 (4295027748-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 541.948322] rcu: rcu_preempt kthread starved for 45518 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [ 541.958620] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 541.967784] rcu: RCU grace-period kthread stack dump:
> [ 541.972850] rcu_preempt     R  running task        0    11      2 0x00000028
> [ 541.979926] Call trace:
> [ 541.982387]  __switch_to+0xf8/0x148
> [ 541.985887]  __schedule+0x2d8/0x808
> [ 541.989384]  schedule+0x48/0x100
> [ 541.992621]  schedule_timeout+0x1c8/0x420
> [ 541.996643]  rcu_gp_kthread+0x704/0x1870
> [ 542.000576]  kthread+0x158/0x168
> [ 542.003813]  ret_from_fork+0x10/0x18
>
> I would really appreciate some pointers.
>
> Thanks,
> -Vladimir
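For reference, the throttling knobs Vladimir adjusted in his last question are ordinary sysctls under /proc/sys/kernel/, documented in man 7 sched. A minimal sketch of applying the 0.8 s runtime / 1 s period combination that made throttling kick in for him (the write_sysctl helper is made up here for illustration; the file paths are the standard ones):

	/*
	 * Sketch: apply the RT-throttling settings described above
	 * (period 1 s, runtime 0.8 s). Equivalent to:
	 *   echo 1000000 > /proc/sys/kernel/sched_rt_period_us
	 *   echo  800000 > /proc/sys/kernel/sched_rt_runtime_us
	 */
	#include <stdio.h>
	#include <stdlib.h>

	static int write_sysctl(const char *path, long val)
	{
		FILE *f = fopen(path, "w");

		if (!f)
			return -1;
		fprintf(f, "%ld\n", val);
		return fclose(f);
	}

	int main(void)
	{
		/* RT tasks may use at most 800 ms of every 1000 ms window */
		if (write_sysctl("/proc/sys/kernel/sched_rt_period_us", 1000000) ||
		    write_sysctl("/proc/sys/kernel/sched_rt_runtime_us", 800000)) {
			perror("sysctl write (root required)");
			return EXIT_FAILURE;
		}
		return EXIT_SUCCESS;
	}

With these values, realtime tasks are throttled after 0.8 s of each 1 s window, matching the "sched: RT throttling activated" message in the log above, and the remaining CPU time lets everything else -- the grace-period kthread included -- run again.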