Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752654AbbF2RiT (ORCPT ); Mon, 29 Jun 2015 13:38:19 -0400 Received: from foss.arm.com ([217.140.101.70]:39081 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752627AbbF2RiL (ORCPT ); Mon, 29 Jun 2015 13:38:11 -0400 Date: Mon, 29 Jun 2015 18:37:52 +0100 From: Mark Rutland To: "peterz@infradead.org" , linux-kernel@vger.kernel.org Cc: "umgwanakikbuti@gmail.com" , torvalds@linux-foundation.org, "mingo@kernel.org" , paulmck@linux.vnet.ibm.com, Will Deacon Subject: PRREMPT RCU stalls with 9edfbfed3f544a78 ("sched/core: Rework rq->clock update skips") Message-ID: <20150629173751.GB19863@leverpostej> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Thread-Topic: RCU stalls with 9edfbfed3f544a78 ("sched/core: Rework rq->clock update skips") Accept-Language: en-GB, en-US Content-Language: en-US User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6025 Lines: 115 Hi, Recently (since ~v4.0) I've been seeing new RCU stall warnings when running hackbench (and cpu-affine cyclictest instances) on CONFIG_PREEMPT=y kernels, both arm64 and x86_64: arm64 ----- INFO: rcu_preempt detected stalls on CPUs/tasks: Tasks blocked on level-0 rcu_node (CPUs 0-5): (detected by 1, t=2102 jiffies, g=957, c=956, q=23) All QSes seen, last rcu_preempt kthread activity 2 (4294981808-4294981806), jiffies_till_next_fqs=1, root ->qsmask 0x0 hackbench R running task 0 10388 9559 0x00000000 Call trace: [] dump_backtrace+0x0/0x124 [] show_stack+0x10/0x1c [] sched_show_task+0xa4/0x104 [] rcu_check_callbacks+0x998/0x9a0 [] update_process_times+0x38/0x6c [] tick_sched_handle.isra.16+0x1c/0x68 [] tick_sched_timer+0x40/0x88 [] __run_hrtimer.isra.34+0x48/0x108 [] hrtimer_interrupt+0xc4/0x248 [] arch_timer_handler_phys+0x28/0x38 [] handle_percpu_devid_irq+0x74/0x9c [] generic_handle_irq+0x30/0x4c [] __handle_domain_irq+0x5c/0xac [] gic_handle_irq+0x30/0x80 Exception stack(0xffffffc073bc3a80 to 0xffffffc073bc3ba0) 3a80: 00000064 00000000 00000064 00000000 73bc3bc0 ffffffc0 00331ffc ffffffc0 3aa0: 00000064 00000000 00000064 00000000 ffffffff ffffffff 160c3d20 00000000 3ac0: e9d018ac 0000007f 00000064 00000000 5a046200 ffffffc9 000000c0 00000000 3ae0: 00000000 00000000 00000000 00000000 ffffffff 00000000 00000018 00000000 3b00: 00000002 00000000 00000000 00000000 948d9a5c 0000007f 94a1e590 0000007f 3b20: 0019b7e0 ffffffc0 94a316d0 0000007f e9d01530 0000007f 00000064 00000000 3b40: 00000064 00000000 00000064 00000000 00000000 00000000 00000064 00000000 3b60: 73bc3d78 ffffffc0 757be080 ffffffc9 6241b980 ffffffc9 00000001 00000000 3b80: 55da5500 ffffffc9 73bc3bc0 ffffffc0 00331f9c ffffffc0 73bc3bc0 ffffffc0 [] el1_irq+0x64/0xd8 [] skb_copy_datagram_from_iter+0x5c/0x1f0 [] unix_stream_sendmsg+0xf0/0x334 [] sock_sendmsg+0x14/0x58 [] sock_write_iter+0x74/0xd4 [] __vfs_write+0xac/0x10c [] vfs_write+0x8c/0x194 [] SyS_write+0x40/0xa0 x86_64 ------ [ 2616.307011] INFO: rcu_preempt detected stalls on CPUs/tasks: [ 2616.311868] Tasks blocked on level-0 rcu_node (CPUs 0-3): [ 2616.311868] (detected by 1, t=21014 jiffies, g=10009, c=10008, q=274) [ 2616.322320] All QSes seen, last rcu_preempt kthread activity 3 (4297283620-4297283617), jiffies_till_next_fqs=3, root ->qsmask 0x0 [ 2616.322320] hackbench R running task 13984 30174 29054 0x00000008 [ 2616.322320] 00000000000036a0 ffff8801bec83c98 ffffffff8107b751 0000000000000003 [ 2616.322320] ffffffff81e3b8c0 ffffffff81f01280 ffffffff81e3b8c0 ffff8801bec83d28 [ 2616.352044] ffffffff810a90f7 0000000000000000 ffffffff8107faac 000000000000f6a0 [ 2616.352044] Call Trace: [ 2616.352044] [] sched_show_task+0xe1/0x150 [ 2616.352044] [] rcu_check_callbacks+0x8f7/0x900 [ 2616.352044] [] ? account_system_time+0x9c/0x190 [ 2616.352044] [] update_process_times+0x34/0x60 [ 2616.352044] [] tick_sched_handle.isra.15+0x31/0x40 [ 2616.352044] [] tick_sched_timer+0x3f/0x70 [ 2616.352044] [] __run_hrtimer+0x7f/0x230 [ 2616.352044] [] ? tick_sched_handle.isra.15+0x40/0x40 [ 2616.416043] [] hrtimer_interrupt+0xf3/0x220 [ 2616.416043] [] hpet_interrupt_handler+0x11/0x30 [ 2616.416043] [] handle_irq_event_percpu+0x47/0x1d0 [ 2616.428774] [] handle_irq_event+0x3c/0x60 [ 2616.428774] [] handle_edge_irq+0x8f/0x130 [ 2616.428774] [] handle_irq+0xf7/0x180 [ 2616.450371] [] ? get_parent_ip+0x11/0x50 [ 2616.450371] [] do_IRQ+0x58/0x100 [ 2616.450371] [] common_interrupt+0x6b/0x6b [ 2616.467277] [] ? sock_def_readable+0x67/0x70 [ 2616.467277] [] ? avc_has_perm+0x105/0x1b0 [ 2616.467277] [] ? avc_has_perm+0xd7/0x1b0 [ 2616.482406] [] inode_has_perm.isra.29+0x22/0x30 [ 2616.482406] [] file_has_perm+0x87/0xa0 [ 2616.493232] [] ? fsnotify+0x2d6/0x400 [ 2616.493232] [] selinux_file_permission+0xa5/0x120 [ 2616.493232] [] security_file_permission+0x1e/0xa0 [ 2616.493232] [] rw_verify_area+0x51/0xd0 [ 2616.493232] [] vfs_write+0x6f/0x1b0 [ 2616.493232] [] SyS_write+0x41/0xb0 [ 2616.493232] [] ? SyS_poll+0x60/0xf0 [ 2616.493232] [] system_call_fastpath+0x12/0x6a I've bisected to 9edfbfed3f544a78 ("sched/core: Rework rq->clock update skips"), and with that reverted I no longer see stalls. For v4.1 I also had to revert 44fb085bfa17628c "sched/deadline: Add rq->clock update skip for dl task yield" as it calls function introduced in 9edfbfed3f544a78. Any ideas as to what could be going on? To reproduce the issue I'm running the following in parallel on dual-core machines: $ while true; do ./hackbench 100 process 1000; sleep 30; done # ./cyclictest -a0 -n -M -m # ./cyclictest -a1 -n -M -m Usually it takes a couple of minutes to trigger, but I've seen it take up to half an hour. With the reverts I haven't seen stalls after testing for an hour or so. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/