Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755530AbaLHOdq (ORCPT ); Mon, 8 Dec 2014 09:33:46 -0500
Received: from userp1040.oracle.com ([156.151.31.81]:33766 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751709AbaLHOdp (ORCPT ); Mon, 8 Dec 2014 09:33:45 -0500
Message-ID: <5485B6B9.7010800@oracle.com>
Date: Mon, 08 Dec 2014 09:33:29 -0500
From: Sasha Levin
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: paulmck@linux.vnet.ibm.com
CC: Linus Torvalds, Dave Jones, Chris Mason, Dâniel Fraga, Linux Kernel Mailing List
Subject: Re: frequent lockups in 3.18rc4
References: <20141202193252.GB17595@redhat.com> <547E4C14.6040509@oracle.com> <54813C03.8040009@oracle.com> <5481C92E.6020805@oracle.com> <54846B06.8050906@oracle.com> <20141207182420.GG25340@linux.vnet.ibm.com> <20141207194304.GA17810@linux.vnet.ibm.com> <5484E2AB.1070503@oracle.com> <20141208052048.GJ25340@linux.vnet.ibm.com>
In-Reply-To: <20141208052048.GJ25340@linux.vnet.ibm.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Source-IP: ucsinet22.oracle.com [156.151.31.94]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/08/2014 12:20 AM, Paul E. McKenney wrote:
> I have seen this caused by lost IPIs, but you have to lose two of them,
> which seems less than fully likely.
It does seem that it can cause full-blown stalls as well, just pretty rarely
(notice the lack of any prints before):

[11373.032327] audit: type=1326 audit(1397703594.974:502): auid=4294967295 uid=7884781 gid=0 ses=4294967295 pid=9853 comm="trinity-c768" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
[11374.565881] audit: type=1326 audit(1397703596.504:503): auid=4294967295 uid=32 gid=0 ses=4294967295 pid=9801 comm="trinity-c710" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
[11839.353539] Hangcheck: hangcheck value past margin!
[12040.010128] INFO: rcu_sched detected stalls on CPUs/tasks:
[12040.012072] (detected by 4, t=213513 jiffies, g=-222, c=-223, q=0)
[12040.014200] INFO: Stall ended before state dump start
[12159.730069] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12159.730069] (detected by 3, t=396537 jiffies, g=24095, c=24094, q=1346)
[12159.730069] INFO: Stall ended before state dump start
[12602.162439] Hangcheck: hangcheck value past margin!
[12655.560806] INFO: rcu_sched detected stalls on CPUs/tasks:
[12655.560806] 0: (3 ticks this GP) idle=bc3/140000000000002/0 softirq=26674/26674 last_accelerate: b2a8/da68, nonlazy_posted: 20893, ..
[12655.602171] (detected by 13, t=30506 jiffies, g=-219, c=-220, q=0)
[12655.602171] Task dump for CPU 0:
[12655.602171] trinity-c39 R running task 11904 6558 26120 0x0008000c
[12655.602171] ffffffff81593bf7 ffff8808d5d58d40 0000000000000282 ffffffff9ef40538
[12655.602171] ffff880481400000 ffff8808d5dcb638 ffffffff83f2ed2b ffffffff9eaa1718
[12655.602171] ffff8808d5d58d08 00000b820c44b0ae 0000000000000000 0000000000000001
[12655.602171] Call Trace:
[12655.602171] [] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[12655.602171] [] ? retint_restore_args+0x13/0x13
[12655.602171] [] ? _raw_spin_unlock_irqrestore+0xa2/0xf0
[12655.602171] [] ? __debug_check_no_obj_freed+0x2f5/0xd90
[12655.602171] [] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171] [] ? debug_check_no_obj_freed+0x19/0x20
[12655.602171] [] ? free_pages_prepare+0x5bf/0x1000
[12655.602171] [] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171] [] ? __free_pages_ok+0x3d/0x360
[12655.602171] [] ? free_compound_page+0x8d/0xd0
[12655.602171] [] ? __put_compound_page+0x46/0x70
[12655.602171] [] ? put_compound_page+0xf5/0x10e0
[12655.602171] [] ? preempt_count_sub+0x11b/0x1d0
[12655.602171] [] ? release_pages+0x41d/0x6f0
[12655.602171] [] ? free_pages_and_swap_cache+0x11b/0x1a0
[12655.602171] [] ? tlb_flush_mmu_free+0x72/0x180
[12655.602171] [] ? unmap_single_vma+0x1326/0x2170
[12655.602171] [] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171] [] ? unmap_vmas+0xd4/0x250
[12655.602171] [] ? exit_mmap+0x169/0x610
[12655.602171] [] ? kmem_cache_free+0x7cd/0xbb0
[12655.602171] [] ? mmput+0xd2/0x2c0
[12655.602171] [] ? do_exit+0x7e1/0x39c0
[12655.602171] [] ? get_signal+0x7a2/0x2130
[12655.602171] [] ? do_group_exit+0x101/0x490
[12655.602171] [] ? preempt_count_sub+0x11b/0x1d0
[12655.602171] [] ? get_signal+0x73e/0x2130
[12655.602171] [] ? sched_clock+0x31/0x50
[12655.602171] [] ? get_lock_stats+0x1d/0x100
[12655.602171] [] ? do_signal+0x28/0x3750
[12655.602171] [] ? vtime_account_user+0x173/0x220
[12655.602171] [] ? get_parent_ip+0x11/0x50
[12655.602171] [] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171] [] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171] [] ? trace_hardirqs_on+0xd/0x10
[12655.602171] [] ? do_notify_resume+0x69/0x100
[12655.602171] [] ? int_signal+0x12/0x17

Thanks,
Sasha