Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754954AbaKRVzv (ORCPT ); Tue, 18 Nov 2014 16:55:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:36773 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753954AbaKRVzu (ORCPT ); Tue, 18 Nov 2014 16:55:50 -0500 Date: Tue, 18 Nov 2014 16:55:40 -0500 From: Don Zickus To: Thomas Gleixner Cc: Dave Jones , Linus Torvalds , Linux Kernel , the arch/x86 maintainers Subject: Re: frequent lockups in 3.18rc4 Message-ID: <20141118215540.GD35311@redhat.com> References: <20141116014006.GA5016@redhat.com> <20141117170359.GA1382@redhat.com> <20141118020959.GA2091@redhat.com> <20141118023930.GA2871@redhat.com> <20141118145234.GA7487@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 18, 2014 at 07:54:17PM +0100, Thomas Gleixner wrote: > On Tue, 18 Nov 2014, Dave Jones wrote: > > Here's the first hit. Curiously, one cpu is missing. > > I don't think so > > > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837] > > > irq event stamp: 2258092 > > hardirqs last enabled at (2258091): [] get_page_from_freelist+0x555/0xaa0 > > hardirqs last disabled at (2258092): [] apic_timer_interrupt+0x6a/0x80 > > So that means we are in the timer interrupt and handling > watchdog_timer_fn. > > > CPU: 1 PID: 17837 Comm: trinity-c180 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526] > > task: ffff8801575e4680 ti: ffff880202434000 task.ti: ffff880202434000 > > RIP: 0010:[] [] bad_range+0x0/0x90 > > So the softlockup tells us, that the high priority watchdog thread was > not able to touch the watchdog timestamp. That means this task was > hogging the CPU for 20+ seconds. I have no idea how that happens in > that call chain. > > Call Trace: > [] ? get_page_from_freelist+0x564/0xaa0 > [] ? get_parent_ip+0x11/0x50 > [] __alloc_pages_nodemask+0x230/0xd20 > [] ? get_parent_ip+0x11/0x50 > [] ? mark_held_locks+0x75/0xa0 > [] alloc_pages_vma+0xee/0x1b0 > [] ? shmem_alloc_page+0x6e/0xc0 > [] shmem_alloc_page+0x6e/0xc0 > [] ? get_parent_ip+0x11/0x50 > [] ? preempt_count_sub+0x7b/0x100 > [] ? __percpu_counter_add+0x86/0xb0 > [] ? __vm_enough_memory+0x66/0x1c0 > [] ? find_get_entry+0x5/0x230 > [] ? cap_vm_enough_memory+0x4c/0x60 > [] shmem_getpage_gfp+0x630/0xa40 > [] ? match_held_lock+0x111/0x160 > [] shmem_write_begin+0x42/0x70 > [] generic_perform_write+0xd4/0x1f0 > [] __generic_file_write_iter+0x162/0x350 > [] ? new_sync_read+0xd0/0xd0 > [] generic_file_write_iter+0x3f/0xb0 > [] ? __generic_file_write_iter+0x350/0x350 > [] do_iter_readv_writev+0x78/0xc0 > [] do_readv_writev+0xd8/0x2a0 > [] ? __generic_file_write_iter+0x350/0x350 > [] ? lock_release_holdtime.part.28+0xe6/0x160 > [] ? __generic_file_write_iter+0x350/0x350 > [] ? get_parent_ip+0x11/0x50 > [] ? preempt_count_sub+0x7b/0x100 > [] ? rcu_read_lock_held+0x6e/0x80 > [] vfs_writev+0x3c/0x50 > [] SyS_writev+0x5c/0x100 > [] tracesys_phase2+0xd4/0xd9 > > But this gets pages for a write into shmem and the other one below > does a madvise on a shmem map. Coincidence? > > > sending NMI to other CPUs: > > So here we kick the other cpus > > > NMI backtrace for cpu 2 > > CPU: 2 PID: 15913 Comm: trinity-c141 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526] > > task: ffff880223229780 ti: ffff8801afca0000 task.ti: ffff8801afca0000 > > RIP: 0010:[] [] generic_exec_single+0xee/0x1a0 > > [] ? do_flush_tlb_all+0x60/0x60 > > [] ? do_flush_tlb_all+0x60/0x60 > > [] smp_call_function_single+0x6a/0xe0 > > [] ? cpumask_next_and+0x4f/0xb0 > > [] ? do_flush_tlb_all+0x60/0x60 > > [] smp_call_function_many+0x2b9/0x320 > > [] flush_tlb_mm_range+0xe0/0x370 > > [] tlb_flush_mmu_tlbonly+0x42/0x50 > > [] unmap_single_vma+0x6b8/0x900 > > [] zap_page_range_single+0xfc/0x160 > > [] unmap_mapping_range+0x134/0x190 > > [] shmem_fallocate+0x4fd/0x520 > > [] ? prepare_to_wait+0x27/0x90 > > [] do_fallocate+0x132/0x1d0 > > [] SyS_madvise+0x398/0x870 > > [] ? rcu_read_lock_sched_held+0x4e/0x6a > > [] ? syscall_trace_enter_phase2+0xa7/0x2b0 > > [] tracesys_phase2+0xd4/0xd9 > > We've seen that before > > > NMI backtrace for cpu 0 > > INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 37.091 msecs > > So it complains that the backtrace handler took 37 msec, which is > indeed long for just dumping a stack trace. > > > CPU: 0 PID: 15851 Comm: trinity-c80 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 36/402 20526] > > task: ffff8801874e8000 ti: ffff88022baec000 task.ti: ffff88022baec000 > > RIP: 0010:[] [] preempt_count_add+0x0/0xc0 > > RSP: 0000:ffff880244003c30 EFLAGS: 00000092 > > RAX: 0000000000000001 RBX: ffffffffa9edb560 RCX: 0000000000000001 > > RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001 > > RBP: ffff880244003c48 R08: 0000000000000000 R09: 0000000000000001 > > R10: 0000000000000000 R11: ffff8801874e88c8 [23543.271956] NMI backtrace for cpu 3 > > So here we mangle CPU3 in and lose the backtrace for cpu0, which might > be the real interesting one .... Dave, Can you provide another dump? The hope is we get something not mangled? The other option we have done in RHEL is panic the system and let kdump capture the memory. Then we can analyze the vmcore for the stack trace cpu0 stored in memory to get a rough idea where it might be if the cpu isn't responding very well. Cheers, Don > > > INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 100.612 msecs > > This one takes 100ms. > > > CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 37/402 20526] > > task: ffff880242b5c680 ti: ffff880242b78000 task.ti: ffff880242b78000 > > RIP: 0010:[] [] intel_idle+0xd5/0x180 > > So that one is simply idle. > > > INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 125.739 msecs > > > > And we get another backtrace handler taking too long. Of course we > cannot tell which of the 3 complaints comes from which cpu, because > the printk lacks a cpuid. > > Thanks, > > tglx > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/