Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760558Ab2J2UDh (ORCPT ); Mon, 29 Oct 2012 16:03:37 -0400 Received: from mail-ee0-f46.google.com ([74.125.83.46]:36297 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759048Ab2J2UDQ (ORCPT ); Mon, 29 Oct 2012 16:03:16 -0400 MIME-Version: 1.0 Date: Mon, 29 Oct 2012 21:03:15 +0100 Message-ID: Subject: Hang in XFS reclaim on 3.7.0-rc3 From: Torsten Kaiser To: xfs@oss.sgi.com Cc: Linux Kernel Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8096 Lines: 169 After experiencing a hang of all IO yesterday ( http://marc.info/?l=linux-kernel&m=135142236520624&w=2 ), I turned on LOCKDEP after upgrading to -rc3. I then tried to replicate the load that hung yesterday and got the following lockdep report, implicating XFS instead of by stacking swap onto dm-crypt and md. Oct 29 20:27:11 thoregon kernel: [ 2675.571958] usb 7-2: USB disconnect, device number 2 Oct 29 20:30:01 thoregon kernel: [ 2844.971913] Oct 29 20:30:01 thoregon kernel: [ 2844.971920] ================================= Oct 29 20:30:01 thoregon kernel: [ 2844.971921] [ INFO: inconsistent lock state ] Oct 29 20:30:01 thoregon kernel: [ 2844.971924] 3.7.0-rc3 #1 Not tainted Oct 29 20:30:01 thoregon kernel: [ 2844.971925] --------------------------------- Oct 29 20:30:01 thoregon kernel: [ 2844.971927] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. Oct 29 20:30:01 thoregon kernel: [ 2844.971929] kswapd0/725 [HC0[0]:SC0[0]:HE1:SE1] takes: Oct 29 20:30:01 thoregon kernel: [ 2844.971931] (&(&ip->i_lock)->mr_lock){++++?.}, at: [] xfs_ilock+0x84/0xb0 Oct 29 20:30:01 thoregon kernel: [ 2844.971941] {RECLAIM_FS-ON-W} state was registered at: Oct 29 20:30:01 thoregon kernel: [ 2844.971942] [] mark_held_locks+0x7e/0x130 Oct 29 20:30:01 thoregon kernel: [ 2844.971947] [] lockdep_trace_alloc+0x63/0xc0 Oct 29 20:30:01 thoregon kernel: [ 2844.971949] [] kmem_cache_alloc+0x35/0xe0 Oct 29 20:30:01 thoregon kernel: [ 2844.971952] [] vm_map_ram+0x271/0x770 Oct 29 20:30:01 thoregon kernel: [ 2844.971955] [] _xfs_buf_map_pages+0x46/0xe0 Oct 29 20:30:01 thoregon kernel: [ 2844.971959] [] xfs_buf_get_map+0x8a/0x130 Oct 29 20:30:01 thoregon kernel: [ 2844.971961] [] xfs_trans_get_buf_map+0xa9/0xd0 Oct 29 20:30:01 thoregon kernel: [ 2844.971964] [] xfs_ifree_cluster+0x129/0x670 Oct 29 20:30:01 thoregon kernel: [ 2844.971967] [] xfs_ifree+0xe9/0xf0 Oct 29 20:30:01 thoregon kernel: [ 2844.971969] [] xfs_inactive+0x2af/0x480 Oct 29 20:30:01 thoregon kernel: [ 2844.971972] [] xfs_fs_evict_inode+0x70/0x80 Oct 29 20:30:01 thoregon kernel: [ 2844.971974] [] evict+0xaf/0x1b0 Oct 29 20:30:01 thoregon kernel: [ 2844.971977] [] iput+0x105/0x210 Oct 29 20:30:01 thoregon kernel: [ 2844.971979] [] dentry_iput+0xa0/0xe0 Oct 29 20:30:01 thoregon kernel: [ 2844.971981] [] dput+0x150/0x280 Oct 29 20:30:01 thoregon kernel: [ 2844.971983] [] sys_renameat+0x21b/0x290 Oct 29 20:30:01 thoregon kernel: [ 2844.971986] [] sys_rename+0x16/0x20 Oct 29 20:30:01 thoregon kernel: [ 2844.971988] [] system_call_fastpath+0x16/0x1b Oct 29 20:30:01 thoregon kernel: [ 2844.971992] irq event stamp: 155377 Oct 29 20:30:01 thoregon kernel: [ 2844.971993] hardirqs last enabled at (155377): [] mutex_trylock+0xfd/0x170 Oct 29 20:30:01 thoregon kernel: [ 2844.971997] hardirqs last disabled at (155376): [] mutex_trylock+0x3e/0x170 Oct 29 20:30:01 thoregon kernel: [ 2844.971999] softirqs last enabled at (155368): [] __do_softirq+0x111/0x170 Oct 29 20:30:01 thoregon kernel: [ 2844.972002] softirqs last disabled at (155353): [] call_softirq+0x1c/0x30 Oct 29 20:30:01 thoregon kernel: [ 2844.972004] Oct 29 20:30:01 thoregon kernel: [ 2844.972004] other info that might help us debug this: Oct 29 20:30:01 thoregon kernel: [ 2844.972006] Possible unsafe locking scenario: Oct 29 20:30:01 thoregon kernel: [ 2844.972006] Oct 29 20:30:01 thoregon kernel: [ 2844.972007] CPU0 Oct 29 20:30:01 thoregon kernel: [ 2844.972007] ---- Oct 29 20:30:01 thoregon kernel: [ 2844.972008] lock(&(&ip->i_lock)->mr_lock); Oct 29 20:30:01 thoregon kernel: [ 2844.972009] Oct 29 20:30:01 thoregon kernel: [ 2844.972010] lock(&(&ip->i_lock)->mr_lock); Oct 29 20:30:01 thoregon kernel: [ 2844.972012] Oct 29 20:30:01 thoregon kernel: [ 2844.972012] *** DEADLOCK *** Oct 29 20:30:01 thoregon kernel: [ 2844.972012] Oct 29 20:30:01 thoregon kernel: [ 2844.972013] 3 locks held by kswapd0/725: Oct 29 20:30:01 thoregon kernel: [ 2844.972014] #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x32/0x1f0 Oct 29 20:30:01 thoregon kernel: [ 2844.972020] #1: (&type->s_umount_key#20){++++.+}, at: [] grab_super_passive+0x3e/0x90 Oct 29 20:30:01 thoregon kernel: [ 2844.972024] #2: (&pag->pag_ici_reclaim_lock){+.+...}, at: [] xfs_reclaim_inodes_ag+0xbc/0x4f0 Oct 29 20:30:01 thoregon kernel: [ 2844.972027] Oct 29 20:30:01 thoregon kernel: [ 2844.972027] stack backtrace: Oct 29 20:30:01 thoregon kernel: [ 2844.972029] Pid: 725, comm: kswapd0 Not tainted 3.7.0-rc3 #1 Oct 29 20:30:01 thoregon kernel: [ 2844.972031] Call Trace: Oct 29 20:30:01 thoregon kernel: [ 2844.972035] [] print_usage_bug+0x1f5/0x206 Oct 29 20:30:01 thoregon kernel: [ 2844.972039] [] ? save_stack_trace+0x2a/0x50 Oct 29 20:30:01 thoregon kernel: [ 2844.972042] [] mark_lock+0x28d/0x2f0 Oct 29 20:30:01 thoregon kernel: [ 2844.972044] [] ? print_irq_inversion_bug.part.37+0x1f0/0x1f0 Oct 29 20:30:01 thoregon kernel: [ 2844.972047] [] __lock_acquire+0x57f/0x1c00 Oct 29 20:30:01 thoregon kernel: [ 2844.972049] [] ? __lock_is_held+0x59/0x70 Oct 29 20:30:01 thoregon kernel: [ 2844.972051] [] lock_acquire+0x55/0x70 Oct 29 20:30:01 thoregon kernel: [ 2844.972053] [] ? xfs_ilock+0x84/0xb0 Oct 29 20:30:01 thoregon kernel: [ 2844.972056] [] down_write_nested+0x4a/0x70 Oct 29 20:30:01 thoregon kernel: [ 2844.972058] [] ? xfs_ilock+0x84/0xb0 Oct 29 20:30:01 thoregon kernel: [ 2844.972060] [] xfs_ilock+0x84/0xb0 Oct 29 20:30:01 thoregon kernel: [ 2844.972063] [] xfs_reclaim_inode+0x136/0x340 Oct 29 20:30:01 thoregon kernel: [ 2844.972065] [] xfs_reclaim_inodes_ag+0x2bf/0x4f0 Oct 29 20:30:01 thoregon kernel: [ 2844.972066] [] ? xfs_reclaim_inodes_ag+0xe0/0x4f0 Oct 29 20:30:01 thoregon kernel: [ 2844.972069] [] xfs_reclaim_inodes_nr+0x2e/0x40 Oct 29 20:30:01 thoregon kernel: [ 2844.972071] [] xfs_fs_free_cached_objects+0x10/0x20 Oct 29 20:30:01 thoregon kernel: [ 2844.972073] [] prune_super+0x113/0x1a0 Oct 29 20:30:01 thoregon kernel: [ 2844.972075] [] shrink_slab+0x11e/0x1f0 Oct 29 20:30:01 thoregon kernel: [ 2844.972077] [] kswapd+0x690/0xa10 Oct 29 20:30:01 thoregon kernel: [ 2844.972080] [] ? __init_waitqueue_head+0x60/0x60 Oct 29 20:30:01 thoregon kernel: [ 2844.972082] [] ? shrink_lruvec+0x540/0x540 Oct 29 20:30:01 thoregon kernel: [ 2844.972084] [] kthread+0xd6/0xe0 Oct 29 20:30:01 thoregon kernel: [ 2844.972087] [] ? _raw_spin_unlock_irq+0x2b/0x50 Oct 29 20:30:01 thoregon kernel: [ 2844.972089] [] ? flush_kthread_worker+0xe0/0xe0 Oct 29 20:30:01 thoregon kernel: [ 2844.972091] [] ret_from_fork+0x7c/0xb0 Oct 29 20:30:01 thoregon kernel: [ 2844.972093] [] ? flush_kthread_worker+0xe0/0xe0 Oct 29 20:30:01 thoregon cron[24374]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons) As kswapd got stuck yesterday, I think LOCKDEP found a real problem. If you need more information, please ask. I will try to provide it. Thanks, Torsten -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/