Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760024AbZFNOPN (ORCPT ); Sun, 14 Jun 2009 10:15:13 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755168AbZFNOPB (ORCPT ); Sun, 14 Jun 2009 10:15:01 -0400 Received: from tau.jukie.net ([216.239.93.128]:58920 "EHLO tau.jukie.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754799AbZFNOO7 (ORCPT ); Sun, 14 Jun 2009 10:14:59 -0400 Date: Sun, 14 Jun 2009 10:14:59 -0400 From: Bart Trojanowski To: David Howells , linux-kernel@vger.kernel.org Cc: linux-cachefs@redhat.com, linux-nfs@vger.kernel.org, linux-mm@kvack.org Subject: Re: [v2.6.30 nfs+fscache] swapper: possible circular locking dependency detected Message-ID: <20090614141459.GA5543@jukie.net> References: <20090613182721.GA24072@jukie.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090613182721.GA24072@jukie.net> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9711 Lines: 217 Hi, yesterday [1] I reported a kswapd hanging in D state when using fscache for nfs. [1] http://article.gmane.org/gmane.linux.kernel.mm/35601 Since then I rebuilt the kernel with lockdep and booted to find out that by default... BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. ... lockdep wasn't going to help. I increased the value by 128 (hence the 'dirty' in the version) and now I am seeing the following report shortly after boot. If you scroll down past the locking report, you'll see 'CacheFiles: Lookup failed'. This shows up after heavy use (with cat) of my nfs filesystem with fscache enabled. Around that time I saw my user space program report 'Permission denied'. Let me know if you need more info. -Bart ---- 8< ---- ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-kvm3-dirty #4 ------------------------------------------------------- swapper/0 is trying to acquire lock: (&cwq->lock){..-...}, at: [] __queue_work+0x1d/0x43 but task is already holding lock: (&q->lock){-.-.-.}, at: [] __wake_up+0x27/0x55 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&q->lock){-.-.-.}: [] __lock_acquire+0x1350/0x16b4 [] lock_acquire+0xc7/0xf3 [] _spin_lock_irqsave+0x4f/0x86 [] __wake_up+0x27/0x55 [] insert_work+0x9a/0xa6 [] __queue_work+0x2f/0x43 [] queue_work_on+0x4a/0x53 [] queue_work+0x1f/0x21 [] call_usermodehelper_exec+0x8e/0xbc [] kobject_uevent_env+0x3b0/0x3ee [] kobject_uevent+0xb/0xd [] kset_register+0x37/0x3f [] bus_register+0xf3/0x280 [] platform_bus_init+0x2c/0x44 [] driver_init+0x1d/0x29 [] kernel_init+0x123/0x18c [] child_rip+0xa/0x20 [] 0xffffffffffffffff -> #0 (&cwq->lock){..-...}: [] 0xffffffffffffffff other info that might help us debug this: 1 lock held by swapper/0: #0: (&q->lock){-.-.-.}, at: [] __wake_up+0x27/0x55 stack backtrace: Pid: 0, comm: swapper Not tainted 2.6.30-kvm3-dirty #4 Call Trace: [] print_circular_bug_tail+0xc1/0xcc [] __lock_acquire+0x1085/0x16b4 [] ? save_trace+0x3f/0xa6 [] ? __lock_acquire+0x15d2/0x16b4 [] lock_acquire+0xc7/0xf3 [] ? __queue_work+0x1d/0x43 [] _spin_lock_irqsave+0x4f/0x86 [] ? __queue_work+0x1d/0x43 [] __queue_work+0x1d/0x43 [] queue_work_on+0x4a/0x53 [] queue_work+0x1f/0x21 [] schedule_work+0x1b/0x1d [] fscache_enqueue_operation+0xec/0x11e [fscache] [] cachefiles_read_waiter+0xee/0x102 [cachefiles] [] __wake_up_common+0x4b/0x7a [] __wake_up+0x3d/0x55 [] __wake_up_bit+0x31/0x33 [] unlock_page+0x27/0x2b [] mpage_end_io_read+0x60/0x77 [] bio_endio+0x2f/0x31 [] raid_end_bio_io+0x3c/0x8c [] raid1_end_read_request+0xb4/0x13b [] ? native_sched_clock+0x32/0x60 [] bio_endio+0x2f/0x31 [] req_bio_endio+0xa7/0xc6 [] __end_that_request_first+0x1b0/0x2ca [] ? __end_that_request_first+0xd2/0x2ca [] end_that_request_data+0x2a/0x5f [] blk_end_io+0x22/0x7c [] blk_end_request+0x13/0x15 [] scsi_io_completion+0x1d8/0x458 [] scsi_finish_command+0xf1/0xfa [] scsi_softirq_done+0x125/0x12e [] blk_done_softirq+0x81/0x91 [] __do_softirq+0xbc/0x198 [] call_softirq+0x1c/0x28 [] do_softirq+0x50/0xb1 [] irq_exit+0x53/0x8d [] do_IRQ+0xb1/0xc8 [] ret_from_intr+0x0/0x16 [] finish_task_switch+0x40/0x111 [] ? __atomic_notifier_call_chain+0x0/0x87 [] ? native_safe_halt+0xb/0xd [] ? trace_hardirqs_on+0xd/0xf [] default_idle+0x71/0xc2 [] ? __atomic_notifier_call_chain+0x78/0x87 [] ? __atomic_notifier_call_chain+0x0/0x87 [] c1e_idle+0x11e/0x125 [] ? atomic_notifier_call_chain+0xf/0x11 [] cpu_idle+0x62/0xa3 [] start_secondary+0x1c1/0x1c5 ---- 8< ---- CacheFiles: Lookup failed error -105 CacheFiles: Lookup failed error -105 INFO: task cat:8884 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. cat D ffff880028012d00 0 8884 8832 ffff880436243998 0000000000000046 0000000000000000 ffff880028012d18 ffff880028012d00 00000000001d3000 000000000000d4e0 ffff880436378000 ffff88043f0d8000 ffff880436378598 0000000700000046 ffffc20000ead018 Call Trace: [] ? trace_hardirqs_on+0xd/0xf [] schedule+0xe/0x22 [] fscache_wait_bit+0xe/0x12 [fscache] [] __wait_on_bit+0x4c/0x7e [] ? fscache_wait_bit+0x0/0x12 [fscache] [] ? fscache_wait_bit+0x0/0x12 [fscache] [] out_of_line_wait_on_bit+0x6f/0x7c [] ? wake_bit_function+0x0/0x2f [] __fscache_read_or_alloc_pages+0x1a4/0x252 [fscache] [] ? nfs_readpage_from_fscache_complete+0x0/0x6b [nfs] [] __nfs_readpages_from_fscache+0x87/0x174 [nfs] [] nfs_readpages+0x11f/0x1d4 [nfs] [] ? cpuset_update_task_memory_state+0x6a/0x11e [] ? cpuset_update_task_memory_state+0x0/0x11e [] ? alloc_pages_current+0xbe/0xc7 [] __do_page_cache_readahead+0x164/0x1f1 [] ? __do_page_cache_readahead+0x80/0x1f1 [] ? print_lock_contention_bug+0x1e/0x110 [] ondemand_readahead+0x1d4/0x1e9 [] page_cache_sync_readahead+0x1c/0x1e [] generic_file_aio_read+0x23a/0x5d4 [] ? nfs_have_delegation+0x0/0x8a [nfs] [] nfs_file_read+0xeb/0xfe [nfs] [] do_sync_read+0xec/0x132 [] ? cp_new_stat+0xe7/0xf4 [] ? autoremove_wake_function+0x0/0x3d [] ? security_file_permission+0x16/0x18 [] vfs_read+0xb0/0x159 [] sys_read+0x4c/0x74 [] system_call_fastpath+0x16/0x1b INFO: lockdep is turned off. INFO: task top:10437 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. top D ffff88024d238680 0 10437 5425 ffff880085d53998 0000000000000046 0000000000000000 ffff88024d238698 ffff88024d238680 00000000001d3000 000000000000d4e0 ffff8803fa5b0000 ffff88043f0d8000 ffff8803fa5b0598 0000000700000046 ffffc20000ead018 Call Trace: [] ? trace_hardirqs_on+0xd/0xf [] schedule+0xe/0x22 [] fscache_wait_bit+0xe/0x12 [fscache] [] __wait_on_bit+0x4c/0x7e [] ? fscache_wait_bit+0x0/0x12 [fscache] [] ? fscache_wait_bit+0x0/0x12 [fscache] [] out_of_line_wait_on_bit+0x6f/0x7c [] ? wake_bit_function+0x0/0x2f [] __fscache_read_or_alloc_pages+0x1a4/0x252 [fscache] [] ? nfs_readpage_from_fscache_complete+0x0/0x6b [nfs] [] __nfs_readpages_from_fscache+0x87/0x174 [nfs] [] nfs_readpages+0x11f/0x1d4 [nfs] [] ? cpuset_update_task_memory_state+0x6a/0x11e [] ? cpuset_update_task_memory_state+0x0/0x11e [] ? alloc_pages_current+0xbe/0xc7 [] __do_page_cache_readahead+0x164/0x1f1 [] ? __do_page_cache_readahead+0x80/0x1f1 [] ondemand_readahead+0x1d4/0x1e9 [] page_cache_sync_readahead+0x1c/0x1e [] generic_file_aio_read+0x23a/0x5d4 [] ? nfs_have_delegation+0x0/0x8a [nfs] [] nfs_file_read+0xeb/0xfe [nfs] [] do_sync_read+0xec/0x132 [] ? nameidata_to_filp+0x46/0x57 [] ? autoremove_wake_function+0x0/0x3d [] ? fd_install+0x35/0x64 [] ? security_file_permission+0x16/0x18 [] vfs_read+0xb0/0x159 [] sys_read+0x4c/0x74 [] system_call_fastpath+0x16/0x1b INFO: lockdep is turned off. -- WebSig: http://www.jukie.net/~bart/sig/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/