From: Dmitry Monakhov
Subject: Re: ext4_orphan_del() sleeps in non-journal mode
Date: Sat, 15 Sep 2012 14:06:21 +0400
Message-ID: <87zk4rwqg2.fsf@openvz.org>
To: Anatol Pomozov, linux-ext4@vger.kernel.org, Theodore Ts'o

On Fri, 14 Sep 2012 14:06:10 -0700, Anatol Pomozov wrote:
> Hi,
>
> I am debugging one issue that happens on our servers. We use ext4 with
> non-journaling mode (2.6.34 kernel) and when we try to use
> asynchronous IO we see following oops in dmesg:

Strange. I can't find the exact place where ext4_end_io_dio() invokes
ext4_orphan_del(). Can you please post the place you are talking about?

>
> <3>[ 3983.762966] bad: scheduling from the idle thread!
> <4>[ 3983.762968] Pid: 0, comm: swapper
> <4>[ 3983.762970] Call Trace:
> <4>[ 3983.762972] [] dequeue_task_idle+0x24/0x30
> <4>[ 3983.762980] [] schedule+0x2a98/0x3310
> <4>[ 3983.762985] [] ? sched_clock_cpu+0x2a/0xe0
> <4>[ 3983.762988] [] ? mempool_alloc+0xa7/0x1a0
> <4>[ 3983.762992] [] __mutex_lock_common.isra.3+0x14b/0x1d0
> <4>[ 3983.762996] [] __mutex_lock_slowpath+0x13/0x20
> <4>[ 3983.762999] [] mutex_lock+0x22/0x40
> <4>[ 3983.763004] [] ext4_orphan_del+0x4f/0x2e0
> <4>[ 3983.763008] [] ? insert_work+0x6c/0xb0
> <4>[ 3983.763011] [] ? diskmon_bio_complete+0x798/0xda0
> <4>[ 3983.763016] [] ext4_end_io_dio+0xb7/0x1d7
> <4>[ 3983.763021] [] dio_fast_end_async+0x1bc/0x1d0
> <4>[ 3983.763025] [] ? blk_complete_request+0x1a/0x20
> <4>[ 3983.763028] [] bio_endio+0x6d/0x80
> <4>[ 3983.763033] [] req_bio_endio+0x62/0xb0
> <4>[ 3983.763036] [] blk_update_request+0x142/0x3f0
> <4>[ 3983.763041] [] ? ata_qc_complete+0xae/0x1f0
> <4>[ 3983.763044] [] blk_end_bidi_request+0x2c/0xa0
> <4>[ 3983.763047] [] blk_end_request+0x10/0x20
> <4>[ 3983.763050] [] scsi_io_completion+0xac/0x520
> <4>[ 3983.763053] [] scsi_finish_command+0xb7/0x110
> <4>[ 3983.763056] [] scsi_softirq_done+0x6f/0x140
> <4>[ 3983.763059] [] blk_done_softirq+0x77/0x80
> <4>[ 3983.763062] [] __do_softirq+0x37f/0x3e0
> <4>[ 3983.763066] [] ? ack_apic_level+0x7c/0x1f0
> <4>[ 3983.763070] [] call_softirq+0x1c/0x30
> <4>[ 3983.763072] [] do_softirq+0x41/0x80
> <4>[ 3983.763074] [] irq_exit+0x49/0xa0
> <4>[ 3983.763077] [] do_IRQ+0x72/0xe0
> <4>[ 3983.763083] [] ret_from_intr+0x0/0xa
> <4>[ 3983.763084] [] ? c1e_idle+0x70/0x170
> <4>[ 3983.763089] [] cpu_idle+0x90/0x130
> <4>[ 3983.763091] [] rest_init+0x7e/0x80
> <4>[ 3983.763094] [] start_kernel+0x3b7/0x3c3
> <4>[ 3983.763097] [] x86_64_start_reservations+0x141/0x145
> <4>[ 3983.763101] [] x86_64_start_kernel+0x117/0x11e
>
>
> So the problem is that ext4_orphan_del() wants to sleep in softirq
> context. I started debugging and here are some questions.
>
> The first question is why ext4_orphan_del() sleeps in no-journal mode
> at all. It gets mutex to manipulate with i_orphan list but this list
> is used only in journaling mode. In non-journal mode (in my case) both
> ext4_orphan_del() and ext4_orphan_add() should be no-op.
>
> ext4_orphan_del() gets mutex in no-journal mode when it is called with
> NULL as a first parameter. There are 10 places in fs/ext4 where it
> happens:
>
> $ git grep "ext4_orphan_del(NULL"
> fs/ext4/indirect.c:845: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:249: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:281: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:956: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:1069: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:1111: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:1177: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:4338: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:4365: ext4_orphan_del(NULL, inode);
> fs/ext4/migrate.c:516: ext4_orphan_del(NULL, tmp_inode);
>
>
> There was a change that fixes ext4_orphan_del(NULL) issue in
> ext4_setattr for no-journal mode 3d287de3b828 . And I think we should
> fix all other places as well.
>
> There are several possible solutions for this issue:
> 1) Pass handle received by ext4_journal_current_handle() or similar.
> Why do we pass NULL at all when we can use the handle? I see that in
> some functions we already have "handle" variable that we can re-use.
> 2) Follow the way used by Dmitry and call ext4_orphan_del only if
> ext4_orphan_add was successful *and* handle is valid. This is not
> always possible as not all _del() are paired with _add() in the same
> function.
> 3) Inside ext4_orphan_del() and ext4_orphan_add() check if journal is
> enabled. Do nothing if this is no-journal mode. What is the best way
> to check no-journal mode? Is it just "if (EXT4_SB(sb)->s_journal) ..."
>
> It seems that #1 is the best way.
>
> PS once this no-journal issue will be clarified I'll take a look at
> sleeping issue in journaling mode.
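
For reference, option 3 from the quoted list could be sketched roughly as
below. This is a userspace model only, not kernel code and not a tested
patch: struct mock_sb, struct mock_inode, struct mock_journal and
mock_orphan_del() are hypothetical stand-ins, and the counter merely
represents the mutex_lock() call that shows up in the trace. It assumes
that a NULL s_journal pointer is a sufficient test for no-journal mode,
which is exactly the open question in point 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
struct mock_journal {
    int unused;
};

struct mock_sb {
    struct mock_journal *s_journal; /* NULL models no-journal mode (assumption) */
    int orphan_lock_taken;          /* counts simulated mutex_lock() calls */
};

struct mock_inode {
    struct mock_sb *i_sb;
    int on_orphan_list;             /* 1 while the inode is on the orphan list */
};

/*
 * Model of option 3: return early when there is no journal, so the
 * orphan-list lock (which may sleep) is never touched in that mode.
 * Returns 0 on success.
 */
int mock_orphan_del(void *handle, struct mock_inode *inode)
{
    struct mock_sb *sb;

    assert(inode != NULL && inode->i_sb != NULL);
    sb = inode->i_sb;
    (void)handle; /* the handle is not needed for this illustration */

    /* No journal means no orphan list to maintain: do nothing. */
    if (sb->s_journal == NULL)
        return 0;

    sb->orphan_lock_taken++;   /* stands in for taking the orphan mutex */
    inode->on_orphan_list = 0; /* unlink the inode from the orphan list */
    return 0;
}
```

With s_journal set to NULL the call returns 0 without touching the lock
counter. With a journal present the lock is still taken, so this guard
says nothing about the sleeping issue in journaling mode mentioned in
the PS.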