Date: Thu, 17 Oct 2013 08:42:00 +1100
From: Dave Chinner
To: Akira Hayakawa
Cc: mpatocka@redhat.com, dm-devel@redhat.com, devel@driverdev.osuosl.org,
	thornber@redhat.com, snitzer@redhat.com, gregkh@linuxfoundation.org,
	linux-kernel@vger.kernel.org, dan.carpenter@oracle.com, joe@perches.com,
	akpm@linux-foundation.org, m.chehab@samsung.com, ejt@redhat.com,
	agk@redhat.com, cesarb@cesarb.net, tj@kernel.org, xfs@oss.sgi.com
Subject: Re: A review of dm-writeboost
Message-ID: <20131016214200.GI4446@dastard>
References: <52550841.5030001@gmail.com> <525BAB32.5050901@gmail.com>
	<20131016060750.GE4446@dastard> <525E6BBE.40004@gmail.com>
	<20131016110146.GH4446@dastard> <525E83E4.9090803@gmail.com>
In-Reply-To: <525E83E4.9090803@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 16, 2013 at 09:17:40PM +0900, Akira Hayakawa wrote:
> Dave
>
> > XFS shuts down because you've returned EIO to a log IO. That's a
> > fatal error. If you do the same to an ext4 journal write, it will do
> > the equivalent of shut down (e.g. complain and turn read-only).
> You mean block device should not return -EIO anyway if
> it doesn't want XFS to suddenly shut down?

Yes. EIO means an IO error has occurred.
That causes failure paths to be triggered in the upper layers.

I really don't understand what you are trying to achieve with this
"blockup" thing. If something goes wrong with the device, then you
*cannot recover* by sending EIO to any new IOs and then continuing
on at a later time as though nothing has happened. The moment a
filesystem gets EIO from a metadata write, it is likely to be
corrupted and if you continue onwards after that you simply
propagate the corruption.

> As Mikulas said, connection failure often be the cause of
> I/O error from the underlying devices.

Connection failure is *rarely* the cause of IO errors, except in
environments where SANs are in use. Even then multipathing makes
fatal connection failure a rare occurrence. Broken hardware is a
much more common cause of problems at the storage layers.

> That ext4 and XFS are both determine shutting down in
> erroneous journal writes is also due to the limitation of journal write?
> or just a compromise in implementation? This is just for my curiosity.

A failed, unrecoverable journal write violates the filesystem
consistency model of any journalling filesystem. Operations must be
stopped and the hardware and filesystem must be repaired, otherwise
loss of data will occur.

i.e. You're telling the filesystem that it's had a fatal IO error by
returning EIO, and the filesystems are treating it as though they've
seen a fatal IO error.

Simple rule: Don't complete IOs with EIO if you haven't had a fatal
IO error.

> 	struct wb_cache *cache = data;
> 	struct wb_device *wb = cache->wb;
> 	unsigned long intvl;
>
> 	while (!kthread_should_stop()) {
>
> 		wait_on_blockup();

Ugh. You should be using a workqueue with timed work for this.

BTW, you're missing the handling needed by these kernel threads
for suspend-to-disk/ram....

> [ 68.825016] XFS (dm-3): Mounting Filesystem
> [ 68.847027] XFS (dm-3): Ending clean mount
> [ 72.100112] device-mapper: writeboost: err@dm_safe_io_internal() system is blocked up on I/O error.
set blockup to 0 after checkup.
> [ 72.109702] device-mapper: writeboost: err@migrate_proc() system is blocked up on I/O error. set blockup to 0 after checkup.
> [ 72.812097] device-mapper: writeboost: err@modulator_proc() system is blocked up on I/O error. set blockup to 0 after checkup.
> [ 73.894429] Buffer I/O error on device dm-3, logical block 98354
> [ 73.895824] lost page write due to I/O error on dm-3

Data IO has been lost due to EIOs. You've corrupted user files when
this error is emitted.

...

> [ 73.930022] XFS (dm-3): metadata I/O error: block 0x40 ("xfs_buf_iodone_callbacks") error 5 numblks 16
> [ 74.036759] XFS (dm-3): metadata I/O error: block 0x300c7f ("xlog_iodone") error 5 numblks 64
> [ 74.043456] XFS (dm-3): xfs_do_force_shutdown(0x2) called from line 1161 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa03a6417
> [ 74.047556] XFS (dm-3): Log I/O Error Detected. Shutting down filesystem
> [ 74.049893] XFS (dm-3): Please umount the filesystem and rectify the problem(s)
> [ 74.051467] XFS (dm-3): metadata I/O error: block 0x300cbf ("xlog_iodone") error 5 numblks 64
> [ 74.053190] XFS (dm-3): xfs_do_force_shutdown(0x2) called from line 1161 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa03a6417

And these are all the log buffers containing uncommitted changes
being aborted due to EIO. The filesystem state in memory now doesn't
match the state on disk, and so it's effectively corrupt and shuts
down.

....

> [ 100.052005] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1H:215]
...
> [ 100.052005] RIP: 0010:[] [] do_raw_spin_lock+0x16/0x23
> [ 100.052005] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
> [ 100.052005] Call Trace:
> [ 100.052005] [] ? xfs_buf_iodone+0x1b/0x49 [xfs]
> [ 100.052005] [] ? xfs_buf_do_callbacks+0x22/0x30 [xfs]
> [ 100.052005] [] ? xfs_buf_iodone_callbacks+0x16b/0x1c4 [xfs]
> [ 100.052005] [] ? process_one_work+0x191/0x294
> [ 100.052005] [] ? worker_thread+0x121/0x1e7
> [ 100.052005] [] ?
rescuer_thread+0x269/0x269
> [ 100.052005] [] ? kthread+0x81/0x89
> [ 100.052005] [] ? __kthread_parkme+0x5d/0x5d
> [ 100.052005] [] ? ret_from_fork+0x7c/0xb0
> [ 100.052005] [] ? __kthread_parkme+0x5d/0x5d

You need to compile your kernel with frame pointers enabled so we get
reliable stack traces. I think it's stuck on a spinlock in
xfs_buf_iodone, which would imply the AIL lock.

.....

> [ 100.244006] CPU: 2 PID: 3167 Comm: xfsaild/dm-3 Tainted: G O 3.12.0-rc1 #8

FWIW, you should probably be testing the latest Linus kernel
(3.12-rc5, IIRC) rather than -rc1....

....

> [ 100.244006] RIP: 0010:[] [] do_raw_spin_lock+0x16/0x23
> [ 100.244006] Call Trace:
> [ 100.244006] [] ? xfs_trans_committed_bulk+0x2f/0x19d [xfs]
> [ 100.244006] [] ? _xfs_buf_ioapply+0x271/0x29c [xfs]
> [ 100.244006] [] ? remove_wait_queue+0xe/0x48
> [ 100.244006] [] ? xlog_wait+0x62/0x6b [xfs]
> [ 100.244006] [] ? try_to_wake_up+0x190/0x190
> [ 100.244006] [] ? xlog_state_get_iclog_space+0x5a/0x1fb [xfs]
> [ 100.244006] [] ? __cache_free.isra.46+0x178/0x187
> [ 100.244006] [] ? xlog_cil_committed+0x2f/0xe6 [xfs]
> [ 100.244006] [] ? xlog_cil_push+0x2f6/0x311 [xfs]
> [ 100.244006] [] ? mmdrop+0xd/0x1c
> [ 100.244006] [] ? xlog_cil_force_lsn+0x71/0xdd [xfs]
> [ 100.244006] [] ? _xfs_log_force+0x55/0x1a0 [xfs]
> [ 100.244006] [] ? xfs_log_force+0x1f/0x4e [xfs]
> [ 100.244006] [] ? xfsaild+0x144/0x4cd [xfs]
> [ 100.244006] [] ? finish_task_switch+0x7f/0xaa
> [ 100.244006] [] ? xfs_trans_ail_cursor_first+0x76/0x76 [xfs]
> [ 100.244006] [] ? xfs_trans_ail_cursor_first+0x76/0x76 [xfs]
> [ 100.244006] [] ? kthread+0x81/0x89
> [ 100.244006] [] ? __kthread_parkme+0x5d/0x5d
> [ 100.244006] [] ? ret_from_fork+0x7c/0xb0
> [ 100.244006] [] ? __kthread_parkme+0x5d/0x5d

It's stuck on a spin lock, but I don't know what function it's in
because the stack trace is indeterminate (i.e. need frame pointers
enabled). It might be the AIL lock (as it's the xfsaild), but I
can't tell.
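
For reference, a minimal sketch of the .config fragment that the frame
pointer request corresponds to. It assumes an x86 build of roughly this
kernel vintage, where the option sits under "Kernel hacking"; check the
name against your own tree. The "?" in front of each entry in the traces
above marks a return address that was found on the stack but could not
be verified by walking frame pointers, which is why the call chains
cannot be trusted as shown.

```
# Assumed x86 option name; verify it in your tree before relying on it.
# Lets the unwinder follow real frames instead of guessing from stack contents.
CONFIG_FRAME_POINTER=y
```

After rebuilding, entries that the unwinder can verify should no longer
carry the "?" prefix.
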
> [ 100.436010] BUG: soft lockup - CPU#4 stuck for 22s! [kworker/4:2:537]
...
> [ 100.436010] Workqueue: xfs-reclaim/dm-3 xfs_reclaim_worker [xfs]
> [ 100.436010] RIP: 0010:[] [] do_raw_spin_lock+0x13/0x23
> [ 100.436010] Call Trace:
> [ 100.436010] [] ? xfs_iflush_abort+0x35/0x9a [xfs]
> [ 100.436010] [] ? xfs_reclaim_inode+0x85/0x246 [xfs]
> [ 100.436010] [] ? xfs_reclaim_inodes_ag+0x147/0x1fc [xfs]
> [ 100.436010] [] ? try_to_wake_up+0x190/0x190
> [ 100.436010] [] ? __wake_up_common+0x42/0x78
> [ 100.436010] [] ? fold_diff+0x22/0x2e
> [ 100.436010] [] ? lock_timer_base.isra.35+0x23/0x48
> [ 100.436010] [] ? internal_add_timer+0xd/0x28
> [ 100.436010] [] ? __mod_timer+0xfa/0x10c
> [ 100.436010] [] ? xfs_reclaim_inodes+0x16/0x1b [xfs]
> [ 100.436010] [] ? xfs_reclaim_worker+0x15/0x1e [xfs]
> [ 100.436010] [] ? process_one_work+0x191/0x294
> [ 100.436010] [] ? worker_thread+0x121/0x1e7
> [ 100.436010] [] ? rescuer_thread+0x269/0x269
> [ 100.436010] [] ? kthread+0x81/0x89
> [ 100.436010] [] ? __kthread_parkme+0x5d/0x5d
> [ 100.436010] [] ? ret_from_fork+0x7c/0xb0
> [ 100.436010] [] ? __kthread_parkme+0x5d/0x5d

Also stuck on a spin lock, but again it is not obvious what function
it is in and hence what spinlock is affected. xfs_iflush_abort()
does take the AIL lock, so that might be it.

> [ 100.628005] BUG: soft lockup - CPU#6 stuck for 22s! [script:3151]
> [ 100.628005] RIP: 0010:[] [] do_raw_spin_lock+0x16/0x23
> [ 100.628005] Call Trace:
> [ 100.628005] [] ? xfs_ail_push_all+0x13/0x4f [xfs]
> [ 100.628005] [] ? xfs_reclaim_inodes_nr+0x1a/0x34 [xfs]
> [ 100.628005] [] ? super_cache_scan+0x121/0x13e
> [ 100.628005] [] ? shrink_slab+0x1e3/0x2f9
> [ 100.628005] [] ? iput+0x34/0x13d
> [ 100.628005] [] ? do_coredump+0xbc3/0xbc3
> [ 100.628005] [] ? drop_caches_sysctl_handler+0x65/0x76
> [ 100.628005] [] ? proc_sys_call_handler+0x98/0xbf
> [ 100.628005] [] ? vfs_write+0x9e/0x104
> [ 100.628005] [] ? SyS_write+0x51/0x85
> [ 100.628005] [] ?
system_call_fastpath+0x16/0x1b

That can only be stuck on the AIL spin lock.

So, I've just audited all the uses of the AIL lock, and I cannot
find an unbalanced user of the AIL lock. If we've leaked the spin
lock, it's not an obvious or easy to trigger bug. Can you turn on
lockdep as well as CONFIG_XFS_DEBUG and see what warnings that
throws?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com