Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934306AbcJRWmY (ORCPT ); Tue, 18 Oct 2016 18:42:24 -0400
Received: from arcturus.aphlor.org ([188.246.204.175]:38712 "EHLO
        arcturus.aphlor.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755988AbcJRWmN (ORCPT );
        Tue, 18 Oct 2016 18:42:13 -0400
Date: Tue, 18 Oct 2016 18:42:05 -0400
From: Dave Jones
To: Al Viro, Chris Mason, Josef Bacik, David Sterba,
        linux-btrfs@vger.kernel.org, Linux Kernel, axboe@fb.com,
        Linus Torvalds
Subject: Re: bio linked list corruption.
Message-ID: <20161018224205.bjgloslaxcej2td2@codemonkey.org.uk>
Mail-Followup-To: Dave Jones, Al Viro, Chris Mason, Josef Bacik,
        David Sterba, linux-btrfs@vger.kernel.org, Linux Kernel,
        axboe@fb.com, Linus Torvalds
References: <20161011144507.okg6baqvodn2m2lh@codemonkey.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161011144507.okg6baqvodn2m2lh@codemonkey.org.uk>
User-Agent: NeoMutt/20161014 (1.7.1)
X-Spam-Flag: skipped (authorised relay user)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5123
Lines: 98

On Tue, Oct 11, 2016 at 10:45:07AM -0400, Dave Jones wrote:

> WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> list_add corruption. prev->next should be next (ffffe8ffff806648), but was ffffc9000067fcd8. (prev=ffff880503878b80).
> CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> ffffc90000d87458 ffffffff8d32007c ffffc90000d874a8 0000000000000000
> ffffc90000d87498 ffffffff8d07a6c1 0000002100000246 ffff88050388e880
> ffff880503878b80 ffffe8ffff806648 ffffe8ffffc06600 ffff880502808008
> Call Trace:
> [] dump_stack+0x4f/0x73
> [] __warn+0xc1/0xe0
> [] warn_slowpath_fmt+0x5a/0x80
> [] __list_add+0x89/0xb0
> [] blk_sq_make_request+0x2f8/0x350
> [] ? generic_make_request+0xec/0x240
> [] generic_make_request+0xf9/0x240
> [] submit_bio+0x78/0x150
> [] ? __percpu_counter_add+0x85/0xb0
> [] btrfs_map_bio+0x19e/0x330 [btrfs]
> [] btree_submit_bio_hook+0xfa/0x110 [btrfs]
> [] submit_one_bio+0x65/0xa0 [btrfs]
> [] read_extent_buffer_pages+0x2f0/0x3d0 [btrfs]
> [] ? free_root_pointers+0x60/0x60 [btrfs]
> [] btree_read_extent_buffer_pages.constprop.55+0xa8/0x110 [btrfs]
> [] read_tree_block+0x2d/0x50 [btrfs]
> [] read_block_for_search.isra.33+0x134/0x330 [btrfs]
> [] ? _raw_write_unlock+0x2c/0x50
> [] ? unlock_up+0x16c/0x1a0 [btrfs]
> [] btrfs_search_slot+0x450/0xa40 [btrfs]
> [] btrfs_del_csums+0xe3/0x2e0 [btrfs]
> [] __btrfs_free_extent.isra.82+0x32d/0xc90 [btrfs]
> [] __btrfs_run_delayed_refs+0x4d3/0x1010 [btrfs]
> [] ? debug_smp_processor_id+0x17/0x20
> [] ? get_lock_stats+0x19/0x50
> [] btrfs_run_delayed_refs+0x9c/0x2d0 [btrfs]
> [] btrfs_truncate_inode_items+0x888/0xda0 [btrfs]
> [] btrfs_truncate+0xe5/0x2b0 [btrfs]
> [] btrfs_setattr+0x249/0x360 [btrfs]
> [] notify_change+0x252/0x440
> [] do_truncate+0x6e/0xc0
> [] do_sys_ftruncate.constprop.19+0x10c/0x170
> [] ? __this_cpu_preempt_check+0x13/0x20
> [] SyS_ftruncate+0x9/0x10
> [] do_syscall_64+0x5c/0x170
> [] entry_SYSCALL64_slow_path+0x25/0x25

So Chris had me do a run on ext4 just for giggles. It took a while, but
eventually this fell out...

WARNING: CPU: 3 PID: 21324 at lib/list_debug.c:33 __list_add+0x89/0xb0
list_add corruption. prev->next should be next (ffffe8ffffc05648), but was ffffc9000028bcd8. (prev=ffff880503a145c0).
CPU: 3 PID: 21324 Comm: modprobe Not tainted 4.9.0-rc1-think+ #1
ffffc90000a6b7b8 ffffffff81320e3c ffffc90000a6b808 0000000000000000
ffffc90000a6b7f8 ffffffff8107a711 0000002100000246 ffff8805039f1740
ffff880503a145c0 ffffe8ffffc05648 ffffe8ffffa05600 ffff880502c39548
Call Trace:
[] dump_stack+0x4f/0x73
[] __warn+0xc1/0xe0
[] warn_slowpath_fmt+0x5a/0x80
[] __list_add+0x89/0xb0
[] blk_sq_make_request+0x2f8/0x350
[] ? generic_make_request+0xec/0x240
[] generic_make_request+0xf9/0x240
[] submit_bio+0x78/0x150
[] ? __find_get_block+0x126/0x130
[] submit_bh_wbc+0x16f/0x1e0
[] ? __end_buffer_read_notouch+0x20/0x20
[] ll_rw_block+0xa8/0xb0
[] __breadahead+0x3f/0x70
[] __ext4_get_inode_loc+0x37c/0x3d0
[] ext4_iget+0x8d/0xb90
[] ? d_alloc_parallel+0x329/0x700
[] ext4_iget_normal+0x2a/0x30
[] ext4_lookup+0x136/0x250
[] lookup_slow+0x12d/0x220
[] walk_component+0x1e7/0x310
[] ? path_init+0x4d8/0x520
[] path_lookupat+0x62/0x120
[] ? getname_flags+0x32/0x180
[] filename_lookup+0xa8/0x130
[] ? strncpy_from_user+0x46/0x170
[] ? getname_flags+0x4e/0x180
[] user_path_at_empty+0x31/0x40
[] vfs_fstatat+0x61/0xc0
[] ? __lock_acquire.isra.32+0x1cf/0x8c0
[] SYSC_newstat+0x2e/0x60
[] ? __this_cpu_preempt_check+0x13/0x20
[] SyS_newstat+0x9/0x10
[] do_syscall_64+0x5c/0x170
[] entry_SYSCALL64_slow_path+0x25/0x25

So this one isn't a btrfs specific problem as I first thought.

This sometimes reproduces within minutes, sometimes hours, which makes
it a pain to bisect. It only started showing up this merge window though.

	Dave