Date: Sat, 22 Oct 2016 11:20:33 -0400
From: Dave Jones
To: Andy Lutomirski, Andy Lutomirski, Linus Torvalds, Chris Mason, Jens Axboe, Al Viro, Josef Bacik, David Sterba, linux-btrfs, Linux Kernel
Subject: Re: bio linked list corruption.
Message-ID: <20161022152033.gkmm3l75kqjzsije@codemonkey.org.uk>
References: <20161018233148.GA93792@clm-mbp.masoncoding.com>
 <20161018234248.GB93792@clm-mbp.masoncoding.com>
 <332c8e94-a969-093f-1fb4-30d89be8993e@kernel.org>
 <20161020225028.czodw54tjbiwwv3o@codemonkey.org.uk>
 <20161020230341.jsxpia2sy53xn5l5@codemonkey.org.uk>
 <20161021200245.kahjzgqzdfyoe3uz@codemonkey.org.uk>
In-Reply-To: <20161021200245.kahjzgqzdfyoe3uz@codemonkey.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 21, 2016 at 04:02:45PM -0400, Dave Jones wrote:
> > It could be worth trying this, too:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=174531fef4e8
> >
> > It occurred to me that the current code is a little bit fragile.
>
> It's been nearly 24hrs with the above changes, and it's been pretty much
> silent the whole time.
>
> The only thing of note over that time period has been a btrfs lockdep
> warning that's been around for a while, and occasional btrfs checksum
> failures, which I've been seeing for a while, but seem to have gotten
> worse since 4.8.
>
> I'm pretty confident in the disk being ok in this machine, so I think
> the checksum warnings are bogus. Chris suggested they may be the result
> of memory corruption, but there's little else going on.

The only interesting thing from last night's run was this:

BUG: Bad page state in process kworker/u8:1 pfn:4e2b70
page:ffffea00138adc00 count:0 mapcount:0 mapping:ffff88046e9fc2e0 index:0xdf0
flags: 0x400000000000000c(referenced|uptodate)
page dumped because: non-NULL mapping
CPU: 3 PID: 24234 Comm: kworker/u8:1 Not tainted 4.9.0-rc1-think+ #11
Workqueue: writeback wb_workfn (flush-btrfs-2)
 ffffc90001f97828 ffffffff8130d07c ffffea00138adc00 ffffffff819ff524
 ffffc90001f97850 ffffffff8115117f 0000000000000000 ffffea00138adc00
 400000000000000c ffffc90001f97860 ffffffff8115123a ffffc90001f978a8
Call Trace:
 [] dump_stack+0x4f/0x73
 [] bad_page+0xbf/0x120
 [] free_pages_check_bad+0x5a/0x70
 [] free_hot_cold_page+0x248/0x290
 [] free_hot_cold_page_list+0x2b/0x50
 [] release_pages+0x2bd/0x350
 [] __pagevec_release+0x22/0x30
 [] extent_write_cache_pages.isra.48.constprop.63+0x32e/0x400 [btrfs]
 [] extent_writepages+0x49/0x60 [btrfs]
 [] ? btrfs_releasepage+0x40/0x40 [btrfs]
 [] btrfs_writepages+0x23/0x30 [btrfs]
 [] do_writepages+0x1c/0x30
 [] __writeback_single_inode+0x33/0x180
 [] writeback_sb_inodes+0x2a8/0x5b0
 [] __writeback_inodes_wb+0x8d/0xc0
 [] wb_writeback+0x1e3/0x1f0
 [] wb_workfn+0xd2/0x280
 [] process_one_work+0x1d5/0x490
 [] ? process_one_work+0x175/0x490
 [] worker_thread+0x49/0x490
 [] ? process_one_work+0x490/0x490
 [] kthread+0xee/0x110
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x22/0x30
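
For anyone decoding the "flags:" line of a report like this by hand, here is a minimal Python sketch. The bit table is an assumption based on the 4.9-era ordering of enum pageflags; only bits 2 (referenced) and 3 (uptodate) are confirmed by the report itself. decode_page_flags is a hypothetical helper, not a kernel interface, and the high-order bits (zone/node encoding) are deliberately ignored.

```python
# Low-order page flag bits, ASSUMED from the 4.9-era enum pageflags ordering.
# Only "referenced" (bit 2) and "uptodate" (bit 3) are confirmed by the report.
PAGE_FLAG_BITS = {
    0: "locked",
    1: "error",
    2: "referenced",
    3: "uptodate",
    4: "dirty",
    5: "lru",
    6: "active",
    7: "slab",
}


def decode_page_flags(flags):
    """Return the names of the known low-order flag bits set in `flags`.

    Hypothetical helper for reading bug reports; bits outside the table
    (including the high-order zone/node bits) are skipped.
    """
    return [
        name
        for bit, name in sorted(PAGE_FLAG_BITS.items())
        if flags & (1 << bit)
    ]


if __name__ == "__main__":
    # Value taken from the "flags:" line of the report.
    flags = 0x400000000000000C
    print("|".join(decode_page_flags(flags)))  # prints "referenced|uptodate"
```

The low nibble 0xc is binary 1100, so only bits 2 and 3 are set, which agrees with the "(referenced|uptodate)" annotation the kernel printed.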