Date: Mon, 14 Dec 2015 15:01:51 -0500
From: Chris Mason
To: Dave Jones, Linus Torvalds, Peter Zijlstra, LKML, Jon Christopherson, NeilBrown, Ingo Molnar, David Howells, Steven Whitehouse
Subject: Re: [PATCH] lock_page() doesn't lock if __wait_on_bit_lock returns -EINTR
Message-ID: <20151214200151.GA12014@clm-mbp.thefacebook.com>
References: <20151212162342.GF11257@ret.masoncoding.com> <20151213000746.GA26204@clm-mbp.thefacebook.com> <20151214183356.GA5251@fb.com>
In-Reply-To: <20151214183356.GA5251@fb.com>

On Mon, Dec 14, 2015 at 01:33:56PM -0500, Dave Jones wrote:
> On Sat, Dec 12, 2015 at 07:07:46PM -0500, Chris Mason wrote:
> > On Sat, Dec 12, 2015 at 11:41:26AM -0800, Linus Torvalds wrote:
> > > On Sat, Dec 12, 2015 at 10:33 AM, Linus Torvalds wrote:
> > > >
> > > > Peter, did that patch also handle just plain "lock_page()" case?
> > >
> > > Looking more at it, I think this all goes back to commit 743162013d40
> > > ("sched: Remove proliferation of wait_on_bit() action functions").
> > >
> > > It looks like PeterZ's pending patch should fix this, by passing in
> > > the proper TASK_UNINTERRUPTIBLE to the bit_wait_io function, and going
> > > back to signal_pending_state(). PeterZ, did I follow the history of
> > > this correctly?
> >
> > Looks right to me, I found Peter's patch and have it running now. After
> > about 6 hours my patch did eventually crash again under trinity. Btrfs has a
> > very old (from 2011) bug in the error handling path that trinity is
> > banging on.
>
> Is the other bug this one ? I've hit this quite a lot over the last 12 months,
> and now that the lock_page bug is fixed this is showing up again.
>
> page:ffffea00110d2700 count:4 mapcount:0 mapping:ffff88045b5160a0 index:0x0
> flags: 0x8000000000000806(error|referenced|private)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))

[ snip ]

> [] prepare_uptodate_page+0x39/0x80 [btrfs]
> [] prepare_pages+0x19e/0x210 [btrfs]

This should be the second call to prepare_uptodate_page() in
prepare_pages(). If we get an error on the first call, and the write
only spans a single page, we'll call prepare_uptodate_page a second time
on an unlocked page.

I'll send out the patch a little later this afternoon.

> [] __btrfs_buffered_write+0x351/0x8a0 [btrfs]
> [] ? btrfs_dirty_pages+0xf0/0xf0 [btrfs]
> [] ? generic_file_direct_write+0x1aa/0x2c0
> [] ? generic_file_read_iter+0xa00/0xa00
> [] btrfs_file_write_iter+0x6dd/0x800 [btrfs]
> [] __vfs_write+0x21d/0x260
> [] ? __vfs_read+0x260/0x260
> [] ? __lock_is_held+0x92/0xd0
> [] ? preempt_count_sub+0xc1/0x120
> [] ? percpu_down_read+0x57/0xa0
> [] ? __sb_start_write+0xb4/0xf0
> [] vfs_write+0xf6/0x260
> [] SyS_write+0xbf/0x160
> [] ? SyS_read+0x160/0x160
> [] ? trace_hardirqs_on_thunk+0x17/0x19
> [] entry_SYSCALL_64_fastpath+0x12/0x6b

-chris
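
The single-page failure described in the reply can be modelled in user space. What follows is a minimal sketch, not the btrfs code and not the patch mentioned in the message. The names struct model_page, model_prepare_uptodate_page(), model_prepare_pages(), hits_unlocked_page(), and the read_fails and guard flags are all hypothetical. The model assumes that a failed read leaves the page unlocked, and the guard (skip the last-page call once the first-page call has failed) is only one plausible way to avoid the second call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the two bits that matter. */
struct model_page {
	bool locked;
	bool uptodate;
};

/* Set when a call starts with the page unlocked (the VM_BUG_ON_PAGE case). */
static bool saw_unlocked_page;

/*
 * Simplified model of prepare_uptodate_page(). Assumption: the page arrives
 * locked, the read drops the lock, and a failed read returns an error with
 * the page still unlocked.
 */
static int model_prepare_uptodate_page(struct model_page *page, bool read_fails)
{
	if (!page->locked)
		saw_unlocked_page = true;
	if (page->uptodate)
		return 0;
	page->locked = false;		/* the read path drops the page lock */
	if (read_fails)
		return -EIO;
	page->uptodate = true;
	page->locked = true;		/* relock after a successful read */
	return 0;
}

/*
 * Model of the per-page loop in prepare_pages(). With guard == false the
 * last-page call runs even if the first-page call failed; with guard == true
 * it is skipped. Returns the first error that ends the loop, or 0.
 */
static int model_prepare_pages(struct model_page *pages, size_t num_pages,
			       bool read_fails, bool guard)
{
	int err = 0;

	assert(num_pages >= 1);
	for (size_t i = 0; i < num_pages; i++) {
		pages[i].locked = true;	/* each page starts out locked */
		if (i == 0)
			err = model_prepare_uptodate_page(&pages[i], read_fails);
		if ((!guard || !err) && i == num_pages - 1)
			err = model_prepare_uptodate_page(&pages[i], read_fails);
		if (err)
			return err;
	}
	return 0;
}

/* Runs one scenario and reports whether an unlocked page was passed in. */
static bool hits_unlocked_page(size_t num_pages, bool read_fails, bool guard)
{
	struct model_page pages[4] = {0};

	assert(num_pages <= 4);
	saw_unlocked_page = false;
	(void)model_prepare_pages(pages, num_pages, read_fails, guard);
	return saw_unlocked_page;
}
```

In this model, hits_unlocked_page(1, true, false) returns true: with a single page, index 0 is both the first and the last page, so the second call runs on the page the failed first call left unlocked. With two pages, or with the guard enabled, the same read failure returns the error before any unlocked page is touched.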