Date: Wed, 16 Oct 2013 17:07:50 +1100
From: Dave Chinner
To: Mikulas Patocka
Cc: Akira Hayakawa, dm-devel@redhat.com, devel@driverdev.osuosl.org,
	thornber@redhat.com, snitzer@redhat.com, gregkh@linuxfoundation.org,
	linux-kernel@vger.kernel.org, dan.carpenter@oracle.com,
	joe@perches.com, akpm@linux-foundation.org, m.chehab@samsung.com,
	ejt@redhat.com, agk@redhat.com, cesarb@cesarb.net, tj@kernel.org,
	xfs@oss.sgi.com
Subject: Re: A review of dm-writeboost
Message-ID: <20131016060750.GE4446@dastard>
References: <52550841.5030001@gmail.com> <525BAB32.5050901@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

[cc xfs@oss.sgi.com]

On Tue, Oct 15, 2013 at 08:01:45PM -0400, Mikulas Patocka wrote:
> On Mon, 14 Oct 2013, Akira Hayakawa wrote:
> > But, XFS stalls ...
> > -------------------
> > For testing,
> > I manually turns `blockup` to 1
> > when compiling Ruby is in progress
> > on XFS on a writeboost device.
> > As soon as I do it,
> > XFS starts to dump error message
> > like "metadata I/O error: ... ("xlog_iodone") error ..."
> > and after few seconds it then starts to dump
> > like "BUG: soft lockup -CPU#3 stuck for 22s!".
> > The system stalls and doesn't accept the keyboard.
> >
> > I think this behavior is caused by
> > the device always returning -EIO after turning
> > the variable to 1.
> > But why XFS goes stalling on I/O error?
>
> Because it is bloated and buggy.

How did I know you'd take that cheap shot, Mikulas? You are so
predictable...

> We have bug 924301 for XFS crash on I/O
> error...

Which is a problem with memory corruption after filling a dm
snapshot volume to 100%: shortly after XFS has shut down, the
kernel panics from memory corruption. It can't be reproduced without
filling the dm-snapshot volume to 100%, and it can't be reproduced
with any other filesystem. Crashes are also occurring randomly in
printk and the worker thread infrastructure. Memory and list
poisoning clearly indicate that worker thread lists have freed
objects on them. There are lockdep messages from the DM snapshot
code, etc.

There's actually very little to point at XFS problems other than the
first hang that was reported, where XFS was stuck in a tight loop due
to memory corruption. It reminds me of a very similar bug report and
triage we went through last week:

http://oss.sgi.com/pipermail/xfs/2013-October/030681.html

Further analysis and bisects pointed to the zram driver being buggy,
not XFS:

http://oss.sgi.com/pipermail/xfs/2013-October/030707.html

XFS has historically exposed bugs in block device drivers that no
other filesystem exposes, and so when a new block device driver gets
tested with XFS and we start seeing memory corruption symptoms, it's
a fair bet that it's not XFS that is causing it....

Just sayin'.

---

Akira, can you please post the entire set of messages you are
getting when XFS is showing problems? That way I can try to confirm
whether it's a regression in XFS or something else.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com