Date: Mon, 14 Jun 2010 11:29:33 +1000
From: Dave Chinner
To: Ilia Mirkin
Cc: Roman Kononov, xfs@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: WARNING in xfs_lwr.c, xfs_write()
Message-ID: <20100614012933.GB6590@dastard>

On Sun, Jun 13, 2010 at 07:10:30PM -0400, Ilia Mirkin wrote:
> On Sun, Jun 13, 2010 at 6:47 PM, Dave Chinner wrote:
> > On Sat, Jun 12, 2010 at 01:00:52AM -0400, Ilia Mirkin wrote:
> >> Sorry to pick up an old-ish thread, but I have a similar situation:
> >>
> >> On Sun, May 23, 2010 at 9:19 PM, Dave Chinner wrote:
> >> > On Sun, May 23, 2010 at 09:23:44AM -0500, Roman Kononov wrote:
> >> >> On 2010-05-23, 20:18:56 +1000, Dave Chinner wrote:
> >> >> > Can you find out what the application is triggering this?
> >>
> >> I noticed this happening with mysql and xtrabackup -- the latter opens
> >> up mysql's files while mysql is still running (and modifying its own
> >> files) and backs them up in a (hopefully) safe way.
> >
> > That's not safe at all - there's no guarantee you'll end up with a
> > consistent database image doing backups like this. Have you ever
> > tried to restore and use one of these backups?
>
> Yep, works great. [Used it to initialize a slave, did the full
> checksums, so it's unlikely to have randomly corrupt data.]

You were lucky, I'd say. xtrabackup is supposed to be tightly
integrated with mysql, so perhaps it should be using the same IO
methods that the admin has selected for their database. Maybe you
need to talk to the xtrabackup folks to get them to add a "backup
via direct IO" method if the mysql database is using direct IO, so
that other users don't have the same issues.

> >> Would it be safe to remove the warning at
> >> fs/xfs/linux-2.6/xfs_lrw.c:651 (which looks like it has moved to
> >> xfs_file.c in 2.6.34)? It seems undesirable to get a long stream of
> >> these (51 in this particular instance) every time we run a backup...
> >
> > You can if you want, but then you won't know when your backup or
> > database might have been corrupted, right?
>
> No, but I wouldn't know that without the warnings either -- for all I
> know xtrabackup could be buggy in all kinds of ways. The only real way
> to check is to use the backup data in some way.

Yup, but you still can't rely on the backup for disaster recovery
without first doing a full application-level consistency check on
it if one of these warnings was generated while it was being taken.

> >> IOW, is the warning purely something along the lines of "Userspace is
> >> doing something wonky, but the underlying FS will still be fine no
> >> matter what" kind of deal, or could there be an actual problem with
> >> the XFS metadata itself?
> >
> > Nothing wrong with the filesystem metadata will occur - as I said
> > earlier in the thread, this is a warning to tell us that data
> > corruption is possible due to userspace doing something stupid, not
> > a filesystem bug.
>
> OK, thanks for the clarification. Ideally these wouldn't taint the
> kernel either

Why not? Something has potentially compromised the integrity of the
system, and that's exactly what the taint flag is there for.
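A minimal sketch of checking for that taint from userspace, assuming a
Linux system that exposes /proc/sys/kernel/tainted (a decimal bit mask).
Bit 9 of the mask is TAINT_WARN, shown as 'W' in oops headers, and is set
once any kernel warning has fired. The helper names are hypothetical. The
bit is sticky until reboot and does not say which subsystem warned, so the
backtrace in the kernel log is still what identifies the XFS warning.

```python
import os

# Bit 9 of the kernel taint mask is TAINT_WARN ('W'): it is set once the
# kernel has emitted a WARN-style warning with a backtrace.
TAINT_WARN_BIT = 9
TAINT_PATH = "/proc/sys/kernel/tainted"


def has_warn_taint(mask: int) -> bool:
    """Return True if the TAINT_WARN bit is set in a taint mask."""
    return bool((mask >> TAINT_WARN_BIT) & 1)


def read_taint_mask(path: str = TAINT_PATH) -> int:
    """Read the current taint mask (a decimal integer) from procfs."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    if os.path.exists(TAINT_PATH):
        mask = read_taint_mask()
        state = "set" if has_warn_taint(mask) else "not set"
        print(f"taint mask {mask}: TAINT_WARN (W) is {state}")
    else:
        print(f"{TAINT_PATH} not available (not Linux?)")
```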
> -- perhaps these can be downgraded to a message that
> explicitly suggests that nothing is wrong with kernel-space things,
> only user-space? The backtrace doesn't really get you much, so really
> all you want to show is the offending process...

They are there to be meaningful to the XFS developer, not the user,
and they convey all the information we need to start a deeper
investigation. IOWs, it's a defensive mechanism that we have in
place because direct IO effectively hands responsibility for data
integrity to userspace. Hence when userspace is doing something
obviously dangerous to data integrity we want loud, noticeable
warnings so that the filesystem is not blamed for the data
corruption that will inevitably occur.

And from an "I read it on the interwebs so it must be true"
perspective, without a loud obnoxious warning we'll never hear about
problems until someone flames us about silent data corruption on a
random blog that gets slashdotted and then referenced for the next
10 years as the next canonical "XFS eats my data!" reference for the
clueless....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com