From: "Huang Weller (CM/ESW12-CN)" Subject: RE: ext4 filesystem bad extent error review Date: Mon, 6 Jan 2014 10:23:17 +0800 Message-ID: References: <20140102184211.GC10870@thunk.org> <20140103154846.GB31411@thunk.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: "linux-ext4@vger.kernel.org" To: "Juergens Dirk (CM-AI/ECO2)" , Theodore Ts'o Return-path: Received: from smtp6-v.fe.bosch.de ([139.15.237.11]:57368 "EHLO smtp6-v.fe.bosch.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751956AbaAFCXa convert rfc822-to-8bit (ORCPT ); Sun, 5 Jan 2014 21:23:30 -0500 In-Reply-To: Content-Language: en-US Sender: linux-ext4-owner@vger.kernel.org List-ID: >On Thu, Jan 03, 2014 at 17:30, Theodore Ts'o [mailto:tytso@mit.edu] >wrote: >>=20 >> On Fri, Jan 03, 2014 at 11:16:02AM +0800, Huang Weller (CM/ESW12-CN) >> wrote: >> > >> > It sounds like the barrier test. We wrote such kind test tool >> > before, the test program used ioctl(fd, BLKFLSBUF, 0) to set a >> > barrier before next write operation. Do you think this ioctl is >> > enough ? Because I saw the ext4 use it. I will do the test with th= at >> > tool and then let you know the result. >>=20 >> The BLKFLSBUF ioctl does __not__ send a CACHE FLUSH command to the >> hardware device. It forces all of the dirty buffers in memory to th= e >> storage device, and then it invalidates all the buffer cache, but it >> does not send a CACHE FLUSH command to the hardware. Hence, the >> hardware is free to write it to its on-disk cache, and not necessari= ly >> guarantee that the data is written to stable store. (For an example >> use case of BLKFLSBUF, we use it in e2fsck to drop the buffer cache >> for benchmarking purposes.) >>=20 >> If you want to force a CACHE FLUSH (or barrier, depending on the >> underlying transport different names may be given to this operation)= , >> you need to call fsync() on the file descriptor open to the block >> device. >>=20 >> > More information about journal block which caused the bad extents >> > error: We enabled the mount option journal_checksum in our test. = We >> > reproduced the same problem and the journal checksum is correct >> > because the journal block will not be replayed if checksum is erro= r. >>=20 >> How did you enable the journal_checksum option? Note that this is n= ot >> safe in general, which is why we don't enable it or the async_commit >> mount option by default. The problem is that currently the journal >> replay stops when it hits a bad checksum, and this can leave the fil= e >> system in a worse case than it currently is in. There is a way we >> could fix it, by adding per-block checksums to the journal, so we ca= n >> skip just the bad block, and then force an efsck afterwards, but tha= t >> isn't something we've implemented yet. >>=20 >> That being said, if the journal checksum was valid, and so the >> corrupted block was replayed, it does seem to argue against >> hardware-induced corruption. >Yes, this was also our feeling. Please see my other mail just sent >some minutes ago. We know about the possible problems with=20 >journal_checksum, but we thought that it is a good option in our case >to identify if this is a HW- or SW-induced issue. >>=20 >> Hmm.... I'm stumped, for the moment. The journal layer is quite >> stable, and we haven't had any problems like this reported in many, >> many years. >>=20 >> Let's take this back to first principles. How reliably can you >> reproduce the problem? How often does it fail? 
>
>With kernel 3.5.7.23, about once per overnight long-term test.
>
>> Is it something where
>> you can characterize the workload leading to this failure? Secondly,
>> is a power drop involved in the reproduction at all, or is this
>> something that can be reproduced by running some kind of workload, and
>> then doing a soft reset (i.e., force a kernel reboot, but _not_ do it
>> via a power drop)?
>
>As I stated in my other mail, it is also reproduced with soft resets.
>Weller can give more details about the test setup.

My test case is like this:
1. Leave about 700 MB of empty space for the test.
2. Run most of the tests under stress (we also reproduced the issue in some tests without stress).
3. Both power loss and CPU WDT resets happened during file write operations.

>
> The other thing to ask is when did this problem first start appearing?
> With a kernel upgrade? A compiler/toolchain upgrade? Or has it
> always been there?
>
> Regards,
>
> - Ted

Mit freundlichen Grüßen / Best regards

Dr. rer. nat. Dirk Juergens
Robert Bosch Car Multimedia GmbH
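
P.S. Regarding the BLKFLSBUF vs. fsync() point above, here is a minimal,
untested sketch of how I plan to adapt the barrier test tool. The device
path /dev/sdX is only a placeholder for the block device under test, and
error handling is abbreviated; the point is simply that BLKFLSBUF only
flushes and invalidates the kernel buffer cache, while fsync() on the
block-device fd is what makes the kernel issue a cache flush to the
hardware.

/* barrier-test sketch: BLKFLSBUF vs. fsync() on a block device */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>          /* BLKFLSBUF */

int main(void)
{
	int fd = open("/dev/sdX", O_RDWR);   /* placeholder device */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Writes back dirty buffers and drops the buffer cache,
	 * but does NOT send a CACHE FLUSH to the device. */
	if (ioctl(fd, BLKFLSBUF, 0) < 0)
		perror("ioctl(BLKFLSBUF)");

	/* This is what actually forces the device to commit its
	 * write cache to stable storage. */
	if (fsync(fd) < 0)
		perror("fsync");

	close(fd);
	return 0;
}

I would call the fsync() before dropping power / triggering the WDT, so
that any corruption seen afterwards cannot be blamed on data still
sitting in the device's write cache.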