From: "Huang Weller (CM/ESW12-CN)"
Subject: RE: AW: AW: ext4 filesystem bad extent error review
Date: Mon, 6 Jan 2014 13:45:49 +0800
To: "Juergens Dirk (CM-AI/ECO2)", Eric Sandeen, Theodore Ts'o
Cc: "linux-ext4@vger.kernel.org"

> On Thu, Jan 03, 2014 at 19:49, Eric Sandeen wrote
> >
> > On 1/3/14, 12:45 PM, Juergens Dirk (CM-AI/ECO2) wrote:
> > >
> > > On Thu, Jan 03, 2014 at 19:24, Eric Sandeen wrote
> > >>
> > >> On 1/3/14, 10:29 AM, Juergens Dirk (CM-AI/ECO2) wrote:
> > >>> So, I think there _might_ be a kernel bug, but it could be also a problem
> > >>> related to the particular type of eMMC. We did not observe the same issue
> > >>> in previous tests with another type of eMMC from another supplier, but this
> > >>> was with an older kernel patch level and with another HW design.
> > >>>
> > >>> Regarding a possible kernel bug: Is there any chance that the invalid
> > >>> ee_len or ee_start are returned by, e.g., the block allocator ?
> > >>> If so, can we try to instrument the code to get suitable traces ?
> > >>> Just to see or to exclude that the corrupted inode is really written
> > >>> to the eMMC ?
> > >>
> > >> From your description it does sound possible that it's a kernel bug.
> > >> Adding testcases to the code to catch it before it hits the journal
> > >> might be helpful - but then maybe this is something getting overwritten
> > >> after the fact - hard to say.
> > >>
> > >> Can you share more details of the test you are running? Or maybe even
> > >> the test itself?
> > >
> > > Yes, for sure, we can. Weller, please provide additional details
> > > or corrections.
> > >
> > > In short:
> > > Basically we use an automated cyclic test writing many small
> > > (some kBytes) files with CRC checksums for easy consistency check
> > > into a separate test partition. Files also contain meta information
> > > like filename, sequence number and a random number to allow to identify
> > > from block device image dumps, if we just see a fragment of an old
> > > deleted file or a still valid one.
> > >
> > > Each test loop looks like this:
> > >
> > 0) mkfs the filesystem - with what options? How big?
>
> Here we do need the details from Weller, cause
> he has done all this.

We use the default mkfs options, plus the nodiscard extended option:

    mkfs.ext4 -E nodiscard /dev/$PAR

The partition size is about 6 GB.

> > >
> > > 1) Boot the device after power on or reset
> > > 2) Do fsck -n BEFORE mounting
> > > 2 a) (optional) binary dump of the journal
> > > 3) Mount test partition
> >
> > Again with what options, if any?
>
> Details again have to be given by Weller, sorry.

Mount options (we tested two configurations):
- ext4 defaults: rw,relatime,data=ordered,barrier=1
- ext4 defaults plus journal checksumming: rw,relatime,data=ordered,barrier=1,journal_checksum

I pre-filled the test partition so that only about 700 MB of free space is left.
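The test-file scheme described above (small files carrying a CRC plus filename, sequence number and random number, so that fragments found in block-device dumps can be classified) could look roughly like the sketch below. The record layout, the TSTF magic and the function names are all hypothetical; the actual file format used in this test was not posted in the thread.

```python
import struct
import zlib

# Hypothetical record layout (NOT the format actually used in the test):
#   magic(4) | seq(u32) | rand(u32) | name_len(u16) | name | payload_len(u32) | payload | crc32(u32)
MAGIC = b"TSTF"
HEADER = struct.Struct("<4sIIH")


def make_test_file(name: str, seq: int, rand: int, payload: bytes) -> bytes:
    """Build one test file: metadata + payload, followed by a CRC32 over everything before it."""
    name_b = name.encode("utf-8")
    body = (HEADER.pack(MAGIC, seq, rand, len(name_b)) + name_b
            + struct.pack("<I", len(payload)) + payload)
    return body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def check_test_file(data: bytes):
    """Return (name, seq, rand) if data starts with a complete, CRC-valid record, else None.

    Bytes after the CRC are ignored, so this also works on a slice of a raw dump.
    """
    if len(data) < HEADER.size + 4 + 4:
        return None
    magic, seq, rand, name_len = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        return None
    off = HEADER.size
    name_b = data[off:off + name_len]
    off += name_len
    if len(data) < off + 4:
        return None  # truncated before payload_len
    (payload_len,) = struct.unpack_from("<I", data, off)
    off += 4 + payload_len
    if len(data) < off + 4:
        return None  # truncated payload or missing CRC
    (stored_crc,) = struct.unpack_from("<I", data, off)
    if zlib.crc32(data[:off]) & 0xFFFFFFFF != stored_crc:
        return None  # content does not match its checksum
    return name_b.decode("utf-8", "replace"), seq, rand


def scan_dump(dump: bytes):
    """Yield (offset, name, seq, rand) for every intact record found in a raw block-device dump."""
    pos = dump.find(MAGIC)
    while pos != -1:
        rec = check_test_file(dump[pos:])
        if rec is not None:
            yield (pos, *rec)
        pos = dump.find(MAGIC, pos + 1)
```

When scanning a dump this way, an intact record whose sequence number is lower than the newest one written for the same filename would be a leftover fragment of an old, deleted file, while a CRC failure in a file the filesystem still references would point at corruption.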