Date: Tue, 29 Mar 2022 09:08:04 -0400
From: "Theodore Ts'o"
To: Fariya F
Cc: linux-ext4@vger.kernel.org
Subject: Re: df returns incorrect size of partition due to huge overhead block count in ext4 partition

(Removing linux-fsdevel from the cc list since this is an ext4 specific
issue.)

On Mon, Mar 28, 2022 at 09:38:18PM +0530, Fariya F wrote:
> Hi Ted,
>
> Thanks for the response. Really appreciate it. Some questions:
>
> a) This issue is observed on one of the customer board and hence a fix
> is a must for us or at least I will need to do a work-around so other
> customer boards do not face this issue. As I mentioned my script
> relies on df -h output of used percentage. In the case of the board
> reporting 16Z of used space and size, the available space is somehow
> reported correctly. Should my script rely on available space and not
> on the used space% output of df. Will that be a reliable work-around?
> Do you see any issue in using the partition from then or some where
> down the line the overhead blocks number would create a problem and my
> partition would end up misbehaving or any sort of data loss could
> occur? Data loss would be a concern for us. Please guide.

I'm guessing that the problem was caused by a bit-flip in the
superblock, so it was just a matter of hardware error. What version
of e2fsprogs are you using, and did you have the metadata checksum
(metadata_csum) feature enabled? Depending on where the bit-flip
happened --- e.g., whether it was in memory and then the superblock
was written out, or on the eMMC or other storage device --- the
metadata checksum feature might have caught the superblock error. If
it had, it would have required a manual fsck to fix it, but at that
point fsck would have fallen back to using the backup superblock.

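To make the symptom concrete, here is a rough Python sketch. It
assumes the reported size is computed as total blocks minus overhead
in unsigned 64-bit arithmetic, so an overhead larger than the block
count wraps around to an enormous value. The free and available counts
do not go through that subtraction in this sketch, which would be
consistent with the available space still looking right. All function
names and the max_blocks bound are made up for illustration; a
monitoring script would have to supply max_blocks from the known
partition size.

```python
import os

U64_MASK = (1 << 64) - 1


def reported_blocks(total_blocks: int, overhead_blocks: int) -> int:
    """Model 'total - overhead' as an unsigned 64-bit subtraction.

    Assumption: this mirrors how the reported size is derived; if
    overhead_blocks exceeds total_blocks the result wraps around.
    """
    return (total_blocks - overhead_blocks) & U64_MASK


def size_is_plausible(f_blocks: int, f_bfree: int, f_bavail: int,
                      max_blocks: int) -> bool:
    """Reject numbers that cannot describe a real partition.

    max_blocks is a hypothetical upper bound chosen by the caller,
    e.g. the known partition size divided by the block size.
    """
    return f_bavail <= f_bfree <= f_blocks <= max_blocks


def check_mount(path: str, max_bytes: int) -> bool:
    """Apply the plausibility check to a mounted file system."""
    st = os.statvfs(path)
    return size_is_plausible(st.f_blocks, st.f_bfree, st.f_bavail,
                             max_bytes // st.f_frsize)


if __name__ == "__main__":
    total = 262144                   # e.g. a 1 GiB partition with 4 KiB blocks
    good_overhead = 10000
    bad_overhead = good_overhead | (1 << 31)   # one flipped high bit
    print(reported_blocks(total, good_overhead))
    print(reported_blocks(total, bad_overhead))
```

A script could call check_mount() before trusting any percentage
derived from the size, and fall back to the available-space figure (or
raise an alarm) when the check fails.
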
> b) Any other suggestions of a work-around so even if the overhead
> blocks reports more blocks than actual blocks on the partition, i am
> able to use the partition reliably or do you think it would be a
> better suggestion to wait for the fix in e2fsprogs?
>
> I think apart from the fix in e2fsprogs tool, a kernel fix is also
> required, wherein it performs check that the overhead blocks should
> not be greater than the actual blocks on the partition.

Yes, we can certainly have the kernel check to see if the overhead
value is completely insane, and if so, recalculate it (even though it
would slow down the mount).

Another thing we could do is to always recalculate the overhead amount
if the file system is smaller than some arbitrary size, on the theory
that (a) for small file systems, the increased time to mount the file
system will not be noticeable, and (b) embedded and mobile devices are
often where "cost optimized" (my polite way of saying crappy quality
to save a penny or two in Bill of Materials costs) parts are most
likely to be found, and so those are where bit flips are more likely.

Cheers,

					- Ted
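
The two mount-time ideas above (recalculate when the stored overhead
is insane, and always recalculate on small file systems) could be
sketched roughly as follows. This is Python pseudologic rather than
kernel code; the function name, the recalculate callback, and the
small_fs_blocks threshold are all hypothetical, and the threshold
value is arbitrary.

```python
from typing import Callable


def effective_overhead(stored_overhead: int,
                       total_blocks: int,
                       recalculate: Callable[[], int],
                       small_fs_blocks: int = 1 << 20) -> int:
    """Pick the overhead value to use at mount time.

    stored_overhead: value read from the superblock (may be corrupted).
    recalculate:     hypothetical callback that recomputes the overhead
                     from scratch; assumed slow, so it is avoided when
                     the stored value can be trusted.
    small_fs_blocks: arbitrary size below which recalculation is
                     assumed cheap enough to always do.
    """
    # Small file system: the extra mount time should not be noticeable.
    if total_blocks < small_fs_blocks:
        return recalculate()
    # Insane value: overhead cannot exceed the size of the file system.
    if stored_overhead > total_blocks:
        return recalculate()
    return stored_overhead
```

With this shape, a large file system with a sane stored value never
pays the recalculation cost, while a flipped high bit in the stored
value is caught by the comparison against total_blocks.
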