Subject: Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: Haotian Li
Date: Fri, 5 Mar 2021 17:48:28 +0800
To: harshad shirwadkar
CC: "Theodore Y. Ts'o", Ext4 Developers List, linfeilong, Zhiqiang Liu
X-Mailing-List: linux-ext4@vger.kernel.org
Ts'o" , Ext4 Developers List , linfeilong , , Zhiqiang Liu References: <1bb3c556-4635-061b-c2dc-df10c15e6398@huawei.com> <3e3c18f6-9f45-da04-9e81-ebf1ae16747e@huawei.com> <20201214202701.GI575698@mit.edu> <1384512f-9c8b-d8d7-cb38-824a76b742fc@huawei.com> <52e7ad7b-0411-8b6d-35f9-696f9dd75c31@huawei.com> From: Haotian Li Message-ID: Date: Fri, 5 Mar 2021 17:48:28 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.179.246] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org I'm very sorry for the delay. Thanks for your suggestion. Just as you said, we use an e2fsck.conf option "recovery_error_behavior" to help user adopt different behavior on this situation. The new v2 patch will be resent. 在 2021/1/6 7:06, harshad shirwadkar 写道: > Sorry for the delay. Thanks for providing more information, Haotian. > So this is happening due to IO errors experienced due to a flaky > network connection. I can imagine that this is perhaps a situation > which is recoverable but I guess when running on physical hardware, > it's less likely for such IO errors to be recoverable. I wonder if > this means we need an e2fsck.conf option - something like > "recovery_error_behavior" with default value of "continue". For > usecases such as this, we can set it to "exit" or perhaps "retry"? > > On Thu, Dec 24, 2020 at 5:49 PM Zhiqiang Liu wrote: >> >> friendly ping... >> >> On 2020/12/15 15:43, Haotian Li wrote: >>> Thanks for your review. I agree with you that it's more important >>> to understand the errors found by e2fsck. we'll decribe the case >>> below about this problem. >>> >>> The probelm we find actually in a remote storage case. It means >>> e2fsck's read or write may fail because of the network packet loss. >>> At first time, some packet loss errors happen during e2fsck's journal >>> recovery (using fsck -a), then recover failed. At second time, we >>> fix the network problem and run e2fsck again, but it still has errors >>> when we try to mount. Then we set jsb->s_start journal flags and retry >>> e2fsck, the problem is fixed. So we suspect something wrong on e2fsck's >>> journal recovery, probably the bug we've described on the patch. >>> >>> Certainly, directly exit is not a good way to fix this problem. >>> just like what Harshad said, we need tell user what happen and listen >>> user's decision, continue e2fsck or not. If we want to safely use >>> e2fsck without human intervention (using fsck -a), I wonder if we need >>> provide a safe mechanism to complate the fast check but avoid changes >>> on journal or something else which may be fixed in feature (such >>> as jsb->s_start flag)? >>> >>> Thanks >>> Haotian >>> >>> 在 2020/12/15 4:27, Theodore Y. Ts'o 写道: >>>> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote: >>>>> Hi Haotian, >>>>> >>>>> Yeah perhaps these are the only recoverable errors. I also think that >>>>> we can't surely say that these errors are recoverable always. That's >>>>> because in some setups, these errors may still be unrecoverable (for >>>>> example, if the machine is running under low memory). I still feel >>>>> that we should ask the user about whether they want to continue or >>>>> not. 

On 2021/1/6 7:06, harshad shirwadkar wrote:
> Sorry for the delay. Thanks for providing more information, Haotian.
> So this is happening due to IO errors experienced due to a flaky
> network connection. I can imagine that this is perhaps a situation
> which is recoverable, but I guess when running on physical hardware
> it's less likely for such IO errors to be recoverable. I wonder if
> this means we need an e2fsck.conf option - something like
> "recovery_error_behavior" with a default value of "continue". For
> use cases such as this, we can set it to "exit" or perhaps "retry"?
>
> On Thu, Dec 24, 2020 at 5:49 PM Zhiqiang Liu wrote:
>>
>> friendly ping...
>>
>> On 2020/12/15 15:43, Haotian Li wrote:
>>> Thanks for your review. I agree with you that it's more important
>>> to understand the errors found by e2fsck. We'll describe the case
>>> below to explain the problem.
>>>
>>> The problem we actually hit is in a remote storage case, which
>>> means e2fsck's reads or writes may fail because of network packet
>>> loss. The first time, some packet-loss errors happened during
>>> e2fsck's journal recovery (using fsck -a), and the recovery failed.
>>> The second time, we fixed the network problem and ran e2fsck again,
>>> but the file system still had errors when we tried to mount it.
>>> Then we set the jsb->s_start journal flag and reran e2fsck, and the
>>> problem was fixed. So we suspect something is wrong in e2fsck's
>>> journal recovery, probably the bug we've described in the patch.
>>>
>>> Certainly, exiting directly is not a good way to fix this problem.
>>> Just as Harshad said, we need to tell the user what happened and
>>> respect the user's decision on whether to continue e2fsck or not.
>>> If we want to safely use e2fsck without human intervention (using
>>> fsck -a), I wonder if we need to provide a safe mechanism that
>>> completes the fast check but avoids changes to the journal or to
>>> anything else that might still be fixable in the future (such as
>>> the jsb->s_start flag)?
>>>
>>> Thanks
>>> Haotian
>>>
>>> On 2020/12/15 4:27, Theodore Y. Ts'o wrote:
>>>> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
>>>>> Hi Haotian,
>>>>>
>>>>> Yeah, perhaps these are the only recoverable errors. I also think
>>>>> that we can't say for sure that these errors are always
>>>>> recoverable. That's because in some setups these errors may still
>>>>> be unrecoverable (for example, if the machine is running under
>>>>> low memory). I still feel that we should ask the user about
>>>>> whether they want to continue or not. The reason is that,
>>>>> firstly, if we don't allow running e2fsck in these cases, I
>>>>> wonder what the user would do with their file system - they can't
>>>>> mount / can't run fsck, right? Secondly, not doing that would be
>>>>> a regression. I wonder if some setups would have chosen to ignore
>>>>> journal recovery if there are errors during journal recovery, and
>>>>> with this fix they may start seeing that their file systems
>>>>> aren't getting repaired.
>>>>
>>>> It may very well be that there are corrupted file system
>>>> structures that could lead to ENOMEM. If so, I'd consider that
>>>> something we should be explicitly checking for in e2fsck, and it's
>>>> actually relatively unlikely in the jbd2 recovery code, since
>>>> that's fairly straightforward --- except I'd be concerned about
>>>> potential cases in your Fast Commit code, since there's quite a
>>>> bit more complexity when parsing the fast commit journal.
>>>>
>>>> This isn't a new concern; we've already talked about the fact that
>>>> fast commit needs to have a lot more sanity checks to look for
>>>> maliciously --- or syzbot-generated, which may be the same thing
>>>> :-) --- inconsistent fields causing the e2fsck replay code to
>>>> behave in unexpected ways, which might include trying to allocate
>>>> insane amounts of memory, array buffer overruns, etc.
>>>>
>>>> But assuming that ENOMEM is always due to operational concerns, as
>>>> opposed to file system corruption, may not always be a safe
>>>> assumption.
>>>>
>>>> Something else to consider, from the perspective of a naive system
>>>> administrator: if there is a bad media sector in the journal,
>>>> simply always aborting the e2fsck run may not allow them an easy
>>>> way to recover. Simply ignoring the journal and allowing the next
>>>> write to occur, at which point the HDD or SSD will redirect the
>>>> write to a bad sector spare pool, will allow for an automatic
>>>> recovery. Always causing e2fsck to fail would actually result in a
>>>> worse outcome in this particular case.
>>>>
>>>> (This is especially true for a mobile device, where the owner is
>>>> not likely to have access to the serial console to manually run
>>>> e2fsck, and where, if they can't automatically recover, they will
>>>> have to take their phone to the local cell phone carrier store for
>>>> repairs --- which is *not* something that a cellular provider will
>>>> enjoy, and they will tend to choose other cell phone models to
>>>> feature as supported/featured devices. So an increased number of
>>>> failures which can't be automatically recovered may cause the
>>>> carrier to choose to feature, say, a Xiaomi phone over a ZTE
>>>> phone.)
>>>>
>>>>> I'm wondering if you saw any situation in your setup where
>>>>> exiting e2fsck helped? If possible, could you share what kind of
>>>>> errors were seen in journal recovery and what the expected
>>>>> behavior was? Maybe that would help us decide on the right
>>>>> behavior.
>>>>
>>>> Seconded; I think we should try to understand why it is that
>>>> e2fsck is failing with these sorts of errors. It may be that there
>>>> are better ways of solving the high-level problem.
>>>>
>>>> For example, the new libext2fs bitmap backends were something that
>>>> I added because running a large number of e2fsck processes in
>>>> parallel on a server machine with dozens of HDD spindles was
>>>> causing e2fsck processes to run slowly due to memory contention.
>>>> We fixed it by making e2fsck more memory efficient, by improving
>>>> the bitmap implementations --- but if that hadn't been sufficient,
>>>> I had also considered adding support to make /sbin/fsck "smarter"
>>>> by limiting the number of fsck.XXX processes that would get
>>>> started simultaneously, since that could actually cause the file
>>>> system check to run faster by reducing memory thrashing. (The
>>>> trick would have been how to make fsck smart enough to
>>>> automatically tune the number of parallel fsck processes to allow,
>>>> since asking the system administrator to manually tune the max
>>>> number of processes would be annoying to the sysadmin, and would
>>>> mean that the feature would never get used outside of $WORK in
>>>> practice.)
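
To illustrate the auto-tuning idea above with a rough sketch (as far
as this thread indicates, nothing like this exists in fsck today, and
the per-instance memory estimate is an arbitrary assumption), the
parallelism bound could be derived from MemAvailable:

/*
 * Guess how many fsck.XXX children can run at once from the memory
 * currently available, using an assumed per-instance footprint that a
 * real implementation would have to derive from the file system sizes.
 */
#include <stdio.h>

#define EST_FSCK_MB	512	/* assumed footprint of one fsck instance */

static long mem_available_mb(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	long kb = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "MemAvailable: %ld kB", &kb) == 1)
			break;
	fclose(f);
	return kb > 0 ? kb / 1024 : -1;
}

/* At least one instance, at most one per device waiting to be checked. */
static int max_parallel_fsck(int ndevices)
{
	long mb = mem_available_mb();
	long n = (mb > 0) ? mb / EST_FSCK_MB : 1;

	if (n < 1)
		n = 1;
	if (n > ndevices)
		n = ndevices;
	return (int)n;
}

int main(void)
{
	printf("max parallel fsck for 8 devices: %d\n", max_parallel_fsck(8));
	return 0;
}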

>>>> So is the actual underlying problem that e2fsck is running out of
>>>> memory? If so, is it because there simply isn't enough physical
>>>> memory available? Is it being run in a cgroup container which is
>>>> too small? Or is it because too many file systems are being
>>>> checked in parallel at the same time?
>>>>
>>>> Or is it I/O errors that you are concerned with? And how do you
>>>> know that they are not permanent errors; is this caused by
>>>> something like fibre channel connections being flaky?
>>>>
>>>> Or is this a hypothetical worry, as opposed to something which is
>>>> causing operational problems right now?
>>>>
>>>> Cheers,
>>>>
>>>> - Ted
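
To recap the jsb->s_start problem described in the 2020/12/15 mail
quoted above, here is a rough sketch of the failure mode and of what
the fix has to guarantee. The struct and helper are simplified
stand-ins for illustration, not the real jbd2 or e2fsprogs code:

/*
 * In the JBD2 on-disk format, a zero s_start in the journal superblock
 * means "log is empty, nothing to replay"; non-zero means "replay is
 * still required".
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct journal_sb {
	uint32_t s_start;
};

/*
 * The gist of the fix: only a *successful* replay may mark the log
 * empty.  'replay_err' stands for the return value of the recovery
 * routine (jbd2_journal_recover() in the patch subject).
 */
static int finish_recovery(struct journal_sb *jsb, int replay_err)
{
	if (replay_err) {
		/*
		 * Replay failed (e.g. a transient I/O error on remote
		 * storage): leave s_start and the recovery flags alone,
		 * so that the next e2fsck run or mount replays the
		 * journal again instead of silently discarding it.
		 */
		return replay_err;
	}
	jsb->s_start = 0;	/* log fully replayed, safe to mark empty */
	return 0;
}

int main(void)
{
	struct journal_sb jsb = { .s_start = 1234 };

	finish_recovery(&jsb, -EIO);	/* failed replay: s_start kept */
	printf("after failed replay:     s_start = %u\n", (unsigned) jsb.s_start);
	finish_recovery(&jsb, 0);	/* successful retry: log marked empty */
	printf("after successful replay: s_start = %u\n", (unsigned) jsb.s_start);
	return 0;
}

With recovery_error_behavior on top of this, whether a failed replay
makes e2fsck continue, exit, or retry becomes the user's choice rather
than being hard-coded.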