Date: Mon, 14 Dec 2020 15:27:01 -0500
From: "Theodore Y. Ts'o"
To: harshad shirwadkar
Cc: Haotian Li, Ext4 Developers List, "liuzhiqiang (I)", linfeilong,
    liangyun2@huawei.com
Subject: Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
Message-ID: <20201214202701.GI575698@mit.edu>
References: <1bb3c556-4635-061b-c2dc-df10c15e6398@huawei.com>
 <3e3c18f6-9f45-da04-9e81-ebf1ae16747e@huawei.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
> Hi Haotian,
>
> Yeah, perhaps these are the only recoverable errors. I also think that
> we can't say for sure that these errors are always recoverable. That's
> because in some setups, these errors may still be unrecoverable (for
> example, if the machine is running under low memory). I still feel
> that we should ask the user whether they want to continue or not. The
> reason is that, firstly, if we don't allow running e2fsck in these
> cases, I wonder what the user would do with their file system - they
> can't mount / can't run fsck, right? Secondly, not doing that would be
> a regression. I wonder if some setups would have chosen to ignore
> journal recovery when there are errors during recovery, and with this
> fix they may start seeing that their file systems aren't getting
> repaired.

It may very well be that there are corrupted file system structures
that could lead to ENOMEM. If so, I'd consider that something we should
be explicitly checking for in e2fsck. It's actually relatively unlikely
in the jbd2 recovery code, since that code is fairly straightforward ---
but I'd be concerned about potential cases in your Fast Commit code,
since there's quite a bit more complexity when parsing the fast commit
journal.

This isn't a new concern; we've already talked about the fact that fast
commit needs a lot more sanity checks to look for maliciously --- or
syzbot generated, which may be the same thing :-) --- inconsistent
fields causing the e2fsck replay code to behave in unexpected ways,
which might include trying to allocate insane amounts of memory, array
buffer overruns, etc. (A sketch of the kind of field validation I have
in mind is included below.) But assuming that ENOMEM is always due to
operational concerns, as opposed to file system corruption, may not
always be a safe assumption.

Something else to consider, from the perspective of a naive system
administrator: if there is a bad media sector in the journal, always
aborting the e2fsck run may not leave them an easy way to recover.
Simply ignoring the journal and allowing the next write to occur, at
which point the HDD or SSD will redirect the write to its bad sector
spare pool, will allow for an automatic recovery. Always causing e2fsck
to fail would actually result in a worse outcome in this particular
case.

(This is especially true for a mobile device, where the owner is not
likely to have access to a serial console to manually run e2fsck, and
where, if the device can't automatically recover, they will have to
take their phone to the local cell phone carrier store for repairs ---
which is *not* something that a cellular provider will enjoy, and which
will lead them to choose other cell phone models to feature as
supported devices. So an increased number of failures which can't be
automatically recovered could cause the carrier to choose to feature,
say, a Xiaomi phone over a ZTE phone.)
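Here is the promised sketch of that sort of length validation. The
fc_tl structure, its field names, and the get_record_payload() helper
below are purely illustrative --- they are not the actual fast commit
on-disk format or existing e2fsprogs code; the point is just the shape
of the checks a replay parser needs before it trusts a length field:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical tag-length header for a replay record (illustrative
     * only, not the real fast commit layout). */
    struct fc_tl {
            uint16_t tag;
            uint16_t len;
    };

    /*
     * Copy out a record's payload only after validating its declared
     * length: it must fit inside what is actually left in the journal
     * block, and it must be below a sanity cap so that a corrupted (or
     * malicious) field can't request an insane allocation.
     */
    static void *get_record_payload(const uint8_t *cur, const uint8_t *end,
                                    size_t max_reasonable_len)
    {
            struct fc_tl tl;
            void *buf;

            if ((size_t)(end - cur) < sizeof(tl))
                    return NULL;        /* truncated header */
            memcpy(&tl, cur, sizeof(tl));
            if (tl.len > (size_t)(end - cur) - sizeof(tl))
                    return NULL;        /* length overruns the block */
            if (tl.len > max_reasonable_len)
                    return NULL;        /* implausible: treat as corruption */

            buf = malloc(tl.len ? tl.len : 1);
            if (!buf)
                    return NULL;        /* a real ENOMEM, not corruption */
            memcpy(buf, cur + sizeof(tl), tl.len);
            return buf;
    }

The important property is that a bogus length field gets reported (and
fixed) as file system corruption, instead of surfacing later as a
mysterious allocation failure or an array overrun.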
> I'm wondering if you saw a situation in your setup where exiting
> e2fsck helped? If possible, could you share what kind of errors were
> seen in journal recovery and what the expected behavior was? Maybe
> that would help us decide on the right behavior.

Seconded; I think we should try to understand why it is that e2fsck is
failing with these sorts of errors. It may be that there are better
ways of solving the high-level problem.

For example, the new libext2fs bitmap backends were something I added
because running a large number of e2fsck processes in parallel on a
server machine with dozens of HDD spindles was causing e2fsck to run
slowly due to memory contention. We fixed it by making e2fsck more
memory efficient, by improving the bitmap implementations --- but if
that hadn't been sufficient, I had also considered adding support to
make /sbin/fsck "smarter" by limiting the number of fsck.XXX processes
that would get started simultaneously, since that could actually make
the file system checks run faster by reducing memory thrashing. (The
trick would have been making fsck smart enough to automatically tune
the number of parallel fsck processes to allow, since asking the system
administrator to manually tune the maximum number of processes would be
annoying, and would mean that the feature would never get used outside
of $WORK in practice.)

So is the actual underlying problem that e2fsck is running out of
memory? If so, is it because there simply isn't enough physical memory
available? Is it being run in a cgroup container which is too small? Or
is it because too many file systems are being checked in parallel at
the same time?

Or is it I/O errors that you are concerned with? And how do you know
that they are not permanent errors; is this caused by something like
flaky fibre channel connections?

Or is this a hypothetical worry, as opposed to something which is
causing operational problems right now?

Cheers,

					- Ted