Date: Tue, 25 Feb 2020 12:23:55 -0500
From: "Theodore Y. Ts'o"
To: Jean-Louis Dupond
Cc: linux-ext4@vger.kernel.org
Subject: Re: Filesystem corruption after unreachable storage
Message-ID: <20200225172355.GA14617@mit.edu>

On Tue, Feb 25, 2020 at 02:19:09PM +0100, Jean-Louis Dupond wrote:
> FYI,
>
> Just did same test with e2fsprogs 1.45.5 (from buster backports) and kernel
> 5.4.13-1~bpo10+1.
> And having exactly the same issue.
> The VM needs a manual fsck after storage outage.
>
> Don't know if its useful to test with 5.5 or 5.6?
> But it seems like the issue still exists.

This is going to be a long shot, but if you could try testing with
5.6-rc3, or with this commit cherry-picked into a 5.4 or later kernel:

commit 8eedabfd66b68a4623beec0789eac54b8c9d0fb6
Author: wangyan
Date:   Thu Feb 20 21:46:14 2020 +0800

    jbd2: fix ocfs2 corrupt when clearing block group bits

    I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
    The running environment:
            kernel version: 4.19
            A cluster with two nodes, 5 luns mounted on two nodes, and do some
            file operations like dd/fallocate/truncate/rm on every lun with storage
            network disconnection.

    The fallocate operation on dm-23-45 caused an null pointer dereference.
    ...

... it would be interesting to see if it fixes things for you.  I can't
guarantee that it will, but the trigger of the failure which wangyan
found is very similar indeed.

Thanks,

					- Ted