Date: Wed, 24 Feb 2021 09:53:29 -0500
From: "Theodore Ts'o"
To: Seamus Connor
Cc: linux-ext4@vger.kernel.org
Subject: Re: reproducible corruption in journal

On Tue, Feb 23, 2021 at 04:41:20PM -0800, Seamus Connor wrote:
> Hello All,
>
> I am investigating an issue on our system where a filesystem is becoming corrupt.

So I'm not 100% sure, but it sounds to me like what is going on is
caused by the following:

*) The jbd/jbd2 layer relies on finding an invalid block (a block
   which is missing the jbd/jbd2 "magic number", or where the sequence
   number is unexpected) to indicate the end of the journal.

*) We reset the (4 byte) sequence number to zero on a freshly
   mounted file system.

*) It appears that your test is generating a large number of very
   small transactions, and you are then "crashing" the file system by
   disconnecting the file system from further updates, and running
   e2fsck to replay the journal, throwing away the block writes after
   the "disconnection", and then remounting the file system.  I'm going
   to further guess that the sizes of the small transactions are very
   similar, and the amount of time between when the file system is
   mounted, and when the file system is forcibly disconnected, is
   highly predictable (e.g., always N seconds, plus or minus a small
   delta).

Is that last point correct?  If so, that's a perfect storm where it's
possible for the journal replay to get confused, and mistake previous
blocks in the journal for ones that are part of the last valid file
system mount.  It's something which probably never happens in practice
in production, since users are generally not running a super-fixed
workload, and then causing the system to repeatedly crash after a
fixed interval, such that the mistake described above could happen.
That being said, it's arguably still a bug.

Is this hypothesis consistent with what you are seeing?

If so, I can see two possible solutions to avoid this:

1) When we initialize the journal, after replaying the journal and
   writing a new journal superblock, we issue a discard for the rest
   of the journal.  This won't help for block devices that don't
   support discard, but it should slightly reduce work for the FTL,
   and perhaps slightly improve the write endurance for flash.

2) We should stop resetting the sequence number to zero, but instead,
   keep the sequence number at the last used number.  For testing
   purposes, we should have an option where the sequence number is
   forced to (0U - 300) so that we test what happens when the 4 byte
   unsigned integer wraps.

Cheers,

					- Ted
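
[Illustration, not jbd2 code]  The failure mode and the two proposed
fixes described above can be modelled with a toy journal.  This is a
minimal sketch in Python under loud assumptions: every transaction
occupies exactly one slot, every mount starts writing at slot 0, and a
slot is "invalid" when it is empty or carries an unexpected sequence
number.  The names write_transactions, replay and run are hypothetical
and do not correspond to kernel or e2fsprogs functions.

```python
MASK32 = 0xFFFFFFFF  # sequence numbers are 4-byte unsigned integers


def next_seq(seq):
    """Advance a sequence number, wrapping at 2**32."""
    return (seq + 1) & MASK32


def write_transactions(journal, first_seq, count, mount_id):
    """Write `count` one-slot transactions starting at slot 0 (toy assumption).

    Each slot stores (sequence number, id of the mount that wrote it).
    Returns the next unused sequence number.
    """
    seq = first_seq
    for slot in range(count):
        journal[slot] = (seq, mount_id)
        seq = next_seq(seq)
    return seq


def replay(journal, first_seq):
    """Scan from slot 0; stop at the first empty slot or unexpected sequence."""
    replayed = []
    expected = first_seq
    for block in journal:
        if block is None or block[0] != expected:
            break  # "invalid block" marks the end of the journal
        replayed.append(block)
        expected = next_seq(expected)
    return replayed


def run(reset_seq, discard, first_seq=1, size=8):
    """Mount 1 writes 5 transactions, mount 2 writes 3, then replay.

    reset_seq: restart numbering at first_seq on the second mount.
    discard:   wipe the journal between mounts (proposed solution 1).
    Returns the mount id of every block that replay accepted.
    """
    journal = [None] * size
    next_unused = write_transactions(journal, first_seq, 5, mount_id=1)
    # The "crash", journal replay and remount happen here.
    if discard:
        journal = [None] * size
    start = first_seq if reset_seq else next_unused  # solution 2 keeps counting
    write_transactions(journal, start, 3, mount_id=2)
    return [mount for _, mount in replay(journal, start)]


print(run(reset_seq=True, discard=False))   # [2, 2, 2, 1, 1]
print(run(reset_seq=True, discard=True))    # [2, 2, 2]
print(run(reset_seq=False, discard=False))  # [2, 2, 2]
```

In the first call the two trailing 1s are stale blocks from the earlier
mount that happen to carry exactly the sequence numbers replay expects
next.  Either fix makes replay stop after the three blocks of the
second mount.  Starting the model close to the wrap point, in the
spirit of the proposed (0U - 300) test option, for example with
first_seq=(0 - 3) & MASK32, exercises the 32-bit wrap in next_seq.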