Date: Mon, 8 Nov 2021 22:14:33 -0500
From: "Theodore Ts'o"
To: Samuel Mendoza-Jonas
Cc: linux-ext4@vger.kernel.org, adilger.kernel@dilger.ca, benh@amazon.com
Subject: Re: Debugging ext4 corruption with nojournal & extents
In-Reply-To: <20211108173520.xp6xphodfhcen2sy@u87e72aa3c6c25c.ant.amazon.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On Mon, Nov 08, 2021 at 09:35:20AM -0800, Samuel Mendoza-Jonas wrote:
> Based on that what I think is happening is
> - A file with separate (i.e. non-inline) extents is synced / written to disk
>   (in this case, one of the large "compound" files)
> - ext4_end_io_end() kicks off writeback of extent metadata
> - AIUI this marks the related buffers dirty but does not wait on them in the
>   no-journal case
> - The file is deleted, causing the extents to be "removed" and the blocks where
>   they were stored are marked unused
> - A new file is created (any file, separate extents not required)
> - The new file is allocated the block that was just freed (the physical block
>   where the old extents were located)
>
> Some time between this point and when the file is next read, the dirty extent
> buffer hits the disk instead of the intended data for the new file.
> A big-hammer hack in __ext4_handle_dirty_metadata() to always sync metadata
> blocks appears to avoid the issue but isn't ideal - most likely a better
> solution would be to ensure any dirty metadata buffers are synced before the
> inode is dropped.
>
> Overall does this summary sound valid, or have I wandered into the
> weeds somewhere?

Hmm... well, I can tell you what's *supposed* to happen.  When the
extent block is freed, ext4_free_blocks() gets called with the
EXT4_FREE_BLOCKS_FORGET flag set.  ext4_free_blocks() calls
ext4_forget() in two places; one when bh passed to ext4_free_blocks()
is NULL, and one where it is non-NULL.  And then ext4_free_blocks()
calls bforget(), which should cause the dirty extent block to get
thrown away.

This *should* have prevented your failure scenario from taking place,
since after the call to bforget() the dirty extent buffer *shouldn't*
have hit the disk.

If your theory is correct, then somehow either (a) the bforget()
wasn't called, or (b) the bforget() didn't work, and then the page
writeback for the new page happened first, and then buffer cache
writeback happened second, overwriting the intended data for the new
file.

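To make the ordering above concrete, here is a toy model in Python.  It
is only a sketch and not kernel code: ToyBufferCache, run_scenario and
the block number 1234 are made up for illustration, and the only thing
it models is "dirty buffer is either dropped or flushed later".  It
shows why a skipped or ineffective bforget() would leave the old extent
block on disk in place of the new file's data.

```python
# Toy model (not kernel code): the "disk" is a dict of block -> contents,
# and the buffer cache holds dirty buffers that a later pass flushes.

class ToyBufferCache:
    def __init__(self):
        self.disk = {}    # physical block number -> contents on disk
        self.dirty = {}   # physical block number -> pending buffer contents

    def mark_dirty(self, block, data):
        """Queue a metadata buffer for later writeback without waiting."""
        self.dirty[block] = data

    def forget(self, block):
        """Rough analogue of bforget(): drop the buffer, never write it."""
        self.dirty.pop(block, None)

    def write_data(self, block, data):
        """Rough analogue of page writeback: file data goes to disk now."""
        self.disk[block] = data

    def writeback(self):
        """Rough analogue of buffer cache writeback: flush dirty buffers."""
        for block, data in self.dirty.items():
            self.disk[block] = data
        self.dirty.clear()


def run_scenario(call_forget):
    """Replay the sequence from the report; return what ends up on disk."""
    cache = ToyBufferCache()
    extent_block = 1234                                # hypothetical block number
    cache.mark_dirty(extent_block, "old extent block")  # extent metadata dirtied
    if call_forget:                                    # file deleted, block freed
        cache.forget(extent_block)
    cache.write_data(extent_block, "new file data")    # block reused by new file
    cache.writeback()                                  # late buffer cache flush
    return cache.disk[extent_block]


print(run_scenario(call_forget=True))    # new file data
print(run_scenario(call_forget=False))   # old extent block
```

With the forget step in place the new data survives; without it, the
late flush overwrites it, which is the corruption pattern described in
the report.
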
Have you tried enabling the blktrace tracer in combination with some
of the ext4 tracepoints, to see if you can catch the double write
happening?  Another thing to try would be enabling some tracepoints,
such as ext4_forget and ext4_free_blocks.  Unfortunately we don't have
any tracepoints in fs/ext4/page-io.c to get a tracepoint which
includes the physical block ranges coming from the writeback path.
And the tracepoints in fs/fs-writeback.c won't have the physical block
number (just the inode and logical block numbers).

					- Ted
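
For reference, a minimal sketch of turning on the two ext4 tracepoints
named above from Python.  The helper name enable_ext4_events is made up
for illustration, and the tracefs mount point is an assumption: it is
commonly /sys/kernel/tracing, but on some systems it is
/sys/kernel/debug/tracing instead.  It needs root, and it does not set
up blktrace.

```python
import os

# Tracepoints mentioned in the message above.
EVENTS = ("ext4_forget", "ext4_free_blocks")


def enable_ext4_events(tracefs_root="/sys/kernel/tracing", events=EVENTS):
    """Write '1' to events/ext4/<name>/enable for each event.

    tracefs_root is an assumption about where tracefs is mounted.
    Returns the list of control files that were written.
    """
    written = []
    for name in events:
        path = os.path.join(tracefs_root, "events", "ext4", name, "enable")
        with open(path, "w") as f:
            f.write("1")
        written.append(path)
    return written


# Usage (as root):
#   enable_ext4_events()
# then read <tracefs_root>/trace_pipe while reproducing the problem.
```

The resulting trace can then be matched against the blktrace output by
physical block number to look for a second write to the freed block.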