Date: Mon, 8 Nov 2021 09:35:20 -0800
From: Samuel Mendoza-Jonas
Subject: Debugging ext4 corruption with nojournal & extents
Message-ID: <20211108173520.xp6xphodfhcen2sy@u87e72aa3c6c25c.ant.amazon.com>
X-Mailing-List: linux-ext4@vger.kernel.org

Hi all,

Recently I've been digging into a corruption issue which I think is just
about pinned, but I'd appreciate some more expert EXT4 eyes to confirm
we're on the right path.

What we have boils down to a system with
- An ext4 filesystem with the journal disabled
- A workload[0] which in a loop
  - Creates a lot of small files
  - Occasionally deletes these files and collects them into a single
    larger "compound" file
  - Checks the header of all of these files periodically to ensure
    they're correct

After a while this check fails, and when inspecting the "bad" file, the
contents of that file are actually an EXT4 extent structure, for example:

[ec2-user@ip-172-31-0-206 ~]$ hexdump -C _2w.si
00000000  0a f3 05 00 54 01 00 00  00 00 00 00 00 00 00 00  |....T...........|
00000010  01 00 00 00 63 84 08 05  01 00 00 00 ff 01 00 00  |....c...........|
00000020  75 8a 1c 02 00 02 00 00  00 02 00 00 00 9c 1c 02  |u...............|
00000030  00 04 00 00 dc 00 00 00  00 ac 1c 02 dc 04 00 00  |................|
00000040  08 81 00 00 dc ac 1c 02  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000170  00 00 00                                          |...|
00000173

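The bytes in the dump can be decoded by hand as a cross-check. What follows is a minimal sketch, not kernel code: it assumes the on-disk layouts of struct ext4_extent_header and struct ext4_extent (12 bytes each, little-endian), and the names Header, Extent and parse_leaf are made up for illustration. Only the first 0x50 bytes of the dump are used, since the rest of the file is zero.

```python
import struct
from typing import NamedTuple

EXT4_EXT_MAGIC = 0xF30A      # stored on disk little-endian, i.e. bytes 0a f3
EXT_INIT_MAX_LEN = 1 << 15   # an ee_len above this marks an unwritten extent

# First 0x50 bytes of the hexdump quoted in the mail.
DUMP = bytes.fromhex(
    "0a f3 05 00 54 01 00 00 00 00 00 00 00 00 00 00 "
    "01 00 00 00 63 84 08 05 01 00 00 00 ff 01 00 00 "
    "75 8a 1c 02 00 02 00 00 00 02 00 00 00 9c 1c 02 "
    "00 04 00 00 dc 00 00 00 00 ac 1c 02 dc 04 00 00 "
    "08 81 00 00 dc ac 1c 02 00 00 00 00 00 00 00 00"
)


class Header(NamedTuple):      # hypothetical name; mirrors ext4_extent_header
    magic: int
    entries: int
    max_entries: int
    depth: int
    generation: int


class Extent(NamedTuple):      # hypothetical name; decoded view of ext4_extent
    logical: int               # first logical block covered (ee_block)
    length: int                # number of blocks, unwritten bias removed
    unwritten: bool
    physical: int              # first physical block (ee_start_hi:ee_start_lo)


def parse_leaf(buf: bytes):
    """Decode an extent header followed by its leaf entries."""
    hdr = Header(*struct.unpack_from("<HHHHI", buf, 0))
    if hdr.magic != EXT4_EXT_MAGIC:
        raise ValueError("not an ext4 extent header")
    if hdr.depth != 0:
        raise ValueError("index node, not a leaf")
    extents = []
    for i in range(hdr.entries):
        block, raw_len, start_hi, start_lo = struct.unpack_from(
            "<IHHI", buf, 12 + 12 * i
        )
        unwritten = raw_len > EXT_INIT_MAX_LEN
        length = raw_len - EXT_INIT_MAX_LEN if unwritten else raw_len
        extents.append(
            Extent(block, length, unwritten, (start_hi << 32) | start_lo)
        )
    return hdr, extents


header, extents = parse_leaf(DUMP)
print(header)
for extent in extents:
    print(extent)
```

Running it prints the header fields followed by one line per extent entry, which can be compared against the physical blocks read back from the device.
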
The file starts with EXT4_EXT_MAGIC (cpu_to_le16(0xf30a)), and when parsed
as an extent header plus array it has 5 extent entries at depth 0. By the
time the file is checked, the file that these extents presumably pointed
to appears to have been deleted, but reading the physical blocks looks
like the data of one of the larger files this test creates.

Based on that, what I think is happening is:
- A file with separate (i.e. non-inline) extents is synced / written to
  disk (in this case, one of the large "compound" files)
  - ext4_end_io_end() kicks off writeback of extent metadata
    - AIUI this marks the related buffers dirty but does not wait on
      them in the no-journal case
- The file is deleted, causing the extents to be "removed", and the
  blocks where they were stored are marked unused
- A new file is created (any file, separate extents not required)
  - The new file is allocated the block that was just freed (the
    physical block where the old extents were located)

Some time between this point and when the file is next read, the dirty
extent buffer hits the disk instead of the intended data for the new
file.

A big-hammer hack in __ext4_handle_dirty_metadata() to always sync
metadata blocks appears to avoid the issue but isn't ideal. Most likely a
better solution would be to ensure any dirty metadata buffers are synced
before the inode is dropped.

Overall, does this summary sound valid, or have I wandered into the weeds
somewhere?

Cheers,
Sam Mendoza-Jonas

[0] This is an Elasticsearch/Lucene workload, running the esrally tests to
hit the issue.
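
The ordering suspected in the mail can be illustrated with a toy model. This is only a sketch under heavy assumptions: ToyDisk, its methods and run() are invented for illustration and correspond to no kernel interface. It models just one idea, a dirty metadata buffer that outlives the freeing and reuse of its block, and contrasts that with syncing the buffer before the block is freed.

```python
class ToyDisk:
    """Hypothetical model: one dict for on-disk blocks, one for dirty buffers."""

    def __init__(self):
        self.blocks = {}   # block number -> bytes currently on "disk"
        self.dirty = {}    # block number -> metadata buffer awaiting writeback

    def mark_metadata_dirty(self, blk, data):
        # No journal in this model: the buffer is dirtied, nothing waits on it.
        self.dirty[blk] = data

    def writeback(self):
        # Flush every dirty metadata buffer to its block.
        for blk, data in self.dirty.items():
            self.blocks[blk] = data
        self.dirty.clear()

    def free_block(self, blk, sync_before_free):
        # The block becomes reusable; optionally flush dirty buffers first.
        if sync_before_free:
            self.writeback()

    def write_data(self, blk, data):
        # The new owner's file data reaches the reused block.
        self.blocks[blk] = data


def run(sync_before_free):
    disk = ToyDisk()
    disk.mark_metadata_dirty(7, b"EXTENT-LEAF")   # old file's extent block
    disk.free_block(7, sync_before_free)          # old file is deleted
    disk.write_data(7, b"NEW-FILE-DATA")          # block 7 reused by new file
    disk.writeback()                              # late metadata writeback
    return disk.blocks[7]


print(run(sync_before_free=False))
print(run(sync_before_free=True))
```

In the first call the late writeback overwrites the new file's data with the stale extent block; in the second the buffer is already clean by the time the block is reused, so the new data survives.
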