From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Filipe Manana, David Sterba
Subject: [PATCH 5.4 023/152] btrfs: fix race between page release and a fast fsync
Date: Thu, 20 Aug 2020 11:19:50 +0200
Message-Id: <20200820091554.828551254@linuxfoundation.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To:
 <20200820091553.615456912@linuxfoundation.org>
References: <20200820091553.615456912@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Filipe Manana

commit 3d6448e631591756da36efb3ea6355ff6f383c3a upstream.

When releasing an extent map, done through the page release callback, we
can race with an ongoing fast fsync and cause the fsync to miss a new
extent and not log it. The steps for this to happen are the following:

1) A page is dirtied for some inode I;

2) Writeback for that page is triggered by a path other than fsync, for
   example by the system due to memory pressure;

3) When the ordered extent for the extent (a single 4K page) finishes,
   we unpin the corresponding extent map and set its generation to N,
   the current transaction's generation;

4) The btrfs_releasepage() callback is invoked by the system due to
   memory pressure for that no longer dirty page of inode I;

5) At the same time, some task calls fsync on inode I, joins transaction
   N, and at btrfs_log_inode() it sees that the inode does not have the
   full sync flag set, so we proceed with a fast fsync. But before we get
   into btrfs_log_changed_extents() and lock the inode's extent map tree:

6) Through btrfs_releasepage() we end up at try_release_extent_mapping()
   and we remove the extent map for the new 4Kb extent, because it is
   neither pinned anymore nor locked. By calling remove_extent_mapping(),
   we remove the extent map from the list of modified extents, since the
   extent map does not have the logging flag set.
   We unlock the inode's extent map tree;

7) The task doing the fast fsync now enters btrfs_log_changed_extents(),
   locks the inode's extent map tree and iterates its list of modified
   extents, which no longer has the 4Kb extent in it, so it does not log
   the extent;

8) The fsync finishes;

9) Before transaction N is committed, a power failure happens.

After replaying the log, the 4K extent of inode I will be missing, since
it was not logged due to the race with try_release_extent_mapping().

So fix this by teaching try_release_extent_mapping() to not remove an
extent map if it's still in the list of modified extents.

Fixes: ff44c6e36dc9dc ("Btrfs: do not hold the write_lock on the extent tree while logging")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/extent_io.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4467,15 +4467,25 @@ int try_release_extent_mapping(struct pa
 				free_extent_map(em);
 				break;
 			}
-			if (!test_range_bit(tree, em->start,
-					    extent_map_end(em) - 1,
-					    EXTENT_LOCKED, 0, NULL)) {
+			if (test_range_bit(tree, em->start,
+					   extent_map_end(em) - 1,
+					   EXTENT_LOCKED, 0, NULL))
+				goto next;
+			/*
+			 * If it's not in the list of modified extents, used
+			 * by a fast fsync, we can remove it. If it's being
+			 * logged we can safely remove it since fsync took an
+			 * extra reference on the em.
+			 */
+			if (list_empty(&em->list) ||
+			    test_bit(EXTENT_FLAG_LOGGING, &em->flags)) {
 				set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
 					&btrfs_inode->runtime_flags);
 				remove_extent_mapping(map, em);
 				/* once for the rb tree */
 				free_extent_map(em);
 			}
+next:
 			start = extent_map_end(em);
 			write_unlock(&map->lock);