Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp2957986imu; Mon, 19 Nov 2018 08:37:15 -0800 (PST) X-Google-Smtp-Source: AFSGD/UGRoqztc+Ttl5JaxHtVsguo8RFouO9k1vq9M8RpnToVM6Fh76WUyTutKdt5Uq0puLnSbgb X-Received: by 2002:a17:902:a710:: with SMTP id w16mr9479867plq.95.1542645435726; Mon, 19 Nov 2018 08:37:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542645435; cv=none; d=google.com; s=arc-20160816; b=v0p/GFmpP/0goPkOpdZbiZo9yPLG191WPWKawIFgPsIRZRGFUDR6Rmbve8i0tZMDAJ 1YDpacKkle9U4moIJwMqiZYDqcEYaZZ2IqeXJrfrEe7udjtX2YQ3ujE+qZ0jClFPCQRP s/b1ZCrHE6otas5vQlZ3RURSrcvpoR2LKlm8OdD82rVIA/jw7sEPwBjUiRvAZjq6EReq TSgQscafltisXuj7L0+78oby8auEqW44WdxfMPhuEyyHF+7ILb546tEeXaZVsz6mKdfL 5UHPZ03ATmhigHA7eirDa90yFv2O4TceMX/Tvkl8YoYkcQS4NgaZXpzvp3ELxQQXeoJL sf9g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Ozw8pIC1+itX1M7MbQDXAoJpcVkfuGFgcmeJ9tl/3Eo=; b=ZwSMAD+VFVL0drXSWOBpc0piSr8YCVhjjB8+bfDHajMA05V20XyMniz4bnX28JNi6e 1n/ParKaEZSK8zR+XNu8UhOb0anoZocuE1qZETazlC0N1QkD6QUpifP1sXf2vyA4O3UI Y8sTjd46h39wfR5btyv1XOlJUPlPZgs2SScUa+WUoqQI4dj/sYeMGJmPKpJMw8ly4odA g8KQImDRSga+kk8lIp6EZNonv6v0U/JBwO2SwCSXdhOgbwU9D60rq+YDAocnsuK5CBbt nAJ08YJJ3NqWEKfR97+QRrQxLXdei/nBE34IH3lXyMG3283Sj5wHT4RuVba3dhrwbITg Kw1Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="0fLy/LkB"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s2si40020418pgj.60.2018.11.19.08.37.00; Mon, 19 Nov 2018 08:37:15 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="0fLy/LkB"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731699AbeKTC7n (ORCPT + 99 others); Mon, 19 Nov 2018 21:59:43 -0500 Received: from mail.kernel.org ([198.145.29.99]:59254 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730350AbeKTC7n (ORCPT ); Mon, 19 Nov 2018 21:59:43 -0500 Received: from localhost (5356596B.cm-6-7b.dynamic.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id C190B20989; Mon, 19 Nov 2018 16:35:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1542645334; bh=5RmWss2svd/hVniwv37FTp0eTGSCqZmtgdBgXskEbyM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=0fLy/LkB1Jk0Zq8mVTwcS4Qv5b7Z0Or+Qp/Mk8aLV0Tvqmw46fJMpM0cBU/IEeJJ5 iN/hlVPGVDjWPg1dqflaW3zIKTmdgTUQg/bRXGjDZqXnwyg3ggbwzIw7AK6dBu+CEt jx8kG7jPgH1jhkUr/RDd0Bfthk6Xm1xAx5PxV6Ng= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Josef Bacik , Filipe Manana , David Sterba Subject: [PATCH 4.19 124/205] Btrfs: fix missing data checksums after a ranged fsync (msync) Date: Mon, 19 Nov 2018 17:27:11 +0100 Message-Id: <20181119162636.426695453@linuxfoundation.org> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181119162616.586062722@linuxfoundation.org> References: <20181119162616.586062722@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.19-stable review patch. If anyone has any objections, please let me know. ------------------ From: Filipe Manana commit 008c6753f7e070c77c70d708a6bf0255b4381763 upstream. Recently we got a massive simplification for fsync, where for the fast path we no longer log new extents while their respective ordered extents are still running. However that simplification introduced a subtle regression for the case where we use a ranged fsync (msync). Consider the following example: CPU 0 CPU 1 mmap write to range [2Mb, 4Mb[ mmap write to range [512Kb, 1Mb[ msync range [512K, 1Mb[ --> triggers fast fsync (BTRFS_INODE_NEEDS_FULL_SYNC not set) --> creates extent map A for this range and adds it to list of modified extents --> starts ordered extent A for this range --> waits for it to complete writeback triggered for range [2Mb, 4Mb[ --> create extent map B and adds it to the list of modified extents --> creates ordered extent B --> start looking for and logging modified extents --> logs extent maps A and B --> finds checksums for extent A in the csum tree, but not for extent B fsync (msync) finishes --> ordered extent B finishes and its checksums are added to the csum tree After replaying the log, we have the extent covering the range [2Mb, 4Mb[ but do not have the data checksum items covering that file range. This happens because at the very beginning of an fsync (btrfs_sync_file()) we start and wait for IO in the given range [512Kb, 1Mb[ and therefore wait for any ordered extents in that range to complete before we start logging the extents. However if right before we start logging the extent in our range [512Kb, 1Mb[, writeback is started for any other dirty range, such as the range [2Mb, 4Mb[ due to memory pressure or a concurrent fsync or msync (btrfs_sync_file() starts writeback before acquiring the inode's lock), an ordered extent is created for that other range and a new extent map is created to represent that range and added to the inode's list of modified extents. That means that we will see that other extent in that list when collecting extents for logging (done at btrfs_log_changed_extents()) and log the extent before the respective ordered extent finishes - namely before the checksum items are added to the checksums tree, which is where log_extent_csums() looks for the checksums, therefore making us log an extent without logging its checksums. Before that massive simplification of fsync, this wasn't a problem because besides looking for checkums in the checksums tree, we also looked for them in any ordered extent still running. The consequence of data checksums missing for a file range is that users attempting to read the affected file range will get -EIO errors and dmesg reports the following: [10188.358136] BTRFS info (device sdc): no csum found for inode 297 start 57344 [10188.359278] BTRFS warning (device sdc): csum failed root 5 ino 297 off 57344 csum 0x98f94189 expected csum 0x00000000 mirror 1 So fix this by skipping extents outside of our logging range at btrfs_log_changed_extents() and leaving them on the list of modified extents so that any subsequent ranged fsync may collect them if needed. Also, if we find a hole extent outside of the range still log it, just to prevent having gaps between extent items after replaying the log, otherwise fsck will complain when we are not using the NO_HOLES feature (fstest btrfs/056 triggers such case). Fixes: e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/tree-log.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4399,6 +4399,23 @@ static int btrfs_log_changed_extents(str logged_end = end; list_for_each_entry_safe(em, n, &tree->modified_extents, list) { + /* + * Skip extents outside our logging range. It's important to do + * it for correctness because if we don't ignore them, we may + * log them before their ordered extent completes, and therefore + * we could log them without logging their respective checksums + * (the checksum items are added to the csum tree at the very + * end of btrfs_finish_ordered_io()). Also leave such extents + * outside of our range in the list, since we may have another + * ranged fsync in the near future that needs them. If an extent + * outside our range corresponds to a hole, log it to avoid + * leaving gaps between extents (fsck will complain when we are + * not using the NO_HOLES feature). + */ + if ((em->start > end || em->start + em->len <= start) && + em->block_start != EXTENT_MAP_HOLE) + continue; + list_del_init(&em->list); /* * Just an arbitrary number, this can be really CPU intensive