From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, David Vallender, Liu Bo, Josef Bacik, David Sterba, Sasha Levin
Subject: [PATCH 4.14 067/183] Btrfs: fix unexpected EEXIST from btrfs_get_extent
Date: Wed, 25 Apr 2018 12:34:47 +0200
Message-Id: <20180425103245.212722093@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.
------------------

From: Liu Bo

[ Upstream commit 18e83ac75bfe67009c4ddcdd581bba8eb16f4030 ]

This fixes a corner case caused by a race between a dio write and a
dio read/write.

Here is how the race can happen.  Suppose no extent map has been loaded
into memory yet.  There is a file extent [0, 32K), and two jobs run
concurrently against it: t1 does a dio write to [8K, 32K), and t2 does
a dio read from [0, 4K) or [4K, 8K).

t1 goes ahead of t2 and splits em [0, 32K) into em [0K, 8K) and
[8K, 32K).

 ------------------------------------------------------
             t1                                  t2
 btrfs_get_blocks_direct()           btrfs_get_blocks_direct()
  -> btrfs_get_extent()               -> btrfs_get_extent()
     -> lookup_extent_mapping()
     -> add_extent_mapping()             -> lookup_extent_mapping()
        # load [0, 32K)
  -> btrfs_new_extent_direct()
     -> btrfs_drop_extent_cache()
        # split [0, 32K) and
        # drop [8K, 32K)
     -> add_extent_mapping()
        # add [8K, 32K)
                                         -> add_extent_mapping()
                                            # handle -EEXIST when adding
                                            # [0, 32K)
 ------------------------------------------------------

How t2 (dio read/write) runs into -EEXIST:

a) add_extent_mapping() gets -EEXIST when adding em [0, 32K);

b) search_extent_mapping() then returns [0, 8K) as the existing em.
   Although start == existing->start, em is [0, 32K), so
   extent_map_end(em) > extent_map_end(existing), i.e. 32K > 8K, and
   the existing em is not reused;

c) it then goes through merge_extent_mapping(), which tries to add
   [8K, 8K) (a zero-length em) and gets -EEXIST because [8K, 32K) is
   already in the tree;

d) so btrfs_get_extent() ends up returning -EEXIST to the dio
   read/write, which confuses applications.
Here are all the possible situations:

1) start < existing->start

            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      +
               +
               |
               +
             start

2) start == existing->start

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
            |
            |
            +
          start

3) start > existing->start && start < (existing->start + existing->len)

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
                 |
                 |
                 +
               start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                      +
                      |
                      +
                   start

As we can see, if start lies within the existing em (front inclusive,
i.e. existing->start <= start < extent_map_end(existing)), the existing
em should be returned as is; otherwise, we try our best to merge the
candidate em with sibling ems to form a larger em (in order to reduce
the total number of ems).

Reported-by: David Vallender
Signed-off-by: Liu Bo
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

---
 fs/btrfs/inode.c |   17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7265,19 +7265,12 @@ insert:
 		 * existing will always be non-NULL, since there must be
 		 * extent causing the -EEXIST.
 		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
+		if (start >= existing->start &&
+		    start < extent_map_end(existing)) {
 			free_extent_map(em);
 			em = existing;
 			err = 0;
-
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
+		} else {
 			/*
 			 * The existing extent map is the one nearest to
 			 * the [start, start + len) range which overlaps
@@ -7289,10 +7282,6 @@ insert:
 				free_extent_map(em);
 				em = NULL;
 			}
-		} else {
-			free_extent_map(em);
-			em = existing;
-			err = 0;
 		}
 	}
 	write_unlock(&em_tree->lock);