From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Josef Bacik, Filipe Manana, David Sterba, Sasha Levin
Subject: [PATCH 4.14 060/193] btrfs: fix panic during relocation after ENOSPC before writeback happens
Date: Wed, 29 May 2019 20:05:14 -0700
Message-Id: <20190530030457.835382127@linuxfoundation.org>
X-Mailer: git-send-email 2.21.0
In-Reply-To:
 <20190530030446.953835040@linuxfoundation.org>
References: <20190530030446.953835040@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

[ Upstream commit ff612ba7849964b1898fd3ccd1f56941129c6aab ]

We've been seeing the following sporadically throughout our fleet

panic: kernel BUG at fs/btrfs/relocation.c:4584!
netversion: 5.0-0
Backtrace:
 #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8
 #1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c
 #2 [ffffc90003adb988] crash_kexec at ffffffff811048ad
 #3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a
 #4 [ffffc90003adb9c0] do_trap at ffffffff81019114
 #5 [ffffc90003adba00] do_error_trap at ffffffff810195d0
 #6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b
    [exception RIP: btrfs_reloc_cow_block+692]
    RIP: ffffffff8143b614  RSP: ffffc90003adbb68  RFLAGS: 00010246
    RAX: fffffffffffffff7  RBX: ffff8806b9c32000  RCX: ffff8806aad00690
    RDX: ffff880850b295e0  RSI: ffff8806b9c32000  RDI: ffff88084f205bd0
    RBP: ffff880849415000   R8: ffffc90003adbbe0   R9: ffff88085ac90000
    R10: ffff8805f7369140  R11: 0000000000000000  R12: ffff880850b295e0
    R13: ffff88084f205bd0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd
 #8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3
 #9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c

The way relocation moves data extents is by creating a reloc inode and
preallocating extents in this inode and then copying the data into these
preallocated extents.  Once we've done this for all of our extents,
we'll write out these dirty pages, which marks the extent written, and
goes into btrfs_reloc_cow_block().
From here we get our current reloc_control, which _should_ match the
reloc_control for the current block group we're relocating.

However if we get an ENOSPC in this path at some point we'll bail out,
never initiating writeback on this inode.  Not a huge deal, unless we
happen to be doing relocation on a different block group, and this block
group is now rc->stage == UPDATE_DATA_PTRS.  This trips the BUG_ON() in
btrfs_reloc_cow_block(), because we expect to be done modifying the data
inode.  We are in fact done modifying the metadata for the data inode
we're currently using, but not the one from the failed block group, and
thus we BUG_ON().

(This happens when writeback finishes for extents from the previous
group, when we are at btrfs_finish_ordered_io() which updates the data
reloc tree (inode item, drops/adds extent items, etc).)

Fix this by writing out the reloc data inode always, and then breaking
out of the loop after that point to keep from tripping this BUG_ON()
later.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
[ add note from Filipe ]
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
---
 fs/btrfs/relocation.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 5feb8b03ffe86..9fa6db6a6f7d5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4403,27 +4403,36 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
 		mutex_lock(&fs_info->cleaner_mutex);
 		ret = relocate_block_group(rc);
 		mutex_unlock(&fs_info->cleaner_mutex);
-		if (ret < 0) {
+		if (ret < 0)
 			err = ret;
-			goto out;
-		}
-
-		if (rc->extents_found == 0)
-			break;
-
-		btrfs_info(fs_info, "found %llu extents", rc->extents_found);
 
+		/*
+		 * We may have gotten ENOSPC after we already dirtied some
+		 * extents.  If writeout happens while we're relocating a
+		 * different block group we could end up hitting the
+		 * BUG_ON(rc->stage == UPDATE_DATA_PTRS) in
+		 * btrfs_reloc_cow_block.  Make sure we write everything out
+		 * properly so we don't trip over this problem, and then break
+		 * out of the loop if we hit an error.
+		 */
 		if (rc->stage == MOVE_DATA_EXTENTS && rc->found_file_extent) {
 			ret = btrfs_wait_ordered_range(rc->data_inode, 0,
 						       (u64)-1);
-			if (ret) {
+			if (ret)
 				err = ret;
-				goto out;
-			}
 			invalidate_mapping_pages(rc->data_inode->i_mapping,
 						 0, -1);
 			rc->stage = UPDATE_DATA_PTRS;
 		}
+
+		if (err < 0)
+			goto out;
+
+		if (rc->extents_found == 0)
+			break;
+
+		btrfs_info(fs_info, "found %llu extents", rc->extents_found);
+
 	}
 
 	WARN_ON(rc->block_group->pinned > 0);
-- 
2.20.1
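
For anyone who wants to poke at the reordered control flow outside the
kernel, here is a rough userspace sketch of the loop after this patch.
It is only an illustration under stated assumptions: struct reloc_model,
model_relocate(), model_wait_ordered(), run_model() and MODEL_ENOSPC are
hypothetical stand-ins invented for this note, not btrfs code, and the
"dirty inode" is reduced to a single flag.  The point it tries to show is
that the error from the relocate step is remembered, the data inode is
still flushed, and only then does the loop bail out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real btrfs structures. */
enum stage { MOVE_DATA_EXTENTS, UPDATE_DATA_PTRS };

#define MODEL_ENOSPC 28

struct reloc_model {
	enum stage stage;
	bool found_file_extent;
	unsigned long long extents_found;
	bool inode_dirty;        /* data inode has pages awaiting writeback */
	bool fail_first_pass;    /* make the first relocate step fail */
	int passes_with_extents; /* how many passes still find extents */
	int calls;               /* number of relocate steps performed */
};

/* Stub for the relocate step: may dirty the inode and then fail. */
static int model_relocate(struct reloc_model *rc)
{
	rc->calls++;
	if (rc->passes_with_extents > 0) {
		rc->passes_with_extents--;
		rc->extents_found = 1;
	} else {
		rc->extents_found = 0;
	}
	if (rc->stage == MOVE_DATA_EXTENTS && rc->extents_found) {
		rc->found_file_extent = true;
		rc->inode_dirty = true;
	}
	if (rc->fail_first_pass && rc->calls == 1)
		return -MODEL_ENOSPC;
	return 0;
}

/* Stub for waiting on writeback of the data inode; always succeeds. */
static int model_wait_ordered(struct reloc_model *rc)
{
	rc->inode_dirty = false;
	return 0;
}

/* Loop ordering as in the patched code: record error, flush, then bail. */
int run_model(struct reloc_model *rc)
{
	int err = 0;

	while (1) {
		int ret = model_relocate(rc);

		if (ret < 0)
			err = ret; /* remember the error, do not bail yet */

		if (rc->stage == MOVE_DATA_EXTENTS && rc->found_file_extent) {
			ret = model_wait_ordered(rc);
			if (ret)
				err = ret;
			/* Invariant of this model: nothing dirty is left. */
			assert(!rc->inode_dirty);
			rc->stage = UPDATE_DATA_PTRS;
		}

		if (err < 0)
			break;

		if (rc->extents_found == 0)
			break;
	}
	return err;
}
```

Calling run_model() on a model with fail_first_pass set returns
-MODEL_ENOSPC, yet inode_dirty is already cleared by the time the loop
exits.  With the pre-patch ordering (bail out right after the failed
relocate step) the flag would still be set, which is the stale state the
commit message describes.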