From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Liu Bo, David Sterba, Sasha Levin
Subject: [PATCH 4.4 009/105] Btrfs: make raid6 rebuild retry more
Date: Sun, 1 Jul 2018 18:01:19 +0200
Message-Id: <20180701153150.175386034@linuxfoundation.org>
In-Reply-To: <20180701153149.382300170@linuxfoundation.org>
References: <20180701153149.382300170@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Liu Bo

[ Upstream commit 8810f7517a3bc4ca2d41d022446d3f5fd6b77c09 ]

There is a scenario that can end up with rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, btrfs currently retries
at most twice for raid6,

- the 1st retry is to rebuild with all other stripes, it'll eventually
  be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
  that it will do raid6 style rebuild,

however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content, and users will think of this as data loss.
More seriously, if the loss happens on some important internal btree
roots, it could refuse to mount.

This extends btrfs to do more retries and each retry fails only one
stripe.  Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.  The worst case is to retry as many times as the number
of raid6 disks, but given the fact that such a scenario is really rare
in practice, it's still acceptable.

Signed-off-by: Liu Bo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/raid56.c  |   18 ++++++++++++++----
 fs/btrfs/volumes.c |    9 ++++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2160,11 +2160,21 @@ int raid56_parity_recover(struct btrfs_r
 	}
 
 	/*
-	 * reconstruct from the q stripe if they are
-	 * asking for mirror 3
+	 * Loop retry:
+	 * for 'mirror == 2', reconstruct from all other stripes.
+	 * for 'mirror_num > 2', select a stripe to fail on every retry.
 	 */
-	if (mirror_num == 3)
-		rbio->failb = rbio->real_stripes - 2;
+	if (mirror_num > 2) {
+		/*
+		 * 'mirror == 3' is to fail the p stripe and
+		 * reconstruct from the q stripe.  'mirror > 3' is to
+		 * fail a data stripe and reconstruct from p+q stripe.
+		 */
+		rbio->failb = rbio->real_stripes - (mirror_num - 1);
+		ASSERT(rbio->failb > 0);
+		if (rbio->failb <= rbio->faila)
+			rbio->failb--;
+	}
 
 	ret = lock_stripe_add(rbio);
 
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5056,7 +5056,14 @@ int btrfs_num_copies(struct btrfs_fs_inf
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
 		ret = 2;
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
-		ret = 3;
+		/*
+		 * There could be two corrupted data stripes, we need
+		 * to loop retry in order to rebuild the correct data.
+		 *
+		 * Fail a stripe at a time on every retry except the
+		 * stripe under reconstruction.
+		 */
+		ret = map->num_stripes;
 	else
 		ret = 1;
 	free_extent_map(em);
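
For anyone who wants to check the index arithmetic in the raid56.c hunk
above outside the kernel, here is a minimal userspace sketch.  It is not
part of the patch: pick_failb() is a hypothetical helper invented for
illustration, and returning -1 for "no second stripe is failed" is an
assumption of this sketch.  Only the expression for failb and the
adjustment against faila are taken from the hunk.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in the patch): given the number of stripes
 * including p and q, the index of the stripe being rebuilt (faila) and
 * the 1-based retry number (mirror_num), return the index of the extra
 * stripe that the retry deliberately fails, or -1 if there is none.
 */
static int pick_failb(int real_stripes, int faila, int mirror_num)
{
	int failb;

	/* mirror_num 2 rebuilds from all other stripes, nothing to fail. */
	if (mirror_num <= 2)
		return -1;

	/*
	 * mirror_num 3 selects real_stripes - 2, the p stripe; each
	 * later retry moves one stripe further towards index 0.
	 */
	failb = real_stripes - (mirror_num - 1);
	assert(failb > 0);

	/* Step over the stripe that is already being reconstructed. */
	if (failb <= faila)
		failb--;

	return failb;
}
```

With real_stripes = 6 (four data stripes, p at index 4, q at index 5)
and faila = 2, mirror_num 3 to 6 gives failb 4, 3, 1, 0: the p stripe
first, then every data stripe except the one under reconstruction.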