Message-ID: <1528483367.2289.105.camel@codethink.co.uk>
Subject: Re: [PATCH 4.4 038/268] Btrfs: fix scrub to repair raid6 corruption
From: Ben Hutchings
To: Greg Kroah-Hartman, Liu Bo, Sasha Levin
Cc: stable@vger.kernel.org, David Sterba, LKML
Date: Fri, 08 Jun 2018 19:42:47 +0100
In-Reply-To: <20180528100206.374694208@linuxfoundation.org>
References: <20180528100202.045206534@linuxfoundation.org> <20180528100206.374694208@linuxfoundation.org>
Organization: Codethink Ltd.

On Mon, 2018-05-28 at 12:00 +0200, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Liu Bo
>
> [ Upstream commit 762221f095e3932669093466aaf4b85ed9ad2ac1 ]

The diff here is actually from commit 8810f7517a3b ("Btrfs: make raid6
rebuild retry more", mentioned in this commit message).

(Sasha, please try to work out why commit messages and descriptions are
getting mixed up in your auto-selections.)

Maybe stable branches should get the real commit 762221f095e3 as well?

Ben.

> The raid6 corruption is that,
> suppose that all disks can be read without problems and if the content
> that was read out doesn't match its checksum, currently for raid6
> btrfs at most retries twice,
>
> - the 1st retry is to rebuild with all other stripes, it'll eventually
>   be a raid5 xor rebuild,
> - if the 1st fails, the 2nd retry will deliberately fail parity p so
>   that it will do raid6 style rebuild,
>
> however, the chances are that another non-parity stripe content also
> has something corrupted, so that the above retries are not able to
> return correct content.
>
> We've fixed normal reads to rebuild raid6 correctly with more retries
> in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
> scrub to do the exactly same rebuild process.
>
> [1]: https://patchwork.kernel.org/patch/10091755/
>
> Signed-off-by: Liu Bo
> Signed-off-by: David Sterba
> Signed-off-by: Sasha Levin
> Signed-off-by: Greg Kroah-Hartman
> ---
>  fs/btrfs/raid56.c  |   18 ++++++++++++++----
>  fs/btrfs/volumes.c |    9 ++++++++-
>  2 files changed, 22 insertions(+), 5 deletions(-)
>
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2160,11 +2160,21 @@ int raid56_parity_recover(struct btrfs_r
>  	}
>  
>  	/*
> -	 * reconstruct from the q stripe if they are
> -	 * asking for mirror 3
> +	 * Loop retry:
> +	 * for 'mirror == 2', reconstruct from all other stripes.
> +	 * for 'mirror_num > 2', select a stripe to fail on every retry.
>  	 */
> -	if (mirror_num == 3)
> -		rbio->failb = rbio->real_stripes - 2;
> +	if (mirror_num > 2) {
> +		/*
> +		 * 'mirror == 3' is to fail the p stripe and
> +		 * reconstruct from the q stripe.  'mirror > 3' is to
> +		 * fail a data stripe and reconstruct from p+q stripe.
> +		 */
> +		rbio->failb = rbio->real_stripes - (mirror_num - 1);
> +		ASSERT(rbio->failb > 0);
> +		if (rbio->failb <= rbio->faila)
> +			rbio->failb--;
> +	}
>  
>  	ret = lock_stripe_add(rbio);
>  
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5056,7 +5056,14 @@ int btrfs_num_copies(struct btrfs_fs_inf
>  	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
>  		ret = 2;
>  	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
> -		ret = 3;
> +		/*
> +		 * There could be two corrupted data stripes, we need
> +		 * to loop retry in order to rebuild the correct data.
> +		 *
> +		 * Fail a stripe at a time on every retry except the
> +		 * stripe under reconstruction.
> +		 */
> +		ret = map->num_stripes;
>  	else
>  		ret = 1;
>  	free_extent_map(em);

-- 
Ben Hutchings, Software Developer   Codethink Ltd
https://www.codethink.co.uk/   Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
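
For anyone checking which stripe each retry fails, the arithmetic in the quoted raid56.c hunk can be traced with a small stand-alone sketch. This is not kernel code: pick_failb() is a hypothetical helper, and it assumes stripes are indexed 0..real_stripes-1 with P at real_stripes - 2 and Q at real_stripes - 1, which is what the removed "real_stripes - 2" line for mirror 3 suggests.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only (not a kernel function).
 * Mirrors the failb selection from the quoted hunk:
 *   - mirror_num <= 2: no second stripe is failed, return -1
 *   - mirror_num == 3: fail P (index real_stripes - 2)
 *   - mirror_num  > 3: walk down through the data stripes,
 *                      skipping faila, the stripe being rebuilt
 */
int pick_failb(int real_stripes, int faila, int mirror_num)
{
	int failb;

	if (mirror_num <= 2)
		return -1;

	failb = real_stripes - (mirror_num - 1);
	assert(failb > 0);	/* stands in for the kernel's ASSERT() */
	if (failb <= faila)
		failb--;	/* step past the stripe under reconstruction */
	return failb;
}
```

With six stripes (four data plus P and Q) and faila = 1, mirror_num 3 to 6 gives 4 (P), 3, 2 and 0, so every stripe other than Q and the one under reconstruction is failed exactly once. That matches btrfs_num_copies() returning map->num_stripes for RAID6 in the volumes.c hunk.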