From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Alex Chen, Alex Wu, Chung-Chiang Cheng, BingJing Chang, Shaohua Li, Sasha Levin
Subject: [PATCH 4.14 034/126] md/raid5: fix data corruption of replacements after originals dropped
Date: Tue, 18 Sep 2018 00:41:22 +0200
Message-Id: <20180917211707.026764193@linuxfoundation.org>
In-Reply-To: <20180917211703.481236999@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.
------------------

From: BingJing Chang

[ Upstream commit d63e2fc804c46e50eee825c5d3a7228e07048b47 ]

During raid5 replacement, stripes can be marked with the R5_NeedReplace
flag. Data can then be read from the device being replaced and written
to the replacing spare without reading all the other devices ('replace'
mode, s.replacing = 1). If the device being replaced is dropped, the
replacement is interrupted and resumes in pure recovery mode. However,
stripes that were already in flight before the interruption can no
longer read from the dropped device. This triggers many WARN_ON
messages, and it corrupts data, because those stripes write bad data to
their replacement device and update the replacement progress.

To reproduce:

    # Erase disks (1MB + 2GB)
    dd if=/dev/zero of=/dev/sda bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
    mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
    # Ensure array stores non-zero data
    dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
    # Start replacement
    mdadm /dev/md0 -a /dev/sdd
    mdadm /dev/md0 --replace /dev/sda

Then hot-unplug /dev/sda during recovery, wait for recovery to finish,
and run a check:

    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.

Soon after /dev/sda is hot-unplugged, many WARN_ON messages appear and
the replacement recovery is interrupted. Once the recovery finishes,
the array contains corrupted data.

This is simply an unhandled replacement case. In commit (md/raid5: fix
interaction of 'replace' and 'recovery'.), a NeedReplace device that is
not UPTODATE is treated as an error, but that commit merely prints a
WARN_ON and still marks these corrupted stripes with R5_WantReplace
(meaning they are ready to be written).

To fix this case, we can leverage the 'sync and replace' mode described
in commit <9a3e1101b827> (md/raid5: detect and handle replacements
during recovery.).
We can add logic to detect these stripes and handle them in 'sync and
replace' mode.

Reported-by: Alex Chen
Reviewed-by: Alex Wu
Reviewed-by: Chung-Chiang Cheng
Signed-off-by: BingJing Chang
Signed-off-by: Shaohua Li
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 drivers/md/raid5.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4516,6 +4516,12 @@ static void analyse_stripe(struct stripe
 			s->failed++;
 			if (rdev && !test_bit(Faulty, &rdev->flags))
 				do_recovery = 1;
+			else if (!rdev) {
+				rdev = rcu_dereference(
+				    conf->disks[i].replacement);
+				if (rdev && !test_bit(Faulty, &rdev->flags))
+					do_recovery = 1;
+			}
 		}
 
 		if (test_bit(R5_InJournal, &dev->flags))