From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Lars Ellenberg, Christoph Böhmwalder, Jens Axboe
Subject: [PATCH 5.15 118/913] drbd: fix potential silent data corruption
Date: Tue, 5 Apr 2022 09:19:40 +0200
Message-Id: <20220405070343.363900549@linuxfoundation.org>
In-Reply-To: <20220405070339.801210740@linuxfoundation.org>
References: <20220405070339.801210740@linuxfoundation.org>

From: Lars Ellenberg

commit f4329d1f848ac35757d9cc5487669d19dfc5979c upstream.

Scenario:
---------

bio chain generated by blk_queue_split().
Some split bio fails and propagates its error status to the "parent" bio.
But then the (last part of the) parent bio itself completes without error.

We would clobber the already recorded error status with BLK_STS_OK,
causing silent data corruption.

Reproducer:
-----------

How to trigger this in the real world within seconds:

DRBD on top of degraded parity raid,
small stripe_cache_size, large read_ahead setting.
Drop page cache (sysctl vm.drop_caches=1, fadvise "DONTNEED",
umount and mount again, "reboot").

Cause significant read ahead.

Large read ahead request is split by blk_queue_split().
Parts of the read ahead that are already in the stripe cache,
or find an available stripe cache to use, can be serviced.
Parts of the read ahead that would need "too much work",
would need to wait for a "stripe_head" to become available,
are rejected immediately.

For larger read ahead requests that are split in many pieces, it is very
likely that some "splits" will be serviced, but then the stripe cache is
exhausted/busy, and the remaining ones will be rejected.

Signed-off-by: Lars Ellenberg
Signed-off-by: Christoph Böhmwalder
Cc: # 4.13.x
Link: https://lore.kernel.org/r/20220330185551.3553196-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
---
 drivers/block/drbd/drbd_req.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -180,7 +180,8 @@ void start_new_tl_epoch(struct drbd_conn
 void complete_master_bio(struct drbd_device *device,
 		struct bio_and_error *m)
 {
-	m->bio->bi_status = errno_to_blk_status(m->error);
+	if (unlikely(m->error))
+		m->bio->bi_status = errno_to_blk_status(m->error);
 	bio_endio(m->bio);
 	dec_ap_bio(device);
 }
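
For readers who want to see the clobbering in isolation, here is a rough
userspace sketch of the scenario described above. It is not kernel code:
struct fake_bio, the FAKE_STS_* values, and every function name are
hypothetical stand-ins invented for this illustration, and the mapping from
errno to status is reduced to a single error value. It only models the
ordering issue: a failed split records an error on the parent, and the
parent's own completion then reports no error.

```c
/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
enum fake_status { FAKE_STS_OK = 0, FAKE_STS_IOERR = 10 };

struct fake_bio {
	enum fake_status bi_status;
};

/* Assumed model: a failed split bio records its error on the parent. */
void split_failed(struct fake_bio *parent)
{
	parent->bi_status = FAKE_STS_IOERR;
}

/* Pre-patch shape: the status is assigned unconditionally. */
void complete_old(struct fake_bio *bio, int error)
{
	bio->bi_status = error ? FAKE_STS_IOERR : FAKE_STS_OK;
}

/* Post-patch shape: the status is written only when there is an error. */
void complete_new(struct fake_bio *bio, int error)
{
	if (error)
		bio->bi_status = FAKE_STS_IOERR;
}

typedef void (*complete_fn)(struct fake_bio *, int);

/*
 * One split fails, then the last part of the parent completes without
 * error. Returns the status the submitter would finally observe.
 */
enum fake_status run_scenario(complete_fn complete)
{
	struct fake_bio parent = { FAKE_STS_OK };

	split_failed(&parent);
	complete(&parent, 0);
	return parent.bi_status;
}
```

In this model, run_scenario(complete_old) ends with FAKE_STS_OK, so the
recorded error is lost, while run_scenario(complete_new) still reports
FAKE_STS_IOERR.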