From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Qu Wenruo, David Sterba, Sasha Levin
Subject: [PATCH 5.19 0535/1157] btrfs: update stripe_sectors::uptodate in steal_rbio
Date: Mon, 15 Aug 2022 19:58:11 +0200
Message-Id: <20220815180501.056583969@linuxfoundation.org>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20220815180439.416659447@linuxfoundation.org>
References: <20220815180439.416659447@linuxfoundation.org>
User-Agent: quilt/0.67
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

From: Qu Wenruo

[ Upstream commit 4d10046613333508d31fe926c545c8c0b620508a ]

[BUG]
With added debugging, it turns out that the following write sequence
causes an extra, unnecessary read:

 # xfs_io -f -s -c "pwrite -b 32k 0 32k" -c "pwrite -b 32k 32k 32k" \
	  -c "pwrite -b 32k 64k 32k" -c "pwrite -b 32k 96k 32k" \
	  $mnt/file

The debug messages look like this (btrfs header skipped):

 partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=32768 physical=323059712 len=32768
 partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
 full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=32768
 full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=32768
 partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=32768 physical=22052864 len=32768
 partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
 full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=0 physical=22020096 len=32768
 full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=0 physical=277872640 len=32768
 partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=0 physical=323026944 len=32768
 partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
 ^^^^ Still a partial read, even though 389152768 is already cached by
 the first write.

 full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=32768 physical=323059712 len=32768
 full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=32768 physical=323059712 len=32768
 partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=0 physical=22020096 len=32768
 partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
 ^^^^ Still a partial read for 298844160.

 full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=32768 physical=22052864 len=32768
 full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=32768 physical=277905408 len=32768

This means that every 32K write, even when it falls in the same full
stripe, still triggers a read of the previously cached data.

This causes extra RAID56 IO, making the btrfs raid56 cache useless.

[CAUSE]
Commit d4e28d9b5f04 ("btrfs: raid56: make steal_rbio() subpage
compatible") tries to make steal_rbio() subpage compatible, but one
thing was missed in that conversion.

We no longer rely on PageUptodate(rbio->stripe_pages[i]); instead,
rbio->stripe_sectors[i].uptodate determines whether a sector is
uptodate.

Previously, switching the page pointer was enough, as the PageUptodate
flag stays bound to the page.  But now we have to manually mark the
involved sectors uptodate, or later raid56_rmw_stripe() will find the
stolen sector not uptodate and assemble a read bio for it, wasting IO.

[FIX]
Fix the bug by also updating rbio->stripe_sectors[].uptodate in
steal_rbio().

With this fixed, the same write pattern no longer leads to the
unnecessary reads:

 partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=32768 physical=323059712 len=32768
 partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
 full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=32768
 full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=32768
 partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=32768 physical=22052864 len=32768
 partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
 full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=0 physical=22020096 len=32768
 full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=0 physical=277872640 len=32768
 ^^^ No more partial read, directly into the write path.
 full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=32768 physical=323059712 len=32768
 full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=32768 physical=323059712 len=32768
 full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=32768 physical=22052864 len=32768
 full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=32768 physical=277905408 len=32768

Fixes: d4e28d9b5f04 ("btrfs: raid56: make steal_rbio() subpage compatible")
Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
---
 fs/btrfs/raid56.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index a5b623ee6fac..13e0bb0479e6 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -347,6 +347,24 @@ static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
 	}
 }
 
+static void steal_rbio_page(struct btrfs_raid_bio *src,
+			    struct btrfs_raid_bio *dest, int page_nr)
+{
+	const u32 sectorsize = src->bioc->fs_info->sectorsize;
+	const u32 sectors_per_page = PAGE_SIZE / sectorsize;
+	int i;
+
+	if (dest->stripe_pages[page_nr])
+		__free_page(dest->stripe_pages[page_nr]);
+	dest->stripe_pages[page_nr] = src->stripe_pages[page_nr];
+	src->stripe_pages[page_nr] = NULL;
+
+	/* Also update the sector->uptodate bits. */
+	for (i = sectors_per_page * page_nr;
+	     i < sectors_per_page * page_nr + sectors_per_page; i++)
+		dest->stripe_sectors[i].uptodate = true;
+}
+
 /*
  * Stealing an rbio means taking all the uptodate pages from the stripe array
  * in the source rbio and putting them into the destination rbio.
@@ -358,7 +376,6 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 {
 	int i;
 	struct page *s;
-	struct page *d;
 
 	if (!test_bit(RBIO_CACHE_READY_BIT, &src->flags))
 		return;
@@ -368,12 +385,7 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 		if (!s || !full_page_sectors_uptodate(src, i))
 			continue;
 
-		d = dest->stripe_pages[i];
-		if (d)
-			__free_page(d);
-
-		dest->stripe_pages[i] = s;
-		src->stripe_pages[i] = NULL;
+		steal_rbio_page(src, dest, i);
 	}
 	index_stripe_sectors(dest);
 	index_stripe_sectors(src);
-- 
2.35.1
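
For readers who want to see the bookkeeping in isolation, below is a minimal
userspace sketch of the per-sector uptodate update that steal_rbio_page()
performs.  It is not kernel code: the toy_rbio structure, the helper names,
and the assumed 64K page / 4K sector sizes are illustrative assumptions only.
The point is the index arithmetic: a stolen page page_nr backs sectors
[sectors_per_page * page_nr, sectors_per_page * (page_nr + 1)), and all of
them must be flagged uptodate when the page changes hands, otherwise the next
RMW pass re-reads data the cache already holds.

/*
 * Illustrative sketch only -- not kernel code.  Models the bookkeeping
 * added by the fix: stealing a page also marks every sector backed by
 * that page as uptodate.
 */
#include <stdbool.h>
#include <stdio.h>

#define TOY_PAGE_SIZE	65536u	/* assumed 64K pages (subpage case) */
#define TOY_SECTORSIZE	4096u	/* assumed 4K btrfs sectorsize */
#define TOY_NR_PAGES	2u
#define TOY_NR_SECTORS	(TOY_NR_PAGES * (TOY_PAGE_SIZE / TOY_SECTORSIZE))

struct toy_rbio {
	bool page_present[TOY_NR_PAGES];	/* stand-in for stripe_pages[] */
	bool sector_uptodate[TOY_NR_SECTORS];	/* stand-in for stripe_sectors[].uptodate */
};

/* Steal one page and, as in the fix, flag every sector it backs as uptodate. */
static void toy_steal_page(struct toy_rbio *src, struct toy_rbio *dest,
			   unsigned int page_nr)
{
	const unsigned int sectors_per_page = TOY_PAGE_SIZE / TOY_SECTORSIZE;
	unsigned int i;

	dest->page_present[page_nr] = src->page_present[page_nr];
	src->page_present[page_nr] = false;

	/* Without this loop, the destination rbio would re-read cached data. */
	for (i = sectors_per_page * page_nr;
	     i < sectors_per_page * (page_nr + 1); i++)
		dest->sector_uptodate[i] = true;
}

int main(void)
{
	struct toy_rbio src = { .page_present = { true, true } };
	struct toy_rbio dest = { { false } };
	unsigned int i;

	toy_steal_page(&src, &dest, 0);

	/* Sectors backed by page 0 are now uptodate; the rest are not. */
	for (i = 0; i < TOY_NR_SECTORS; i++)
		printf("sector %2u uptodate=%d\n", i, dest.sector_uptodate[i]);
	return 0;
}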