Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp215870ybi; Fri, 7 Jun 2019 06:59:13 -0700 (PDT) X-Google-Smtp-Source: APXvYqyoMcIXw1q4+eZsvYO3eQh4Dtyy69AKS7ssLryBKzh0OhZSC5IZOwCcHWJhqC9uNO5KM9Ok X-Received: by 2002:a17:90a:2e89:: with SMTP id r9mr5448099pjd.117.1559915953508; Fri, 07 Jun 2019 06:59:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559915953; cv=none; d=google.com; s=arc-20160816; b=GQF2JkI0YI+njQwAlH77dfecGy1oFNg6IYsk+pJgQmZ4UfUuq9P/VTHr9azLdgvZsD gqlbF+5OgL29CeQTlFeEd4vaSRBP8eb+Ayw11M04zTalCyyfL3Rs+x8oxPfFWfG1Ekj6 do82dZLSrVdpntrQrcWFyTx0D7ZwDfqcZJIhhdnLWDhlinQoGyyVhbpbkF51RmlUWU9s O/XtecC+SpLwm0ziTd2ZA4UdY62/toh+XJUGd2IdaouBeZvsJhjLjIz6TIK+G7E08hwq le2UIRaK+VKqBCvxKwhl8fvUsHN8gDwXeD7tZwJZOkH+iYO6rcJuMSJDKEJBmfXJHYhH YQtg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :ironport-sdr:ironport-sdr:dkim-signature; bh=ap1XX28f2JwvWW/gVE1nfzwL0NTLce96ZxoY24IK21U=; b=ENGTncep5b9vBmrs1qFS1JrjC65Z7Py9tDXpuNO+tZ1uPfsYhWZawvVL8NTlM6Pdcd Vj494V9EW6WBXoyOpgJeB+It5AhVDeRXPtN7COSgrDc5SeoqBJjfLNa6r1XCsEjFhHGn BxP5oJhEJc04iuXlz5bAc85S5KD+/c0S1Bc4cU2JQqnCAC+ua0yDFCP9fPH9R6203q58 Xank0ZdIv+M2GIkDMrDukWmKVqJYlB/sKhswGIYPxpChiO1h6Ue5CKxT9o2BGaWi1hBr F2btdXy+qf6UtkMeMBeC6CGyFmALVnf3yih6VW1amtN8hnZZDqPOCpaW0N3WEqIdNRhF Pg+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@wdc.com header.s=dkim.wdc.com header.b=B2UKT4d3; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=wdc.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h68si1771514pgc.270.2019.06.07.06.58.51; Fri, 07 Jun 2019 06:59:13 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@wdc.com header.s=dkim.wdc.com header.b=B2UKT4d3; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=wdc.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729406AbfFGNLy (ORCPT + 99 others); Fri, 7 Jun 2019 09:11:54 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:53172 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729380AbfFGNLw (ORCPT ); Fri, 7 Jun 2019 09:11:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1559913112; x=1591449112; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GKTLJDMPSC7oI+dEroHu1zACnrTugLVZNcumBahEBmo=; b=B2UKT4d3JsOHykMpUlqg2pHIlVri7GvGwt1Atpk48/GKVGVGckbwm5cy O853VRBtoGP0bjLNuWyzOmJxLWsOSeKTu6DgVTghNyOu1IzdFcA+gapKV StLb1uhveO0ffEYjz5jQ55/TfcCP3ZAWfAL0abxr//gvVwiQ+M0FwS+xa BSaqnOSM2yFiUFuz+JsDKXkxFe8fL7I5869PSzPSSGlvgqwQr3z8IiMWh G8foA4vAWf3sXFmqc+pIsjIpr27R3CFrnk/Q2CJHnpq4fZp5mTi7MH8Tb x08RsDgBHiq5hBReGlognKx/X0qPgg0SXkWgDI06XpDto2ZIH1VY8hrJK A==; X-IronPort-AV: E=Sophos;i="5.63,563,1557158400"; d="scan'208";a="110027833" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Jun 2019 21:11:51 +0800 IronPort-SDR: jqRFCCLG0nmeDNsMiHbogjZ+eUNuZb7uV/Do2HAEKihQPJb+mMCylE+cDj+nRsu0J2dVSEz0cv 051RjdOW8mjfQrTwpRGdHOgYJEtZewwubAMr14BMmbCGn0ucDAsvvzmKN+4Yc2GuHx04L5IIPN em25PoWUpHuaEYUAeD8buIXav5ATeOIBVt6WttedufqbVB4VX7/c2N8ySXI0pO7fnv+D81TWh/ UrD/ETD0a5Hdoc03uTjLBHfjaXWlaU+RJdDRsNH4ClagCk4WPRjwZ6q2WDtS8VXB8liPjIDcuB 5BM+3Kzbe5fIkoOBHqztiQE7 Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP; 07 Jun 2019 05:49:08 -0700 IronPort-SDR: N4hXjbYkwpT0Rm7GGc+XOG4jjckvPLd/sRk5mC7ndisiGr/S9C24fE5PFMbLCE8EILow8OVYHb cXXOn7E69piuGZ6+76vgHNhaXuLC5rgBvFjisyHJYwy0v4IzdIrxmxd9uN2aTqHMUBiXugb2ky hFutFfqr5Hbm4Zkq9QWNrI2yVNIDuXO3+wNhP5e/Gluy0tSE/iIIhB0fSyP4ibphvpTPyhEAof SzYH9JqoIUshxY2xeX/Mo4nNk1jpX6YKOcNqlePn4uFUrKr9mHJyDHpj/Hst9lHt2/+icPeprv xOs= Received: from naota.dhcp.fujisawa.hgst.com (HELO naota.fujisawa.hgst.com) ([10.149.53.115]) by uls-op-cesaip02.wdc.com with ESMTP; 07 Jun 2019 06:11:50 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Qu Wenruo , Nikolay Borisov , linux-kernel@vger.kernel.org, Hannes Reinecke , linux-fsdevel@vger.kernel.org, Damien Le Moal , =?UTF-8?q?Matias=20Bj=C3=B8rling?= , Johannes Thumshirn , Bart Van Assche , Naohiro Aota Subject: [PATCH 17/19] btrfs: shrink delayed allocation size in HMZONED mode Date: Fri, 7 Jun 2019 22:10:23 +0900 Message-Id: <20190607131025.31996-18-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com> References: <20190607131025.31996-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org In a write heavy workload, the following scenario can occur: 1. mark page #0 to page #2 (and their corresponding extent region) as dirty and candidate for delayed allocation pages 0 1 2 3 4 dirty o o o - - towrite - - - - - delayed o o o - - alloc 2. extent_write_cache_pages() mark dirty pages as TOWRITE pages 0 1 2 3 4 dirty o o o - - towrite o o o - - delayed o o o - - alloc 3. Meanwhile, another write dirties page #3 and page #4 pages 0 1 2 3 4 dirty o o o o o towrite o o o - - delayed o o o o o alloc 4. find_lock_delalloc_range() decide to allocate a region to write page #0 to page #4 5. but, extent_write_cache_pages() only initiate write to TOWRITE tagged pages (#0 to #2) So the above process leaves page #3 and page #4 behind. Usually, the periodic dirty flush kicks write IOs for page #3 and #4. However, if we try to mount a subvolume at this timing, mount process takes s_umount write lock to block the periodic flush to come in. To deal with the problem, shrink the delayed allocation region to have only expected to be written pages. Signed-off-by: Naohiro Aota --- fs/btrfs/extent_io.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c73c69e2bef4..ea582ff85c73 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3310,6 +3310,33 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode, delalloc_start = delalloc_end + 1; continue; } + + if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED) && + (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) && + ((delalloc_start >> PAGE_SHIFT) < + (delalloc_end >> PAGE_SHIFT))) { + unsigned long i; + unsigned long end_index = delalloc_end >> PAGE_SHIFT; + + for (i = delalloc_start >> PAGE_SHIFT; + i <= end_index; i++) + if (!xa_get_mark(&inode->i_mapping->i_pages, i, + PAGECACHE_TAG_TOWRITE)) + break; + + if (i <= end_index) { + u64 unlock_start = (u64)i << PAGE_SHIFT; + + if (i == delalloc_start >> PAGE_SHIFT) + unlock_start += PAGE_SIZE; + + unlock_extent(tree, unlock_start, delalloc_end); + __unlock_for_delalloc(inode, page, unlock_start, + delalloc_end); + delalloc_end = unlock_start - 1; + } + } + ret = btrfs_run_delalloc_range(inode, page, delalloc_start, delalloc_end, &page_started, nr_written, wbc); /* File system has been set read-only */ -- 2.21.0