From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Nikolay Borisov, Josef Bacik, David Sterba
Subject: [PATCH 5.13 003/380] btrfs: wait on async extents when flushing delalloc
Date: Thu, 16 Sep 2021 17:56:00 +0200
Message-Id: <20210916155804.088586465@linuxfoundation.org>
In-Reply-To: <20210916155803.966362085@linuxfoundation.org>
References: <20210916155803.966362085@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Josef Bacik

commit e16460707e94c3d4c1b5418cb68b28b8efa903b2 upstream.

I've been debugging an early ENOSPC problem in production and finally
root-caused it to the following.  When we switched to per-inode flushing
in 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in
shrink_delalloc") I pulled out the async extent handling, because we
were doing the correct thing by calling filemap_flush() if we had async
extents set.  This would properly wait on any async extents by locking
the page in the second flush, thus making sure our ordered extents were
properly set up.

However, when I switched us back to page-based flushing, I used
sync_inode(), which allows us to pass in our own wbc.  The problem here
is that sync_inode() is smarter than the filemap_* helpers: it tries to
avoid calling writepages at all.  This means that our second call could
skip calling do_writepages altogether, and thus not wait on the pagelock
for the async helpers.  This means we could come back before any ordered
extents were created and then simply continue on in our flushing
mechanisms and ENOSPC out when we have plenty of space to use.

Fix this by putting back the async pages logic in shrink_delalloc.  This
allows us to bulk write out everything that we need to, and then we can
wait in one place for the async helpers to catch up, and then wait on
any ordered extents that are created.

Fixes: e076ab2a2ca7 ("btrfs: shrink delalloc pages instead of full inodes")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/inode.c      |    4 ----
 fs/btrfs/space-info.c |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 4 deletions(-)

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9774,10 +9774,6 @@ static int start_delalloc_inodes(struct
 					 &work->work);
 		} else {
 			ret = sync_inode(inode, wbc);
-			if (!ret &&
-			    test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
-				     &BTRFS_I(inode)->runtime_flags))
-				ret = sync_inode(inode, wbc);
 			btrfs_add_delayed_iput(inode);
 			if (ret || wbc->nr_to_write <= 0)
 				goto out;
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -539,9 +539,49 @@ static void shrink_delalloc(struct btrfs
 	while ((delalloc_bytes || ordered_bytes) && loops < 3) {
 		u64 temp = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
 		long nr_pages = min_t(u64, temp, LONG_MAX);
+		int async_pages;
 
 		btrfs_start_delalloc_roots(fs_info, nr_pages, true);
 
+		/*
+		 * We need to make sure any outstanding async pages are now
+		 * processed before we continue. This is because things like
+		 * sync_inode() try to be smart and skip writing if the inode is
+		 * marked clean. We don't use filemap_fwrite for flushing
+		 * because we want to control how many pages we write out at a
+		 * time, thus this is the only safe way to make sure we've
+		 * waited for outstanding compressed workers to have started
+		 * their jobs and thus have ordered extents set up properly.
+		 *
+		 * This exists because we do not want to wait for each
+		 * individual inode to finish its async work, we simply want to
+		 * start the IO on everybody, and then come back here and wait
+		 * for all of the async work to catch up. Once we're done with
+		 * that we know we'll have ordered extents for everything and we
+		 * can decide if we wait for that or not.
+		 *
+		 * If we choose to replace this in the future, make absolutely
+		 * sure that the proper waiting is being done in the async case,
+		 * as there have been bugs in that area before.
+		 */
+		async_pages = atomic_read(&fs_info->async_delalloc_pages);
+		if (!async_pages)
+			goto skip_async;
+
+		/*
+		 * We don't want to wait forever, if we wrote less pages in this
+		 * loop than we have outstanding, only wait for that number of
+		 * pages, otherwise we can wait for all async pages to finish
+		 * before continuing.
+		 */
+		if (async_pages > nr_pages)
+			async_pages -= nr_pages;
+		else
+			async_pages = 0;
+		wait_event(fs_info->async_submit_wait,
+			   atomic_read(&fs_info->async_delalloc_pages) <=
+			   async_pages);
+skip_async:
 		loops++;
 		if (wait_ordered && !trans) {
 			btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1);
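
For readers following the threshold arithmetic in the shrink_delalloc()
hunk, here is a minimal userspace sketch of how the wait target is
derived.  It is an illustration under stated assumptions, not kernel
code: async_wait_target() is a hypothetical helper name, and plain longs
stand in for the atomic counter fs_info->async_delalloc_pages and for
nr_pages.

```c
#include <assert.h>

/*
 * Illustrative only: mirrors the threshold arithmetic added to
 * shrink_delalloc().  async_wait_target() is a hypothetical name,
 * not a function that exists in the kernel.
 *
 * outstanding: async delalloc pages in flight when the counter is sampled
 * nr_pages:    pages this loop iteration asked to have written out
 *
 * Returns the value the in-flight counter must fall to (or below)
 * before the flusher stops waiting.
 */
static long async_wait_target(long outstanding, long nr_pages)
{
	/* More in flight than we requested: wait only for our share. */
	if (outstanding > nr_pages)
		return outstanding - nr_pages;

	/* Otherwise wait until every async page has been processed. */
	return 0;
}
```

With 1000 pages in flight and nr_pages = 256, the waiter would sleep
until the counter is at or below 744; with only 100 in flight it would
wait for all of them.  In the patch itself the wait is skipped entirely
when the sampled counter is zero.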