From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Josef Bacik, Filipe Manana, David Sterba, Anand Jain
Subject: [PATCH 4.14 01/30] btrfs: always wait on ordered extents at fsync time
Date: Mon, 25 Oct 2021 21:14:21 +0200
Message-Id: <20211025190923.166779214@linuxfoundation.org>
X-Mailer: git-send-email 2.33.1
In-Reply-To: <20211025190922.089277904@linuxfoundation.org>
References: <20211025190922.089277904@linuxfoundation.org>
User-Agent: quilt/0.66
X-stable: review
X-Patchwork-Hint: ignore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Josef Bacik

commit b5e6c3e170b77025b5f6174258c7ad71eed2d4de upstream.

There's a priority inversion that exists currently with btrfs fsync.  In
some cases we will collect outstanding ordered extents onto a list and
only wait on them at the very last second.  However this "very last
second" falls inside of a transaction handle, so if we are in a lower
priority cgroup we can end up holding the transaction open for longer
than needed, so if a high priority cgroup is also trying to fsync()
it'll see latency.

Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
Signed-off-by: David Sterba
Signed-off-by: Anand Jain
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/file.c |   56 ++++----------------------------------------------------
 1 file changed, 4 insertions(+), 52 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2102,53 +2102,12 @@ int btrfs_sync_file(struct file *file, l
 	atomic_inc(&root->log_batch);
 	full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
 			     &BTRFS_I(inode)->runtime_flags);
+
 	/*
-	 * We might have have had more pages made dirty after calling
-	 * start_ordered_ops and before acquiring the inode's i_mutex.
+	 * We have to do this here to avoid the priority inversion of waiting on
+	 * IO of a lower priority task while holding a transaciton open.
 	 */
-	if (full_sync) {
-		/*
-		 * For a full sync, we need to make sure any ordered operations
-		 * start and finish before we start logging the inode, so that
-		 * all extents are persisted and the respective file extent
-		 * items are in the fs/subvol btree.
-		 */
-		ret = btrfs_wait_ordered_range(inode, start, len);
-	} else {
-		/*
-		 * Start any new ordered operations before starting to log the
-		 * inode. We will wait for them to finish in btrfs_sync_log().
-		 *
-		 * Right before acquiring the inode's mutex, we might have new
-		 * writes dirtying pages, which won't immediately start the
-		 * respective ordered operations - that is done through the
-		 * fill_delalloc callbacks invoked from the writepage and
-		 * writepages address space operations. So make sure we start
-		 * all ordered operations before starting to log our inode. Not
-		 * doing this means that while logging the inode, writeback
-		 * could start and invoke writepage/writepages, which would call
-		 * the fill_delalloc callbacks (cow_file_range,
-		 * submit_compressed_extents). These callbacks add first an
-		 * extent map to the modified list of extents and then create
-		 * the respective ordered operation, which means in
-		 * tree-log.c:btrfs_log_inode() we might capture all existing
-		 * ordered operations (with btrfs_get_logged_extents()) before
-		 * the fill_delalloc callback adds its ordered operation, and by
-		 * the time we visit the modified list of extent maps (with
-		 * btrfs_log_changed_extents()), we see and process the extent
-		 * map they created. We then use the extent map to construct a
-		 * file extent item for logging without waiting for the
-		 * respective ordered operation to finish - this file extent
-		 * item points to a disk location that might not have yet been
-		 * written to, containing random data - so after a crash a log
-		 * replay will make our inode have file extent items that point
-		 * to disk locations containing invalid data, as we returned
-		 * success to userspace without waiting for the respective
-		 * ordered operation to finish, because it wasn't captured by
-		 * btrfs_get_logged_extents().
-		 */
-		ret = start_ordered_ops(inode, start, end);
-	}
+	ret = btrfs_wait_ordered_range(inode, start, len);
 	if (ret) {
 		up_write(&BTRFS_I(inode)->dio_sem);
 		inode_unlock(inode);
@@ -2283,13 +2242,6 @@ int btrfs_sync_file(struct file *file, l
 				goto out;
 			}
 		}
-		if (!full_sync) {
-			ret = btrfs_wait_ordered_range(inode, start, len);
-			if (ret) {
-				btrfs_end_transaction(trans);
-				goto out;
-			}
-		}
 		ret = btrfs_commit_transaction(trans);
 	} else {
 		ret = btrfs_end_transaction(trans);