Date: Thu, 13 Jun 2019 10:24:09 -0400
From: Josef Bacik
To: Naohiro Aota
Cc: linux-btrfs@vger.kernel.org, David Sterba, Chris Mason, Josef Bacik,
 Qu Wenruo, Nikolay Borisov, linux-kernel@vger.kernel.org, Hannes Reinecke,
 linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
 Johannes Thumshirn, Bart Van Assche
Subject: Re: [PATCH 14/19] btrfs: redirty released extent buffers in sequential BGs
Message-ID: <20190613142408.p3ra5urczrzgqr2q@MacBook-Pro-91.local>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>
 <20190607131025.31996-15-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-15-naohiro.aota@wdc.com>

On Fri, Jun 07, 2019 at 10:10:20PM +0900, Naohiro Aota wrote:
> Tree manipulating operations like merging nodes often release
> once-allocated tree nodes. Btrfs cleans such nodes so that pages in the
> node are not uselessly written out. On HMZONED drives, however, such
> optimization blocks the following IOs as the cancellation of the write out
> of the freed blocks breaks the sequential write sequence expected by the
> device.
>
> This patch introduces a list of clean extent buffers that have been
> released in a transaction. Btrfs consult the list before writing out and
> waiting for the IOs, and it redirties a buffer if 1) it's in sequential BG,
> 2) it's in un-submit range, and 3) it's not under IO. Thus, such buffers
> are marked for IO in btrfs_write_and_wait_transaction() to send proper bios
> to the disk.
>
> Signed-off-by: Naohiro Aota
> ---
>  fs/btrfs/disk-io.c     | 27 ++++++++++++++++++++++++---
>  fs/btrfs/extent_io.c   |  1 +
>  fs/btrfs/extent_io.h   |  2 ++
>  fs/btrfs/transaction.c | 35 +++++++++++++++++++++++++++++++++++
>  fs/btrfs/transaction.h |  3 +++
>  5 files changed, 65 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 6651986da470..c6147fce648f 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -535,7 +535,9 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
>  	if (csum_tree_block(eb, result))
>  		return -EINVAL;
>
> -	if (btrfs_header_level(eb))
> +	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags))
> +		ret = 0;
> +	else if (btrfs_header_level(eb))
>  		ret = btrfs_check_node(eb);
>  	else
>  		ret = btrfs_check_leaf_full(eb);
> @@ -1115,10 +1117,20 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>  void btrfs_clean_tree_block(struct extent_buffer *buf)
>  {
>  	struct btrfs_fs_info *fs_info = buf->fs_info;
> -	if (btrfs_header_generation(buf) ==
> -	    fs_info->running_transaction->transid) {
> +	struct btrfs_transaction *cur_trans = fs_info->running_transaction;
> +
> +	if (btrfs_header_generation(buf) == cur_trans->transid) {
>  		btrfs_assert_tree_locked(buf);
>
> +		if (btrfs_fs_incompat(fs_info, HMZONED) &&
> +		    list_empty(&buf->release_list)) {
> +			atomic_inc(&buf->refs);
> +			spin_lock(&cur_trans->releasing_ebs_lock);
> +			list_add_tail(&buf->release_list,
> +				      &cur_trans->releasing_ebs);
> +			spin_unlock(&cur_trans->releasing_ebs_lock);
> +		}
> +
>  		if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
>  			percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
>  						 -buf->len,
> @@ -4533,6 +4545,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
>  	btrfs_destroy_pinned_extent(fs_info,
>  				     fs_info->pinned_extents);
>
> +	while (!list_empty(&cur_trans->releasing_ebs)) {
> +		struct extent_buffer *eb;
> +
> +		eb = list_first_entry(&cur_trans->releasing_ebs,
> +				      struct extent_buffer, release_list);
> +		list_del_init(&eb->release_list);
> +		free_extent_buffer(eb);
> +	}
> +
>  	cur_trans->state =TRANS_STATE_COMPLETED;
>  	wake_up(&cur_trans->commit_wait);
>  }
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 13fca7bfc1f2..c73c69e2bef4 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4816,6 +4816,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
>  	init_waitqueue_head(&eb->read_lock_wq);
>
>  	btrfs_leak_debug_add(&eb->leak_list, &buffers);
> +	INIT_LIST_HEAD(&eb->release_list);
>
>  	spin_lock_init(&eb->refs_lock);
>  	atomic_set(&eb->refs, 1);
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index aa18a16a6ed7..2987a01f84f9 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -58,6 +58,7 @@ enum {
>  	EXTENT_BUFFER_IN_TREE,
>  	/* write IO error */
>  	EXTENT_BUFFER_WRITE_ERR,
> +	EXTENT_BUFFER_NO_CHECK,
>  };
>
>  /* these are flags for __process_pages_contig */
> @@ -186,6 +187,7 @@ struct extent_buffer {
>  	 */
>  	wait_queue_head_t read_lock_wq;
>  	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
> +	struct list_head release_list;
>  #ifdef CONFIG_BTRFS_DEBUG
>  	atomic_t spinning_writers;
>  	atomic_t spinning_readers;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3f6811cdf803..ded40ad75419 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -236,6 +236,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
>  	spin_lock_init(&cur_trans->dirty_bgs_lock);
>  	INIT_LIST_HEAD(&cur_trans->deleted_bgs);
>  	spin_lock_init(&cur_trans->dropped_roots_lock);
> +	INIT_LIST_HEAD(&cur_trans->releasing_ebs);
> +	spin_lock_init(&cur_trans->releasing_ebs_lock);
>  	list_add_tail(&cur_trans->list, &fs_info->trans_list);
>  	extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
>  			IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode);
> @@ -2219,7 +2221,31 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>
>  	wake_up(&fs_info->transaction_wait);
>
> +	if (btrfs_fs_incompat(fs_info, HMZONED)) {
> +		struct extent_buffer *eb;
> +
> +		list_for_each_entry(eb, &cur_trans->releasing_ebs,
> +				    release_list) {
> +			struct btrfs_block_group_cache *cache;
> +
> +			cache = btrfs_lookup_block_group(fs_info, eb->start);
> +			if (!cache)
> +				continue;
> +			mutex_lock(&cache->submit_lock);
> +			if (cache->alloc_type == BTRFS_ALLOC_SEQ &&
> +			    cache->submit_offset <= eb->start &&
> +			    !extent_buffer_under_io(eb)) {
> +				set_extent_buffer_dirty(eb);
> +				cache->space_info->bytes_readonly += eb->len;

Huh?

> +				set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags);
> +			}
> +			mutex_unlock(&cache->submit_lock);
> +			btrfs_put_block_group(cache);
> +		}
> +	}
> +

Helper here please.

>  	ret = btrfs_write_and_wait_transaction(trans);
> +
>  	if (ret) {
>  		btrfs_handle_fs_error(fs_info, ret,
>  				      "Error while writing out transaction");
> @@ -2227,6 +2253,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  		goto scrub_continue;
>  	}
>
> +	while (!list_empty(&cur_trans->releasing_ebs)) {
> +		struct extent_buffer *eb;
> +
> +		eb = list_first_entry(&cur_trans->releasing_ebs,
> +				      struct extent_buffer, release_list);
> +		list_del_init(&eb->release_list);
> +		free_extent_buffer(eb);
> +	}
> +

Another helper, and also can't we release eb's above that we didn't need to
re-mark dirty? Thanks,

Josef
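
The two helper requests above could look roughly like the userspace sketch
below. It is not kernel code and not taken from the patch: every mock_* name
is hypothetical, a hand-rolled list stands in for list_head, a plain int for
the atomic refcount, and a needs_redirty flag for the three-part condition in
the patch. It also assumes that buffers which need no redirty can be dropped
before writeout, which is exactly the open question raised in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; every mock_* name is hypothetical, not btrfs API. */
struct mock_eb {
	struct mock_eb *prev, *next;	/* plays the role of eb->release_list */
	int refs;			/* plays the role of eb->refs */
	bool needs_redirty;		/* stands in for the three-part check */
	bool dirty;
};

struct mock_trans {
	struct mock_eb head;		/* sentinel, like cur_trans->releasing_ebs */
};

static void mock_trans_init(struct mock_trans *t)
{
	t->head.prev = t->head.next = &t->head;
}

/* Mirrors the atomic_inc() + list_add_tail() pair in btrfs_clean_tree_block(). */
static void mock_track_released(struct mock_trans *t, struct mock_eb *eb)
{
	eb->refs++;
	eb->prev = t->head.prev;
	eb->next = &t->head;
	t->head.prev->next = eb;
	t->head.prev = eb;
}

/* Unlink one buffer and drop the reference the list was holding. */
static void mock_untrack(struct mock_eb *eb)
{
	eb->prev->next = eb->next;
	eb->next->prev = eb->prev;
	eb->prev = eb->next = eb;
	assert(eb->refs > 0);
	eb->refs--;
}

/*
 * First helper: redirty the buffers that must be written and release the
 * others immediately. Returns the number of buffers redirtied.
 */
static int mock_redirty_released_ebs(struct mock_trans *t)
{
	struct mock_eb *eb = t->head.next, *next;
	int redirtied = 0;

	while (eb != &t->head) {
		next = eb->next;	/* saved first: eb may be unlinked below */
		if (eb->needs_redirty) {
			eb->dirty = true;
			redirtied++;
		} else {
			mock_untrack(eb);
		}
		eb = next;
	}
	return redirtied;
}

/* Second helper: drain the list; usable from both commit and cleanup paths. */
static void mock_free_released_ebs(struct mock_trans *t)
{
	while (t->head.next != &t->head)
		mock_untrack(t->head.next);
}
```

Locking is left out of the sketch on purpose; a real version would presumably
still have to take releasing_ebs_lock and the block group's submit_lock where
the patch does.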