From: Ross Zwisler
To: linux-kernel@vger.kernel.org
Cc: Ross Zwisler, "Theodore Ts'o", Alexander Viro, Andreas Dilger, Jan Kara, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Fletcher Woodruff, Justin TerAvest
Subject: [PATCH 2/3] jbd2: introduce jbd2_inode dirty range scoping
Date: Wed, 19 Jun 2019 11:21:55 -0600
Message-Id: <20190619172156.105508-3-zwisler@google.com>
In-Reply-To: <20190619172156.105508-1-zwisler@google.com>
References: <20190619172156.105508-1-zwisler@google.com>

Currently both journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() operate on the entire address space
of each of the inodes associated with a given journal entry. As a
consequence, if an inode is constantly having dirty pages appended to
it, we can end up waiting for an indefinite amount of time in
journal_finish_inode_data_buffers() while all the pages under writeback
are written out.

The easiest way to produce this type of workload is to dd from
/dev/zero to a file until it fills the entire filesystem. This can cause
journal_finish_inode_data_buffers() to wait for the duration of the
entire dd operation.

We can improve this situation by scoping each of the inode dirty ranges
associated with a given transaction. We do this via the jbd2_inode
structure so that the scoping is contained within jbd2 and follows the
lifetime and locking rules for that structure.

This allows us to limit the writeback in
journal_submit_inode_data_buffers() and the wait in
journal_finish_inode_data_buffers() to the dirty range for a given
struct jbd2_inode, so we no longer wait indefinitely while the inode in
question is still being appended to.

Signed-off-by: Ross Zwisler
---
 fs/jbd2/commit.c      | 26 +++++++++++++++++------
 fs/jbd2/journal.c     |  2 ++
 fs/jbd2/transaction.c | 49 ++++++++++++++++++++++++-------------------
 include/linux/jbd2.h  | 22 +++++++++++++++++++
 4 files changed, 72 insertions(+), 27 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index efd0ce9489ae9..b4b99ea6e8700 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -187,14 +187,15 @@ static int journal_wait_on_commit_record(journal_t *journal,
  * use writepages() because with dealyed allocation we may be doing
  * block allocation in writepages().
  */
-static int journal_submit_inode_data_buffers(struct address_space *mapping)
+static int journal_submit_inode_data_buffers(struct address_space *mapping,
+		loff_t dirty_start, loff_t dirty_end)
 {
 	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = mapping->nrpages * 2,
-		.range_start = 0,
-		.range_end = i_size_read(mapping->host),
+		.range_start = dirty_start,
+		.range_end = dirty_end,
 	};
 
 	ret = generic_writepages(mapping, &wbc);
@@ -218,6 +219,9 @@ static int journal_submit_data_buffers(journal_t *journal,
 
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
+		loff_t dirty_start = jinode->i_dirty_start;
+		loff_t dirty_end = jinode->i_dirty_end;
+
 		if (!(jinode->i_flags & JI_WRITE_DATA))
 			continue;
 		mapping = jinode->i_vfs_inode->i_mapping;
@@ -230,7 +234,8 @@ static int journal_submit_data_buffers(journal_t *journal,
 		 * only allocated blocks here.
 		 */
 		trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
-		err = journal_submit_inode_data_buffers(mapping);
+		err = journal_submit_inode_data_buffers(mapping, dirty_start,
+				dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
@@ -257,15 +262,24 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 	/* For locking, see the comment in journal_submit_data_buffers() */
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
+		loff_t dirty_start = jinode->i_dirty_start;
+		loff_t dirty_end = jinode->i_dirty_end;
+
 		if (!(jinode->i_flags & JI_WAIT_DATA))
 			continue;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
-		err = filemap_fdatawait_keep_errors(
-				jinode->i_vfs_inode->i_mapping);
+		err = filemap_fdatawait_range_keep_errors(
+				jinode->i_vfs_inode->i_mapping, dirty_start,
+				dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
+
+		if (!jinode->i_next_transaction) {
+			jinode->i_dirty_start = 0;
+			jinode->i_dirty_end = 0;
+		}
 		jinode->i_flags &= ~JI_COMMIT_RUNNING;
 		smp_mb();
 		wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 43df0c943229c..288b8e7cf21c7 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2574,6 +2574,8 @@ void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode)
 	jinode->i_next_transaction = NULL;
 	jinode->i_vfs_inode = inode;
 	jinode->i_flags = 0;
+	jinode->i_dirty_start = 0;
+	jinode->i_dirty_end = 0;
 	INIT_LIST_HEAD(&jinode->i_list);
 }
 
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 8ca4fddc705fe..990e7b5062e74 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2565,7 +2565,7 @@ void jbd2_journal_refile_buffer(journal_t *journal, struct journal_head *jh)
  * File inode in the inode list of the handle's transaction
  */
 static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
-			   unsigned long flags)
+		unsigned long flags, loff_t start_byte, loff_t end_byte)
 {
 	transaction_t *transaction = handle->h_transaction;
 	journal_t *journal;
@@ -2577,26 +2577,17 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
 	jbd_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
 			transaction->t_tid);
 
-	/*
-	 * First check whether inode isn't already on the transaction's
-	 * lists without taking the lock. Note that this check is safe
-	 * without the lock as we cannot race with somebody removing inode
-	 * from the transaction. The reason is that we remove inode from the
-	 * transaction only in journal_release_jbd_inode() and when we commit
-	 * the transaction. We are guarded from the first case by holding
-	 * a reference to the inode. We are safe against the second case
-	 * because if jinode->i_transaction == transaction, commit code
-	 * cannot touch the transaction because we hold reference to it,
-	 * and if jinode->i_next_transaction == transaction, commit code
-	 * will only file the inode where we want it.
-	 */
-	if ((jinode->i_transaction == transaction ||
-	    jinode->i_next_transaction == transaction) &&
-	    (jinode->i_flags & flags) == flags)
-		return 0;
-
 	spin_lock(&journal->j_list_lock);
 	jinode->i_flags |= flags;
+
+	if (jinode->i_dirty_end) {
+		jinode->i_dirty_start = min(jinode->i_dirty_start, start_byte);
+		jinode->i_dirty_end = max(jinode->i_dirty_end, end_byte);
+	} else {
+		jinode->i_dirty_start = start_byte;
+		jinode->i_dirty_end = end_byte;
+	}
+
 	/* Is inode already attached where we need it? */
 	if (jinode->i_transaction == transaction ||
 	    jinode->i_next_transaction == transaction)
@@ -2631,12 +2622,28 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
 int jbd2_journal_inode_add_write(handle_t *handle, struct jbd2_inode *jinode)
 {
 	return jbd2_journal_file_inode(handle, jinode,
-			JI_WRITE_DATA | JI_WAIT_DATA);
+			JI_WRITE_DATA | JI_WAIT_DATA, 0, LLONG_MAX);
 }
 
 int jbd2_journal_inode_add_wait(handle_t *handle, struct jbd2_inode *jinode)
 {
-	return jbd2_journal_file_inode(handle, jinode, JI_WAIT_DATA);
+	return jbd2_journal_file_inode(handle, jinode, JI_WAIT_DATA, 0,
+			LLONG_MAX);
+}
+
+int jbd2_journal_inode_ranged_write(handle_t *handle,
+		struct jbd2_inode *jinode, loff_t start_byte, loff_t length)
+{
+	return jbd2_journal_file_inode(handle, jinode,
+			JI_WRITE_DATA | JI_WAIT_DATA, start_byte,
+			start_byte + length - 1);
+}
+
+int jbd2_journal_inode_ranged_wait(handle_t *handle, struct jbd2_inode *jinode,
+		loff_t start_byte, loff_t length)
+{
+	return jbd2_journal_file_inode(handle, jinode, JI_WAIT_DATA,
+			start_byte, start_byte + length - 1);
 }
 
 /*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 5c04181b7c6d8..0e0393e7f41a4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -451,6 +451,22 @@ struct jbd2_inode {
 	 * @i_flags: Flags of inode [j_list_lock]
 	 */
 	unsigned long i_flags;
+
+	/**
+	 * @i_dirty_start:
+	 *
+	 * Offset in bytes where the dirty range for this inode starts.
+	 * [j_list_lock]
+	 */
+	loff_t i_dirty_start;
+
+	/**
+	 * @i_dirty_end:
+	 *
+	 * Inclusive offset in bytes where the dirty range for this inode
+	 * ends. [j_list_lock]
+	 */
+	loff_t i_dirty_end;
 };
 
 struct jbd2_revoke_table_s;
@@ -1397,6 +1413,12 @@ extern int jbd2_journal_force_commit(journal_t *);
 extern int jbd2_journal_force_commit_nested(journal_t *);
 extern int jbd2_journal_inode_add_write(handle_t *handle, struct jbd2_inode *inode);
 extern int jbd2_journal_inode_add_wait(handle_t *handle, struct jbd2_inode *inode);
+extern int jbd2_journal_inode_ranged_write(handle_t *handle,
+			struct jbd2_inode *inode, loff_t start_byte,
+			loff_t length);
+extern int jbd2_journal_inode_ranged_wait(handle_t *handle,
+			struct jbd2_inode *inode, loff_t start_byte,
+			loff_t length);
 extern int jbd2_journal_begin_ordered_truncate(journal_t *journal,
 				struct jbd2_inode *inode, loff_t new_size);
 extern void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
-- 
2.22.0.410.gd8fdbe21b5-goog
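The dirty-range bookkeeping this patch adds is small enough to model outside the kernel. What follows is a hypothetical userspace sketch, not kernel code. struct dirty_range, file_range(), file_ranged(), file_whole() and finish_commit() are illustrative names that stand in for the following pieces of the patch:
- the i_dirty_start/i_dirty_end fields;
- the merge in jbd2_journal_file_inode();
- the ranged_write/ranged_wait wrappers;
- the add_write/add_wait wrappers;
- the reset at the end of journal_finish_inode_data_buffers().

The sketch does no locking (the kernel holds j_list_lock), and it reduces the transaction lists to a single has_next_transaction flag.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical userspace model of the two fields added to struct
 * jbd2_inode. The kernel protects them with j_list_lock; this
 * single-threaded sketch does no locking.
 */
struct dirty_range {
	long long start;	/* models i_dirty_start */
	long long end;		/* models i_dirty_end: inclusive, 0 == no range */
};

/* Same merge as jbd2_journal_file_inode(): grow the range to cover both. */
void file_range(struct dirty_range *r, long long start_byte, long long end_byte)
{
	if (r->end) {
		/* A range is already recorded: take the union's bounds. */
		r->start = start_byte < r->start ? start_byte : r->start;
		r->end = end_byte > r->end ? end_byte : r->end;
	} else {
		/* No range yet: record this one as-is. */
		r->start = start_byte;
		r->end = end_byte;
	}
}

/* Like the ranged_write/ranged_wait wrappers: (start, length) -> [start, end]. */
void file_ranged(struct dirty_range *r, long long start_byte, long long length)
{
	assert(length > 0);	/* sketch-only precondition; the patch does not check */
	file_range(r, start_byte, start_byte + length - 1);
}

/* Like jbd2_journal_inode_add_write()/_add_wait(): cover the whole file. */
void file_whole(struct dirty_range *r)
{
	file_range(r, 0, LLONG_MAX);
}

/* Like the tail of journal_finish_inode_data_buffers(). */
void finish_commit(struct dirty_range *r, int has_next_transaction)
{
	if (!has_next_transaction) {
		r->start = 0;
		r->end = 0;
	}
}
```

For example, filing start 8192 with length 4096 (bytes 8192..12287) and then start 4096 with length 4096 (bytes 4096..8191) leaves the range at [4096, 12287]. Commit-time writeback and waiting then cover only those bytes rather than the whole mapping. finish_commit() clears the range only when no later transaction has claimed the inode. Otherwise the merged range may also cover data filed by that next transaction.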